Monthly Archives: May 2011

Will your cloud make money?

What comes first, the horse or the carriage? If you want to start your cloud business, do you first need to come up with a solid revenue model (how you can make money out of it) and then figure out how to implement it, or do you first build it with what you know and then figure out a way to make money?

Actually, the most successful cloud companies did not follow either path. Google started as a search engine and then figured out a smart way to sell advertising to the millions of eyeballs skimming its text results. Amazon spun off IaaS from their main business, beating others on pricing options and rich technical choices. LinkedIn capitalized on the business social network. Salesforce started with plain sales force automation and now offers a full portfolio of business software. What do all of these have in common?

First, their revenue model constantly evolved down the road. Some started as spinoffs from an entirely different line of business; others offered free or subscription-based web services; yet all of them changed their revenue models and adjusted their pricing strategies almost every quarter.

Second, they own their software and infrastructure. Instead of shopping around for bits of software, they control their code (by paying good money to talented humans to write, maintain and update it) and their infrastructure: they own their datacenters. Large ones, with huge economies of scale.

Third, they control their supply chain and rely on partners and affiliates either for pure retail business or for value-added services. If you are using their services, you pay them directly. If their services go down, you go after them directly. They do not maintain channel networks to resell their solutions: the Internet is omnipresent and their portals are one click away from your home page, so why bother setting up a reseller or distribution network?

Fourth, and most important: they control their revenue model. Google capitalizes on the hundreds of millions of people using their free services. Amazon is perfectly aware that their IaaS services do not address their direct customers but the customers of their customers, and they have adapted their pricing and services accordingly. LinkedIn offers a rich business communication platform where online recruiters and businesses can directly interact with its members. Salesforce constantly adapts and adds new services that their existing customers can readily use.

The question is: can I do the same without biting the bullet, and instead get bits and pieces off the shelf? For example, get some branded hardware with 24×7 support, rent some rack space, run Azure/VMware/Citrix/RH, get a decent cloud management platform, strike a few deals with cloudified software vendors and bring in channel partners to do the reselling and distribution of my services?

Will this work? No.

First, you do not control your revenue model. You depend on a channel to resell your services, which means that you depend on your partners’ revenue models. In turn, you are also a reseller of somebody else’s services and products: you resell a hosted CRM or Azure/VMware/Red Hat virtual servers and software platforms. And finally, you do not own your infrastructure, which means that you cannot control the running costs of support and hosting.

Are these bad things? Yes.

In terms of money: Channel partner = $$ (their markup). Branded servers = $$ (acquisition & support). Renting rack space = $$. Azure, VMware and friends = $$ (licensing and support).

In terms of agility: introducing new features and services is not a single-step process. From tuning your heterogeneous software and hardware stack to communicating the new services to your channel partners, weeks, if not months, pass without revenue flowing in.

And then, in terms of customer experience: figure out what’s wrong when something breaks (is it the application? Is it the virtual server? Is it the hardware?), call the cavalry (contact the appropriate vendor) and have them fix it. During this fascinating game of fault tracing and bug whacking, your customers’ business is directly affected: it’s down.

VMs, IOPS and spindles

Sizing a virtual “farm” – how difficult can that be? You need to calculate CPU power, RAM, network bandwidth and storage. CPU sizing is easy: go for two-socket 4-core systems (best value for money these days); anything above a 2.5 GHz clock is enough. RAM? Say 6 GB of RAM per core, so for a dual-socket 4-core server, 48 GB of RAM is good to go. Network? Add as many Ethernet interfaces as you can; 4× GbE per chassis will give you ~400 MBps (that’s megabytes per second), and a few more cards for iSCSI or FC and you’re OK (assuming your storage I/O throughput does not exceed 1/2 gigabyte per second – which is a lot).
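
As a sanity check, here is the arithmetic behind those numbers as a throwaway script (the per-core and per-port figures are the rules of thumb above, not universal constants):

SOCKETS=2; CORES_PER_SOCKET=4; GB_RAM_PER_CORE=6
CORES=$((SOCKETS * CORES_PER_SOCKET))
echo "RAM per host: $((CORES * GB_RAM_PER_CORE)) GB"   # 8 cores x 6 GB = 48 GB
echo "Network per chassis: $((4 * 100)) MB/s"          # ~100 MB/s usable per GbE port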

The above configuration is deduced from my humble experience of running vSphere for two years now. The average footprint of the VMs we use is 2 vCPUs and 4 GB of RAM, with network throughput not exceeding 50 Mbps per VM. Storage I/O… well, that deserves a closer look.

We have left something out: storage IOPS. That is, disk IOPS. To be more precise, the cumulative IOPS of the disk pool.

Disk throughput is something different from IOPS. Throughput is sustainable and depends mostly on platter speed: 15000 RPM enterprise disks (FC or SAS) have twice the throughput of 7200 RPM SATA disks – it’s physics. Throughput is essential when accessing sequential data (usually big files, like video), but in a VM environment this is extremely rare. In addition, when people talk about throughput, they usually mean read operations, not writes. And here is where the trouble begins: virtual hosts do far more writes than reads, and they are truly random, not sequential, so what matters here is not throughput but disk operations per second.
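
To see why this matters, a back-of-the-envelope comparison (the ~150 MB/s and ~180 IOPS figures are common rules of thumb for a 15000 RPM spindle, not measurements):

# One 15k RPM spindle, two very different workloads:
#   sequential streaming reads: ~150 MB/s
#   random 4 KB writes:         ~180 IOPS
echo "random-write throughput: $((180 * 4)) KB/s"   # ~0.7 MB/s - two orders of magnitude less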

What is the IOPS profile of a VM? Well, there is some insight in this really, really cool white paper (section 4.3): the write-to-read ratio can be as high as 4:1. I can confirm this. Straight from the NAS box (an Ubuntu NFS server):

administrator@jerry:~$ iostat -m
Linux 2.6.32-28-server (jerry)  05/26/2011    _x86_64_     (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.13    0.11    0.00   99.76

Device:         tps    MB_read/s   MB_wrtn/s    MB_read    MB_wrtn
sda            0.03         0.00        0.00       1318       1571
sdb           27.10         0.02        0.16      89450     619959
sdc           42.63         0.01        0.63      43258    2385091
sdd           23.19         0.00        0.05       6310     200962
dm-0         236.66         0.04        0.84     139017   3206013
dm-1           0.12         0.00        0.00       1316       1571
dm-2           0.00         0.00        0.00          0          0

This is exaggerated by the fact that the NFS server has 8 GB of RAM, a large portion of which is used for caching read operations. Still, write operations are substantially more than read ops.
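
A quick way to pull that ratio out of the numbers above (assuming the same iostat -m column layout, with cumulative MB read and written in columns 5 and 6):

$ iostat -m | awk '/^dm-0/ { printf "write:read = %.0f:1\n", $6/$5 }'
write:read = 23:1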

What is the best disk pool design for this kind of workload? Obviously, we need lots of disks to accommodate the VM disk capacity. To begin with, for an environment sustaining up to 50 VMs with thin provisioning, assuming an average of 200 GB allocated capacity per VM and 10% utilization, 1.5 TB is enough (10% × 200 GB = 20 GB net capacity per VM; 20 GB × 50 VMs = 1 TB; add 50% for one year of VMDK inflation). So we need 1.5 TB in some sort of RAID to tolerate disk failure. What kind of RAID should we use?
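
The same arithmetic as a script (the 200 GB, 10% and 50% figures are the assumptions above):

VMS=50; ALLOC_GB=200; UTIL_PCT=10; GROWTH_PCT=50
NET_GB=$((VMS * ALLOC_GB * UTIL_PCT / 100))                  # 50 x 20 GB = 1000 GB net
echo "pool size: $((NET_GB * (100 + GROWTH_PCT) / 100)) GB"  # plus 50% growth = 1500 GB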

The problem with RAID and write operations is that a single write op is translated into multiple write ops on the disk drives. In RAID0 (striping), a data block is broken into several write operations, each ending up on a single disk, so the data stream is sent as-is to the platters. In RAID1, a single write operation must be written to two drives, and in RAID5 it touches at least three drives (two data and the parity). This is shown in the following figure:

For the sake of simplicity, assume that a server on top sends small write operations to the disk controller, each operation marked with a different color. In RAID0, all write ops are sent straight to the disks: data are not written to more than one disk. In RAID1, the effective bandwidth is halved: with half the number of write requests, the disks are equally saturated; and in RAID5, with the same number of disks, the effective bandwidth is reduced even further. In all three cases the physical disks run at their maximum I/O, but the effective I/O is totally different.
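
To put rough numbers on this (the per-spindle IOPS figure and the write-penalty factors are textbook rules of thumb, not measurements from our pool):

SPINDLES=8; DISK_IOPS=180    # ~180 IOPS per 15k RPM spindle
RAW=$((SPINDLES * DISK_IOPS))
echo "RAID0: $((RAW / 1)) effective write IOPS"   # no write penalty
echo "RAID1: $((RAW / 2)) effective write IOPS"   # every write hits two disks
echo "RAID5: $((RAW / 4)) effective write IOPS"   # read-modify-write: 4 I/Os per write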

The best strategy is RAID0. Well, we can’t do that: if a disk dies, it’s all over. The next best configuration is RAID1, and this is the layout we have chosen for our vSphere environment. Our NAS boxes are presently… Ubuntu NFS servers (single quad-core CPU, 8 GB RAM). Disks are mirrored via hardware RAID, and the mirrored pairs are then grouped into large concatenated (RAID0-style) logical volumes with LVM. Disk hot sparing is handled by the hardware RAID controller, avoiding cranky LVM code. It has worked perfectly for the past year with more than sufficient performance: disk throughput is more than enough and latency stays at ~5 ms at all times.
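
A minimal sketch of that layout on such a box (device and volume names are made up; each /dev/sdX below is already a hardware-mirrored pair exposed by the controller):

$ pvcreate /dev/sdb /dev/sdc /dev/sdd
$ vgcreate vmpool /dev/sdb /dev/sdc /dev/sdd
$ lvcreate -n vmstore -l 100%FREE vmpool   # linear (concatenated) LV across the mirrors
$ mkfs.ext4 /dev/vmpool/vmstore            # then exported over NFS to the vSphere hosts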

We chose to set up two NAS boxes, one with 300 GB SAS 15000 RPM drives and one with 500 GB SATA 7200 RPM disks: a quick ‘n’ small disk pool and a slow but big one. All disk volumes are set up the way described above, with two hot spare disks per server. It just works and feels like it’ll run forever…

Business customer support: C-

Recently, a partner portal suddenly vanished: the DNS name (say www.acme.biz) was there, but no web server answered. A call to our partner confirmed that they had changed the IP address of their portal when they upgraded their CRM, but no one else had reported such a problem. Their portal was up and running. The problem was on our side.

Indeed, www.acme.biz was alive and well from other uplinks. Time for nslookup, which showed that acme.biz was served by its own DNS servers. Yet when our service provider’s DNS servers were queried for acme.biz, they responded as being authoritative for that domain. Our service provider had effectively hijacked the domain, and www.acme.biz did not point to the new IP address.
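
For the record, this is the kind of check that exposes such a hijack (using dig instead of nslookup; the name server addresses below are made up):

$ dig +short NS acme.biz                          # the domain's real authoritative servers
$ dig @ns1.acme.biz www.acme.biz A                # returns the new, correct address
$ dig @resolver.provider.example www.acme.biz A   # the provider's DNS: stale address
$ dig @resolver.provider.example acme.biz SOA     # ...and it answers with the 'aa' flag set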

OK, that’s weird. We are a business customer. We have a pool of static IP addresses. We are entitled to decent DNS services, right? Anyway, let’s call their support. Oh joy, we cannot log in to their customer portal! OK, let’s send an email to their helpdesk – wait, there is no helpdesk email address. The portal happily says:

To provide its subscribers with optimum service, XXX has set up a Corporate Customer Technical Support Department. You can call 800 XXX XXXX (free of charge) or 69X XXX XXXX (from a mobile phone at a charge) on a 24-hour basis.

Yeah, let’s give them a call. “Dial 1 for English”, “Dial 1 for product information, 2 for customer support”, etc. etc. Wait a couple of minutes, and a human replies. Apparently they do not know us by company name, so they ask for our VAT code… Seems it’s a primary key in their CRM. Found it, described the problem, “we’ll investigate it”, goodbye. No service ticket ID, no email address to reach them directly, nothing… “We’ll call you back”. Three days later, nothing is resolved and nobody has bothered to call back.

So, we are a business customer of a service provider that cannot do IP address management, whose self-service portal does not work, whose CRM is not integrated with their OSS, and who does not have a mail or web gateway to their service desk… Gone with the wind.

Smart TV, where art thou?

Once upon a time, there were TVs, phones and computers. Each one had a purpose: the TV was… um, a TV, the phone was a nifty little mobile voice communicator, and the PC a white box with a white monitor. Oh, and there was something called the Internet, which the PC was hooked onto.

Things started to change. The Internet became available on the mobile phone. The PC got slimmer and turned into a laptop, and then a netbook. At the same time, there was the iPhone (and a year later Android broke the surface). Then there were the pads. And the iPad. And Android tablets. All of a sudden people started ditching their netbooks and going for tablets, and quite soon we will see magical things like this.

However, somebody was left out of the party: the TV. Strangely enough, the TV is still a TV. It has slimmed down to a few mm, it has grown and stretched, but it has not yet morphed into something ubiquitous and intelligent like the PC and the phone. Google TV (its first iteration) has failed, and the Smart TV is perceived as a TV that can play Netflix, BBC iPlayer and YouTube. That’s odd, since the TV has some unique features versus the tablet:

– More real estate: A 42” television is not only visible to everybody in the room, it can fit much more information for the human eye.

– More usable: You can control a TV single-handed from your couch with a Nintendo-style remote control; a tablet you must hold with one hand and tickle with the other.

– More power: Any TV set has more internal space and cooling provision to house much more powerful hardware than your smartphone or tablet. Hard disks or high-capacity flash storage can easily fit in a TV, and you can physically upgrade the device via modules or PCMCIA slots.

– More connections: USB, HDMI, WiFi, Ethernet, DTV…

– More home applications: A TV can be a “home server”: A NAS to store all your media content or a sync master for your DECT phone and contacts.

– More software applications: Whatever applications run on your smartphone and tablet can be hosted in your TV set, taking advantage of the huge display and the superb ergonomics of your couch and remote control. Stuff you do, or wish you could do, every day around the house can be done through the TV. Some examples: video surveillance, a look at the online press, asynchronous social media updates (Facebook, Twitter), a smart home dashboard (A/C, heating, security) and video calls.

So, why is the TV left behind?

Phones evolved, got bigger screens, and applications merged into the daily routine of using your phone. The tablet is an evolution of the netbook, so people who have had a netbook switch to a tablet because they can see the added value of the new form factor of their iPad. Phones and tablets also have a shorter lifecycle, so it’s easier to switch to a new device sooner rather than later. However, a TV is a TV is a TV. In the mindset of the consumer, it is made for watching stuff, not doing stuff. It’s static, it’s on or off; in a word, it is a passive device.

The challenge for the consumer electronics industry is to bring to the market a fresh product, a concept that will add value to your couch, not just make a new generation of televisions and profit on sales volume. Revenue can stem from the software applications after the smart TV has exited the store; in other words, from the home cloud.

Google has tried once and failed; now they come back with a vengeance. And here’s a surprise: check out this ZTE announcement.

The cloud quest: Who are your customers?

Cloud computing and marketing have one thing in common: they are elastic. You can stretch the term “marketing” to cover everything, from constructing smart “elevator pitches” and making glossy brochures to designing highly technical blueprints and shopping them around to CxOs. The same principle applies to cloud computing. Cloud services are cool smartphone applications and full-blown ERPs & CRMs, but also virtual or physical computing.

However, there is a concept in marketing which is not (yet) so obvious in cloud services. In marketing we have two distinct segments: B2B and B2C – business to business and business to consumer. B2B means we do business with other businesses like us: wholesale, selling stuff in beige cardboard boxes and pallets. In B2C we are retailers and sell stuff in fancy, cute, glittery packages strategically placed in shop windows. Should cloud computing be any different? No.

Those of us doing technical marketing (myself included) find it very easy to categorize cloud stuff and name “distinct” packages and solutions with four-letter acronyms (IaaS, PaaS, SaaS, DaaS etc.). It’s convenient for us, it is something we understand from a technical aspect, and it is completely unintelligible to the customer. Would you sell a family car as a “four-wheel vehicle, 81 kW of power, front-wheel drive, 1320 kg kerb weight”, or a bulldozer by describing it as a “tracked vehicle, one-seater, yellow with tinted windows”?

Let’s step into the customer’s shoes. Cloud customers can be smartphone or smart TV users downloading apps that connect to their cloud backend. Or small businesses looking for a cheap and usable CRM to run their own business. Or an enterprise looking for elastic, on-demand computing and storage infrastructure for some new projects. Or a software company shifting their products to the cloud and looking for pay-as-you-go resources for database and application hosting. Or some guy setting up his blog and, at the same time, a small e-shop. All of these are cloud customers, but they are not the same: some are customers, others have customers. The first three are cloud B2C customers and the software company is a B2B customer: any organization, small or large, that consumes cloud resources for internal use only is a consumer, whereas those who consume cloud services to build up services for others have consumers of their own. The blogger who starts up a small online shop is both: a consumer of cloud services (blogging) who will at the same time attract customers consuming cloud services.

B2B cloud services are quite different from B2C ones, regardless of their flavor (IaaS, PaaS, SaaS). B2B offerings address consumers beyond the reach of the cloud provider’s customer; yet these consumers utilize the same infrastructure as the tier-1 customer – the cloud provider’s infrastructure. Resilience, predictable performance and elasticity are the key features here: if the cloud provider’s infrastructure blows up, your customers’ business will blow up and their customers will be terribly annoyed… Now, if the self-service management portal uses a hideous font and tacky colors, that does not matter much. On the contrary, B2C services focus on other attributes: pricing, usability and evolution are what end consumers seek the most. Service hiccups can be tolerated, but the service must be smart, fancy, usable and innovative: your service has to be cooler than the app store next door. Consequently, the entire technical and operational stack underneath has to be structured in an entirely different manner: B2B services should be designed to just work, B2C services should be designed to tolerate failure.

So, maybe it’s time to shift away from the established segregation of *aaS acronyms and assorted labeled services and look at cloud services from a different perspective: the customer view. Google has done exactly that. Google focused on B2C from day one, offering free and paid services for the end consumer (Gmail, Docs, Android via the OHA) and at the same time capitalizing on these to offer B2B services (advertising, Apps for Business, App Engine, Android Market). And it just works.

PS: Check out this cool B2C-to-B2B domestic example as well.

EU Data Retention: A really big pile of logs

Legalize it: keeping logs of all voice and digital communications has been mandatory in the EU under Directive 2006/24/EC, which forces member states to bring into effect domestic laws requiring local service providers to comply. Whether this is good or bad for us is another discussion, but it sure is not good for carriers and ISPs.

The data that must be retained are primarily call records from phone switches and MSCs (whom you called, when, from which location, the duration of the call, your mobile phone ID and other stuff), emails sent and received (headers only, no content), in some cases visited URLs, and lots of other related stuff, like the personal information required to bind your phone number or IP address to your real name and home address. This is the data law enforcement authorities require to track you down if you do really bad, bad things, Winston Smith.

The trouble is that such data are produced in massive quantities. Each phone call generates one or two CDRs, a few hundred bytes long; each email, a few lines in a log file; and so on. Multiply these by the number of subscribers of a carrier or ISP and you get figures in the order of a few gigabytes per day. All this data must be stored in a safe place so that when the Law knocks on your door and requests the whereabouts of a Bad Guy, the service provider can deliver all relevant information within a few days. Now, try and run something like:

$ # call records matching the suspect's phone number (columns are illustrative)
$ gzcat logs/from_everywhere/*.gz | grep "$BAD_GUY_PHONE" | awk '{print $7","$23","$12}' > /tmp/calls.csv
$ # location updates for the suspect's IMSI, from the HLR logs
$ gzcat hlr_logs/from_everywhere/*.gz | grep "$BAD_GUY_IMSI" | awk '{print $1","$4","$32}' > /tmp/loc.csv
$ # subscriber details from the CRM export
$ gzcat crm_export/*.gz | grep "$BAD_GUY_NAME" | awk '{print $3","$4","$8","$23","$7}' > /tmp/info.csv

on gigabytes of (compressed) data, then import the CSV files into Excel and try to correlate them into some meaningful information for the authorities… Excel will probably explode before your brain does.
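
Staying in the shell does not help much either; even a trivial correlation of two of the CSVs above takes gymnastics (a sketch, assuming purely for illustration that the first field of each file carries a common subscriber key):

$ sort -t, -k1,1 /tmp/calls.csv > /tmp/calls.sorted
$ sort -t, -k1,1 /tmp/loc.csv > /tmp/loc.sorted
$ join -t, -1 1 -2 1 /tmp/calls.sorted /tmp/loc.sorted > /tmp/correlated.csv

And deduplication, time-window filtering and report formatting are still on you.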

The question is, is there any cool software out there that can automate this process? Let’s do a 3-minute analysis.

The lifecycle of retained call data looks like this: first, data are collected from all sources, sanitized and ingested into a safe repository. After a predefined retention period (say, one year), information should be automatically expunged from the database (minus the records that are under investigation). At any time, the system should produce the information required by law enforcement authorities in a timely and accurate manner, without manual human intervention on the data. Data should be archived, protected (encrypted) and immune to alteration of any kind.

What kind of software would do the job? Certainly not conventional relational databases. Importing a few gigabytes every day into your Oracle database will drive your DBA insane, with the database itself doing nothing more than updating indexes and eating disk space, let alone the fact that you need an epic disk array to handle the load. What about your Security Information Management application? Well, a SIM can do a good job of spotting suspicious security events from your antivirus, IPS and firewall logs in real time, but it cannot handle the massive daily data volume and the accumulated information. A distributed cloud database? Maybe, if you are Google or Amazon…

Actually, there is software built for this job. It all starts with the database. What we need here is a database that can support complex queries involving joins across a number of tables, is very efficient with read-only transactions, talks plain old SQL and can ingest tons of data in a flash. On top of this database, you need an application that can mediate and sanitize data, implement a query and data retrieval interface that leaves out human intervention, and produce reports tailored to the needs of the state authorities. The end result is a compact system that utilizes low-cost commodity storage (SATA drives) and a 2- or 4-way x86 server for data ingestion and retrieval, is rated at ingesting ~30 GB of data per day, and at the same time satisfies all requirements for archiving, data encryption, compression and retrieval.

Of cloud computing

If you look at (real) clouds, it’s quite likely that you’ll recognize patterns: one cloud here looks like a dog, another one over there like a cat, but for sure they all have one thing in common: they are made of water. The same applies to computing clouds; everybody has a unique perception of what they are: infrastructure as a service, platform as a service, software as a service, storage as a service, different technologies and markets, yet a common pattern prevails:

Service.

From the consumer’s standpoint, in layman’s terms, cloud computing is about doing stuff with computers without owning the infrastructure. You wanna store your MP3s and movies? Amazon Cloud Drive. Email? Gmail. Write your school homework with a neat presentation? Google Docs. Set up a collaboration portal with business email? BPOS. Evaporate your Windows server? Azure. All you need to do this cool stuff is a browser running on your smartphone, pad, smart TV, notebook or PC.

Cloud computing is about services, and all cloud services are software or depend on software to be usable. Industrial-grade software, able to survive hardware failures, serve thousands and millions of users and undergo constant evolution and development. It’s not easy, it’s not off the shelf and it’s not for sale: you have to develop code, integrate it, test it, fail miserably and do the same thing all over again.