Tag Archives: infrastructure

Cloud infrastructure Economics: Calculating IaaS cost

In our previous post, we referenced a number of financial factors and operational parameters that we need to take into account in order to calculate some meaningful costs for IaaS services. Let’s see now how these are combined to produce IaaS cost items.

Your mileage may vary on IaaS; you may be renting datacenter space, leasing equipment, operating your own facilities or simply using public clouds to run your business. However, at the end of the day, you need an apples-to-apples metric to find out which strategy is the most cost-effective. For IaaS, the metric is the footprint of your infrastructure, expressed in terms of virtual computing units: the monthly cost of a single virtual server, broken down into virtual CPU/memory and virtual storage resources.

You can check the formulas online in this sheet here. If you want to take a peek at how the formulas work, continue reading.

For simplicity, we will not dive into virtual machine OS licensing costs here – they are easy to find out anyway (those familiar with Microsoft SPLA should have an idea). We will calculate only the monthly running costs of IaaS, expressed in the following cost items:

  • SRVCOST: Individual virtual server monthly cost: Regardless of virtual machine configuration, the monthly cost of spinning up a virtual machine.
  • COMPUTECOST: Virtual computing unit cost: The cost of operating one virtual memory GB and assorted CPU resources per month.
  • DISKCOST: Virtual disk unit cost: The cost of operating one virtual storage GB per month.

Almost all IaaS public cloud providers format their pricelists according to these three cost items, or bundle them in prepackaged virtual server sizes. Calculating these can help immensely in finding good answers to the “build or buy” question if you plan to adopt IaaS for your organization, or determine the sell price and margins if you are a public cloud provider.

Let’s see now each cost item one by one.

Individual server cost (SRVCOST)

How much does it cost to spin up a single virtual machine per month? What do we need to take into account? Well, a virtual machine, regardless of its footprint, needs some grooming and the infrastructure it will run on. The assorted marginal costs (cost to add one more virtual machine to our IaaS infrastructure) are the following:

  • C_SRV: Cost of maintaining datacenter network infrastructure (LAN switching, routing, firewall, uplinks) and computing infrastructure software costs (support & maintenance). We do not include here hardware costs since these are related to the footprint of the virtual machine.
  • C_DCOPS: Cost of manhours required to keep the virtual machine and related infrastructure up and running (keep the lights on)
  • C_NWHW: Cost of network related hardware infrastructure required to sustain one virtual machine. These are pure hardware costs and reflect the investment in network infrastructure needed to keep adding virtual machines.

An essential unit used in most calculations is the cost of a rack unit. Referring to our older post for the EPC variable, this is expressed as

C_RU=EPC/RU

This gives us an approximation of the cost of one rack unit per month in terms of monthly electricity and hosting cost (EPC).

C_SRV is expressed as a function of NETSUPP (monthly network operating & support costs), RU_NET (total network infrastructure footprint), CALCLIC (virtualization/computing infrastructure maintenance & software costs) and SRV (total virtual servers running). The formula is:

C_SRV=( NETSUPP + CALCLIC + C_RU*RU_NET ) / SRV

C_RU*RU_NET is the hosting cost of the entire networking infrastructure (switches, patch panels, load balancers, firewalls etc).

C_DCOPS is straightforward to calculate:

C_DCOPS = DCOPS / SRV

And finally, C_NWHW is the hardware cost needed to add one more virtual server. To calculate C_NWHW we take into account the current network infrastructure cost and then calculate how much money we have to borrow to expand it in order to provision one more virtual server. To do this, we divide the total network infrastructure cost by the number of provisioned virtual machines and spread this cost over the lifecycle of the hardware (AMORT), augmented with a monthly interest rate (INTRST):

C_NWHW=(NETINFRA/SRV) * (1+INTRST) / AMORT
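Putting the three components together, SRVCOST can be sketched in a few lines of Python. The variable names follow the definitions above; all figures in the example are made up purely for illustration:

```python
def srvcost(epc, ru, netsupp, calclic, ru_net, dcops, netinfra, srv, intrst, amort):
    """Monthly cost of spinning up one virtual machine (SRVCOST).

    Combines the three marginal cost items defined in the text.
    """
    c_ru = epc / ru                                    # cost of one rack unit per month
    c_srv = (netsupp + calclic + c_ru * ru_net) / srv  # network/software overhead per VM
    c_dcops = dcops / srv                              # operations payroll per VM
    c_nwhw = (netinfra / srv) * (1 + intrst) / amort   # network hardware investment per VM
    return c_srv + c_dcops + c_nwhw

# Illustrative figures only: 10,000/month power & hosting, 400 usable rack
# units, 500 virtual servers, 0.5% monthly interest, 36-month amortization.
cost = srvcost(epc=10_000, ru=400, netsupp=2_000, calclic=3_000, ru_net=20,
               dcops=15_000, netinfra=120_000, srv=500, intrst=0.005, amort=36)
print(round(cost, 2))  # per-VM monthly cost for these inputs
```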

Computing cost (COMPUTECOST)

As a computing unit, for simplicity we define one GB of virtual RAM coupled with an amount of processing power (CPU). Finding the perfect ratio between memory and CPU power is tricky and there is no golden rule here, so we define the metric as the amount of virtual RAM. The exact CPU power assigned to each virtual RAM GB depends on the amount of physical RAM configured in each physical server (SRVRAM) and the number of physical CPU cores of each server. COMPUTECOST is broken down to two cost items:

  • C_MEM: It is the cost associated with operating the hardware infrastructure that provisions each virtual RAM GB.
  • C_SRVHW: It is the cost associated with purchasing the hardware infrastructure required to provide each virtual RAM GB.

C_MEM depends on running costs and is the cost of compute rack units divided by the total virtual RAM deployed in our cloud:

C_MEM = (RU_CALC * C_RU) / TOTALMEM

Note that in some cases (like VMware’s VSPP program) you may need to add software subscription/license costs to the above, if your virtualization platform is licensed per virtual GB.

C_SRVHW is calculated in a more complex way. First, we need to find out the hardware cost associated with each virtual GB of RAM. This is the cost of one physical server equipped with RAM, divided by the amount of physical RAM, adjusted with the memory overprovisioning factor:

CAPEX_MEM = (SERVER + MEMORY * SRVRAM) / (SRVRAM * MEMOVERPROV)

In a similar way with C_NWHW, we calculate the acquisition cost spread over the period of infrastructure lifecycle, with the monthly interest rate:

C_SRVHW = CAPEX_MEM * (1 + INTRST) / AMORT
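A sketch of COMPUTECOST along the same lines, again with made-up figures (a VSPP-style per-GB license fee could be added to c_mem where applicable):

```python
def computecost(ru_calc, c_ru, totalmem, server, memory, srvram,
                memoverprov, intrst, amort):
    """Monthly cost of one virtual RAM GB (COMPUTECOST = C_MEM + C_SRVHW)."""
    c_mem = (ru_calc * c_ru) / totalmem                  # hosting cost per virtual GB
    capex_mem = (server + memory * srvram) / (srvram * memoverprov)  # hardware cost per virtual GB
    c_srvhw = capex_mem * (1 + intrst) / amort           # spread over the lifecycle, with interest
    return c_mem + c_srvhw

# Illustrative: 40 compute rack units at 25/RU/month, 10 TB of provisioned
# virtual RAM, a 6,000 server without RAM, 10 per RAM GB, 288 GB per host,
# 1.2 memory overprovisioning.
cost = computecost(ru_calc=40, c_ru=25.0, totalmem=10_240, server=6_000,
                   memory=10, srvram=288, memoverprov=1.2,
                   intrst=0.005, amort=36)
```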

Virtual storage cost (DISKCOST)

Calculating DISKCOST is simpler. The two cost items, analogous to COMPUTECOST, are:

  • C_STOR: It is the cost associated with operating the hardware infrastructure that provisions each virtual disk GB.
  • C_STORHW: It is the cost associated with purchasing the hardware infrastructure required to provide each virtual disk GB.

C_STOR is based on the existing operating costs for running the storage infrastructure and is calculated proportionally to the provisioned disk capacity:

C_STOR = (STORLIC + RU_STOR * C_RU) / TOTALSTOR

C_STORHW is the cost of investment for each storage GB over the infrastructure lifecycle period:

C_STORHW = (STORINFRA/TOTALSTOR) * (1 + INTRST) / AMORT
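And DISKCOST, completing the set (illustrative figures again):

```python
def diskcost(storlic, ru_stor, c_ru, totalstor, storinfra, intrst, amort):
    """Monthly cost of one virtual storage GB (DISKCOST = C_STOR + C_STORHW)."""
    c_stor = (storlic + ru_stor * c_ru) / totalstor            # running cost per GB
    c_storhw = (storinfra / totalstor) * (1 + intrst) / amort  # investment per GB over the lifecycle
    return c_stor + c_storhw

# Illustrative: 1,500/month storage software costs, 30 storage rack units
# at 25/RU/month, 50 TB provisioned, 200,000 total storage investment.
cost = diskcost(storlic=1_500, ru_stor=30, c_ru=25.0, totalstor=50_000,
                storinfra=200_000, intrst=0.005, amort=36)
```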

 

One can elaborate on this model and add all sorts of costs and parameters; however, from our experience, this model is quite accurate for solving an IaaS financial exercise. All you need are simple datacenter metrics and easily obtained costs.

Cloud infrastructure Economics: Cogs and operating costs

Perhaps the most important benefit of adopting cloud services (either from a public provider or internally from your organization) is that their cost can be quantified and attributed to organizational entities. If a cloud service cannot be metered and measured, then it should not be called a cloud service, right?

So, whenever you need to purchase a cloud service or when you are called to develop one, you are presented with a service catalog and assorted pricelists, from which you can budget, plan and compare services. Strictly speaking, understanding how the pricing has been formulated is not your business, since you are on the consumer side. However, you should care: you need to get what you pay for. There must be a very good reason for a very expensive or a very cheap cloud service.

In the past, we have developed a few cloud services utilizing our own resources and third party services. Each and every time, determining whether launching the service commercially would be a sound practice depended on three factors:

  • Would customers pay for the service? If yes, at what price?
  • If a similar service already was on the market, where would our competitors stand?
  • What is the operating cost of the service?

Answering the first two questions is straightforward: Visit a good number of trusted and loyal customers, talk to them, investigate competition. That’s a marketing and sales mini project. But answering the last question can be a hard thing to do.

Let us share some insight on the operating costs and cost-of-goods for a cloud service and in particular, infrastructure as a service (IaaS). Whether you already run IaaS for your organization or your customers, you are in one of the following states:

  1. Planning to launch IaaS
  2. Already running your datacenter

State (1) is where you have not yet invested anything. You need to work on implementation and operational scenarios (build or buy? Hire or rent?) and do a good part of marketing plans. State (2) is where you have already invested, you have people, processes and technology in place and are delivering services to your internal or external customers. In state (1) you need to develop a cost model, in state (2) you need to calculate your costs and discover your real operating cost.

In both cases, the first thing you need to do before you move on with cost calculation is to guesstimate (state 1) or calculate (state 2) the footprint of your investment and delivered services. From our experience, the following parameters are what you should absolutely take into account in order to properly find out how much your IaaS really costs.

Financial parameters (real money)

  • EPC: Electrical power and hosting cost. How much do (or would) you pay for electricity and hosting. This can be found from your electricity bill, your datacenter provider monthly invoice or from your financial controller (just make sure you ask them the right questions, unless you want to get billed with the entire company overhead costs). EPC is proportional to your infrastructure footprint (ie number of cabinets and hardware).
  • DCOPS: Payroll for the operations team. You need to calculate the total human resource costs here for the team that will operate IaaS services. You may include here also marketing & sales overhead costs.
  • CALCLIC: Software licensing and support costs for the entire IaaS computing infrastructure layer. These are software costs associated with the infrastructure (eg, hypervisor licenses), not license costs for delivered services, eg Microsoft SPLA costs.
  • STORLIC: Software licensing and support costs for your entire storage infrastructure. Also include data backup software costs here, in their entirety.
  • SERVER: Cost of a single computing server. It’s good to standardize on a particular server model (eg 2-way or 4-way, rackmount or blade). Here you should include the cost of a computing server, complete with processors but without RAM. RAM to CPU ratio is a resource that is adjusted according to your expected workloads and plays a substantial role in cost calculation. If you plan to use blade servers, you should factor here the blade chassis as well.
  • MEMORY: Average cost of 1 GB of RAM.
  • STORINFRA: Cost of your storage infrastructure, as is, or the storage infrastructure you plan to purchase. Storage costs are not that easy to calculate as a factor of 1 disk GB units, since you have to take into account SAN, backup infrastructure, array controllers, disk enclosures and single disks. Of course we assume you utilize a centralized storage infrastructure, pooled to your entire computing farm.
  • NETINFRA: Cost of data network. As above, include here datacenter LAN, load balancers, routers, even cabling.
  • NETSUPP: Cost of network support (monthly). Include here software licensing, antivirus subscriptions and datacenter network costs.

Operational parameters (Facts and figures)

  • RU: Amount of available rack units in your datacenter. This is the RU number you can use to install equipment (protected with UPS, with dual power feeds etc).
  • RU_STOR: Rack units occupied by storage systems
  • RU_CALC: Rack units occupied by computing infrastructure (hypervisors)
  • RU_NET: Rack units occupied by network infrastructure
  • SRV: Virtual machines (already running or how many you plan to have within the next quarter)
  • INTRST: Interest rate (cost of money): Monthly interest rate of credit lines/business loans
  • TOTALMEM: Total amount of virtual memory your SRV occupy
  • TOTALSTOR: Total amount of virtual storage your SRV occupy
  • SRVRAM: Amount of physical memory for each physical server. This is the amount of RAM you install in each computing server. It is one of the most important factors, since it depends on your average workload. A rule of thumb is that for generic workloads, a hardware CPU core can sustain up to 6 virtual CPUs (vcpu). For each vcpu, you need 4 GB of virtual RAM. So, for a 2-socket, 6-core server you need 2 (sockets) x 6 (cores) x 6 (vcpu) x 4 (GB RAM) = 288 GB RAM. For a 4-way, 8-core server beast with memory-intensive workloads (say 8 GB per vcpu) you need 4 x 8 x 6 x 8 = 1536 GB RAM (1.5 TB).
  • MEMOVERPROV: Memory overprovisioning for virtual workloads. A factor that needs tuning from experience. If you plan conservatively, use a 1:1 overprovisioning factor (1 GB of physical RAM to 1 GB of virtual RAM). If you are more confident and plan to save costs, you can calculate an overprovisioning factor of up to 1.3. Do this if you trust your hypervisor technology and have homogenous workloads on your servers (for example, all-Windows ecosystem) so that your hypervisor can take advantage of copy-on-write algorithms and save physical memory.
  • AMORT: Amortization of your infrastructure. This is a logistics & accounting term, but here we mainly use this to calculate the lifespan of our infrastructure. It is expressed in months. A good value is 36 to 60 months (3 to 5 years), depending on your hardware warranty and support terms from your vendor.
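The SRVRAM rule of thumb above boils down to a one-line calculation; here is a quick sketch reproducing the two examples from the list:

```python
def srvram_gb(sockets, cores_per_socket, vcpu_per_core=6, gb_per_vcpu=4):
    """Rule-of-thumb RAM per host: sockets x cores x vCPUs per core x GB per vCPU."""
    return sockets * cores_per_socket * vcpu_per_core * gb_per_vcpu

print(srvram_gb(2, 6))                 # 2-socket, 6-core generic workload host: 288
print(srvram_gb(4, 8, gb_per_vcpu=8))  # 4-way, 8-core memory-intensive host: 1536
```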

If you can figure out the above factors, you can proceed with calculating your operating IaaS costs. Keep reading here!

Usage metering and charging with Cloudstack

One of the prominent features of an IaaS cloud is that one can meter its resource usage by its consumers. Metrics are everywhere: hypervisor counters, virtual disk size, network I/O, occupied IP addresses, virtual CPUs and RAM – they are all over the place, waiting to be collected. As soon as you can grab a handful of metrics, you can implement chargeback policies and report back to your users on their resource consumption or, if you run a public IaaS shop, somehow transform these metrics to invoices.

Cloud.com’s cloudstack comes with an excellent usage server, recording metrics directly from its accounts. During installation, simply select the “Install Usage Server” option, perform some basic configuration, and you are all set to go. The usage server collects no less than thirteen metrics (as of cloudstack release 2.2), which can be found here. In short, some of the most important ones are:

  • RUNNING_VM: Total hours a virtual machine is started and running on the hypervisor
  • ALLOCATED_VM: Total hours a VM exists (no matter if it’s up or down). Useful parameter for charging OS license usage, for example Microsoft SPLA licenses.
  • IP_ADDRESS: Self evident; applies to public (Internet) IP addresses consumed by a cloudstack account. These addresses are (according to cloudstack architecture) attached to the virtual router of the user
  • NETWORK_BYTES_SENT and NETWORK_BYTES_RECEIVED: Traffic passing through the virtual router of a user
  • VOLUME: Size in bytes of user volumes
  • TEMPLATE and ISO: Size in bytes of user-uploaded VM templates and ISO images
(For those who are not familiar with cloudstack’s architecture, cloudstack users are part of accounts. Virtual machines belonging to a single account live in their own private VLAN, totally isolated from other accounts. Access to the Internet, DHCP addressing, DNS and VPN termination, all take place in a special cloudstack virtual machine, a virtual router. Every account has its own virtual router, not directly controlled by the end user, but via the cloudstack API).
 

The service (“cloud-usage”) starts along with the rest of the cloud services on your cloudstack controller, and its configuration variables are found among the global parameters of cloudstack. The most important are usage.stats.job.aggregation.range and usage.stats.job.exec.time. The first controls the aggregation interval (in minutes) of collected metrics and the second the time the aggregation algorithm kicks in. Remember to restart the usage server service (“cloud-usage”) every time you change these variables.

All metrics are stored in a second database, called “cloud_usage”. To see if your usage server really works, connect to that database and see if its tables start to fill (all metrics tables start with “usage_*”). Data can be retrieved from the database, however, a more elegant way is to use the cloudstack API. The most useful API calls are:

  • listUsageRecords: Takes as arguments account, start & end date and returns usage records for the specified time interval.
  • generateUsageRecords: Starts the aggregation process asynchronously

Accessing the API is a breeze: Generate the secret and API keys from the console and pass them as arguments to a python script or a simple wget, targeting the API port (which is 8080 for plain HTTP, or a designated SSL port).
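As a sketch of such a script: cloudstack API requests are signed by sorting the query parameters, lowercasing the resulting string, computing an HMAC-SHA1 with the secret key and appending the base64-encoded signature. The endpoint, account name and keys below are hypothetical placeholders:

```python
import base64
import hashlib
import hmac
import urllib.parse

def signed_url(base, api_key, secret_key, command, **params):
    """Build a signed cloudstack API URL (sorted, lowercased query string,
    HMAC-SHA1 with the secret key, base64-encoded signature)."""
    params.update(command=command, apikey=api_key, response="json")
    pairs = sorted((k, urllib.parse.quote(str(v))) for k, v in params.items())
    query = "&".join(f"{k}={v}" for k, v in pairs)
    to_sign = query.lower()
    digest = hmac.new(secret_key.encode(), to_sign.encode(), hashlib.sha1).digest()
    return f"{base}?{query}&signature={urllib.parse.quote(base64.b64encode(digest))}"

# Hypothetical management server, account and keys -- substitute your own:
url = signed_url("http://cloudstack.example.com:8080/client/api",
                 "API_KEY", "SECRET_KEY", "listUsageRecords",
                 account="innova", domainid=1,
                 startdate="2012-01-01", enddate="2012-01-31")
# Fetch the result with urllib.request.urlopen(url), or feed the URL to wget.
```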

So, what do you do with all these collected metrics? Well, there are two ways to deal with them. The first is to write a few complex scripts that collect the metrics from the API, sanitize them, implement your billing scheme and export to your reporting tool or ERP to generate invoices.

The second is to use an off the shelf charging/billing solution. As of January 2012, Amysta have a product in beta and Ubersmith offer complete cloudstack billing in their product, Ubersmith DE.

Breaking up with your cloud provider

Suppose you run an average SMB business and you’re proud of having achieved a fairly low TCO by hosting all your operations on your favorite cloud provider, AcmeCloud, who do an excellent job running your virtual machines flawlessly, with 100% uptime and perfect network latency figures. All is well, you have no IT in house and your computing geeks funnel their resources into developing new services and bringing in new customers every day. AcmeCloud does all the chores for you (operating VMs, allocating storage, snapshotting and replicating volumes, maintaining SSL certificates and network load balancing) and charges you only for what you use. Until that day…

It started with a 2-hour outage, attributed to a lightning strike. Then, two months later, at the peak of the holiday season, you are notified that you must change your SSL certificates due to an intrusion into AcmeCloud’s secure server – and inform your customers, all 2000 of them, to do the same. And a few weeks later, due to a mishap during a network upgrade, your data volumes suddenly become unresponsive for a whole day, causing data loss and forcing a rollback to 2-day-old snapshots. Time to find a new home for your data services.

As easy as it is to start using cloud services, switching to a new cloud provider is just as difficult. Downloading VM images and uploading them to another provider over the Internet is expensive – you have to utilize high bandwidth for a long time. Exporting data sets from a SaaS data repository and importing them to another is even more difficult, since you may have to adjust data schemas and types. Hoping that all will go smoothly and you will be done in a few days is about as likely as finding a pink elephant grazing in your back yard.

In the traditional, old-fashioned IT world we all love to hate, where you have your servers in your little computer room backing them up to tapes and NFS volumes, the story described above is equal to a disaster recovery event. Something breaks beyond repair and you have to rush to find new servers and Internet uplink, then restore everything – well, everything that’s possible – to the new systems and restore services. This has happened once or twice to IT veterans and it’s quite painful to recall.

One could argue that a disaster recovery site and a BC plan would be the solution to this, but, how many average SMBs do you know that can afford a second datacenter? Very few. But, what would be the analogy of a disaster recovery site in the cloudified enterprise? Simply, a second cloud provider. Let’s take some time and weigh the pros and cons of moving your SMB to the clouds, using two cloud providers from day one.

The bad stuff first:

  • Costs will double: You pay two providers. That’s straightforward (really?)
  • Complexity will also double. Each provider has their own interfaces and APIs, which you have to familiarize with.
  • You have to maintain data consistency from one provider to the other. If you go for an active-passive scheme, you have to transfer data from one provider to the other on a frequent schedule
  • You have to control your DNS domain so that you can update your domain entries when you have to switch from one provider to the other

 

The good stuff:

  • Costs will not necessarily double! By utilizing both providers at the same time, you can split services between them. When either fails, you can use their elastic cloud infrastructure to instantly fire up dormant VMs on the other.
  • You have the luxury to make tactical decisions at your own time, not under time pressure. For example, you can tune your online services at your own pace by balancing them across both or preferring one of them that offers better network latency, while keeping data services on the other that offers cheaper storage.
  • You can plan a cloud strategy, by eliminating one of two providers and migrating to a third without losing deployed services.
  • By being forced to move data back and forth from one provider to the other, your IT skills in data governance and transformation will be enriched, and your organization will retain control over your data lifecycle instead of delegating this function to the cloud provider.

 

Planning a cloud strategy with two cloud providers instead of one is the same pattern that cloud providers themselves utilize: Reliable services built on unreliable resources. You cannot trust 100% any cloud provider, but you can trust a service model that is built on redundant service blocks.

A quick tour of Cloudstack

Cloud.com, now a part of Citrix, has developed a neat, compact, yet powerful platform for cloud management: Enter cloudstack, a provisioning, management and automation platform for KVM, Xen and VMware, already trusted for private and public cloud management by companies like Zynga (got Farmville?), Tata Communications (public IaaS) and KT (a major Korean service provider).

Recently I had the chance to give cloudstack a spin in a small lab installation with one NFS repository and two Xenservers. Interested in how it breathes and hums? Read on, then.

Cloudstack was installed in a little VM in our production vSphere environment. Although it does support vSphere 4.1, we decided to try it with Xen and keep it off the production ESX servers. Installation was completed in 5 minutes (including the provisioning of the Ubuntu 10.04 server from a ready VMware template) and cloudstack came to life, waiting for us to log in:

The entire interface is AJAX – no local client. In fact, cloudstack can be deployed in a really small scale (a standalone server) or in a full-blown fashion, with redundant application and database servers to fulfill scalability and availability policies.

Configuring cloudstack is a somewhat lengthier process and requires reading the admin guide. We decided to follow the simple networking paradigm, without VLANs, and use NFS storage for simplicity. Then, it was time to define zones, pods and clusters, primary and secondary storage. In a nutshell:

  • A zone is a datacenter. A zone has a distinct secondary storage, used to store boot ISO images and preconfigured virtual machine templates.
  • A pod is a set of servers and storage inside a zone, sharing the same network segments
  • A cluster is a group of servers with identical CPUs (to allow VM migration) inside a pod. Clusters share the same primary storage.
We created a single zone (test zone) with one pod and two clusters, each cluster consisting of a single PC (one CPU, 8 GB RAM) running Xenserver 5.6. Configuring two clusters was mandatory, since the two Xenservers were of different architectures (Core 2 and Xeon). After the configuration was finished, logging in to Cloudstack as administrator brings us to the dashboard.

In a neat window, the datacenter status is clearly displayed, with events shown in the same frame. From here an administrator has full power over the entire deployment. This is a host (processing node in Openstack terms) view:

You can see the zone hierarchy in the left pane and the virtual machines (instances) running on the host shown in the pane on the right.

Pretty much, what an administrator can do is more or less what Xencenter and vCenter offer: create networks and virtual machine templates, configure hosts and so on. Let’s see what the cloudstack templates look like:

Cloudstack comes with some sample templates and internal system virtual machine templates. These are used internally, but more on them later. The administrator is free to upload templates for all three hypervisor families (KVM, Xen and VMware): qemu images for KVM, .ova for VMware and VHD for Xenserver. We created one Windows 2008 server template quite easily, by creating a new VM in Xencenter, installing Xentools and then uploading the VHD file in Cloudstack:

As soon as the VHD upload is finished, it is stored internally in the Zone secondary storage area and is ready to be used by users (or customers).

What does cloudstack look like from the user/customer side? We created a customer account (Innova) and delegated access to our test zone:

Customers (depending on their wallet…) have access to one or more pods and can create virtual machines freely, either from templates or from ISO boot images they have access to, without bringing cloudstack administrators into the loop. Creating a new virtual machine (instance) is done through a wizard. First, select your favorite template:

Then, select a service offering from preconfigured sizes (looks similar to EC2?)

Then, select a virtual disk. A template comes with its own disk (in our case the VHD we uploaded earlier), but you can add more disks to your instances. This can also be done after the instance is deployed.

…and after configuring the network (step 4), you are good to go:

The template will be cloned to your new instance, which boots up, and from this point on you can log in through the web browser – no RDP or VNC client needed!

It’s kind of magic: doing this via an app server alone seems impossible, right? Correct. Cloudstack silently and automagically deploys its own system VMs that take care of template deployment to computing nodes and storage. Three special kinds of VMs are used:

  • Console proxies that relay VNC, KVM console or RDP sessions of instances to a web browser. One console proxy runs in every zone.
  • Secondary storage VM, that takes care of template provisioning
  • Virtual router, one for every domain (that is, customers), which supplies instances with DNS services, DHCP addressing and firewalling.
Through the virtual router users can add custom firewall rules, like this:
All these system virtual machines are managed directly from cloudstack. Login is not permitted and they are restarted upon failure. This was demonstrated during an unexpected Xenserver crash, which brought down the zone secondary storage VM. After the Xenserver was booted up, the secondary storage VM was restarted automatically by cloudstack and relevant messages showed up in the dashboard. Cool, huh?

Customers have full power over their instances, for example, they can directly interact with virtual disks (volumes), including creating snapshots:

In all, we were really impressed by our little cloudstack deployment. The platform is very solid, all advertised features do work (VM provisioning, management, user creation and delegation, templates, ISO booting, VM consoles, networking) and the required resources are literally peanuts: It is open source and all you need are L2 switches (if you go with basic networking), servers and some NFS storage. Service providers investigating options for their production IaaS platform should definitely look into cloud.com’s offerings, which have been part of Citrix since July 2011.

Building a cloud

Question: How many people do you need to build and run a cloud?

Answer: As many as you can fit in a meeting room.

A cloud offering IaaS and SaaS to customers is nothing more than a compact and complex technology stack. Starting from the bottom to the top, you have servers, storage (NFS/iSCSI/FC), networking (LIR, upstream connections, VLANs, load balancers), data protection (snapshots, replication, backup/restore), virtualization (pick your flavor), cloud management (Applogic/Openstack/Cloudstack/OpenNebula/Abiquo/vCommander/you-name-it), metering & billing (eg WHCMS), helpdesk (like Kayako), user identity management, database platform (Hadoop), application servers, hosted applications and web services. All this stuff has to work. And work efficiently, if you want to attract, retain and expand your customer base, simply because your customers simultaneously use all these resources: From their browsers, customer actions ripple through firewalls, load balancers, switches, web and application servers, databases, hypervisors and disks, crossing the entire cloud stack up, down and sideways.

The only way to run this stack is… to use humans. Of what skills? System engineering, storage management, networking, security, application architecture, coding, coding, coding, web marketing, technical management and more coding. And all of them must be able to sit around the same table, talk and understand each other, if you want your cloud stack to simply work. This calls for a small headcount of gifted people (and well compensated – slide 8) that can not only deliver on the technical side but understand the cloud business and the Internet business as well.

The trick question: What kind of company can host this ecosystem? Service providers? Datacenter hosting? Web hosters? Software vendors? Well… this would depend on the company DNA. Take for example Amazon and Google. Neither was a datacenter/network provider or software vendor; Amazon is the largest online retailer, Google is the king of online advertising. Yet, both of them fostered the right kind of people that spun off what we have and use today.

Of supermarkets and clouds

OK, no more cloud computing definitions for me. I’ve found the perfect metaphor to explain what cloud computing is: The supermarket.

Probably you don’t remember how your parents (or grandparents) did their shopping in ye olde days, when supermarkets did not exist. Well, I can still remember my grandmother; she took her shopping bag and went to the butcher around the corner, the fish market downtown, the grocery store across the street and so on. It was fun; each shop had its own smell, arrangement, window and a different face behind the bench. The whole process took hours but it sure was a pleasant thing to do. And you had to do that over and over again, at least 2-3 times a week.

Now, my grandmother has passed away and all these little shops are long gone. Behold the supermarket. Drive, park, grab a cart, cross all the aisles, fill the cart, push across the tellers, pay, load car, drive away, talk to nobody. You’re done in one hour tops. And you’ve got to do that only once per week (depending on the mouths you have to feed…)

What does this have to do with cloud computing? Think about it:

  • Cloud computing is about infrastructure uniformity. Like a supermarket, you have abundance of a limited number of the latest choices: Storage is massive, yet in two or three flavors (FC, NFS, iSCSI). Servers are Intel/AMD only with the same CPU stepping. Software stacks are canned – and everything must be kept at the same current revision, otherwise things will start breaking off. In contrast, a cluster of “legacy” HP superdomes or Sun E-series boxes, complete with their own SAN, backup TAN and a team of humans to manage them smells and feels like that old local shop around the corner: It has a little bit of everything. Complex, disparate, old software stacks. Dedicated storage. Cluster-specific network interconnects. Cryptic hardware. Exotic chips. Loyal admins. Human interaction. Everything.
  • Cloud computing is about making things easier. Service provisioning is a few clicks away. Hardware provisioning does not exist; everything is racked, cabled and powered once. System reconfiguration is almost automatic. In a legacy environment (well, in a non-cloud IT shop) trips to the computer room are frequent, CD/DVD swapping does happen, system provisioning is still a ritual ceremony, installing firmware, operating systems, service packs, patches and applications. Just like paying a visit to the grocery shop, then the bakery and the butcher, carrying those heavy shopping bags. Now, think how shopping is done in a supermarket and you get the picture.
  •  Supermarkets are big, neighborhood shops are small. Big size means cheap prices and countless shelves with goods. The same applies to cloud computing: Clouds are efficient in XXL sizes; that’s why cloud provider datacenters are massive. The downside? In a supermarket you can buy only what’s on the shelf and pay what the pricetag says. Unless you buy tons of stuff, you cannot ask the management to bring in a new product at a better price. In a small shop, if the owner knows your grandmother, well, you can ask for extra candies.
  • There is a supermarket in every town, meaning you can find your preferred brand of coffee (as long as it’s on the shelf) all over the country. If your local supermarket is blown to bits by a giant spider/tsunami/alien, drive to the next town. Cloud metaphor: More or less, all cloud service providers have redundant datacenters and data replication across them, so whenever a network outage or a natural disaster strikes, it’s likely that your services will survive.