Tag Archives: networking

Cloud infrastructure Economics: Calculating IaaS cost

In our previous post, we referenced a number of financial factors and operational parameters that we need to take into account in order to calculate some meaningful costs for IaaS services. Let’s see now how these are combined to produce IaaS cost items.

Your mileage may vary on IaaS; you may be renting datacenter space, leasing equipment, operating your own facilities or simply using public clouds to run your business. However, at the end of the day, you need to utilize an apples-to-apples metric to find out which strategy is the most cost-effective. For IaaS, the metric is the footprint of your infrastructure, expressed in terms of virtual computing units: The monthly cost of a single virtual server, broken down to virtual CPU/Memory and virtual storage resources.

You can check the formulas online in this sheet. If you want to take a peek at how the formulas work, continue reading.

For simplicity, we will not dive into virtual machine OS licensing costs here – they are easy to find out, anyway (those familiar with Microsoft SPLA should have an idea). We will calculate only the monthly running costs of IaaS, expressed in the following cost items:

  • SRVCOST: Individual virtual server monthly cost: Regardless of virtual machine configuration, the monthly cost of spinning up a virtual machine.
  • COMPUTECOST: Virtual computing unit cost: The cost of operating one virtual memory GB and assorted CPU resources per month.
  • DISKCOST: Virtual disk unit cost: The cost of operating one virtual storage GB per month.

Almost all IaaS public cloud providers format their pricelists according to these three cost items, or bundle them in prepackaged virtual server sizes. Calculating these can help immensely in finding good answers to the “build or buy” question if you plan to adopt IaaS for your organization, or determine the sell price and margins if you are a public cloud provider.
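
As a rough illustration of how a pricelist built on these three items could be applied, here is a minimal Python sketch. It assumes the monthly cost of a virtual machine is the fixed per-server item plus the per-GB items scaled by the machine's size; the function name and all figures are hypothetical.

```python
def vm_monthly_cost(srvcost, computecost, diskcost, ram_gb, disk_gb):
    """Monthly cost of one VM (assumed additive model).

    srvcost     -- fixed monthly cost per virtual server (SRVCOST)
    computecost -- monthly cost per virtual RAM GB (COMPUTECOST)
    diskcost    -- monthly cost per virtual disk GB (DISKCOST)
    """
    return srvcost + computecost * ram_gb + diskcost * disk_gb


# Hypothetical unit costs, in currency units per month.
example = vm_monthly_cost(srvcost=10.0, computecost=5.0, diskcost=0.25,
                          ram_gb=4, disk_gb=100)
print(example)
```

Running the same function with a public provider's list prices and with your own calculated costs gives the apples-to-apples comparison discussed above.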

Let’s now look at each cost item one by one.

Individual server cost (SRVCOST)

How much does it cost to spin up a single virtual machine per month? What do we need to take into account? Well, a virtual machine, regardless of its footprint, needs some grooming and the infrastructure it will run on. The assorted marginal costs (cost to add one more virtual machine to our IaaS infrastructure) are the following:

  • C_SRV: Cost of maintaining datacenter network infrastructure (LAN switching, routing, firewall, uplinks) and computing infrastructure software costs (support & maintenance). We do not include here hardware costs since these are related to the footprint of the virtual machine.
  • C_DCOPS: Cost of manhours required to keep the virtual machine and related infrastructure up and running (keep the lights on)
  • C_NWHW: Cost of network related hardware infrastructure required to sustain one virtual machine. These are pure hardware costs and reflect the investment in network infrastructure needed to keep adding virtual machines.

An essential unit used in most calculations is the cost of a rack unit. Referring to our older post for the EPC variable, this is expressed as

C_RU=EPC/RU

This gives us an approximation of the cost of one rack unit per month in terms of monthly electricity and hosting cost (EPC).

C_SRV is expressed as a function of NETSUPP (monthly network operating & support costs), RU_NET (total network infrastructure footprint), CALCLIC (virtualization/computing infrastructure maintenance & software costs) and SRV (total virtual servers running). The formula is:

C_SRV=( NETSUPP + CALCLIC + C_RU*RU_NET ) / SRV

C_RU*RU_NET is the hosting cost of the entire networking infrastructure (switches, patch panels, load balancers, firewalls etc).

C_DCOPS is straightforward to calculate:

C_DCOPS = DCOPS / SRV

And finally C_NWHW is the hardware cost needed to add one more virtual server. To calculate C_NWHW we take into account the current network infrastructure cost and then we calculate how much money we have to borrow to expand it in order to provision one more virtual server. We do this by dividing the total network infrastructure cost by the number of provisioned virtual machines and spreading this cost over the lifecycle of the hardware (AMORT), augmented with a monthly interest rate (INTRST):

C_NWHW=(NETINFRA/SRV) * (1+INTRST) / AMORT
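
The three formulas in this section can be sketched in Python as follows. The input figures are made up for illustration, and adding the three items together to obtain SRVCOST is our assumption about how they combine.

```python
def srv_cost_items(epc, ru, netsupp, calclic, ru_net, dcops,
                   netinfra, srv, intrst, amort):
    """Return (C_SRV, C_DCOPS, C_NWHW) per virtual server per month."""
    c_ru = epc / ru                                     # C_RU = EPC / RU
    c_srv = (netsupp + calclic + c_ru * ru_net) / srv   # network + software running costs
    c_dcops = dcops / srv                               # operations payroll
    c_nwhw = (netinfra / srv) * (1 + intrst) / amort    # network hardware investment
    return c_srv, c_dcops, c_nwhw


# Hypothetical datacenter: 200 rack units, 400 virtual servers,
# 1% monthly interest, 48-month hardware lifecycle.
c_srv, c_dcops, c_nwhw = srv_cost_items(
    epc=4000, ru=200, netsupp=1000, calclic=2000, ru_net=10,
    dcops=20000, netinfra=96000, srv=400, intrst=0.01, amort=48)

# Assumption: SRVCOST is the sum of the three marginal cost items.
srvcost = c_srv + c_dcops + c_nwhw
print(round(c_srv, 2), round(c_dcops, 2), round(c_nwhw, 2), round(srvcost, 2))
```

Note how strongly the result depends on SRV: every item is divided by the number of virtual servers, so the per-server cost drops as the infrastructure fills up.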

Computing cost (COMPUTECOST)

As a computing unit, for simplicity we define one GB of virtual RAM coupled with an amount of processing power (CPU). Finding the perfect analogy between memory and CPU power is tricky and there is no golden rule here, so we define the metric as the amount of virtual RAM. The exact CPU power assigned to each virtual RAM GB depends on the amount of physical RAM configured in each physical server (SRVRAM) and the number of physical CPU cores of each server. COMPUTECOST is broken down to two cost items:

  • C_MEM: It is the cost associated with operating the hardware infrastructure that provisions each virtual RAM GB.
  • C_SRVHW: It is the cost associated with purchasing the hardware infrastructure required to provide each virtual RAM GB.

C_MEM depends on running costs and is the cost of compute rack units divided by the total virtual RAM deployed in our cloud:

C_MEM = (RU_CALC * C_RU) / TOTALMEM

Note that in some cases (like VMware’s VSPP program) you may need to add software subscription/license costs to the above, if your virtualization platform is licensed per virtual GB.

C_SRVHW is calculated in a more complex way. First, we need to find out the cost of hardware associated with each virtual GB of RAM. This is the cost of one physical server equipped with RAM, divided by the amount of physical RAM adjusted by the memory overprovisioning factor:

CAPEX_MEM = (SERVER + MEMORY * SRVRAM) / (SRVRAM * MEMOVERPROV)

As with C_NWHW, we spread the acquisition cost over the infrastructure lifecycle period, including the monthly interest rate:

C_SRVHW = CAPEX_MEM * (1 + INTRST) / AMORT
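
A short Python sketch of the formulas above, with invented inputs. As before, summing C_MEM and C_SRVHW into COMPUTECOST is our assumption, and the rack unit cost (C_RU) is passed in as an already calculated value.

```python
def compute_cost_items(ru_calc, c_ru, totalmem, server, memory,
                       srvram, memoverprov, intrst, amort):
    """Return (C_MEM, C_SRVHW) per virtual RAM GB per month."""
    c_mem = (ru_calc * c_ru) / totalmem
    # Hardware cost of one virtual GB: server plus its RAM, spread over
    # the physical RAM scaled by the overprovisioning factor.
    capex_mem = (server + memory * srvram) / (srvram * memoverprov)
    c_srvhw = capex_mem * (1 + intrst) / amort
    return c_mem, c_srvhw


# Hypothetical figures: 288 GB servers, overprovisioning factor 1.25.
c_mem, c_srvhw = compute_cost_items(
    ru_calc=40, c_ru=20.0, totalmem=1600, server=4320, memory=10,
    srvram=288, memoverprov=1.25, intrst=0.01, amort=48)

# Assumption: COMPUTECOST is the sum of the two items.
computecost = c_mem + c_srvhw
print(round(c_mem, 4), round(c_srvhw, 4), round(computecost, 4))
```

Raising MEMOVERPROV from 1.0 to 1.25 in this sketch lowers the hardware share of the cost by one fifth, which is why the factor deserves careful tuning.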

Virtual storage cost (DISKCOST)

Calculating DISKCOST is simpler. The two cost items, in a similar way to COMPUTECOST are:

  • C_STOR: It is the cost associated with operating the hardware infrastructure that provisions each virtual disk GB.
  • C_STORHW: It is the cost associated with purchasing the hardware infrastructure required to provide each virtual disk GB.

C_STOR is based on the existing operating costs for running the storage infrastructure and is calculated proportionally to the provisioned disk capacity:

C_STOR = (STORLIC + RU_STOR * C_RU) / TOTALSTOR

C_STORHW is the cost of investment for each storage GB over the infrastructure lifecycle period:

C_STORHW = (STORINFRA/TOTALSTOR) * (1 + INTRST) / AMORT
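
The storage formulas follow the same pattern and can be sketched like this. All figures are hypothetical, and DISKCOST is assumed to be the sum of the two items.

```python
def disk_cost_items(storlic, ru_stor, c_ru, totalstor, storinfra,
                    intrst, amort):
    """Return (C_STOR, C_STORHW) per virtual disk GB per month."""
    c_stor = (storlic + ru_stor * c_ru) / totalstor
    c_storhw = (storinfra / totalstor) * (1 + intrst) / amort
    return c_stor, c_storhw


# Hypothetical storage farm: 100,000 GB provisioned, 50 rack units.
c_stor, c_storhw = disk_cost_items(
    storlic=1000, ru_stor=50, c_ru=20.0, totalstor=100000,
    storinfra=240000, intrst=0.01, amort=48)

# Assumption: DISKCOST is the sum of the two items.
diskcost = c_stor + c_storhw
print(round(c_stor, 4), round(c_storhw, 4), round(diskcost, 4))
```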

 

One can elaborate on this model and add all sorts of costs and parameters; however, from our experience, this model is quite accurate for solving an IaaS financial exercise. All you need is simple datacenter metrics and easily obtained costs.

Cloud infrastructure Economics: Cogs and operating costs

Perhaps the most important benefit of adopting cloud services (either from a public provider or internally from your organization) is that their cost can be quantified and attributed to organizational entities. If a cloud service cannot be metered and measured, then it should not be called a cloud service, right?

So, whenever you need to purchase a cloud service or when you are called to develop one, you are presented with a service catalog and assorted pricelists, from where you can budget, plan and compare services. Understanding how the pricing has been formulated is not part of your business since you are on the consumer side. However, you should care: You need to get what you pay for. There must be a very good reason for a very expensive or a very cheap cloud service.

In the past, we have developed a few cloud services utilizing our own resources and third party services. Each and every time, determining whether launching the service commercially would be a sound practice depended on three factors:

  • Would customers pay for the service? If yes, at what price?
  • If a similar service already was on the market, where would our competitors stand?
  • What is the operating cost of the service?

Answering the first two questions is straightforward: Visit a good number of trusted and loyal customers, talk to them, investigate competition. That’s a marketing and sales mini project. But answering the last question can be a hard thing to do.

Let us share some insight on the operating costs and cost-of-goods for a cloud service and, in particular, infrastructure as a service (IaaS). Whether or not you already run IaaS for your organization or your customers, you are in one of the following states:

  1. Planning to launch IaaS
  2. Already running your datacenter

State (1) is where you have not yet invested anything. You need to work on implementation and operational scenarios (build or buy? Hire or rent?) and do a good part of marketing plans. State (2) is where you have already invested, you have people, processes and technology in place and are delivering services to your internal or external customers. In state (1) you need to develop a cost model, in state (2) you need to calculate your costs and discover your real operating cost.

In both cases, the first thing you need to do before you move on with cost calculation is to guesstimate (state 1) or calculate (state 2) the footprint of your investment and delivered services. From our experience, the following parameters are what you should absolutely take into account in order to properly find out how much your IaaS really costs.

Financial parameters (real money)

  • EPC: Electrical power and hosting cost. How much do (or would) you pay for electricity and hosting. This can be found from your electricity bill, your datacenter provider monthly invoice or from your financial controller (just make sure you ask them the right questions, unless you want to get billed with the entire company overhead costs). EPC is proportional to your infrastructure footprint (ie number of cabinets and hardware).
  • DCOPS: Payroll for the operations team. You need to calculate the total human resource costs here for the team that will operate IaaS services. You may include here also marketing & sales overhead costs.
  • CALCLIC: Software licensing and support costs for IaaS entire computing infrastructure layer. These are software costs associated with the infrastructure (eg, hypervisor licenses), not license costs for delivered services, eg Microsoft SPLA costs.
  • STORLIC: Software licensing and support costs for your entire storage infrastructure. Include here in their entirety also data backup software costs.
  • SERVER: Cost of a single computing server. It’s good to standardize on a particular server model (eg 2-way or 4-way, rackmount or blade). Here you should include the cost of a computing server, complete with processors but without RAM. RAM to CPU ratio is a resource that is adjusted according to your expected workloads and plays a substantial role in cost calculation. If you plan to use blade servers, you should factor here the blade chassis as well.
  • MEMORY: Average cost of 1 GB of RAM.
  • STORINFRA: Cost of your storage infrastructure, as is, or the storage infrastructure you plan to purchase. Storage costs are not that easy to calculate as a factor of 1 disk GB units, since you have to take into account SAN, backup infrastructure, array controllers, disk enclosures and single disks. Of course we assume you utilize a centralized storage infrastructure, pooled to your entire computing farm.
  • NETINFRA: Cost of data network. As above, include here datacenter LAN, load balancers, routers, even cabling.
  • NETSUPP: Cost of network support (monthly). Include here software licensing, antivirus subscriptions and datacenter network costs.

Operational parameters (Facts and figures)

  • RU: Amount of available rack units in your datacenter. This is the RU number you can use to install equipment (protected with UPS, with dual power feeds etc).
  • RU_STOR: Rack units occupied by storage systems
  • RU_CALC: Rack units occupied by computing infrastructure (hypervisors)
  • RU_NET: Rack units occupied by network infrastructure
  • SRV: Virtual machines (already running or how many you plan to have within the next quarter)
  • INTRST: Interest rate (cost of money): Monthly interest rate of credit lines/business loans
  • TOTALMEM: Total amount of virtual memory your SRV occupy
  • TOTALSTOR: Total amount of virtual storage your SRV occupy
  • SRVRAM: Amount of physical memory for each physical server. This is the amount of RAM you install in each computing server. It is one of the most important factors, since it depends on your average workload. A rule of thumb is that for generic workloads, a hardware CPU thread can sustain up to 6 virtual computing cores (vcpu). For each vcpu, you need 4 GB of virtual RAM. So, for a 2-socket, 6-core server you need 2 (sockets) x 6 (cores) x 6 (vcpu) x 4 (GB RAM) = 288 GB RAM. For a 4-way, 8-core server beast with memory intensive workloads (say 8 GB per vcpu) you need 4 x 8 x 6 x 8 = 1536 GB RAM (1.5 TB).
  • MEMOVERPROV: Memory overprovisioning for virtual workloads. A factor that needs tuning from experience. If you plan conservatively, use a 1:1 overprovisioning factor (1 GB of physical RAM to 1 GB of virtual RAM). If you are more confident and plan to save costs, you can calculate an overprovisioning factor of up to 1.3. Do this if you trust your hypervisor technology and have homogenous workloads on your servers (for example, all-Windows ecosystem) so that your hypervisor can take advantage of copy-on-write algorithms and save physical memory.
  • AMORT: Amortization of your infrastructure. This is a logistics & accounting term, but here we mainly use this to calculate the lifespan of our infrastructure. It is expressed in months. A good value is 36 to 60 months (3 to 5 years), depending on your hardware warranty and support terms from your vendor.
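
The SRVRAM rule of thumb from the list above can be written as a small Python helper. The function and parameter names are ours; the defaults follow the figures quoted for generic workloads (6 vcpu per core, 4 GB per vcpu), multiplied per core as in the worked examples.

```python
def srvram_gb(sockets, cores_per_socket, vcpu_per_core=6, gb_per_vcpu=4):
    """Physical RAM (GB) to install in one server, per the rule of thumb."""
    return sockets * cores_per_socket * vcpu_per_core * gb_per_vcpu


generic = srvram_gb(2, 6)                   # 2-socket, 6-core server
beast = srvram_gb(4, 8, gb_per_vcpu=8)      # 4-way, 8-core, memory intensive
print(generic, beast)                       # prints: 288 1536
```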

If you can figure out the above factors, you can proceed with calculating your operating IaaS costs. Keep reading here!

Building a cloud

Question: How many people do you need to build and run a cloud?

Answer: As many as you can fit in a meeting room.

A cloud offering IaaS and SaaS to customers is nothing more than a compact and complex technology stack. Starting from the bottom to the top, you have servers, storage (NFS/iSCSI/FC), networking (LIR, upstream connections, VLANs, load balancers), data protection (snapshots, replication, backup/restore), virtualization (pick your flavor), cloud management (Applogic/Openstack/Cloudstack/OpenNebula/Abiquo/vCommander/you-name-it), metering & billing (eg WHMCS), helpdesk (like Kayako), user identity management, database platform (Hadoop), application servers, hosted applications and web services. All this stuff has to work. And work efficiently, if you want to attract, retain and expand your customer base, simply because your customers simultaneously use all these resources: From their browsers, customer actions ripple through firewalls, load balancers, switches, web and application servers, databases, hypervisors and disks, crossing the entire cloud stack up, down and sideways.

The only way to run this stack is… to use humans. Of what skills? System engineering, storage management, networking, security, application architecture, coding, coding, coding, web marketing, technical management and more coding. And all of them must be able to sit around the same table, talk and understand each other, if you want your cloud stack to simply work. This calls for a small headcount of gifted people (and well compensated – slide 8) that can not only deliver on the technical side but understand the cloud business and the Internet business as well.

The trick question: What kind of company can host this ecosystem? Service providers? Datacenter hosting? Web hosters? Software vendors? Well… this would depend on the company DNA. Take for example Amazon and Google. Neither was a datacenter/network provider or software vendor; Amazon is the largest online retailer, Google is the king of online advertising. Yet, both of them fostered the right kind of people that spun off what we have and use today.

Use python to talk to NNMi

One more post for people who work with NNMi 9.x. NNMi, being a JBoss animal, has a pretty decent WS-I API to talk to the world. The interface is open and documented in the developer’s toolkit guide (available from the HP documentation site; you need an HP Passport account to access it).

Accessing the API is peanuts to any Java developer, but for the rest of us, it’s not straightforward. Well, with a little help from Python, everything is possible:

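#!/usr/bin/python
from __future__ import print_function  # print() behaves the same on Python 2 and 3

from suds.client import Client
from suds.transport.http import HttpAuthenticated

# HTTP authentication with an NNMi account
t = HttpAuthenticated(username='system', password='NNMPASSWORD')

# WSDL of the NodeBean web service
url = 'http://nnmi/NodeBeanService/NodeBean?wsdl'

client = Client(url, transport=t)

# Retrieve full node list
allNodes = client.service.getNodes('*')

print("Nodes in topology:", len(allNodes.item))

# Print hostname and device model of every node
for i in allNodes.item:
    print(i.name, i.deviceModel)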

This small script connects to the NNMi NodeBean web service, retrieves the full list of managed nodes to populate the ‘allNodes’ object and from there it prints out the hostname and device type as discovered from NNMi. What you need is the suds library (available here, or installable with a few clicks in the Ubuntu software center).
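
Once the node list is in memory, plain Python is enough to slice it. The sketch below counts nodes per device model. To stay runnable without an NNMi server it uses stand-in objects with the same two attributes the script reads (name and deviceModel); the hostnames and model strings are invented.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in for the node objects returned by the web service.
Node = namedtuple("Node", ["name", "deviceModel"])


def count_by_model(nodes):
    """Count how many nodes share each deviceModel value."""
    return Counter(node.deviceModel for node in nodes)


sample = [
    Node("core-sw1", "cisco6509"),
    Node("core-sw2", "cisco6509"),
    Node("edge-rtr1", "cisco7206VXR"),
]

for model, count in count_by_model(sample).most_common():
    print(model, count)
```

With a live connection, passing allNodes.item to count_by_model should give the same kind of summary, assuming every returned node carries a deviceModel attribute.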

Of datacenter transformation and clouds

Looking back a few years, when virtualization was still “under evaluation” and Facebook a fancy new thing, the term “cloud” did not exist. It’s not a coincidence that we started talking about cloud computing no sooner than Internet industries (Salesforce, Google, Amazon et al) reached the critical mass to offer services attractive to enterprise IT, and at the same time, the enterprise IT emerged as a business enabler.

What happened? Software and infrastructure as a service, a game well understood by cloud providers, came within the reach of enterprise datacenters. Private clouds now are a reality, a way to provide services to internal customers of an organization on demand, swiftly, on a consolidated multitenant architecture. For an IT worker, the change of the landscape is dramatic. The datacenter transformation from the established server-OS-application stack to the mesh of the cloud (server-hypervisor-shared storage-virtual networking-virtual OS-dynamic load balancing-automatic scaling-resource metering-automation-application server-AJAX stacks) is so immense that it’s very hard to keep up and sometimes, not understood at all.

Let’s spend 168 seconds and see what a private cloud stack would look like:

  • Uniform servers, which in some cases have dual or triple role (computing, storage, networking) with lots of RAM, CPU cores and network ports
  • The network is a massive switching core, tuned to offer trouble-free automation and constant reconfiguration, using the same fabric for both data and storage traffic (check out OpenFlow)
  • Massive storage with lots of spindles to cope with mostly write traffic, integrated snapshots and data replication
  • Two or more hypervisor clans (XEN, VMware, HyperV, KVM), each with its own management and provisioning stack
  • A variety of virtual machine templates, all flavors of Linux and Windows, from desktop (VDI) to server
  • Metering of resource consumption and billing
  • Service catalog to end consumers, provisioning workflows
  • Licensing made quite complex (see this and this)
  • Lots of the usual enterprise stacks (eg Citrix, Oracle, Websphere) virtualized and highly available from the hypervisor layer
  • Virtual backup solutions (like Veeam)
  • And an automation stack to rule them all (see this)

Most of these require skills beyond those of an average IT engineer. Storage, networking and virtualization skills are mandatory in order to understand and handle this stack. But, that’s what a cloud is: more than the sum of its parts.

Ironically, integrators that claim to offer “datacenter transformation” services in essence mean cloud computing, but have failed to realize this fact. They do not grasp the full picture and tend to offer point solutions (a hypervisor here, a smarter backup solution there and shiny powerful servers). The sooner system integrators realize this, the better for them and their customers.

Your Virtual Cisco IOS

Want to play with IOS but you don’t have a Catalyst around? Try this. GNS3 is a marvelous and clever frontend to dynamips, dynagen and qemu, which allow emulation/execution of IOS and JunOS code under a third operating system. That is, Cisco and Juniper virtualized on your desktop.

GNS3 topology of our virtual lab

What you will need is a decent PC with lots of memory (4GB to start with, plus a fast CPU) and IOS/JunOS software images. The first is easy to do; for the second you will need access to licensed software or the actual hardware itself. My recommendation for the OS is Ubuntu, with readily downloadable and installable packages of all bits and pieces (# apt-get install gns3); Windows works just fine too, but with the 4GB constraint of a 32bit OS. Cool screenshots here.

How it works in a few words: MIPS and PPC based hardware (Cisco 26xx, 36xx, 37xx, 72xx) is emulated via dynamips running the IOS image unchanged. JunOS on the other hand is emulated with qemu using Olive, a stripped down version of JunOS, sort of an SDK. You design the topology via a snappy GUI (that is, GNS3), configure your virtual gear and then GNS3 fires up the emulators underneath. CPU and mem usage go skyhigh, but then, you have your own virtual private lab. Communication with the real world (the wire) is done via tap and bridge interfaces. Using a sniffer you can actually see real packets (with Cisco MAC prefixes and stuff) from your virtual devices swimming in your LAN.

What will work: All popular Cisco IOS devices with most linecards, JunOS Olive.

What will not work: Virtualizing dynamips itself is tricky. The emulator engine will work in a virtual host, savagely consuming virtual CPU and memory resources, yet the forged MAC addresses may not exit your hypervisor virtual switch. In vSphere, *sometimes* dynamips could emit packets only to other virtual machines running on the same ESX host, but this was not always the case… Also, note that performance is sluggish, so use GNS3 only as a demonstration and lab tool.