Tag Archives: databases

Virtualization and software licensing, a true Monty Python story: The case of Microsoft

I’ve watched lots of presentations and workshops on virtualization and I’ve given a few myself to customers. Quite naturally, they all focus on how easy and magical it is to take your real servers, made of metal and plastic, and turn them into software bits and pieces, untouchable and pure, running in the Matrix. But few, very few, dare to unfold the horrible truth about what happens to your software licenses as soon as you virtualize commercial software.

Straight to the point: Let’s assume that you run a Windows server shop and dare to go virtual. Every system runs some edition of Windows Server (Standard or Enterprise) and your applications are Oracle databases, MS SQL Server, Exchange Server and some IBM WebSphere application servers. Windows is licensed per server; Oracle, WebSphere and SQL Server per CPU core or socket; and Exchange per server. Be prepared for a cosmic effect on your software licenses.

Let’s begin with Microsoft. Luckily, MS publishes a sort of guide on how virtualization affects licensing – make sure you read the accompanying Word documents (if you can take it). First, you have to know that Microsoft allows Windows VMs loaded with MS applications to float from server to server, as long as they are in the same “server farm”. What is a server farm? Well, it is up to two datacenters, connected via a network, no more than four time zones apart.

Oh, kindly note that we refer only to Microsoft Volume Licensing, not OEM or FPP (Full Packaged Product) licenses. The rules described here don’t apply to those. You have been warned.

Now, how are Windows servers licensed under a virtual fabric (in the same “server farm”, so to speak)? If you believe that a properly licensed Windows 2008 R2 physical server that was sucked into the virtual fabric is allowed to run as a VM and hop from ESX to ESX, then you are wrong. It’s not allowed, unless it is the sole Windows Server Standard Edition instance running on your ESX. If it was an Enterprise Edition, well, you can run up to four instances on that ESX. What is the solution? Go ahead and buy Windows Server Datacenter Edition (licensed per CPU) and license every physical CPU of each and every ESX/XEN/KVM host you have. Only then can you run as many Windows Server VMs as you wish on your entire server farm…
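To make the per-host arithmetic concrete, here is a minimal Python sketch of the rules as described above: one running instance per Standard license, up to four per Enterprise license, and Datacenter licensed per physical CPU with no VM limit. The function name, the `peak_vms` parameter (the largest number of Windows VMs that may land on the host at once) and the sample host are assumptions made for illustration, and the sketch deliberately ignores the finer points of license mobility. Treat it as a back-of-the-envelope helper, not as a statement of Microsoft’s terms.

```python
import math

# Running instances covered by one license assigned to a host,
# following the simplified rules described in the text.
INSTANCES_PER_LICENSE = {"standard": 1, "enterprise": 4}


def licenses_for_host(edition: str, peak_vms: int, sockets: int) -> int:
    """Licenses to assign to one host so it may run `peak_vms` Windows VMs.

    `peak_vms` and `sockets` are hypothetical sizing inputs, not Microsoft terms.
    """
    if peak_vms < 0 or sockets < 1:
        raise ValueError("peak_vms must be >= 0 and sockets >= 1")
    if edition == "datacenter":
        # Licensed per physical CPU; the number of VMs no longer matters.
        return sockets
    if edition not in INSTANCES_PER_LICENSE:
        raise ValueError(f"unknown edition: {edition}")
    return math.ceil(peak_vms / INSTANCES_PER_LICENSE[edition])


# Hypothetical host: two sockets, up to ten Windows VMs at peak.
for edition in ("standard", "enterprise", "datacenter"):
    print(edition, licenses_for_host(edition, peak_vms=10, sockets=2))
```

With these sample numbers the Datacenter count stays fixed at the socket count however many VMs pile onto the host, which is why it is the only edition that scales with consolidation.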

What about Microsoft suites like Exchange, SharePoint and SQL Server? SQL Server is now licensed per virtual processor – that’s vCPU. This means that if you have a two-socket ESX/XEN/KVM server with four cores per CPU, and two Windows/SQL Server Enterprise VMs with four vCPUs assigned to each, you need 2 × 4 = 8 processor licenses, even though the physical system has only two processors. The good thing is that your Windows/SQL Server VM is allowed to hop from server to server. Now, for SharePoint, Exchange etc., a plain old server license is sufficient for Microsoft to allow you to play.
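The vCPU arithmetic is simple enough to check in a few lines of Python. This is only a sketch of the rule as stated above (one processor license per vCPU assigned to a SQL Server VM); the function name and the list-of-vCPUs input are assumptions made for illustration.

```python
def sql_processor_licenses(vcpus_per_vm: list[int]) -> int:
    """Processor licenses needed when SQL Server is licensed per vCPU.

    `vcpus_per_vm` holds the vCPU count of each SQL Server VM (hypothetical input).
    """
    if any(v < 1 for v in vcpus_per_vm):
        raise ValueError("every VM needs at least one vCPU")
    return sum(vcpus_per_vm)


# The example from the text: two VMs with four vCPUs each,
# on a host that has only two physical processors.
physical_sockets = 2
needed = sql_processor_licenses([4, 4])
print(f"{needed} licenses for VMs vs {physical_sockets} physical processors")
```

The physical socket count never enters the calculation, which is the whole point: the bill follows what you assign to the VMs, not what is in the box.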

I won’t calculate the relevant costs; this is left as an exercise for the reader (Hint: for an initial P2V migration of 4 to 1, the cost of Windows licenses alone can rise 6-fold; however, a properly licensed virtual fabric can run an unlimited number of Windows VMs). I would advise you to contact your Microsoft TAM to clarify the details; we have only scratched the surface. VDI licenses and desktop OSs are another story.

EU Data Retention: A really big pile of logs

Legalize it: Keeping logs of all voice and digital communications has been mandatory in the EU under Directive 2006/24/EC, which obliges member states to pass domestic laws forcing local service providers to comply. Whether this is good or bad for us is another discussion, but it sure is not good for carriers and ISPs.

Data that should be retained are primarily call records from phone switches and MSCs (whom did you call, when, from which location, duration of call, mobile phone ID and other stuff), emails sent and received (headers only, no content), in some cases, visited URLs and lots of other related stuff, like your personal information that is required to bind your phone number or IP address to your real name and home address. This is data required by law enforcement authorities to track you down if you do really bad bad things, Winston Smith.

The trouble is that such data are produced in massive quantities. Each phone call generates one or two CDRs, a few hundred bytes long. Each email a few lines in a log file and so on. Multiply these by the number of subscribers of a carrier or ISP and you have figures in the order of a few gigabytes per day. All these data must be stored in a safe place so that when the Law knocks on your door and requests the whereabouts of a Bad Guy, the service provider delivers all relevant information in a few days. Now, try and run something like:
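A rough sizing exercise shows how quickly this adds up. The Python sketch below simply multiplies subscribers by events, records per event and record size. Every number in it is a made-up assumption chosen for illustration (the text only says “one or two CDRs” of “a few hundred bytes” each), so plug in your own figures.

```python
def daily_volume_bytes(subscribers: int,
                       events_per_subscriber: float,
                       records_per_event: float,
                       bytes_per_record: int) -> float:
    """Raw (uncompressed) bytes of retained records generated per day."""
    return subscribers * events_per_subscriber * records_per_event * bytes_per_record


# Hypothetical carrier: 5 million subscribers, 5 calls each per day,
# 1.5 CDRs per call on average, 300 bytes per CDR.
per_day_gb = daily_volume_bytes(5_000_000, 5, 1.5, 300) / 10**9
per_year_tb = per_day_gb * 365 / 1000

print(f"{per_day_gb:.2f} GB per day, {per_year_tb:.2f} TB per year of retention")
```

This covers voice CDRs only; email and URL logs would be estimated the same way and added on top.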

$ gzcat logs/from_everywhere/*.gz | grep -F -- "$BAD_GUY_PHONE" | awk '{print $7","$23","$12}' > /tmp/calls.csv
$ gzcat hlr_logs/from_everywhere/*.gz | grep -F -- "$BAD_GUY_IMSI" | awk '{print $1","$4","$32}' > /tmp/loc.csv
$ gzcat crm_export/*.gz | grep -F -- "$BAD_GUY_NAME" | awk '{print $3","$4","$8","$23","$7}' > /tmp/info.csv

on gigabytes of (compressed) data, then import the CSV files into Excel to try to correlate them and produce some meaningful information for the authorities… Excel will probably explode before your brain does.

The question is, is there any cool software out there that can automate this process? Let’s do a 3-minute analysis.

The lifecycle of call data retention looks like this: First, data are collected from all sources, sanitized and ingested into a safe data repository. After a predefined data expiration period (say one year), information should be automatically expunged from the database (minus the records that are under investigation). At any time, the system should produce the information required by law enforcement authorities in a timely and accurate manner, without direct manual intervention on the data. Data should be archived, protected (encrypted) and immune to alteration of any kind.
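The ingest-and-expire part of this lifecycle can be sketched in a few lines of Python using the standard `sqlite3` module. The table layout, the column names and the `legal_hold` flag (marking records under investigation) are assumptions invented for this example, and SQLite merely stands in for whatever repository a real system would use. Encryption, archiving and tamper protection are not shown.

```python
import sqlite3
from datetime import date, timedelta


def make_store() -> sqlite3.Connection:
    """Create an in-memory repository with a hypothetical CDR table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE cdr (
               caller     TEXT NOT NULL,
               callee     TEXT NOT NULL,
               started_on TEXT NOT NULL,              -- ISO date: sorts like a date
               legal_hold INTEGER NOT NULL DEFAULT 0  -- 1 = under investigation
           )"""
    )
    return conn


def ingest(conn: sqlite3.Connection, records: list[tuple[str, str, str]]) -> None:
    """Load already-sanitized (caller, callee, started_on) records."""
    conn.executemany(
        "INSERT INTO cdr (caller, callee, started_on) VALUES (?, ?, ?)", records
    )
    conn.commit()


def purge_expired(conn: sqlite3.Connection, today: date,
                  retention_days: int = 365) -> int:
    """Delete records older than the retention period, except those on hold."""
    cutoff = (today - timedelta(days=retention_days)).isoformat()
    cur = conn.execute(
        "DELETE FROM cdr WHERE started_on < ? AND legal_hold = 0", (cutoff,)
    )
    conn.commit()
    return cur.rowcount  # number of rows actually expunged
```

A scheduled job would call `purge_expired` once a day. Records flagged with `legal_hold = 1` survive past the expiration period until the flag is cleared.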

What kind of software would do the job? Certainly not conventional relational databases. Importing a few gigabytes every day into your Oracle database will drive your DBA insane and leave the database itself doing nothing more than updating indexes and taking up disk space, let alone the fact that you need an epic disk array to handle the load. What about using your Security Information Management application? Well, SIM can do a good job of finding suspicious security events in real time from your antivirus, IPS and firewall logs, but it cannot handle the massive daily data volume and accumulated information. A distributed cloud database? Maybe, if you are Google or Amazon…

Actually, there is software that is built for this job. It all starts with the database. What we need here is a database that can support complex queries involving joins across a number of tables, that is very efficient with read-only transactions, can talk plain old SQL and can ingest tons of data in a flash. On top of this database, you need an application that can mediate and sanitize data, implement a query and data retrieval interface that leaves out human intervention, and produce reports tailored to the needs of state authorities. The end result is a compact system that uses low-cost commodity storage (SATA drives) and a 2- or 4-way x86 server for data ingestion and retrieval, that is rated at ingesting ~30GB of data per day and at the same time satisfies all requirements for archiving, data encryption, compression and retrieval.
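To illustrate what “plain old SQL with joins” buys you over grep, awk and Excel, here is a small Python sketch using the standard `sqlite3` module. SQLite is only a stand-in for the purpose-built database described above. The schema, the sample rows and the function name are all invented for this example.

```python
import sqlite3

# Hypothetical schema: CRM data in one table, call records in another.
SCHEMA = """
CREATE TABLE subscriber (msisdn TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cdr (
    caller     TEXT NOT NULL,
    callee     TEXT NOT NULL,
    started_at TEXT NOT NULL,     -- ISO timestamp
    duration_s INTEGER NOT NULL
);
CREATE INDEX cdr_caller ON cdr (caller);
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory database loaded with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO subscriber VALUES (?, ?)",
                     [("100", "Winston Smith"), ("200", "Julia")])
    conn.executemany("INSERT INTO cdr VALUES (?, ?, ?, ?)", [
        ("100", "200", "2010-03-01T09:15:00", 62),
        ("200", "100", "2010-03-01T10:00:00", 15),
        ("100", "300", "2010-03-02T18:40:00", 240),
    ])
    conn.commit()
    return conn


def calls_made_by(conn: sqlite3.Connection, name: str) -> list[tuple]:
    """Correlate CRM and call data in one query: who did `name` call, and when?"""
    return conn.execute(
        """SELECT c.started_at, c.callee,
                  COALESCE(s2.name, 'unknown') AS callee_name,
                  c.duration_s
           FROM subscriber AS s
           JOIN cdr AS c ON c.caller = s.msisdn
           LEFT JOIN subscriber AS s2 ON s2.msisdn = c.callee
           WHERE s.name = ?
           ORDER BY c.started_at""",
        (name,),
    ).fetchall()


for row in calls_made_by(build_demo_db(), "Winston Smith"):
    print(row)
```

The join does in one statement what the three CSV exports and a spreadsheet were supposed to do by hand. The `LEFT JOIN` keeps calls to numbers that have no CRM record instead of silently dropping them.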