Cloud-scale DBs in the cloud…just a quickie


Posted by Jason | Posted in DigiTar, Software Development | Posted on 03-17-2010


Just a quick set of thoughts…do cloud-scale DBs save money because they’re based on commodity/cheap servers? Tonight I did some rough back-of-the-pad calculations, and was kind of surprised…

Let’s assume we’ve got an 11TB working set of data. How could we store this redundantly?

(cloud servers in these examples are dedicated servers at a cloud provider)

Option 1: Two beefy storage servers running MySQL in a master/slave config

  • CPU: 4 cores from your favorite CPU vendor
  • RAM: 16GB
  • HDDs: 48x 250 GB SATA
    • Lose 2 for mirrored boot, and 2 for RAID-6 parity
  • Cost:
    • Buy Your Own Hardware (Sun X4500): $50,000 for the pair
    • Host It in the Cloud (SoftLayer): $4,700/month for the pair

Option 2: 28 commodity servers (2 replica copies for each piece of data) running HBase or Cassandra

  • CPU: 4 cores from your favorite CPU vendor
  • RAM: 4GB
  • HDDs: 4x 250 GB SATA
    • Lose 1 for RAID-5 parity (we’ll mingle boot data and data data on the same drive pool)
  • Cost:
    • Buy Your Own Hardware (Dell R410): $43,300 for set of 28
    • Host It in the Cloud (SoftLayer): $12,000/month for the set of 28

Option 3: 42 commodity servers (3 replica copies for each piece of data) running HBase or Cassandra

  • CPU: 4 cores from your favorite CPU vendor
  • RAM: 4GB
  • HDDs: 4x 250 GB SATA
    • Lose 1 for RAID-5 parity (we’ll mingle boot data and data data on the same drive pool)
  • Cost:
    • Buy Your Own Hardware (Dell R410): $64,900 for set of 42
    • Host It in the Cloud (SoftLayer): $18,000/month for the set of 42
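As a sanity check on the sizing, here is a rough sketch of the usable capacity each option works out to. It assumes decimal units (1 TB = 1000 GB), ignores filesystem and database overhead, and reads "2 replica copies" as two total copies of each piece of data; `usable_tb` is just a helper name made up for this example.

```python
# Rough usable-capacity check for the three options (assumptions:
# 1 TB = 1000 GB, no filesystem/DB overhead, "copies" = total copies of the data).
DRIVE_GB = 250

def usable_tb(servers, drives_per_server, drives_lost, copies):
    """Unique data, in TB, that the whole setup can hold."""
    per_server_gb = (drives_per_server - drives_lost) * DRIVE_GB
    return servers * per_server_gb / copies / 1000

options = {
    # 48 drives, minus 2 for mirrored boot and 2 for RAID-6 parity; master + slave = 2 copies
    "Option 1 (MySQL master/slave)": usable_tb(2, 48, 4, copies=2),
    # 4 drives, minus 1 for RAID-5 parity
    "Option 2 (28 nodes, 2 copies)": usable_tb(28, 4, 1, copies=2),
    "Option 3 (42 nodes, 3 copies)": usable_tb(42, 4, 1, copies=3),
}

for name, tb in options.items():
    print(f"{name}: {tb:.1f} TB")
```

Under those assumptions Option 1 lands on 11 TB exactly, while Options 2 and 3 come out at 10.5 TB, so the commodity clusters are sized slightly under the 11TB target.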

Now, what surprised me here isn’t the raw cost differential between stuffing your own hardware in your colo and using a cloud provider. And I’m not picking on SoftLayer…Rackspace and Voxel show the same cost scaling as SoftLayer, or worse.

What surprised me:

  • When you buy your own hardware, “cloud-scale” databases with 2 replicas (Option 2) do cost you less (~$7K) than buying beefy storage servers and running MySQL for the same data set. With 3 replicas (Option 3), they cost about $15K more.
  • However, when you are at a cloud provider, using cloud-scale databases on “cheap” hardware costs you roughly 2.5x to 4x as much per month as using beefy storage cloud servers running MySQL ($12,000 or $18,000 vs. $4,700).
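For anyone who wants to redo the comparison, here is a minimal sketch that measures Options 2 and 3 against Option 1, using only the prices listed in the options. The `costs` table and the `versus_option1` helper are names invented for this example.

```python
# Prices copied from the three options (USD): purchase price and monthly cloud price.
costs = {
    "Option 1": {"buy": 50_000, "cloud_month": 4_700},
    "Option 2": {"buy": 43_300, "cloud_month": 12_000},
    "Option 3": {"buy": 64_900, "cloud_month": 18_000},
}

def versus_option1(name):
    """Return (purchase price difference in USD, monthly cloud cost multiple) vs. Option 1."""
    base, other = costs["Option 1"], costs[name]
    return other["buy"] - base["buy"], other["cloud_month"] / base["cloud_month"]

for name in ("Option 2", "Option 3"):
    buy_delta, cloud_multiple = versus_option1(name)
    print(f"{name}: {buy_delta:+,} USD to buy, {cloud_multiple:.2f}x per month in the cloud")
```

A negative purchase difference means the commodity cluster is cheaper to buy than the pair of storage servers.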

As I said, I’m not comparing the cost of running Option 1 on your own hardware vs. Option 1 at a cloud provider. Yes, those costs are higher at the cloud provider, but that’s to be expected (they’re bundling in bandwidth, colo, power, and most importantly people to manage the hardware and network).

What’s stunning is that beefy servers at a cloud provider are much more cost efficient. Going by the figures above, beefy cloud servers cost you roughly 1/11 of the hardware price every month ($4,700 vs. $50,000), whereas “cheap” commodity cloud servers cost you between 1/4 and 1/3 of the hardware price every month ($12,000 vs. $43,300). That is a much higher markup on the cheaper volume servers.
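One way to put a number on that markup is to ask how many months of cloud rent add up to the purchase price of the same hardware. The sketch below does that with the prices from the options; `months_to_match` is a made-up helper name, and it ignores everything the cloud price bundles in (bandwidth, colo, power, people).

```python
# (purchase price, monthly cloud price) in USD, taken from the three options.
prices = [
    ("Option 1: 2 beefy storage servers", 50_000, 4_700),
    ("Option 2: 28 commodity servers", 43_300, 12_000),
    ("Option 3: 42 commodity servers", 64_900, 18_000),
]

def months_to_match(buy, cloud_month):
    """Months of cloud rent needed to equal the hardware purchase price."""
    return buy / cloud_month

for label, buy, cloud_month in prices:
    months = months_to_match(buy, cloud_month)
    print(f"{label}: monthly rent is about 1/{months:.1f} of the hardware price")
```

The fewer months it takes, the steeper the monthly price is relative to the hardware behind it.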

Please comment and correct me if I’m wrong in my analysis…I would actually like to be.