<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Cloud-scale DBs in the cloud&#8230;just a quickie</title>
	<atom:link href="http://blogs.digitar.com/jjww/2010/03/cloud-scale-dbs-in-the-cloud-just-a-quickie/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.digitar.com/jjww/2010/03/cloud-scale-dbs-in-the-cloud-just-a-quickie/</link>
	<description>thoughts &#38; musings</description>
	<lastBuildDate>Sat, 24 Jul 2010 02:26:30 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Jason</title>
		<link>http://blogs.digitar.com/jjww/2010/03/cloud-scale-dbs-in-the-cloud-just-a-quickie/comment-page-1/#comment-17192</link>
		<dc:creator>Jason</dc:creator>
		<pubDate>Sat, 24 Jul 2010 02:24:41 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.digitar.com/jjww/?p=134#comment-17192</guid>
		<description>You&#039;re right in your comments...my point is that the hype about where the tipping point for outgrowing an RDBMS lies is way overblown. Also, even accounting for power, the markup on the smaller servers is extremely high.</description>
		<content:encoded><![CDATA[<p><code>You&#8217;re right in your comments&#8230;my point is that the hype about where the tipping point for outgrowing an RDBMS lies is way overblown. Also, even accounting for power, the markup on the smaller servers is extremely high.</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alex Leverington</title>
		<link>http://blogs.digitar.com/jjww/2010/03/cloud-scale-dbs-in-the-cloud-just-a-quickie/comment-page-1/#comment-16386</link>
		<dc:creator>Alex Leverington</dc:creator>
		<pubDate>Sat, 19 Jun 2010 11:43:47 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.digitar.com/jjww/?p=134#comment-16386</guid>
		<description>I believe your analysis is accurate and it makes sense: If you have beefy data processing needs, it&#039;s more efficient to use beefy servers. In answer to your query about why &quot;cheap&quot; (in bulk) servers cost more, it&#039;s because they actually cost more! I&#039;ll explain...

For 28 commodity servers you&#039;re looking at powering 2 racks (at 110V) in a climate-controlled room. At 3 amps, the electric bill to cool and power the servers would be around $1300/mo. What consumes 3 amps when a 100W processor draws &lt; 1 amp?

The power a server consumes goes to the CPU, fans, and disks. When you take all those disks and cram them into one machine, based on your example of 28 servers, you&#039;re eliminating 104 CPU cores, 26 CPU fans, and probably 104 system fans -- bringing the power bill down to about $48/mo rather than $1300/mo.

If all you want to do is store 11TB of data, it&#039;s not a matter of beefy server + RDBMS versus many small servers + Cassandra. The benefit of those 28 servers + Cassandra is that Cassandra will not need more than a single core on each of those servers, and you can probably use 2-3 of the other cores for distributing computations. If you don&#039;t need that kind of process distribution, chances are you can do all your processing with an RDBMS and you&#039;re better off with one beefy server. In most cases, if someone is spreading their 11TB of data across 28 servers, it&#039;s because they can optimally split up their dataset into 14 segments.

In a nutshell, most folks who expand out from the &quot;beefy database&quot; infrastructure do so because their data processing needs exceed the CPU, memory, or network capacity of one server. In most cases, the limitation of a beefy server isn&#039;t the amount of storage available but the latency of random seeks and the ability of the system to quickly store and index new data.</description>
		<content:encoded><![CDATA[<p><code>I believe your analysis is accurate and it makes sense: If you have beefy data processing needs, it&#8217;s more efficient to use beefy servers. In answer to your query about why &#8220;cheap&#8221; (in bulk) servers cost more, it&#8217;s because they actually cost more! I&#8217;ll explain&#8230;</code></p><p><code>For 28 commodity servers you&#8217;re looking at powering 2 racks (at 110V) in a climate-controlled room. At 3 amps, the electric bill to cool and power the servers would be around $1300/mo. What consumes 3 amps when a 100W processor draws &lt; 1 amp?</code></p><p><code>The power a server consumes goes to the CPU, fans, and disks. When you take all those disks and cram them into one machine, based on your example of 28 servers, you&#8217;re eliminating 104 CPU cores, 26 CPU fans, and probably 104 system fans &#8212; bringing the power bill down to about $48/mo rather than $1300/mo.</code></p><p><code>If all you want to do is store 11TB of data, it&#8217;s not a matter of beefy server + RDBMS versus many small servers + Cassandra. The benefit of those 28 servers + Cassandra is that Cassandra will not need more than a single core on each of those servers, and you can probably use 2-3 of the other cores for distributing computations. If you don&#8217;t need that kind of process distribution, chances are you can do all your processing with an RDBMS and you&#8217;re better off with one beefy server. In most cases, if someone is spreading their 11TB of data across 28 servers, it&#8217;s because they can optimally split up their dataset into 14 segments.</code></p><p><code>In a nutshell, most folks who expand out from the &#8220;beefy database&#8221; infrastructure do so because their data processing needs exceed the CPU, memory, or network capacity of one server. In most cases, the limitation of a beefy server isn&#8217;t the amount of storage available but the latency of random seeks and the ability of the system to quickly store and index new data.</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jason</title>
		<link>http://blogs.digitar.com/jjww/2010/03/cloud-scale-dbs-in-the-cloud-just-a-quickie/comment-page-1/#comment-14749</link>
		<dc:creator>Jason</dc:creator>
		<pubDate>Thu, 18 Mar 2010 22:48:09 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.digitar.com/jjww/?p=134#comment-14749</guid>
		<description>The question is how you get 11TB of highly available database. And as noted in the article, the servers used at the cloud provider were dedicated servers, not cloud VMs per se. If you have a large dataset, something like Cassandra will cost you disproportionately more money than using MySQL vertically scaled and then partitioned (when deployed at a cloud provider). Alternatively, you could vertically scale and use 3 beefy Cassandra nodes, but that defeats the point of the architecture.

My real goal with this posting was to get cloud providers to realize they&#039;ve got to cut the costs on their 1U servers if things like Cassandra are going to be affordable when deployed in the cloud with multi-TB datasets. The 1U costs are way out of line with what the same cloud provider charges for the beefier box. If you bundle colo space/power/bandwidth into the costs of the 1U and 4U servers deployed on your own, you&#039;re still at a 1/5 ratio for 1Us vs a 1/11 ratio for 4Us (the ratio being calculated as cost per month at the cloud provider / cost of the hardware itself plus colo/power/bandwidth). Ideally, the ratio would be the same whether the box is a 1U or a beefy 4U.</description>
		<content:encoded><![CDATA[<p><code>The question is how you get 11TB of highly available database. And as noted in the article, the servers used at the cloud provider were dedicated servers, not cloud VMs per se. If you have a large dataset, something like Cassandra will cost you disproportionately more money than using MySQL vertically scaled and then partitioned (when deployed at a cloud provider). Alternatively, you could vertically scale and use 3 beefy Cassandra nodes, but that defeats the point of the architecture.</code></p><p><code>My real goal with this posting was to get cloud providers to realize they&#8217;ve got to cut the costs on their 1U servers if things like Cassandra are going to be affordable when deployed in the cloud with multi-TB datasets. The 1U costs are way out of line with what the same cloud provider charges for the beefier box. If you bundle colo space/power/bandwidth into the costs of the 1U and 4U servers deployed on your own, you&#8217;re still at a 1/5 ratio for 1Us vs a 1/11 ratio for 4Us (the ratio being calculated as cost per month at the cloud provider / cost of the hardware itself plus colo/power/bandwidth). Ideally, the ratio would be the same whether the box is a 1U or a beefy 4U.</code></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jonathan Ellis</title>
		<link>http://blogs.digitar.com/jjww/2010/03/cloud-scale-dbs-in-the-cloud-just-a-quickie/comment-page-1/#comment-14746</link>
		<dc:creator>Jonathan Ellis</dc:creator>
		<pubDate>Thu, 18 Mar 2010 16:12:32 +0000</pubDate>
		<guid isPermaLink="false">http://blogs.digitar.com/jjww/?p=134#comment-14746</guid>
		<description>You&#039;re not comparing apples to apples here. The amount of disk is similar, but you have an order of magnitude more CPU and RAM in the scale-out hardware, which is huge for many workloads.

That said, I also blogged recently about why cloud VMs aren&#039;t usually the right fit vs bare metal.</description>
		<content:encoded><![CDATA[<p><code>You&#8217;re not comparing apples to apples here. The amount of disk is similar, but you have an order of magnitude more CPU and RAM in the scale-out hardware, which is huge for many workloads.</code></p><p><code>That said, I also blogged recently about why cloud VMs aren&#8217;t usually the right fit vs bare metal.</code></p>
]]></content:encoded>
	</item>
</channel>
</rss>
