Democratizing Storage


Posted by Jason | Posted in DigiTar, Solaris | Posted on 04-21-2008


As a company that was heavily populated with Linux zealots, it’s been surreal for us to watch OpenSolaris develop for the past 3 years. While technologies like DTrace and FMA are features we now use every day, it was storage that brought Solaris into our environment and continues to drive it deeper into our services stack. Which begs the question: Why? Isn’t DTrace just as cool as ZFS? Haven’t Solaris Containers dramatically changed the way we provision and utilize systems? Sure…but storage is what drives our business and it doesn’t seem to me that we’re alone.

Everything DigiTar does manipulates or massages messaging in some way. When most people think of what drives our storage requirements they think of quarantining or archiving e-mail. But when you’re dealing with messages that can make or break folks’ businesses, logging the metadata is perhaps the most important thing we do.

Metadata is flooding in every second. It’s at the center of everything from proving a message was delivered to ensuring we meet end-to-end processing times and SLAs. If we didn’t quarantine any more messages, we’d still generate gigabytes of data every day that can’t be lost. Without reliable and scalable storage we wouldn’t exist.

Lost IOPs, Corruption and Linux…oh my!

What got us using OpenSolaris was Linux’s (circa 2005) unreliable SCSI and storage subsystems. I/Os erroring out on our SAN array would be silently ignored (not retried) by Linux, creating quiet corruption that would require fail-over events. It didn’t affect our customers, but we were going nuts managing it. When we moved to OpenSolaris, we could finally trust that no errors in the logs literally meant no errors. In a lot of ways, Solaris benefits from 15 years of making mistakes in enterprise environments. Solaris anticipates and safely handles all of the crazy edge cases we’ve encountered with faulty equipment and software that’s gone haywire.

When it comes to storing data, you’ll pry OpenSolaris (and ZFS) out of our cold dead hands. We won’t deploy databases on anything else.

Liberation Day

While we moved to Solaris to get our derrières out of a sling, being on OpenSolaris has dramatically changed the way we use and design storage.

When you’ve got rock-solid iSCSI, NFS, and I/O multipathing implementations, as well as a file system (ZFS) that loves cheap disks…and none of it requires licensing…you can suddenly do anything. Need to handle 3600 non-cached IOPs for under $60K? No problem. Have an existing array but can’t justify $10K for snapshotting? No problem. How ‘bout serving line-rate iSCSI with commodity storage and CPUs? No problemo.

That’s the really amazing thing about OpenSolaris as a storage platform. It has all of the features of an expensive array and because it allows you to build reliable storage out of commodity components, you can build the storage architecture you need instead of being held hostage by the one you can afford. But features like ZFS don’t mandate that you change your architecture. You can pick and choose the pieces that fit your needs and make any existing architecture better too.

So how has OpenSolaris changed the way DigiTar does storage? For one thing, it’s enabled us to move almost entirely off of our fibre-channel SAN. We get better performance for less money by putting our database servers directly on Thumpers (Sun Fire X4500) and letting ZFS do its magic. Also, because it’s ZFS, we’re assured that every block can be verified for correctness via checksumming. By doing application-level fail-over between Thumpers, we get shared-nothing redundancy that has increased our uptime dramatically.

One of the things that has always bugged me about traditional clustering is its reliance on shared storage. That’s great if the application didn’t trash its data while crashing to the ground. But what if it did? To replicate the level of redundancy we get with two X4500s, we’d have to install two completely separate storage arrays…not to mention also buy two very large beefy servers to run the databases. By using X4500s, we get the same reliability and redundancy for about 85% less. That kind of savings means we can deploy 6.8x more storage for the same price footprint and do all sorts of cool things like:

  • Create multiple data warehouses for data mining spam and mal-ware trends.
  • Develop and deploy new service features whenever we want without considering storage costs.
  • Be cost competitive with competitors 10x our size.
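
For anyone who wants to sanity-check the "about 85% less" and "6.8x more storage" figures above, here is a back-of-the-envelope sketch. It assumes the same budget is spent either way and that cost scales linearly with capacity; the function names are made up for illustration and this is not DigiTar's actual cost model.

```python
def storage_multiple(cost_reduction: float) -> float:
    """Times more storage a fixed budget buys when unit cost
    drops by `cost_reduction` (a fraction in [0, 1))."""
    if not 0 <= cost_reduction < 1:
        raise ValueError("cost_reduction must be in [0, 1)")
    return 1.0 / (1.0 - cost_reduction)


def implied_cost_reduction(multiple: float) -> float:
    """Inverse: the unit-cost reduction implied by a storage multiple."""
    if multiple < 1:
        raise ValueError("multiple must be at least 1")
    return 1.0 - 1.0 / multiple


# A flat 85% reduction works out to roughly 6.67x the storage.
print(round(storage_multiple(0.85), 2))       # 6.67
# A 6.8x multiple corresponds to roughly an 85.3% reduction.
print(round(implied_cost_reduction(6.8), 3))  # 0.853
```

The two figures line up once you allow for rounding: 6.8x implies a reduction a little over 85%, which fits "about 85%".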

Whether you’re storing pictures of your kids, or archiving business critical e-mail (or anything in between), it seems to me that being able to store massive amounts of data reliably is as fundamental to computing today as breathing is to living. OpenSolaris allows us as a company to stop worrying about what it’s going to cost to store the results of our services, and focus on what’s important: developing the services and features themselves. When you stop focusing on the cost of “air”, you’re liberated to actually make life incredible.

I could continue blathering about how free snapshotting (both in terms of cost and performance hit) can allow you to re-organize your backup priorities, or a bunch of other very cool benefits of using OpenSolaris as your storage platform. But you should give it a shot yourself, because OpenSolaris’ benefits are as varied and unique as your environment. Once you give it a try, I think you’ll be hard pressed to go back to vendor lock-in…but I’m probably a bit biased now. I think you’ll also find a community around OpenSolaris that is by far the friendliest and most mature open source group of folks you’ve ever dealt with.

Comments (3)

Dude! I can vouch for the coolness of ZFS. I was an early adopter way back in September 2006. Look at my work described in Slide 17 of the "ZEBRA, ZAMBONI, ZEN & THE ART OF ZFS" paper presented at SAS Global Forum (thanks to Maureen for incorporating my work into her presentation). I also had a situation wherein we had silent in-flight corruption of data in the FC due to firmware issues. Good that I had ZFS to tell me I had bad data; bad that I didn't mirror (or RAIDZ) the drive, relying upon the SAN to provide me that cover.

All I want now is lots of drives and a good connection, 'nuff of the SAN features. I'll use ZFS to deal with my needs.

ZFS rocks. We build each of Solaris's virtual zones on its own ZFS file system. Then we send snapshots to a remote location and receive them on the other end to replicate the servers for backup. And since the "-F" option was introduced, we can do incremental snapshots and receive those on the remote server to save lots of bandwidth. And I wrote scripts that automate the whole process. And it's all done live. We have not had a single data corruption since switching to ZFS. This was not the case on UFS and Metadb. ZFS is simply the best.

Thanks so much for the excellent post on your switch to an open SAN. I'm monitoring the developments with Sun's storage, and hoping to use it for an upcoming project.

The main sticking point for me is the failover between the primary and secondary storage arrays. You mentioned here that you do application-level failover.

Can you elaborate a bit more on your failover scheme: How the two thumpers synchronize with each other, and how a failure is detected and a failover takes place?
