Posted by Jason | Posted in DigiTar, NoSQL, Software Development | Posted on 03-07-2011
- secondary indexes
- multi-data center replication (would be great if Basho would open source this)
Both Riak and CouchDB excel at storing records under a primary key, but sometimes you need to know a different axis on the data. For example, all of our records are stored in JSON dictionaries and at times we want to know all the records that match a particular field in the dictionaries. That’s the situation we’re in for a new service we’ll be standing up soon. We wanted to generate coupon codes that customers could redeem for service as an alternative to providing a credit card. The coupon codes can be sold through affiliates, so one of the axes we’ll need to know in addition to the coupon code is what are all the coupon codes belonging to affiliate ID X?
One approach (without using secondary indexes) would be to try and encode both the code and the affiliate ID in the record’s key (e.g. prefix_code_affiliate_id). The main issue with that approach is that some of our access patterns don’t have access to the affiliate ID (e.g. a user signing up for service), they only know the coupon code. So we need fast lookup of the record based on the code alone. That pretty much eliminated map/reducing for the coupon code, and firmly established the code alone as the right choice for the data key. The perfect solution would be a secondary index on the “affiliate_id” of the JSON dictionary…in other words, a map of affiliate ID to the data. Normally, this is where we’d turn to CouchDB’s views and call it a day. But we’re planning on having millions of coupons in the system with thousands of parallel accesses that need to be evenly loaded across the datastore…not a scenario where CouchDB excels. Riak would be the perfect choice…except there’s no native secondary indexes.
Doing indexes in Riak
There’s a couple ways you can do your own indexes in Riak. The simplest approach is to just create an index key of the form
idx_<field_name>_<field_value> and shove in a JSON list of the keys containing matching records. What you’ll run into very quickly is multiple clients trying to update that index key with new records, and overwriting each other. Since Riak keeps multiple versions in the event of a conflict, you can code your clients to auto-merge conflicted versions into one master list and re-post the index. But…that puts a lot of maintenance logic in your clients, and in the event one of those updates is deleting a key from the index, the merge process can put the deleted key back into the index.
Since we don’t want index management to have that many moving parts we came up with a different approach:
- For every field being indexed on a particular record, create a new separate index key in a dedicated index bucket.
- Store the indexed field’s name in the bucket name for the index key and store the indexed value and data key’s name in the name of the index key.
- MapReduce to get a list of matching index keys for any particular question by iterating over the keys of an index’s bucket and splitting the index key name apart to analyze the value.
The format of an index key becomes (we use key prefixes to namespace different data key types in the same bucket):
Bucket Name: idx=<field_name>=<data_key_prefix>
Key Name: <data_key_name>/<field_value>
Key Value: empty
The immediate advantage of having an index key for every indexed field of a data key is the reduced chance of write conflicts when maintaining the index (you pretty much eliminate the chance a deleted index key is going to get resurrected). Asking the question “How many coupons have a redeemed count < 50?” becomes a simple MapReduce job that iterates over the idx=redeemed_cnt=coupon index bucket to find index keys where the field_value is < 50.
You might have noticed that we don’t store any data in the value of the index key… That’s on purpose, because it allows us to leverage a new feature of Riak 0.14…key filters for MapReduce jobs.
Luckily the smart folks at Basho gave us a new tool to do just that with key filters. By encoding the indexed value in the key name we can tell Riak via a key filter to:
- Tokenize the index key name using “/” as the separator.
- Look at the second token after the split (i.e. the indexed value).
- Treat that token as an integer.
- Only give the index key to the MapReduce job if the integer value is < 50.
In fact, with key filters we actually don’t have to write our own MapReduce phases to answer this question anymore. All we have to do is construct the key filter, and tell Riak to use the “identity reduce” reduce phase that’s built in (skip the map phase entirely). What we’ll get back is a list of index keys whose indexed value is < 50. We can then split those index key names in our client to get the key names of the data keys they map to.
Rubber meets the road…
So what does the performance look like with all of this? We wrote a couple of tests using Twisted Python to benchmark loading 100,000 coupons into CouchDB and Riak and then asking both how many of those coupons had a redeemed count < 50. Here’s a legend for what the different test titles mean:
- CouchDB 1.0.2 (cold view): The amount of time it takes the view (index) to answer the question the first time the view is queried. This number is important because CouchDB doesn’t build the view until you query it the first time. From then on it just incrementally updates the view with changed values.
- CouchDB 1.0.2 (computed/warm view): Amount of time it takes the view to answer the question on subsequent queries after the view has been computed initially.
- Riak 0.14.1 (Raw MapReduce) 1-node: No indexes used as described above. A brute force MapReduce job that iterates over the data keys and examines the reduced_count field in the JSON. 1-node Riak “cluster”.
- Riak 0.14.1 (Indexed w/ Raw MapReduce) 4-node: Same as “Indexed w/ RawMapReduce” above except done on a 4-node Riak cluster.
- Riak 0.14.1 (Indexed w/ Key Filter MR) 4-node: Same as “Indexed w/ Key Filter MR” above except done on a 4-node Riak cluster.
Before I show the numbers, you’d probably like to know what the test setup looked like. Each node was a SoftLayer Cloudlayer server with these specs (if you haven’t tried them, SoftLayer is really a phenomenal provider):
- 1x 2.0GHz Xeon CPU
- 2GB RAM
- 25GB HDD
- Gigabit NICs
- Ubuntu 10.04.1 64-bit
- Dallas 05 Datacenter
- CouchDB 1.0.2 was built from source.
- Riak 0.14.1 was installed from the .deb available from Basho.
- Before each type of test the servers were rebooted to clear the filesystem cache.
- Tests were run from a 5th node not running Riak or CouchDB. For the 4-node tests, the 5th node ran HAProxy 1.4.8 to round-robin client connections amongst the Riak nodes.
So without further ado…the numbers:
|Generate Keys (secs)||Show Keys w/ redeemed_count < 50 (secs)|
|CouchDB 1.0.2 (cold view)||495||74|
|CouchDB 1.0.2 (computed/warm view)||495||11|
|Riak 0.14.1 (Raw MapReduce) 1-node||358||82|
|Riak 0.14.1 (Indexed w/ Raw MapReduce) 1-node||692||65|
|Riak 0.14.1 (Indexed w/ Key Filter MR) 1-node||692||56|
|Riak 0.14.1 (Indexed w/ Raw MapReduce) 4-node||1025||40|
|Riak 0.14.1 (Indexed w/ Key Filter MR) 4-node||1025||34|
Or if you’re more visual like me:
Analyzing the outcome
One thing that’s very clear is how fast computed views can be in CouchDB (11 seconds flat is nothing to shake a stick at). However, what did we learn from the Riak numbers?
- Indexed insertion is 91% slower than storing just the key data.
- MapReduce with indexes is 20% faster than MR on the data keys alone.
- MapReduce with indexes and key filters is 32% faster than MR on the data keys alone.
- Adding Riak nodes substantially reduces query time. Adding 3 more nodes speeds up queries by 40%.
What’s a little puzzling is why insertion of keys on a 4-node cluster is 40% slower than a 1-node cluster? I had expected insertion to be the same speed or 25% faster. The reason I’d expected this is because Riak was set to use a write n-value of 3…meaning for every key inserted 3 copies were stored throughout the cluster. Accounting for coordination latency on a 3-node cluster, I’d expect almost the same insertion speed as a 1-node Riak instance. With an extra node in the cluster, I’d expect slightly faster performance since only 3 nodes out of the cluster are engaged in any given write.
Regardless, the query performance proves Riak is a good choice for our use case. While 34 seconds to answer the question is slower than the 11 seconds it took CouchDB, it’s clear that as we scale the cluster our query performance will scale with the size of our dataset. Providing we can find a solution for the 50% slower insertion speed, Riak will definitely be our datastore of choice for this project. Once again, Riak is incredibly impressive at how well it handles large data sets and how adept it’s simple toolset is at answering complex questions.
Where we go from here…and nagging questions
The indexed key approach has worked so well for us that we’re currently writing a library to layer on top of txRiak to transparently handle writing/updating indexes. I’ll put up another blog entry on that once it’s posted to Github (there’s a few issues we have to handle like escaping the separators, and we intend to use Riak’s Links to provide an alternate connection between index and data keys). Even more exciting is the news that Basho is currently working on adding native secondary index support to Riak. No news on how that will take shape, but I expect it will blow the performance of our homegrown indexes out of the water. I think built-in indexes are a cleaner more maintainable approach. Maintaining indexes in the client puts a lot of pressure on the clients not to screw up the index accidentally…especially if you’ve got code written in multiple languages accessing that data.
The only real nagging question right now was an issue we saw when we attempted to add a 5th node to the Riak cluster. I had originally intended to do an analysis of how much query performance improved with each node added. However, when the 5th node was added to the cluster it took close to 1 hour for Riak to fully redistribute the keys…and even then 3 of the 5 nodes showed that they were still waiting to transfer one partition each to another node. When we attempted to run the MapReduce index query against the newly expanded cluster, we received MapReduce errors that Riak couldn’t find random index keys as it attempted to pass these “missing” keys into the map phase. I suspect the culprit maybe some “failed node” testing we did before adding the 5th node.
Overall, the takeaway for me is that Riak is a phenomenally flexible data store, and that just because it’s missing a feature doesn’t mean you should shun it for that workload. More often than not, a little thought and chaining together of Riak’s other very powerful tools will give you the same result. To me, Riak vs CouchDB (or vs. SQL) is like a RISC chip vs. a CISC chip. It may not have one complex instruction you need, but you can build that instruction out of much simpler ones that you can run and scale at twice the speed.