Monday, September 28, 2009

Notes of Key-Value Stores

Key-Value Stores:


It’s a storage system that stores values, indexed by a key. [1]

Sometimes, a key-value store works as a under layer storage system for a data management system.


When to consider Key-Value Stores:


- You need something that can query simple data faster than a relational system can.

- There are scaling requirements that are difficult to meet with a relational system.

- You want to avoid tying an application to a requirement for a relational system that may well have its own maintenance needs over time.

- You just want something simpler than a relational database.

- ……

In all of these cases, a key-value store may be just the tool for you.


Typical data structure:


Hash

Hash based key-value stores have fast lookup and update speeds, and require that keys be unique.

Lookup in a hash based tree is limited to key-only lookup.


For example:

To an object storage system which stores billions of photos, its applications would only query/get a photo by its unique key/id, like Facebook’s Haystack. [2]


The general API: Put/Get/Delete.

The important issues of hash based key-value stores are to limit the size of in-memory index/metadata, and to implement the distribution.


Tree (e.g. B-tree or B+tree)

Tree based stores generally allow multiple identical keys, and because tree based structures (like the b-tree) are ordered, they allow you to query individual keys, as well as ranges of keys.

The drawback to tree based stores is that those structures are generally slower than simple hashes.


For example:

A log aggregation system that stores messages keyed by timestamp, but which uses a hash based store, will be troublesome to query in a useful way, despite the speed of lookup, because you can only lookup specific timestamps. A system that uses a b-tree based store may be marginally slower for individual queries, but by permitting lookup based on a range of timestamps, can easily query the records of interest. This more than offsets the difference in basic query speeds. BigTable is based on B-Tree based key-value store. [3]


The general API: Put/Get/Delete/Scan.

The important issue of tree based key-value stores is to implement distributed tree structures.


References:

[1] Engine Yard, Key-Value Stores in Ruby (Key-Value Stores Part 1) http://www.engineyard.com/blog/2009/key-value-stores-in-ruby/

[2] Facebook, Needle in a haystack: efficient storage of billions of photos, http://www.facebook.com/note.php?note_id=76191543919

[3] Data Management Projects at Google-200803, http://www.cs.washington.edu/homes/mjc/papers/dataprojects-google-sigmodrecord08.pdf

No comments:

Post a Comment