Introducing Kyoto Cabinet

Most developers tend to use a general purpose SQL databases for all kinds of data storage, which is okay most of the times but can add great overhead if you have a massive amount of data and when you want to minimise the deployment overhead. Different data storage databases exist to solve different programming problems, as developer, you should focus on the most efficient and scalable solution that fits your programming problem without taking the SQL database as a granted/generic solution.

I've started using Tokyo cabinet 2 years ago in one of the most sophisticated dispersed storage solutions I've ever seen/worked-on. I was astonished by the capabilities and performance of such lightweight database, developed by a Japanese engineer called Mikio Hirabayashi who also wrote QDBM previously. A couple of years later he and his brother (obviously) - Hatsuki Hirabayashi - Launched Kyoto cabinet, which is better than tokyo cabinet in almost every aspect. Those two brothers formed a startup called FAL Labs.

Koyoto cabinet is a lightweight concurrent key-value database that has some amazing performance characteristics, it can store one million records in 900 ms (0.9s) in a hash database (HDB) and 1100 ms (1.1s) in a B+ tree (BDB) database with minimal storage overhead of 4 bytes for BDB and 16 bytes for HDB!

Kyoto cabinet is a standalone file-based database, but if you want to use is as a server-client, take a look at Kyoto Tycoon.

Kyoto cabinet has bindings for almost every famous programming language out there, like, C++, C, Python, Ruby, Java, Perl, and Lua. Best of all, Kyoto cabinet is FREE and is licensed under the GNU GPL license.

In this article I will walk you through Kyoto Cabinet Python API which is pretty easy for newbies and will give you the sense of how things work.

Database types

Kyotocabinet supports two types of data storage models, Hash (HDB) and B+ Tree. Every type has pros and cons and can be used in certain situations. The core of the database is the same but those two different algorithms define the storage and retrieval algorithms.

Hash database is like having a hash-table stored on your hard drive, every key is hashed and indexed for O(1) time complexity access to data, the order of the keys is not maintained so don't expect to retrieve the data in the same order you've inserted them.

B+ tree database is like a traditional B+ tree data structure that gives you O(logN) query time to your data, keys are stored in sorted natural order (natural sorting), best used if you want sequential access to your keys.

Walkthrough

Install kyoto cabinet using your package manager (apt, yum, macports, etc.) and install the python bindings, then launch iPython and start playing with as follows:

import kyotocabinet as kc  
db = kc.DB()  
db.open('/tmp/soliman.kch', kc.DB.OWRITER | kc.DB.OCREATE)  
db.set('FirstName', 'Ahmed')  
db.set('LastName', 'Soliman')

print db.get('FirstName')  
>> 'Ahmed'

The extension of the database filename is very important, in fact, the whole name is very important, you control the database type (BDB or HDB) using the filename and extension kch is Kyotocabinet Hash, if you changed it to kct Kyotocabinet Tree, then you will be using B+ tree database. Also tricks like using '-' or '+' as database name will result in memory-based database, '-' means that it will be hash-based and '+' means B+ Tree-based.

As you can see I've created the database and added two keys and values and retrieved using the key. You can also traverse records using "Cursor". And you can try out the B+ tree database like that:

import kyotocabinet as kc  
db = kc.DB()  
db.open('/tmp/soliman.kct', kc.DB.OWRITER | kc.DB.OCREATE)  
db.add('Key1', 'Value1')  
db.add('Key2', 'Value2')  
db.add('Key3', 'Value3')

db.get('Key2')  
>> 'Value2'

len(db)  
>> 3
cur = db.cursor()  
cur.jump()  
cur.next()  
>> 'Key1'
cur.next()  
>> 'Key2'
cur.next()  
>> 'Key3'

Conclusion

Kyoto cabinet is the fastest key-value lightweight database around and supports many programming languages, it's the best choice for problems that fit into the key-value data model. It also has very low record overhead - 4/16 bytes — and remarkable internal design. I would definitely suggest this database to everybody to start playing with.

comments powered by Disqus