Sunday, April 3, 2011

Cassandra elevator pitch

This is a good elevator pitch for the Cassandra database in 50 words written by Eben Hewitt:

"Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon's Dynamo and its data model on Google's Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web".


Distributed and decentralized means that it is meant to be deployed on more then one, possible hundreds of common commodity hardware servers and it can run on them seemingly as a unified whole, application can talk to it like a simple single-box database. It is decentralized as every node of the database is equal, there are no master servers coordinating the rest and thus there is no single point of failure. Every node is equal and holds part of the data and share the total load equally.

Cassandra is elastically scalable as it's is easy to add new nodes as the application becomes more popular or remove some should the opposite happen. You can just add new nodes and Cassandra will begin to send it work without the application even knowing that anything has changed.

Replication of data among the nodes is used so if a node fails, the data is still available in some other nodes so it can be recovered. Users can tune this replication factor to choose between the overhead and peace of heart. In fact there are several replication strategies. The simple, rack unaware method simply places replicated data on the next node along the ring while if you are running a bigger deployment on several data centers/availability zones, you can use the rack-aware strategy in which case replica 2 is placed in the first node along the ring that belongs in another data center than the first; the remaining N-2 replicas, if any, are placed on the first nodes along the ring in the same rack as the first. This sort of replication means that Cassandra is highly available and since every node is equals and failing of any random one does not cause the whole system to become unstable, it is also fault tolerant.

Consistency means that whenever you read from the database, you get the most recently written value. Imagine buying something off eBay and someone else buying the exact same thing at almost the exact same moment. Who-ever got their request processed first should be able to buy the item while the other buyer should be notified that the product is already sold. This again seems like a logical thing to expect from a database, but this comes at a price in distributed databases as all update operations should be executed synchronously, meaning they must block, locking all replicas until the operation completes and locking in distributed systems is very expensive.

In many cases, you don't actually need this as for example when you have replies in a forum, it won't hurt anyone if right after changing a post, someone views the thread and still sees the unchanged entry. In Cassandra, one can choose in each case whether absolute consistency is needed or not and tune the value in between. This is why Cassandra can be called "tuneably consistent". This eventual consistency means that all updates propagate through the system to all replicas in some time. The optimistic approach to replication is not blocking the client and propagate changes in the background but this means we have to detect and resolve possible conflicts. Whether to do this during writing or reading the data is an important design choice and dictates whether the system is always readable or writable - again in the world of compromises, you can't have it all.

Cassandra chose to be always writable, deferring the complexity of conflict resolution to read operations. By tuneably consistent, it is mean that clients can choose how many replicas to block for during update operations or consult during read operations. You could always set this consistency level to the number of replicas, but this would impact performance and lose the point of using something like Cassandra in the first place. Setting it to below the replication factor means that operations can succeed even if some nodes are down and you can not expect to always read the most recently written value but as described earlier, this is not always necessary. Nice thing about Cassandra is that you can still choose to opt for consistency for critical operations that require this.

No comments:

Post a Comment