Saturday, April 2, 2011

ACID, what and why?

One of the fundamental features of modern databases are transactions and confirming to the ACID properties. A transaction is transformation of state from one consistent state to another that happens virtually in a single step.

ACID is an acronym for:

  • Atomic means all or nothing meaning when a transaction is executed, all the statements contained must all succeed or none of them can have any effect, there can be no partial failure where some parts of it succeed and others fail, this would put the database in an inconsistent state. Common example is with money transfers where it's obviously very important that if on one side, the transferred amount is  subtracted, on the other side, it is added, even if the power goes out somewhere in the middle.
  • Consistent means that the database moves from one consistent state to another with no possibility that some other client could read the data in some intermediate state.
  • Isolated means that transactions executed concurrently will not clash with each other, each executes in it's own space and if they wish to update the same data, they have to wait for one of them to complete.
  • Durable means simply that once transaction succeeds, the data is not lost, the modified data is available for future queries to read and modify.
These properties are obviously desirable but on distributed systems, they may be hard to enforce without sacrificing performance and in future entries we will learn how clients of Cassandra can choose to tune between the different merits of consistency, availability and partition tolerance.

These compose the CAP theorem which states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
  • Consistency - all nodes see the same data at all times
  • Availability - node failures do not stop the rest of the system operating
  • Partition tolerance - the system can be split up to run on multiple nodes and can continue to function in spite of node and network failures
Cassandra has chosen availability and partition tolerance (AP) while relational databases are available and consistent (AC) and for example MongoDB, HBase, Google Bigtable chose CP.

Transactions can be implemented across several machines using two phase commit, but as this locks all affected resources, it makes transactions even more expensive then they already are making this quickly not an optimal solution for high performance applications.

In Cassandra, one can choose the balance between consistency and latency but more on this later.

Next we will take a closer look at Cassandra's architecture and design philosophies.

No comments:

Post a Comment