Tuesday, May 24, 2011

Lightspeed-PHP performance and scaling tests

Lightspeed-PHP, still under heavy development, has been designed from the ground up to be a lightweight, easily understandable and extendable, full-featured yet web-oriented, high-performance PHP framework. It plays well with the Hiphop PHP-to-C++ transformer-compiler and is well suited for distributed deployments.

It's now maturing and mostly ready for prime time, taking on its first projects. There is thorough documentation auto-generated from the code comments, but a proper manual will be created once the framework has driven an actual project and become stable enough.

A sample application was created on it - a simple blog where people can register and add posts. It was used for the performance and scaling tests and it uses the Cassandra-driven sharding architecture discussed in the earlier posts.

The more code and heavy lifting an application involves, the more Hiphop shines. Below is a graph of the example application's front-page benchmark, comparing vanilla PHP with the Hiphop-driven version.



As can be seen, Hiphop performs nearly ten times better at higher loads. Similarly the 90% render times show greatly improved performance.



As the front page is very lightweight, it's not a good indication of real-world performance. To test that, the system simulated registering a user, which required several round trips to Cassandra, a MySQL shard and Memcached. Graphs of requests per second and 90% time for this comparison follow.



Again, the results of the Hiphop-PHP-driven application are impressive, translating into great hardware savings for heavily loaded systems.

Scaling testing
To test scaling, the cluster in the Amazon cloud was doubled so that it contained two of everything - webservers, Cassandra and Memcache instances, and RDS databases. Together with a testing machine and a load balancer, this meant ten machines in total. Graphs of the results follow.



As can be seen, the requests per second almost doubled while the 90% time also decreased. As neither metric alone is a sufficient indication of performance, I devised a measure I call the performance index: simply requests per second divided by the 90% time, which gives a bigger index for higher req/s and a lower 90% time, both of which are desirable qualities. Putting this data on a graph gives the following:



As the req/s figure almost doubled and the 90% time also decreased, the increase in performance appears to be actually higher than a doubling - about 350%, to be precise. I'm not entirely sure what the implications of this are, but it looks great.
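To make the arithmetic concrete, here is a tiny sketch of the calculation with made-up numbers (these are placeholders, not the actual benchmark figures):

<?php
// Illustrative only - the values below are placeholders, not the real test results.
function performanceIndex($requestsPerSecond, $ninetiethPercentileTime) {
    return $requestsPerSecond / $ninetiethPercentileTime;
}

$single  = performanceIndex(100, 0.50); // hypothetical single cluster
$doubled = performanceIndex(190, 0.27); // hypothetical doubled cluster

// With numbers like these, the index grows to roughly 350% of the original.
printf("index changed to %.0f%% of the original\n", $doubled / $single * 100);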

This testing has shown the good performance of the Lightspeed-PHP framework, especially when using the Hiphop-PHP transformer. It also demonstrated that the Cassandra-driven sharding architecture works and should indeed provide high and linear scalability.

Detailed analysis
This test is more thoroughly explained with all the numbers in a PDF document that you can download here.

Monday, May 9, 2011

Sharding PHP implementation



Design
The main design objective in implementing the database sharding described in the previous post in PHP was to make it pluggable and unobtrusive, meaning that the rest of the data manipulation code is only loosely coupled to it, which keeps sharding optional and as painless as possible to use.

The implementation uses the strategy pattern, enabling different sharding backends to be plugged in (relational-database and Cassandra-based strategies are implemented) as well as different sharders, meaning that one might use Cassandra to shard relational databases, or perhaps filesystems instead.
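As a rough sketch of what such a strategy contract might look like (the interface and method names below are illustrative, not necessarily the ones Lightspeed-PHP actually uses):

<?php
// Illustrative sketch only - names are hypothetical, not the framework's actual API.
interface ShardSchemaStrategy {
    public function addShard($group, $name, $capacity, $properties); // register a shard
    public function createKey($group, $identifier);                  // allocate a key on some shard
    public function resolveKey($group, $identifier);                 // identifier -> key id
    public function resolveShard($keyId);                            // key id -> shard
}

// One strategy could be backed by a relational database through PDO, another by
// Cassandra, and both stay interchangeable behind the same interface; the same
// idea applies on the Sharder side for the actual sharded storage.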

Following is a class diagram of the core of the sharding implementation showing the various entities and their relationships.

Shard represents one of the shards that the sharder maps keys to; it has a name, belongs to a group, and knows its capacity and the number of keys it currently stores. As shards are not backend-specific, it does not have explicit fields for a database DSN or username and password, but rather a single string properties field that stores whatever information is needed to connect to the shard based on its type. The Sharder implementation knows how to parse this and access the actual shard (which may be a relational database in the main use case).
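For illustration, the properties string of a relational shard could pack the connection details along these lines (the format is purely an assumption for the sketch):

<?php
// Hypothetical properties string for a MySQL-backed shard - not the framework's actual format.
$properties = 'dsn=mysql:host=shard-01;dbname=app|user=app|password=secret';

// A relational Sharder could parse it back into connection parameters like this:
$parsed = array();
foreach (explode('|', $properties) as $pair) {
    list($name, $value) = explode('=', $pair, 2);
    $parsed[$name] = $value;
}
$connection = new PDO($parsed['dsn'], $parsed['user'], $parsed['password']);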

A ShardGroup is, as the name suggests, a group of shards; it has a name and the properties needed to set up a new shard on a new host. Thinking of it now, the "getSetupSql()" method is likely to be renamed to something more generic, as not all shards may be databases.

The ShardSchema is the main manager of shards and shard keys and implements managing groups and shards, creating and resolving keys, and so on. It uses the strategy pattern so it can be implemented on various backends.

The Sharder is a higher-level proxy for ShardSchema whose main purpose is to add a caching layer on top of the actual schema: operations like resolving an identifier to a key and a key to a shard can easily be cached, so the sharding backend does not have to be hit on every request and the much faster cache is used instead. It actually implements three levels of caching for most operations (a sketch of the lookup order follows the list):

  • Repeated lookups within a single request are returned from an internal hashmap, so they are very cheap - not even the cache is hit.
  • For a short period of time, lookups are served from a local cache (APC by default) that is very fast but not distributed.
  • For a longer period of time, lookups hit the global distributed cache that is shared by all webservers (Memcache by default).
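A minimal sketch of that lookup order, assuming APC for the local cache and Memcache for the global one (the class, method and key names here are illustrative):

<?php
// Illustrative three-level lookup - not the framework's actual code.
class CachingSharder {
    private $localMap = array(); // level 1: per-request hashmap
    private $memcache;           // level 3: distributed cache shared by all webservers

    public function __construct(Memcache $memcache) {
        $this->memcache = $memcache;
    }

    public function resolveShard($key) {
        if (isset($this->localMap[$key])) {                     // 1. same request
            return $this->localMap[$key];
        }
        $shard = apc_fetch('shard.' . $key);                    // 2. local APC, short TTL
        if ($shard === false) {
            $shard = $this->memcache->get('shard.' . $key);     // 3. global Memcache, longer TTL
            if ($shard === false) {
                $shard = $this->lookupFromSchema($key);         // finally hit the sharding backend
                $this->memcache->set('shard.' . $key, $shard, 0, 600);
            }
            apc_store('shard.' . $key, $shard, 60);
        }
        return $this->localMap[$key] = $shard;
    }

    private function lookupFromSchema($key) {
        // delegate to the underlying ShardSchema implementation
    }
}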
Cassandra setup
The relational schema explained in the previous post was mostly converted to Cassandra's domain. As Cassandra is not good at the numeric auto-increment indices common in relational databases, I switched to using named identifiers instead, for both Cassandra and the earlier PDO implementation, which led to a major rewrite - a good thing I had covered the code 100% with unit tests, making the refactoring easier.

Just for reference, the new relational sharder schema is now as follows:

CREATE TABLE IF NOT EXISTS `alias` (
  `group` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `identifier` varchar(60) COLLATE utf8_unicode_ci NOT NULL,
  `key_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`group`,`identifier`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `group` (
  `name` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `setup_sql` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `key` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `shard` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=876 ;

CREATE TABLE IF NOT EXISTS `shard` (
  `name` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `group` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `keys` int(10) unsigned NOT NULL,
  `capacity` int(10) NOT NULL,
  `properties` varchar(1024) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

For Cassandra, I managed to keep the schema rather similar: the primary keys became row keys, and where multiple fields form the primary key, both are present in Cassandra too (as an aggregate key).

Above is the Cassandra schema, and as I don't know of proper standards for describing it in a class-diagram-like notation, I invented my own. As you can see, an alias points to a key, which in turn points to the shard where the data for the given key lives, and shards belong to groups - very simple really.

By the [IdxGroup] behind the "group" field of the "shard" column family, I mean that it uses a new Cassandra feature called secondary indexes: I do not have to maintain a "materialized view" of shards by group myself - Cassandra does it for me and allows basic "where" queries, so, for example, I can fetch a list of shards by group name, which is exactly what my implementation requires. Secondary indexes make using Cassandra as a more general-purpose database much easier, so it's a feature I'm very excited about.
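In phpcassa terms, fetching the shards of a group through that index looks roughly like the sketch below; the keyspace name and the exact helper functions are assumptions that may differ between phpcassa versions:

<?php
// Rough sketch only - helper names and signatures may differ in the phpcassa version used.
require_once 'phpcassa/connection.php';
require_once 'phpcassa/columnfamily.php';

$pool   = new ConnectionPool('sharder', array('127.0.0.1:9160')); // hypothetical keyspace
$shards = new ColumnFamily($pool, 'shard');

// Roughly "select * from shard where group = 'user'" against the secondary index.
$expression = CassandraUtil::create_index_expression('group', 'user');
$clause     = CassandraUtil::create_index_clause(array($expression));

foreach ($shards->get_indexed_slices($clause) as $name => $columns) {
    // $name is the shard's row key; $columns holds its group, keys, capacity and properties.
}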

The additional "counter" column family comes from the fact that I need something like auto-increment for key identifiers, but Cassandra does not have it built in, so I keep my own counter that I can increment and read by name. I'm actually lying a bit, as the latest version of Cassandra does have counters, but they do not seem to be implemented in the PHP library phpcassa that I'm using. I will definitely come back to this, as using a counter table this way is very bad: it is subject to race conditions where I may read a counter value and, before using and incrementing it, another thread reads it too.
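The naive counter works roughly as below (a sketch with hypothetical row and column names); the comment marks the window in which two concurrent requests can end up with the same value:

<?php
// Illustrative read-then-increment counter on a plain column family - not safe under concurrency.
$counters = new ColumnFamily($pool, 'counter'); // $pool as in the earlier sketch

$row  = $counters->get('key_id');               // e.g. array('value' => '876')
$next = (int) $row['value'] + 1;
// <-- race window: another request may read the same 'value' right here
$counters->insert('key_id', array('value' => (string) $next));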

Next steps
I actually managed to get this working with the Hiphop-PHP compiler too, which required some modifications to the underlying Thrift library, so I'm pretty much ready to start testing the performance of such a setup. I'm hoping to get some cloud resources and put it through its paces with some JMeter tests and an actual cluster of webservers, Cassandra instances, shard databases and Memcache, to see whether and how well it scales.