Monday, May 9, 2011

Sharding PHP implementation



Design
The main design objectives implementing the database sharding described in the previous post on PHP was to make it pluggable and unobstrusive meaning that the rest of the data manipulation code is loosely coupled to it making using sharding optional and as painless as possible.

The implementation uses strategy pattern enabling plugging in different sharding backends (relational database and cassandra based strategies implemented) and also sharders meaning that one might use cassandra to shard relational databases, or perhaps filesystems instead.

Following is a class diagram of the core of the sharding implementation showing the various entities and their relationships.

Shard represents one of the shards that the sharder maps keys to and has a name, belongs to a group, knows its capacity and the number of keys it's currently storing. As shards are not backend-specific, it does not have explicit fields for database DSN or username-password but rather a single string properties field that stores whatever information needed to connect to the shard based on its type. The Sharder implementation knows how to parse this and access the actual shard (that may be a relational database in main usecase).

A ShardGroup is, as the name suggests, a group of shards and has a name and the properties needed to set up a new shard on a new host. Thinking of it now, the "getSetupSql()" method is likely to be renamed to something more generic as all shards may not be databases.

The ShardSchema is the main manager of shards and shard keys and implements managing groups, shards, creating and resolving keys etc. It uses the strategy pattern so it can be implemented on various backends.

The Sharder is a higher level proxy for ShardSchema that has the main purpose of adding a cache layer on top of the actual schema as operations like resolving an identifier to a key and a key to a shard can easily be cached eliminating hitting the sharding backend for every request and use the faster cache instead. It actually implements three levels of cache for most operations:

  • Consecutive requests for the same lookups during a single request are returned from internal hashmap so it's very cheap - not even hitting the cache.
  • For a short period of time, lookups are served from local cache (APC by default), that is very fast but not distributed.
  • For a longer period of time, lookups hit the global distributed cache that is the same for all webservers (Memcache by default).
Cassandra setup
The relational schema explained in the previous post was mainly converted to Cassandra's domain. As Cassandra is not good for numeric auto-increment indices common in relational databases, I switched to using named identifiers instead for both Cassandra and the earlier PDO implementation giving rise to a major rewrite - good thing I had covered the code 100% with unit-tests making the refactoring easier.

Just for reference, the new relational sharder schema is now as follows:

CREATE TABLE IF NOT EXISTS `alias` (
  `group` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `identifier` varchar(60) COLLATE utf8_unicode_ci NOT NULL,
  `key_id` int(10) unsigned NOT NULL,
  PRIMARY KEY (`group`,`identifier`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `group` (
  `name` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `setup_sql` text COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `key` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `shard` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=876 ;

CREATE TABLE IF NOT EXISTS `shard` (
  `name` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `group` varchar(20) COLLATE utf8_unicode_ci NOT NULL,
  `keys` int(10) unsigned NOT NULL,
  `capacity` int(10) NOT NULL,
  `properties` varchar(1024) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

For Cassandra, I managed to keep the schema rather similar where the primary keys became row keys and on using multiple fields for primary key, they are both present in Cassandra too (aggregate key).

Above is the Cassandra schema and as I don't know of proper standards of how to describe it in class-diagram'ish notation, I invented my own. As you can see a alias points to a key which in return points to the shard where the data of given key lives on and shards belong to groups - very simple really.

With [IdxGroup] behind the "group" field of "shard" column family, I mean that this is using a new feature of Cassandra called secondary index, that means that I do not have to create a "materialized view" of shards by group myself but Cassandra does it for me and allows me to do basic "where" queries so for example, I can fetch a list of shards by group name and exactly that is required by my implementation. Secondary indexes make using Cassandra for more general-purpose database much easier so it's a feature I'm very excited about.

The additional column family "counter" comes from the fact that I need something like auto-increment for key identifiers but Cassandra does not have it built in so I have my own counter that I can increment and get value of by name. I am actually lying a bit as latest version of Cassandra has counters, but this does not seem to be implemented in the PHP library phpcassa that I'm using. I will definitely come back to this as using a counter table this way is very bad as it's subject to race conditions where I may pick a counter value and before using and incrementing it, another thread takes it too.

Next steps
I actually managed to get this working with the Hiphop-PHP compiler too requiring some modifications to the underlying Thrift library so I'm pretty much ready to start testing the performance of such a setup. Hoping to get some cloud resources and set it through its paces with some JMeter tests and an actual cluster of webservers, Cassandra instances, shard databases and memcache and see whether and how well it scales.

2 comments:

  1. Thanks and Impressive Article. Our offshore Development Center in India that caters clients across the globe with its professional PHP development services. Our professional, experienced dedicated PHP development team ensures complete transparency and work in line with client’s staff. The team will coordinate and adhere with the client’s requirements and time schedules to ensure timely completion of the process.

    ReplyDelete
  2. Nice blog has been shared by you. before i read this blog i didn't have any knowledge about this but now i got some knowledge.
    so keep on sharing such kind of an interesting blogs. PHP Course in Delhi

    ReplyDelete