Saturday, April 2, 2011

Why consider non-relational databases?

Why would you want to be using non-relational databases such as Cassandra in the first place when we have great relational databases like MySQL that have proven to work great for decades, there is lots of know-how available and learning to be productive at it is rather easy. Relational databases provide us with great tools like data normalization, consistency, transactions, complicated joins and queries.

The question to "what's wrong with relational databases" is really "nothing". They work great for many applications and are simple to use. The problem with relational databases arise when we begin to need more performance out of our database than normal RDBMS deployments can handle.

Modern web applications often connect millions of people in an interactive way with lots of communication and data stored and exchanged at any time and unfortunately, there is no simple way to scale the performance of relational databases horizontally by simply adding more nodes.

When an application is becoming popular, the performance issues often arise. This is first mitigated by throwing beefier machines at it, which can postpone the problem for some time until there are no more affordable powerful enough machines. Then often a cache layer is introduced into the application if not already there, this can reduce the read load significantly but will make the application much more complex.

Read load can be further reduced by creating read-only replicas of the main database, so all the reads go to any of the replicas and the main database is only used for writing. This complicates things further and brings problems like what happens if you update something and redirect user to a page that shows the updated content. On this page, you will read from the replica and perhaps store the new value to a cache that you cleared when updating the record. Replicas lag slightly behind the main and what happens if you read from it before it has had time to propagate. This can happen often as the redirect is fast. Now you read old data and show it to the user that can get confused why the update had no effect. So instead of clearing some caches on data updating, deleting, you have to update them that further complicates everything. Notice that we have so far only been able to improve read performance, all the writes still happen on the main database.

Now you've implemented caching and read replicas and optimized all the queries but your site is becoming ever more popular so it's still not enough. You might now partition the database by functionality, for example by moving the search, comments or any other loosely coupled part to some other machine but you can do this only that many times. I've been down this exact road in my own experience and it's not a pretty ride.

At this point, you need a solution that really scales horizontally, meaning you can easily add or remove some machines handling all the data as your needs change. There are a couple of ways to do this, most popular being sharding your database so different rows of the same tables live of different machines or by using a relational database. Actually sharding is very powerful and when designing a truly scalable architecture, I would try to mix both sharding and non-relational databases to get the best out of both worlds.

3 comments:

  1. Truly noteworthy article imparted by you. This article genuinely contains information which was gainful for me and I am sure that this might benefit many programmers as well. Thank you for the share.
    Website Design Agency | Website design company

    ReplyDelete
  2. Very Nice article. Thanks for sharing content and such nice information for me. I hope you will share some more content about. relational databases Please keeps sharing!

    altsols

    ReplyDelete
  3. Great information...I am happy to find this post Very Helpful for me, as it contains lot of information.

    Mobile App Ideas
    Web Development
    Dating App Development Company

    ReplyDelete