Skip to main content

Shard – A Database Design

Scaling Database is one of the most common and important issue that every business confronts in order to accommodate the growing business and thus caused exponential data storage and availability demand. There two principle approaches to accomplish database scaling; v.i.z. vertical and horizontal.

Regardless of which ever scaling strategy one decides to follow, we usual land-up buying ever bigger, faster, and more expensive machines; to either move the database on them for vertical scale-up or cluster them together to scale horizontally.

While this arrangement is great if one has ample financial support, it doesn't work so well for the bank accounts of some of our heroic system builders who need to scale well past what they can afford.

In this write-up, I intend to explain a revolutionary and fairly new database architecture; termed as Sharding, that some websites like Friendster and Flickr have been using since quite sometime now. The concept defines an affordable approach to horizontal scaling with no compromise at all.

For instance Flickr handles more than 1 billion transactions per day, responding in less then a few seconds and can scale linearly at a low cost.


What is sharding...?

While working for an auction website, somebody got the idea to solve the site’s scaling problems by creating a database server for a group of users and running those servers on cheap Linux boxes. In this scheme the data for User A is stored on one server and the data for User B is stored on another server. It's a federated model. Groups of 500K users are stored together in what are called shards.

The advantages are:

  • High availability. If one box goes down the others still operate.
  • Faster queries. Smaller amounts of data in each user group mean faster querying.
  • More write bandwidth. With no master database serializing writes you can write in parallel which increases your write throughput. Writing is major bottleneck for many websites.
  • You can do more work. A parallel backend means you can do more work simultaneously. You can handle higher user loads, especially when writing data, because there are parallel paths through your system. You can load balance web servers, which access shards over different network paths, which are processed by separate CPUs, which use separate caches of RAM and separate disk IO paths to process work. Very few bottlenecks limit your work.

How is Sharding different from traditional architectures...?

Sharding is different than traditional database architecture in several important ways; following are the key factors -

Data is denormalized. Traditionally we normalize data. Data are splayed out into anomaly-less tables and then joined back together again when they need to be used. In sharding the data are denormalized. You store together data that are used together.

This doesn't mean you don't also segregate data by type. You can keep a user's profile data separate from their comments, blogs, email, media, etc, but the user profile data would be stored and retrieved as a whole. This is a very fast approach. You just get a blob and store a blob. No joins are needed and it can be written with one disk write.

Data is across many physical instances. Historically database servers are scaled up. You buy bigger machines to get more power. With sharding the data are parallelized and you scale by scaling out. Using this approach you can get massively more work done because it can be done in parallel.

Data is small. The larger a set of data a server handles the harder it is to cash intelligently because you have such a wide diversity of data being accessed. You need huge gobs of RAM that may not even be enough to cache the data when you need it. By isolating data into smaller shards the data you are accessing is more likely to stay in cache.

Smaller sets of data are also easier to backup, restore, and manage.

Data are more highly available. Since the shards are independent a failure in one doesn't cause a failure in another. And if you make each shard operate at 50% capacity it's much easier to upgrade a shard in place. Keeping multiple data copies within a shard also helps with redundancy and making the data more parallelized so more work can be done on the data. You can also setup a shard to have a master-slave or dual master relationship within the shard to avoid a single point of failure within the shard. If one server goes down the other can take over.

It doesn't use replication. Replicating data from a master server to slave servers is a traditional approach to scaling. Data is written to a master server and then replicated to one or more slave servers. At that point read operations can be handled by the slaves, but all writes happen on the master.

Obviously the master becomes the write bottleneck and a single point of failure; and as the load increases the cost of replication increases. Replication costs in CPU, network bandwidth, and disk IO. The slaves fall behind and have stale data. The folks at YouTube had a big problem with replication overhead as they scaled.

Sharding cleanly and elegantly solves the problems with replication.

The most recommended approach to implement database shards is using the Hibernate Shards framework. The said framework offers critical data clustering and support for horizontal partitioning along with standard Hibernate services. Which enable the businesses to keep data in more than one relational database without any add-on complexity whilst building the applications.

Other than Hibernate; shards can also be implemented with any of the following toolkits –

Well, thats all for the starters folks. Hope this was an useful read and has provided enough thoughts for your brains to work quite sometime now...

Comments

Anonymous said…
I don't understand how you can state "no down sides".

Here's a couple:
* requirement to do distributed transactions/2PC if you want to co-ordinate changes across shards

* singificantly increased complexity in reporting across the business

* complexity around management of common data that you want to reference by foreign keys from shard-specific data - eg a 'member' record referencing a 'member group'.

These are issues that don't exist if you use a single database.

I'd be interested to hear how folks using this approach tackle these issues.
Steve said…
If he's talking about Ebay, then they don't do distributed transactions.

If you care about performance, then you don't want to EVER do distributed transactions as they depend upon pessimistic locking, and that doesn't scale.

Now you ask, "What about data consistency?". In Ebay's case, they don't care about short-term inconsistencies, but their architecture will eventually be consistent. See here for more on that: http://www.infoq.com/news/2008/06/interview-shoup-ebay
Rahul Roy said…
Find your answers herein -

What is the requirement to do distributed transactions/2PC if you want to co-ordinate changes across shards...?

Should one want to perform distributed transactions over shards, then all one needs to ensure is that the target databases are XA transaction compliant.

Most of the leading databases of today are XA compliant anyways.

Would it introduce a singificant increased complexity in reporting across the business?

Well, it all depends upon how do you architect your data access strategy. Think a little radical on the lines of restful web services ;)

Will complexity around management of common data that one wants to reference by foreign keys from shard-specific data increase?

Definately, thats a valid point but you must remember that shard is not yet a very matured technology yet. Until the point in time when Cross-Shard referencing is possible; one needs to design the data tier a little differently whilst implementing shards.
Anonymous said…
This comment has been removed by a blog administrator.
kucink said…
i just want to ask : whats is differences between shard design and nosql database like mongodb?
Anonymous said…
Hi Rahul,

we came across your blog post on XRX. We are a Delhi based startup working on XRX. We wish to connect with you. Please send me your contact info.

anshuman
anshuda@yahoo.com
Anonymous said…
Can u shard with mysql?
Rahul Roy said…
Ofcourse, sharding is a database architecture which is not vendor specific.

Popular posts from this blog

FAINT - Search for faces

Lately, I've been playing around a bit with facial pattern recognition algorithms and their open source implementations. I came across many reference implementation but a very few were implemented in Java, and the Eigenfaces algorithm by far happens to be the best amongst them all. During my research around the said topic i happened to stumble-upon an implementation called FAINT (The Face Annotation Interface - http://faint.sourceforge.net). Faint by far the best facial pattern recognition API and as you must have already guessed, it implements the Eigenfaces algorithm. Now enough of theory talks, how about implementing an example with faint...? Here is one for all you face-recognition enthusiasts. The following example simply searches for faces in a given photograph and thumbnails them. Now, I know thats not face recognition; but be a little creative here. Once you have the facial thumbnails extracted, its never a big deal to look further in the Faint API and find methods which ca...

Is Java String really immutable...?

In many texts String is cited as the ultimate benchmark of Java's various immutable classes. Well, I'm sure you'd have to think the other way once you have read this article. To start with, let's get back to the books and read the definition of immutability. The Wikipedia defines it as follows - 'In object-oriented and functional programming, an immutable object is an object whose state cannot be modified after it is created.' I personally find this definition good as it mentions that an immutable instance's state should not be allowed to be modified after it's construction. Now keeping this in the back of our minds, let's decompile Java's standard String implementation and peep into the hashCode() method - public int hashCode() { int h = hash; if (h == 0) { int off = offset; char val[] = value; int len = count; for (int i = 0; i h = 31*h + val[off++]; } hash = h; } return h; } A detailed ...