Custom Search

Tuesday, August 19, 2008

Bad Ideas in Database Partitioning (Episode #1)

Warning, the following is a technical post that may make your eyes glaze over...

OK, if you are still here, this is a short one about something I recently saw in an Oracle database.

First of all, a bit about partitioning. In some databases, you can partition a table in a variety of ways depending on why you want to partition your data. For example, you may want to partition it based on date ranges if you will be deleting older data or if you will generally be accessing data from the same time periods. This way, the database "knows" that it can safely ignore the other partitions outside of the date range you are looking for. This can result in the remaining data being searched much faster than if the whole table needed to be examined.

Another way you can partition a table in Oracle is called hash partitioning. In this type of partitioning, the field in question is "hashed" or calculated into a new value that may not uniquely identify that value. For example, a hash for state abbreviations could calculate the numeric value for each letter (a=1, b=2, etc) and add them together. MN would be 27 (13 +14) as would AZ (1+26). If the hash calculation was to use the last digit, the values for AZ and MN would be stored in the same partition (7) whereas the data for CO (3+15=18) would be in partition #8. If there were 10 partitions (0-9), this would be a good strategy.

Hopefully, that makes a little sense. The key is that the partitioning criteria should look at the data and why it is being partitioned and devise a plan accordingly.

The case I am calling Episode #1 (order of blog entry, not of the importance or frequency) deals with this little fact: if you have no values for a given partition, it will be empty and thus not beneficial to the database. If you had 51 partitions and planned to use one for the invoices for each state, but only did business with customers in a few states, the rest of the partitions would remain empty. This would be ok if you were on the verge of a huge advertising campaign and your company was expecting to grow across the country. But if you were a local service business with customers in only 1 state, this would be a waste.

In summary, when planning a partitioning strategy, make sure you will be using most (if not all) of the partitions you will be creating. If you have inherited your systems, take a look around to see if you have a bunch of empty partitions and if so, you can start discussions to better understand the data and how it is used so you can recommend a new partitioning strategy.


McM said...

I'm not sure whether to be delighted or appalled that not only did I read but I actually understood the whole thing! :)

Stephen said...

I'll say that I'm delighted that anyone read it and understood it :)