Inheritance in relational databases

Everybody loves Object-Oriented Programming, right? Well, maybe not necessarily, but it's a good solid proven way of writing code — especialy for large-scale production — and it's something that your technically challenged boss will feel comfortable with. Everybody also loves relational databases. SQL is great and used everywhere so, chances are, you'll have to use it too. And you'll be using it for giving your objects some persitance.

What, of course, is odd here, is that the two things don't line up well. Relational databases are not object oriented and so you're going to have to do some mapping between the two; especially when your using that lovely object-oriented technique: inheritance.

Relational databases don't do inheritance. If you have a table storing instance data for one class, and another table storing the instance data for the child class, then you're gonna have to do something to make the whole thing hang together without breaking polymorphism. There are three basic methods to get around this problem:

Class Table Inheritance

In class table inheritance, each class has its own table which stores the data for that class. If it is a subclass then it only stores the extra properties; inherited properties are stored in the superclass' own table.

This is the method that the Everything Engine itself uses. The data in this very node you are reading is an object of class writeup (well, in everything-speak it's actually a node of nodetype writeup but the principle is the same). Some of its data is stored in a database table called 'writeup'. It inherits some of its properties (including the doctext property which holds the text of this node) from the document class with data stored in a table called 'document'. In turn, this inherits more properties (title, author and so on) from the node superclass. The nodetype is also stored here.

This is the nicest way to solve the problem from a clean design poin of view. It's also the obvious way to do it; it's pretty much a copy of how things are done in OOP. However it does come with penalties. When an object is instantiated with data from the database, the system must look up the class and then (in SQL or in code) join the properties together. This is more expensive than a simple select and is one of the reasons that e2 can be a bit slow — although the main one is that each page made is made up of so many separate nodes.

But the real pain comes when creating new objects, or updating existing ones. The system has to trawl back through the databsse to find the correct table in which to make the change before it can be done. You can end up with some really confusing code whilst going down this route.

Single Table Inheritance

You may come across this method if you ever use Ruby on Rails (and I suggest you do). It's the simplest way of doing things and involves a single table (hence the name) for a whole inheritance structure. In other words, the table is defined for the superclass and includes all the extra properties for the subclasses, leaving empty fields where applicable.

The rationale for this case is simply that it minimises join operations; there won't be any. All the data is in once place and can be quickly found and changed without any confusion as to which table to use. And it is very fast. It's also very simple to implement, especially when compared to Class Table Inheritance.

The downside are quite obvious here, though. Imagine if the everything engine used single table inheritance. Since all nodes inherit from node, then all the data would be stored in that table. It wouldn't have any effect on the number of rows (since all nodes have an entry here) but we would end up with huge amounts of columns too. To make matters worse, most of the columns would be empty since they would represent properties of a completely different class.

Concrete (or Leaf) Table Inheritance

This solution falls somewhere between the other two. All the properties of an object are stored in a table for that class; including inherited properties. This means that one class table includes columns for it's particular type and its own columns representing inherited properties.

It has the same advantage as Single Table inheritance; since all the data's in one place. However if you're listing objects of a type you may find that they're scattered all over the shop. You'll also have to have a separate table listing all the objects and what class they are - since you can't find out their properties until you know that.

This technique also plays havoc with Polymorphism unless you have some pretty clever code in there.