I am currently working on a database design problem. I work in the financial domain and I deal with data related to benchmarks and securities. The problem with such databases are that each security has multiple identifiers in the real world. They could be Uniquely identified by a SEDOL, CUSIP or ISIN or the many other type of identification standards available.

This lead me to the question of designing the tables using a surrogate keys but I was still not sure how my team would react as they had already design loads based on real world identifiers and named it security_id. Unfortunately security_id could be a SEDOL, a CUSIP or ISIN or whatever depending on what region or benchmark the security belongs to. That isn’t really a reliable identifier. And hence to avoid annoying the guy in my team who does data loading I redesigned my queries to do some intelligent identifier resolutions. This cost us a lot. Queries that would perform in barely 500ms now take about 1500ms.

So I had explain the advantages of a surrogate key. I couldn’t do that alone as I am a younger resource, older people needn’t take my suggestions seriously. So I kept relying on my complex queries till one day we had a review with some experts in Vertica. And as soon as they saw the query they asked us why we didn’t have a unique identifier. That clearly solved my problem at least. I didn’t have to convince anyone anymore. The experts made my team aware of the advantages of having a surrogate key. And how that could benefit query performance.

So what is this thing called surrogate keys?

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data. That is taken from (Wikipedia - Surrogate key).

Now I’d like to quote something I read in stack overflow, which I felt is a very good answer that sums it up for me. Using a natural key is rarely the best option. I would probably go for the surrogate key approach as in your first example. The following are the main disadvantages of the natural key approach:

  • You might have an incorrect type name, or you may simply want to rename the type. To edit it, you would have to update all the tables that would be using it as a foreign key.
  • An index on an int field will be much more compact than one on a varchar field.
  • In some cases, it might be difficult to have a unique natural key, and this is necessary since it will be used as a primary key. This might not apply in your case.

Source: (StackOverflow - Surrogate Keys v/s Natural Keys)