Background

Back in the 90s, when I was growing up, I did not have a mobile phone. I had a landline, and we had it extended so that one phone was corded and the other was cordless. For convenience, one could walk around the house using the cordless phone while talking. However, someone else could snoop in if they really wanted to by picking up the other phone.

The fascinating bit was that I had memorized the phone numbers of most of my mates! I have no idea how, maybe just by habit! Sometimes I couldn’t tell the number out loud but would remember how my fingers should move to reach them. It was that kind of muscle memory. Hard to explain, but if you know, you know. I rarely had to refer to a secondary source to check if the number was right. But there were times I did look at my personal phone book (yea, I’m that old), a pocket sized phone-book. This came in handy when I occasionally forgot a number or couldn’t remember it at all.

But the part that is relevant to this post is that every friend of mine had a unique number that I could dial to reach them. This is a fundamental concept of the internet too. Every website is served from a machine with an IP address - internet protocol address, something that looks like 192.168.234.134. IP addresses are what make it possible to uniquely identify the server behind a website.

But wait, when was the last time you used an IP address to access a website? Unless you are a security researcher or a cybersecurity professional, it is probably not something you do. How is it possible for us to reach the website without knowing its IP address? That’s where DNS comes in.

What is DNS?

Domain name system.

This is the internet’s equivalent of a phone book: it maps human friendly names to those funky internet protocol addresses, which makes our lives so much easier!

When you type a URL into your browser, it goes and speaks to the DNS call center (metaphorically), finds out the IP address for the hostname, and then sends your GET request to the destination server. All this usually happens within milliseconds, so quickly that you never notice the lookup happened!
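To make that lookup step concrete, here is a minimal Python sketch that asks the operating system's resolver for the IPv4 addresses behind a hostname, using the standard library's socket.getaddrinfo. The function name resolve_ipv4 is my own invention, and the result for any real domain depends on your network, so no output is shown.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Ask the OS resolver for the IPv4 addresses behind a hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for IPv4 the sockaddr is an (address, port) pair.
    addresses = []
    for info in infos:
        address = info[4][0]
        if address not in addresses:
            addresses.append(address)
    return addresses
```

Calling resolve_ipv4("softwarecraftsperson.com") should hand back a list of address strings. The browser then opens its connection to one of those addresses rather than to the name.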

So there is one DNS server that stores this entire directory of websites?

Nope.

DNS is a distributed system made up of many name servers. The DNS servers that respond to queries are called name servers.

How does a DNS database do this?

Introducing resource records

A DNS entry - or resource record, as it is officially known - is a record made up of a few key pieces of information.

  • type
  • name
  • value

There are various types of resource records. Thank you captain obvious, I hear you say. You might have guessed this already from the type part of the DNS record.

What types?

A

Yes, A is a type of resource record. If you have ever registered a domain name with a domain registrar, you have probably dealt with these. The A type provides the hostname to IPv4 address mapping. So what does an A type record look like?

A, shop.eakan.dev, 100.160.189.23

That represents the type, name and value as comma separated values.
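As a rough illustration of that three-part shape, the sketch below parses the comma separated line above into a small Python object. The ResourceRecord class and parse_record function are names I made up for this post, and real DNS records carry more fields (a TTL, for example) than the three covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRecord:
    type: str
    name: str
    value: str


def parse_record(line: str) -> ResourceRecord:
    """Parse a 'type, name, value' line into a ResourceRecord."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma separated fields, got {len(parts)}")
    record_type, name, value = parts
    return ResourceRecord(record_type, name, value)


record = parse_record("A, shop.eakan.dev, 100.160.189.23")
print(record.type, record.name, record.value)
```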

NS

This provides the hostname of the authoritative name server for a domain name. All this means is that the NS record points to the hostname of a name server that holds the records for that domain. Example:

NS, eakan.dev, dns.eakan.dev

CNAME

This type provides a mapping from an alias to the canonical hostname. Example:

CNAME, eakan.dev, s1.master.eakan.dev
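Here is a small sketch of what following an alias looks like in practice: keep swapping the name for its canonical hostname until an A record turns up. The CNAME entry is the example above, while the A record and its address (taken from a range reserved for documentation) are assumptions added purely so the lookup has somewhere to end.

```python
# Toy record table keyed by (type, name). The CNAME entry is the example
# from this post; the A entry and its address are made up for illustration.
RECORDS = {
    ("CNAME", "eakan.dev"): "s1.master.eakan.dev",
    ("A", "s1.master.eakan.dev"): "203.0.113.10",
}


def resolve_address(name: str, max_hops: int = 8) -> str:
    """Follow CNAME aliases until an A record is found."""
    for _ in range(max_hops):
        address = RECORDS.get(("A", name))
        if address is not None:
            return address
        canonical = RECORDS.get(("CNAME", name))
        if canonical is None:
            raise LookupError(f"no A or CNAME record for {name}")
        name = canonical  # keep going with the canonical hostname
    raise LookupError("too many CNAME hops, possible alias loop")


print(resolve_address("eakan.dev"))  # 203.0.113.10
```

The max_hops limit is only there so that two aliases pointing at each other cannot keep the loop going forever.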

MX

This is the type of record associated with a mail server - it maps a domain name (the alias) to the canonical name of the mail server that handles its email! Example:

MX, mail.eakan.dev, mailserver9.google.eu

Types of DNS servers

Recursive Servers or resolvers

The helpers that answer queries on behalf of users.

When you ask your browser to load a URL, the browser cache and the hosts file are checked first. If neither has the answer, the query is sent to a recursive resolver, which is often run by your ISP. From there on, the resolver searches for the correct IP address for the hostname by querying other DNS servers until it finds the answer.

Root level name servers

These servers receive requests from resolvers/recursive servers and keep track of the name servers for each top level domain, like .com, .dev, .io and so on.

To give you an example, when you request softwarecraftsperson.com, the root level name servers will return a list of top level domain name servers that are responsible for the .com domain.

Top Level domain name servers

These servers hold the records of the authoritative name servers for the domains under their top level domain. Do you see where this is going? Recursive DNS server => root level name servers => top level domain name servers, which then point to the next category.

Authoritative name servers

They are, officially, the authority for a specific domain. These could be run by your hosting company, for example, which holds the records for your personal domain. They provide the final answer about the IP address of a domain.

E.g. resolving eakan.dev goes through a resolver => root level => TLD (.dev) => the authoritative name server for eakan.dev, which is the last and final stop.
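The order of that walk falls straight out of the domain name when you read it from right to left. The sketch below is my own simplification: the delegation_chain function is a made-up helper, and it assumes every label is its own step in the hierarchy, which is not always true in real DNS.

```python
def delegation_chain(domain: str) -> list[str]:
    """List the zones a resolver walks through, from the root down."""
    labels = domain.strip(".").split(".")
    chain = ["."]  # the root zone
    # Build up the name from the right: "dev", then "eakan.dev", ...
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]))
    return chain


print(" => ".join(delegation_chain("eakan.dev")))  # . => dev => eakan.dev
```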

Caching and forwarding DNS servers

Caching servers exist to speed up DNS resolution. They cache results for a limited time based on each record’s time-to-live value. These could be at the ISP or at the organisation where you work.

Forwarding servers are just helpers: instead of walking the hierarchy themselves, they pass queries on to a recursive server and relay the answer back.
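To show what caching "based on the time-to-live value" could look like, here is a minimal sketch of a cache that forgets an answer once its TTL has passed. The TtlCache class is hypothetical, and the clock is passed in so the expiry can be demonstrated without actually waiting; the 300 second TTL is just an example value.

```python
import time


class TtlCache:
    """Cache DNS answers until their time-to-live runs out."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so expiry can be shown without sleeping
        self._entries = {}   # name -> (value, expiry time)

    def put(self, name: str, value: str, ttl_seconds: float) -> None:
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name: str):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: force a fresh lookup
            return None
        return value


now = [0.0]  # fake clock, in seconds
cache = TtlCache(clock=lambda: now[0])
cache.put("shop.eakan.dev", "100.160.189.23", ttl_seconds=300)
print(cache.get("shop.eakan.dev"))  # 100.160.189.23
now[0] = 301.0
print(cache.get("shop.eakan.dev"))  # None
```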

DNS query resolution techniques

A DNS query can be resolved in two ways. This is just good to know information. I have never heard anyone ask this in a system design interview! But I have limited experience attending system design interviews.

Iterative resolution

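You make a request to a local server, which then queries the root, TLD and authoritative servers in turn for the IP address. Each server replies directly to the local server, either with the answer or with a pointer to the next server to ask.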
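Here is a toy simulation of the iterative style, where the local server does all the legwork itself: it asks one server, gets pointed to the next one, and asks again. The server names in the table are invented for illustration, and the only real detail is the shop.eakan.dev address from the A record example earlier.

```python
# Toy servers: each maps a name suffix to either a referral or an answer.
# All server names here are made up for illustration.
SERVERS = {
    "root": {"dev": ("referral", "tld-dev")},
    "tld-dev": {"eakan.dev": ("referral", "auth-eakan")},
    "auth-eakan": {"shop.eakan.dev": ("answer", "100.160.189.23")},
}


def ask(server: str, name: str):
    """Return the (kind, value) entry whose key is a suffix of name."""
    for suffix, reply in SERVERS[server].items():
        if name == suffix or name.endswith("." + suffix):
            return reply
    return ("nxdomain", None)


def resolve_iteratively(name: str):
    """The local server contacts each server itself, following referrals."""
    server, path = "root", []
    while True:
        path.append(server)
        kind, value = ask(server, name)
        if kind == "answer":
            return value, path
        if kind == "nxdomain":
            return None, path
        server = value  # referral: ask the next server ourselves


print(resolve_iteratively("shop.eakan.dev"))
# ('100.160.189.23', ['root', 'tld-dev', 'auth-eakan'])
```

The path in the result shows that every question started from the same place, the local server, which is what makes this iterative.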

Recursive resolution

As the name suggests, the request goes recursively. You request an address, it goes to the local server, which makes a request to the root name server, which then makes a request to the top level domain server, which goes further and requests from the authoritative name servers. This continues until the record is either found or not found, and the response then comes all the way back along the same path.
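The "comes all the way back" part is easier to see in a trace. In this toy sketch each server asks the next one on the caller's behalf and only replies once it has heard back. The server names and the fixed chain between them are assumptions made for illustration; the record is the A record example from earlier in this post.

```python
# Toy hierarchy: each server either knows the answer or knows who to ask next.
# Server names are made up; the record is the A record example from this post.
NEXT_SERVER = {"local": "root", "root": "tld-dev", "tld-dev": "auth-eakan"}
ANSWERS = {"auth-eakan": {"shop.eakan.dev": "100.160.189.23"}}


def resolve_recursively(server: str, name: str, trace: list[str]):
    """Each server asks the next one on the caller's behalf."""
    trace.append(f"ask {server}")
    if server in ANSWERS:
        result = ANSWERS[server].get(name)  # None when the record is not found
    else:
        result = resolve_recursively(NEXT_SERVER[server], name, trace)
    trace.append(f"reply from {server}")
    return result


trace = []
print(resolve_recursively("local", "shop.eakan.dev", trace))  # 100.160.189.23
print(trace)
```

The trace lists four "ask" entries going down, followed by four "reply from" entries unwinding in the opposite order.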

How is DNS so fast?

DNS employs caching at several layers to reduce latency for the user. Popular domains are cached in multiple local domain name servers, making the request a lot quicker to respond to.

Let’s take a look at the caches in play when you type in a URL in your browser

  1. Check the browser cache for the hostname. If it is a hit, retrieve the site. If it is a miss, go to step 2.
  2. Check the OS cache for the DNS record. If it is a hit, update the browser cache and serve the website. If it is a miss, go to step 3.
  3. Check the local DNS resolver. If it is a hit, update the OS cache and the browser cache and serve the website. If it is a miss, go to step 4.
  4. Check the ISP cache. If it is a hit, update all the previous caches and serve the website. If it is a miss, go to step 5.
  5. Check the further DNS infrastructure on the internet. If it is a hit, respond, update all the caches so far and serve the website. If it is a miss, the website is probably not found!
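The five steps above can be sketched as one loop over a list of caches, with the rule that a hit at any layer fills in every layer checked before it. This is a deliberately simplified model: plain dictionaries stand in for the caches, the lookup function and variable names are my own, and TTLs are ignored.

```python
def lookup(name: str, layers: list[dict], authoritative: dict):
    """Check each cache in order; on a hit, fill the caches checked before it."""
    for depth, cache in enumerate(layers):
        if name in cache:
            address = cache[name]
            break
    else:
        # Every cache missed: fall through to the wider DNS infrastructure.
        depth = len(layers)
        address = authoritative.get(name)
        if address is None:
            return None  # probably not found
    for cache in layers[:depth]:
        cache[name] = address
    return address


browser, os_cache, local_resolver, isp = {}, {}, {}, {}
isp["shop.eakan.dev"] = "100.160.189.23"
layers = [browser, os_cache, local_resolver, isp]
print(lookup("shop.eakan.dev", layers, authoritative={}))  # 100.160.189.23
print("shop.eakan.dev" in browser)  # True
```

After the first lookup the browser cache already holds the answer, so a second lookup for the same name would stop at step 1.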

What did we learn from DNS?

It is the distributed system that makes the internet possible. DNS is reliable because:

There is no single point of failure. There are multiple DNS servers and caches, so if one of them is down for an upgrade or security patching, others continue to serve requests and the internet moves on. This makes it highly available!

DNS ensures reliability by employing several layers of caching as explained earlier and because DNS records are replicated across multiple servers, there are many servers that can provide us with the right address for a URL.

Another factor, this time contributing to performance rather than reliability, is that although most of the internet works on TCP, DNS lookups mostly happen over the less reliable UDP. UDP is much faster, which is partly the reason why DNS is so performant. There are exceptions to this though.
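Part of why UDP is a good fit is that a DNS question is tiny. The sketch below builds the raw bytes of a minimal query for an A record using the standard DNS message layout (a 12 byte header followed by the question) and prints its size. It only builds the message and does not send it; the build_query name and the query id value are arbitrary choices of mine.

```python
import struct


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query for the A record of name."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.strip(".").split(".")
    ) + b"\x00"
    # Question type 1 (A) and class 1 (IN, internet).
    question = qname + struct.pack("!HH", 1, 1)
    return header + question


packet = build_query("eakan.dev")
print(len(packet))  # 27
```

At 27 bytes the whole question fits comfortably inside a single UDP datagram, so there is no connection to set up before asking.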

DNS works on the principle of eventual consistency. DNS record replication happens gradually, over a relatively short but still noticeable period of time. The time it takes to propagate DNS record information depends on the infrastructure involved and the size of the updates - how many records to update and the like.

Because DNS employs caching at multiple layers, consistency could be compromised as a cached entry could be out of date and might need a forced update. Thus cached records always have a Time to Live (TTL) to ensure that the records expire and will be forced to update.

Some questions after all that DNS talk

How does the computer know the address of the DNS server or the resolver?

Apparently your OS keeps a list of DNS resolvers’ IP addresses, and those resolvers in turn obtain all the information on its behalf. The list usually comes from the Dynamic Host Configuration Protocol, which responds with all the necessary settings, including the IP address of the local DNS server, when you connect to a network.
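On many Linux and other Unix-like systems, the resolver addresses end up in a text file, commonly /etc/resolv.conf, as lines starting with the word nameserver. Assuming that format, here is a small sketch that pulls the addresses out of such text. The parse_nameservers function is my own helper, the sample content is invented rather than copied from a real machine, and other operating systems store this setting differently.

```python
def parse_nameservers(resolv_conf_text: str) -> list[str]:
    """Extract resolver addresses from resolv.conf style text."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


sample = """\
# written by a DHCP client (sample content, not from a real machine)
nameserver 192.168.1.1
search home
"""
print(parse_nameservers(sample))  # ['192.168.1.1']
```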