Most good organisations expect their engineering hires to have gone through a system design interview. To the vast majority of people this might sound like a weird thing to do, as most people don’t individually design large scale systems. Being asked to design a highly scalable and available system in less than 60 minutes is therefore a daunting task.
We must also consider the fact that not everyone gets to work in organisations that build large scale distributed systems. My time at FactSet was when I dealt with extremely large volumes of data. We focussed on bringing the performance of database queries to under 5 ms to ensure that, even with latency, users would be able to see their graphs plotted in a second or so. Most applications wouldn’t really care about performance at that level because for their line of business it probably doesn’t matter that much.
Those were exciting challenges, and they rarely involved writing thousands of lines of code. Instead, they were experiments with different strategies for sharding data, based on how the data was fetched. It did take time and felt like a massive task at first, but as we progressed and realised the speed gain, we were more than happy with our efforts.
However, data at scale is not the only thing that we have to be concerned about. Large scale distributed systems are complex. Nobody designs an application as a distributed system to begin with. It evolves to become one as it scales, based on usage over time. This is why system design is an interesting topic.
What do system design interviewers expect?
System design interviews are not trying to test your experience with system design. No one person has worked on all the scalable distributed systems in the world. Most people work on one part of a very large system. The interviewer wants to understand how you apply fundamental distributed systems knowledge to think through large, ambiguous problems and how much you know about the pitfalls of designing such systems.
There are no right or perfect answers in a system design interview. It is almost a 60-minute discovery of the way you think about the system, followed by probing of the choices you make during the interview. Thus it is mostly a conversation to understand:
- how you think about large scale systems
- whether you are aware of the basics of large scale systems
- whether you are eager to apply what you know
- whether you can talk through different technology choices and their trade-offs
- whether you try to understand the problem by asking clarifying questions at every stage of the interview.
Focusing on these areas shows the interviewer that you can deal with a customer’s request and avoid making the wrong assumptions, understand constraints of the problem better and make better choices based on the needs of the customer.
How to prepare for a system design interview?
If you haven’t worked on large scale systems, then this is often the question that makes everyone scared about attending a system design interview.
Ideally, for a system design interview, you should revise the distributed systems module from your university course, if you majored in computer science. If not, you can buy a book or learn the material online these days.
The next thing you should do is go through the architecture of various large scale web applications that exist today. The popular ones almost always have engineering blogs that write about their approaches to solving scalability, availability and reliability problems.
Once you have covered these, you will be able to design a distributed system, or at least talk about the various approaches you would take to solve different parts of the problem.
Starting with some fundamentals of distributed systems
What are the basics or fundamentals when it comes to distributed systems? When you think about it, you might be like, “where do I start?”. This is normal. We live in a world where 15 second videos of random topics pop up in our social media feed and we spend minutes to hours scrolling through them. Structured learning has almost been forgotten.
I have in the past written about software architecture and it covers some of the basics that you need to be aware of when preparing for system design interviews.
For completeness, let me mention some of those concepts.
Data durability and consistency
What does that mean?
Different storage solutions have different capabilities. They differ in how resilient they are to data corruption when failures occur. Similarly, data in a distributed system can become fragmented and unordered, so keeping it all consistent is a challenge.
The way the system replicates data helps the system recover in case of disasters and also provides backups. Replication does raise other concerns, but the benefits of having a replica almost always outweigh the downsides.
Sharding or partitioning is the method of dividing data across different nodes in your system based on the value of a certain field. The choice of which field to shard data by depends on the purpose of the application. Clever sharding enables parallel writes and reads and can improve I/O performance significantly.
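To make the idea concrete, here is a minimal Python sketch of hash-based sharding, which is only one of several possible strategies. The function names, the `ticker` field and the choice of four shards are all made up for this illustration and are not taken from any particular database.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string changes between Python processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records, shard_field, num_shards):
    """Group records by the shard that their shard_field value maps to."""
    shards = {i: [] for i in range(num_shards)}
    for record in records:
        index = shard_for(str(record[shard_field]), num_shards)
        shards[index].append(record)
    return shards


# Hypothetical records, sharded by the field we expect to query on.
records = [
    {"ticker": "AAA", "price": 10.0},
    {"ticker": "BBB", "price": 20.5},
    {"ticker": "AAA", "price": 10.2},
]
for index, rows in partition(records, "ticker", 4).items():
    print(index, rows)
```

Every record with the same `ticker` lands on the same shard, so a query for one ticker only has to touch one node. A plain modulo like this reshuffles most keys when the number of shards changes, which is why real systems often prefer schemes such as consistent hashing.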
In distributed database systems, there may be a master node and several hundred worker nodes writing data. In others there may not be a single master but many. To ensure order and consistency, the nodes have to agree on what was written and which value is the latest. The system also has to make a trade-off: give the user the latest value, or the one that was consistently checked and verified a certain duration ago.
Data systems have to ensure that related data is written in one go, i.e. if a part of the data could not be written, then none of it must be written. In a distributed system this all-or-nothing decision may have to be coordinated across several nodes, and there are different ways of doing that.
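One common way to reason about this trade-off is with read and write quorums: with N replicas, if every write reaches W of them and every read asks R of them, then R + W > N guarantees that a read overlaps with the latest write. The Python sketch below is a toy, in-memory simulation of that idea. The `Replica` class and both functions are hypothetical, and the "first w" and "last r" selection is just a way of forcing the worst case.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    # key -> (version, value)
    store: dict = field(default_factory=dict)

    def write(self, key, version, value):
        current = self.store.get(key)
        # Only accept a write that is newer than what we already hold.
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        return self.store.get(key)


def quorum_write(replicas, key, version, value, w):
    """Write to the first w replicas only; the rest are assumed to lag."""
    for replica in replicas[:w]:
        replica.write(key, version, value)


def quorum_read(replicas, key, r):
    """Ask the last r replicas and return the highest-versioned answer."""
    answers = [replica.read(key) for replica in replicas[-r:]]
    answers = [a for a in answers if a is not None]
    return max(answers, key=lambda a: a[0]) if answers else None


replicas = [Replica() for _ in range(3)]           # N = 3
quorum_write(replicas, "balance", 1, 100, w=2)     # W = 2
quorum_write(replicas, "balance", 2, 250, w=2)

print(quorum_read(replicas, "balance", r=2))  # R + W > N: sees version 2
print(quorum_read(replicas, "balance", r=1))  # R + W = N: may miss the write
```

With R = 2 the read always touches at least one replica that took the write. With R = 1 it can land on the lagging replica, which is the "faster but possibly stale" side of the trade-off.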
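Two-phase commit is one well-known way of making an all-or-nothing write across several nodes. The following Python sketch shows only the happy path and a simple refusal. It ignores timeouts, crashes and coordinator failure, and the `Participant` class and `two_phase_commit` function are hypothetical names invented for this example.

```python
class Participant:
    """A node that can stage a write and later commit or discard it."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that would fail
        self.staged = None
        self.committed = []

    def prepare(self, data):
        # Phase 1: vote. Stage the data only if we can promise to commit.
        if not self.can_commit:
            return False
        self.staged = data
        return True

    def commit(self):
        # Phase 2: make the staged data permanent.
        self.committed.append(self.staged)
        self.staged = None

    def rollback(self):
        self.staged = None


def two_phase_commit(participants, writes):
    """writes maps participant name -> data. Returns True if all committed."""
    prepared = []
    for participant in participants:
        if participant.prepare(writes[participant.name]):
            prepared.append(participant)
        else:
            # One "no" vote aborts the whole transaction.
            for p in prepared:
                p.rollback()
            return False
    for participant in participants:
        participant.commit()
    return True


orders = Participant("orders")
payments = Participant("payments", can_commit=False)
ok = two_phase_commit(
    [orders, payments],
    {"orders": {"id": 1}, "payments": {"amount": 42}},
)
print(ok, orders.committed, payments.committed)
```

Because `payments` refuses to prepare, nothing is written to `orders` either, which is the all-or-nothing behaviour described above.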
Common large scale web application architecture
In distributed systems, information is processed at different levels, sometimes like a pipeline - from one to another to the next and so on until finally data is written or archived. The number of layers in a system determines the n-tier nature of the system.
HTTP, REST and GraphQL
HTTP is the protocol that is the backbone of the web. Every website and every web service is served using it. Knowing the difference between REST and HTTP, which people sometimes use interchangeably, is important. REST is a set of design principles for applications interacting over HTTP, whereas GraphQL is a query language for APIs that enables more efficient, flexible and powerful ways of working with data from APIs. Knowing when to use REST vs GraphQL is the sign of a good web developer!
DNS and Load balancing
When it comes to scaling your application, the first thing that springs to mind is the ability to add more nodes to the system. This is great, but it does introduce two problems: who directs requests to the right nodes, and how do we keep track of all the nodes? These are problems that have been solved already, but every distributed system you build will need to address these sorts of challenges.
Despite storing your precious data on SSDs, if the network latency is relatively high, you might want to be creative about how you serve your data to clients. That’s when you start thinking about caching and distributed caching. This then introduces the problem of who keeps the data in the cache up to date, how they do it, and what happens when we run into a cache miss.
Certain types of data are better suited to streaming, which ensures the latest data is always passed to the user. This comes with various other challenges: how to keep the user interface up to date, what to do if the data stream is huge, how to ensure consistency in that case, and so on.
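As a rough illustration of both questions (who picks the node, and who knows which nodes exist), here is a small Python sketch of a round-robin load balancer that skips nodes marked as unhealthy. Round-robin is only one of many balancing strategies, and the class, its methods and the node names are all hypothetical.

```python
class RoundRobinBalancer:
    """Hands out nodes in turn, skipping any that are marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)        # every node we know about
        self.healthy = set(self.nodes)  # the subset we will route to
        self._next = 0

    def mark_down(self, node):
        self.healthy.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.healthy.add(node)

    def next_node(self):
        # Try each node at most once per call, starting where we left off.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.next_node() for _ in range(4)])
balancer.mark_down("app-2")  # e.g. after a failed health check
print([balancer.next_node() for _ in range(4)])
```

In a real deployment the health information would come from periodic health checks or a service registry rather than manual calls to `mark_down`.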
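One simple answer to "who keeps the cache up to date" is to let entries expire after a fixed time and reload them on a miss. The Python sketch below shows that pattern on a single machine. The `TTLCache` class, the `slow_lookup` function and the 30 second expiry are assumptions made for this example, and a distributed cache would add its own concerns on top.

```python
import time


class TTLCache:
    """Read-through cache: on a miss or an expired entry, call the loader."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader      # function that fetches from the slow source
        self.ttl = ttl_seconds
        self.clock = clock        # injectable so expiry can be tested
        self._entries = {}        # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1
            return entry[0]
        # Miss (or stale entry): go to the source and remember the answer.
        self.misses += 1
        value = self.loader(key)
        self._entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop an entry early, e.g. right after the source data changes."""
        self._entries.pop(key, None)


def slow_lookup(key):
    # Stand-in for a database or remote call.
    return f"value-for-{key}"


cache = TTLCache(slow_lookup, ttl_seconds=30)
cache.get("user:1")   # miss, loads from the source
cache.get("user:1")   # hit, served from memory
print(cache.hits, cache.misses)
```

A shorter expiry keeps the data fresher at the cost of more trips to the source, while explicit invalidation after a write trades extra coordination for fewer stale reads.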
That’s an overview of the things that need to be covered. Of course, we have to go into each of them in a little more depth. But this isn’t impossible, and it might take only a week or so to prepare and feel confident.
Let’s take a look at how we go about this.
This is the start of a series on how to tackle system design interviews. I’m only just starting this and I have a day job, so please be patient with me.
I’ll gradually unveil the series.