Warning
🚧 This series is a Work in Progress
I am writing, rewriting and publishing almost every day. Some of the write-ups link to others, and sometimes I realise that context should have been given earlier, so I reorganise the pages in the series.
It felt more valuable to share this series as I wrote it, rather than wait till I polish everything up. This keeps me motivated to iterate quickly and avoid procrastination.
Feel free to share your thoughts and opinions as comments.
Let’s take a high-level view of a k8s cluster. I’m borrowing a beautiful image from the official docs below.
Overview
Notice that there is a Control Plane made up of several components that interact with things outside it - several nodes!
The nodes outside the control plane also seem to have things running on them.
The control plane
This terminology is probably familiar to those coming from networking.
A plane is an abstract conception of where certain processes take place.
The two most commonly referenced planes in networking are:
- The control plane
- The data plane, aka forwarding plane
The control plane is the part of the network that controls how data packets are forwarded/sent from one place to another. It forms the brain of the network.
In a traffic analogy, the control plane is like the traffic signals at the intersections of a city, whereas the data plane is the traffic that flows - the cars and other vehicles using those roads. The vehicles on the road have to obey the traffic signals - stop or move as the lights show.
In k8s, the control plane is a set of components that collectively make decisions about the state of the cluster, manage the nodes and ensure that the desired state, as defined by the user or operator of the cluster or its applications, is always maintained.
Let’s explore the various components of the control plane as you’ll definitely hear people talk about these concepts as you work with k8s.
The components of the control plane
API server (kube-apiserver)
This is the central component of the control plane that exposes the k8s API. It is the front end of the k8s control plane. The main implementation of the API server is called kube-apiserver. It is designed to scale horizontally: you can run several instances of kube-apiserver and load balance traffic between them for High Availability.
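To make "exposes the k8s API" a little more concrete, here is a minimal sketch of how REST paths on the API server are laid out, as far as I understand the convention: core resources such as pods live under /api/v1, while resources from named API groups such as apps live under /apis/<group>/<version>. The helper name api_path is my own invention for illustration and is not part of any client library.

```python
def api_path(resource, namespace=None, name=None, group="", version="v1"):
    """Build the REST path of a Kubernetes resource (illustrative helper only)."""
    # Core-group resources (pods, services, nodes, ...) live under /api/<version>.
    # Resources of every named group live under /apis/<group>/<version>.
    prefix = f"/api/{version}" if group == "" else f"/apis/{group}/{version}"
    parts = [prefix]
    if namespace is not None:
        # Namespaced resources are nested under their namespace.
        parts += ["namespaces", namespace]
    parts.append(resource)
    if name is not None:
        parts.append(name)
    return "/".join(parts)
```

Tools like kubectl end up talking to paths of this shape on whichever kube-apiserver instance the load balancer hands them.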
etcd
etcd is a distributed, reliable key-value store for the most critical data of a distributed system. k8s uses etcd to store configuration data, state information and other metadata about the cluster.
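If "key-value store" feels abstract, the toy below may help. It is not etcd and does not use an etcd client; it is a plain in-memory sketch of the idea that every cluster object sits under a key, and that every write bumps a store-wide revision. As far as I know k8s keeps its objects under keys starting with /registry/ by default, but treat the exact key layout used here as an assumption. ToyKV is a made-up name.

```python
class ToyKV:
    """A tiny in-memory stand-in for a key-value store with revisions."""

    def __init__(self):
        self._data = {}     # key -> (value, revision of the last write)
        self.revision = 0   # store-wide counter, bumped on every write

    def put(self, key, value):
        self.revision += 1
        self._data[key] = (value, self.revision)
        return self.revision

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[0]

    def list_prefix(self, prefix):
        # Listing "all pods in a namespace" becomes a prefix scan over keys.
        return {
            key: value
            for key, (value, _rev) in sorted(self._data.items())
            if key.startswith(prefix)
        }


store = ToyKV()
store.put("/registry/pods/default/web-0", {"phase": "Running"})
store.put("/registry/pods/default/web-1", {"phase": "Pending"})
store.put("/registry/pods/kube-system/coredns-0", {"phase": "Running"})
```

Calling store.list_prefix("/registry/pods/default/") returns only the two pods in the default namespace.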
kube-scheduler
The scheduler is responsible for assigning pods to nodes based on resource availability and other constraints. It watches for newly created pods that have no node assigned yet and, for each one, compares what the pod requests with what every node still has available, to make these critical decisions on where to deploy pods.
Some of the various factors taken into consideration when making scheduling decisions are:
- individual and collective resource requirements
- hardware/software/policy constraints
- affinity and anti-affinity specifications
- data locality
- inter-workload interference
- deadlines
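As a rough sketch of how such factors come together, the scheduler's work can be pictured as two steps: first filter out the nodes that cannot run the pod at all, then score the remaining ones and pick the best. The code below only looks at CPU and memory requests, and the scoring rule (prefer the node left with the most headroom) is my own simplification, not the real scheduler's algorithm. All field names are made up for the example.

```python
def schedule(pod, nodes):
    """Return the name of the chosen node, or None if the pod fits nowhere."""

    def free(node, resource):
        return node[f"{resource}_capacity"] - node[f"{resource}_requested"]

    # Step 1 - filtering: keep only nodes with enough unreserved CPU and memory.
    feasible = [
        node for node in nodes
        if free(node, "cpu") >= pod["cpu"] and free(node, "mem") >= pod["mem"]
    ]
    if not feasible:
        return None  # the pod stays pending until something changes

    # Step 2 - scoring: prefer the node with the largest free fraction afterwards.
    def score(node):
        cpu_left = (free(node, "cpu") - pod["cpu"]) / node["cpu_capacity"]
        mem_left = (free(node, "mem") - pod["mem"]) / node["mem_capacity"]
        return cpu_left + mem_left

    return max(feasible, key=score)["name"]


# CPU in millicores, memory in MiB.
nodes = [
    {"name": "node-a", "cpu_capacity": 4000, "cpu_requested": 3500,
     "mem_capacity": 8192, "mem_requested": 4096},
    {"name": "node-b", "cpu_capacity": 4000, "cpu_requested": 1000,
     "mem_capacity": 8192, "mem_requested": 2048},
    {"name": "node-c", "cpu_capacity": 2000, "cpu_requested": 0,
     "mem_capacity": 4096, "mem_requested": 0},
]
```

With a pod asking for 1000 millicores and 1024 MiB, node-a is filtered out for lack of CPU and node-c wins the scoring step.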
Some of those terms might need a bit more explanation.
Affinity
Affinity is used to express Pod scheduling constraints that can match characteristics of a node and, potentially, of the pods that are already running on that node.
A pod that has an affinity to a given node is more likely to be scheduled on that node. Conversely, anti-affinity makes it less likely to be scheduled on that node.
The overall balance of these weights (numeric values) is used to determine the final node for each pod.
These assessments can result in
- hard outcomes - node must have characteristics defined by the affinity expression
- soft outcomes - a preference that tells the scheduler to use a node with those characteristics if one is available; where the criteria aren’t met, a node could still be selected if necessary
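The difference between hard and soft outcomes is easier to see in code. In this sketch a hard rule removes nodes from the running, while a soft rule only adds its weight to a node's score. In a real Pod spec these correspond to the requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution fields. The function and its simple label-equality matching are my own simplification.

```python
def pick_node(nodes, required, preferred):
    """nodes: {node name: labels}; required: {label: value};
    preferred: list of (weight, label, value) tuples."""
    # Hard rule: a node must carry every required label, or it is out.
    candidates = {
        name: labels for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in required.items())
    }
    if not candidates:
        return None  # no node satisfies the hard rule, the pod stays pending

    # Soft rule: each matching preference adds its weight to the node's score.
    def score(name):
        labels = candidates[name]
        return sum(w for w, key, value in preferred if labels.get(key) == value)

    return max(candidates, key=score)


nodes = {
    "node-a": {"disktype": "ssd", "zone": "east"},
    "node-b": {"disktype": "ssd", "zone": "west"},
    "node-c": {"disktype": "hdd", "zone": "west"},
}
```

Requiring disktype=ssd and preferring zone=west picks node-b. If the preference cannot be met anywhere, an ssd node is still chosen, which is exactly what makes the rule soft.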
Types of affinity
- Node affinity - constrains the node that can receive a Pod by matching labels of those nodes.
- Inter-pod affinity - constrains the nodes that can receive a pod by matching labels of the pods that are already running on those nodes. It can be attracting (affinity) or repelling (anti-affinity)
Use cases
Why would one need this concept of affinity?
- A pod might need specialized hardware such as an SSD drive or a GPU
- Pods that communicate frequently might need to be collocated with each other
- Pods that have high computational requirements might need to be confined to specific nodes or spread out among the beefier nodes
- Anti-affinity can be used as a way to ensure high availability - avoiding a single point of failure by running a service as multiple pods and ensuring each pod runs on a different node
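The high availability use case can be sketched in a few lines. Assuming a hard anti-affinity rule of "never put two replicas of the same app on one node", placement looks roughly like this. The function is hypothetical and ignores everything else the scheduler would consider.

```python
def place_replicas(replicas, nodes, app):
    """Place each replica on a node that does not already host this app."""
    placement = {}    # pod name -> node name (None means it could not be placed)
    occupied = set()  # nodes that already run a replica of this app
    for i in range(replicas):
        pod = f"{app}-{i}"
        free_nodes = [node for node in nodes if node not in occupied]
        if not free_nodes:
            # With a hard rule the extra replica stays pending
            # rather than doubling up on a node.
            placement[pod] = None
            continue
        placement[pod] = free_nodes[0]
        occupied.add(free_nodes[0])
    return placement
```

Three replicas on two nodes gives one pod per node and one pod left waiting, so losing a single node can never take out two replicas at once.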
Controller Manager
There may be several controllers running in your k8s cluster, each one controlling a unique aspect of the cluster. This one is the big boss of all those controllers - it runs and manages the controllers for nodes, endpoints, replication, etc., which watch the shared state of the cluster through the API server and make sure the current state always moves towards the desired state.
What is a controller though?
Control Loop
Let’s start with understanding what a control loop is.
A control loop is a concept that originated in control systems and automation. A control loop is a component that regulates the state of a system or maintains a variable of the system.
There is a sensor sensing the current state of the system, then a controller that compares the current state to a desired state of the system, then a final control element that adjusts the system to change the current state to the desired state.
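Here is the sensor, controller and final control element trio as a minimal sketch, using a thermostat as the system. Room and control_loop are names I made up for the example, and the one-degree-per-step heater is an assumption to keep the arithmetic obvious.

```python
class Room:
    """The system being controlled."""

    def __init__(self, temperature):
        self.temperature = temperature

    def read(self):
        # The sensor: report the current state.
        return self.temperature

    def adjust(self, delta):
        # The final control element: heat or cool by one degree.
        self.temperature += delta


def control_loop(read_sensor, desired, actuate, max_iterations=100):
    """Drive the system towards `desired`; return how many adjustments were made."""
    for step in range(max_iterations):
        current = read_sensor()    # sense
        error = desired - current  # compare current state with desired state
        if error == 0:
            return step            # nothing left to do
        actuate(1 if error > 0 else -1)  # act to close the gap
    return max_iterations
```

Starting from Room(18) with a desired temperature of 21, the loop makes three adjustments and then finds nothing more to do.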
K8s does the sensing through something like a sensor, called a SharedInformer, which watches the API server and keeps a single cache of the state of a variety of k8s cluster resources, shared by several different controllers. When a resource changes, the SharedInformer notifies the various controllers, and each of them typically puts the key of the changed object on a queue called the Workqueue, from which it picks up changes to process.
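A toy version of this plumbing, written from my understanding of how the Go client library wires it up: one shared cache, event handlers registered by each controller, and a per-controller work queue that holds keys rather than full objects. None of the class names below are real APIs, and the real thing adds rate limiting, retries and resyncs.

```python
from collections import deque


class ToyInformer:
    """One shared cache plus a list of handlers to notify on every change."""

    def __init__(self):
        self.cache = {}     # object key -> latest object seen
        self.handlers = []  # callbacks registered by controllers

    def add_handler(self, handler):
        self.handlers.append(handler)

    def on_event(self, key, obj):
        self.cache[key] = obj
        for handler in self.handlers:
            handler(key)


class ToyController:
    def __init__(self, name, informer):
        self.name = name
        self.informer = informer
        self.queue = deque()  # this controller's work queue of keys
        self.seen = []        # what the controller actually processed
        informer.add_handler(self.enqueue)

    def enqueue(self, key):
        # A key that is already waiting is not queued twice.
        if key not in self.queue:
            self.queue.append(key)

    def process_all(self):
        while self.queue:
            key = self.queue.popleft()
            # Always act on the latest cached state, not on the event itself.
            self.seen.append((key, self.informer.cache[key]))
```

If the same object changes twice before a controller gets to it, the controller processes the key once and sees only the latest state.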
In k8s, a controller is a control loop that watches the state of the cluster and makes or requests changes where needed to move the current cluster state to the desired cluster state.
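In code, the heart of such a controller is a function that compares desired and current state and returns what needs to change. The sketch below does this for a replica count. It is a hypothetical illustration of the idea, not how any built-in controller is actually written.

```python
def reconcile(desired_replicas, running_pods):
    """Return the actions needed to move from the current to the desired state."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        # Too few pods: ask for the missing ones to be created.
        return [("create", None)] * diff
    if diff < 0:
        # Too many pods: pick the surplus ones for deletion.
        return [("delete", pod) for pod in running_pods[:-diff]]
    return []  # current state already matches desired state
```

Running reconcile again after its actions have been applied returns an empty list, which is what lets a controller keep looping safely.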
If you have done some front-end development, you might have encountered UI frameworks that use a Controller component that does something similar. It watches the state of the user interface and adapts its state based on user interaction. Some examples of controllers in action in the front-end:
- click a button then submit to API
- type a value in an input box, then validate input and display validation message
- on tab selection - hide one tab and show another tab
Only sharing this here as this reminded me of the controllers I am familiar with in Angular.
Each controller tracks at least one k8s resource type. There are many types of controllers. Listing a few below to help you understand some of their responsibilities.
- Node controller: responsible for noticing and responding when nodes go down - yea they can go down
- Job controller: watches for Job objects, i.e. k8s objects that represent one off tasks. Also creates pods to run those tasks to completion
Controllers also differ in how they do what they are supposed to. The ones listed above are built-in controllers that do their job by interacting with the cluster's API server. Meanwhile, there are others that need to make changes to things external to the cluster directly, based on input from the API server regarding the state of the cluster.
Why is this state loop useful?
K8s is able to handle change constantly because of this state loop that the controller looks after.
It is able to attempt to recover your failed pods back to how they were intended to be. But there may be certain scenarios where, due to a serious bug in the application running in the pod, k8s could end up in an endless loop of restarting the pod and having to kill it again soon after.
A cluster admin, an operator or the owners of the applications running on k8s can help k8s do its job properly - i.e. determine when to kill a pod vs when to restart it - by configuring three different types of probes: liveness, readiness and startup probes. This gives the cluster the ability to self-heal where possible without human intervention, by automatically killing or restarting pods as needed based on results from the probes.
However, these are distributed systems we are talking about. Therefore, there will be times when multiple applications that have bugs are deployed to the cluster at the same time, making the cluster unstable. This may trigger k8s to try and reach a stable state by terminating and reinstating pods of the misbehaving apps, repeatedly in a loop, until the cluster is stable, which may not be possible until a bug-free version of the application(s) is deployed.
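Roughly, each probe answers a different question, and k8s reacts differently to each answer. The sketch below captures that mapping as I understand it: the other two probes are held off until the startup probe has succeeded, a failed liveness probe gets the container restarted, and a failed readiness probe only takes the pod out of the traffic path. It ignores thresholds, timeouts and restart policy, and the function name is made up.

```python
def react_to_probes(startup_done, alive, ready):
    """Map probe results to what happens to the container (simplified)."""
    if not startup_done:
        # Liveness and readiness checks do not run until startup has succeeded.
        return "wait for startup"
    if not alive:
        # The application is stuck or broken: restart the container.
        return "restart container"
    if not ready:
        # The application is healthy but not able to serve yet:
        # keep it running, just stop sending it traffic.
        return "remove from service endpoints"
    return "serve traffic"
```

The useful distinction is the third case: a pod that is alive but busy warming a cache is left alone instead of being restarted.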
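One thing that keeps such a restart loop from hammering the node is that restarts are spaced out further and further. To my knowledge the delay starts at 10 seconds, doubles after every crash and is capped at five minutes. The sketch below just computes that sequence. The default values reflect my reading of the docs and the function name is my own.

```python
def restart_delays(restarts, base=10, cap=300):
    """Seconds to wait before each of the first `restarts` restarts."""
    # Double the delay on every restart, but never wait longer than `cap`.
    return [min(base * 2 ** i, cap) for i in range(restarts)]
```

For seven restarts this gives 10, 20, 40, 80, 160, 300 and 300 seconds, so a permanently broken app settles into one restart attempt every five minutes.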
Failure is the norm. As good SREs you must ensure your system is robust enough to recover and retry.
cloud-controller-manager
Yes, we just looked at a controller manager, and this one is special.
This one embeds cloud provider specific control logic. It lets you link your cluster to your cloud provider’s API. It is an interface from your cluster to the cloud provider. The cloud controller manager only runs controllers that are specific to your cloud provider. It wouldn’t exist in a k8s cluster that you are running on bare-metal or on your personal laptop for learning purposes. It only exists when your cluster is on the cloud.
But the principle is essentially the same - this controller manager runs several different control loops and acts accordingly.
kube-proxy
This is a network proxy, i.e. an intermediary between a client and a server, often used to improve privacy, security and performance. This proxy runs on every node. It is responsible for maintaining network rules on the nodes, which enable network communication to your pods from sessions inside or outside your cluster.
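The effect of those network rules can be pictured as a lookup table from a service to the pods behind it. The sketch below is only a mental model: the real kube-proxy programs rules into the node's packet filtering layer rather than handling each connection in a Python object, and it does not necessarily pick backends in round-robin order. ToyServiceProxy and the addresses are invented for the example.

```python
class ToyServiceProxy:
    """Maps a service name to its pod addresses and spreads requests over them."""

    def __init__(self):
        self._endpoints = {}  # service name -> list of pod addresses
        self._next = {}       # service name -> index of the next backend to use

    def set_endpoints(self, service, pod_addresses):
        # Called whenever the set of pods behind a service changes.
        self._endpoints[service] = list(pod_addresses)
        self._next[service] = 0

    def route(self, service):
        backends = self._endpoints.get(service)
        if not backends:
            raise LookupError(f"no pods behind service {service!r}")
        index = self._next[service] % len(backends)
        self._next[service] = index + 1
        return backends[index]


proxy = ToyServiceProxy()
proxy.set_endpoints("web", ["10.244.1.5:8080", "10.244.2.7:8080"])
```

A client only ever asks for "web". Which pod answers can change from one request to the next, and pods can come and go without the client noticing.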
Container Runtime
Head over to the summary in the post on Containers on K8s
Addons
So far you have seen the core components of a kubernetes cluster - the control plane and the pieces that run on every node. Addons are, like the name says, extras - plugins that extend the functionality of a cluster beyond what comes out of the box! A lot of the popular tooling that makes a cluster useful is built as Addons.
Some popular examples of Addons:
- Cilium extends the networking interface on k8s
- Prometheus extends the monitoring capabilities
- Fluentd adds log collection unification abilities
- Kyverno extends k8s’s policy management
And there are so many more to explore.
Some of these Addons are not optional; they are essential.
DNS
All k8s clusters must have a DNS server for the cluster. This is usually provided through the CoreDNS addon, deployed by default when you set up your k8s cluster.
Containers started by k8s automatically include this DNS server in their DNS searches.
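In practice this means a pod can reach a service by its short name. As far as I know, the pod's resolver is given a search list built from its own namespace and the cluster domain, so a short name gets expanded into a few candidate names that are tried in order. The sketch below generates those candidates. It assumes the default cluster domain cluster.local, and the function names are mine.

```python
def service_fqdn(service, namespace, cluster_domain="cluster.local"):
    """The full DNS name of a service inside the cluster."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def search_candidates(name, pod_namespace, cluster_domain="cluster.local"):
    """Names a pod's resolver would try for a short name, in order."""
    search_list = [
        f"{pod_namespace}.svc.{cluster_domain}",  # same namespace first
        f"svc.{cluster_domain}",
        cluster_domain,
    ]
    return [f"{name}.{suffix}" for suffix in search_list]
```

A pod in the shop namespace asking for db first tries db.shop.svc.cluster.local, which is why services in your own namespace resolve by their bare name.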
Some other Addons that are usually installed are:
- a web based dashboard for your cluster - for managing and troubleshooting apps in the cluster.
- a monitoring solution - container resource monitoring
- cluster level logging - ensures logs from all the containers in the cluster go to a central log store with excellent search and browse capabilities
- networking - addons that implement the container network interface specification and allocate IP addresses to pods, enabling them to talk to one another in the cluster.
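To give a feel for the "allocates IP addresses to pods" part of the networking addons, here is a toy allocator that hands out addresses from a node's pod address range. Real network plugins do far more (interfaces, routes, cleanup). ToyIPAM and the 10.244.1.0/24 range are assumptions made for the example.

```python
import ipaddress


class ToyIPAM:
    """Hands out one address per pod from a node's pod CIDR."""

    def __init__(self, pod_cidr):
        # hosts() skips the network and broadcast addresses of the range.
        self._free = iter(ipaddress.ip_network(pod_cidr).hosts())
        self.assigned = {}  # pod name -> IP address

    def allocate(self, pod):
        if pod in self.assigned:
            return self.assigned[pod]  # a pod keeps the address it already has
        address = next(self._free, None)
        if address is None:
            raise RuntimeError("pod CIDR exhausted")
        self.assigned[pod] = str(address)
        return self.assigned[pod]


ipam = ToyIPAM("10.244.1.0/24")
```

Each pod gets its own address from the range, and that is what lets pods talk to one another directly.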