SLO-driven architecture diagram showing reliability metrics integrated from design to deployment

Shift Reliability Left: Use SLOs to Guide Architecture Early

Reliability planning is most effective when it starts during design, not shortly before release. This post focuses on how Service Level Objectives (SLOs) and Service Level Indicators (SLIs) can be used early in the SDLC to guide architecture and delivery choices. Remind me what they are again? SLI (Service Level Indicator): a quantitative metric for a service’s performance, as experienced by the user of the service. It is a measure of a property of the service that is a good proxy for your user experience. ...

December 4, 2025 · 5 min · 903 words · eakangk
An incident commander from the future, potentially a cyborg

What is Incident Management in Software Engineering?

Background: Any software that has ever been built has had a bug or problem of some sort. Often these bugs are minor things of no real concern, like a button that looks odd or only registers a click when the mouse is over a certain part of it. Some bugs, on the other hand, could have a serious impact on the users of the software or on those indirectly affected by it. For example, a problem with the billing system in an energy billing platform could change the amount customers have to pay: what if the bug resulted in the final amount being multiplied by an arbitrary number? Imagine the damage to the reputation of an energy company that is a client of a billing SaaS provider if such a bug were to happen. ...

March 2, 2024 · 19 min · 3953 words · eakangk
Site Reliability Engineering

Site Reliability Engineering vs DevOps — How they differ and when to use each

What is SRE? SRE stands for Site Reliability Engineering. That’s just a lot of words. What does it mean though? Site Reliability Engineering is what IT operations would be if it were run by software engineers. That’s an interesting take, but it does not clarify much about SRE just yet. Let’s try probing more. How did we go from Development to SRE? You know the part where people deploy software and then ensure things run fine in production. ...

December 4, 2021 · 14 min · 2849 words · eakangk