Fault Tolerance

Fault Tolerance - Designing Resilient Systems

A complete guide to fault tolerance, covering failure modes, redundancy strategies, circuit breakers, graceful degradation, and the design of systems that survive failures.

October 30, 2022 · 6 min · 1207 words · Eakan Gopalakrishnan

Failure Models

Failure Models in Distributed Systems

A guide to failure models, covering crash failures, omission failures, Byzantine failures, network partitions, and the fallacies of distributed systems.

October 30, 2022 · 2 min · 419 words · Eakan Gopalakrishnan

System Reliability

System Reliability - Building Dependable Systems

A guide to system reliability, covering MTBF (Mean Time Between Failures), MTTR (Mean Time To Recovery), fault tolerance, and how to build dependable distributed systems.

October 30, 2022 · 1 min · 212 words · Eakan Gopalakrishnan