Explores dependable architectures, error detection, fault-tolerant structures, and software reliability through examples such as the Patriot missile failure and the ABB dual controller.
Explores redundancy as a technique for building dependable systems, covering fault tolerance, reliability, and fault models, and emphasizing the importance of idempotency and leases.
Explores dependability in industrial automation, covering reliability, safety, fault characteristics, and examples of failure sources in various industries.
Explores the design of a general-purpose distributed execution system, covering challenges, specialized frameworks, decentralized control logic, and high-performance shuffle.
Explores decentralized systems engineering, consensus algorithms, fault tolerance, Byzantine faults, and the practical applications of fault-tolerant systems.
Explores data locality in scheduling decisions for multi-tenant platforms and discusses Hadoop's architecture, execution engine optimizations, and fault tolerance strategies.