Explores data locality in scheduling decisions for multi-tenant platforms and discusses Hadoop's architecture, execution engine optimizations, and fault tolerance strategies.
Explores Hadoop's execution models, fault tolerance, data locality, and scheduling, highlighting the limitations of MapReduce and alternative distributed processing frameworks.
Explores the design of a general-purpose distributed execution system, covering challenges, specialized frameworks, decentralized control logic, and high-performance shuffle.
Discusses advanced Spark optimization techniques for processing large datasets efficiently, focusing on parallelization, shuffle operations, and memory management.
Covers the operating system's role as a referee in managing resources and ensuring security through fault isolation, resource sharing, and communication.
Introduces the Applied Data Analysis course at EPFL, covering a broad range of data analysis topics and emphasizing continuous learning in data science.
Explores coordination and scheduling in operating systems, covering the lost wakeup problem, scheduling algorithms, and coordination primitives such as sleep and wakeup.
Delves into the intersection of physics and data in machine learning models, covering topics like atomic cluster expansion force fields and unsupervised learning.