Lecture

Apache Spark Ecosystem: Basics and Operations

Related lectures (32)

Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.

Digital Transformation: Solutions and Data

Explores digital transformation opportunities, big data, analytics, and technology innovations in business and research.

Advanced Spark Optimization Techniques: Managing Big Data

Discusses advanced Spark optimization techniques for managing big data efficiently, focusing on parallelization, shuffle operations, and memory management.

Introduction to Data Stream Processing: Concepts and Applications

Covers the principles of data stream processing and its applications in real-time data analysis.

Accelerating Data Analytics: Innovations in Post-Moore Era

Covers advancements in data analytics systems and the role of hardware-software co-design in enhancing performance in the Post-Moore era.

Data Analysis to AI and ML, Social Media

Explores the evolution from data analysis to AI and ML, emphasizing big data, machine learning, and social media interaction.

Introduction to Applied Data Analysis

Introduces the Applied Data Analysis course at EPFL, covering a broad range of data analysis topics and emphasizing continuous learning in data science.

Big Data Challenges: Scaling to Massive Data

Explores the challenges of scaling to massive data sets, discussing solutions like MapReduce and Spark.

Data Wrangling with Hive: Managing Big Data Efficiently

Covers data wrangling techniques using Apache Hive for efficient big data management.

Introduction to Data Stream Processing

Covers the fundamentals of data stream processing, including tools like Apache Storm and Kafka, key concepts like event time and window operations, and the challenges of stream processing.

Big Data: Processing and Dimensions

Explores Big Data generation, storage, processing, and dimensions, along with challenges in data analytics, cloud computing elasticity, and security.

Introduction to Spark Runtime Architecture

Covers the Spark runtime architecture, including RDDs, transformations, actions, and caching for performance optimization.

Water Consumption in Geneva

Explores water consumption data in Geneva, including charts on consumption and losses, available datasets, and data processing phases.

Introduction to Data Science

Introduces the basics of data science, covering decision trees, machine learning advancements, and deep reinforcement learning.

Data Wrangling Techniques: HBase and Hive Integration

Covers data wrangling techniques using HBase and Hive, focusing on integration and practical applications.

Machine learning: Physics and Data

Delves into the intersection of physics and data in machine learning models, covering topics like atomic cluster expansion force fields and unsupervised learning.

Introduction to Spark Runtime Architecture

Introduces Apache Spark, covering its architecture, RDDs, transformations, actions, fault tolerance, deployment options, and practical exercises in Jupyter notebooks.

Analytics on Data at Rest and Data in Motion

Explores combining data at rest with data in motion, emphasizing the complexities of the Lambda architecture and the quality assessment of streams and batches.

General Introduction to Big Data

Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.

Data Issues in Research

Explores data challenges in research, such as assumptions and biases, incomplete write-ups, and the frustrations newcomers face.