Lecture

Introduction to Spark runtime architecture

Related lectures (32)

Covers best practices and guidelines for big data, including data lakes, architecture, challenges, and technologies like Hadoop and Hive.

Advanced Spark Optimization Techniques: Managing Big Data

Discusses advanced Spark optimization techniques for managing big data efficiently, focusing on parallelization, shuffle operations, and memory management.

Data Wrangling with Hive: Managing Big Data Efficiently

Covers data wrangling techniques using Apache Hive for efficient big data management.

General Introduction to Big Data

Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.

Big Data Challenges: Distributed Computing with Spark

Explores big data challenges, distributed computing with Spark, RDDs, hardware requirements, MapReduce, transformations, and Spark DataFrames.

Introduction to Spark Runtime Architecture

Introduces Apache Spark, covering its architecture, RDDs, transformations, actions, fault tolerance, deployment options, and practical exercises in Jupyter notebooks.

Introduction to Spark Runtime Architecture

Covers the Spark runtime architecture, including RDDs, transformations, actions, and caching for performance optimization.

Big Data Challenges: Scaling to Massive Data

Explores challenges of handling massive data in the era of big data, discussing solutions like MapReduce and Spark.

Big Data: Best Practices and Guidelines

Covers best practices and guidelines for big data, including data lakes, typical architecture, challenges, and technologies used to address them.

Scaling up: Spark and Big Data

Explores the challenges of big data processing and introduces Spark as a solution.

Data Wrangling with Hadoop

Covers data wrangling techniques using Hadoop, focusing on row versus column-oriented databases, popular storage formats, and HBase-Hive integration.

Water Consumption in Geneva

Explores water consumption data in Geneva, including charts on consumption and losses, available datasets, and data processing phases.

Integrating Scalable Data Storage and Map Reduce Processing with Hadoop

Covers the integration of scalable data storage and map reduce processing using Hadoop, including HDFS, Hive, Parquet, ORC, Spark, and HBase.

Data Wrangling Techniques: HBase and Hive Integration

Covers data wrangling techniques using HBase and Hive, focusing on integration and practical applications.

Introduction to Data Stream Processing: Concepts and Applications

Covers the principles of data stream processing and its applications in real-time data analysis.

Cache Memory

Explores cache memory design, hits, misses, and eviction policies in computer systems, emphasizing spatial and temporal locality.

Data Issues in Research

Explores challenges in data assumptions, biases, and more in research, including incomplete write-ups and frustrations of newcomers.

Big Data Ecosystems: Technologies and Challenges

Covers the fundamentals of big data ecosystems, focusing on technologies, challenges, and practical exercises with Hadoop's HDFS.

Introduction to Data Stream Processing

Covers the fundamentals of data stream processing, including tools like Apache Storm and Kafka, key concepts like event time and window operations, and the challenges of stream processing.

Apache Spark Ecosystem: Basics and Operations

Provides an overview of the Apache Spark ecosystem, covering basics, operations, and key components.