Covers the fundamentals of data stream processing, including frameworks such as Apache Storm and Apache Kafka, key concepts like event time and window operations, and the core challenges of processing unbounded data.
Covers data stream processing concepts, focusing on Apache Kafka and Spark Streaming integration, event time management, and project implementation guidelines.
Discusses advanced Spark optimization techniques for managing big data efficiently, focusing on parallelization, shuffle operations, and memory management.
Explores event time vs. processing time, stream processing operations, stream-stream joins, and handling late/out-of-order data in data stream processing.
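The event-time concepts listed above (windows, watermarks, late/out-of-order data) can be sketched in plain Python. This is a minimal illustration, not any particular framework's API: the window size, lateness bound, and function name are illustrative assumptions.

```python
from collections import defaultdict

WINDOW = 10        # tumbling-window length in event-time units (assumed value)
MAX_LATENESS = 5   # how far behind the max seen event time we still accept

def window_counts(events):
    """events: iterable of (event_time, key) in *arrival* order.
    Returns counts per (window_start, key), dropping records whose
    event time falls behind the watermark."""
    counts = defaultdict(int)
    max_event_time = float("-inf")
    for t, key in events:
        # watermark = highest event time seen so far, minus the lateness bound
        max_event_time = max(max_event_time, t)
        watermark = max_event_time - MAX_LATENESS
        if t < watermark:
            continue  # too late: event precedes the watermark, drop it
        window_start = (t // WINDOW) * WINDOW  # assign to a tumbling window
        counts[(window_start, key)] += 1
    return dict(counts)

# Out-of-order arrivals: (8, "b") is late but within the bound and is kept;
# (2, "a") arrives after the watermark has passed 9 and is dropped.
result = window_counts([(1, "a"), (12, "a"), (8, "b"), (14, "b"), (2, "a")])
```

Systems like Spark Structured Streaming implement the same idea with event-time windows plus a user-declared watermark delay, which bounds how long state for a window must be retained.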
Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.
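The MapReduce architecture mentioned above can be mirrored in a few lines of plain Python as a map → shuffle → reduce word count. This is a conceptual sketch only; in a real deployment each phase runs in parallel across workers, and the function names here are illustrative, not a framework API.

```python
from collections import defaultdict

def map_phase(docs):
    # mapper: emit a (word, 1) pair for every word in every document
    for doc in docs:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # shuffle: group all emitted values by key
    # (the framework performs this step between the map and reduce phases)
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reducer: aggregate the grouped values per key
    return {word: sum(values) for word, values in groups.items()}

docs = ["big data", "data stream processing", "big data tools"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
```

The shuffle step is the expensive part in practice: it moves every intermediate pair across the network to the worker responsible for its key, which is why the Spark optimization material above treats shuffle operations as a central tuning concern.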
Offers a comprehensive introduction to Data Science, covering Python, NumPy, pandas, Matplotlib, and scikit-learn, with a focus on practical exercises and collaborative work.
Explores data handling fundamentals, including models, sources, and wrangling, emphasizing the importance of understanding and addressing data problems.