CS-422: Database systems

Lectures in this course (104)

Views: SQL Queries Simplification

Explores how views simplify query writing and are key in data warehouses.

Resource Management in Spark

Explores Spark's resource management, fault tolerance, and job recovery, along with Spark SQL.

Data Wrangling: Structuring and Wrangling Issues

Covers data wrangling stages, structuring techniques, and common issues in data preparation.

Optimizing Join Operations: Challenges and Solutions

Explores how to optimize join operations in distributed systems, addressing data skew and introducing the 1-Bucket-Theta algorithm.

Data Accuracy: Assessing Faithfulness and Error Detection

Explores data accuracy through faithfulness assessment, error detection, outlier handling, correlations, functional dependencies, violation detection, denial constraints, and data repairing techniques.

Data Stream Processing: Management and Challenges

Explores data stream management, real-time applications, challenges in analysis, and efficient stream management strategies.

Temporality and Entity Resolution

Explores challenges in data temporality and techniques for entity resolution.

Intro to Distributed Frameworks

Covers the challenges of handling large data volumes and the characteristics of big data.

Spark Streaming: Fault Tolerance and DStreams

Explores fault tolerance and DStreams in Spark Streaming for real-time big data analysis.

Approximate Query Processing: BlinkDB

Introduces BlinkDB, a framework for approximate query processing using sampling techniques.

Distributed Query Processing: Execution Models and Declustering Tradeoffs

Covers analytical query processing, declustering strategies, and distributed operations.

Privacy-preserving Data Management: Operations and Protocols

Explores privacy-preserving data management operations and summarization techniques for protecting sensitive data.

MapReduce: Execution Models for Distributed Computing

Introduces the MapReduce programming model for distributed computing, focusing on its vision and under-the-hood mechanisms.

Privacy-Preserving Data Mining

Explores privacy-preserving data mining techniques, including k-anonymity, attacks, and differential privacy.

Dataflow: Execution Models for Distributed Computing

Explores the data flow model for distributed computing using RDDs in Spark.

Large-scale SQL Processing

Introduces data frames as a space-efficient data representation with an extensible SQL-like language.

Data Summarization: Minhashing and Locality-Sensitive Hashing

Explores Jaccard similarity, minhashing, and locality-sensitive hashing for data summarization.

Scheduling: Under the Hood

Explores the complexities of scheduling in distributed computing frameworks, emphasizing data locality optimization and multitenancy strategies.

Transactions and ACID: Overview

Explores transactions, ACID properties, concurrency challenges, and scheduling in database management systems.

Concurrency Control: Lock-Based Protocols

Covers lock-based concurrency control protocols and various transaction models.
