Lecture

Data Science Essentials: Python, NumPy, Pandas, and Scikit-learn

Related lectures (32)

Covers data manipulation and exploration using Python with a focus on visualization techniques.

Offers a comprehensive introduction to Data Science, covering Python, NumPy, Pandas, Matplotlib, and Scikit-learn, with a focus on practical exercises and collaborative work.

Introduction to Data Science

Introduces the basics of data science, covering decision trees, machine learning advancements, and deep reinforcement learning.

Decision Tree Classification

Covers decision tree classification using KNIME Analytics Platform for data preprocessing and model creation.

Critical Data Studies: Reproducibility and Renku

Explores the significance of reproducibility in data science and introduces Renku, a platform for managing data-driven projects.

General Introduction to Big Data

Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.

Data Wrangling with Hadoop

Covers data wrangling techniques using Hadoop, focusing on row- versus column-oriented databases, popular storage formats, and HBase-Hive integration.

Data Wrangling with Hadoop: Storage Formats and Hive

Explores data wrangling with Hadoop, emphasizing storage formats and Hive for big data processing.

Elements of Collaborative Data Science

Introduces collaborative data science tools like Jupyter notebooks, Docker, and Git, emphasizing data versioning and containerization.

Renku: Collaborative Data Science

Introduces Renku, a platform for collaborative data science enabling reproducibility and promoting code and data reusability.

Collaborative Data Science

Covers collaborative data science tools, big data concepts, Spark, and data stream processing, with tips for the final project.

Analytics on Data at Rest and Data in Motion

Explores combining data at rest with data in motion, emphasizing the complexities of the Lambda architecture and the quality assessment of streams and batches.

GitLab Agent for Kubernetes (`agentk`)

Covers the setup of a GitLab agent for Kubernetes, focusing on installation, version control, and troubleshooting.

Python Lists: Manipulation and Comprehension

Covers Python list manipulation and comprehension, emphasizing memory representation and mutability.

Logistic Regression: Fundamentals and Applications

Explores logistic regression fundamentals, including cost functions, regularization, and classification boundaries, with practical examples using scikit-learn.

Collaborative Data Science: Tools and Techniques

Introduces collaborative data science tools like Git and Docker, emphasizing teamwork and practical exercises for effective learning.

Data Wrangling with Hive: Managing Big Data Efficiently

Covers data wrangling techniques using Apache Hive for efficient big data management.

Decision Trees: Classification

Explores decision trees for classification, entropy, information gain, one-hot encoding, hyperparameter optimization, and random forests.

Structures and Mechanisms: Opening a Box

Explores the analysis of structures and mechanisms through a sample problem of opening a box with a string-held lid.

Advanced Pandas Functions

Focuses on advanced pandas functions for data manipulation, exploration, and visualization with Python, emphasizing the importance of understanding and preparing data.