CS-456: Deep reinforcement learning

Lectures in this course (96)

Eligibility Traces in Policy Gradient Algorithms

Delves into how eligibility traces emerge in policy gradient algorithms and make learning rapid and efficient.

Eligibility Traces for Policy Gradient and Actor-Critic

Explores eligibility traces in policy gradient and actor-critic architectures, leading to an elegant online learning rule.

BackProp Algorithm: Pseudocode and Processing Steps

Covers the BackProp algorithm, including initialization, signal propagation, error computation, weight updating, and complexity comparison with numerical differentiation.

Neural Networks for Action Learning: Three-Factor Rules and Dopamine

Explores neural networks learning by reward, actor-critic structures, synaptic plasticity, and the role of dopamine in synaptic changes.

Three-factor rules: DeepRL1.5A

Explains three-factor rules in policy gradient algorithms and their implementation in biological and hardware systems.

Neural Networks for Action Learning: Brain Implementation Insights

Explores neural networks for action learning and the brain's reinforcement learning implementation.

The Problem of Overfitting

Discusses the problem of overfitting in deep networks and the importance of controlling flexibility to avoid it.

Learning to Find a Goal

Delves into a biologically inspired version of Reinforcement Learning, focusing on maze navigation and the implementation of spiking neurons.

Regularization Methods: Training and Validation Base

Explores regularization methods in neural networks, emphasizing the importance of training and validation sets to prevent overfitting.

Model-based versus Model-free Reinforcement Learning

Compares model-based and model-free reinforcement learning, highlighting the advantages of the former in adapting to reward changes and planning future actions.

Careful Cross-validation

Emphasizes the significance of careful cross-validation in deep neural networks, including the split of data and the concept of K-fold cross-validation.

Regularization by Early Stopping

Explores regularization by early stopping in deep neural networks to control flexibility and avoid overfitting.

Reinforcement Learning: Reward-based Learning

Explores artificial neural networks, reward information in the brain, animal conditioning, deep reinforcement learning, and a quiz on rewards.

Elements of Reinforcement Learning

Introduces the fundamental elements of Reinforcement Learning and demonstrates their application with the Acrobot system.

Elements of Reinforcement Learning: Quiz on States

Features a quiz on the number of discrete states in backgammon, highlighting the immense complexity of reinforcement learning applications.

Reinforcement Learning: One-step Horizon (Bandit Problems)

Covers Bandit Problems in Reinforcement Learning, focusing on one-step horizon games and Q-values.

Reinforcement Learning: Bandit Problems

Covers convergence in expectation of the Q-values in bandit problems.

Exploration vs. Exploitation: Softmax Policy Quiz

Presents a quiz on the exploration vs. exploitation dilemma using the softmax policy.

Bellman Equation: Value Consistency and Optimal Actions

Covers the Bellman equation, Q-values, discount factor, and optimal actions.

Relation of SARSA and Bellman equation

Explores the relation between fluctuating Q-values in SARSA and the Bellman equation through expectations and policy constancy.
