Discusses the structure of error surfaces in deep neural networks: local and global minima, saddle points, weight-space symmetries, and the abundance of near-equivalent good solutions.
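Weight-space symmetry can be seen directly in code: permuting the hidden units of a network (reordering the rows of the first weight matrix and the matching columns of the second) yields a different point in weight space that computes exactly the same function, so every minimum comes with many symmetric copies. A minimal NumPy sketch (illustrative, not from the source; the network shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer tanh network: y = W2 @ tanh(W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def forward(W1, b1, W2, b2, x):
    return W2 @ np.tanh(W1 @ x + b1) + b2

# Permute the hidden units: reorder rows of W1/b1 and columns of W2.
perm = np.array([2, 0, 3, 1])
W1p, b1p, W2p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=3)
y_orig = forward(W1, b1, W2, b2, x)
y_perm = forward(W1p, b1p, W2p, b2, x)

# Different weights, identical function.
assert np.allclose(y_orig, y_perm)
```

For tanh units, flipping the sign of a hidden unit's incoming and outgoing weights is a further symmetry, so a network with M hidden units has on the order of M!·2^M weight vectors computing the same function.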
Covers why subtracting the mean reward as a baseline in policy gradient methods for deep reinforcement learning reduces the variance of the stochastic gradient estimate without introducing bias.
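The variance-reduction effect of a mean-reward baseline is easy to verify empirically. The sketch below (illustrative, not from the source) uses a two-armed bandit with a softmax policy: the per-sample REINFORCE estimates with and without the baseline have nearly the same mean, but the baseline version has far lower variance because the centered rewards are much smaller in magnitude:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-armed bandit with deterministic per-arm rewards and a softmax policy.
rewards = np.array([1.0, 1.2])
theta = np.array([0.0, 0.0])  # policy logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

pi = softmax(theta)
n = 100_000
arms = rng.choice(2, size=n, p=pi)
r = rewards[arms]

# Score function for a softmax policy: grad log pi(a) = onehot(a) - pi
grad_logp = np.eye(2)[arms] - pi          # shape (n, 2)

# Per-sample REINFORCE gradient estimates, plain vs. mean-reward baseline.
g_plain = r[:, None] * grad_logp
g_base = (r - r.mean())[:, None] * grad_logp

mean_plain, mean_base = g_plain.mean(axis=0), g_base.mean(axis=0)
var_plain, var_base = g_plain.var(axis=0).sum(), g_base.var(axis=0).sum()
```

Because the baseline does not depend on the sampled action, its contribution to the gradient has zero expectation (E[b · grad log pi] = b · E[grad log pi] = 0), so the estimator stays unbiased while the noise shrinks.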