Provides an overview of policy gradient methods in reinforcement learning, focusing on the log-likelihood trick and the transition from batch to online learning.
Covers the transformer architecture and subquadratic attention mechanisms, focusing on efficient approximations of attention and their applications in machine learning.