Explores learning the kernel function in the context of convex optimization, focusing on predicting outputs with a linear classifier and selecting the best-performing kernel function through cross-validation.
Covers the transformer architecture and subquadratic attention mechanisms, focusing on efficient approximations of attention and their applications in machine learning.