Covers information measures such as entropy, Kullback-Leibler divergence, and the data processing inequality, along with probability kernels and mutual information.
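To make the first two measures concrete, here is a minimal NumPy sketch of discrete Shannon entropy and KL divergence; the function names and the convention of working in nats are illustrative choices, not something fixed by the summary above.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum_i p_i log p_i, in nats.

    Terms with p_i = 0 contribute 0 by convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i log(p_i / q_i).

    Assumes q_i > 0 wherever p_i > 0 (absolute continuity)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
h = entropy(p)           # log 2 ≈ 0.693 nats
d = kl_divergence(p, q)  # strictly positive since p != q
```

KL divergence is asymmetric: `kl_divergence(q, p)` generally differs from `kl_divergence(p, q)`.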
Covers kernel density estimation, focusing on bandwidth selection, the curse of dimensionality, the bias-variance tradeoff, and parametric versus nonparametric models.
Explores nonparametric estimation using kernel density estimators to estimate distribution functions and parameters, emphasizing bandwidth selection for optimal accuracy.
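A minimal sketch of the estimator both entries describe: a Gaussian kernel density estimate with the bandwidth chosen by Silverman's rule of thumb. The function names and the normal test data are assumptions for illustration; the summaries do not specify a particular kernel or selector.

```python
import numpy as np

def silverman_bandwidth(data):
    """Silverman's rule of thumb for a Gaussian kernel:
    h = 1.06 * sigma_hat * n^(-1/5)."""
    data = np.asarray(data, dtype=float)
    return 1.06 * data.std(ddof=1) * len(data) ** (-0.2)

def gaussian_kde(x_eval, data, h):
    """f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h),
    with K the standard normal density."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    u = (x_eval[:, None] - data[None, :]) / h   # (m, n) scaled distances
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (n * h)

rng = np.random.default_rng(0)
data = rng.normal(size=500)                     # illustrative sample
h = silverman_bandwidth(data)
grid = np.linspace(-3.0, 3.0, 121)
density = gaussian_kde(grid, data, h)
```

Smaller `h` lowers bias but raises variance (a spikier estimate); larger `h` oversmooths — the bias-variance tradeoff the summary refers to.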
Explores graphical model learning with M-estimators, Gaussian process regression, Google PageRank modeling, density estimation, and generalized linear models.
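Of the models listed above, PageRank is the most self-contained to sketch: power iteration on the damped Google matrix. The damping value 0.85 is the commonly cited default; the dangling-node handling (uniform jump) is one standard convention, chosen here as an assumption.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-12):
    """Power iteration: r <- d * r P + (1 - d) / n,
    where P is the row-stochastic transition matrix of the graph."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    # Rows with no out-links jump uniformly to every node.
    P = np.where(out_deg > 0, adj / np.maximum(out_deg, 1), 1.0 / n)
    r = np.full(n, 1.0 / n)
    while True:
        r_next = damping * (r @ P) + (1.0 - damping) / n
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next

# Tiny example: node 2 is linked to by both 0 and 1, so it ranks highest.
adj = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]], dtype=float)
ranks = pagerank(adj)
```

The result is a probability vector (entries sum to 1), interpretable as the stationary distribution of a random surfer who follows links with probability 0.85 and teleports uniformly otherwise.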
Covers the transformer architecture and subquadratic attention mechanisms, focusing on efficient approximations and their applications in machine learning.
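One common family of subquadratic approximations replaces the softmax with a positive feature map so attention factorizes and costs O(n·d²) instead of O(n²·d) in sequence length n. A minimal NumPy sketch, with the standard quadratic version for comparison; the ReLU-plus-epsilon feature map is an illustrative assumption, not the specific mechanism the summary has in mind.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: materializes the (n, n) score matrix."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Linear attention: with a positive feature map phi, the output
    phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1) never forms an n x n matrix."""
    Qp, Kp = phi(Q), phi(K)
    KV = Kp.T @ V                  # (d, d_v): sequence dimension summed out
    Z = Qp @ Kp.sum(axis=0)        # per-query normalizer, shape (n,)
    return (Qp @ KV) / Z[:, None]

rng = np.random.default_rng(1)
n, d, dv = 8, 4, 5                 # toy sizes for illustration
Q, K, V = rng.normal(size=(n, d)), rng.normal(size=(n, d)), rng.normal(size=(n, dv))
exact = softmax_attention(Q, K, V)
approx = linear_attention(Q, K, V)
```

Both functions return convex combinations of the rows of `V`; the approximation trades softmax's sharp weighting for the factorized, subquadratic form.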