Explores decoding from neural models in modern NLP, covering encoder-decoder models, decoding algorithms, issues with argmax decoding, and the impact of beam size.
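Since this summary touches on argmax (greedy) decoding and beam size, a minimal beam-search sketch may help fix the idea; the `step_log_probs` scoring function here is a hypothetical stand-in for a real model's next-token distribution.

```python
def beam_search(step_log_probs, start_token, end_token, beam_size=4, max_len=20):
    """Minimal beam search. `step_log_probs(prefix)` is a hypothetical
    callable returning a dict of token -> log-probability for the next
    step; with beam_size=1 this reduces to greedy (argmax) decoding."""
    beams = [([start_token], 0.0)]    # (partial sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # Keep only the beam_size best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == end_token else beams).append((seq, score))
        if not beams:                 # every surviving hypothesis has ended
            break
    finished.extend(beams)            # include hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])
```

Note that widening the beam does not monotonically improve quality: with large beams, the highest-scoring hypotheses are often degenerately short, which is one of the argmax-decoding issues the lecture refers to.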
Explores the Transformer model, tracing the shift from recurrent models to attention-based NLP and highlighting its key components and significant results in machine translation and document generation.
Explains the full architecture of Transformers and the self-attention mechanism, highlighting the paradigm shift toward fully pretrained models.
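As a companion to the self-attention discussion, here is a minimal single-head scaled dot-product attention in NumPy; the weight matrices `Wq`, `Wk`, `Wv` are illustrative parameters, not any particular model's.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # project to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # pairwise compatibility logits
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # each row sums to 1
    return weights @ V                             # attention-weighted mix of values
```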
Explores chemical reaction prediction using generative models and molecular transformers, emphasizing the importance of molecular language processing and stereochemistry.
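Because molecular language processing treats SMILES strings as sentences, tokenization is the first step; the regex below is a simplified sketch, not the exact pattern used by any particular molecular transformer.

```python
import re

# Simplified SMILES tokenizer: bracketed atoms, two-letter halogens,
# common organic-subset atoms, and bond/ring/branch symbols each
# become one token. Real systems use a more complete pattern.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#$()./\\@+%0-9]")

def tokenize_smiles(smiles):
    return SMILES_TOKEN.findall(smiles)

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
# -> ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', ..., ')', 'O']
```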
Covers the foundational concepts of deep learning and the Transformer architecture, focusing on neural networks, attention mechanisms, and their applications in sequence modeling tasks.
Explores coreference resolution models, challenges in scoring spans, graph refinement techniques, state-of-the-art results, and the impact of pretrained Transformers.
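One way to make the span-scoring challenge concrete: end-to-end neural coreference systems score a candidate pair of spans as the sum of two mention scores and a pairwise score, s(i, j) = s_m(i) + s_m(j) + s_a(i, j). The PyTorch sketch below follows that pattern; the single linear layers and dimensions are illustrative simplifications.

```python
import torch
import torch.nn as nn

class SpanPairScorer(nn.Module):
    """Sketch of span-pair scoring in the style of end-to-end neural
    coreference: s(i, j) = s_m(i) + s_m(j) + s_a(i, j)."""

    def __init__(self, span_dim: int):
        super().__init__()
        self.mention = nn.Linear(span_dim, 1)   # s_m: how mention-like is a span?
        self.pair = nn.Linear(2 * span_dim, 1)  # s_a: how compatible is the pair?

    def forward(self, span_i: torch.Tensor, span_j: torch.Tensor) -> torch.Tensor:
        s_a = self.pair(torch.cat([span_i, span_j], dim=-1))
        return self.mention(span_i) + self.mention(span_j) + s_a
```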
Explores the mathematics of language models, covering architecture design, pre-training, and fine-tuning, with emphasis on how these two training stages support a wide range of downstream tasks.
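To illustrate the pre-train/fine-tune split, here is a hedged PyTorch sketch: `pretrained_encoder` is a hypothetical stand-in for any pretrained Transformer body, and the new linear head is what fine-tuning adds for the downstream task.

```python
import torch
import torch.nn as nn

class FineTunedClassifier(nn.Module):
    """Sketch of fine-tuning: reuse a pretrained body, learn a new head.
    `pretrained_encoder` is hypothetical; assume it maps token ids of
    shape (batch, seq) to features of shape (batch, seq, hidden_dim)."""

    def __init__(self, pretrained_encoder: nn.Module, hidden_dim: int, num_classes: int):
        super().__init__()
        self.encoder = pretrained_encoder               # weights from pre-training
        self.head = nn.Linear(hidden_dim, num_classes)  # fresh, task-specific layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        features = self.encoder(token_ids)
        return self.head(features[:, 0])                # classify from first position
```

During fine-tuning, either all weights or, in a cheaper variant, only the new head are updated on the downstream task's labels.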