Explores decoding from neural models in modern NLP, covering encoder-decoder models, decoding algorithms, issues with argmax decoding, and the impact of beam size.
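Since this summary touches on argmax (greedy) decoding and beam size, a minimal beam-search sketch may help fix the idea; the `step_log_probs` scoring function here is a hypothetical stand-in for a real model's next-token distribution.

```python
def beam_search(step_log_probs, start_token, end_token, beam_size=4, max_len=20):
    """Minimal beam search. `step_log_probs(prefix)` is a hypothetical
    callable returning a dict of token -> log-probability for the next
    step; with beam_size=1 this reduces to greedy (argmax) decoding."""
    beams = [([start_token], 0.0)]    # (partial sequence, cumulative log-prob)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            for tok, lp in step_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        # Keep only the beam_size best partial hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == end_token else beams).append((seq, score))
        if not beams:                 # every surviving hypothesis has ended
            break
    finished.extend(beams)            # include hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])
```

Note that widening the beam does not monotonically improve quality: with large beams, the highest-scoring hypotheses are often degenerately short, which is one of the argmax-decoding issues the lecture refers to.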
Explores the Transformer model, tracing the shift from recurrent models to attention-based NLP and highlighting its key components and significant results in machine translation and document generation.
Explains the full architecture of Transformers and the self-attention mechanism, highlighting the paradigm shift toward fully pretrained models.
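As a companion to the self-attention discussion, here is a minimal single-head scaled dot-product attention in NumPy; the weight matrices `Wq`, `Wk`, `Wv` are illustrative parameters, not any particular model's.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention:
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv               # project to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # pairwise compatibility logits
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # each row sums to 1
    return weights @ V                             # attention-weighted mix of values
```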
Explores chemical reaction prediction using generative models and molecular transformers, emphasizing the importance of molecular language processing and stereochemistry.
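Because molecular language processing treats SMILES strings as sentences, tokenization is the first step; the regex below is a simplified sketch, not the exact pattern used by any particular molecular transformer.

```python
import re

# Simplified SMILES tokenizer: bracketed atoms, two-letter halogens,
# common organic-subset atoms, and bond/ring/branch symbols each
# become one token. Real systems use a more complete pattern.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#$()./\\@+%0-9]")

def tokenize_smiles(smiles):
    return SMILES_TOKEN.findall(smiles)

print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
# -> ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', ..., ')', 'O']
```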
Covers the foundational concepts of deep learning and the Transformer architecture, focusing on neural networks, attention mechanisms, and their applications in sequence modeling tasks.
Explores coreference resolution models, challenges in scoring spans, graph refinement techniques, state-of-the-art results, and the impact of pretrained Transformers.
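One way to make the span-scoring challenge concrete: end-to-end neural coreference systems score a candidate pair of spans as the sum of two mention scores and a pairwise score, s(i, j) = s_m(i) + s_m(j) + s_a(i, j). The PyTorch sketch below follows that pattern; the single linear layers and dimensions are illustrative simplifications.

```python
import torch
import torch.nn as nn

class SpanPairScorer(nn.Module):
    """Sketch of span-pair scoring in the style of end-to-end neural
    coreference: s(i, j) = s_m(i) + s_m(j) + s_a(i, j)."""

    def __init__(self, span_dim: int):
        super().__init__()
        self.mention = nn.Linear(span_dim, 1)   # s_m: how mention-like is a span?
        self.pair = nn.Linear(2 * span_dim, 1)  # s_a: how compatible is the pair?

    def forward(self, span_i: torch.Tensor, span_j: torch.Tensor) -> torch.Tensor:
        s_a = self.pair(torch.cat([span_i, span_j], dim=-1))
        return self.mention(span_i) + self.mention(span_j) + s_a
```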
Explores the mathematics of language models, covering architecture design, pre-training, and fine-tuning, with emphasis on how these two training stages support a wide range of downstream tasks.
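To illustrate the pre-train/fine-tune split, here is a hedged PyTorch sketch: `pretrained_encoder` is a hypothetical stand-in for any pretrained Transformer body, and the new linear head is what fine-tuning adds for the downstream task.

```python
import torch
import torch.nn as nn

class FineTunedClassifier(nn.Module):
    """Sketch of fine-tuning: reuse a pretrained body, learn a new head.
    `pretrained_encoder` is hypothetical; assume it maps token ids of
    shape (batch, seq) to features of shape (batch, seq, hidden_dim)."""

    def __init__(self, pretrained_encoder: nn.Module, hidden_dim: int, num_classes: int):
        super().__init__()
        self.encoder = pretrained_encoder               # weights from pre-training
        self.head = nn.Linear(hidden_dim, num_classes)  # fresh, task-specific layer

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        features = self.encoder(token_ids)
        return self.head(features[:, 0])                # classify from first position
```

During fine-tuning, either all weights or, in a cheaper variant, only the new head are updated on the downstream task's labels.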