Provides an overview of Natural Language Processing, focusing on transformers, tokenization, and self-attention mechanisms for effective language analysis and synthesis.
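As a concrete reference for the self-attention mechanism mentioned above, here is a minimal NumPy sketch of scaled dot-product self-attention; the projection matrices and toy dimensions are illustrative assumptions rather than details drawn from the chapter.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention for one sequence.

    X: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_k) projection matrices (illustrative)
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # context-mixed values

# Toy example: 4 tokens, model dim 8, head dim 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)  # (4, 4)
```

Each output row is a weighted mixture of the value vectors, with weights given by the softmax of query-key similarities.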
Explores deep learning for NLP, covering word embeddings, contextual representations, and training techniques, along with challenges such as vanishing gradients, as well as ethical considerations.
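To make the word-embedding idea concrete, the sketch below performs an embedding lookup and compares words by cosine similarity; the vocabulary, dimension, and random vectors are hypothetical stand-ins for embeddings that would normally be learned from data.

```python
import numpy as np

# Hypothetical vocabulary; real embeddings are learned, not random.
vocab = ["king", "queen", "man", "woman", "apple"]
rng = np.random.default_rng(1)
E = rng.normal(size=(len(vocab), 50))  # (vocab_size, embedding_dim)
idx = {w: i for i, w in enumerate(vocab)}

def embed(word):
    # Row lookup: each word maps to one dense vector.
    return E[idx[word]]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# With trained embeddings, related words score higher than unrelated ones.
print(cosine(embed("king"), embed("queen")))
print(cosine(embed("king"), embed("apple")))
```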
Explores the Transformer model, tracing the shift from recurrent models to attention-based NLP and highlighting its key components and notable results in machine translation and document generation.
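Among the key components highlighted here, the sinusoidal positional encoding is compact enough to show in full; the following is a minimal NumPy sketch of the scheme from the original Transformer paper, with illustrative sequence length and model dimension.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Sinusoidal positional encodings as in the original Transformer:
    even dimensions use sine, odd dimensions use cosine, at geometrically
    spaced wavelengths, so each position gets a unique pattern."""
    pos = np.arange(seq_len)[:, None]            # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]         # (1, d_model / 2)
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Added to token embeddings so attention can use word order.
print(sinusoidal_positions(seq_len=50, d_model=64).shape)  # (50, 64)
```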
Explores the mathematics of language models, covering architecture design, pre-training, and fine-tuning, and showing how pre-trained models are adapted to a variety of downstream tasks.
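As a sketch of the fine-tuning step, the PyTorch snippet below attaches a fresh classification head to a stand-in "pre-trained" encoder and runs one gradient update; the tiny architecture, learning rate, and toy batch are illustrative assumptions, not the chapter's setup.

```python
import torch
import torch.nn as nn

# Stand-in "pre-trained" encoder; in practice this comes from large-scale
# self-supervised training (e.g. masked-language-model pre-training).
encoder = nn.Sequential(
    nn.Embedding(1000, 64),   # token ids -> embeddings
    nn.Flatten(1),            # (batch, 16, 64) -> (batch, 1024)
    nn.Linear(64 * 16, 64),   # pooled sequence representation
)

# Fine-tuning: attach a fresh task head and train on labeled data.
head = nn.Linear(64, 2)  # e.g. binary sentiment classification
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(head.parameters()), lr=2e-5
)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 1000, (8, 16))  # toy batch: 8 sequences of 16 ids
labels = torch.randint(0, 2, (8,))

logits = head(encoder(tokens))
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```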
Delves into Deep Learning for Natural Language Processing, exploring Neural Word Embeddings, Recurrent Neural Networks, and Attentive Neural Modeling with Transformers.
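Since recurrent networks are covered here, below is a minimal NumPy sketch of a single vanilla (Elman) RNN step with illustrative dimensions; unrolling many such tanh steps is exactly where the vanishing-gradient problem mentioned earlier arises.

```python
import numpy as np

def rnn_step(h, x, W_hh, W_xh, b):
    # One step of a vanilla RNN: the new state mixes the previous
    # state with the current input through a tanh nonlinearity.
    return np.tanh(h @ W_hh + x @ W_xh + b)

rng = np.random.default_rng(2)
hidden, inp = 16, 8
W_hh = rng.normal(scale=0.5, size=(hidden, hidden))
W_xh = rng.normal(scale=0.5, size=(inp, hidden))
b = np.zeros(hidden)

# Unroll over a toy sequence; gradients flowing back through many tanh
# steps shrink, which motivates gating (LSTM/GRU) and, later, attention.
h = np.zeros(hidden)
for x in rng.normal(size=(20, inp)):
    h = rnn_step(h, x, W_hh, W_xh, b)
print(h.shape)  # (16,)
```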
Explores chemical reaction prediction using generative models and molecular transformers, emphasizing the importance of molecular language processing and stereochemistry.
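To illustrate what "molecular language processing" looks like in practice, the sketch below tokenizes a SMILES string with a regular expression in the style used in molecular-transformer work; the exact pattern is an illustrative assumption, simplified from commonly used ones.

```python
import re

# Illustrative SMILES token pattern: bracketed atoms, two-letter halogens,
# single-letter atoms, bonds, branches, and ring-closure digits.
SMILES_TOKENS = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFIbcnosp]|\(|\)|\.|=|#|-|\+|/|\\|@|%\d{2}|\d)"
)

def tokenize_smiles(smiles):
    # Returns the SMILES string as a sequence of chemically meaningful tokens.
    return SMILES_TOKENS.findall(smiles)

# Aspirin as a toy example; the two "1" tokens are a matched ring closure.
print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
# ['C', 'C', '(', '=', 'O', ')', 'O', 'c', '1', 'c', 'c', 'c', 'c', 'c',
#  '1', 'C', '(', '=', 'O', ')', 'O']
```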
Explains the full Transformer architecture and the self-attention mechanism, highlighting the paradigm shift toward building on fully pretrained models.
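To ground the pretrained-model paradigm, this sketch loads a pretrained encoder with the Hugging Face transformers library and extracts contextual token embeddings; the bert-base-uncased checkpoint is just one common choice.

```python
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"  # any encoder checkpoint on the Hub works
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("Transformers replaced recurrence with attention.",
                   return_tensors="pt")
outputs = model(**inputs)
# Contextual embeddings: one vector per input token.
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```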