Intermediate

Transformer Architecture & Attention

Understand the transformer architecture that powers modern LLMs. Learn about self-attention, multi-head attention, and the encoder-decoder structure fundamental to NLP.

Estimated Time: 30 hours

Introduction

This module covers the transformer architecture that powers modern LLMs. You will study self-attention, multi-head attention, and the encoder-decoder structure fundamental to NLP, and practice with transformer implementations in PyTorch.

4 lessons · 30h estimated time · 4 objectives · 1 assessment

By completing this module you will be able to:

Understand the self-attention mechanism and its computational benefits
Grasp the complete transformer architecture
Learn about positional encoding and tokenization
Work with transformer implementations in PyTorch
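To preview the core idea behind the first objective, here is a minimal sketch of scaled dot-product self-attention. The lessons work in PyTorch; NumPy is used here only to keep the illustration dependency-light, and the matrix shapes and weight names are illustrative assumptions, not code from the course.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Project the same input into Query, Key, and Value spaces
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product: how strongly each query matches every key
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v                  # attention-weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # 4 tokens, model dimension 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Because every token attends to every other token in a single matrix product, the whole sequence can be processed in parallel, which is the computational benefit the first objective refers to.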

Lessons

Work through each lesson in order. Each one builds on the concepts from the previous lesson.

1. The Transformer Architecture (60 min)
2. BERT and Encoder Models (55 min)
3. GPT and Decoder Models (55 min)
4. Hugging Face Transformers Library (50 min)

Recommended Reading

Supplement your learning with these selected chapters from the course library.

📖 Transformers for NLP and Computer Vision 3e (Chapters 1-6)
📖 Mastering NLP from Foundations to LLMs (Chapters 5-8)

Module Assessment

Transformer Architecture & Attention

Question 1 of 3

In self-attention, what are the Query, Key, and Value matrices used for?