Intermediate

Transformer Architecture & Attention

Understand the transformer architecture that powers modern LLMs. Learn about self-attention, multi-head attention, and the encoder-decoder structure fundamental to NLP.

Estimated Time: 30 hours

Introduction

This module covers the transformer architecture that powers modern LLMs. You will study self-attention, multi-head attention, and the encoder-decoder structure fundamental to NLP, and practice with transformer implementations in PyTorch.

4 lessons · 30h estimated time · 4 objectives · 1 assessment

By completing this module you will be able to:

Understand the self-attention mechanism and its computational benefits
Grasp the complete transformer architecture
Learn about positional encoding and tokenization
Work with transformer implementations in PyTorch
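To preview the core idea behind the first objective, here is a minimal sketch of scaled dot-product self-attention. The lessons work in PyTorch; NumPy is used here only to keep the illustration dependency-light, and the matrix shapes and weight names are illustrative assumptions, not code from the course.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    # Project the same input into Query, Key, and Value spaces
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # Scaled dot-product: how strongly each query matches every key
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v                  # attention-weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))             # 4 tokens, model dimension 8
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Because every token attends to every other token in a single matrix product, the whole sequence can be processed in parallel, which is the computational benefit the first objective refers to.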

Lessons

Work through each lesson in order. Each one builds on the concepts from the previous lesson.

1. The Transformer Architecture (60 min)
2. BERT and Encoder Models (55 min)
3. GPT and Decoder Models (55 min)
4. Hugging Face Transformers Library (50 min)

Recommended Reading

Supplement your learning with these selected chapters from the course library.

📖 Transformers for NLP and Computer Vision 3e (Chapters 1-6)
📖 Mastering NLP from Foundations to LLMs (Chapters 5-8)

Module Assessment

Transformer Architecture & Attention

Question 1 of 3

In self-attention, what are the Query, Key, and Value matrices used for?