Intermediate

Multimodal Prompting

Learn to work with vision-language models that process images and text. Understand image prompting techniques and multimodal analysis.

Estimated Time: 18 hours

Introduction

4 Lessons
18h Est. Time
4 Objectives
1 Assessment

By completing this module you will be able to:

Write effective prompts for image analysis
Use vision-language models for diverse tasks
Combine text and image inputs effectively
Optimize prompts for vision models

Lessons

Work through each lesson in order. Each one builds on the concepts from the previous lesson.

1

Vision Prompting — Working with Images

45 min

2

Audio, Video, and Document Prompting

45 min

3

Structured Output from Multimodal Inputs

50 min

4

Building Multimodal Applications

50 min

Recommended Reading

Supplement your learning with these selected chapters from the course library.

Visualizing Generative AI

Chapters 1-6

Module Assessment

Multimodal Prompting

Question 1 of 3

How do vision-language models process images compared to text?