Generative AI Course

Week 2: Multimodal Generation

Build image and vision-powered workflows with diffusion models, adapters, and multimodal prompting.

Duration: 5 Sessions

Labs: 4

Project: Brand Asset Generator

Week Plan

Day 1: Diffusion Fundamentals
Day 2: Prompting for Images
Day 3: ControlNet and LoRA
Day 4: Vision-Language Workflows
Day 5: Studio Build Day

DAY 1

Diffusion Fundamentals

90 mins lecture, 30 mins concept map

Topics

How noise schedules and denoising loops work
Latent diffusion versus pixel diffusion
Sampling strategies and tradeoffs
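
To make the noise-schedule topic concrete, here is a minimal sketch of a linear beta schedule and the closed-form forward noising step used in DDPM-style models. The range (1e-4 to 0.02 over 1000 steps) is a commonly used default and is assumed here for illustration; it is not taken from the course material, and the function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances rising linearly from beta_start to beta_end."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to step t (signal fraction left)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(x0, t, alpha_bars, rng):
    """Closed-form forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]

betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
pixels = [0.5, -0.25, 1.0]  # toy stand-in for an image or latent
slightly_noisy = add_noise(pixels, 10, alpha_bars, rng)   # early step: mostly signal
mostly_noise = add_noise(pixels, 999, alpha_bars, rng)    # final step: almost pure noise
```

The denoising loop runs this in reverse: starting from noise, the model predicts and removes a little noise at each step. Fewer sampling steps are faster but give the model fewer chances to refine the result.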

Diffusion Pipeline

Text Prompt → Tokenizer → UNet Steps → Image

DAY 2

Prompting for Images

70 mins lecture, 50 mins guided practice

Prompt Design

Subject, style, camera, lighting, and quality tokens
Negative prompts for cleaner outputs
Prompt libraries and reusable style profiles

prompt

portrait of a startup founder, cinematic light, 85mm lens,
soft shadows, neutral background, photorealistic, ultra detailed
negative: blurry, watermark, extra fingers, low quality
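
One way to turn the example prompt above into a reusable style profile is to keep each section as a separate field and join them at generation time. This is a sketch only: the `PromptTemplate` class and its field names are hypothetical, and the order in which sections are joined is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PromptTemplate:
    """Hypothetical container for the prompt sections listed under Prompt Design."""
    subject: str
    style: List[str] = field(default_factory=list)
    camera: List[str] = field(default_factory=list)
    lighting: List[str] = field(default_factory=list)
    quality: List[str] = field(default_factory=list)
    negative: List[str] = field(default_factory=list)

    def positive_prompt(self) -> str:
        # Join the non-empty sections in a fixed order so outputs stay comparable.
        parts = [self.subject] + self.style + self.camera + self.lighting + self.quality
        return ", ".join(p.strip() for p in parts if p.strip())

    def negative_prompt(self) -> str:
        return ", ".join(n.strip() for n in self.negative if n.strip())

founder = PromptTemplate(
    subject="portrait of a startup founder",
    style=["neutral background", "photorealistic"],
    camera=["85mm lens"],
    lighting=["cinematic light", "soft shadows"],
    quality=["ultra detailed"],
    negative=["blurry", "watermark", "extra fingers", "low quality"],
)

print(founder.positive_prompt())
print("negative:", founder.negative_prompt())
```

Swapping only the `subject` field while keeping the other blocks fixed is a simple way to reuse one style profile across many images.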

DAY 3

ControlNet and LoRA Adapters

80 mins lecture, 40 mins coding

Adapter Techniques

Pose, depth, edge, and segmentation controls
LoRA for style transfer with low compute cost
Reusable checkpoints and versioned experiment tracking
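
To see why LoRA is cheap, it helps to work through the arithmetic. The sketch below assumes the usual LoRA formulation, where a frozen weight matrix W gets a low-rank update W' = W + (alpha / rank) * B @ A. The 768 x 768 layer size and rank 8 are illustrative values, not course settings, and the helper names are hypothetical.

```python
def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters for full fine-tuning versus a rank-`rank` LoRA adapter."""
    full = d_out * d_in              # every entry of W is trainable
    lora = rank * (d_out + d_in)     # B is d_out x rank, A is rank x d_in
    return full, lora

def matmul(a, b):
    """Plain-Python matrix product, enough for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def apply_lora(weight, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * B @ A without modifying the original weight."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

full, lora = lora_param_counts(768, 768, 8)
print(f"full: {full}, LoRA: {lora}, ratio: {lora / full:.2%}")

# Toy 2 x 2 layer with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [2.0]]    # 2 x 1
lora_a = [[0.5, 0.0]]      # 1 x 2
adapted = apply_lora(weight, lora_b, lora_a, alpha=1.0, rank=1)
```

Because the base weight is never overwritten, the same checkpoint can be paired with several small adapters, which is what makes versioned style experiments practical.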

DAY 4

Vision-Language Workflows

70 mins lecture, 50 mins lab

Practical Patterns

Image captioning and OCR + summarization loops
Visual Q&A over product screenshots and documents
Safety checks for generated visuals
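
The captioning and OCR + summarization loop can be sketched as a small function that accepts the three model calls as parameters. Everything here is an assumption for illustration: `describe_image`, `caption_fn`, `ocr_fn`, and `summarize_fn` are hypothetical names, and the lambdas below are stand-ins for whatever captioning, OCR, and language models you actually use.

```python
from typing import Callable, Dict

def describe_image(
    image_bytes: bytes,
    caption_fn: Callable[[bytes], str],
    ocr_fn: Callable[[bytes], str],
    summarize_fn: Callable[[str], str],
) -> Dict[str, str]:
    """Run caption and OCR on one image, then summarize the combined text."""
    caption = caption_fn(image_bytes).strip()
    ocr_text = ocr_fn(image_bytes).strip()
    combined = f"Caption: {caption}\nVisible text: {ocr_text or '(none)'}"
    return {
        "caption": caption,
        "ocr_text": ocr_text,
        "summary": summarize_fn(combined),
    }

# Stand-in functions so the flow can be exercised without any model.
result = describe_image(
    b"fake-image-bytes",
    caption_fn=lambda img: "A blue water bottle on a desk ",
    ocr_fn=lambda img: "HYDRATE 750ml",
    summarize_fn=lambda text: text.splitlines()[0],
)
print(result["summary"])
```

Passing the model calls in as functions keeps the flow testable and lets you swap providers without touching the loop itself.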

DAY 5

Studio Build Day

150 mins implementation

Hands-On Labs

Lab 6: Generate campaign visuals for three brand styles
Lab 7: Build a prompt-to-poster mini app
Lab 8: Add pose control with ControlNet
Lab 9: Build a multimodal image critique assistant

Week 2 Outcomes

Generate consistent visual outputs with reliable prompting
Apply adapter-based customization for brand-safe assets
Create a multimodal mini-product ready for demos

GUIDED PATH

Beginner Walkthrough: From Prompt to Visual Product

Step-by-step, portfolio focused

Simple explanation of this week

This week teaches you how to generate and control images using text instructions. Instead of just typing random prompts, you will learn a repeatable method: write a clear scene prompt, add quality/style details, include negative prompts, then tune settings until outputs are usable.

Daily workflow (2 to 3 hours/day)

Day 1: Generate 20 images from the same prompt using different steps and samplers. Record differences.
Day 2: Build a reusable prompt template with sections: subject, setting, style, camera, quality, negative.
Day 3: Apply one control method (pose, edge, or depth) and compare controlled versus uncontrolled output.
Day 4: Build a basic vision-language flow: image input, caption extraction, then rewrite caption into marketing copy.
Day 5: Combine everything into one mini studio where the user enters a brand style and gets 3 campaign images.
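
For the Day 1 exercise, it helps to plan the 20 runs up front and log them in one place. The sketch below builds a 5 x 4 grid of step counts and samplers with a fixed seed and writes it as CSV. The step counts and sampler labels are illustrative assumptions; use whatever names your generation tool exposes.

```python
import csv
import io
import itertools

STEP_COUNTS = [10, 20, 30, 40, 50]                            # assumed values
SAMPLERS = ["euler", "euler_ancestral", "ddim", "dpm_solver"]  # illustrative labels

def build_runs(prompt, seed=1234):
    """One run per (steps, sampler) pair; the seed stays fixed so only settings vary."""
    grid = itertools.product(STEP_COUNTS, SAMPLERS)
    return [
        {"run_id": i, "prompt": prompt, "seed": seed, "steps": steps, "sampler": sampler}
        for i, (steps, sampler) in enumerate(grid, start=1)
    ]

def runs_to_csv(runs):
    """Serialize runs to CSV text with an empty notes column for observations."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(runs[0].keys()) + ["notes"])
    writer.writeheader()
    for run in runs:
        writer.writerow({**run, "notes": ""})
    return buffer.getvalue()

runs = build_runs("portrait of a startup founder")
log_text = runs_to_csv(runs)
print(len(runs), "runs planned")
```

Keeping the seed constant matters: if the seed changes between runs, you cannot tell whether a difference came from the sampler, the step count, or the random starting noise.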

Common mistakes and fixes

Mistake: Vague prompts. Fix: Use concrete nouns, style, lens, and lighting details.
Mistake: Distorted hands/faces. Fix: Add negative prompts and increase guidance strength in small steps, since very high values can introduce artifacts of their own.
Mistake: Inconsistent brand style. Fix: Save and reuse template blocks for all outputs.
Mistake: Overprocessing. Fix: Keep one change per experiment so results are traceable.
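
The "one change per experiment" rule is easy to check automatically. Here is a minimal sketch that compares two settings dictionaries and reports which keys differ. The setting names (`steps`, `sampler`, `guidance_scale`, `seed`) are assumed for illustration.

```python
_MISSING = object()  # sentinel so a missing key differs from a key set to None

def changed_keys(baseline, variant):
    """Return the sorted list of settings that differ between two experiment configs."""
    keys = set(baseline) | set(variant)
    return sorted(
        k for k in keys
        if baseline.get(k, _MISSING) != variant.get(k, _MISSING)
    )

def is_single_change(baseline, variant):
    """True when exactly one setting differs, so the result stays traceable."""
    return len(changed_keys(baseline, variant)) == 1

baseline = {"steps": 30, "sampler": "euler", "guidance_scale": 7.5, "seed": 42}
more_steps = {**baseline, "steps": 50}
two_changes = {**baseline, "steps": 50, "sampler": "ddim"}

print(changed_keys(baseline, more_steps), is_single_change(baseline, more_steps))
print(changed_keys(baseline, two_changes), is_single_change(baseline, two_changes))
```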

Assignment to complete Week 2

Create a Brand Asset Generator that takes three inputs: brand personality, product type, and campaign tone. Output at least 6 final images in two style groups. Include your prompt templates and explain how you improved output quality.
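
As a starting point for the assignment, the three inputs can be modeled as a small record that expands into six generation jobs across two style groups. This is a sketch under assumptions: the `BrandBrief` class, the two style groups, and their descriptions are hypothetical placeholders, and the actual image generation call is left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BrandBrief:
    """The three assignment inputs."""
    personality: str
    product_type: str
    campaign_tone: str

# Hypothetical style groups; replace with your own template blocks.
STYLE_GROUPS = {
    "studio_photo": "studio product photography, softbox lighting, 50mm lens",
    "flat_illustration": "flat vector illustration, bold shapes, limited palette",
}

def plan_jobs(brief, images_per_group=3, base_seed=100):
    """Expand a brief into one job per image, each with its own seed."""
    jobs = []
    for group_index, (group, style) in enumerate(STYLE_GROUPS.items()):
        for i in range(images_per_group):
            jobs.append({
                "group": group,
                "seed": base_seed + group_index * images_per_group + i,
                "prompt": (
                    f"{brief.product_type} for a {brief.personality} brand, "
                    f"{brief.campaign_tone} mood, {style}"
                ),
            })
    return jobs

brief = BrandBrief("playful", "reusable water bottle", "energetic")
jobs = plan_jobs(brief)
for job in jobs:
    print(job["group"], job["seed"], job["prompt"])
```

Recording the seed with each job means any final image can be regenerated later, which also gives you material for the before/after examples in the submission.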

Submission checklist

Prompt template file with reusable blocks
Before/after examples showing quality improvements
At least one controlled generation method (ControlNet or equivalent)
Short notes on ethical concerns and content safety filters used
Demo output set grouped by style

Pass rubric

Control: You can intentionally change composition/style on demand
Consistency: At least 80% of outputs match the requested brand style
Process: You document experiments and decisions clearly
Usability: Final assets are good enough for a sample campaign
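
The consistency criterion is simple arithmetic, and a short sketch makes it concrete. It assumes you have manually marked each output as on-brand or not; the function names are hypothetical.

```python
def consistency_rate(on_brand_flags):
    """Fraction of outputs judged to match the requested brand style."""
    if not on_brand_flags:
        raise ValueError("need at least one reviewed output")
    return sum(1 for flag in on_brand_flags if flag) / len(on_brand_flags)

def passes_consistency(on_brand_flags, threshold=0.8):
    """True when the on-brand share meets the rubric threshold."""
    return consistency_rate(on_brand_flags) >= threshold

five_of_six = [True, True, True, False, True, True]
four_of_six = [True, True, False, False, True, True]

print(f"{consistency_rate(five_of_six):.0%}", passes_consistency(five_of_six))
print(f"{consistency_rate(four_of_six):.0%}", passes_consistency(four_of_six))
```

With the minimum of six final images, five must be on-brand to pass, since four of six is only about 67%.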
