AI Video Production, Module 2

2.2 Visual Generation: Imagen, Midjourney & Stock AI



Your voiceover is invisible audio; your visuals are the canvas audiences stare at for 6+ minutes. Weak visuals equal early drop-offs. Strong visuals equal a 5-minute average view duration and a CPM that premium advertisers will pay for. Pakistani creators who master visual sourcing and AI generation earn 40% higher CPM because premium brands — banks, telcos, FMCG — want to advertise on production-quality content. This lesson teaches you to generate, source, and coordinate visuals so every frame reinforces your message.

The Visual Pipeline: From Script to Screen

Understanding where your visuals come from — and in what sequence to acquire them — saves 2+ hours per video. Do not source footage randomly. Follow the pipeline.

VISUAL PRODUCTION PIPELINE
════════════════════════════

SCRIPT LOCKED
     │
     ▼
SHOT LIST CREATED (15 min)
│    ├── Every script sentence → 1 visual descriptor
│    ├── Tag each visual: [STOCK] [AI-GEN] [SCREEN] [GRAPHIC]
│    └── Estimate clip duration needed
     │
     ▼
PARALLEL SOURCING (run simultaneously)
│
├── STOCK FOOTAGE SEARCH
│   ├── Pexels / Pixabay → generic scenes
│   ├── Envato Elements / Shutterstock → premium niche clips
│   └── Download + rename: scene_01_pakistan_city.mp4
│
├── AI VISUAL GENERATION
│   ├── Runway AI → 4-sec video clips for unique B-roll
│   ├── Imagen 4.0 → key frame images + thumbnails
│   └── Save to: /visuals/ai-generated/
│
└── SCREEN RECORDINGS
    ├── OBS Studio → UI walkthroughs, data demos
    ├── Canva / After Effects → animated charts
    └── Save to: /visuals/screen/
     │
     ▼
QUALITY CHECK (10 min)
│    ├── All clips HD? (1080p minimum)
│    ├── Color palette consistent?
│    └── Every script sentence has a visual?
     │
     ▼
IMPORT TO CAPCUT
     └── Organized by scene number → drag-drop to timeline

Sourcing in parallel — stock, AI, and screen simultaneously — compresses a 3-hour process to under 90 minutes. The shot list is the master document that makes parallel sourcing possible.
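If you want to automate the rename-and-file step, a minimal Python sketch could look like the one below. It assumes the scene_NN_description.mp4 naming pattern and the ai-generated and screen folders shown in the pipeline; the stock and graphics folder names, and both function names, are hypothetical and only there for illustration.

```python
import re
from pathlib import Path

# Folder per shot-list tag. "ai-generated" and "screen" come from the pipeline;
# "stock" and "graphics" are assumed names, chosen by analogy.
SOURCE_FOLDERS = {
    "STOCK": "stock",
    "AI-GEN": "ai-generated",
    "SCREEN": "screen",
    "GRAPHIC": "graphics",
}


def clip_filename(scene_number: int, description: str, extension: str = "mp4") -> str:
    """Build a name like scene_01_pakistan_city.mp4."""
    # Lowercase the description and collapse anything non-alphanumeric into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"scene_{scene_number:02d}_{slug}.{extension}"


def clip_path(root: Path, tag: str, scene_number: int, description: str) -> Path:
    """Return where a clip should be saved, grouped by its shot-list tag."""
    return root / SOURCE_FOLDERS[tag] / clip_filename(scene_number, description)


if __name__ == "__main__":
    print(clip_filename(1, "Pakistan city"))
    print(clip_path(Path("visuals"), "AI-GEN", 7, "Badshahi Mosque sunset"))
```

Zero-padding the scene number keeps clips sorted in shot order when you import the folder into your editor.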

Visual Source Comparison: Cost, Quality, Speed

Every visual type has trade-offs. Knowing which to use when is what separates amateur channels from professional ones.

| Source | Cost | Quality Ceiling | Time per Clip | Uniqueness | Best Use Case |
|---|---|---|---|---|---|
| Pexels (free) | PKR 0 | 1080p | 2–3 min | Low (millions use same clips) | Generic scenes: cities, nature, crowds |
| Pixabay (free) | PKR 0 | 1080p | 2–3 min | Low | B-roll filler, transitions |
| Envato Elements | USD 16.50/mo (PKR 4,600) | 4K | 3–5 min | Medium | Niche professional footage |
| Shutterstock | USD 29/mo (PKR 8,100) | 4K | 3–5 min | Medium | Finance, business, tech niches |
| Runway AI | USD 12/mo (PKR 3,350) | 1080p cinematic | 30 sec–2 min | Very High | Unique B-roll, concept visualization |
| Imagen 4.0 | Free (50 images/day via AI Studio) | Photorealistic | 5–15 sec | High | Thumbnails, key frames, portraits |
| OBS (screen rec) | Free | Native resolution | 10–30 min | Very High (your content) | Tutorials, product demos, data |
| Canva Pro | USD 10/mo (PKR 2,800) | 1080p vectors | 10–20 min | High | Animated charts, graphics, text overlays |

Budget recommendation for Pakistani creators:

  • Month 1 (starting out): Pexels + Pixabay + Imagen 4.0 + OBS = PKR 0
  • Month 2 (scaling): Add Runway AI = PKR 3,350/month (pays back with first sponsorship)
  • Month 3+ (professional): Full stack (Runway + Envato + Canva Pro) = PKR 10,750/month total

Even the full professional stack costs less than one hour of a freelance video editor's time in Karachi.
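To check the monthly totals yourself, or to test a different mix of tools, a short Python sketch is enough. The prices are the PKR figures quoted in this lesson; the stack names and the stack_cost helper are hypothetical.

```python
# Monthly prices in PKR, as quoted in this lesson's comparison table.
MONTHLY_COST_PKR = {
    "Pexels": 0,
    "Pixabay": 0,
    "Imagen 4.0": 0,
    "OBS": 0,
    "Runway AI": 3_350,
    "Envato Elements": 4_600,
    "Shutterstock": 8_100,
    "Canva Pro": 2_800,
}

FREE_TOOLS = ["Pexels", "Pixabay", "Imagen 4.0", "OBS"]

# The three stages from the budget recommendation.
STACKS = {
    "month_1": FREE_TOOLS,
    "month_2": FREE_TOOLS + ["Runway AI"],
    "month_3_plus": FREE_TOOLS + ["Runway AI", "Envato Elements", "Canva Pro"],
}


def stack_cost(tools: list[str]) -> int:
    """Sum the monthly PKR cost of the given tools."""
    return sum(MONTHLY_COST_PKR[tool] for tool in tools)


if __name__ == "__main__":
    for stage, tools in STACKS.items():
        print(f"{stage}: PKR {stack_cost(tools):,}/month")
```

Swapping Envato Elements for Shutterstock in the last stack shows how quickly a different choice changes the monthly bill.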

Stock Footage: The Professional Search Method

Free sites (Pexels, Pixabay) have 10 million+ clips but are repetitive — every channel uses the same "person typing on laptop" footage. The key is searching with 5–7 descriptors, not 1–2.

Bad search: "business"
Professional search: "Pakistani entrepreneur working laptop focused natural light side angle"

The second search returns far more relevant footage, and that footage is less likely to appear on your competitor's channel.

Search strategy by niche:

SEARCH KEYWORD FORMULAS BY NICHE
══════════════════════════════════

Finance / Economy:
"[currency / chart / graph] [animated / close-up / dynamic]
 [professional / cinematic] [warm light / cool tone]"

Tech / AI:
"[code screen / server room / circuit board] [bokeh / close-up]
 [blue neon / dark background] [4K / cinematic]"

Freelancing / Career:
"[person working / focused / laptop] [home office / cafe / window]
 [natural light / side angle] [Pakistani / South Asian aesthetic]"

Health / Lifestyle:
"[person exercising / cooking / walking] [outdoor / bright]
 [diverse / South Asian] [warm golden hour]"
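One way to enforce the 5–7 descriptor rule is a small helper that refuses queries that are too short or too long. This is only a sketch: build_search_query is a hypothetical name, and treating multi-word phrases such as "natural light" as a single descriptor is an assumption.

```python
def build_search_query(*descriptors: str, min_terms: int = 5, max_terms: int = 7) -> str:
    """Join descriptors into one stock-site search string.

    Raises ValueError when the number of descriptors falls outside
    the 5-7 range recommended in this lesson.
    """
    # Ignore empty strings and stray whitespace.
    terms = [d.strip() for d in descriptors if d.strip()]
    if not min_terms <= len(terms) <= max_terms:
        raise ValueError(
            f"expected {min_terms}-{max_terms} descriptors, got {len(terms)}"
        )
    return " ".join(terms)


if __name__ == "__main__":
    print(build_search_query(
        "Pakistani", "entrepreneur", "working", "laptop",
        "focused", "natural light", "side angle",
    ))
```

Calling build_search_query("business") raises an error, which is the point: a one-word search is rejected before you run it.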

Timing trick: Research visual patterns BEFORE writing scripts. Watch 20 videos in your niche and note which visual patterns get the highest comment engagement, then reverse-engineer those patterns. If finance channels use dramatic slow-motion money close-ups, your audience has been trained to expect that aesthetic, so deliver it.

AI Visual Generation: Runway AI Deep Dive

Runway AI (USD 12/month) generates 4-second video clips from text prompts. This is your competitive edge — your competitors use generic stock; you use bespoke cinematic footage.

Runway's strength is concept visualization. If your script covers blockchain, AI, or abstract economic forces, no stock footage library has what you need. Runway creates it.

Runway prompt formula:

RUNWAY PROMPT STRUCTURE
════════════════════════

[SUBJECT] + [LOCATION/SETTING] + [CAMERA ANGLE] +
[LIGHTING] + [MOVEMENT] + [MOOD/AESTHETIC]

Example 1 — Pakistan Location B-roll:
"Cinematic aerial footage of Lahore's Badshahi Mosque at sunset,
drone perspective slowly descending, warm golden hour light,
slight lens flare, cinematic color grade, 4K quality"

Example 2 — Abstract Concept:
"Glowing network of nodes and connections spreading across a dark
background, blue and gold particles, smooth camera pullback,
representing AI data flow, hyperrealistic digital art style"

Example 3 — Finance / Economy:
"Extreme close-up of Pakistani rupee notes fanning out on a wooden
desk, shallow depth of field, warm candlelight, slow zoom out,
cinematic grain, moody and dramatic"

Example 4 — Freelancer Lifestyle:
"Young South Asian woman working on laptop in a bright Karachi
apartment, morning light streaming through windows, focused
expression, steaming tea on desk, documentary aesthetic"

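Runway cost math: A 6-minute video needs 6–8 unique AI-generated clips. At Runway's standard tier (USD 12/month = ~125 seconds of video), one second of footage costs roughly USD 0.10, so each 4-second clip costs approximately USD 0.38 in credits. Total: about USD 2.30–3.10 per video in AI visual costs. The monthly allowance covers about 31 clips, or roughly four to five videos.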
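The per-clip and per-video figures follow directly from the plan price and the seconds it includes, so they are easy to recompute when pricing changes. The sketch below assumes the figures quoted in this lesson (USD 12 for roughly 125 seconds, 4-second clips) and ignores discarded or failed generations; real credit pricing may differ, and all names are hypothetical.

```python
PLAN_PRICE_USD = 12.0   # monthly price quoted in this lesson
PLAN_SECONDS = 125      # approximate seconds of video included (assumed)
CLIP_SECONDS = 4        # clip length quoted in this lesson


def cost_per_second() -> float:
    """USD cost of one second of generated video."""
    return PLAN_PRICE_USD / PLAN_SECONDS


def cost_per_clip(clip_seconds: int = CLIP_SECONDS) -> float:
    """USD cost of one clip of the given length."""
    return cost_per_second() * clip_seconds


def cost_per_video(clips: int, clip_seconds: int = CLIP_SECONDS) -> float:
    """USD cost of all AI clips used in one video."""
    return cost_per_clip(clip_seconds) * clips


def clips_per_month(clip_seconds: int = CLIP_SECONDS) -> int:
    """Whole clips that fit in the monthly allowance."""
    return PLAN_SECONDS // clip_seconds


if __name__ == "__main__":
    print(f"per clip: USD {cost_per_clip():.2f}")
    print(f"per video: USD {cost_per_video(6):.2f} to {cost_per_video(8):.2f}")
    print(f"clips per month: {clips_per_month()}")
```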

Imagen 4.0: Free Photorealistic Image Generation

Imagen 4.0 (free via Google AI Studio, 50 images/day) generates photorealistic images for thumbnails and key frames. At zero cost, this is the highest-ROI tool in your visual stack.

Imagen prompt formula:

IMAGEN PROMPT STRUCTURE
════════════════════════

[SUBJECT DESCRIPTION] + [SETTING DETAIL] + [LIGHTING] +
[COMPOSITION] + [STYLE] + [MOOD]

Example 1 — Thumbnail Portrait:
"Modern Pakistani woman freelancer, smiling confidently at laptop,
bright home office in Lahore, natural window light, shallow depth
of field, contemporary furniture, warm color grading, photorealistic,
Canon 5D quality"

Example 2 — Conceptual Key Frame:
"Futuristic Pakistani city skyline with AI hologram overlays,
Karachi at night, neon blue and gold lights, cinematic wide shot,
architectural photography style, ultra detailed"

Example 3 — Business / Finance:
"Pakistani entrepreneur in a meeting room, reviewing data charts
on large monitor, corporate Islamabad aesthetic, professional
attire, confident posture, soft studio lighting, editorial
photography style"

Generate about 10 images per video: 3–4 key frames for critical script points, 1 thumbnail base (you will overlay text in CapCut), and 5–6 scene illustrations for abstract concepts.
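To see how far the free daily quota stretches, you can divide it by the images one video needs. This sketch assumes the 50 images/day figure quoted in this lesson and one possible split of the image plan; the retries_per_image parameter is a hypothetical allowance for regenerating images you reject.

```python
DAILY_FREE_IMAGES = 50  # daily quota quoted in this lesson

# One split of the per-video plan (an assumption within the ranges given).
IMAGE_PLAN = {
    "key_frames": 4,
    "thumbnail_base": 1,
    "scene_illustrations": 5,
}


def images_per_video(plan: dict[str, int]) -> int:
    """Total images one video needs."""
    return sum(plan.values())


def videos_per_day(plan: dict[str, int],
                   daily_quota: int = DAILY_FREE_IMAGES,
                   retries_per_image: int = 0) -> int:
    """Whole videos the daily quota covers, allowing for regenerations."""
    needed = images_per_video(plan) * (1 + retries_per_image)
    return daily_quota // needed


if __name__ == "__main__":
    print(videos_per_day(IMAGE_PLAN))                       # no regenerations
    print(videos_per_day(IMAGE_PLAN, retries_per_image=1))  # one retry per image
```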

The Shot List System

A shot list maps every script sentence to a specific visual. Without it, you waste 45 minutes in CapCut searching for the right clip. With it, assembly takes 20 minutes.

Shot list template:

SHOT LIST FORMAT
═════════════════

| # | Script Line (first 5 words) | Visual Type | Description | Duration |
|---|---|---|---|---|
| 01 | "Pakistan's freelance market grew..." | STOCK | Pakistani youth in cafe on laptops, busy, daylight | 4 sec |
| 02 | "Top earners make USD 5,000..." | AI-GEN (Imagen) | Confident freelancer with laptop, PKR notes on desk | 5 sec |
| 03 | "The skill they all share is..." | SCREEN REC | Upwork profile screenshot with $5K+ earnings badge | 6 sec |
| 04 | "Specialization, not generalization..." | GRAPHIC | Canva animated chart: specialist vs generalist income | 5 sec |
| 05 | "Here is exactly how they did it..." | STOCK | Person writing notes, focused, close-up hand shot | 3 sec |

A 6-minute video needs 35–45 shots. Create the shot list in 15 minutes, save 45 minutes in editing. Net gain: 30 minutes per video. Over 100 videos, that is 50 hours returned.
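A shot list kept as structured data lets you total the runtime and count how many visuals of each type you still need to source. The sketch below uses the five template rows with shortened descriptions; the Shot class and helper names are hypothetical, and the visual types use the four pipeline tags.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Shot:
    number: int
    script_line: str   # first five words of the script sentence
    visual_type: str   # STOCK, AI-GEN, SCREEN, or GRAPHIC
    description: str
    duration_sec: int


SHOTS = [
    Shot(1, "Pakistan's freelance market grew...", "STOCK", "Youth in cafe on laptops", 4),
    Shot(2, "Top earners make USD 5,000...", "AI-GEN", "Confident freelancer with laptop", 5),
    Shot(3, "The skill they all share is...", "SCREEN", "Upwork profile screenshot", 6),
    Shot(4, "Specialization, not generalization...", "GRAPHIC", "Animated income chart", 5),
    Shot(5, "Here is exactly how they did it...", "STOCK", "Person writing notes, close-up", 3),
]


def total_duration(shots: list[Shot]) -> int:
    """Total seconds of visuals covered by the shot list."""
    return sum(shot.duration_sec for shot in shots)


def count_by_type(shots: list[Shot]) -> Counter:
    """How many shots each source has to supply."""
    return Counter(shot.visual_type for shot in shots)


def hours_saved(videos: int, minutes_saved: int = 45, minutes_spent: int = 15) -> float:
    """Net editing hours saved across a number of videos."""
    return videos * (minutes_saved - minutes_spent) / 60


if __name__ == "__main__":
    print(total_duration(SHOTS), dict(count_by_type(SHOTS)), hours_saved(100))
```

Comparing total_duration against your voiceover length is a quick way to spot script sentences that still have no visual.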

Color Grading for Brand Consistency

Every visual on your channel must share a color palette. This trains your audience to recognize your aesthetic instantly in their YouTube feed — which drives clicks even before they read your title.

Color grading presets guide:

COLOR GRADE PRESETS — BY CHANNEL NICHE
════════════════════════════════════════

TECH / AI CHANNELS
├── Primary tone: Cool blue-teal (#0A1628 darks, #4FC3F7 highlights)
├── DaVinci Resolve preset: "Sci-Fi Blue"
├── CapCut equivalent: "Neon" filter at 60% intensity
└── Examples: MKBHD, Linus Tech Tips aesthetic

FINANCE / BUSINESS CHANNELS
├── Primary tone: Warm gold-amber (#1A0F00 darks, #F59E0B highlights)
├── DaVinci Resolve preset: "Corporate Warm"
├── CapCut equivalent: "Warm" filter at 50% intensity, contrast +10
└── Examples: Graham Stephan, Andrei Jikh aesthetic

LIFESTYLE / VLOG CHANNELS
├── Primary tone: Natural desaturated (#1C1C1C darks, creamy whites)
├── DaVinci Resolve preset: "Film Look"
├── CapCut equivalent: "Film" filter at 70% intensity
└── Examples: Peter McKinnon, VSCO-style

EDUCATION / HOW-TO CHANNELS
├── Primary tone: Clean neutral (high contrast, no color cast)
├── CapCut: -10 saturation, +10 contrast, +5 sharpness
└── Goal: Content is center stage, not aesthetics

Quick color grade in CapCut:

  1. Select all clips on timeline (Ctrl+A)
  2. Open Adjust panel
  3. Apply: Contrast +8, Saturation -8, Sharpness +5
  4. Add a Color Filter matching your niche aesthetic at 40–60% opacity
  5. Lock these settings as a preset — apply to every future video in one click

This 3-minute process unifies stock footage from 5 different sources into one cohesive visual identity.
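Some graphics and editing tools ask for RGB values rather than hex codes, so it can be handy to convert your palette once and keep both forms. This sketch uses the hex values from the presets guide; the PALETTES layout and the hex_to_rgb helper are hypothetical.

```python
# Hex values taken from the color grade presets in this lesson.
PALETTES = {
    "tech_ai": {"darks": "#0A1628", "highlights": "#4FC3F7"},
    "finance_business": {"darks": "#1A0F00", "highlights": "#F59E0B"},
}


def hex_to_rgb(hex_color: str) -> tuple[int, int, int]:
    """Convert '#RRGGBB' to an (R, G, B) tuple of 0-255 integers."""
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {hex_color!r}")
    # Read two hex digits at a time: red, green, blue.
    red, green, blue = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return (red, green, blue)


if __name__ == "__main__":
    for niche, tones in PALETTES.items():
        for role, hex_color in tones.items():
            print(niche, role, hex_color, hex_to_rgb(hex_color))
```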

Practice Lab


Task 1: Shot List Creation
Take a 300–400 word script (2–3 minute video) and create a complete shot list. For each of the 15–20 shots, specify: script line (first five words), visual type (Stock/AI-Gen/Screen/Graphic), specific footage description with lighting and angle, and duration in seconds. The goal is zero ambiguity: when you sit down to source visuals, you should never need to think about what to find.

Task 2: Visual Sourcing Sprint
Using your shot list, collect all visuals with a timer running. Target: under 90 minutes total. Collect at minimum 10 stock clips from Pexels using the 5–7 keyword search formula, generate 3 AI images via Imagen 4.0 using the prompt structure provided, record 60 seconds of screen footage using OBS, and organize everything into folders named by shot number. Post your folder structure screenshot in the course community.

Task 3: Color Grade Experiment
Take any 5 clips from different sources (different lighting, different color casts). Apply the same CapCut color grade to all five. Export a 30-second test clip. Compare the before and after. If the clips look unified and professional, your color grade is calibrated correctly. If they still look mismatched, adjust contrast and saturation until cohesion is achieved. This exercise builds the visual intuition that separates 1K-subscriber channels from 100K-subscriber channels.

Pakistan Case Study: "Tech Reviews Karachi"

Ahmed Raza, a 24-year-old computer science dropout from Karachi, launched "Tech Reviews Karachi" in early 2025 with a PKR 15,000 setup (phone + free editing tools). His first 20 videos used generic Pexels stock footage and averaged 2,000 views each, resulting in a 2,000-subscriber count after three months.

His problem was visible: his visuals were indistinguishable from the 500 other tech channels his audience already watched. The content was solid, but the aesthetic was forgettable.

In month 4, Ahmed subscribed to Runway AI (USD 12/month = PKR 3,350) and built a new visual workflow. Every transition featured custom Runway footage — Karachi cityscapes at night with neon overlays, product unboxings rendered in slow-motion AI cinematography, abstract tech concepts visualized as glowing particle networks. His color grade shifted to cool blue-teal, consistent across every frame.

Results within 30 days of the change:

  • Average views per video: 2,000 → 18,000
  • Subscribers: 2,000 → 12,000 (10,000 new subscribers in one month)
  • CPM: USD 3 → USD 6 (premium tech brands — Samsung Pakistan, Daraz tech category — began placing ads)
  • Monthly revenue: PKR 90,000 → PKR 180,000 from YouTube ads alone

The cost increase: PKR 3,350/month for Runway. The revenue increase: PKR 90,000/month. ROI: about 2,587% net of the subscription cost (a return of roughly 26.9x on the spend).
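ROI can be quoted either as a gross return multiple (gain divided by cost) or net of the cost itself, and the two give slightly different percentages. The sketch below computes both from the case study figures; the function names are hypothetical.

```python
MONTHLY_COST_PKR = 3_350    # Runway subscription, from the case study
MONTHLY_GAIN_PKR = 90_000   # revenue increase, from the case study


def return_multiple(gain: float, cost: float) -> float:
    """Gross return: how many times the spend came back as extra revenue."""
    return gain / cost


def net_roi_percent(gain: float, cost: float) -> float:
    """Net ROI: extra revenue minus the spend, as a percentage of the spend."""
    return (gain - cost) / cost * 100


if __name__ == "__main__":
    print(f"return multiple: {return_multiple(MONTHLY_GAIN_PKR, MONTHLY_COST_PKR):.2f}x")
    print(f"net ROI: {net_roi_percent(MONTHLY_GAIN_PKR, MONTHLY_COST_PKR):.0f}%")
```

When you compare tools, stick to one of the two definitions so the percentages stay comparable.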

By month 6 his channel hit 67,000 subscribers with a click-through rate of 8.2% (industry average: 3%). His cool blue aesthetic was so recognizable that viewers reported clicking his videos reflexively in their feed.

"Before, nobody would stop when they saw my channel. Now people ask in the comments where this background footage comes from. It is all from Runway, for PKR 3,350."

His next move: A "Video Editing for Tech Channels" course (PKR 5,000) including his complete Runway prompt library and CapCut color grade preset. Projected first month: 30 sales = PKR 150,000 additional revenue.

Key Takeaways

  • The three-part visual diet (60% stock, 25% AI-generated, 15% screen recording) balances cost, uniqueness, and credibility — deviate from this ratio only when your niche demands it
  • Stock footage searches with 5–7 descriptors return far more relevant results than 1–2 word searches, and dramatically reduce the chance of appearing identical to competitor channels
  • Runway AI at USD 12/month (PKR 3,350) is the highest-ROI paid tool in the visual stack — one sponsorship from a brand that values premium aesthetics covers 6+ months of the subscription
  • Imagen 4.0 is free at 50 images/day, so there is no justification for paying a graphic designer for thumbnails or key frames when Imagen produces photorealistic output in 5–15 seconds
  • The shot list is the single biggest time-saver in video production — 15 minutes to create saves 45 minutes in editing, compounding to 50 hours saved over 100 videos
  • Color grading all clips with the same CapCut preset takes 3 minutes and trains your audience's brain to recognize your aesthetic — this directly increases click-through rate over time
  • DaVinci Resolve (free) and CapCut's Adjust panel are sufficient for professional color grading — you do not need After Effects or FilmConvert to achieve a premium aesthetic
  • Source visuals in parallel (stock + AI + screen simultaneously), not sequentially — this compresses a 3-hour process to under 90 minutes
  • Ahmed Raza's case suggests that visual quality is the primary lever for CPM growth: doubling production quality can double CPM and add PKR 90,000/month in revenue
  • The full professional visual stack (Runway + Envato + Canva Pro) costs PKR 10,750/month — less than one hour of a professional video editor's time, and it replaces the need for one entirely
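The 60/25/15 visual diet from the takeaways can be turned into concrete shot counts for a given video. This is a sketch under the assumption that you round down and give any leftover shots to the largest category; split_shots is a hypothetical helper.

```python
# Shares in whole percent, from the three-part visual diet.
VISUAL_DIET = {"stock": 60, "ai_generated": 25, "screen_recording": 15}


def split_shots(total_shots: int, diet: dict[str, int] = VISUAL_DIET) -> dict[str, int]:
    """Split a shot count across sources according to percentage shares."""
    # Integer arithmetic avoids floating-point rounding surprises.
    counts = {source: total_shots * share // 100 for source, share in diet.items()}
    # Shots lost to rounding down go to the largest category (an assumption).
    leftover = total_shots - sum(counts.values())
    largest = max(diet, key=diet.get)
    counts[largest] += leftover
    return counts


if __name__ == "__main__":
    print(split_shots(40))
    print(split_shots(35))
```

Running this before you start sourcing tells you how many Runway clips and screen recordings to plan for, instead of finding out in the edit.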

