2.2 — Visual Generation — Imagen, Midjourney & Stock AI
Visual Generation
Your voiceover is heard, not seen; your visuals are what audiences look at for six minutes or more. Weak visuals cause early drop-offs, while strong visuals can hold viewers for a five-minute average view duration. This lesson teaches you to generate, source, and coordinate visuals so every frame reinforces your message. Pakistani creators who master this can earn around 40% higher CPM because premium brands want to advertise on production-quality content.
Three Visual Sources: Stock, AI, Screen
Most successful faceless channels use a three-part visual diet:
(1) Stock footage (60%): Free sites (Pexels, Pixabay) or paid (Shutterstock USD 30/month). Speed: Find in 2-3 minutes per video.
(2) AI-generated visuals (25%): Runway AI, Imagen 4.0, or Midjourney. Adds uniqueness. Time: 5-10 minutes per image.
(3) Screen recordings (15%): Tutorial videos, product demos, graphs animating real data. Time: 10-15 minutes to record and edit.
The psychology: Stock footage builds recognition (your audience sees familiar scenes and relaxes); AI visuals create wow moments (unique imagery that competitors don't have); screen recordings add credibility (real data, not hypothetical).
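To make the 60/25/15 mix concrete, the short sketch below converts it into seconds of screen time per source. The shares come from this lesson; the function name and the rounding to whole seconds are illustrative choices, not a fixed rule.

```python
# Split a video's runtime across the three visual sources.
# The 60/25/15 shares are the lesson's guideline; adjust them for your niche.
VISUAL_MIX = {"stock": 0.60, "ai": 0.25, "screen": 0.15}


def seconds_per_source(video_minutes: float, mix: dict = VISUAL_MIX) -> dict:
    """Return the number of seconds each visual source should fill."""
    total_seconds = video_minutes * 60
    return {source: round(total_seconds * share) for source, share in mix.items()}


if __name__ == "__main__":
    # Example: a 6-minute video.
    for source, seconds in seconds_per_source(6).items():
        print(f"{source}: {seconds} s")
```

For a 6-minute video this works out to roughly 216 seconds of stock footage, 90 seconds of AI visuals, and 54 seconds of screen recording.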
Sourcing Stock Footage: The Professional Way
Free sites (Pexels, Pixabay) have 10 million+ clips but are repetitive: everyone uses the same "person typing on laptop" footage. Paid sites (Shutterstock, iStock, Envato Elements; USD 15-50/month) have 100M+ clips and exclusive footage. For Pakistani creators on tight budgets, a hybrid approach works best: free sites for generic scenes (nature, cities, transitions) and paid sites for niche footage (specific professions, tech, finance).
Pro search technique: Use 5-7 keywords per search. Bad: "business." Good: "Pakistani entrepreneur working on laptop, focused, natural light, side angle." The second query returns far more relevant results because each added keyword filters out generic clips.
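If you search often, it can help to assemble queries from a subject plus descriptors instead of typing them from scratch. The sketch below builds the keyword string from the example above and turns it into a search URL. The endpoint and the `query` and `per_page` parameters follow the Pexels video search API as I understand it; treat them as assumptions and check the current Pexels documentation (the API also needs a key sent in an Authorization header, which is not shown here). The function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for Pexels video search; verify against the Pexels API docs.
PEXELS_VIDEO_SEARCH = "https://api.pexels.com/videos/search"


def build_stock_query(subject: str, *descriptors: str) -> str:
    """Join a subject and its descriptors into one multi-keyword search string."""
    parts = [subject, *descriptors]
    return " ".join(part.strip() for part in parts if part.strip())


def build_search_url(query: str, per_page: int = 15) -> str:
    """Return a search URL for the query (no request is sent)."""
    return f"{PEXELS_VIDEO_SEARCH}?{urlencode({'query': query, 'per_page': per_page})}"


if __name__ == "__main__":
    query = build_stock_query(
        "Pakistani entrepreneur", "working on laptop", "focused",
        "natural light", "side angle",
    )
    print(query)
    print(build_search_url(query))
```

Keeping the subject separate from the descriptors makes it easy to reuse the same lighting and angle keywords across several subjects in one video.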
Timing trick: Source footage BEFORE writing scripts. Watch 20 relevant videos in your niche, note which visual patterns get highest engagement (comments about cinematography). Reverse-engineer those patterns. Example: If crypto channels use dramatic blue lighting, you know your audience expects cinematic visuals—stock footage won't cut it; use Runway AI instead.
AI Visual Generation: Runway & Imagen
Runway AI (USD 12/month) generates 4-second video clips from text prompts. Prompt: "Cinematic footage of Lahore's Badshahi Mosque at sunset, drone perspective, warm golden light, cinematic camera movement." Output: Professional 4-second clip in 30 seconds. This is your edge—competitors use generic stock; you use bespoke cinematic footage.
Runway's sweet spot: Opening sequences, transitions, B-roll for data/concepts that have no stock equivalent ("How blockchain solves this problem" = no stock footage exists, but Runway can visualize it).
Imagen 4.0 (free via Google AI Studio, limited to 50 images/day) generates high-resolution images from detailed prompts. Prompt: "Modern Pakistani woman freelancer in bright home office, smiling at laptop, natural light, Lahore apartment aesthetic, contemporary furniture, warm color grading." Output: Photorealistic image in 5 seconds, perfect for thumbnails and key frames.
Pricing for visuals: Runway, at one clip per minute of final video (6 clips for a 6-minute video), costs USD 1-2 per video in GPU credits. Imagen (10 images per video for key frames plus the thumbnail) costs USD 0. Combined visual generation: USD 1-2 per video at this clip count, and still only USD 2-5 if you generate more clips than the one-per-minute baseline. Either way, that is less than hiring a freelancer to find stock footage.
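The per-video arithmetic can be sketched as a small calculator. The per-clip credit range below is derived from this lesson's figure of USD 1-2 for six clips (about USD 0.17-0.33 per clip); it is an estimate, not Runway's published pricing, and the function name is illustrative.

```python
# Estimate AI visual generation cost per video.
# Assumption: USD 1-2 covers 6 clips, so one clip costs about USD 0.17-0.33.
CREDIT_COST_PER_CLIP_USD = (1 / 6, 2 / 6)  # (low, high), derived from the lesson


def visual_cost_range(
    video_minutes: float,
    clips_per_minute: float = 1.0,
    images: int = 10,
    cost_per_image_usd: float = 0.0,  # Imagen free tier, per the lesson
) -> tuple:
    """Return (low, high) generation cost in USD for one video."""
    clips = round(video_minutes * clips_per_minute)
    low_clip, high_clip = CREDIT_COST_PER_CLIP_USD
    image_cost = images * cost_per_image_usd
    return (
        round(clips * low_clip + image_cost, 2),
        round(clips * high_clip + image_cost, 2),
    )


if __name__ == "__main__":
    print(visual_cost_range(6))                       # one clip per minute
    print(visual_cost_range(6, clips_per_minute=2))   # denser AI B-roll
```

Doubling the clip density doubles the credit cost, which is why the clips-per-minute choice matters more than the image count while images stay free.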
Screen Recording & Animation
For tutorial content, product reviews, or data visualization, screen recordings are mandatory. Tools: OBS Studio (free, Windows/Mac/Linux), ScreenFlow (USD 99, Mac only), or Camtasia (USD 180, professional). Most Pakistani creators use OBS because it is free and powerful.
Record your screen at 1080p or 1440p (YouTube supports both; neither is a required default) at 60 FPS for smooth motion. Add a cursor highlighter such as PointerFocus (a separate utility, not built into OBS) so viewers can follow your clicks. Pause for 0.5 seconds at key moments to give your voiceover room to land the point.
Animation trick for data: Use Canva (free) or After Effects (USD 55/month) to animate charts and graphs. Example: A bar chart should grow from left to right, not appear fully formed. As a rule of thumb, a growing chart holds viewer attention about 3x longer than a static one. Canva Pro (USD 120/year) has pre-built animated chart templates, so use them.
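If you build the chart animation yourself instead of using a template, the growth effect comes down to computing the bar's length for each frame. The sketch below is one way to do that with an ease-out curve, so the bar starts fast and settles gently. The 1.5-second duration, the 30 FPS default, and the cubic easing are assumptions chosen for illustration.

```python
def ease_out_cubic(t: float) -> float:
    """Map progress t in [0, 1] to eased progress: fast start, gentle finish."""
    return 1 - (1 - t) ** 3


def bar_growth_frames(final_value: float, duration_s: float = 1.5, fps: int = 30) -> list:
    """Return the bar length for every frame, from 0 up to final_value."""
    frames = max(int(duration_s * fps), 1)
    return [final_value * ease_out_cubic(i / frames) for i in range(frames + 1)]


if __name__ == "__main__":
    # Example: a bar that grows to 45 (for a "45% Growth" graphic).
    lengths = bar_growth_frames(45)
    print(f"{len(lengths)} frames, first={lengths[0]:.1f}, last={lengths[-1]:.1f}")
```

Each value can then be fed to whatever draws the frame, for example as the width of a rectangle in your editor's scripting tool or a plotting library.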
The Shot List System
Before sourcing or editing any footage, create a "shot list": a document mapping each script sentence to a specific visual. Example:
Script: "Pakistan's freelance market grew 45% in 2025." Visuals: [STAT GRAPHIC: "45% Growth" animated bar chart], [3-second stock footage: Pakistani youth working on laptops in cafes]
Script: "Top earners make USD 5,000/month by specializing in one skill." Visuals: [Portrait image of freelancer], [Screen recording: Upwork profile showing USD 5K+ monthly earnings], [Time-lapse: 8 hours of focused work]
A shot list prevents "video soup"—random footage that doesn't match your voiceover. It forces you to be intentional. Time to create: 15 minutes for a 6-minute video. Time saved in editing: 45 minutes (no guessing which clip goes where).
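A shot list is easy to keep in a spreadsheet, but a small script can also validate it and total the durations. The sketch below is one possible structure, not a required format: the field names, the allowed visual types, and the 4-second graphic duration in the example are assumptions.

```python
import csv
import io
from dataclasses import dataclass

# Allowed visual types; "graphic" covers stat graphics and animated charts.
VISUAL_TYPES = {"stock", "ai", "screen", "graphic"}


@dataclass
class Shot:
    script_line: str
    visual_type: str
    description: str
    duration_s: float

    def __post_init__(self):
        # Catch typos and impossible durations as soon as a shot is created.
        if self.visual_type not in VISUAL_TYPES:
            raise ValueError(f"unknown visual type: {self.visual_type!r}")
        if self.duration_s <= 0:
            raise ValueError("duration must be positive")


def total_duration(shots) -> float:
    """Sum the planned duration of all shots, in seconds."""
    return sum(shot.duration_s for shot in shots)


def to_csv(shots) -> str:
    """Serialize the shot list as CSV text, ready to paste into a spreadsheet."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["script_line", "visual_type", "description", "duration_s"])
    for shot in shots:
        writer.writerow([shot.script_line, shot.visual_type, shot.description, shot.duration_s])
    return buffer.getvalue()


if __name__ == "__main__":
    line = "Pakistan's freelance market grew 45% in 2025."
    shots = [
        Shot(line, "graphic", '"45% Growth" animated bar chart', 4),  # duration assumed
        Shot(line, "stock", "Pakistani youth working on laptops in cafes", 3),
    ]
    print(to_csv(shots))
    print(f"Planned visuals: {total_duration(shots)} s")
```

Comparing the total against the voiceover length for the same sentences shows quickly where a shot is missing or running long.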
Color Grading & Consistency
All your visuals must share a color palette—this trains your audience's brain to recognize "your" aesthetic. Example: If your first 5 videos are cool-toned (blues, teals), stick to cool tones. If they're warm (golds, oranges), stay warm.
Use Canva or DaVinci Resolve to apply a color grading preset to all footage. Free presets: DaVinci Resolve's built-in "Summer," "Cool," "Cinematic." Paid presets: FilmConvert (USD 300 one-time license). Most Pakistani creators use DaVinci Resolve's free presets—perfectly professional.
Practice Lab
Task 1: Shot List Creation — Take a script (300-400 words, 2-3 minute video). Create a detailed shot list with: (1) Script sentence, (2) Visual type (stock/AI/screen), (3) Specific footage description, (4) Duration (in seconds). Share your shot list—it's your blueprint for sourcing and editing.
Task 2: Visual Sourcing — Using your shot list, collect all visuals: (1) Download 10+ stock clips from Pexels. (2) Generate 3 AI images on Imagen. (3) Record 1 minute of screen footage. (4) Organize into folders. (5) Time yourself—goal under 1.5 hours total.
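For step (4) of Task 2, a few lines of Python can create the same folder layout for every video so clips always land in a predictable place. The folder names below mirror the three visual sources in this lesson; the function name and the example paths are hypothetical.

```python
from pathlib import Path


def create_project_folders(root, video_slug, subfolders=("stock", "ai", "screen")):
    """Create one folder per visual source under root/video_slug and return the paths."""
    project = Path(root) / video_slug
    created = []
    for name in subfolders:
        folder = project / name
        # exist_ok=True makes the script safe to re-run on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    for folder in create_project_folders("videos", "freelance-market-2025"):
        print(folder)
```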
Pakistan Example: "Tech Reviews Karachi"
Ahmed, a tech reviewer from Karachi, used generic stock footage for his first 20 videos (2,000 subscribers). Then he switched to Runway AI-generated B-roll. His videos went from "decent" to "cinematic." Every transition featured custom Runway footage: Pakistani cityscapes and uniquely animated product unboxings.
Cost increase: USD 2-5 per video. Subscriber increase: 10,000 new subs in 30 days. CPM increase: USD 3 → USD 6 (premium tech brands want high-production-quality content). Revenue: PKR 90,000 → PKR 180,000/month just from YouTube ads, before sponsorships.
His color grading: Cool blue tones (professional tech aesthetic). Every video, every thumbnail, every transition uses the same color palette, so when viewers see Ahmed's video in their feed the aesthetic is instantly recognizable. Click-through rate: 8% (industry average: 3%).
His next move: Launch a "Video Editing for Tech Channels" course (PKR 5,000, including his Runway templates and Canva presets). Projected 20-30 students = PKR 100,000-150,000 revenue.