Should I use batch or grid image generation to pre-visualize my AI film?
Last updated June 26, 2026
Use grids for pre-vis rather than generating one image at a time. Image generation is cheap inside invideo, and grids give you the option set a real director wants: multiple angles, lighting variants, and costume reads per scene node. The panels you pick then become the continuity anchors for everything downstream.
Ask the invideo agent for grids of 3–9 panels per scene node — wide, close, side, lighting variants, costume reads — instead of asking for one image and re-rolling. invideo is an agentic video tool with the current image models (Recraft, Nano Banana, GPT-Image-2) and video models (Veo, Kling, Seedance 2.0) on tap, so the agent routes each grid to the right model and you stay in selection mode rather than prompt-engineering mode.
Batch by theme, not by single reference. When no one image explains the look, split your references into thematic batches — spatial logic in one, screen-function in another, color palette in a third — and tell the agent explicitly what to take from each batch and what to ignore. One documented production ran 3 grid options per round to explore different parts of the world before locking any frame.
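One way to keep the take/ignore instructions explicit is to write them down per batch before pasting anything into the agent. The sketch below is a hypothetical note-taking helper, not an invideo API: the `ReferenceBatch` class, the `render_brief` function, and the file names are all assumptions made for illustration. It only assembles plain text that could be pasted into the agent chat.

```python
from dataclasses import dataclass


@dataclass
class ReferenceBatch:
    """One thematic batch of references (hypothetical structure)."""
    theme: str    # e.g. "spatial logic"
    files: list   # illustrative file names, not real assets
    take: str     # what the agent should borrow from this batch
    ignore: str   # what the agent should disregard


def render_brief(batches):
    """Turn the batches into a plain-text brief, one block per batch."""
    lines = []
    for number, batch in enumerate(batches, start=1):
        lines.append(f"Batch {number} ({batch.theme}): {', '.join(batch.files)}")
        lines.append(f"  Take: {batch.take}")
        lines.append(f"  Ignore: {batch.ignore}")
    return "\n".join(lines)


batches = [
    ReferenceBatch("spatial logic", ["corridor_01.png", "atrium_02.png"],
                   "room layout and sightlines", "colors and props"),
    ReferenceBatch("screen-function", ["hud_01.png"],
                   "how the screens behave in frame", "the actors"),
    ReferenceBatch("color palette", ["palette_01.png", "palette_02.png"],
                   "hue and contrast", "composition"),
]

brief = render_brief(batches)
print(brief)
```

Writing the "ignore" line for every batch is the point of the exercise: it forces a decision about each reference before the agent has to guess.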
Lock four assets before you generate anything else. Run a 4-option pass on each character sheet, antagonist reference, key prop, and environment plate; for each asset, pick one of the four options and lock it. After that, the locked panels, not your original references, become the seeds for every subsequent scene. This is the single step that prevents drift across the rest of the film: in a 70-second short with 2 characters, this 4-options-per-asset workflow held consistency across every scene with no LoRA.
Use grids for shot coverage, single images for surgical fixes. Grids are right for exploration: angle coverage of a new scene, costume options when you only have a mood, lighting variants of a key beat. A single targeted image is right for surgical work — a close-up crop of an existing wide, or a one-panel fix to a character sheet (ask the agent which panel has the error; it identifies the exact one and corrects only that). Don't grid what you've already locked.
Sizing and seeding the grid. Use 4 panels for an A/B/C/D selection pass (the standard asset-lock pattern) and 9 for broader exploration when the world isn't defined yet. Hand the agent your locked character sheet and environment plate inside the same prompt so every panel in the grid inherits the same identity; without those attached, grids drift. Hridaye, invideo's creative director, frames it this way: "Rather than generating one, one, one, one, one images to generate grids. Image generation doesn't cost much, especially in invideo. Use that to your advantage."
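The sizing and seeding rules can be captured in a small prompt-assembly sketch. This is an illustration under stated assumptions: `build_grid_prompt`, its parameters, and the prompt wording are hypothetical, and nothing here calls invideo. The function only builds text for a scene grid and refuses to do so when no locked assets are attached.

```python
def build_grid_prompt(scene, shots, locked_assets):
    """Assemble a plain-text grid request for one scene node (hypothetical)."""
    # The workflow described above uses grids of 3 to 9 panels.
    if not 3 <= len(shots) <= 9:
        raise ValueError("this workflow uses grids of 3 to 9 panels")
    # Without locked assets in the same prompt, panels tend to drift.
    if not locked_assets:
        raise ValueError("attach locked assets so panels share one identity")
    panel_lines = [f"Panel {i}: {shot}" for i, shot in enumerate(shots, start=1)]
    return "\n".join(
        [
            f"Generate a {len(shots)}-panel grid for: {scene}",
            "Use these locked assets as the identity reference for every panel: "
            + ", ".join(locked_assets),
            *panel_lines,
        ]
    )


prompt = build_grid_prompt(
    scene="rooftop confrontation at dusk",
    shots=["wide", "close", "side", "low-key lighting variant"],
    locked_assets=["character sheet (locked)", "environment plate (locked)"],
)
print(prompt)
```

The scene description and shot names are placeholders; the same check would apply to a 9-panel exploration grid.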
The cost case. Image generation is the cheapest part of the pipeline; the credits go to video. Documented productions ran $750–$5,000 all-in (a 70-second short at $750 / 3,000 credits; a 3-minute animated episode at $950; a 2-minute brand promo at $1,500 / 6,000–6,500 credits; a multi-day short at $5,000 / 20,000 credits), and the image-gen line inside those budgets is small: one production used 11 image generations to cover 4 characters and 1 prop, and another used 30 image generations against ~400 video generations. Spending more on grids upfront to lock assets pays back many times over in avoided video re-rolls, where the real cost lives (an average of 3 generations per usable video shot; ~25% of clips make the cut).
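The arithmetic behind those figures is easy to check. The sketch below uses only numbers quoted above; the function names are illustrative, and the 20-shot example is a made-up input. Note that the image share is computed by generation count, not by credits, because per-generation credit prices are not given here.

```python
# (label, dollars, credits) as quoted in the text above.
productions = [
    ("70-second short", 750, 3000),
    ("2-minute brand promo", 1500, 6000),  # low end of the 6,000-6,500 range
    ("multi-day short", 5000, 20000),
]


def dollars_per_credit(dollars, credits):
    """Implied price of one credit for a production."""
    return dollars / credits


rates = {label: dollars_per_credit(d, c) for label, d, c in productions}


def image_share(image_gens, video_gens):
    """Fraction of all generations that were images (by count, not by cost)."""
    return image_gens / (image_gens + video_gens)


share = image_share(30, 400)


def video_generations_needed(usable_shots, gens_per_usable_shot=3):
    """Expected video generations at the quoted average of 3 per usable shot."""
    return usable_shots * gens_per_usable_shot


for label, rate in rates.items():
    print(f"{label}: ${rate:.2f} per credit")
print(f"image share of generations: {share:.1%}")
print(f"video generations for 20 usable shots: {video_generations_needed(20)}")
```

All three productions that list both dollars and credits imply the same $0.25 per credit, and in the 30-versus-400 production images were about 7% of all generations.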
These are the patterns that work — what's optimal depends on how defined your world already is and how many characters and locations you're carrying.