
How many reference images do you need for consistent AI video character generation?

Last updated June 26, 2026

Consistent AI video characters need 2–6 reference images per character and no LoRA fine-tuning: a headshot plus a head-to-toe reference at minimum, or a 4-angle turnaround sheet with face close-ups for best results. One clean image is the floor for a single shot; multi-scene consistency comes from a small, locked sheet reused on every generation.

Plan for 2–6 reference images per character, built as a character sheet you reuse on every generation. invideo is an agentic video creation tool with access to the current video and image models. The invideo agent stores character sheets in project context and attaches them to each generation automatically, which is why the count stays small: the same sheet serves every shot instead of new references being needed per scene.

The floor — 1 image. Current models generate a character from a single reference image, and that holds for a one-off shot or a short clip. Quality beats count: the image must show the character exactly as you want it rendered — high-resolution, well-lit, fully visible — because anything the model can't see it will invent. With zero references (text-only prompting), the character redraws itself every generation, so one image is the real minimum for any consistency at all.

The multi-scene standard — a 4–6 panel character sheet. Documented productions held characters consistent across entire films with small reference sets. One generated 360-degree turnaround sheets at 4K with four angles plus face and mid-angle close-ups per character. Another covered 4 characters and 1 prop with 11 images total — roughly a headshot and a head-to-toe reference each — and kept them consistent across a 3-minute animated episode assembled from 164 generated clips. A 70-second short film kept 2 characters identical across every scene using character sheets and agent context alone, no fine-tuning. Two rules for building the sheet: include close-up panels so small details like scars and accessories survive across models, and remove objects from the character's hands before generating the turnaround so props don't vary between angles.

When the count grows — one sheet per look. If a character's appearance evolves — costume changes, accumulating accessories — create a separate character sheet for each visual beat; one production needed a distinct sheet for every sequence because the character picked up a new trinket in each location. The per-sheet image count stays the same; the number of sheets grows with the number of looks.
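Because the per-sheet image count stays fixed and only the number of sheets grows, the total reference budget is simple multiplication: looks per character times images per sheet. The Python sketch below assumes you list each character's distinct looks by hand. `Character`, `looks`, and `reference_budget` are hypothetical names for illustration, not part of any invideo API, and the default of 4 images per sheet is an assumed value picked from the 2–6 range.

```python
from dataclasses import dataclass, field


@dataclass
class Character:
    """A character and the distinct visual looks it needs (hypothetical model)."""
    name: str
    # One entry per visual beat: costume change, new accessory, and so on.
    looks: list[str] = field(default_factory=lambda: ["default"])


def reference_budget(characters: list[Character], images_per_sheet: int = 4) -> dict[str, int]:
    """Reference images needed per character: one sheet per look,
    with the same number of images on every sheet (assumed default: 4)."""
    if not 1 <= images_per_sheet <= 6:
        raise ValueError("images_per_sheet should be between 1 (the floor) and 6")
    return {c.name: len(c.looks) * images_per_sheet for c in characters}


cast = [
    # Picks up a new trinket in each location, so each location is a new look.
    Character("Hero", looks=["village", "forest", "castle"]),
    # Keeps a single look for the whole film.
    Character("Sidekick"),
]
budget = reference_budget(cast, images_per_sheet=4)
print(budget)                # {'Hero': 12, 'Sidekick': 4}
print(sum(budget.values()))  # 16
```

Adding a look adds a whole sheet, so the total grows in steps of `images_per_sheet` rather than one image at a time.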

Two related points: lock the best version of each sheet before generating any video (one production averaged about 5 generations, roughly $9.78, to finalize a character), and count character sheets separately from style references, since style frames define the look of the whole film while a character sheet defines one person.
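The lock-in figure from that production works out to about $9.78 / 5 ≈ $1.96 per generation. The rough Python sketch below extrapolates it to several sheets. It assumes every sheet takes about as many attempts and costs about as much per attempt as in that single production, which will not hold across all models or tools. `lock_cost` is a hypothetical helper, not a real pricing API.

```python
def lock_cost(
    num_sheets: int,
    generations_per_sheet: float = 5,            # assumption: ~5 attempts per sheet
    cost_per_generation: float = 9.78 / 5,       # assumption: ~$1.96 per attempt
) -> float:
    """Estimate spend (in dollars) to finalize character sheets before any video
    is generated. Defaults come from one documented production and are
    assumptions, not fixed prices."""
    return round(num_sheets * generations_per_sheet * cost_per_generation, 2)


print(round(9.78 / 5, 2))  # 1.96
print(lock_cost(4))        # 39.12
```

Since each extra look needs its own sheet, `num_sheets` should count looks, not just characters.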

Related video: Batching reference images by category to lock visual consistency across AI scenes

The AI always needs to see what the character is exactly, right? Or else it'll kind of hallucinate and imagine something that's under the cap. So, we don't want to do that. We always want the character to be seen as we see it on the character sheet.

— invideo's creative team

