AI Filmmaking

How do you use reference images in AI video generation for consistent characters and scenes?

Last updated June 26, 2026

Use reference images as locked, persistent context rather than one-off attachments: build multi-angle character sheets (front, side, back, plus close-ups), save a style block, and attach themed scene references to every generation, then use reference-to-video to carry identity between shots. One production kept 2 characters consistent across a 70-second film this way, with no LoRA fine-tuning.

Lock your references before generating any video, then attach them to every prompt — consistency comes from persistent context, not from re-describing characters each time. invideo is an agentic video creation tool with all the current video and image models available, so this whole workflow runs in one place, with the invideo agent holding your references in context across the project.

1. Use reference-to-video, not just a start frame. A start-frame (image-to-video) input only seeds the first frame; the model has no context beyond it, so identity drifts as the clip plays. Reference-to-video ingests character and location references throughout generation. Seedance 2.0's reference-to-video accepts both simultaneously, which is why it holds continuity better than extending a clip, and Veo and Kling accept subject references as well. The invideo agent routes each shot to the right model, so you don't have to choose a separate platform for each model. Division of labor: references anchor who and where; the prompt drives action and camera.

2. Build multi-angle character sheets and lock them before video. Generate each character's sheet with four angles plus face and mid-angle close-ups — close-up panels are what keep small details like scars and accessories consistent across models. Remove objects from characters' hands before generating turnarounds to avoid angle-to-angle inconsistency. Generate roughly 4 options per sheet and lock the best one: in one documented production, 11 images covered full reference sheets for 4 characters and 1 prop, and locking one character took about 5 generations (~$9.78 per character). For the stills, Recraft produces photoreal faces with pores, lines, and stubble, while Nano Banana and GPT-Image-2 handle multi-angle sheet layouts. If a model can't render a complex multi-character arrangement from text, a hand-drawn sketch uploaded as a reference image works as the visual anchor.

3. Save style references to context once, then reuse the block. Upload a batch of style frames in a single message — one production uploaded 64 frames from its target aesthetic — with an explicit instruction to analyze the style and save it to persistent context. Then open every subsequent generation prompt with that locked style block plus the relevant character sheet, and state negative constraints explicitly (e.g., "not live action, not photorealistic") to prevent style drift.

4. Batch scene references by theme, with include/exclude instructions. For environments, separate references into thematic batches — spatial logic in one, color theory in another, a specific concept in a third — and tell the invideo agent what to adopt and what to ignore from each batch ("take the screen idea, ignore the small room scale"). Stating what to leave out matters as much as what to take. For illustrated or animated references, don't attach them directly to photoreal prompts: instruct the invideo agent to read their color palette and texture qualities and translate those into the prompt — in one production this returned hyper-realistic generations at the exact intended color temperature.

5. Promote your own outputs to continuity anchors. Generate image grids rather than single frames, iterate on the grids you prefer, then extract the best panels — those extracted images replace your original references and serve as anchors for every subsequent scene generation, pulling each shot closer to the locked look.

6. Chain references across shots for continuous scenes. For a sequence that must read as one location and one character, clip the final seconds of each generated segment and re-upload it alongside the character sheet and location references; Seedance 2.0 reference-to-video reads camera movement and atmosphere from the clip's end and continues the next segment seamlessly. If a character's appearance evolves across the sequence — added props, costume changes — create a distinct character sheet per beat. The invideo agent can also scout real-world landmark images from the web to use as location reference plates.

7. Fix consistency errors at the source, not the shot. When a continuity error appears, ask the invideo agent to inspect the character sheet instead of re-rolling the shot: it can identify the exact panel containing the error, correct it, store the updated sheet in context, and regenerate only what's needed — so every later shot inherits the fix automatically.
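
If you prepare the chaining clips from step 6 outside the agent, trimming the tail of each segment can be scripted. The following is a minimal sketch, assuming ffmpeg is installed locally; the 3-second default and the `_tail` file naming are illustrative assumptions, not values from the workflow above. It only builds the command, so you can inspect it before running anything.

```python
from pathlib import Path


def tail_clip_command(segment: str, seconds: float = 3.0) -> list[str]:
    """Build an ffmpeg command that copies the last `seconds` of `segment`.

    The default length and output name are assumptions for illustration.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    src = Path(segment)
    # Hypothetical naming convention: shot_01.mp4 -> shot_01_tail.mp4
    dst = src.with_name(f"{src.stem}_tail{src.suffix}")
    return [
        "ffmpeg",
        "-y",                       # overwrite an existing tail file
        "-sseof", f"-{seconds:g}",  # seek relative to the end of the input
        "-i", str(src),
        "-c", "copy",               # stream copy, no re-encode
        str(dst),
    ]


if __name__ == "__main__":
    print(" ".join(tail_clip_command("shot_01.mp4")))
```

To execute the command, pass the list to `subprocess.run(cmd, check=True)`. With stream copy the cut starts at a keyframe, so the clip can run slightly longer than requested; dropping `-c copy` re-encodes and gives a more exact cut.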

I want you to deeply understand this art style and save it into context for further generations. All of these attached images are the art style that I want for this entire project.

— invideo's creative team, exact prompt language used to lock style references into persistent context
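
Once the style is locked with a prompt like the one quoted above, step 3 suggests opening every later prompt with the same style block, the relevant character sheet, and explicit negative constraints. One way to keep that ordering consistent is a small template. This is a sketch only: the `ShotPrompt` class, its field names, and the `Style:` / `Constraints:` labels are hypothetical conventions, not an invideo prompt format.

```python
from dataclasses import dataclass, field


@dataclass
class ShotPrompt:
    """Hypothetical template for one generation prompt."""

    style_block: str       # the locked style description, reused verbatim
    character_sheet: str   # name of the locked sheet to attach
    action: str            # what happens in the shot, plus camera direction
    negatives: list[str] = field(default_factory=list)

    def render(self) -> str:
        # References come first (who and where), then action and camera.
        parts = [
            f"Style: {self.style_block}",
            f"Character reference: {self.character_sheet}",
            f"Action and camera: {self.action}",
        ]
        if self.negatives:
            # State each negative constraint explicitly to limit style drift.
            parts.append("Constraints: " + ", ".join(f"not {n}" for n in self.negatives))
        return "\n".join(parts)


if __name__ == "__main__":
    prompt = ShotPrompt(
        style_block="locked project art style",
        character_sheet="Character A sheet",
        action="walks toward camera, slow push-in",
        negatives=["live action", "photorealistic"],
    )
    print(prompt.render())
```

Because the style block and sheet name are stored once and reused, only the action line changes from shot to shot, which matches the division of labor in step 1: references anchor who and where, the prompt drives action and camera.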
