How do you use multiple reference images to improve AI video consistency and quality?
Last updated June 26, 2026
Give each reference image exactly one job and feed them in deliberate, labeled batches instead of one catch-all mood board. Six methods that work:
- Theme-batched references with take/ignore instructions
- Bulk style-frame upload locked to context
- Multi-angle character sheets attached to every shot
- Grid-to-anchor — promote the best generated panels to references
- Character + location references in reference-to-video
- Color-and-texture extraction from illustrated references
invideo is an agentic video creation tool that brings current image and video models (Seedance 2.0, Kling, Veo, Nano Banana, Recraft, GPT-Image-2) into one interface. Every method below runs in the same place, with the invideo agent routing your references to the right model for each shot.
1. Batch references by theme, and say what to take from each. Separate your references into thematic batches (spatial logic in one, screen function in another, color theory in a third), and give the invideo agent explicit inclusion and exclusion instructions for each batch. In one production, the director fed in stills with a note to extract only the screen-as-dome idea and ignore the small room scale: "I told it what to take and just as importantly, what to leave out." The same precision applies to what you attach. A single wrong attachment can produce completely incorrect output, so audit what's attached to each prompt. In one documented project, removing one mis-attached image fixed a clock continuity error.
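Here is a minimal Python sketch of the batching discipline. It assumes you keep your references as local files before uploading them. `ReferenceBatch`, `build_reference_notes`, and `audit_attachments` are hypothetical helpers and are not part of any invideo API. The sketch renders labeled take/ignore notes that you could paste into a prompt, and it flags attachments that no batch declares.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceBatch:
    """One thematic batch of reference images with explicit take/ignore notes (hypothetical)."""
    theme: str
    images: list[str]
    take: list[str]
    ignore: list[str] = field(default_factory=list)

    def instruction(self) -> str:
        # State what to extract from this batch and, just as importantly, what to leave out.
        text = f"[{self.theme}] {', '.join(self.images)}: take {'; '.join(self.take)}"
        if self.ignore:
            text += f"; ignore {'; '.join(self.ignore)}"
        return text


def build_reference_notes(batches: list[ReferenceBatch]) -> str:
    """Render one labeled instruction line per batch."""
    return "\n".join(batch.instruction() for batch in batches)


def audit_attachments(attached: list[str], batches: list[ReferenceBatch]) -> list[str]:
    """Return attached files that no batch declares: likely strays to remove."""
    declared = {image for batch in batches for image in batch.images}
    return sorted(set(attached) - declared)


if __name__ == "__main__":
    # File names and notes are placeholders for illustration.
    batches = [
        ReferenceBatch("spatial logic", ["dome_still_01.png"],
                       take=["screen-as-dome idea"], ignore=["small room scale"]),
        ReferenceBatch("color theory", ["palette_ref_02.png"], take=["overall palette"]),
    ]
    print(build_reference_notes(batches))
    print(audit_attachments(["dome_still_01.png", "palette_ref_02.png", "old_clock.png"], batches))
    # prints ['old_clock.png']
```

The audit is just a set difference. Anything that is attached but not declared in a batch is treated as a likely stray, which is the kind of mis-attachment behind the clock continuity error described above.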
2. Upload a bulk style-frame set and lock it to context once. For a whole-project look, upload a large batch of frames from your target aesthetic in a single message with explicit save instructions: "I want you to deeply understand this art style and save it into context for further generations." One two-person team locked in an entire hand-painted animation style this way with 64 reference frames, then prefixed every subsequent generation prompt with that style block. The result was a 3-minute episode for ~$950 (~$315 per finished minute). Add explicit negative constraints to the block ("not live action, not photorealistic") so the style doesn't drift back toward the model's defaults.
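One way to keep the style block identical across every prompt is to write it once and prepend it programmatically. The following Python sketch is hypothetical: `render_style_block` and `prefix_prompts` are illustrative names, and the shot prompts are made up. It does not call any model. It only shows the prefixing pattern with the negative constraints built into the block.

```python
def render_style_block(description: str, negatives: list[str]) -> str:
    """Render the saved style as one reusable prefix with explicit negative constraints."""
    negative_clause = ", ".join(f"not {item}" for item in negatives)
    return f"Style: {description}. Constraints: {negative_clause}."


def prefix_prompts(style_block: str, shot_prompts: list[str]) -> list[str]:
    """Put the identical style block in front of every shot prompt to reduce style drift."""
    return [f"{style_block}\n{prompt}" for prompt in shot_prompts]


if __name__ == "__main__":
    block = render_style_block(
        "hand-painted animation matching the saved reference frames",
        ["live action", "photorealistic"],
    )
    for prompt in prefix_prompts(block, [
        "Wide shot: the lighthouse at dusk.",        # placeholder shot prompt
        "Close-up: the keeper lights the lamp.",     # placeholder shot prompt
    ]):
        print(prompt, end="\n\n")
```

Generating the prefix from a single function means a change to the style or its negatives propagates to every shot, instead of being retyped per prompt.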
3. Build multi-angle character sheets and attach them to every character shot. A character needs more than one image, so generate sheets with four angles plus face and mid-angle close-ups. The close-up panels keep small details like scars and accessories consistent across models, because the model invents anything it can't see on the sheet. Remove objects from characters' hands before generating turnarounds to avoid cross-angle inconsistencies. If a character's appearance changes over a sequence, make a separate sheet for each beat. One production covered 4 characters and a key prop with just 11 reference images. Another kept 2 characters visually consistent across a 70-second film using only sheets and the invideo agent's context, with no LoRA fine-tuning. To generate the references themselves, use Recraft for photoreal portraits with skin-level imperfections and Nano Banana to build the sheets. The invideo agent can also run the same character prompt on two image models in parallel, so you can pick the better aesthetic before committing.
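Here is a hypothetical readiness check for character sheets, sketched in Python. The panel names in `REQUIRED_PANELS` are assumptions: the text specifies four angles plus face and mid-angle close-ups, but not which four angles. `CharacterSheet`, `sheet_problems`, and `sheet_for` are illustrative helpers, not an invideo feature.

```python
from dataclasses import dataclass

# Hypothetical panel names: four turnaround angles plus face and mid-angle close-ups.
REQUIRED_PANELS = frozenset(
    {"front", "three_quarter", "side", "back", "face_closeup", "mid_closeup"}
)


@dataclass
class CharacterSheet:
    character: str
    beat: str              # one sheet per appearance beat, e.g. "act1" vs "act2"
    panels: set[str]
    holding_object: bool = False


def sheet_problems(sheet: CharacterSheet) -> list[str]:
    """List reasons a sheet is not ready to attach to character shots."""
    problems = []
    missing = REQUIRED_PANELS - sheet.panels
    if missing:
        # Anything the sheet doesn't show is left for the model to invent.
        problems.append("missing panels: " + ", ".join(sorted(missing)))
    if sheet.holding_object:
        problems.append("hands not empty: remove held objects before generating turnarounds")
    return problems


def sheet_for(sheets: list[CharacterSheet], character: str, beat: str) -> CharacterSheet:
    """Return the sheet that matches the character's appearance at this beat."""
    for sheet in sheets:
        if sheet.character == character and sheet.beat == beat:
            return sheet
    raise KeyError(f"no sheet for {character!r} at beat {beat!r}")
```

Keying sheets by `(character, beat)` rather than by character alone is one way to make the one-sheet-per-beat rule hard to skip.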
4. Generate option grids, then promote the best panels to your reference set. Instead of single images, ask the invideo agent for several grids per round (one documented workflow requested 3 grids at a time). Iterate on the grid you prefer, then extract its strongest panels. Those extracted panels replace your original references and become continuity anchors for every subsequent scene, because they are already in your film's exact look rather than borrowed from elsewhere. A related discipline is to generate 4 options per asset (character sheets, environment references), select one, and lock it before any video generation begins. Locking references up front prevents consistency problems from creeping into the rest of the film.
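The grid-to-anchor loop ends in a lock, and a small Python sketch can make that lock explicit. `ReferenceSet` and `pick_best` are hypothetical names. The ratings stand in for your own review of the options, and the file names are placeholders.

```python
class ReferenceSet:
    """Continuity anchors that promoted panels can replace until the set is locked."""

    def __init__(self, initial: list[str]):
        self._refs = list(initial)
        self._locked = False

    def promote(self, panels: list[str]) -> None:
        # Extracted panels replace the borrowed references outright.
        if self._locked:
            raise RuntimeError("reference set is locked; video generation has started")
        if not panels:
            raise ValueError("promote at least one panel")
        self._refs = list(panels)

    def lock(self) -> None:
        # Call this before the first video generation.
        self._locked = True

    @property
    def refs(self) -> tuple[str, ...]:
        return tuple(self._refs)


def pick_best(options: list[str], ratings: dict[str, int]) -> str:
    """Choose among generated options (e.g. 4 per asset) using your own review ratings."""
    return max(options, key=lambda option: ratings.get(option, 0))


if __name__ == "__main__":
    grids = ["round1_grid_a.png", "round1_grid_b.png", "round1_grid_c.png"]  # 3 grids per round
    chosen = pick_best(grids, {"round1_grid_b.png": 5, "round1_grid_a.png": 3})
    refs = ReferenceSet(["borrowed_moodboard.png"])
    refs.promote(["grid_b_panel_2.png", "grid_b_panel_5.png"])  # panels extracted from `chosen`
    refs.lock()
    print(chosen, refs.refs)
```

Because `promote` raises after `lock()`, a late change to an anchor has to be a deliberate act rather than an accident, which mirrors the rule of locking references before video generation begins.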
5. Pair character and location references in reference-to-video. When generating motion, attach both the character sheets and the location references to the same generation. Seedance 2.0 reference-to-video accepts both at once. That is why it holds continuity better than extend, which accepts neither, and better than start/end-frame methods, which see nothing beyond the uploaded frames. For continuous sequences, clip the end of each generated segment and re-upload it alongside the same character and location references, so camera movement and atmosphere carry into the next segment.
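Here is a Python sketch of the segment-chaining plan. It assumes the end clip of each segment is saved as `segment_NN_tail.mp4` (a hypothetical naming scheme) and that ffmpeg is installed for trimming. `plan_segments` and `tail_clip_command` are illustrative. The plan only lists which files to attach to each reference-to-video request; it does not call Seedance 2.0 or invideo.

```python
from dataclasses import dataclass


@dataclass
class SegmentRequest:
    index: int
    prompt: str
    references: list[str]


def tail_clip_command(src: str, dst: str, seconds: float = 2.0) -> list[str]:
    """ffmpeg argv that re-encodes the last `seconds` of src into dst (-sseof seeks from the end)."""
    return ["ffmpeg", "-y", "-sseof", f"-{seconds:g}", "-i", src, dst]


def plan_segments(prompts: list[str], character_refs: list[str],
                  location_refs: list[str]) -> list[SegmentRequest]:
    plan = []
    for i, prompt in enumerate(prompts):
        # Every segment gets the same character and location references...
        refs = [*character_refs, *location_refs]
        if i > 0:
            # ...plus the clipped end of the previous segment, so motion and atmosphere carry over.
            refs.append(f"segment_{i - 1:02d}_tail.mp4")
        plan.append(SegmentRequest(index=i, prompt=prompt, references=refs))
    return plan


if __name__ == "__main__":
    for request in plan_segments(
        ["Hero walks into the market.", "Hero turns toward the fountain."],  # placeholder prompts
        ["hero_sheet.png"],
        ["market_street.png"],
    ):
        print(request)
    print(" ".join(tail_clip_command("segment_00.mp4", "segment_00_tail.mp4")))
```

In ffmpeg, `-sseof -2` seeks to two seconds before the end of the input. Leaving out `-c copy` re-encodes the clip, so the cut lands closer to the requested point than a keyframe-bound stream copy would. To actually trim, pass the list to `subprocess.run(cmd, check=True)`.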
6. Translate illustrated references into color-and-texture prompts. Dropping animated or illustrated reference images directly into a photoreal generation does not work. Instead, have the invideo agent read the palette and texture qualities of the reference and write them into the prompt. In one documented case, the generations came back hyper-realistic at the exact color temperature the reference implied, because the agent extracted the reference's intent rather than copying its pixels.
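What the invideo agent does here is interpretive. Still, a crude, hypothetical Python stand-in can show the idea of turning a reference into words instead of attaching it. The sketch takes pixels as rows of RGB tuples and leaves out image decoding. It uses red minus blue as a rough warmth proxy (not a true correlated color temperature) and luminance jumps between neighboring pixels as a texture proxy. All thresholds are arbitrary assumptions.

```python
from collections import Counter

Pixel = tuple[int, int, int]


def quantize(rgb: Pixel, step: int = 32) -> Pixel:
    # Snap each channel to the center of its bucket so near-identical shades group together.
    return tuple((c // step) * step + step // 2 for c in rgb)


def luminance(rgb: Pixel) -> float:
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def palette_prompt(rows: list[list[Pixel]], top_n: int = 3) -> str:
    """Describe a reference's palette and texture as prompt text (hypothetical heuristic)."""
    pixels = [p for row in rows for p in row]
    counts = Counter(quantize(p) for p in pixels)
    dominant = [f"#{r:02x}{g:02x}{b:02x}" for (r, g, b), _ in counts.most_common(top_n)]
    n = len(pixels)
    warmth = sum(r - b for r, _, b in pixels) / n          # crude proxy: red minus blue
    brightness = sum(luminance(p) for p in pixels) / n
    # Texture proxy: average luminance jump between horizontal neighbors.
    diffs = [abs(luminance(a) - luminance(b)) for row in rows for a, b in zip(row, row[1:])]
    roughness = sum(diffs) / len(diffs) if diffs else 0.0
    temperature = "warm" if warmth > 20 else "cool" if warmth < -20 else "neutral"
    key = "high-key" if brightness > 170 else "low-key" if brightness < 85 else "mid-key"
    texture = "visible grain and texture" if roughness > 12 else "smooth surfaces"
    return (f"photorealistic, {temperature} color temperature, {key} lighting, "
            f"{texture}, dominant palette {', '.join(dominant)}")
```

For a uniformly warm orange swatch such as `[[(220, 140, 60)] * 4] * 4`, the result includes "warm color temperature", "smooth surfaces", and the dominant color `#d09030`. The output is prompt text describing the reference, so no illustrated pixels reach the photoreal generation.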
These are some of the ways to solve multi-reference consistency problems. Which combination works depends on whether your weak point is character identity, world continuity, or overall style.
I told it what to take and just as importantly, what to leave out.
— invideo's creative team, on batching reference images with explicit inclusion and exclusion instructions