Why do you need a new character reference sheet every time a character changes costume in an AI film?
Last updated June 26, 2026
AI video models have no memory of prior shots: on every generation they reconstruct the character from scratch by reading the reference sheet you attach. Once the costume changes, the old sheet actively misleads the model. It blends the old wardrobe shown in the sheet with the new wardrobe in the prompt, and the character drifts. A new sheet per beat stands in for the memory the model lacks.
Treat the reference sheet as the character's identity for that beat — every angle the model needs to rebuild the look, locked in one place. The invideo agent is an agentic video tool that holds your project context and routes each shot to the right image and video model, but the sheet you attach is still what the model sees. If the wardrobe in that sheet doesn't match the wardrobe in the shot you're prompting, the model will hybridize — old jacket creeping into the new look, a trinket vanishing, hair length resetting — because it's interpolating between two visual states it has no reason to separate.
The rule that follows: a new sheet for every moment the character's on-screen appearance changes in a way a viewer would notice. Costume swap, prop added, trinket picked up, hair or makeup beat, injury, age jump — each is a new visual identity state and needs its own front, side, back, and close-up panels. In one documented production where the lead character accumulates a new trinket in every city across a continuous sequence, the team built a distinct character sheet for every beat — "Juicebox keeps adding a trinket onto himself in every different city. So we needed different character sheets for every single sequence." Without that, the trinket from city three bleeds backward into city one the next time you generate.
Why beats and not just scenes: the unit isn't the scene, it's the visible state. Two scenes with the same wardrobe share one sheet; one scene where the character changes mid-action needs two. Match the sheet count to the wardrobe/prop changes, not the script structure.
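The counting rule above can be sketched in a few lines of Python. This is only an illustration: the `Shot` fields and the `assign_sheets` helper are hypothetical names made up for this example, not part of the invideo agent or any real tool. The sketch keys each sheet on the visible state (wardrobe plus props) and ignores the scene number entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    scene: int                       # script scene number (not used for sheet assignment)
    wardrobe: str                    # what the character is wearing
    props: frozenset = frozenset()   # trinkets, injuries, anything a viewer would notice


def assign_sheets(shots):
    """Map each shot to a sheet index, one index per distinct visible state."""
    states = []        # distinct (wardrobe, props) states, in order of first appearance
    sheet_index = []   # sheet_index[i] is the sheet to attach to shots[i]
    for shot in shots:
        state = (shot.wardrobe, shot.props)
        if state not in states:
            states.append(state)
        sheet_index.append(states.index(state))
    return sheet_index, states


shots = [
    Shot(scene=1, wardrobe="red jacket"),
    Shot(scene=2, wardrobe="red jacket"),   # new scene, same look
    Shot(scene=3, wardrobe="red jacket"),
    Shot(scene=3, wardrobe="red jacket", props=frozenset({"city-one trinket"})),  # look changes mid-scene
]
sheet_index, states = assign_sheets(shots)
print(sheet_index)   # [0, 0, 0, 1]
print(len(states))   # 2
```

Three scenes need only two sheets here, because the first three shots share one visible state and the trinket pickup inside scene 3 creates the second.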
How to build each new sheet without losing the underlying identity: generate four options of the new look and pick one before any video runs. Across documented productions, locking four variations per asset upfront is what prevents drift downstream. Keep every panel at the same resolution and angle set as the original sheet (the standard used across these productions is four angles plus face and mid-angle close-ups at 4K). Remove anything the character isn't holding in that beat, because objects in hands corrupt turnarounds. Include close-up panels for small details such as scars, accessories, and the new trinket itself, because wide-only sheets lose those across generations. Then store the new sheet in the invideo agent's context, tagged to the beat it covers, and attach only that sheet to every prompt inside that beat.
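One way to keep "one sheet per beat, and only that sheet on the prompt" honest is to track sheets in a small registry on your side. The sketch below is an assumption-heavy illustration: `CharacterSheet`, `SheetRegistry`, the panel names, and the file paths are all hypothetical, and none of it reflects how the invideo agent stores context internally.

```python
from dataclasses import dataclass

# Assumed panel set, based on the front / side / back / close-up list in the text.
REQUIRED_PANELS = {"front", "side", "back", "closeup"}


@dataclass
class CharacterSheet:
    character: str
    beat: str      # the visible state this sheet covers, e.g. "city-3"
    panels: dict   # panel name -> image file path


def missing_panels(sheet):
    """Return the required panels this sheet does not have yet."""
    return sorted(REQUIRED_PANELS - set(sheet.panels))


class SheetRegistry:
    def __init__(self):
        self._sheets = {}

    def add(self, sheet):
        # Refuse incomplete sheets so a wide-only sheet never reaches a prompt.
        gaps = missing_panels(sheet)
        if gaps:
            raise ValueError(f"{sheet.character}/{sheet.beat} is missing panels: {gaps}")
        self._sheets[(sheet.character, sheet.beat)] = sheet

    def sheet_for(self, character, beat):
        # Exactly one sheet per (character, beat), so a prompt never mixes two looks.
        return self._sheets[(character, beat)]


registry = SheetRegistry()
registry.add(CharacterSheet(
    character="Juicebox",
    beat="city-1",
    panels={name: f"juicebox_city1_{name}.png" for name in REQUIRED_PANELS},
))
sheet = registry.sheet_for("Juicebox", "city-1")
print(sheet.beat, sorted(sheet.panels))
```

Looking a sheet up by character and beat, instead of by "latest", is the design choice that matters: the prompt for a city-one shot can never pick up the city-three sheet by accident.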
If a continuity error shows up later, fix the sheet, not the shot. Ask the invideo agent to inspect the sheet for the offending beat — it can identify the exact panel with the error, correct it there, and every downstream shot inherits the fix. Re-rolling the shot instead leaves the broken sheet in place and the next shot drifts again.
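The reason a sheet fix propagates while a shot re-roll does not can be shown with a toy model in which shots point at a sheet by key and never copy its panels. Everything here (the dictionaries, `panels_for`, `fix_panel`, the file names) is a hypothetical illustration, not the invideo agent's actual behaviour.

```python
# Sheets are stored once, keyed by beat; the file names are made up.
sheets = {
    "city-3": {"front": "city3_front_v1.png", "closeup": "city3_closeup_v1.png"},
}

# Shots store only the key of the sheet they use, never a copy of its panels.
shots = [
    {"id": "shot-12", "sheet": "city-3"},
    {"id": "shot-13", "sheet": "city-3"},
]


def panels_for(shot):
    """Resolve the panels a shot would attach at generation time."""
    return sheets[shot["sheet"]]


def fix_panel(beat, panel, corrected_path):
    """Correct one panel on the sheet itself."""
    sheets[beat][panel] = corrected_path


fix_panel("city-3", "closeup", "city3_closeup_v2.png")

# Every shot in the beat now resolves to the corrected close-up.
print([panels_for(s)["closeup"] for s in shots])
```

If each shot carried its own copy of the panels instead, the same correction would have to be repeated per shot, which is the re-roll trap the paragraph above describes.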
Numbers from documented productions for scale: 11 image generations covered headshots and head-to-toe references for 4 characters and 1 prop on one 3-minute episode; another production locked each character in about 5 generation attempts at roughly $9.78 per character; a 70-second short ran with 2 characters consistent across every scene with no fine-tuning needed — "Seventy seconds. Two characters. The same person across every scene. No LoRA needed." Per-beat sheets are what make those numbers hold once wardrobe starts moving.
Juicebox keeps adding a trinket onto himself in every different city. So we needed different character sheets for every single sequence.
— invideo's creative team, on per-beat character sheets in a continuous AI sequence