Should you create character reference sheets before generating AI video footage?
Last updated June 26, 2026
Yes — lock character reference sheets before generating any video. Video models render only what's in the prompt and attached references, so a multi-angle sheet attached to every shot is the reliable identity anchor. Documented productions locked each character in about 5 image generations (~$9.78 per character) and held consistency across entire films without LoRA fine-tuning.
Lock the sheets before you spend video credits, because the economics run in one direction: image generation costs little, especially in invideo, while video shots average 3 generations per usable result, and every one of those is wasted if the character drifts mid-film. invideo is an agentic video creation tool with all the current image and video models available, and its context system is what makes sheets work. Once a sheet is locked, the invideo agent attaches it to every downstream generation, so the character looks the same in every shot instead of being reinvented per prompt. As invideo's creative team puts it: "the AI always needs to see what the character is exactly, right? Or else it'll kind of hallucinate and imagine something that's under the cap."
What a production-grade sheet contains. Generate photorealistic portraits first — Recraft renders skin-level imperfections like pores, lines, and stubble that make a face read as real — then build a 4-angle turnaround (front, side, back, plus face and mid-angle close-ups) in Nano Banana at 4K. Include close-up panels for small details: scars and accessories are the first things to drift if the model never sees them up close. And remove any objects from the character's hands before generating the turnaround — held props create inconsistency across angles.
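The contents listed above can be turned into a quick pre-flight check before a sheet is locked. The sketch below is only an illustration: the `CharacterSheet` class, the panel identifiers, and `sheet_problems` are hypothetical names chosen for this example and are not part of invideo or any model's API. The required panels simply mirror the turnaround described above.

```python
from dataclasses import dataclass, field

# Panel names mirror the turnaround described in the text; the identifiers
# themselves are illustrative assumptions, not an invideo convention.
REQUIRED_PANELS = ("front", "side", "back", "face_closeup", "mid_closeup")


@dataclass
class CharacterSheet:
    name: str
    panels: set[str] = field(default_factory=set)
    small_details: set[str] = field(default_factory=set)    # e.g. {"scar", "ring"}
    detail_closeups: set[str] = field(default_factory=set)  # details with their own close-up panel
    held_props: set[str] = field(default_factory=set)       # objects in the character's hands


def sheet_problems(sheet: CharacterSheet) -> list[str]:
    """Return human-readable problems; an empty list means the sheet passes."""
    problems = []
    for panel in REQUIRED_PANELS:
        if panel not in sheet.panels:
            problems.append(f"missing panel: {panel}")
    # Small details drift first if the model never sees them up close.
    for detail in sorted(sheet.small_details - sheet.detail_closeups):
        problems.append(f"no close-up for detail: {detail}")
    # Held props create inconsistency across angles.
    for prop in sorted(sheet.held_props):
        problems.append(f"remove held prop before turnaround: {prop}")
    return problems
```

For example, a sheet that lacks the mid-angle close-up, has a ring with no detail panel, and shows the character holding a lantern would return three problems, one per rule.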
How to lock a character. Generate 4 options per character sheet, select the strongest, and lock it into the invideo agent's context before video generation begins. In one 2-day production, locking character sheets and environment references upfront was the step that prevented consistency problems for the rest of the film. Budget roughly 5 image generations to finalize each identity. If a character's appearance evolves (costume changes, accumulating props), make a separate sheet per beat: one production needed a distinct sheet for every sequence because the character added a new trinket in each city.
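The budgeting guidance can be worked through as simple arithmetic. The sketch below uses the figures reported in this article (about 5 image generations and roughly $9.78 to lock a character, and an average of 3 video generations per usable shot). It assumes, purely for illustration, that every additional per-beat sheet costs about the same as the first lock; the function and constant names are hypothetical.

```python
# Figures reported in this article; treat them as rough averages, not quotes.
COST_PER_LOCKED_SHEET_USD = 9.78   # about 5 image generations per locked identity
IMAGE_GENS_PER_SHEET = 5
VIDEO_GENS_PER_USABLE_SHOT = 3


def sheet_budget(beats_per_character: dict[str, int]) -> dict:
    """Estimate image-generation effort when each beat gets its own sheet.

    Assumption: a per-beat sheet costs about the same as the initial lock.
    """
    sheets = sum(beats_per_character.values())
    return {
        "sheets": sheets,
        "image_generations": sheets * IMAGE_GENS_PER_SHEET,
        "estimated_cost_usd": round(sheets * COST_PER_LOCKED_SHEET_USD, 2),
    }


def expected_video_generations(usable_shots: int) -> int:
    """Video generations to plan for, at the reported average per usable shot."""
    return usable_shots * VIDEO_GENS_PER_USABLE_SHOT
```

Under these assumptions, a hero whose look changes across 3 beats plus a sidekick with a single look needs 4 sheets, about 20 image generations, and roughly $39.12, while a 20-shot film should plan for around 60 video generations.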
What sheets buy you downstream. Consistency carries into video because the references travel with the prompt: Seedance 2.0 reference-to-video accepts character sheets alongside location references, and Kling-style subject referencing works the same way. invideo has all of these models, so the invideo agent routes each shot to the right one with the right sheet attached, and you don't have to pick a platform per model. Drift fixes also become surgical instead of slot-machine re-rolls. When a continuity error shows up in a shot, ask the invideo agent to inspect the sheet: in one documented case it identified the exact panel containing the error, corrected it at the source, stored the updated sheet in context, and regenerated only what was needed.
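The idea that references travel with the prompt can be pictured as plain data. The sketch below is not the invideo agent, Seedance, or Kling API; `ShotRequest`, `build_shot`, and the file names are hypothetical. It only shows the bookkeeping: each shot picks the sheet for the current beat, falls back to the character's base sheet, and refuses to build a shot for a character with no locked sheet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShotRequest:
    prompt: str
    references: tuple[str, ...]  # sheet and location images sent with the prompt


def build_shot(
    prompt: str,
    characters: list[str],
    beat: Optional[str],
    sheets: dict[tuple[str, Optional[str]], str],
    location_ref: Optional[str] = None,
) -> ShotRequest:
    """Attach the right locked sheet for every character in the shot.

    `sheets` maps (character, beat) to a sheet file; (character, None) is the
    character's base sheet, used when no beat-specific sheet exists.
    """
    refs = []
    for character in characters:
        sheet = sheets.get((character, beat)) or sheets.get((character, None))
        if sheet is None:
            # Generating without a sheet is exactly what invites drift.
            raise KeyError(f"no locked sheet for {character!r}")
        refs.append(sheet)
    if location_ref:
        refs.append(location_ref)
    return ShotRequest(prompt=prompt, references=tuple(refs))
```

Keeping sheets keyed by character and beat also makes the targeted fix described above straightforward: replacing one entry updates every later shot that uses it, and only the affected shots need regenerating.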
Do you need LoRA fine-tuning instead? No — documented productions skipped it entirely. A 70-second short film kept 2 characters consistent across every scene using sheets and agent context alone ($750, 2 days), and a 2-person team covered 4 characters and 1 prop in a 3-minute animated episode with just 11 reference images — headshots plus head-to-toe refs. Sheets plus persistent agent context get you fine-tuning-level consistency without training anything.
the AI always needs to see what the character is exactly, right? Or else it'll kind of hallucinate and imagine something that's under the cap. So, we don't want to do that. We always want the character to be seen as we see it on the character sheet.
— invideo's creative team