What decisions do you need to lock before generating assets for an AI short film?
Last updated June 26, 2026
Lock four decisions before generating a single asset: who your character is (face, body, wardrobe), what your antagonist or reference entity looks like, what each hero prop is, and what your delivery format is. Anything left unresolved upstream becomes inconsistency downstream: hallucinated faces, drifting worlds, lifeless props, and aspect-ratio mismatches you can't fix in post.
Run this as a pre-production checklist with the invideo agent before any video model touches your project. The reasoning is mechanical: every frame of an AI film inherits these four inputs, so anything you leave fuzzy gets re-guessed by the model on every generation — and the audience disengages the moment they wonder "is that the same person?"
Lock the character (face, body, wardrobe). Generate a 4K portrait first for skin-level detail (pores, lines, stubble) using Recraft or GPT-Image-2, then build a 360° character sheet in Nano Banana with four angles plus face and mid close-ups. Generate four options per sheet, pick one, and lock it. Remove props from the character's hands before the multi-angle pass so turnarounds stay consistent. In one 70-second short, this approach kept two characters consistent across every scene with no fine-tuning; another production locked one character in five generations at about $9.78. Without the close-up panels, small details (scars, accessories) drift between shots.
Lock the antagonist or reference entity. If your story has a second character, creature, or visual entity, it needs its own reference sheet treated with the same rigor — not an afterthought generated mid-production. Decide its visual lineage now ("closer to Bathsheba, or further?"), generate four variations, and lock one. Productions that skip this hit the wall later, when the entity's reveal shot is the emotional peak and the model has nothing stable to anchor to.
Lock the hero props. Treat each prop as a narrative object with its own creative direction pass. A lifeless toy or generic weapon breaks audience investment regardless of how well the character is rendered. "Character's good, the toy's lifeless, why would any girl play with that?" is the kind of note that surfaces only when the prop is reviewed and locked early. Generate alternatives, pick on story logic, and bake physical characteristics into the brief ("hard material, so it makes a horrible sound when it falls") so diegetic sound logic carries into later asset generation.
Lock the deliverable format. Decide your aspect ratio, resolution, and whether you're going frames-first or straight to video, because these choices change every prompt downstream. Frames-first is the safer order: direct static frames to approved quality, then move to Seedance 2.0, Veo, or Kling for motion. The invideo agent routes each shot to the right model, so you don't juggle a separate platform for each model; you just specify the shot.
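One way to keep the format decision from drifting is to write it down once and derive everything else from it. The sketch below is illustrative only: `DeliverableSpec`, its fields, and `prompt_suffix` are hypothetical names, not part of any invideo or model API, and whether a given model honors format text in a prompt is an assumption to verify in your own setup.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class DeliverableSpec:
    """Hypothetical record of the locked delivery format."""
    aspect_ratio: str   # integer ratio written "W:H", e.g. "16:9" or "9:16"
    height_px: int      # vertical resolution in pixels
    frames_first: bool  # True = approve still frames before any video pass

    def width_px(self) -> int:
        w, h = (int(part) for part in self.aspect_ratio.split(":"))
        exact = Fraction(w, h) * self.height_px
        # Snap to the nearest even width, since many encoders expect even dimensions.
        return round(exact / 2) * 2

    def prompt_suffix(self) -> str:
        # One shared string appended to every prompt, so no shot re-guesses the format.
        stage = "still frame" if self.frames_first else "video clip"
        return f"{stage}, aspect ratio {self.aspect_ratio}, {self.width_px()}x{self.height_px}"


landscape = DeliverableSpec("16:9", 1080, frames_first=True)
vertical = DeliverableSpec("9:16", 1920, frames_first=True)
print(landscape.prompt_suffix())  # still frame, aspect ratio 16:9, 1920x1080
print(vertical.prompt_suffix())   # still frame, aspect ratio 9:16, 1080x1920
```

Because the spec is frozen, changing the format mid-project means creating a new spec on purpose rather than editing one prompt and forgetting the rest.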
Lock your world and style separately from your character. Character consistency and style consistency are different problems. Style means lighting grammar, color palette, film stock, and texture language — load a treatment document (one production used a 25-page Wong Kar-wai style guide; another encoded 14 principles across camera, lens, palette, atmosphere) so the invideo agent holds it across every shot. World means specific locations and spatial logic — batch your references by theme (spatial logic, screen function, color theory), tell the agent what to take AND what to leave out, then generate grids of three options per round and extract the best panels to use as continuity anchors for every later scene. Without this, every new scene re-rolls the world.
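The "what to take AND what to leave out" instruction is easier to keep consistent if each reference batch is written in the same shape. The following is a minimal sketch under stated assumptions: `ReferenceBatch` and `to_brief` are hypothetical helpers, the take and leave-out entries are invented placeholders, and the output is plain text you would paste into a brief, not a call to any real API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReferenceBatch:
    """Hypothetical container for one themed batch of world references."""
    theme: str            # e.g. "spatial logic", "screen function", "color theory"
    take: List[str]       # qualities to carry over from the references
    leave_out: List[str]  # qualities to ignore, even if the references show them

    def to_brief(self) -> str:
        return (
            f"References for {self.theme}. "
            f"Take: {', '.join(self.take)}. "
            f"Leave out: {', '.join(self.leave_out)}."
        )


# Placeholder values for illustration only.
batch = ReferenceBatch(
    theme="color theory",
    take=["muted greens", "warm practical lights"],
    leave_out=["lens flares", "modern signage"],
)
print(batch.to_brief())
```

Keeping one batch per theme means a rejected grid can be traced back to a specific take or leave-out line instead of a pile of unlabeled images.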
Use the four-options gate on every locked asset. For each character sheet, entity reference, prop, and environment plate, generate four variations, pick one, and lock it before the first video clip is generated. This is the single step that prevents most consistency problems across the rest of the film, and it costs almost nothing because image generation on invideo is cheap relative to video.
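The gate described above can be tracked as a simple checklist that refuses to pass until every asset has one locked pick. This is a sketch, not an invideo feature: `Asset`, `propose`, `lock`, and `ready_for_video` are hypothetical names, and the candidate IDs stand in for whatever labels you give your generated images.

```python
from dataclasses import dataclass, field
from typing import List, Optional

OPTIONS_PER_ASSET = 4  # the four-options gate


@dataclass
class Asset:
    name: str
    kind: str  # "character", "entity", "prop", or "environment"
    candidates: List[str] = field(default_factory=list)
    locked: Optional[str] = None

    def propose(self, candidate_ids: List[str]) -> None:
        # Enforce exactly four generated variations per asset.
        if len(candidate_ids) != OPTIONS_PER_ASSET:
            raise ValueError(
                f"{self.name}: expected {OPTIONS_PER_ASSET} options, got {len(candidate_ids)}"
            )
        self.candidates = list(candidate_ids)

    def lock(self, choice: str) -> None:
        # Only a generated option can be locked.
        if choice not in self.candidates:
            raise ValueError(f"{self.name}: {choice!r} is not one of the generated options")
        self.locked = choice


def unlocked(assets: List[Asset]) -> List[str]:
    return [a.name for a in assets if a.locked is None]


def ready_for_video(assets: List[Asset]) -> bool:
    # True only when every asset has a locked pick.
    return not unlocked(assets)


girl = Asset("the girl", "character")
toy = Asset("the toy", "prop")
girl.propose(["girl_a", "girl_b", "girl_c", "girl_d"])
toy.propose(["toy_a", "toy_b", "toy_c", "toy_d"])
girl.lock("girl_c")

print(ready_for_video([girl, toy]))  # False
print(unlocked([girl, toy]))         # ['the toy']
```

Running the check before the first video generation turns "did we lock everything?" into a list of named gaps rather than a judgment call.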
Once these are locked, the rest of the pipeline collapses into directing instead of prompting. "Everything should match" becomes a sufficient continuation instruction because the invideo agent already holds the answers to the four questions. Skip the lock step and you spend your credit budget regenerating the same shot, chasing a face for which the model never had a stable target.
Before I build assets, four things will change every frame:
The Girl: What does she look like? What era?
The Entity: Closer to Bathsheba?
The Toy: Doll, ball, something else?
The Deliverable: The frames first, then video?
These four answers unlock everything.
— Hridaye, invideo's creative director