What is the seven-step AI filmmaking workflow for making professional short films with AI?
Last updated June 26, 2026
The seven-step AI filmmaking workflow is: 1) upload a treatment document to the invideo agent, 2) validate the document, 3) lock character and world references, 4) generate the shot list, 5) generate clips against the held context, 6) let the invideo agent suggest the ending, 7) critique the rough cut. Documented shorts ran it in 2 days for $750–$870.
The seven-step workflow turns a directorial treatment document into a finished short film by keeping one agent's context loaded from the first frame to the final cut. It runs end to end inside invideo, an agentic video creation tool with all the current video and image models available.
Step 1 — Upload the treatment document. Write a structured production bible — camera rules, lighting, palette, composition, mood, negative prompts — and load it into the invideo agent once at project start. One documented production used a 25-page director style guide as the invideo agent's permanent instruction set; another structured it as 14 sections with a quick-reference card. Loaded once, the document holds across every shot with no re-prompting.
Step 2 — Validate the document. Challenge the invideo agent's technical claims before locking anything: lens type, aspect ratio, lighting source. In one production the invideo agent had written "anamorphic" in its analysis and corrected itself to spherical 35mm with a 2.40:1 hard matte when questioned. Then stress-test internalization by asking for a scene in a genre the source director never made — clarifying questions and stylistically coherent output confirm the document is held as grammar, not surface style.
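The 2.40:1 hard matte in that example is easy to sanity-check with arithmetic before anything is locked. Here is a minimal sketch, assuming a 1920x1080 delivery frame (the source does not state a frame size) and a hypothetical helper named `matte_bars`:

```python
def matte_bars(frame_width: int, frame_height: int, target_ratio: float) -> tuple[int, int]:
    """Return (active_height, bar_height) for a centered letterbox matte.

    frame_width / frame_height: delivery frame in pixels (assumed values).
    target_ratio: picture aspect ratio, e.g. 2.40 for a 2.40:1 matte.
    """
    active_height = round(frame_width / target_ratio)
    if active_height > frame_height:
        # The target is narrower than the frame, so bars would go on the sides.
        raise ValueError("target ratio is narrower than the frame")
    bar_height = (frame_height - active_height) // 2
    return active_height, bar_height


active, bar = matte_bars(1920, 1080, 2.40)
print(active, bar)  # 800 140
```

On a 1920x1080 frame, a 2.40:1 picture occupies 800 pixels of height with a 140-pixel bar above and below. If a generated frame's bars differ noticeably from that, the aspect ratio claim is worth questioning.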
Step 3 — Lock consistency assets. Generate four options per character sheet and environment reference, select the best, and lock them before any video generation — this is the step that prevents consistency problems through the rest of the film. One 70-second film kept two characters visually identical across every scene using character sheets and agent context alone, with no LoRA fine-tuning.
Step 4 — Generate the shot list. Have the invideo agent convert the script into a scene-by-scene shot list written in the document's visual grammar. One production instructed the invideo agent to output 12 parameters per shot, including film reference, shot design, length, lens, lighting plan, color script, atmosphere, blocking, final prompt, negative prompt, and revision prompt. The agent also flags model limitations before credits are spent: in one case it recommended splitting an 18-cuts-in-15-seconds scene into two, which produced a sharper result than the original script.
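A per-shot record like this can be sketched as a small data structure. The example below is illustrative only: the class name `Shot`, the field names, and both helper functions are assumptions rather than anything invideo exposes, and it covers only the eleven parameters named above.

```python
from dataclasses import dataclass, fields


@dataclass
class Shot:
    # Field names are hypothetical; they mirror the parameters listed in the text.
    film_reference: str
    shot_design: str
    length_seconds: float
    lens: str
    lighting_plan: str
    color_script: str
    atmosphere: str
    blocking: str
    final_prompt: str
    negative_prompt: str
    revision_prompt: str


def missing_fields(shot: Shot) -> list[str]:
    """List parameters left empty, so gaps surface before credits are spent."""
    return [f.name for f in fields(shot) if getattr(shot, f.name) in ("", None)]


def average_shot_length(cuts: int, seconds: float) -> float:
    """Average duration per cut, in seconds."""
    return seconds / cuts


# Placeholder values only; not taken from any real production.
shot = Shot("placeholder reference", "slow push-in", 4.0, "spherical 35mm",
            "single practical source", "cold blues", "fog", "actor crosses left",
            "placeholder prompt", "no text, no watermark", "")
print(missing_fields(shot))                    # ['revision_prompt']
print(round(average_shot_length(18, 15), 2))   # 0.83
```

The second helper shows why the 18-cuts-in-15-seconds scene was a candidate for splitting: it averages under one second per cut.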
Step 5 — Generate clips with context continuity. Generate shot by shot in your film's aspect ratio while the invideo agent checks each frame against the treatment before returning it. Because the context stays loaded, continuation prompts can be three words — "Everything should match" was enough to carry character, lighting, lens grammar, and spatial logic across a multi-shot sequence. The invideo agent also routes each shot to the right model — Veo, Kling, or Seedance 2.0, all available inside invideo — so model choice never forces a platform switch.
Step 6 — Ask the invideo agent for the ending. When the closing sequence isn't written, ask the invideo agent to propose one from the loaded document's rules. In one production it sequenced six closing shots using a named principle from page 12 of the treatment — applied to a scene type the document never specifically addressed.
Step 7 — Send the rough cut back for critique. Upload the assembled cut and ask an open-ended "what's working, what's not." One agent caught the entity reveal playing in the wrong emotional register, a structural error the director had missed, alongside pacing and sound-design notes. Skipping this review is the most common mistake in AI-directed filmmaking workflows.
Documented shorts produced with this workflow finished in 2 days each: a 70-second film for $750 (3,000 credits) and a ~90-second horror short for $870 (4,100 credits, ~400 video generations and 30 image generations). Cost varies with length and iteration count. On larger productions, teams split these stages across sub-agents, with a creative producer agent holding the script and a DOP agent per scene, but the seven-step spine stays the same.
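The unit costs implied by those two productions can be worked out directly. This is a rough sketch using only the figures quoted above; the function name `unit_costs` is hypothetical, and the horror short's runtime is approximate (~90 seconds), so its per-second numbers are estimates.

```python
def unit_costs(usd: float, credits: int, seconds: float) -> dict[str, float]:
    """Derive per-credit and per-second costs from a production's totals."""
    return {
        "usd_per_credit": round(usd / credits, 3),
        "usd_per_second": round(usd / seconds, 2),
        "credits_per_second": round(credits / seconds, 1),
    }


# Totals as quoted in the text; 90 seconds is an approximation for the horror short.
film = unit_costs(usd=750, credits=3000, seconds=70)
horror = unit_costs(usd=870, credits=4100, seconds=90)

print(film)    # {'usd_per_credit': 0.25, 'usd_per_second': 10.71, 'credits_per_second': 42.9}
print(horror)  # {'usd_per_credit': 0.212, 'usd_per_second': 9.67, 'credits_per_second': 45.6}
```

Both productions land at roughly $10 per finished second, which gives a starting point for budgeting a film of a different length before iteration count is known.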
This is the core reason why I insist you take your own sweet time while building the production doc in the beginning, because the more clarity you bring to the project, the more sharply Agent One will hold it for you across the project.
— invideo's creative team