AI Filmmaking

Do you need a treatment document before generating AI video, or is a storyboard enough?

Last updated June 26, 2026

It depends on scope. For a single scene, a short ad, or anything under ~2 minutes with one visual world, a storyboard plus a one-page style reference is enough. For a multi-scene short film, a named director style, or anything where pacing and tone must hold across many shots, build a treatment — it pays back the first time the agent has to make a decision you didn't anticipate.

Use this triage rule: if the project has more than 5 scenes, more than 3 distinct visual environments, or a named directorial style you want held across every shot, write a treatment. Below those thresholds, a storyboard with character sheets and a style reference document is enough. The treatment isn't paperwork; it's the persistent context the invideo agent reads once and applies to every downstream decision, so you stop re-prompting style on every shot.
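If it helps to see the triage rule as a checklist, here is a minimal Python sketch of it. The `ProjectScope` class and `needs_treatment` function are hypothetical names made up for illustration; they are not part of invideo or any library, and the thresholds are simply the ones stated above.

```python
from dataclasses import dataclass


@dataclass
class ProjectScope:
    """Hypothetical summary of a project, used only for this illustration."""
    scene_count: int
    environment_count: int
    named_director_style: bool


def needs_treatment(scope: ProjectScope) -> bool:
    """Return True when any one of the three triage thresholds is crossed."""
    return (
        scope.scene_count > 5
        or scope.environment_count > 3
        or scope.named_director_style
    )


# A short promo: few scenes, one environment, no named style.
promo = ProjectScope(scene_count=3, environment_count=1, named_director_style=False)
# A multi-scene short: the scene count alone crosses the threshold.
short_film = ProjectScope(scene_count=8, environment_count=2, named_director_style=False)

print(needs_treatment(promo))       # False
print(needs_treatment(short_film))  # True
```

The conditions are joined with `or` because crossing any single threshold is enough to justify the treatment.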

invideo is an agentic video creation tool with all current video and image models (Seedance 2.0, Kling, Veo, Recraft, Nano Banana, GPT-Image-2) and upscalers available, routed by the invideo agent — so what you load up front becomes the brief every sub-agent inherits.

When a storyboard is enough

For short-form, single-environment, or single-scene work — a 15–30 second promo, a product cutdown, a single sequence — a storyboard plus a one-page style reference (palette, lens feel, 3–5 reference frames) covers it. You're directing one look across a handful of shots, so a creative producer agent with the script, the storyboard frames, and the style reference loaded will hold continuity without a full treatment. One documented production hit a complex top-down shot on the first generation attempt working this way — storyboard, references, conversational direction, no treatment doc.

When a treatment earns its keep

The moment the project crosses into multi-scene narrative, a named director's visual language, or a tonal arc that has to escalate across the film, write the treatment. The pattern across documented productions is consistent: a 70-second Wong Kar-wai-style short ran on a 25-page treatment loaded as the agent's system prompt, with 12 parameters evaluated per shot (film reference, shot design, lens, lighting plan, color script, atmosphere, blocking, negative prompt, and more). A ~90-second horror short in a James Wan style ran on a treatment structured around 5 escalating emotional stages, each with locked rules for camera, lighting, and sound — the agent flagged a scene running at the wrong stage register that the director missed. A 3-minute Arcane-style animated episode ran on 64 style-reference frames ingested into context as the locked style block, prefixed onto every prompt afterward. Across these, productions ran $750–$5,000 total and $315–$750 per finished minute — the treatment is what made consistency cheap.
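The "locked style block, prefixed onto every prompt" idea is easy to picture in code. The sketch below is an assumption about how you might do it by hand, not a description of how the invideo agent works internally; `with_style_block`, the style text, and the shot prompts are all hypothetical placeholders.

```python
def with_style_block(style_block: str, shot_prompt: str) -> str:
    """Prefix the locked style block onto a single shot prompt."""
    return f"{style_block.strip()}\n\n{shot_prompt.strip()}"


# Placeholder style text; a real style block would come from your own treatment.
STYLE_BLOCK = "STYLE (locked): painterly animation, high-contrast rim light, teal and amber palette."

# Placeholder shot prompts for illustration only.
shot_prompts = [
    "Wide shot: a rooftop at dusk, two figures facing each other.",
    "Close-up: a gloved hand tightening around a railing.",
]

# Every prompt gets the same prefix, so the style never has to be restated per shot.
full_prompts = [with_style_block(STYLE_BLOCK, p) for p in shot_prompts]

for prompt in full_prompts:
    print(prompt)
    print("---")
```

Because the style text lives in one place, changing the look means editing one block rather than every prompt.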

What goes in the treatment

Write it as the brief you'd hand a real crew. Start with a structured visual language section covering camera, lens, angles, color tone, lighting, composition, movement, palette with hex values, prompt templates, negative prompts, and a quick-reference card (14 sections is a solid spine). Add a section on sound and audio architecture, plus a per-stage or per-act breakdown with explicit "what never to do" rules so the agent can make autonomous calls. Load the full script as well, so the agent has character arcs and themes. Hridaye, invideo's creative director, frames it this way: "This is the core reason why I insist you take your own sweet time while building the production doc in the beginning, because the more clarity you bring to the project, the more sharply [the invideo agent] will hold it for you across the project."
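Before you load a draft, a quick completeness check against that outline can catch gaps. The sketch below assumes you keep the treatment as named text sections. The section keys and the `missing_sections` helper are hypothetical, chosen only to mirror the parts listed above.

```python
# Hypothetical section keys that mirror the parts of a treatment listed above.
REQUIRED_SECTIONS = (
    "visual_language",
    "sound_and_audio",
    "stage_breakdown",
    "never_do_rules",
    "full_script",
)


def missing_sections(treatment: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or contain only whitespace."""
    return [
        name
        for name in REQUIRED_SECTIONS
        if not treatment.get(name, "").strip()
    ]


# Placeholder draft with only two sections filled in.
draft = {
    "visual_language": "Camera, lens, palette with hex values, negative prompts.",
    "full_script": "Scene 1. Interior, night.",
}

print(missing_sections(draft))
# ['sound_and_audio', 'stage_breakdown', 'never_do_rules']
```

An empty result only means every section has some text in it; it says nothing about whether that text is clear enough for the agent to act on.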

Storyboard still matters — even with a treatment

A treatment doesn't replace a storyboard; it sits above it. With the treatment loaded, run a storyboard sub-agent to visualize each shot before you direct the DOP agent, costume agent, or production designer agent, so the storyboard becomes a visual brief that makes subsequent direction precise. With multi-shot video generation, you also need far fewer storyboard frames than older first-frame/last-frame workflows, since a single 15-second clip carries 4–7 usable shot candidates.
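To get a rough feel for what the 4–7 candidates per clip figure means for planning, here is a back-of-envelope estimate in Python. It assumes, optimistically, that every usable candidate maps to a shot you actually planned. The `clips_needed` helper and the 20-shot example are hypothetical.

```python
def clips_needed(planned_shots: int, min_per_clip: int = 4, max_per_clip: int = 7) -> tuple[int, int]:
    """Estimate (best case, worst case) clip counts for a planned shot list.

    Assumes every usable candidate in a clip corresponds to a planned shot,
    which will not always hold in practice.
    """
    if planned_shots <= 0:
        return (0, 0)
    # -(-a // b) is integer ceiling division.
    best_case = -(-planned_shots // max_per_clip)
    worst_case = -(-planned_shots // min_per_clip)
    return (best_case, worst_case)


print(clips_needed(20))  # (3, 5)
```

So under these assumptions a 20-shot sequence works out to roughly 3 to 5 multi-shot clips. Treat this as an order-of-magnitude guide, not a budget.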

Validate the treatment before you generate

Before burning credits, stress-test it: ask the invideo agent to apply the treatment's style to a genre or subject the reference director never worked in. If the agent asks sharp clarifying questions and produces stylistically coherent output, the doc has been internalized as grammar. If it pattern-matches surface aesthetics, rewrite the weak sections before generating a single frame.

These are the cases where each document earns its place — what works depends on the length, the scene count, and how locked the visual language needs to be.

Watch some of these to see what works for you:

See how a 25-page treatment doc replaced constant re-prompting across a full short film
Watch a 91-page director's treatment doc drive a horror short from first shot to final cut

No treatment doc, just 64 reference frames — see how the storyboard-only approach holds up

