How do you write a treatment document that AI agents can follow throughout an entire film production?
Last updated June 26, 2026
Write the treatment as a structured context anchor, not a creative read, organized so an AI agent can pull rules from it shot by shot. Cover seven sections: premise and arc, characters, visual language, scene/shot breakdown, style rules with negative constraints, decision constraints per stage, and revision flags with audio architecture. Load it once into the invideo agent; it holds across every frame.
Start with the seven sections, in this order, because each one feeds the next downstream agent decision:
1. Premise and arc. One page: logline, three-act spine, themes, motifs. This is what a creative producer agent reads to ground every other agent. Upload the full screenplay alongside it so the agent has narrative context — characters, arc, themes — for all downstream tasks.
2. Characters. For each character: physical description, costume direction (or the mood/feel if you don't have specs yet), behavioural beats, and a note on which beats need a separate character sheet (any costume change, trinket, or appearance shift gets its own sheet). This section seeds the casting and costume sub-agents and locks consistency rules before any generation.
3. Visual language. The largest section, and the one that determines whether the agent makes decisions or guesses. Cover camera, lens, aspect ratio, lighting source, colour palette (named tonal modes with exact hex values — e.g. "Mode A — split-toned amber and emerald"), composition, movement, atmosphere, mood, and film/DP attribution. One documented horror production codified an 85:15 dark-to-light ratio and 2.40:1 hard matte directly into this section; the agent applied them autonomously across every shot.
4. Scene and shot breakdown. Scene-by-scene, with sequence-specific visual references mapped per sequence rather than one general mood board. This is what a director's assistant agent uses to sequence shots and what a DOP agent reads before framing.
5. Style rules with explicit negative constraints. State what to do AND what never to do. "Every surface has hand-painted brushstroke texture — not live action, not photorealistic" is the kind of line that prevents drift. Include a prompt-assembly order (camera spec → lens & aspect → lighting source → palette → composition → atmosphere → mood → film attribution → negative prompt) so every generation prompt is built the same way.
6. Decision constraints per stage. If the film has emotional stages or acts, give each one its own locked rules for camera, lighting, sound, and a "what never to do" sub-section. The agent uses these to make autonomous calls — one production caught its own continuity slip when shadows leaned blue-green instead of neutral grey, pulled the Stage A rule from the doc, and flagged the deviation without being asked.
7. Revision flags and audio architecture. A short module on what the agent should surface for human approval (model-limitation flags, ambiguous beats, prop choices that affect narrative), plus a sound section. Sound carries half of what makes most films land, so a treatment without audio architecture leaves the agent guessing.
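The prompt-assembly order from section 5 can be sketched as a small helper that always joins a shot's fields in the same sequence. This is a minimal illustration, not invideo's implementation: the field names, the `assemble_prompt` function, and most of the example values are hypothetical, and only the ordering, the 2.40:1 matte, the "Mode A" palette, and the negative constraint come from the text above.

```python
# Field names are hypothetical; the order mirrors the prompt-assembly
# order listed in section 5 of the treatment.
PROMPT_ORDER = [
    "camera_spec",
    "lens_and_aspect",
    "lighting_source",
    "palette",
    "composition",
    "atmosphere",
    "mood",
    "film_attribution",
    "negative_prompt",
]


def assemble_prompt(shot: dict) -> str:
    """Join a shot's fields in the fixed order, failing loudly on gaps."""
    missing = [key for key in PROMPT_ORDER if not shot.get(key, "").strip()]
    if missing:
        raise ValueError("treatment is missing fields: " + ", ".join(missing))
    parts = [shot[key].strip().rstrip(".") for key in PROMPT_ORDER]
    return ". ".join(parts) + "."


# Illustrative values only; the dict is deliberately out of order to show
# that the output order comes from PROMPT_ORDER, not from the input.
example_shot = {
    "negative_prompt": "Not live action, not photorealistic",
    "palette": "Mode A, split-toned amber and emerald",
    "camera_spec": "Locked-off wide shot",
    "lens_and_aspect": "Anamorphic lens, 2.40:1 hard matte",
    "lighting_source": "Single practical lamp",
    "composition": "Subject low in frame",
    "atmosphere": "Hand-painted brushstroke texture on every surface",
    "mood": "Quiet dread",
    "film_attribution": "Attribution as named in the treatment",
}

prompt = assemble_prompt(example_shot)
print(prompt)
```

Raising on a missing field, rather than silently skipping it, is one way to surface gaps in the treatment before any generation runs.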
The invideo agent is built to read this document once and hold every directive across every shot; that is the mechanism. After upload, validate the doc before generating anything: ask the agent to apply the director's grammar your doc describes to a genre that director never worked in. If it asks clarifying questions and the output still reads as stylistically coherent, the doc has been internalised as grammar rather than pattern-matched. As Hridaye, invideo's creative director, puts it: "This is the core reason why I insist you take your own sweet time while building the production doc in the beginning, because the more clarity you bring to the project, the more sharply Agent One will hold it for you across the project."
The treatment then propagates as persistent context across the multi-agent pipeline. The creative producer agent holds the full document; specialist sub-agents (storyboard, costume, production design, DOP — one per scene if scenes differ in visual sensibility) inherit only the sections they need. A three-word continuation prompt like "Everything should match" is enough to maintain character, lighting, lens grammar, and spatial continuity across multi-shot sequences when the doc is loaded. Documented productions sit at 14–25 pages with 9–14 structured sections; a 70-second short ran on 25 pages with 12 key parameters output per shot, and a 90-second horror short ran on a doc with 9 shot-design steps and 8 colour-grading steps. Pages aren't the metric — coverage and decision-rule density are.
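The inheritance pattern described above, where the creative producer holds the whole document and each specialist sees only its own sections, can be sketched as a simple lookup. Everything here is an assumption for illustration: the section keys, the `SUB_AGENT_SECTIONS` mapping, and the `context_for` helper are hypothetical, and the real pipeline may route sections differently.

```python
# Hypothetical keys for the seven treatment sections, in document order.
TREATMENT_SECTIONS = [
    "premise_and_arc",
    "characters",
    "visual_language",
    "scene_shot_breakdown",
    "style_rules",
    "stage_constraints",
    "revision_flags_and_audio",
]

# Assumed mapping of sub-agents to the sections they inherit.
SUB_AGENT_SECTIONS = {
    "storyboard": ["premise_and_arc", "scene_shot_breakdown", "style_rules"],
    "costume": ["characters", "style_rules"],
    "production_design": ["visual_language", "scene_shot_breakdown", "style_rules"],
    "dop": ["visual_language", "scene_shot_breakdown", "stage_constraints"],
}


def context_for(agent: str, treatment: dict) -> dict:
    """Return the slice of the treatment a given agent should receive."""
    if agent == "creative_producer":
        return dict(treatment)  # the producer holds the full document
    return {name: treatment[name] for name in SUB_AGENT_SECTIONS[agent]}


# Stand-in treatment text, one placeholder string per section.
treatment = {name: f"<text of {name}>" for name in TREATMENT_SECTIONS}

print(list(context_for("costume", treatment)))
print(len(context_for("creative_producer", treatment)))
```

Keeping the mapping in one table makes it easy to add a per-scene sub-agent when scenes differ in visual sensibility.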