The best template for an AI film treatment is a director's visual-language document built for agent internalization: 14 sections, including camera, angles, colour tone, atmosphere, mood, lighting, composition, movement, film palettes, prompt templates, negative prompts, and a quick-reference card, loaded once into the invideo agent so every shot inherits the style without re-prompting.
Build your treatment as an internalization spec for an AI agent, not as a pitch document for a human reader. A traditional treatment (logline, synopsis, three-act breakdown, character bios) is written to sell a story; an AI treatment is written so an agent can make autonomous visual decisions on every shot without you re-explaining anything. invideo is an agentic video creation tool with all the current models available, and its agent reads a treatment document once and holds it as persistent context across the whole production.
The 14-section structure. The proven template, used to encode Wong Kar-wai's complete visual system into a 25-page document uploaded as a permanent instruction set, contains 14 sections: camera spec, angles, colour tone, atmosphere, mood, lighting, composition, movement, film palettes, prompt templates, negative prompts, and a quick-reference card, plus one section on exceptions and one on adaptations. Write each section as if briefing a department head: "all the information you want your crew to have."
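As a rough sketch, the section list above can double as a completeness check on a draft before it is uploaded. The key spellings, the reading of "exceptions" and "adaptations" as two separate sections, and the sample draft contents are assumptions made for illustration; nothing here is part of invideo itself.

```python
# Section names follow the 14-section list in the text; the exact key
# spellings are an assumption made for this sketch.
REQUIRED_SECTIONS = [
    "camera spec", "angles", "colour tone", "atmosphere", "mood", "lighting",
    "composition", "movement", "film palettes", "prompt templates",
    "negative prompts", "quick-reference card", "exceptions", "adaptations",
]


def missing_sections(treatment: dict[str, str]) -> list[str]:
    """Return required sections that are absent or blank, in template order."""
    return [
        name for name in REQUIRED_SECTIONS
        if not treatment.get(name, "").strip()
    ]


# Hypothetical partial draft: two sections written, one left blank.
draft = {
    "camera spec": "35mm look, handheld",
    "lighting": "practical sources only",
    "mood": "   ",
}

print(missing_sections(draft))
```

A blank section counts as missing here, since an empty heading gives the agent nothing to apply.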
Add a per-shot output spec. Beyond the style sections, state in the document what the invideo agent must return for every scene request. In one documented production the invideo agent was instructed to evaluate every shot against a fixed list of parameters: film reference, shot design, length, style interpretation, emotional register, lens, lighting plan, color script, atmosphere layers, blocking, final prompt, negative prompt, and revision prompt. Pair this with a fixed prompt assembly order (camera spec → lens and aspect ratio → lighting source → palette → composition → atmosphere → mood register → film/DP attribution → negative prompt); that nine-element order held across every frame of a multi-film series.
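One way to keep the assembly order fixed is to encode it as an ordered record, so the order cannot drift from shot to shot. The sketch below assumes Python; the class name `ShotPrompt`, the `assemble` helper, the separator format, and every example value are hypothetical and only illustrate the nine-element order described above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShotPrompt:
    # Field order mirrors the nine-element assembly order in the text.
    camera_spec: str
    lens_and_aspect_ratio: str
    lighting_source: str
    palette: str
    composition: str
    atmosphere: str
    mood_register: str
    film_dp_attribution: str
    negative_prompt: str


def assemble(shot: ShotPrompt) -> str:
    """Join the elements in declaration order; reject any blank element."""
    parts = [getattr(shot, f.name).strip() for f in fields(shot)]
    if not all(parts):
        raise ValueError("every element of the assembly order must be filled in")
    *positive, negative = parts
    # The separator format is an assumption, not a documented invideo syntax.
    return ", ".join(positive) + " | negative: " + negative


# Hypothetical shot, invented for illustration.
example = ShotPrompt(
    camera_spec="handheld 35mm film camera",
    lens_and_aspect_ratio="28mm wide lens at 1.66:1",
    lighting_source="single neon sign as key light",
    palette="Mode A split-toned amber and emerald",
    composition="subject framed through a doorway",
    atmosphere="cigarette smoke and light rain",
    mood_register="restrained longing",
    film_dp_attribution="in the style named by the treatment",
    negative_prompt="flat lighting; symmetrical framing",
)

print(assemble(example))
```

Because `dataclasses.fields` returns fields in declaration order, reordering the prompt means editing the class in one place instead of every prompt string.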
Include the structural features that separate a working doc from a mood board. Four elements show up in every documented treatment that performed: (1) Exception isolation — carve out the director's atypical work into its own directive so the invideo agent never misapplies generalized rules to edge cases; the Fincher protocol does exactly this for outlier films. (2) Emotional-stage architecture — a horror treatment built on five escalating emotional stages, each with locked camera, lighting, and sound rules (including precise grammar like an 85:15 dark-to-light ratio), let the invideo agent make stage-appropriate decisions autonomously. (3) A "what never to do" section per stage — negative constraints make autonomous decisions dramatically easier and prevent style drift. (4) A sound architecture module — one creator notes: "There's a full audio architecture module here, because half of what makes one's films land is in the image. It's what you hear before what you actually see." Specify color as named tonal modes with exact hex values ("Mode A — split-toned amber and emerald") rather than adjectives; that's what makes palettes reproducible across hundreds of generations.
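To illustrate the point about named tonal modes, here is a minimal sketch of a palette table plus a check that every mode carries real hex codes, not adjectives. The mode name and description come from the example above; the hex values, the dictionary layout, and the `validate_modes` helper are placeholders invented for this sketch and should be replaced with the film's actual palette.

```python
import re

# Matches a six-digit hex colour such as "#FFBF00".
HEX = re.compile(r"#[0-9A-Fa-f]{6}")

# Placeholder hex values, invented for illustration only.
TONAL_MODES = {
    "Mode A": {
        "description": "split-toned amber and emerald",
        "hex": ["#FFBF00", "#50C878"],
    },
}


def validate_modes(modes: dict) -> list[str]:
    """Return one error string per mode that lacks valid hex values."""
    errors = []
    for name, mode in modes.items():
        values = mode.get("hex", [])
        if not values:
            errors.append(f"{name}: no hex values")
        for value in values:
            if not HEX.fullmatch(value):
                errors.append(f"{name}: bad hex {value!r}")
    return errors


print(validate_modes(TONAL_MODES))
```

A mode that only says "warm" or "amber" fails the check, which is the failure the paragraph warns about: adjectives are not reproducible across generations.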
Write every section as a directive, not prose. A cinematic style is a language system that can be codified into discrete, teachable directives — so each section should be a named rule the invideo agent can apply, not a descriptive paragraph. Density of decisions beats length of description: the clarity you put in upfront is what the invideo agent holds across the project. Keep the document model-agnostic; the invideo agent translates its directives into prompts for whichever model fits the shot — Veo, Kling, or Seedance 2.0 — so you never rewrite the doc per model.
Two adjacent points worth one line each. Before generating, you can stress-test the document by asking the invideo agent to apply the style to a genre the director never worked in; if the agent responds with clarifying questions, the grammar was internalized rather than pattern-matched. And the template has documented results: a 70-second short built on the 25-page Wong Kar-wai document finished in 2 days for $750, with the treatment loaded once and never re-explained.
"One thing that my doc covers that I don't think is very common in treatment docs is this section on sound. There's a full audio architecture module here, because half of what makes one's films land is in the image. It's what you hear before what you actually see."
— invideo's creative team, on building an AI director's treatment document