AI Filmmaking

What is a director's visual language document and how do you use it in AI video production?

Last updated June 26, 2026

A director's visual language document is a structured, machine-readable codification of a filmmaker's complete visual system (camera, lenses, lighting, palette, composition, atmosphere, mood, prompt templates, and negative prompts), uploaded once to an AI agent as persistent context. One documented version ran to 14 sections across 25 pages; once it is loaded, the invideo agent applies that grammar to every shot without re-prompting.

Build the document as a translation of a director's style into discrete, teachable directives, not as a mood board. One documented version encoded Wong Kar-wai's visual language into 14 sections, including camera, angles, colour tone, atmosphere, mood, lighting, composition, movement, film palettes, prompt templates, negative prompts, and a quick-reference card. The result was a 25-page treatment, uploaded as a permanent instruction set before a single frame was generated. invideo is an agentic video creation tool; the invideo agent reads a document like this once and holds every directive across every shot and scene.

What goes in it. The strongest documents encode style as quantified, reproducible rules rather than adjectives. Encode colour philosophy as named tonal modes with exact hex values (e.g. 'Mode A: split-toned amber and emerald'), lighting as ratios (a James Wan document specified an 85:15 dark-to-light ratio), and format as the director actually shot it (2.40:1 hard matte for The Conjuring, that is, widescreen by extraction rather than anamorphic optics). Fix a prompt assembly order so every generated frame follows the same grammar. One documented order runs nine elements: camera spec, lens and aspect ratio, lighting source, palette, composition, atmosphere, mood register, film/DP attribution, negative prompt. Add an exceptions directive: a Fincher-style protocol moved outlier films (The Curious Case of Benjamin Button, The Killer) into their own section so that generalized rules don't get misapplied to them. Include a 'what never to do' list per section, which makes autonomous decisions easier for the invideo agent. Consider an audio architecture module too: half of what makes a Wan film land is what you hear before what you see. Structure the emotional arc as rules as well: the Wan document defined five emotional stages, each with locked camera, lighting, and sound behavior.
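
The locked assembly order can be sketched in a few lines of Python. This is a minimal illustration, not the invideo agent's actual implementation: the function name `assemble_prompt`, the dictionary keys, the ` | ` separator, and the bracketed placeholder values are all assumptions made for the example. Only the nine-element order and the two concrete Wan values (2.40:1 hard matte, 85:15 dark-to-light ratio) come from the text above.

```python
# Hypothetical sketch: build every prompt in one fixed element order.
# The order is the nine-element sequence quoted in the article; the key
# names and the " | " separator are assumptions for illustration only.
ASSEMBLY_ORDER = (
    "camera_spec",
    "lens_and_aspect_ratio",
    "lighting_source",
    "palette",
    "composition",
    "atmosphere",
    "mood_register",
    "film_dp_attribution",
    "negative_prompt",
)


def assemble_prompt(directives):
    """Join the nine directives in the locked order, failing loudly on gaps."""
    missing = [key for key in ASSEMBLY_ORDER if not directives.get(key, "").strip()]
    if missing:
        raise ValueError("missing directives: " + ", ".join(missing))
    unknown = sorted(set(directives) - set(ASSEMBLY_ORDER))
    if unknown:
        raise ValueError("unknown directives: " + ", ".join(unknown))
    return " | ".join(directives[key].strip() for key in ASSEMBLY_ORDER)


# Deliberately supplied out of order; the output order never changes.
# Bracketed values are placeholders to be filled from your own document.
shot = {
    "negative_prompt": "[negative prompt from the document]",
    "lighting_source": "85:15 dark-to-light ratio",
    "lens_and_aspect_ratio": "2.40:1 hard matte",
    "camera_spec": "[camera spec]",
    "palette": "[named tonal mode with hex values]",
    "composition": "[composition rule]",
    "atmosphere": "[atmosphere]",
    "mood_register": "[emotional stage]",
    "film_dp_attribution": "[film / DP reference]",
}

prompt = assemble_prompt(shot)
print(prompt)
```

Raising an error on a missing element, rather than silently skipping it, mirrors the point of a fixed order: a prompt with a gap is not in the document's grammar.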

How to use it. Upload the document to the invideo agent at project start, before any generation; it becomes the standard the invideo agent checks every output against. Then stress-test it: ask for a genre the director never worked in (the Wan document was validated with a courtroom thriller request). If the invideo agent asks clarifying questions and returns stylistically coherent output, it has internalized the grammar rather than pattern-matched it. From there, direct in plain language and let the document carry the style. The invideo agent assembles each prompt in the locked order, checks generated frames against the treatment before returning them, and can output a full parameter set per shot: one production had the invideo agent evaluate every scene request against 12 parameters, from film reference and lens to colour script, blocking, and revision prompt. Continuation prompts collapse to almost nothing: with the document loaded, 'Everything should match' was sufficient to hold character, lighting, lens grammar, and spatial logic across a multi-shot sequence. The document also enables behavior you didn't ask for. In one production the invideo agent applied, unprompted, a slow-shutter motion smear described on page 17 of the document; pulled a named principle from page 12 and applied it to a scene type the document never addressed; and flagged a model limitation before generation to avoid wasted credits. After assembly, send the rough cut back for a 'what's working, what's not' pass against the document. In one case that pass caught an entity reveal playing in the register of the wrong emotional stage, which the director had missed.
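
The per-shot parameter check can be pictured as a simple completeness test. The sketch below is a hedged illustration only: the article names just five of the 12 parameters (film reference, lens, colour script, blocking, revision prompt), so only those five appear here, and the function name, key names, and placeholder values are hypothetical.

```python
# Hypothetical sketch of a per-shot parameter sheet check.
# Only the five parameters named in the article are listed; the remaining
# seven are not given in the source and are intentionally left out.
NAMED_PARAMETERS = (
    "film_reference",
    "lens",
    "colour_script",
    "blocking",
    "revision_prompt",
)


def unfilled_parameters(shot_sheet, required=NAMED_PARAMETERS):
    """Return the required parameters that are absent or blank, in order."""
    return [name for name in required if not shot_sheet.get(name, "").strip()]


# Placeholder values; a real sheet would be filled from the style document.
sheet = {
    "film_reference": "[film reference]",
    "lens": "[lens]",
    "blocking": "[blocking notes]",
}

print(unfilled_parameters(sheet))  # ['colour_script', 'revision_prompt']
```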

What it produces. A 70-second short film in the Wong Kar-wai style was completed in 2 days for $750 (3,000 credits) with the 25-page document as the only style control; a ~90-second horror short in the Wan style took 2 days, ~400 video generations, and $870; and a curated series ran 3 director documents through one agent across 3 films. The alternative is re-prompting style scene by scene, which drifts. A persistent document lets you keep the film in your head instead of breaking flow to construct prompts, and camera continuity carries forward because you set it once.
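
For budgeting, the figures above reduce to a few unit costs. The ratios below are derived here and are not reported by the source; the ~400 generations and ~90 seconds are approximate, and the $870 total may cover more than video generations, so treat the results as rough estimates. The helper name `per_unit` is hypothetical.

```python
# Rough unit costs derived from the production figures quoted above.
# These ratios are back-of-envelope estimates, not published pricing.
def per_unit(total_usd, units):
    """Divide a total cost by a unit count, rejecting non-positive counts."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_usd / units


# Wong Kar-wai style short: $750, 3,000 credits, 70 seconds.
wkw_usd, wkw_credits, wkw_seconds = 750, 3000, 70
# Wan style short: $870, about 400 video generations, about 90 seconds.
wan_usd, wan_generations, wan_seconds = 870, 400, 90

print(f"WKW per credit:     ${per_unit(wkw_usd, wkw_credits):.2f}")      # $0.25
print(f"WKW per second:     ${per_unit(wkw_usd, wkw_seconds):.2f}")      # $10.71
print(f"Wan per generation: ${per_unit(wan_usd, wan_generations):.3f}")  # $2.175
print(f"Wan per second:     ${per_unit(wan_usd, wan_seconds):.2f}")      # $9.67
```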

Beyond the document itself: in a multi-agent setup, load it into a creative producer agent first so that every DOP agent and storyboard agent inherits the same grammar. Pair it with character sheets, which handle character consistency as a separate mechanism; one production held 2 characters consistent across a 70-second film with no LoRA. For animation styles, a batch of style-reference frames saved to the invideo agent's context serves the same locking function.

Watch some of these to see what works for you:

How a 25-page director's style doc runs an entire AI short film
Building a James Wan director's bible and using it to make an AI horror film
Live AI directing session: feeding a director's bible and correcting the agent in real time

This is the core reason why I insist you take your own sweet time while building the production doc in the beginning, because the more clarity you bring to the project, the more sharply Agent One will hold it for you across the project.

— invideo's creative team
