How do you build a director's visual style guide for AI video generation?

Last updated June 26, 2026

Build it as a multi-section visual language document, not a prompt. One documented version ran 25 pages across 14 sections, including camera, angles, lighting, colour tone, composition, movement, atmosphere, mood, film palettes, prompt templates, negative prompts, and a quick-reference card. It was loaded once into the invideo agent so that every shot generates against the same grammar.

Start by extracting measurable signatures from the director's actual films: numbers and named rules, not adjectives. Documented examples: James Wan's lighting grammar codified as an 85:15 dark-to-light ratio, and his lens language verified as spherical (circular bokeh, no horizontal flares) with a 2.40:1 hard matte, meaning the widescreen frame is cropped from a spherical image rather than produced by anamorphic lenses. Pull references mapped to specific sequences rather than one general mood board; precise, verifiable inputs are what make the document enforceable. As invideo's creative team puts it: "IT ISN'T A LOOK. IT'S A LANGUAGE". You are codifying a communication system, not an aesthetic tag.
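A numeric signature such as the 85:15 dark-to-light ratio can be checked mechanically against a reference frame. The sketch below is one possible reading, not the documented method: it assumes the ratio means the share of pixels below a luminance threshold, and the names `dark_light_ratio` and `matches_target`, the 0.5 threshold, and the 5-point tolerance are all hypothetical choices.

```python
def dark_light_ratio(luma, threshold=0.5):
    """Return (dark_pct, light_pct) for luminance values in the range 0..1.

    Assumption: a pixel counts as "dark" when its luminance is below
    `threshold`. The source does not say how the ratio was measured.
    """
    if not luma:
        raise ValueError("need at least one luminance value")
    dark = sum(1 for value in luma if value < threshold)
    dark_pct = 100.0 * dark / len(luma)
    return dark_pct, 100.0 - dark_pct


def matches_target(luma, target_dark=85.0, tolerance=5.0, threshold=0.5):
    """True when the dark share is within `tolerance` points of the target."""
    dark_pct, _ = dark_light_ratio(luma, threshold)
    return abs(dark_pct - target_dark) <= tolerance


# Synthetic frame: 85 dark pixels and 15 bright ones (illustrative data only).
frame = [0.1] * 85 + [0.9] * 15
print(dark_light_ratio(frame), matches_target(frame))
```

In practice the luminance list would come from decoded reference stills; how those are sampled is outside this sketch.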

Next, structure those findings into sections. The Wong Kar-wai document used 14, including camera, angles, colour tone, atmosphere, mood, lighting, composition, movement, film palettes, prompt templates, negative prompts, and a quick-reference card. The James Wan version added two structures worth copying: a five-stage emotional architecture with locked camera, lighting, and sound rules per stage, and a "what never to do" section per stage. Explicit prohibitions make autonomous decisions far easier for the invideo agent reading the document. Wan's version also carried a full audio architecture module, because half of what makes his films land is what you hear before what you see; treat sound as a first-class section, not an afterthought.

Within those sections, encode colour as named tonal modes with exact hex values — e.g. "Mode A — Split-toned amber and emerald" — so palette control is reproducible rather than re-described per shot. Write negative constraints in the same spirit: state what the style is NOT, so the model can't drift toward a default look.
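One way to keep named tonal modes reproducible is to store each mode as a small validated record. This is a minimal sketch: the `TonalMode` class and its fields are hypothetical, and the two hex values are placeholders for illustration, not values taken from any documented guide.

```python
import re
from dataclasses import dataclass

HEX_COLOUR = re.compile(r"^#[0-9A-Fa-f]{6}$")


@dataclass(frozen=True)
class TonalMode:
    """A named palette mode with exact hex swatches (hypothetical structure)."""
    name: str
    description: str
    swatches: tuple

    def __post_init__(self):
        # Reject adjectives like "amber": only 6-digit hex values are allowed.
        bad = [s for s in self.swatches if not HEX_COLOUR.match(s)]
        if bad:
            raise ValueError(f"not a 6-digit hex colour: {bad}")

    def prompt_fragment(self):
        """Render the mode the same way every time it is used in a prompt."""
        return f"{self.name}: {self.description} ({', '.join(self.swatches)})"


# Placeholder swatches; substitute the values sampled from reference frames.
mode_a = TonalMode("Mode A", "Split-toned amber and emerald",
                   ("#C98A2B", "#1F6B4F"))
print(mode_a.prompt_fragment())
```

Validating at construction time means a mode described only in words fails loudly instead of drifting into a shot.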

Then lock a prompt assembly order into the document itself. One documented system fixed a 9-element sequence for every generation prompt: camera spec, lens and aspect ratio, lighting source, palette, composition, atmosphere, mood register, film/DP attribution, negative prompt. The order is part of the guide — it guarantees stylistic completeness on every frame, not just the ones you remember to specify.
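The fixed 9-element order can be enforced in code so that no generation prompt ships with a missing element. The sketch below assumes each element is a short text fragment; the key names and the `assemble_prompt` function are hypothetical, and joining fragments with ". " is an arbitrary choice.

```python
# The nine elements, in the order listed in the guide (key names are assumed).
PROMPT_ORDER = (
    "camera_spec",
    "lens_and_aspect_ratio",
    "lighting_source",
    "palette",
    "composition",
    "atmosphere",
    "mood_register",
    "film_dp_attribution",
    "negative_prompt",
)


def assemble_prompt(elements):
    """Join the nine elements in the locked order, refusing incomplete input."""
    missing = [key for key in PROMPT_ORDER
               if not elements.get(key, "").strip()]
    if missing:
        raise ValueError(f"missing prompt elements: {missing}")
    unknown = sorted(set(elements) - set(PROMPT_ORDER))
    if unknown:
        raise ValueError(f"unknown prompt elements: {unknown}")
    return ". ".join(elements[key].strip() for key in PROMPT_ORDER)
```

Because the order lives in `PROMPT_ORDER` rather than in each caller, the dictionary can be filled in any order and the output sequence stays the same.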

Isolate exceptions into their own directive. The Fincher version of this document separated outlier films (The Curious Case of Benjamin Button, The Killer) from the general rules so an agent reading it never overgeneralizes edge-case choices across the whole filmography. This is the difference between encoding a grammar and copying a surface style.
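The separation of general rules from an exceptions block could be modelled as a lookup where an outlier film's overrides apply only to that film. This is a sketch under assumptions: the `rules_for` function is hypothetical, and the rule names and values are placeholders, not claims about any director's actual choices.

```python
def rules_for(film, general, exceptions):
    """Return the rules for one film: general rules plus that film's overrides.

    Overrides are looked up per film, so an outlier's choices never leak
    into the general rules or into any other film.
    """
    merged = dict(general)                  # copy, so `general` is never mutated
    merged.update(exceptions.get(film, {}))
    return merged


# Placeholder rule values for illustration only.
general = {"camera_movement": "general-rule", "palette": "general-palette"}
exceptions = {"The Killer": {"camera_movement": "outlier-rule"}}

print(rules_for("The Killer", general, exceptions))
print(rules_for("Any other film", general, exceptions))
```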

Finally, validate the document before generating anything. Ask the invideo agent to apply the style to a genre the director never worked in; one creator requested a courtroom thriller through the James Wan lens. If the invideo agent comes back with clarifying questions (era, nature of threat) and then produces stylistically coherent output, the grammar has been internalized, not pattern-matched. A document that passes this test keeps working unprompted: in one production the invideo agent applied a slow-shutter motion smear from page 17 without being asked, and pulled a named principle from page 12 onto a scene type the document never specifically addressed.

Once built, the guide loads once into the invideo agent — an agentic video creation tool with all the current models available — as persistent context, so every shot generates against it without re-prompting; one such 25-page document carried a 70-second short film start to finish.

IT ISN'T A LOOK. IT'S A LANGUAGE. Color as diagnosis. Subliminal dollies. Dread before dialogue.

— invideo's creative team
