How do you extract measurable signatures from a director's films?

Pull numbers and named rules from actual film sequences rather than general mood boards. For example, James Wan's lighting was codified as an 85:15 dark-to-light ratio and his lens language verified as spherical shooting 2.40:1 hard matte.

Why should a style guide include negative constraints?

Stating what the style is NOT prevents the AI model from drifting toward a default look. Explicit prohibitions per stage make autonomous decisions far easier for the AI agent reading the document.

How should colour be encoded in the style guide?

Encode colour as named tonal modes with exact hex values, such as Mode A — Split-toned amber and emerald, so palette control is reproducible rather than re-described on every shot.

How do you validate a director's style guide before generating video?

Ask the AI agent to apply the style to a genre the director never worked in. Coherent output and clarifying questions back from the agent confirm the grammar is internalized rather than surface-level pattern-matched.

Build a Director's Visual Style Guide for AI Video

Q: What should a director's visual style guide for AI video generation include?

It should cover camera specs, lens language, lighting ratios, colour tone with hex values, composition, movement, atmosphere, mood, film palettes, prompt templates, negative prompts, and a quick-reference card. One documented version ran 25 pages across 14 sections.

Build it as a multi-section visual language document, not a prompt. One documented version ran 25 pages across 14 sections, including camera, angles, lighting, colour tone, composition, movement, atmosphere, mood, film palettes, prompt templates, negative prompts, and a quick-reference card. It was loaded once into the invideo agent so that every shot generates against the same grammar.

Start by extracting measurable signatures from the director's actual films — numbers and named rules, not adjectives. Documented examples: James Wan's lighting grammar codified as an 85:15 dark-to-light ratio; his lens language verified as spherical (circular bokeh, no horizontal flares) shooting 2.40:1 hard matte — widescreen by extraction, not optics. Pull references mapped to specific sequences rather than one general mood board; precise, verifiable inputs are what make the document enforceable. As invideo's creative team puts it: "IT ISN'T A LOOK. IT'S A LANGUAGE" — you are codifying a communication system, not an aesthetic tag.
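"Widescreen by extraction" can be made concrete with a little arithmetic. The sketch below is only an illustration, assuming the source frame is 16:9 and the matte bars are split evenly top and bottom; the function name `hard_matte` and the frame sizes are hypothetical, not something the original guide specifies.

```python
def hard_matte(width: int, height: int, target_ratio: float = 2.40) -> tuple[int, int]:
    """Return (active_height, bar_height) for matting a frame to target_ratio.

    Assumes the bars are split evenly between top and bottom; if the leftover
    height is odd, one bar would need to be a pixel taller than the other.
    """
    active_height = round(width / target_ratio)
    if active_height > height:
        raise ValueError("frame is already wider than the target ratio")
    bar_height = (height - active_height) // 2
    return active_height, bar_height


# A 1920x1080 frame matted to 2.40:1 keeps 800 picture lines
# and loses 140 lines to each bar.
active, bar = hard_matte(1920, 1080)
print(active, bar)
```

Writing the numbers down this way gives the agent a value it can check against, rather than the word "widescreen".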

Next, structure those findings into sections. The Wong Kar-wai document used 14, including camera, angles, colour tone, atmosphere, mood, lighting, composition, movement, film palettes, prompt templates, negative prompts, and a quick-reference card. The James Wan version added two structures worth copying: a five-stage emotional architecture with locked camera, lighting, and sound rules per stage, and a "what never to do" section per stage. Explicit prohibitions make autonomous decisions far easier for the invideo agent reading the document. Wan's version also carried a full audio architecture module, because half of what makes his films land is what you hear before what you see; treat sound as a first-class section, not an afterthought.

Within those sections, encode colour as named tonal modes with exact hex values — e.g. "Mode A — Split-toned amber and emerald" — so palette control is reproducible rather than re-described per shot. Write negative constraints in the same spirit: state what the style is NOT, so the model can't drift toward a default look.
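One way to keep a tonal mode reproducible is to store it as data and render it into text the same way every time. The sketch below is a minimal illustration, not the format any documented guide used: the `TonalMode` class, the highlight/shadow roles, and both hex values are placeholders chosen for the example, since the source names the mode but does not publish its hex codes.

```python
import re
from dataclasses import dataclass

HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")


@dataclass(frozen=True)
class TonalMode:
    label: str                # e.g. "Mode A"
    name: str                 # human-readable description of the mode
    swatches: dict[str, str]  # role -> "#RRGGBB" (roles are hypothetical)

    def __post_init__(self) -> None:
        # Reject vague colour words; only exact six-digit hex values pass.
        for role, value in self.swatches.items():
            if not HEX_RE.match(value):
                raise ValueError(f"{role}: {value!r} is not a #RRGGBB hex value")

    def to_prompt(self) -> str:
        colours = ", ".join(f"{role} {value.upper()}" for role, value in self.swatches.items())
        return f"{self.label} ({self.name}): {colours}"


# Placeholder hex values for illustration only.
MODE_A = TonalMode(
    label="Mode A",
    name="Split-toned amber and emerald",
    swatches={"highlights": "#FFBF00", "shadows": "#50C878"},
)

print(MODE_A.to_prompt())
```

Because the mode is rejected unless every swatch is an exact hex value, a loose description such as "warm amber" cannot slip into the guide by accident.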

Then lock a prompt assembly order into the document itself. One documented system fixed a 9-element sequence for every generation prompt: camera spec, lens and aspect ratio, lighting source, palette, composition, atmosphere, mood register, film/DP attribution, negative prompt. The order is part of the guide — it guarantees stylistic completeness on every frame, not just the ones you remember to specify.
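A fixed assembly order is easy to enforce mechanically. The following sketch assumes the nine elements are supplied as a dictionary and joined with semicolons; the key names, the separator, and the sample shot values are all hypothetical, and only the order of the nine elements comes from the text above.

```python
PROMPT_ORDER = (
    "camera_spec",
    "lens_and_aspect_ratio",
    "lighting_source",
    "palette",
    "composition",
    "atmosphere",
    "mood_register",
    "film_dp_attribution",
    "negative_prompt",
)


def assemble_prompt(elements: dict[str, str]) -> str:
    """Join the nine elements in the locked order, refusing incomplete input."""
    missing = [key for key in PROMPT_ORDER if not elements.get(key, "").strip()]
    if missing:
        raise ValueError(f"missing prompt elements: {', '.join(missing)}")
    unknown = sorted(set(elements) - set(PROMPT_ORDER))
    if unknown:
        raise ValueError(f"unknown prompt elements: {', '.join(unknown)}")
    return "; ".join(elements[key].strip() for key in PROMPT_ORDER)


# Sample values are illustrative placeholders, deliberately given out of order.
shot = {
    "negative_prompt": "no horizontal lens flares",
    "palette": "Mode A, split-toned amber and emerald",
    "camera_spec": "slow push-in on a locked head",
    "lens_and_aspect_ratio": "spherical lens, 2.40:1 hard matte",
    "lighting_source": "single practical lamp",
    "composition": "subject low in frame, negative space above",
    "atmosphere": "light haze",
    "mood_register": "dread before dialogue",
    "film_dp_attribution": "attribution line from the guide",
}

print(assemble_prompt(shot))
```

The caller can supply the elements in any order; the output always follows the locked sequence, and a shot with a missing element fails loudly instead of generating an incomplete prompt.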

Isolate exceptions into their own directive. The Fincher version of this document separated outlier films (The Curious Case of Benjamin Button, The Killer) from the general rules so an agent reading it never overgeneralizes edge-case choices across the whole filmography. This is the difference between encoding a grammar and copying a surface style.
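The same separation can be sketched as a lookup in which outlier films override the general rules only when they are named explicitly. This is an assumed structure for illustration: the rule keys and the rule text are placeholders, and nothing here describes what the Fincher document actually says about either film.

```python
from typing import Optional

# General rules apply to every shot unless an outlier film is named.
GENERAL_RULES = {
    "palette": "GENERAL palette rule (placeholder)",
    "camera_movement": "GENERAL movement rule (placeholder)",
}

# Outlier films live in their own block and override only the keys they list.
EXCEPTIONS = {
    "The Curious Case of Benjamin Button": {
        "palette": "OUTLIER palette rule (placeholder)",
    },
    "The Killer": {
        "camera_movement": "OUTLIER movement rule (placeholder)",
    },
}


def rules_for(reference_film: Optional[str] = None) -> dict[str, str]:
    """Return the general rules, overridden only for an explicitly named outlier."""
    rules = dict(GENERAL_RULES)
    if reference_film is not None:
        rules.update(EXCEPTIONS.get(reference_film, {}))
    return rules


print(rules_for())              # general rules only
print(rules_for("The Killer"))  # movement rule overridden, palette unchanged
```

Keeping the exceptions in a separate mapping means an edge-case choice never leaks into the default rules; it applies only when that film is the stated reference.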

Finally, validate the document before generating anything. Ask the invideo agent to apply the style to a genre the director never worked in — one creator requested a courtroom thriller through the James Wan lens. Clarifying questions back from the invideo agent (era, nature of threat) plus stylistically coherent output confirm the grammar is internalized, not pattern-matched. A document that passes this test keeps working unprompted: in one production the invideo agent applied a slow-shutter motion smear from page 17 without being asked, and pulled a named principle from page 12 onto a scene type the document never specifically addressed.

Once built, the guide loads once into the invideo agent — an agentic video creation tool with all the current models available — as persistent context, so every shot generates against it without re-prompting; one such 25-page document carried a 70-second short film start to finish.

Watch some of these to see what works for you:

Build a director's bible and train an AI agent to use it shot by shot

25-page Wong Kar-wai style guide as AI system prompt — see it work

14 Fincher directives, exceptions block, and how the agent cross-checks frames

IT ISN'T A LOOK. IT'S A LANGUAGE. Color as diagnosis. Subliminal dollies. Dread before dialogue.

— invideo's creative team
