AI Filmmaking

What cinematic vocabulary terms improve AI video prompts the most?

Last updated June 26, 2026

The highest-impact terms are precise camera and shot specs, lens and aspect-ratio language (spherical vs anamorphic, bokeh), source-specific lighting, named palette values, atmosphere and mood register, film/director attribution, and an explicit negative prompt — in that order. One documented production held exactly this 9-element prompt assembly order across every frame of a finished film.

Assemble every prompt in a fixed vocabulary order instead of stacking adjectives. One production encoded a 9-element assembly order — camera spec, lens & aspect ratio, lighting source, palette, composition, atmosphere, mood register, film/DP attribution, negative prompt — and held it across every frame; that order doubles as a ranking of which term categories carry the most weight.

1) Camera and shot spec. Open with the shot itself: shot size (extreme close-up, close-up, medium, wide), angle, and movement (dolly, tracking shot, rack focus, handheld, static hold). These are the single most legible terms to video models because they map directly to how film footage is labeled. Write the framing before anything else — it sets what every other term modifies.

2) Lens and aspect-ratio language. Spherical vs anamorphic is a meaningful distinction for prompt accuracy: spherical lenses produce circular bokeh and no horizontal lens flares. In one horror production the invideo agent initially noted 'anamorphic' for a James Wan reference and corrected itself when challenged — '35mm, 2.40:1 hard matte. Widescreen by extraction, not optics.' Focal length and depth-of-field terms (85mm portrait look, shallow depth of field, bokeh) belong in this slot, and name your film's aspect ratio precisely rather than implying it.

3) Source-specific lighting. Generic descriptors underperform: 'warm lighting' produces less accurate results than 'warm yellow from the lamps only, like all the refs.' Name the light source, its direction, and a ratio where you have one — James Wan's signature 85:15 dark-to-light ratio was extracted and used as literal prompt language in one production. Terms like motivated light, practicals, and golden hour work because they describe a physical lighting setup, not a vibe.

4) Palette as named values. Encode color as named tonal modes with exact hex values — 'Mode A — split-toned amber and emerald' — rather than 'moody colors.' This makes palette control reproducible across shots instead of re-rolled each time.

5) Atmosphere and mood register. Separate the two: atmosphere is physical (haze, rain, smoke, dust in the light), mood is an emotional register you name explicitly. One production's agent evaluated every scene request against 12 parameters including emotional register, atmosphere layers, and blocking — treating mood as a parameter, not an adjective.

6) Film/DP attribution. A named director or cinematographer anchor compresses dozens of micro-decisions into one term. Documented productions ran an entire film on Wong Kar-wai's visual language codified as 14 principles, and another on a James Wan grammar — and the anchor works as grammar, not surface style: one creator validated it by requesting a courtroom thriller through the Wan lens, a genre the director never made, and got stylistically coherent output. Pair the name with the specific trait you want from it, not the name alone.

7) Negative-prompt vocabulary. State what must never appear. The style block 'This MUST look and feel like Arcane animation — not live action, not photorealistic. Every surface has hand-painted brushstroke texture' is what prevented style drift across 164 generated clips in one animated episode. A negative prompt closes the 9-element order for a reason: models drift toward defaults unless you prohibit them.

What to drop. 'Cinematic,' 'beautiful,' and 'high quality' carry almost no instruction on their own — each one should be replaced by a term from the categories above. Before/after: 'cinematic shot of a woman in a bar, moody lighting' becomes 'close-up, slow dolly-in, 85mm, shallow depth of field, warm yellow from the bar practicals only, split-toned amber and emerald palette, light haze, melancholic register, in the style of Wong Kar-wai, not photorealistic-clean — film grain texture.'

This vocabulary is model-agnostic — Veo, Kling, and Seedance 2.0 all parse it — and inside invideo all of these models are available, with the invideo agent routing each shot to the right one. If you're writing many shots, you can also load your lens grammar, palette modes, and lighting ratios into the invideo agent's context once so every downstream prompt inherits the full vocabulary without re-typing it.

Watch some of these to see what works for you:

Full unedited AI session: director's bible, lens terms, lighting corrections in action
Wong Kar-wai style guide as system prompt: how named visual grammar drives every shot
14 Fincher directives replace vague prompts and lock style across every shot

IT ISN'T A LOOK. IT'S A LANGUAGE. Color as diagnosis. Subliminal dollies. Dread before dialogue.

— invideo's creative team

Share

More on AI Filmmaking