What filmmaking terms and camera vocabulary should you use when prompting AI video tools?
Last updated June 26, 2026
Prompt AI video tools in the vocabulary you'd give a crew: shot size and framing, camera movement (dolly, pan, static hold), angle (low, top-down, reverse), lens behavior (spherical vs anamorphic, shallow depth of field), light source and ratio, and palette — assembled in a fixed order and paired with a negative prompt stating what the shot must never be.
Write your prompts as directorial intent in film language rather than technical parameter strings. Documented productions consistently get better results prompting an AI model "like a director prompts his crew." invideo is an agentic video creation tool with all the current models available, so the invideo agent accepts this vocabulary conversationally and translates it, shot by shot, into each model's prompt format (Veo, Kling, or Seedance 2.0).
Shot size and framing. Name the size explicitly: extreme close-up, close-up, medium shot, wide, extreme wide. Then say what is in frame and what is withheld. One director's visual-language system, encoded into an AI agent as 14 sections, included discrete sections on camera, angles, and composition.
Camera movement. Use the standard set — dolly in, dolly out, pan, tilt, crane, handheld, whip pan — plus the two terms prompts most often miss: a static hold (the camera does not move) and a hold instruction tied to action ("hold on him until he lunges, no cutting"). Slow, near-imperceptible moves work too: one encoded director system used "subliminal dollies" as a named directive.
Camera angle. Low angle, high angle, eye-level, Dutch/canted, bird's-eye, top-down. Coverage vocabulary matters as much as single-shot vocabulary: ask for "the reverse on [character]" or "the compositionally opposite angle of the last shot" to build matched pairs for editing. In one documented production, a complex top-down shot landed on the first generation once it was directed in this language instead of manually prompted.
Lens and format. Specify lens behavior, not just focal length: wide-angle, telephoto, macro, shallow depth of field. Spherical versus anamorphic is a meaningful distinction — spherical glass produces circular bokeh and no horizontal flares — and precision here is worth checking: in one production the AI agent had logged "anamorphic" for a director who shoots spherical, and corrected itself when challenged ("35mm, 2.40:1 hard matte — widescreen by extraction, not optics"). State the aspect ratio in your film's delivery format as part of the camera spec.
Lighting. Name the source, not the adjective. "Warm yellow from the lamps only, like all the refs" produces more accurate results than "warm lighting." Use motivated and practical lighting as terms, and quantify where you can — one director's lighting grammar was encoded as an 85:15 dark-to-light ratio the AI agent applied across every shot.
Palette and grade. Encode color as named tonal modes with exact hex values ("Mode A — split-toned amber and emerald") rather than loose adjectives, and add a film or DP attribution ("shot like [film/DP]") to anchor the overall look — both are reproducible vocabulary, not mood words.
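One way to keep named tonal modes reproducible is to store them as data and render the prompt line from that data. The sketch below is plain Python under stated assumptions: the `PALETTE_MODES` table, the `palette_directive` helper, and both hex values are hypothetical illustrations, not values or APIs taken from any documented production.

```python
import re

# Hypothetical table: the mode name and description come from the example above;
# the hex values are placeholders chosen only for illustration.
PALETTE_MODES = {
    "Mode A": {
        "description": "split-toned amber and emerald",
        "hex": ["#C8811A", "#0F5B46"],
    },
}

HEX_COLOR = re.compile(r"#[0-9A-Fa-f]{6}")


def palette_directive(mode: str) -> str:
    """Render one named tonal mode as a prompt line with exact hex values."""
    spec = PALETTE_MODES[mode]
    invalid = [value for value in spec["hex"] if not HEX_COLOR.fullmatch(value)]
    if invalid:
        raise ValueError(f"not six-digit hex colors: {invalid}")
    return f"{mode}: {spec['description']} ({', '.join(spec['hex'])})"


print(palette_directive("Mode A"))
```

Keeping the mode in one table means every shot that names "Mode A" carries the same hex values, which is the point of treating palette as vocabulary rather than mood.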
Assembly order and negative prompts. Put the terms in a fixed sequence so no layer gets dropped: one production held a 9-element order across every frame — camera spec, lens & aspect ratio, lighting source, palette, composition, atmosphere, mood register, film/DP attribution, negative prompt. The negative prompt is vocabulary too: state what the shot must never be, in concrete style terms — a documented animated production's style block read "not live action, not photorealistic" with every surface required to feel hand-painted, and every subsequent prompt started with it. Another production went further, outputting 12 parameters per shot including emotional register, blocking, atmosphere layers, and a revision prompt.
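The fixed 9-element order can be sketched as a small assembler that refuses to build a prompt when any layer is missing. This is a minimal illustration in plain Python, not a tool from any of the productions described: the key names, the `assemble_prompt` helper, the "Negative prompt:" label, and the sample shot values are all assumptions made for the example.

```python
# The nine layers, in the order the paragraph above lists them.
ELEMENT_ORDER = (
    "camera_spec",
    "lens_and_aspect_ratio",
    "lighting_source",
    "palette",
    "composition",
    "atmosphere",
    "mood_register",
    "film_dp_attribution",
    "negative_prompt",
)


def assemble_prompt(elements: dict[str, str]) -> str:
    """Join the nine layers in fixed order; fail loudly if one is dropped."""
    missing = [name for name in ELEMENT_ORDER if not elements.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing prompt elements: {', '.join(missing)}")
    # Every layer except the last is positive direction.
    positive = [elements[name].strip().rstrip(".") for name in ELEMENT_ORDER[:-1]]
    negative = elements["negative_prompt"].strip().rstrip(".")
    return ". ".join(positive) + ". Negative prompt: " + negative + "."


# Hypothetical shot; "[film/DP]" is left as a placeholder on purpose.
shot = {
    "camera_spec": "Close-up, static hold, low angle",
    "lens_and_aspect_ratio": "Spherical lens, shallow depth of field, 2.40:1",
    "lighting_source": "Warm yellow from the lamps only",
    "palette": "Mode A: split-toned amber and emerald",
    "composition": "Subject frame left, doorway withheld",
    "atmosphere": "Light haze",
    "mood_register": "Quiet dread",
    "film_dp_attribution": "Shot like [film/DP]",
    "negative_prompt": "no handheld shake, no horizontal flares",
}

print(assemble_prompt(shot))
```

Passing the layers as a dictionary means the caller can supply them in any order while the output order stays fixed, and a forgotten layer raises an error instead of silently producing a thinner prompt.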
Beyond the vocabulary itself: where a term alone won't land a shot — POV and multi-character contact are documented examples — add a visual reference rather than rewording, and load recurring camera, lighting, and palette directives into the invideo agent's context once so they carry across shots instead of being retyped per prompt.
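The idea of writing recurring directives once and letting them carry across shots can be illustrated in a few lines. This is only a conceptual sketch in plain Python, not the invideo agent's interface: `SHARED_DIRECTIVES`, `shot_brief`, and the sample values are hypothetical.

```python
# Hypothetical recurring directives, written once for the whole project.
SHARED_DIRECTIVES = {
    "lens": "spherical lens, shallow depth of field",
    "lighting": "warm yellow from the lamps only",
    "palette": "Mode A: split-toned amber and emerald",
}


def shot_brief(shot_direction: str, **overrides: str) -> str:
    """Combine per-shot direction with the standing directives.

    Overrides replace a standing directive for this one shot only.
    """
    unknown = set(overrides) - set(SHARED_DIRECTIVES)
    if unknown:
        raise KeyError(f"unknown directive(s): {sorted(unknown)}")
    directives = {**SHARED_DIRECTIVES, **overrides}
    return f"{shot_direction}. Standing directives: {'; '.join(directives.values())}."


print(shot_brief("Wide, slow dolly in"))
print(shot_brief("The reverse on the lead, static hold", lighting="single window, daylight"))
```

Only the per-shot direction changes between calls; the lens, lighting, and palette lines repeat unless a shot explicitly overrides one of them.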
Pretty much exactly like how I would talk to my DOP on set or how I would talk to my DA on set.
— invideo's creative team, on how on-set filmmaking vocabulary maps to directing AI agents