Should you use one mood board or separate visual references per scene for an AI film?

Last updated June 26, 2026

Use both layers: lock one global style document for the whole film, then pull separate visual references mapped to each sequence. A single mood board flattens the visual variation across your film's emotional arc, while per-scene references with no global layer let the film's overall visual grammar drift. Documented AI productions run the two together.

Set up the global layer first: one style block or treatment document loaded into your agent's context once, covering the elements every shot shares — camera language, lighting logic, palette, composition, mood register, and negative constraints. invideo is an agentic video creation tool, and the invideo agent holds a document like this in persistent context so you never re-explain the style per shot. One documented production locked an entire animated episode's style by uploading 64 reference frames in a single message with the instruction to save the style to context; another encoded a director's complete visual system in a 14-section document, and a third ran a 25-page treatment as the permanent instruction set. The global layer answers "what does this film look like" — and it stays fixed.
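One way to keep the global layer fixed is to hold it as structured data and render it to a single text block that gets loaded once. The sketch below is only an illustration: the `GlobalStyle` class, its field names, and the sample values are assumptions made for this example, not an invideo API or any production's actual document.

```python
from dataclasses import dataclass, fields
from typing import Tuple


@dataclass(frozen=True)  # frozen: the global layer should not change mid-film
class GlobalStyle:
    """Hypothetical container for the elements every shot shares."""
    camera_language: str
    lighting_logic: str
    palette: str
    composition: str
    mood_register: str
    negative_constraints: Tuple[str, ...]

    def to_style_block(self) -> str:
        """Render the style as one plain-text block to paste into context."""
        lines = ["GLOBAL STYLE (applies to every shot):"]
        for f in fields(self):
            value = getattr(self, f.name)
            if isinstance(value, tuple):
                value = "; ".join(value)
            label = f.name.replace("_", " ").capitalize()
            lines.append(f"- {label}: {value}")
        return "\n".join(lines)


# Sample values are invented for illustration only.
style = GlobalStyle(
    camera_language="slow handheld, eye-level, no crash zooms",
    lighting_logic="single motivated source per scene",
    palette="desaturated teal and amber",
    composition="subjects off-centre, deep negative space",
    mood_register="quiet, restrained",
    negative_constraints=("no lens flares", "no on-screen text"),
)

style_block = style.to_style_block()
print(style_block)
```

Freezing the dataclass mirrors the point that the global layer stays fixed: any per-scene variation has to be expressed somewhere else rather than by editing this object.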

Then pull sequence-level references on top of it. Rather than one general mood board, map specific references to individual sequences: the lighting of the night sequence, the palette shift of the climax, the spatial logic of one location. Pulling sequence-specific visual references improves generation precision during pre-production, because each scene gets imagery that matches its emotional beat instead of a project-wide average. Batch those references by theme — spatial logic in one batch, screen function in another, color theory in a third — and tell the invideo agent explicitly what to adopt and what to ignore from each batch; exclusion instructions matter as much as inclusion. The same logic applies to characters whose look evolves: one production built a distinct character sheet for every sequence because the character added an item in each new location.
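The batching step can be sketched as a small record per batch that carries both the adopt list and the ignore list. Everything here is hypothetical: the `ReferenceBatch` class, the file names, and the wording of the generated instruction are assumptions for illustration, not a format the invideo agent requires.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReferenceBatch:
    """Hypothetical record for one themed batch of sequence references."""
    theme: str          # e.g. "spatial logic", "screen function", "color theory"
    sequence: str       # the sequence these references are mapped to
    images: List[str]   # placeholder file names
    adopt: List[str]    # qualities to take from the batch
    ignore: List[str]   # qualities to leave out

    def to_instruction(self) -> str:
        """Build the message that accompanies the uploaded batch."""
        # Design choice in this sketch: refuse batches without both lists,
        # since exclusion instructions matter as much as inclusion.
        if not self.adopt or not self.ignore:
            raise ValueError("state both what to adopt and what to ignore")
        return (
            f"Reference batch '{self.theme}' for {self.sequence} "
            f"({len(self.images)} images).\n"
            f"Adopt: {', '.join(self.adopt)}.\n"
            f"Ignore: {', '.join(self.ignore)}."
        )


# Sample values are invented for illustration only.
batch = ReferenceBatch(
    theme="spatial logic",
    sequence="night sequence",
    images=["ref_01.png", "ref_02.png"],
    adopt=["room layout", "door and window positions"],
    ignore=["colour grade", "costumes"],
)

instruction = batch.to_instruction()
print(instruction)
```

Keeping one theme per batch makes it easier to tell which instruction caused a given result when a generation goes wrong.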

Convert sequence references into locked anchors before generating video. Generate image grids per sequence — one production requested 3 grid options per round — iterate, then extract the best panels. Those extracted panels replace your original references and carry continuity through every shot in that sequence. If you want per-scene variation to stay inside the global grammar, lock rules per emotional stage: one horror production structured its document around five escalating emotional stages, each with fixed camera, lighting, and sound rules, so the invideo agent varied scenes without breaking the film's language. One pointer if your references are illustrated or animated: have the invideo agent read their colours and textures and prompt for those qualities rather than attaching the frames directly.
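Per-stage rules that stay inside the global grammar can be sketched as a lookup keyed by emotional stage, with the global block always placed first in the prompt. The stage names, rule text, and `build_shot_prompt` helper below are invented for illustration; the source does not list the horror production's actual five stages, so only two placeholder stages are shown.

```python
from typing import Dict, NamedTuple


class StageRules(NamedTuple):
    """Fixed rules for one emotional stage (hypothetical structure)."""
    camera: str
    lighting: str
    sound: str


# Placeholder global block; in practice this is the locked style document.
GLOBAL_BLOCK = "35mm look, desaturated palette, no on-screen text"

# Placeholder stages; a real document would define every stage of the arc.
STAGES: Dict[str, StageRules] = {
    "unease": StageRules("static wide shots", "flat daylight", "room tone only"),
    "dread": StageRules("slow push-in", "single practical lamp", "low drone"),
}


def build_shot_prompt(stage: str, shot: str) -> str:
    """Compose a shot prompt: global style first, then the stage's fixed rules."""
    if stage not in STAGES:
        raise KeyError(f"unknown stage {stage!r}; define its rules before generating")
    rules = STAGES[stage]
    return "\n".join([
        f"Global style: {GLOBAL_BLOCK}",
        f"Stage: {stage}",
        f"Camera: {rules.camera}",
        f"Lighting: {rules.lighting}",
        f"Sound: {rules.sound}",
        f"Shot: {shot}",
    ])


prompt = build_shot_prompt("dread", "hallway, door ajar")
print(prompt)
```

Because only the shot description changes between calls within a stage, scenes can vary while the camera, lighting, and sound rules for that stage stay constant.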

A mood board on its own is sufficient only for short, single-location, single-mood pieces where the whole film lives in one visual register. For anything multi-act or tonally varied, run the global document plus per-sequence references; that combination is what documented multi-scene productions used to hold consistency across 21 or more scenes.

Watch some of these to see what works for you:

How to batch visual references by sequence theme for AI films
Horror short film built on per-stage visual rules, not one mood board
See how a global treatment doc plus per-scene rules keeps AI film consistent

For this film, there was no one image that sort of explained the look of the film instantly. So I batched my references.

— invideo's creative team
