How do you separate reference images by theme to improve AI scene consistency?
Last updated June 26, 2026
Separate your references into thematic batches — one batch per conceptual layer, such as spatial logic, a key screen-function concept, and color theory — then feed each batch to the AI with paired instructions on what to adopt and what to ignore. Every image teaches one idea; the exclusion instruction stops concepts bleeding between layers.
Start by accepting that one mood-board image rarely carries your film's full visual intent — split the board by what each image teaches. invideo is an agentic video creation tool whose agent holds your reference batches in persistent context across the whole project, which is what makes this method work without re-prompting the look every scene. The separation runs in three steps:
- Identify the conceptual layers your world needs. List the distinct ideas your visuals must carry, and give each one its own batch. In one documented production the batches were exactly three: spatial logic, a dome-as-screen concept, and color theory. As the filmmaker put it: "For this film, there was no one image that sort of explained the look of the film instantly. So I batched my references."
- Assemble each batch from references that carry only that layer. A batch can mix sources (film stills, photographs, location plates) as long as every image in it teaches the same single idea. Don't let a color-theory reference smuggle in unwanted set design. If an image teaches two things, place it in the batch for the one idea you want the agent to take from it, and name the other idea in that batch's exclusion instruction.
- Feed each batch to the invideo agent with paired inclusion and exclusion instructions. State what to take and, just as importantly, what to leave out. In the documented example, TV-series stills were fed for the dome-as-screen concept with explicit instructions to extract only the screen idea and ignore the small-room scale. Exclusion is load-bearing: a stray or wrongly attached reference produces completely incorrect output, so scoping each batch prevents one layer from contaminating another. Once the batches are in context, the invideo agent autonomously chooses which references to attach for each generation. Holding references in persistent context is the same mechanism that kept two characters consistent across an entire 70-second short film with no LoRA fine-tuning.
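The three steps above can be sketched as a small planning structure. This is an illustrative aid, not an invideo API: `ReferenceBatch`, `find_shared_images`, the file names, and the adopt/ignore wording for the spatial-logic and color-theory batches are all hypothetical. Only the dome-as-screen instruction mirrors the documented example. The rendered text is what you might type alongside each batch when you hand it to the agent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReferenceBatch:
    """One thematic batch: every image in it teaches the same single idea."""
    theme: str
    images: list[str]
    adopt: str   # what the agent should take from these images
    ignore: str  # what the agent should leave out

    def instruction(self) -> str:
        """Render the paired inclusion and exclusion instruction as plain text."""
        return (
            f"Batch '{self.theme}' ({len(self.images)} images): "
            f"adopt only {self.adopt}; ignore {self.ignore}."
        )


def find_shared_images(batches: list[ReferenceBatch]) -> dict[str, list[str]]:
    """Map each image that appears in more than one batch to the themes using it."""
    seen: dict[str, list[str]] = {}
    for batch in batches:
        for image in batch.images:
            seen.setdefault(image, []).append(batch.theme)
    return {image: themes for image, themes in seen.items() if len(themes) > 1}


# Hypothetical file names; adopt/ignore text is placeholder wording
# except for the dome-as-screen batch, which follows the documented example.
batches = [
    ReferenceBatch(
        theme="spatial logic",
        images=["plate_01.jpg", "plate_02.jpg"],
        adopt="the layout of the space",
        ignore="color and lighting",
    ),
    ReferenceBatch(
        theme="dome-as-screen",
        images=["series_still_01.jpg", "series_still_02.jpg"],
        adopt="the screen idea",
        ignore="the small-room scale",
    ),
    ReferenceBatch(
        theme="color theory",
        images=["grade_01.jpg"],
        adopt="the palette",
        ignore="set design",
    ),
]

# An image listed under two themes is a sign that one layer may bleed into another.
shared = find_shared_images(batches)
if shared:
    print("Images in more than one batch:", shared)

for batch in batches:
    print(batch.instruction())
```

Keeping the `ignore` field mandatory is a deliberate choice in this sketch: it forces every batch to carry an exclusion, which is the part of the method that is easiest to forget.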
Downstream of the separation itself, the documented production generated three image-grid options per round from these batches and promoted the best panels to continuity anchors — that extraction step is its own workflow, but theme-separated batches are what make it land.
For this film, there was no one image that sort of explained the look of the film instantly. So I batched my references.
— invideo's creative team