How do you use reference images to control what AI includes and excludes in scene generation?
Last updated June 26, 2026
Control what AI takes from reference images by pairing every upload with explicit take-and-leave instructions. Five methods work:
- Batch references by theme, with adopt/ignore notes per batch
- Have the AI extract qualities, not copy the image
- Pair references with explicit exclusion language
- Promote approved generated panels to be your new references
- Keep the reference set clean
invideo is an agentic video creation tool with all the current models available, so the techniques below run through one interface — the invideo agent routes your references to the right image or video model per shot.
Batch references by theme, with adopt/ignore notes per batch. Instead of one general mood board, separate your references into thematic batches — spatial logic in one, screen function in another, color theory in a third — and feed each batch to the invideo agent with explicit instructions on what to adopt and what to ignore. In one production, TV stills were uploaded with the instruction to extract only the dome-as-screen concept and ignore the small room scale — "I told it what to take and just as importantly, what to leave out." Telling the AI what to exclude is as load-bearing as telling it what to include; without the exclusion note, scale, props, and background details bleed into your scenes.
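The adopt/ignore note for each batch can be drafted as plain text before anything is uploaded. The sketch below is only an illustration: `ReferenceBatch`, `batch_instruction`, the file names, and the "screen function" theme label for the TV stills are all hypothetical, and the code does not call any invideo API. It only assembles the message you would paste alongside a batch.

```python
from dataclasses import dataclass


@dataclass
class ReferenceBatch:
    """One thematic batch of reference images plus its take-and-leave notes."""
    theme: str
    files: list[str]
    adopt: list[str]   # qualities the AI should take from the batch
    ignore: list[str]  # qualities the AI should leave out


def batch_instruction(batch: ReferenceBatch) -> str:
    """Render the text that accompanies one batch upload."""
    # Both halves are required: a batch without an exclusion note is the
    # case where scale, props, and background details bleed into scenes.
    if not batch.adopt or not batch.ignore:
        raise ValueError(
            f"batch '{batch.theme}' needs both adopt and ignore notes"
        )
    return "\n".join([
        f"Reference batch: {batch.theme} ({len(batch.files)} images)",
        "Adopt only: " + "; ".join(batch.adopt),
        "Ignore: " + "; ".join(batch.ignore),
    ])


# Hypothetical batch modeled on the TV-stills example above.
tv_stills = ReferenceBatch(
    theme="screen function",
    files=["tv_still_01.png", "tv_still_02.png"],  # placeholder file names
    adopt=["the dome-as-screen concept"],
    ignore=["the small room scale"],
)

if __name__ == "__main__":
    print(batch_instruction(tv_stills))
```

Making the ignore list mandatory is a deliberate choice in this sketch: it turns a forgotten exclusion note into an error you see before uploading rather than a surprise in the generated scene.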
Have the AI extract qualities, not copy the image. Dropping illustrated or animated references directly into prompts produces poor results when you want photorealistic output. Instead, instruct the invideo agent to read the color palette and texture qualities of the reference and translate those into a photorealistic prompt. In a documented production, the generations came back hyper-realistic with the exact color temperature the director wanted, because the invideo agent understood creative intent from the image rather than replicating it. This is your most precise inclusion control: you name the specific quality (palette, texture, light source) the reference is there to contribute.
Pair references with explicit exclusion language in the prompt. References set what to include; a written exclusion block keeps generations from drifting away from them. One two-person team uploaded 64 style frames in a single message with the instruction to "deeply understand this art style and save it into context," then attached a style block to every prompt that explicitly prohibited unwanted output: "not live action, not photorealistic." Every generation prompt after that started with the block, and that explicit negative constraint is what kept 164 generated clips in one consistent hand-painted style.
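Starting every prompt with the same style block is easy to get wrong by hand across more than a hundred clips, so it can help to assemble prompts with a small helper. This is a minimal sketch under stated assumptions: `STYLE_BLOCK` and `with_style_block` are hypothetical names, and only the phrase "not live action, not photorealistic" comes from the production described above; the rest of the block's wording is a placeholder. No invideo API is involved.

```python
# Hypothetical style block. Only the negative constraint at the end is
# quoted from the production above; the first sentence is a placeholder.
STYLE_BLOCK = (
    "Style: hand-painted, consistent with the uploaded style frames. "
    "Not live action, not photorealistic."
)


def with_style_block(shot_prompt: str, style_block: str = STYLE_BLOCK) -> str:
    """Return the shot prompt with the style block as its first lines."""
    shot_prompt = shot_prompt.strip()
    if not shot_prompt:
        raise ValueError("shot prompt is empty")
    # Avoid stacking the block twice if a prompt is passed through again.
    if shot_prompt.startswith(style_block):
        return shot_prompt
    return f"{style_block}\n\n{shot_prompt}"


if __name__ == "__main__":
    print(with_style_block("A fox crosses a rope bridge at dawn."))
```

Keeping the block in one constant means a wording change to the exclusions applies to every later prompt at once, instead of depending on copy-and-paste.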
Promote approved generated panels to be your new references. Generate image grids rather than single frames — one director requested 3 grids per round — iterate on the grids you like, then extract the best individual panels. Those panels replace your original references as continuity anchors: they contain only what you approved, so nothing unwanted carries forward into scene generation. Image generation costs little, especially in invideo, which makes grid rounds an affordable filter. Once anchors are set, the invideo agent attaches the relevant ones autonomously based on the grid or scene it's building.
Keep the reference set clean. Attaching wrong or stray reference images causes completely incorrect output: in one project, removing a single stray attachment fixed a clock continuity error that re-prompting couldn't. Clean references before upload, too. Remove objects from characters' hands before generating multi-angle character sheets, and include close-up panels for small details like scars and accessories, because the AI needs to see exactly what a character looks like or it will hallucinate whatever it can't see.
These are some of the ways to problem-solve this — what works depends on your references and your shot.
I told it what to take and just as importantly, what to leave out.
— invideo's creative team