Why does AI video look wrong when I use a reference image — and how do I fix it?
Last updated June 26, 2026
AI video looks wrong with a reference image because models treat attached references as authoritative visual anchors that silently override your prompt. The common causes: 1) a stray or wrong attachment, 2) an illustrated or stylized reference fed in raw, 3) one image doing too many jobs, 4) an error baked into the reference itself. Fix the input, not the prompt.
Start by auditing what's actually attached to the failing generation — the cause is almost always in the inputs, not the prompt text. invideo is an agentic video creation tool with all the current models available, and its context system makes each of these fixes a chat instruction rather than a re-roll.
A stray or wrong attachment overrides a correct prompt. Video models weight an attached image above your written instructions, so one wrong reference produces completely incorrect output even when the prompt is right. In one documented production, a clock continuity problem was traced to a stray reference attachment — removing the attachment fixed the shot. Before regenerating, check every image attached to the request; in the invideo agent, Always Ask mode shows you the exact prompt and references before any credits are spent, which is the cheapest place to catch this.
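The attachment check above can be done by hand, but as a rough sketch it amounts to comparing what is attached against what you meant to attach. The function name and file names below are hypothetical illustrations, not part of invideo or any model's API.

```python
def audit_attachments(attached, intended):
    """Compare attached reference files against the ones you meant to attach.

    Both arguments are lists of file names (hypothetical examples only).
    Returns stray attachments (attached but not intended) and missing ones
    (intended but not attached), each sorted for stable output.
    """
    attached_set = set(attached)
    intended_set = set(intended)
    return {
        "stray": sorted(attached_set - intended_set),
        "missing": sorted(intended_set - attached_set),
    }


if __name__ == "__main__":
    # Hypothetical shot: one leftover image from an earlier scene is still attached.
    report = audit_attachments(
        attached=["hero_sheet.png", "clock_old.png"],
        intended=["hero_sheet.png"],
    )
    print(report)
```

Anything listed under "stray" is a candidate for removal before you regenerate.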
Illustrated or animated references bleed their style into the output. Dropping a stylized image directly into a prompt does not work — the model copies the rendering style, not your intent. Instead, instruct the invideo agent to read the colour palette and texture qualities of the reference and translate those into a prompt for your target look; in one production the generations "came back hyper-realistic with the exact colour temperature" the director wanted. The same specificity applies to lighting notes: "warm yellow from the lamps only, like all the refs" outperforms generic "warm lighting."
One reference image carrying too many jobs confuses the model. No single image explains a whole look, so the model adopts elements you never wanted — scale, set dressing, framing. Separate your references into thematic batches (spatial logic, screen function, colour theory) and feed each batch with explicit instructions on what to adopt and what to ignore — telling the model what to leave out matters as much as what to take. Pull references mapped to specific sequences rather than one general mood board, and as your own approved frames accumulate, use those as the references going forward — extracted panels from approved generations carry continuity better than the external images you started with.
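One way to keep batches disciplined is to write down, for each batch, its theme, its images, and explicit adopt and ignore notes before sending anything. The sketch below is a minimal illustration of that bookkeeping; the class, function, file names, and the wording of the generated instruction are all assumptions, not an invideo format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReferenceBatch:
    """One thematic batch of reference images (hypothetical structure)."""
    theme: str          # e.g. "spatial logic", "screen function", "colour theory"
    images: list[str]   # file names of the references in this batch
    adopt: list[str]    # what the model should take from these images
    ignore: list[str]   # what the model should leave out


def batch_instruction(batch: ReferenceBatch) -> str:
    """Build a plain-text instruction for one batch.

    Raises ValueError if either the adopt or the ignore list is empty,
    since the exclusions matter as much as the inclusions.
    """
    if not batch.adopt or not batch.ignore:
        raise ValueError(f"batch '{batch.theme}' needs both adopt and ignore notes")
    return "\n".join([
        f"Reference batch: {batch.theme} ({len(batch.images)} images)",
        "Adopt: " + "; ".join(batch.adopt),
        "Ignore: " + "; ".join(batch.ignore),
    ])


if __name__ == "__main__":
    colour = ReferenceBatch(
        theme="colour theory",
        images=["ref_a.png", "ref_b.png"],  # hypothetical file names
        adopt=["palette", "colour temperature"],
        ignore=["framing", "set dressing", "scale"],
    )
    print(batch_instruction(colour))
```

Forcing every batch to carry an ignore list is a design choice that mirrors the advice above: a batch with no exclusions is the one most likely to leak unwanted scale, set dressing, or framing.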
The error is baked into the reference itself. If your character sheet contains a mistake, every shot inherits it, so fix the source, not the shot. Ask the invideo agent to inspect the sheet; in one documented case it identified the exact panel containing a stray accessory, corrected it, and stored the updated sheet in context, after which only the affected shots needed regeneration. To prevent this class of failure upfront: include close-up panels (not just wides) so small details like scars and accessories stay consistent; remove objects from characters' hands before generating turnaround angles; and generate several options per reference asset, locking the best before any video generation. One production locked 4 characters and a prop with just 11 reference images, and locking a single character's identity took about 5 generations at roughly $9.78 per character.
Which of these fixes applies depends on which input caused the drift, so diagnose the attachment chain before you spend credits regenerating.
I told it what to take and just as importantly, what to leave out.
— invideo's creative team, on batching reference images with explicit inclusion and exclusion instructions