AI Filmmaking

Why should you use negative prompts when giving AI a reference image?

Last updated June 26, 2026

Negative prompts tell the model what to push AWAY from the reference image, while the positive prompt tells it what to pull toward. Without that exclusion lever, the model defaults to its training-set averages — drifting into photorealism, generic lighting, and anatomical artifacts even when your reference clearly says otherwise. Negatives are how you make the reference actually stick.

A reference image is a strong signal, not an instruction. The model still steers itself by blending what you uploaded with everything it has seen before — so a hand-painted frame quietly slides toward photoreal skin, a stylized character grows an extra finger, a moody low-key plate brightens into generic studio light. The negative prompt is the second steering wheel: under classifier-free guidance, the model is pulled toward your positive embedding AND pushed away from your negative one. Drop the negatives and you're only steering with one hand.
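To make the "second steering wheel" concrete, here is a minimal sketch of the guidance step as open diffusion pipelines commonly write it: the negative prompt's prediction takes the place of the unconditional one, and the guidance scale extrapolates away from it toward the positive prediction. The function name and the toy three-value lists are illustrative assumptions, and whether any hosted video model applies exactly this formula is not something the article confirms.

```python
def guided_noise(eps_positive, eps_negative, guidance_scale):
    """Combine two noise predictions the way classifier-free guidance does.

    eps_positive: prediction conditioned on the positive prompt
    eps_negative: prediction conditioned on the negative prompt
                  (an empty prompt when no negative is supplied)
    """
    return [
        neg + guidance_scale * (pos - neg)
        for pos, neg in zip(eps_positive, eps_negative)
    ]


# Toy 3-value "predictions"; real ones are large tensors.
eps_pos = [1.0, 0.0, 2.0]
eps_neg = [0.0, 1.0, 2.0]

# At scale 1 the result equals the positive prediction: the negative has no effect.
print(guided_noise(eps_pos, eps_neg, 1.0))

# At a higher scale the result overshoots the positive prediction,
# moving further away from the negative one.
print(guided_noise(eps_pos, eps_neg, 3.0))
```

Where the two predictions agree (the third value here), the negative changes nothing. It only pushes on the dimensions where it differs from the positive, which is why specific negative terms work better than vague ones.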

Three failure modes show up again and again when you skip them:

Style drift toward photorealism. Stylized references (animation frames, painterly stills, illustrated boards) get "corrected" into live-action looks because photoreal footage dominates training data. On a documented Arcane-style episode, the style block had to explicitly forbid the drift — "This MUST look and feel like Arcane animation — not live action, not photorealistic. Every surface has hand-painted brushstroke texture" — and every prompt after that started with it. Without that negative clause, the reference alone was not enough to hold the look across 164 generations.

Anatomical and structural artifacts. Extra limbs, merged faces, warped hands, broken multi-character contact — these are the model averaging across training images. A negative block listing "extra limbs, deformed hands, merged faces, duplicate characters, warped anatomy" suppresses the worst of it before you spend credits re-rolling.

Bleed-through from the reference itself. Sometimes you want the palette of a reference but not its lighting, or the texture but not the composition. Without negatives, the model copies everything. Tell it what to ignore — "ignore background, ignore lens flare, ignore color cast" — and the reference contributes only the dimensions you actually want.

A reusable negative block for shot generation usually combines three layers: format violations ("live action, photorealistic, 3D render" when you want 2D, or vice versa), anatomy faults ("extra fingers, deformed hands, merged faces, duplicate characters"), and quality faults ("blurry, low detail, oversharpened, plasticky skin, watermark, text"). Keep it short and specific: long negative lists dilute each term's weight and can over-exclude.
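One way to keep such a block reusable is to store each layer separately and merge them when you need the final string. The sketch below does that, using the example terms from this article. The helper name, the case-insensitive deduplication, and the cap of 16 terms are all assumptions chosen for illustration, not a recommended limit for any particular model.

```python
# Term lists taken from the examples in this article.
FORMAT_2D = ["live action", "photorealistic", "3D render"]
ANATOMY = ["extra fingers", "deformed hands", "merged faces", "duplicate characters"]
QUALITY = ["blurry", "low detail", "oversharpened", "plasticky skin", "watermark", "text"]


def build_negative_block(*layers, max_terms=16):
    """Merge term layers into one comma-separated negative prompt.

    Duplicates are dropped case-insensitively, keeping the first spelling.
    max_terms is an arbitrary illustrative cap that keeps the block short.
    """
    seen = set()
    terms = []
    for layer in layers:
        for term in layer:
            key = term.strip().lower()
            if key and key not in seen:
                seen.add(key)
                terms.append(term.strip())
    if len(terms) > max_terms:
        raise ValueError(f"{len(terms)} negative terms; trim to {max_terms} or fewer")
    return ", ".join(terms)


negative_block = build_negative_block(FORMAT_2D, ANATOMY, QUALITY)
print(negative_block)
```

Raising an error when the list grows past the cap, instead of silently truncating it, forces a deliberate choice about which terms to drop.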

The invideo agent holds these as part of your project's prompt assembly (camera spec, lens, lighting, palette, composition, atmosphere, mood, film attribution, negative prompt) so the exclusion clause attaches to every generation automatically rather than being re-typed per shot. As Hridaye, invideo's creative director, puts it: "Every prompt after this started with it." That discipline — negatives on 100% of prompts, not just the ones that misfired — is what keeps the reference image's intent intact across a full film.
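If you are assembling prompts yourself, the same discipline can be sketched in a few lines: keep the style block and negative prompt at project level and attach them to every shot in one place. This is not the invideo agent's implementation. The class, function, and dictionary keys are hypothetical, the field names simply mirror the assembly list above, and the style and shot values are made up.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ShotSpec:
    # Field names mirror the assembly list in the text; the values are made up.
    camera: str
    lens: str
    lighting: str
    palette: str
    composition: str
    atmosphere: str
    mood: str
    film_attribution: str


def assemble_request(style_block, negative_prompt, shot):
    """Return one generation request with the shared blocks attached."""
    shot_text = ". ".join(getattr(shot, f.name) for f in fields(shot))
    return {
        "prompt": f"{style_block} {shot_text}.",
        "negative_prompt": negative_prompt,
    }


# Hypothetical project-level blocks, written once and reused for every shot.
STYLE = "Hand-painted animation look, visible brushstroke texture on every surface."
NEGATIVE = "live action, photorealistic, extra fingers, merged faces, watermark"

shots = [
    ShotSpec("slow push-in", "35mm", "low-key rim light", "teal and amber",
             "subject left of center", "drifting smoke", "tense",
             "matching the reference frame"),
    ShotSpec("locked-off wide", "24mm", "overcast daylight", "muted greens",
             "symmetrical", "light rain", "quiet",
             "matching the reference frame"),
]

payloads = [assemble_request(STYLE, NEGATIVE, shot) for shot in shots]
for payload in payloads:
    print(payload["prompt"])
    print("NEGATIVE:", payload["negative_prompt"])
```

Because the shared blocks are passed in once and never typed per shot, a shot cannot go out without them.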

One caveat worth knowing: faster, distilled video models tend to weight negative prompts less than full-step models, often because they run with little or no classifier-free guidance, so for shots routed to a turbo variant you may still need an iteration pass. The invideo agent routes between models (Veo, Kling, Seedance 2.0, Runway) per shot, so when a negative clause isn't biting on one model, switching the routing, rather than rewriting the prompt, is often the fix.
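The "switch the routing, keep the prompt" rule can be sketched as a small fallback function. Everything here is hypothetical: the model names are placeholders, the "distilled" flag is an assumed piece of metadata you would maintain yourself, and none of it describes how the invideo agent or any real model behaves.

```python
# Hypothetical model profiles. The names and flags are placeholders, not
# statements about any real model's behavior.
MODEL_PROFILES = {
    "full-step-a": {"distilled": False},
    "turbo-b": {"distilled": True},
}


def next_route(current, negative_ignored, profiles=MODEL_PROFILES):
    """Keep the current model unless its output ignored the negative clause.

    When it did, prefer a non-distilled alternative and leave the prompt
    untouched. Returns the current model if no alternative exists.
    """
    if not negative_ignored:
        return current
    for name, profile in profiles.items():
        if name != current and not profile["distilled"]:
            return name
    return current


print(next_route("turbo-b", negative_ignored=True))   # falls back to the full-step model
print(next_route("turbo-b", negative_ignored=False))  # stays on the turbo model
```

Deciding whether a negative clause was ignored is still a human review step in this sketch. The function only encodes what to try next once that call has been made.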


