How does AI video inpainting work — and when should you use it instead of regenerating a clip?

Last updated June 26, 2026

Use AI video inpainting when the broken region is small (under ~20% of the frame), motion in that area is slow or static, and the rest of the clip is keeper-quality: you mask the region and a diffusion model refills it using surrounding pixels and adjacent frames. Regenerate the whole clip when the error is large, fast-moving, or the shot's composition itself isn't working.

The decision rule, first. Inpaint when three conditions hold together: the affected region is small, the motion inside that region is slow or static, and the surrounding seconds are worth keeping. Regenerate when any one breaks: a wrong character pose covering half the frame, a fast pan with artifacts smeared across moving subjects, or a shot whose blocking and lighting were never going to work. The economics back this up: documented productions average 3 generations per usable shot, and only ~25% of clips make the final cut (41 of 164 in one 3-minute episode), so a clean inpaint on a near-keeper saves a full re-roll cycle.
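The rule above can be written as a small check. This is a minimal sketch, not part of any tool: the `ShotIssue` fields, the `choose_fix` name, and the hard 0.20 cutoff are assumptions made for illustration (the ~20% figure is only a rule of thumb).

```python
from dataclasses import dataclass

# Assumed cutoff, taken from the "~20% of frame" rule of thumb.
MAX_MASK_FRACTION = 0.20


@dataclass
class ShotIssue:
    """Hypothetical description of one broken region in a clip."""
    mask_fraction: float   # share of frame area that is broken, 0.0 to 1.0
    fast_motion: bool      # True if the region sits on fast-moving content
    rest_is_keeper: bool   # True if the rest of the clip is worth keeping


def choose_fix(issue: ShotIssue) -> str:
    """Return "inpaint" only when all three conditions hold together."""
    small = issue.mask_fraction < MAX_MASK_FRACTION
    if small and not issue.fast_motion and issue.rest_is_keeper:
        return "inpaint"
    return "regenerate"


# A stray earpiece on a near-static close-up in an otherwise good take.
print(choose_fix(ShotIssue(0.03, fast_motion=False, rest_is_keeper=True)))  # inpaint
# A wrong pose covering half the frame.
print(choose_fix(ShotIssue(0.50, fast_motion=False, rest_is_keeper=True)))  # regenerate
```

Any single failed condition flips the answer to "regenerate", which mirrors the "when any one breaks" wording.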

How it actually works. Video inpainting masks the pixels you want replaced and runs a diffusion model's denoising pass on just that region, conditioned on (a) the surrounding unmasked pixels in the same frame for spatial context and (b) adjacent frames for temporal context. The temporal attention is the hard part — it's what keeps the fill from flickering as the clip plays. Image inpainting only has to solve spatial coherence; video has to solve frame-to-frame coherence too, which is why it fails on fast motion and large masks where the model has too little stable reference to propagate.
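To make the two kinds of coherence concrete, here is a toy sketch on grayscale frames stored as nested lists. It does not run a diffusion model. `composite` only shows how a fill stays confined to the masked region, and `masked_flicker` is an assumed, crude proxy for temporal flicker (mean frame-to-frame change inside the mask). Both function names are hypothetical.

```python
def composite(original, fill, mask):
    """Keep the original pixel where mask is 0, take the fill where mask is 1."""
    return [
        [f if m else o for o, f, m in zip(row_o, row_f, row_m)]
        for row_o, row_f, row_m in zip(original, fill, mask)
    ]


def masked_flicker(frames, mask):
    """Mean absolute change between consecutive frames, over masked pixels only."""
    coords = [(y, x) for y, row in enumerate(mask) for x, m in enumerate(row) if m]
    if not coords or len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum(abs(cur[y][x] - prev[y][x]) for y, x in coords)
    return total / (len(coords) * (len(frames) - 1))


# Toy 1x2 clip: the left pixel is untouched plate, the right pixel is masked.
mask = [[0, 1]]
plate = [[[10, 0]], [[10, 0]], [[10, 0]]]
fills = [[[0, 50]], [[0, 60]], [[0, 50]]]   # a fill that wobbles between frames

patched = [composite(p, f, mask) for p, f in zip(plate, fills)]
print(patched)                        # [[[10, 50]], [[10, 60]], [[10, 50]]]
print(masked_flicker(patched, mask))  # 10.0
```

A fill that held steady across frames would score 0.0 here. The wobbling fill scores 10.0 even though each frame on its own looks spatially plausible.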

When to inpaint (use these as your checklist):

  • Remove a small unwanted object or prop — a stray hand, a logo, a continuity slip like an accessory the character shouldn't be wearing. One documented continuity case: the agent identified the exact panel in a character grid where an errant earpiece had been generated and surgically fixed only that panel, leaving the rest of the film intact.
  • Clean up a localized artifact — a melted finger, a warped texture patch, a background glitch in an otherwise good take.
  • Patch a background region while keeping the foreground performance and camera move.
  • Fix a face or hand in a near-static close-up — slow motion gives the model enough temporal anchors to hold consistency.

When to regenerate instead:

  • The broken region covers more than roughly a fifth of the frame — large masks produce visible seam artifacts because the model is hallucinating too much new content.
  • The error sits on a fast-moving subject — temporal flicker on moving regions is the dominant failure mode.
  • The shot's framing, lighting, or blocking is what's wrong — no fill fixes a bad composition.
  • Multi-character physical contact is involved (bodies touching, ropes, props passed between hands) — this category breaks current models faster than anything else; regenerate with stronger reference inputs rather than patching.
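The one-fifth threshold in these checklists needs a measurement. One way to get it, sketched here under assumptions, is to compute the share of masked pixels per frame and take the worst frame. The helper names (`mask_fraction`, `peak_mask_fraction`, `make_mask`) and the binary nested-list mask format are hypothetical.

```python
def mask_fraction(mask):
    """Share of pixels set to 1 in a single binary mask (list of rows)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        raise ValueError("mask has no pixels")
    return sum(1 for row in mask for m in row if m) / total


def peak_mask_fraction(masks):
    """Largest per-frame mask share across a clip's masks."""
    return max(mask_fraction(m) for m in masks)


def make_mask(height, width, top, left, box_h, box_w):
    """Build a binary mask with one filled rectangle (test helper)."""
    return [
        [1 if top <= y < top + box_h and left <= x < left + box_w else 0
         for x in range(width)]
        for y in range(height)
    ]


# The mask grows from a 3x3 box to a 5x5 box on a 10x10 frame.
masks = [make_mask(10, 10, 2, 2, 3, 3), make_mask(10, 10, 2, 3, 5, 5)]
peak = peak_mask_fraction(masks)
print(peak)          # 0.25
print(peak > 0.20)   # True, so this one leans toward regenerating
```

Using the peak rather than the average is a design choice for this sketch: a single frame with an oversized mask is enough to show a seam.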

Pick the right model for the fix. Inside the invideo agent, every video model on the roster — Veo, Kling, Seedance 2.0 — is available, and the agent routes a region-edit to whichever handles the shot type best. Seedance 2.0 reference-to-video is the strongest choice when you want to regenerate a segment while carrying character and location context across the boundary, which makes the patched section blend with the surrounding shot. For pure region masking on a single clip, the agent treats it as a targeted edit; for continuity-style fixes, it traces the error back to the source character sheet, repairs it there, and the fix propagates through every downstream shot — surgical edits, not slot-machine re-rolls.

Prompting the inpaint. Describe the fill content concretely (what should be there) and the surrounding context (lighting source, palette, surface texture) so the model matches grain and grade. Vague prompts on a masked region produce seam artifacts where the fill's color temperature drifts from the plate. Reference the source explicitly — "warm yellow from the lamps only, like the surrounding frames" beats "warm lighting."
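As an illustration of that structure, the sketch below assembles a prompt from the four pieces named above and refuses to build one if any is blank. `build_inpaint_prompt` is a hypothetical helper, not an invideo or model API, and the example values other than the lamp-lighting phrase are made up.

```python
def build_inpaint_prompt(fill, lighting, palette, texture):
    """Combine fill content and surrounding context into one prompt string."""
    parts = {"fill": fill, "lighting": lighting, "palette": palette, "texture": texture}
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        # A vague or partial prompt is what lets color temperature drift.
        raise ValueError(f"missing prompt fields: {', '.join(missing)}")
    return (
        f"{fill.strip()}. "
        f"Lighting: {lighting.strip()}. "
        f"Palette: {palette.strip()}. "
        f"Surface texture: {texture.strip()}."
    )


prompt = build_inpaint_prompt(
    fill="bare earlobe, no earpiece",
    lighting="warm yellow from the lamps only, like the surrounding frames",
    palette="muted amber and brown",
    texture="soft film grain matching the plate",
)
print(prompt)
```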

The cost case for inpainting. Documented productions ran $750–$5,000 all-in and $315–$750 per finished minute, with an average of 3 generations per usable shot. A full re-roll on a 15-second clip burns credits and risks losing the take you already liked; a targeted region edit on a near-keeper is usually one or two generations. As Hridaye puts it, the working philosophy is "surgical edits, not slot-machine re-rolls."
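The arithmetic behind those figures is short enough to check directly. The sketch below uses only the numbers quoted in this article. Treating a full re-roll as costing the average 3 generations is an assumption, since an individual shot may take more or fewer.

```python
# Keep rate from the 3-minute episode cited earlier in this article.
clips_generated = 164
clips_kept = 41
keep_rate = clips_kept / clips_generated
print(f"keep rate: {keep_rate:.0%}")   # keep rate: 25%

# Assumption: a re-roll costs the documented average of 3 generations,
# while a targeted region edit takes 1 or 2.
avg_generations_per_usable_shot = 3
inpaint_generations = (1, 2)
saved = [avg_generations_per_usable_shot - n for n in inpaint_generations]
print(f"generations saved per fix: {min(saved)} to {max(saved)}")
# generations saved per fix: 1 to 2
```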

The failure modes to watch for. Seam artifacts on masks larger than ~20% of frame area. Temporal flicker when the masked region overlaps fast motion. Color and grain mismatch when the prompt doesn't anchor the fill to surrounding lighting. If you hit any of these twice in a row on the same shot, stop inpainting and regenerate the full clip with proper reference inputs.
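The "twice in a row" stop rule can be tracked mechanically. This is a minimal sketch under assumptions: the failure labels and the `should_stop_inpainting` name are hypothetical, and each attempt is recorded as one label, or `None` for a clean result.

```python
# Assumed labels for the three failure modes listed above.
FAILURE_MODES = {"seam", "flicker", "color_mismatch"}


def should_stop_inpainting(attempt_results):
    """True once two consecutive attempts on the same shot hit a failure mode."""
    streak = 0
    for result in attempt_results:
        streak = streak + 1 if result in FAILURE_MODES else 0
        if streak >= 2:
            return True
    return False


print(should_stop_inpainting(["seam", "flicker"]))        # True
print(should_stop_inpainting(["seam", None, "flicker"]))  # False
```

A clean attempt resets the streak, so only back-to-back failures trigger the switch to a full regeneration.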

Surgical edits. Not slot-machine re-rolls.

— Hridaye, invideo's creative director
