AI Filmmaking

How do you lock a visual style in AI filmmaking using reference frames?

Last updated June 26, 2026

Lock a visual style by uploading a batch of reference frames that define the target aesthetic, instructing the invideo agent to save them as persistent style context, then attaching a written style block to every subsequent generation prompt. The reference frames carry the lock; the persistent context holds it; the explicit negative constraints in the style block prevent drift.

Start by gathering reference frames that genuinely encode the look you want: not a mood board, but actual frames from the target source mapped to specific sequences in your film. One documented production fed 64 frames from a single animated episode into the invideo agent in one message with the instruction: "I want you to deeply understand this art style and save it into context for further generations. All of these attached images are the art style that I want for this entire project." That single ingestion step is what makes the lock persistent. The invideo agent is an agentic video tool that holds loaded context across every shot, so you load the references once and the style carries from scene to scene without re-uploading them.

Write an explicit style block alongside the frames, and make it specify what to AVOID as much as what to match. The same production wrote: "This MUST look and feel like Arcane animation — not live action, not photorealistic. Every surface has hand-painted brushstroke texture. Every element in frame must feel painterly and handcrafted like a moving Arcane frame." Negative constraints (no photorealism, no live-action lighting, no CGI gloss) are what stop model drift mid-project. Then attach that style block to the start of every generation prompt — every prompt, no exceptions. That repetition is the lock's enforcement mechanism.
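The "attach it to every prompt" rule is easy to enforce if prompts are assembled by a small helper rather than typed by hand. The following is a minimal sketch in plain Python string handling, not an invideo API: the function names, the "STYLE (locked)" and "SHOT" labels, and the sample shot description are all hypothetical, and only the style rules and the three negative constraints come from the text above.

```python
# Hypothetical helper: builds one reusable style block and prepends it to
# every shot prompt. Nothing here calls the invideo agent.

STYLE_MATCH = [
    "This MUST look and feel like Arcane animation",
    "Every surface has hand-painted brushstroke texture",
    "Every element in frame must feel painterly and handcrafted",
]
STYLE_AVOID = ["photorealism", "live-action lighting", "CGI gloss"]


def build_style_block(match, avoid):
    """Join positive rules and negative constraints into one text block."""
    lines = ["STYLE (locked):"]
    lines += [f"- {rule}" for rule in match]
    lines.append("AVOID:")
    lines += [f"- no {item}" for item in avoid]
    return "\n".join(lines)


def build_prompt(style_block, shot_description):
    """Place the style block first, then the shot description."""
    return f"{style_block}\n\nSHOT: {shot_description.strip()}"


STYLE_BLOCK = build_style_block(STYLE_MATCH, STYLE_AVOID)

# Sample shot description, invented for illustration.
prompt = build_prompt(STYLE_BLOCK, "Wide shot of a rain-soaked street at night.")
print(prompt)
```

Because every prompt passes through `build_prompt`, a shot cannot be sent without the style block in front of it.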

For live-action or photoreal targets, don't drop illustrated or painted references straight into prompts, because that produces inconsistent output. Instead, instruct the invideo agent to read the colour palette and texture qualities from those references and translate them into a photorealistic prompt. Hridaye, invideo's creative director, put it directly: "The better move was to have Agent 1 read the colours and textures of them and prompt for that instead." One team described the result: "The gens came back hyper-realistic with the exact colour temperature I was looking for."

Where style precision really matters — codifying a named director's visual language — go beyond frames and write a treatment document: camera grammar, lens choices, lighting ratios, palette modes with hex values, composition rules, and a quick-reference card. One 25-page treatment built around a director's visual language was uploaded to the invideo agent once at project start and held across the full short — the agent gated every generated frame against the document before returning it. The frames give the agent the look; the treatment gives it the reasoning behind the look, which is what lets it apply the style to shots the references never explicitly covered.

Pull references at the sequence level, not just the project level. Different scenes need different anchor frames — an interior dialogue and an exterior chase need separate reference batches, each tagged with what to take and what to leave out. When you batch references, tell the invideo agent explicitly: extract the colour theory from this set, the spatial logic from this set, ignore the room scale in that one. Exclusion instructions matter as much as inclusion instructions.
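One way to keep sequence-level batches organised is to record, for each batch, which frames it contains, what to take from them, and what to ignore, then generate the instruction text from that record. This is a sketch under assumptions: the `ReferenceBatch` class, its field names, the file names, and the wording of the generated instruction are hypothetical and are not part of the invideo agent.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceBatch:
    """One batch of reference frames for a single sequence (hypothetical)."""
    sequence: str
    frames: list[str]
    take: list[str] = field(default_factory=list)    # qualities to extract
    ignore: list[str] = field(default_factory=list)  # qualities to leave out

    def instruction(self) -> str:
        """Render the batch as an explicit include/exclude instruction."""
        parts = [
            f"References for sequence '{self.sequence}' "
            f"({len(self.frames)} frames)."
        ]
        if self.take:
            parts.append("Extract: " + "; ".join(self.take) + ".")
        if self.ignore:
            parts.append("Ignore: " + "; ".join(self.ignore) + ".")
        return " ".join(parts)


# Illustrative batch; the file names are placeholders.
interior = ReferenceBatch(
    sequence="interior dialogue",
    frames=["int_01.png", "int_02.png"],
    take=["colour theory", "spatial logic"],
    ignore=["room scale"],
)
print(interior.instruction())
```

Keeping `take` and `ignore` as separate fields makes a missing exclusion visible: an empty `ignore` list is a prompt to ask whether anything in the batch should be left out.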

Across documented productions, locking style and characters before any video generation cost roughly $9.78 per character lock at five generations each, with full short-film budgets ranging from $750 to $5,000 depending on scope. The invideo agent routes the actual generation to the right model for each shot (Seedance 2.0 for reference-to-video continuity, Veo or Kling where their grammar fits), so you keep one style lock and one project context across whichever model produces each clip.
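The quoted figures are enough for a rough pre-production estimate. The sketch below assumes the cost scales linearly with the number of characters, which the documented productions do not confirm; the cast size of four and the function names are hypothetical.

```python
# Figures quoted in the text above.
COST_PER_CHARACTER_LOCK = 9.78   # USD per character lock
GENERATIONS_PER_LOCK = 5
BUDGET_RANGE = (750.0, 5000.0)   # USD, full short-film budgets


def cost_per_generation():
    """Implied cost of one generation inside a character lock."""
    return COST_PER_CHARACTER_LOCK / GENERATIONS_PER_LOCK


def lock_cost(num_characters):
    """Total locking cost, ASSUMING linear scaling per character."""
    return num_characters * COST_PER_CHARACTER_LOCK


def share_of_budget(num_characters, budget):
    """Fraction of a given budget spent on character locks."""
    return lock_cost(num_characters) / budget


cast_size = 4  # hypothetical cast size, for illustration only
print(f"Per generation: ${cost_per_generation():.2f}")
print(f"Locks for {cast_size} characters: ${lock_cost(cast_size):.2f}")
for budget in BUDGET_RANGE:
    pct = share_of_budget(cast_size, budget) * 100
    print(f"Share of a ${budget:,.0f} budget: {pct:.1f}%")
```

Under these assumptions, locking is a small fraction of even the lowest quoted budget, so there is little reason to skip the locking step to save money.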

One practical check: temporal drift between shots is the failure mode to watch for. If shot two feels off from shot one, re-rolling is rarely the fix. The usual cause is that the style block was not attached to that prompt, or that a stray reference image was. Strip the prompt back to the style block plus the shot description, regenerate, and the lock returns.
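The two usual causes of drift can be checked mechanically before regenerating. This is a minimal sketch that assumes you keep your own record of each prompt and its attached reference files; `audit_prompt`, `strip_back`, and the file names are hypothetical and do not correspond to any invideo feature.

```python
def audit_prompt(prompt, style_block, attached_refs, approved_refs):
    """Return a list of likely drift causes for one shot's prompt."""
    problems = []
    # Cause 1: the style block is not at the start of the prompt.
    if not prompt.lstrip().startswith(style_block):
        problems.append("style block missing from start of prompt")
    # Cause 2: a reference outside the approved batch was attached.
    stray = sorted(set(attached_refs) - set(approved_refs))
    if stray:
        problems.append("stray references: " + ", ".join(stray))
    return problems


def strip_back(style_block, shot_description):
    """Rebuild the prompt as style block plus shot description only."""
    return f"{style_block}\n\n{shot_description.strip()}"


# Illustrative values; every string and file name here is a placeholder.
style = "STYLE (locked): painterly, no photorealism"
approved = ["ref_01.png", "ref_02.png"]
shot = "Close-up of the lead turning toward the window."

drifting = audit_prompt(shot, style, ["ref_01.png", "moodboard.jpg"], approved)
print(drifting)

clean_prompt = strip_back(style, shot)
print(audit_prompt(clean_prompt, style, ["ref_01.png"], approved))
```

Running the audit before re-rolling separates a prompt-assembly mistake from genuine model variance.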

Watch some of these to see what works for you:

64 reference frames, one style block, one locked Arcane episode

60+ Fincher frames plus a treatment doc locked every shot automatically

Batch references by category and tell the invideo agent what to ignore

