
How do you upload style reference frames to an AI agent to lock a visual aesthetic across an entire project?

Last updated June 26, 2026

Send your style reference frames to the invideo agent as one batch in a single message, with an explicit instruction to save the style to persistent context. One documented production uploaded 64 frames from its target aesthetic this way. Then add negative constraints stating what the style must never be, and attach the resulting style block to every generation prompt afterward.

Batch the upload and make the save-to-context instruction explicit — that is the core move; everything else enforces it. invideo is an agentic video creation tool with all current video and image models built in, and its agent holds project context persistently, which is what makes a style lock possible.

1. Upload the frames as one batch with an explicit save instruction. Collect frames from your target aesthetic and send them in a single message rather than scattering them across the conversation. The instruction matters as much as the images: "I want you to deeply understand this art style and save it into context for further generations. All of these attached images are the art style that I want for this entire project." This loads the style into the invideo agent's working context — it is context loading, not fine-tuning, so no LoRA training is involved.

2. Write negative constraints into the style block. Positive frames alone leave room for drift, so state what the style must never be. A production locking a hand-painted animation look wrote: "This MUST look and feel like Arcane animation — not live action, not photorealistic. Every surface has hand-painted brushstroke texture." Without the prohibition, video models pull toward photorealism over successive generations.

3. Tell the invideo agent what to take and what to leave out. If no single set of frames explains the whole look, batch references by theme — spatial logic, palette, lighting — and give each batch explicit inclusion and exclusion instructions. Exclusion is as load-bearing as inclusion: telling the invideo agent which qualities of a reference to ignore prevents unwanted elements (scale, setting, medium) from leaking into the locked style.

4. Translate illustrated or animated references instead of pasting them into prompts. Dropping stylized reference images directly into generation prompts does not work; instruct the invideo agent to read the colour palette and texture qualities of the reference and prompt for those instead. In one production this returned generations with the exact colour temperature the director wanted.

5. Attach the locked style block to every generation prompt. In the documented animated production, every prompt after ingestion started with the style block — that discipline, not the upload alone, is what held the aesthetic across 164 generated clips. Persistent agent context replaces scene-by-scene re-prompting, which is the failure pattern behind style drift in per-generation tools.

6. Promote approved outputs to become the new references. After a few rounds, extract your strongest generated frames and let them replace the original reference frames as continuity anchors — subsequent scenes then generate closer to the locked look every round, because the invideo agent is referencing images already inside your project's aesthetic.
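Steps 2 and 5 amount to a small piece of prompt assembly: one style block, with its prohibitions, prefixed to every shot prompt. The sketch below illustrates that discipline in plain Python. It is not an invideo API: `StyleBlock`, `build_prompt`, the `STYLE`/`NEVER`/`SHOT` labels, and the two shot descriptions are all hypothetical names chosen for illustration. The style wording itself is taken from the production quoted in step 2.

```python
from dataclasses import dataclass, field


@dataclass
class StyleBlock:
    """Hypothetical container for a locked style (illustration only, not an invideo API)."""
    description: str
    must_never_be: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Positive description first, then the explicit prohibitions (step 2).
        lines = [f"STYLE: {self.description}"]
        if self.must_never_be:
            lines.append("NEVER: " + ", ".join(self.must_never_be))
        return "\n".join(lines)


def build_prompt(style: StyleBlock, shot: str) -> str:
    # Step 5: every generation prompt starts with the same style block.
    return f"{style.render()}\n\nSHOT: {shot}"


style = StyleBlock(
    description="Arcane animation look; every surface has hand-painted brushstroke texture",
    must_never_be=["live action", "photorealistic"],
)

# Placeholder shot descriptions, invented for this example.
shots = [
    "Wide shot of a rain-soaked alley at dusk",
    "Close-up of a hand lighting a lantern",
]

prompts = [build_prompt(style, shot) for shot in shots]
print(prompts[0])
```

Keeping the style text in one object means a later change to the prohibitions reaches every subsequent prompt, rather than relying on each prompt being retyped consistently.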

On how this relates to per-generation reference features: models like Kling and Seedance 2.0 accept reference images per clip, which controls one generation at a time; loading the style into agent context holds it across the entire project. Inside invideo the invideo agent routes each shot to the right model with the style block attached, so you never manage references platform by platform.

As proof of scale: a 2-person team held a hand-painted style across a 3-minute animated episode (164 clips generated, 41 in the final cut) for ~$950 total, about $315 per finished minute, in 2 days with no pre-production.

If your project needs a director's full visual system rather than a single look, the same context mechanism accepts a complete visual-language or treatment document, which is its own workflow.
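The production figures quoted for the 3-minute episode can be unpacked with a few lines of arithmetic. This is a rough sketch only: the inputs are the numbers stated in the text, the ~$950 total is itself approximate, and the per-clip figures are derived here rather than reported by the production.

```python
# Figures as stated in the text; the total cost is approximate ("~$950").
total_cost_usd = 950
runtime_minutes = 3
clips_generated = 164
clips_in_final_cut = 41

# Cost per finished minute: lands near the "about $315" quoted above,
# with the small gap explained by rounding of the total.
cost_per_finished_minute = total_cost_usd / runtime_minutes

# Share of generated clips that made the final cut.
keep_ratio = clips_in_final_cut / clips_generated

# Derived figures (not stated in the source).
cost_per_generated_clip = total_cost_usd / clips_generated
cost_per_final_clip = total_cost_usd / clips_in_final_cut

print(f"Per finished minute: ${cost_per_finished_minute:.2f}")
print(f"Keep ratio: {keep_ratio:.0%}")
print(f"Per generated clip: ${cost_per_generated_clip:.2f}")
print(f"Per final-cut clip: ${cost_per_final_clip:.2f}")
```

On these inputs, one clip in four survived to the final cut, so any per-clip budget should be planned against generated clips rather than finished ones.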

Watch some of these to see what works for you:

64 reference frames uploaded once to lock an Arcane style across 164 clips
Batch references by theme, tell the agent what to take and what to leave out
Feed a director's bible PDF plus reference stills: the full, unedited invideo agent session

I want you to deeply understand this art style and save it into context for further generations. All of these attached images are the art style that I want for this entire project.

— invideo's creative team, exact style-ingestion prompt used in production
