
How do you extract colour palettes and textures from a reference image to guide AI video generation?

Last updated June 26, 2026

Don't drop illustrated or animated references straight into the prompt: doing so copies the image. Instead, have the invideo agent read the reference's colour palette and texture qualities and translate them into a written prompt: named tonal modes with hex values, surface descriptors (brushstroke, plasticky-sharp, hand-painted), and explicit negative constraints. Then attach that written description as conditioning on every generation.

invideo is an agentic video tool with every current image and video model (Recraft, Nano Banana, GPT-Image-2, Veo, Kling, Seedance 2.0) available inside one agent, so palette and texture extraction happens in the same conversation as the generation that uses the result.

Step 1 — extract the palette as named tonal modes with hex values. Upload the reference(s) and instruct the invideo agent to analyse the colour script and return it as discrete modes, for example "Mode A — split-toned amber and emerald, #E8A547 / #2F5D4A, 70/30 weighting, warm key from practicals only", rather than a vague descriptor like "warm cinematic". The foundation is sampling dominant colours from keyframes, which is what k-means clustering does when run on reference frames; reducing the result to named modes with hex values is what makes the palette reproducible across shots. Hridaye, invideo's creative director, was explicit about the approach: "The better move was to have Agent 1 read the colours and textures of them and prompt for that instead." In one documented production the result was immediate: "The gens came back hyper-realistic with the exact colour temperature I was looking for."
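As a rough illustration of the clustering idea in Step 1, the sketch below runs a plain k-means over a list of RGB pixels and reports each cluster as a hex value with its share of the frame. It is not a description of how the invideo agent analyses a reference (the source does not say). The function names `dominant_colours` and `rgb_to_hex` are hypothetical, and the toy pixel list stands in for pixels you would load from a downsampled keyframe with an image library of your choice.

```python
import random


def rgb_to_hex(rgb):
    """Format an (r, g, b) triple as an uppercase #RRGGBB string."""
    r, g, b = (max(0, min(255, round(c))) for c in rgb)
    return f"#{r:02X}{g:02X}{b:02X}"


def _assign(pixels, centroids):
    """Group every pixel under the index of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in pixels:
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        clusters[nearest].append(p)
    return clusters


def dominant_colours(pixels, k=2, iterations=20, seed=0):
    """Plain k-means over RGB tuples; returns [(hex, weight)], heaviest first."""
    distinct = list(dict.fromkeys(pixels))
    if len(distinct) < k:
        raise ValueError("need at least k distinct colours")
    centroids = random.Random(seed).sample(distinct, k)
    for _ in range(iterations):
        clusters = _assign(pixels, centroids)
        updated = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    clusters = _assign(pixels, centroids)  # final assignment for the weights
    total = len(pixels)
    modes = [(rgb_to_hex(c), len(m) / total) for c, m in zip(centroids, clusters)]
    return sorted(modes, key=lambda mode: -mode[1])


# Toy "frame": 70% amber, 30% emerald, using the hex values from the example mode.
toy_pixels = [(232, 165, 71)] * 70 + [(47, 93, 74)] * 30
for hex_value, weight in dominant_colours(toy_pixels, k=2):
    print(f"{hex_value}  {weight:.0%}")
```

The hex values and weights are only the raw material. The naming step ("split-toned amber and emerald", "warm key from practicals only") is still a written judgement about the reference, which is the part the agent is asked to produce.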

Step 2 — extract the texture qualities as written surface descriptors. Ask the invideo agent to describe the reference's material qualities in words a video model can act on: brushstroke vs. photographic, matte vs. specular, grain density, edge softness, painterly vs. plasticky. For a hand-painted reference, the resulting descriptor was "Every surface has hand-painted brushstroke texture. Every element in frame must feel painterly and handcrafted." This is the prose equivalent of SVBRDF or material-palette decomposition: you separate surface response (roughness, normals) from colour so the model can be conditioned on each independently. If you need exact maps rather than descriptors, generate them as images (with Recraft or Nano Banana) and attach those alongside the palette modes.

Step 3 — lock the extracted palette + texture as a persistent style block. Tell the invideo agent: "deeply understand this and save it into context for further generations." Once locked, every downstream prompt inherits it. In one documented production, "Every prompt after this started with it," and the block was applied to all 164 generated clips for a 3-minute episode at ~$315 per finished minute. This is what gives you shot-to-shot consistency: the same palette modes and surface rules condition every clip, instead of drifting per generation.

Step 4 — write the negative constraints explicitly. Palette and texture extraction fails when the model falls back on its training prior (usually photorealism). Spell out the inverse: "This MUST look and feel like [the reference's surface language] — not live action, not photorealistic." Negative constraints belong in the same style block, because they enforce the texture half of the extraction.
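One way to picture the style block from Steps 3 and 4 is as a small data structure that renders to text and is prepended to every shot prompt. The sketch below is only an illustration under that assumption: `PaletteMode`, `StyleBlock`, the rendered layout, and the two shot prompts are all hypothetical, and inside invideo the agent keeps this context for you rather than exposing an API like this. The palette and texture strings are taken from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaletteMode:
    name: str
    description: str
    hexes: tuple
    weighting: str = ""

    def render(self):
        parts = [f"{self.name}: {self.description}", " / ".join(self.hexes)]
        if self.weighting:
            parts.append(f"{self.weighting} weighting")
        return ", ".join(parts)


@dataclass(frozen=True)
class StyleBlock:
    modes: tuple           # palette modes from Step 1
    texture_rules: tuple   # surface descriptors from Step 2
    negatives: tuple       # negative constraints from Step 4

    def render(self):
        lines = ["STYLE BLOCK (applies to every shot)"]
        lines += [f"Palette {mode.render()}" for mode in self.modes]
        lines += [f"Texture: {rule}" for rule in self.texture_rules]
        lines.append("Must not be: " + ", ".join(self.negatives))
        return "\n".join(lines)

    def apply(self, shot_prompt):
        """Prefix a single shot prompt with the locked style block."""
        return f"{self.render()}\n\nSHOT: {shot_prompt}"


STYLE = StyleBlock(
    modes=(
        PaletteMode(
            name="Mode A",
            description="split-toned amber and emerald",
            hexes=("#E8A547", "#2F5D4A"),
            weighting="70/30",
        ),
    ),
    texture_rules=(
        "Every surface has hand-painted brushstroke texture.",
        "Every element in frame must feel painterly and handcrafted.",
    ),
    negatives=("live action", "photorealistic"),
)

# Hypothetical shot list: every prompt starts with the same block.
shots = ["wide shot of a rooftop at dusk", "close-up of hands lighting a lamp"]
prompts = [STYLE.apply(shot) for shot in shots]
print(prompts[0])
```

Freezing the dataclasses mirrors the "lock it once" idea: the block cannot be edited halfway through a batch, so no clip is conditioned on a different palette or set of surface rules than the others.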

Step 5 — route to the right video model. Inside the invideo agent, Seedance 2.0 reference-to-video carries palette and material context across clips natively; Kling holds multi-shot palette continuity well; Veo handles photoreal colour scripts cleanly. You don't pick the platform per look: the invideo agent routes each shot to the model that best holds the extracted palette, then keeps the style block attached. Across documented productions, palette-and-texture-locked workflows produced finished work at $315–$750 per minute (Arcane-style episode, $315/min; horror short, ~$580/min; Wong Kar-wai short, ~$643/min; 2-minute brand promo, $750/min). The variance is mostly iteration budget, not look-development cost.

Beyond extraction itself: where exact colour matching across a sequence matters more than stylistic translation, palette-transfer / HALD CLUT tooling (e.g. VideoColorMatch) handles shot-to-shot grade matching as a post step. It complements the extraction-and-condition workflow rather than replacing it.
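For readers unfamiliar with HALD CLUTs, the sketch below shows the underlying idea: a colour lookup table stored as a flat list in which red varies fastest, then green, then blue, so that grading a pixel is a single index lookup. This is a simplified illustration, not VideoColorMatch's implementation. The function names are hypothetical, the layout is assumed to be the usual HALD ordering, and the lookup is nearest-neighbour, whereas production tools usually interpolate between lattice points.

```python
def identity_hald(level):
    """Flat identity CLUT: level**2 steps per channel, red varying fastest."""
    if level < 2:
        raise ValueError("level must be at least 2")
    cube = level * level
    scale = 255 / (cube - 1)
    return [
        (round(r * scale), round(g * scale), round(b * scale))
        for b in range(cube)
        for g in range(cube)
        for r in range(cube)
    ]


def apply_hald(clut, level, rgb):
    """Nearest-neighbour lookup of one (r, g, b) pixel in a flat CLUT."""
    cube = level * level
    r, g, b = (round(c * (cube - 1) / 255) for c in rgb)
    return clut[b * cube * cube + g * cube + r]


# A level-2 table has 4 steps per channel, i.e. 64 entries (an 8x8 image).
identity = identity_hald(2)

# Hypothetical "grade": push every entry slightly warmer.
warm = [(min(255, r + 10), g, max(0, b - 10)) for r, g, b in identity]

pixel = (255, 0, 85)
print(apply_hald(identity, 2, pixel))
print(apply_hald(warm, 2, pixel))
```

Because the grade lives entirely in the table, the same table can be applied to every frame of every clip, which is why this kind of tooling suits exact shot-to-shot matching after generation.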

Watch some of these to see what works for you:

Batch references by category and tell the invideo agent exactly what to extract
Lock a colour palette and cinematic style into the invideo agent once, use it everywhere

