How do you extract color and texture from a reference image to write better AI video prompts?
Last updated June 26, 2026
Extract colour and texture from a reference by instructing the AI to read the image's palette and texture qualities and translate them into prompt language for your target style. Never drop an illustrated or animated reference directly into the prompt: that makes the model copy the source aesthetic instead of borrowing its look.
Start with the failure mode this technique solves: an illustrated or animated reference image dropped straight into a video prompt makes the model replicate the cartoon aesthetic rather than borrow its colour and texture. "The better move was to have Agent 1 read the colours and textures of them and prompt for that instead," as invideo's creative team documented after testing both approaches. invideo is an agentic video creation tool with all the current video and image models available, so the steps below run through the invideo agent rather than a raw prompt box.
1. Upload the reference and ask for a read, not a copy. Instruct the invideo agent to read the colour palette and texture qualities of the image and write them into a prompt for your target rendering style — for example: "read the warm amber-green split tones and the rough surface texture of this frame, and translate them into a photorealistic scene." In one documented production, the generations "came back hyper-realistic with the exact colour temperature I was looking for" — the invideo agent understood creative intent from the image rather than ripping it off.
2. Say what to take and, just as importantly, what to leave out. Tell the invideo agent that the reference is there for colour theory only, and to ignore its composition, scale, and subject. When no single image explains the look, batch references by theme (one batch for colour, another for texture or spatial logic) and give each batch its own adopt/ignore instructions; one production requested three grids per generation round from references batched this way. Exclusion prompting is what keeps unwanted attributes from leaking into the output.
3. Quantify the extracted palette so it's reproducible. Have the invideo agent name what it read as tonal modes with exact hex values — e.g. "Mode A — split-toned amber and emerald" plus its hex anchors. One production encoded a director's entire colour philosophy as named modes this way; hex-anchored modes let you repeat the exact palette across every shot instead of re-describing colours from memory each time. Anchor lighting language to the reference too — "warm yellow from the lamps only, like all the refs" produces more accurate results than generic "warm lighting."
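Steps 2 and 3 can be sketched as plain data, which makes the adopt/ignore instructions and the hex-anchored modes easy to reuse word for word. This is a minimal Python sketch, not an invideo API: the class names `ReferenceBatch` and `TonalMode` are hypothetical, and the hex values are placeholders rather than values from any documented production. It only builds the text you would paste into the agent.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Matches a 6-digit hex colour such as "#C8862A".
HEX_RE = re.compile(r"^#[0-9A-Fa-f]{6}$")


@dataclass(frozen=True)
class ReferenceBatch:
    """Hypothetical helper: one themed batch with its adopt/ignore lists."""
    theme: str
    adopt: Tuple[str, ...]
    ignore: Tuple[str, ...]

    def to_instruction(self) -> str:
        # Spell out both what to take and what to leave out.
        return (
            f"Use the {self.theme} references for {', '.join(self.adopt)} only; "
            f"ignore their {', '.join(self.ignore)}."
        )


@dataclass(frozen=True)
class TonalMode:
    """Hypothetical helper: a named tonal mode with exact hex anchors."""
    name: str
    description: str
    hex_anchors: Tuple[str, ...]

    def __post_init__(self) -> None:
        # Reject anything that is not a 6-digit hex colour, so a typo
        # cannot silently change the palette between shots.
        bad = [h for h in self.hex_anchors if not HEX_RE.match(h)]
        if bad:
            raise ValueError(f"not a 6-digit hex colour: {bad}")

    def to_prompt(self) -> str:
        anchors = ", ".join(h.upper() for h in self.hex_anchors)
        return f"{self.name}: {self.description} (hex anchors: {anchors})"


# Placeholder values for illustration only.
colour_batch = ReferenceBatch(
    theme="colour",
    adopt=("colour theory",),
    ignore=("composition", "scale", "subject"),
)
mode_a = TonalMode(
    name="Mode A",
    description="split-toned amber and emerald",
    hex_anchors=("#c8862a", "#1f6b4a"),
)

print(colour_batch.to_instruction())
print(mode_a.to_prompt())
```

Keeping the modes as data rather than retyping them means every shot receives an identical palette string, which is the point of hex anchoring.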
Once extracted, treat the colour-and-texture block as a fixed component: place it in the same position in every prompt (one production held a fixed 9-element assembly order with palette as its own slot across the whole film), and tell the invideo agent to store the block in context so subsequent shots inherit it without re-prompting.
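One way to picture the fixed-slot idea is a small assembler that always emits slots in the same order and refuses to let a single shot overwrite the locked palette block. This is a sketch under stated assumptions: the four slot names below are hypothetical and shortened for illustration (the source does not list the nine elements that production used), and `assemble_prompt` is not part of any invideo tooling.

```python
from typing import Dict

# Hypothetical, shortened slot order; the real production used nine
# elements that are not listed in this article.
SLOT_ORDER = ("subject", "action", "palette", "lighting")


def assemble_prompt(shot: Dict[str, str], locked: Dict[str, str]) -> str:
    """Join per-shot slots and locked slots in one fixed order."""
    # A shot must not redefine a locked slot such as the palette.
    clash = sorted(set(shot) & set(locked))
    if clash:
        raise ValueError(f"shot tries to override locked slots: {clash}")

    merged = {**locked, **shot}
    unknown = sorted(set(merged) - set(SLOT_ORDER))
    if unknown:
        raise KeyError(f"unknown slots: {unknown}")
    missing = [s for s in SLOT_ORDER if s not in merged]
    if missing:
        raise ValueError(f"missing slots: {missing}")

    # Emit every slot in the same position for every shot.
    parts = [merged[s].strip().rstrip(".") for s in SLOT_ORDER]
    return ". ".join(parts) + "."


# The colour-and-texture block is defined once and reused for every shot.
locked_block = {
    "palette": "Mode A: split-toned amber and emerald",
    "lighting": "warm yellow from the lamps only",
}

shot_1 = {"subject": "an old clockmaker", "action": "winds a brass clock"}
print(assemble_prompt(shot_1, locked_block))
```

Raising an error on an override, rather than silently preferring one value, is a design choice: it surfaces accidental palette drift at assembly time instead of after a generation round.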
One boundary case: if your reference's style IS your target style — say, animated frames for an animated film — skip extraction and upload the frames directly with an instruction to save the art style to context; extraction is specifically for when reference style and output style differ. And any colour nuance you can't fully lock at generation can be finished in the grade afterwards.