When should I use a hand-drawn sketch instead of a text prompt or photo reference?

Use a sketch when you need to communicate a structural arrangement — such as multi-character contact, fused poses, or characters attached to props. These configurations frequently break image and video models when prompted with text or standard photo references alone.

Should I upload my sketch directly into the video generator?

No. Upload it to the invideo agent, not directly into a generation model. The agent interprets the sketch as creative intent, attaches it to the image model, and iterates prompts until the output matches your drawing. Dropping the sketch straight into a generation model skips that interpretation step, which is where the technique succeeds or fails.

What is the purpose of the character sheet created from a sketch?

The character sheet is an intermediate asset that locks the fused arrangement before video generation begins. Documented productions build these as 4-angle turnarounds with face and mid-angle close-ups, because models hallucinate any detail they cannot see.

Can I use a sketch as a style reference for colour and texture?

Not by dropping it directly into prompts — that approach fails. Instead, instruct the invideo agent to read the colours and textures in the image and translate them into a photorealistic prompt to achieve the intended colour temperature and mood.

Which video models accept the character sheet output?

Seedance 2.0 accepts character references directly, and the invideo agent routes the locked character sheet into it for every shot requiring that configuration.

Use a Hand-Drawn Sketch as an AI Video Reference Image

Upload the sketch to the invideo agent as a structural reference. The agent attaches your drawing to an image model such as Nano Banana, prompts from it iteratively, and returns an accurate character sheet. That sheet, not the sketch itself, then feeds video generation. Use this when text prompts and photo references both fail on a complex physical arrangement.

Start by identifying whether a sketch is actually the right input. A hand-drawn sketch solves STRUCTURAL problems — multi-character contact, fused poses, characters physically attached to each other or to props. These configurations break image and video models faster than almost anything else: in one documented production, 75% of the film featured a two-character carry shot, and even Nano Banana could not generate an accurate fused character sheet from text prompts alone. The sketch is the unblocking move when both prompting and standard reference images have failed.

For context: invideo is an agentic video creation tool with the current image and video models — Nano Banana, GPT-Image-2, Seedance 2.0, Kling, Veo — available behind one agent, so the sketch enters one pipeline rather than a single model.

Step 1 — Draw the physical arrangement, not a finished frame. The sketch only needs to communicate structure: which character attaches where, body positions, the spatial relationship between figures and props. In the documented case, a team member hand-sketched exactly how one character should be attached to the other — a rough drawing, not polished artwork.

Step 2 — Upload the sketch to the invideo agent, not directly into a video generator. The invideo agent reads the drawing as creative intent rather than as pixels to copy, attaches it to the image model, and iterates prompts on your behalf until the output matches the configuration you drew. As one creator put it after testing this: "Agent 1 didn't rip the image off. It understood what I wanted from the image." Direct drag-and-drop into a generation model skips that interpretation layer, which is where the technique succeeds or fails.

Step 3 — Lock the output as a character sheet and feed it into video generation. The goal of the sketch is an intermediate asset: a clean character sheet showing the fused arrangement. Documented productions build these as 4-angle turnarounds with face and mid-angle close-ups, because models hallucinate any detail they cannot see. Once locked, the sheet becomes the character reference for video generation — Seedance 2.0 reference-to-video accepts character references directly, and the invideo agent routes the sheet into it for every shot that uses that configuration. In the production that pioneered this, the sketch-derived character sheet unblocked the central shot of the film, and the team had 45 seconds of finished film on the timeline by 8 p.m. that day.

One distinction to get right: a sketch works as a STRUCTURAL reference. If your hand-drawn or illustrated image is meant as a STYLE reference — palette, texture, mood — do not drop it into prompts directly; that approach fails. Instead, instruct the invideo agent to read the colours and textures of the image and translate them into a photorealistic prompt — a documented production used this and got generations back with the exact colour temperature intended.
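The structural-versus-style distinction above can be written down as a small planning helper. This is only an illustrative sketch of the workflow described in this article: the names `ReferenceRole` and `plan_reference_use` are hypothetical, nothing here calls invideo or any model, and the steps are plain-text reminders rather than API calls.

```python
from enum import Enum


class ReferenceRole(Enum):
    """What the hand-drawn image is meant to communicate (hypothetical names)."""
    STRUCTURAL = "structural"  # poses, contact points, attachment to props
    STYLE = "style"            # palette, texture, mood


def plan_reference_use(role: ReferenceRole) -> list[str]:
    """Return the ordered steps this article recommends for the given role."""
    if role is ReferenceRole.STRUCTURAL:
        return [
            "upload the sketch to the agent, not to a generation model",
            "let the agent iterate image-model prompts until the output matches the drawing",
            "lock the result as a character sheet",
            "use the character sheet as the reference for video generation",
        ]
    # Style references are never dropped into prompts directly.
    return [
        "upload the image to the agent",
        "ask the agent to read its colours and textures",
        "have the agent translate them into a photorealistic prompt",
    ]


if __name__ == "__main__":
    for number, step in enumerate(plan_reference_use(ReferenceRole.STRUCTURAL), start=1):
        print(f"{number}. {step}")
```

In both branches the image goes to the agent first. The difference is the output: a structural sketch becomes a locked character sheet, while a style image becomes prompt language.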

Watch some of these to see what works for you:

The exact sketch-to-character-sheet workflow, shown live on Day 2

When prompting fails, feed the model a reference image instead

He hand-sketched how we want juice box character attached to our vampire character. We took that drawing and we uploaded that to our Agent 1, who then in turn took that and then attached that to Nano Banana and prompted his way to finally get us the perfect character sheet.

— invideo's creative team, documenting an AI short film production
