Why use a hand-drawn sketch instead of a text prompt for character positioning?

Text prompts struggle to reliably describe spatial arrangements like overlapping limbs or contact points. A sketch shows the model exactly who is in front and where bodies meet, giving you a locked configuration before any video is generated.

Do I need to draw well for the sketch to work as a positioning reference?

No. Stick figures are enough because the sketch carries positioning information, not visual quality. The character sheets you upload separately handle appearance.

How do character sheets and a sketch work together in this workflow?

The sketch defines spatial arrangement while character sheets define appearance. Keeping those two jobs separate gives the invideo agent clear, non-conflicting instructions and produces more controllable output.

When should I fix a still image versus regenerating video if positioning drifts?

Fix the still and regenerate the clip from it. Generating an image costs far less than generating video, so correcting the reference frame first is the efficient approach.

Which image models does the invideo agent use to process a sketch reference?

The invideo agent attaches your sketch and character sheets to image models such as Nano Banana or GPT-Image-2, prompting iteratively until the correct fused arrangement is returned.

Use Hand-Drawn Sketches to Control AI Character Positioning

To control character positioning with a hand-drawn sketch, draw the exact spatial arrangement — who is in front, where bodies make contact — and upload it to the invideo agent as a visual reference. The agent attaches the sketch plus your character sheets to an image model, returns a styled still matching your drawing, and you animate from that.

A hand-drawn sketch works as a positioning reference because it shows the model a spatial configuration that text prompts can't reliably describe: you draw the arrangement once, and the invideo agent translates it into a styled, locked still before any video is generated. invideo is an agentic video creation tool that gives you access to current image and video models, so this whole workflow runs in one place. Use it whenever text prompts keep returning the wrong arrangement, most often in multi-character contact shots such as carries, props, and bodies touching.

1. Draw the arrangement, not the art. Sketch exactly how the characters relate in space: which character is in front, where the contact points are, which limbs overlap. Stick figures are enough — the sketch carries positioning information, not visual quality.

2. Get your character sheets into context first. Upload or generate multi-angle character sheets so the invideo agent knows who each figure in the drawing is — one documented production covered four characters and a prop with just 11 reference images. The sketch carries position; the sheets carry appearance — keeping those two jobs separate is what makes the result controllable.

3. Upload the sketch to the invideo agent as a visual reference. Tell it explicitly what to take and what to leave out: adopt the spatial configuration, ignore the drawing style. Stating exclusions is as important as stating inclusions, because a stray instruction or a wrong attachment can produce completely incorrect output.

4. Let the invideo agent route the sketch to an image model. The agent attaches your drawing and character sheets to an image model — Nano Banana or GPT-Image-2 — and prompts iteratively until it returns a fused still: both characters rendered in your locked style, in exactly the sketched arrangement. One documented production used this exact move when text prompts couldn't render a two-character carry configuration: "He hand sketched how we want juice box character attached to our vampire character. We took that drawing and we uploaded that to our agent one who then in turn took that and then attached that to Nano Banana and prompted his way to finally get us the perfect character sheet." That fused arrangement ended up appearing in 75% of the finished film.

5. Approve the still before generating any motion. Frames first, then video: once the positioning still passes, use it as the reference for video generation. Seedance 2.0 reference-to-video accepts character and location references alongside it, so the sketched arrangement holds as the shot moves. If positioning drifts in motion, fix the still and regenerate the clip from it rather than re-rolling video blind, since generating an image costs little compared to generating video. And if the image model misreads the sketch itself, redraw with clearer separation between figures and restate in text which character is in front before re-uploading.
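If it helps to see the split of jobs written down, here is a minimal Python sketch of steps 2 and 3: it labels each attachment with the one job it does (position or appearance) and assembles the instruction text that states what to adopt and what to ignore. Nothing here calls invideo; the `Reference` class, the `build_instruction` function, the file names, and the exact wording of the instruction are all hypothetical, and the "exactly one sketch" rule is an assumption made for this example rather than a documented requirement.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """One attachment and the single job it is allowed to do (hypothetical structure)."""
    filename: str        # illustrative file name, not a real asset
    role: str            # "position" for the sketch, "appearance" for a character sheet
    character: str = ""  # which character a sheet describes; left empty for the sketch


def build_instruction(refs, front_character, contact_note):
    """Assemble the text sent alongside the attachments.

    It states inclusions (spatial arrangement, appearance sources) and
    exclusions (the sketch's drawing style) explicitly.
    """
    sketches = [r for r in refs if r.role == "position"]
    sheets = [r for r in refs if r.role == "appearance"]

    # Assumption for this example: one sketch carries position, sheets carry appearance.
    if len(sketches) != 1:
        raise ValueError("attach exactly one positioning sketch")
    if not sheets:
        raise ValueError("attach at least one character sheet")
    if front_character not in {s.character for s in sheets}:
        raise ValueError("front character must match one of the character sheets")

    lines = [
        f"Use {sketches[0].filename} for spatial arrangement only.",
        "Ignore its drawing style, line quality, and proportions.",
        f"{front_character} is in front.",
        f"Contact: {contact_note}",
    ]
    for sheet in sheets:
        lines.append(f"Use {sheet.filename} for the appearance of {sheet.character}.")
    # Frames first: ask for a still so positioning can be approved before any motion.
    lines.append("Return one still image. Do not generate video yet.")
    return "\n".join(lines)


# Illustrative values only.
example_refs = [
    Reference("carry_sketch.png", "position"),
    Reference("character_a_sheet.png", "appearance", "Character A"),
    Reference("character_b_sheet.png", "appearance", "Character B"),
]
print(build_instruction(example_refs, "Character A", "Character A carries Character B on one shoulder"))
```

The resulting text is what you would paste or adapt when uploading the sketch. The validation errors mirror the failure mode described in step 3: a missing or wrongly labeled attachment is caught before anything is generated.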

Watch some of these to see what works for you:

How a hand sketch unblocked a multi-character carrying shot in AI video

Per-beat character sheets keep evolving arrangements consistent across shots

He hand sketched how we want juice box character attached to our vampire character. We took that drawing and we uploaded that to our agent one who then in turn took that and then attached that to Nano Banana and prompted his way to finally get us the perfect character sheet.

— invideo's creative team
