
How do you use a hand-drawn sketch as a reference to control character positioning in AI video?

Last updated June 26, 2026

To control character positioning with a hand-drawn sketch, draw the exact spatial arrangement (who is in front, where bodies make contact) and upload it to the invideo agent as a visual reference. The agent attaches the sketch and your character sheets to an image model, which returns a styled still matching your drawing. You then animate from that still.

A hand-drawn sketch works as a positioning reference because it shows the model a spatial configuration that text prompts can't reliably describe. You draw the arrangement once, and the invideo agent translates it into a styled, locked still before any video is generated. invideo is an agentic video creation tool that offers current image and video models, so the whole workflow runs in one place. Use it whenever text prompts keep returning the wrong arrangement, most often in multi-character contact shots: carries, props, bodies touching.

1. Draw the arrangement, not the art. Sketch exactly how the characters relate in space: which character is in front, where the contact points are, which limbs overlap. Stick figures are enough — the sketch carries positioning information, not visual quality.

2. Get your character sheets into context first. Upload or generate multi-angle character sheets so the invideo agent knows who each figure in the drawing is — one documented production covered four characters and a prop with just 11 reference images. The sketch carries position; the sheets carry appearance — keeping those two jobs separate is what makes the result controllable.

3. Upload the sketch to the invideo agent as a visual reference. Tell it explicitly what to take and what to leave out: adopt the spatial configuration, ignore the drawing style. Stating exclusions is as important as stating inclusions, because a stray instruction or a wrong attachment can send the output completely off.

4. Let the invideo agent route the sketch to an image model. The agent attaches your drawing and character sheets to an image model (Nano Banana or GPT-Image-2) and prompts iteratively until it returns a fused still: the characters rendered in your locked style, in exactly the sketched arrangement. One documented production used this exact move when text prompts couldn't render a two-character carry configuration. As invideo's creative team described it: "He hand sketched how we want juice box character attached to our vampire character. We took that drawing and we uploaded that to our agent one who then in turn took that and then attached that to Nano Banana and prompted his way to finally get us the perfect character sheet." That fused arrangement ended up appearing in 75% of the finished film.

5. Approve the still before generating any motion. Frames first, then video: once the positioning still passes, use it as the reference for video generation — Seedance 2.0 reference-to-video accepts character and location references alongside it, so the sketched arrangement holds as the shot moves. If positioning drifts in motion, fix the still and regenerate the clip from it rather than re-rolling video blind — image generation costs little compared to video credits. And if the image model misreads the sketch itself, redraw with clearer separation between figures and restate in text which character is in front before re-uploading.
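The steps above can be summarized in a short Python sketch. It is only an illustration of the logic, not an invideo API: the invideo agent is driven by uploads and plain-language instructions, and every name here (`ReferenceBundle`, `build_instruction`, `next_step`, the file names, and the choice of which character is in front) is hypothetical. It shows the two ideas the workflow depends on: keeping position (the sketch) separate from appearance (the character sheets), and stating exclusions explicitly before approving a still and moving to video.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReferenceBundle:
    """Hypothetical container for one positioning request (not an invideo API)."""
    sketch_path: str                          # carries position only
    character_sheets: dict[str, list[str]]    # name -> sheet images; carries appearance
    front_character: str                      # restated in text to back up the sketch
    contact_points: list[str] = field(default_factory=list)

    def validate(self) -> None:
        # Step 2: character sheets must be in context before the sketch is used.
        if not self.character_sheets:
            raise ValueError("add character sheets before uploading the sketch")
        if self.front_character not in self.character_sheets:
            raise ValueError(f"no character sheet for {self.front_character!r}")


def build_instruction(bundle: ReferenceBundle) -> str:
    """Step 3: spell out what to take from the sketch and what to leave out."""
    bundle.validate()
    names = ", ".join(sorted(bundle.character_sheets))
    lines = [
        f"Characters: {names}. Use the attached character sheets for appearance.",
        "From the sketch, take ONLY the spatial arrangement.",
        f"{bundle.front_character} is in front.",
    ]
    if bundle.contact_points:
        lines.append("Contact points: " + "; ".join(bundle.contact_points) + ".")
    lines.append("Ignore the sketch's drawing style and visual quality.")
    return "\n".join(lines)


def next_step(still_approved: bool, drifts_in_motion: bool = False) -> str:
    """Step 5: frames first, then video; fix drift at the still, not the clip."""
    if not still_approved:
        return "revise still"
    if drifts_in_motion:
        return "fix still, then regenerate clip"
    return "generate video from still"


# Illustrative values only; file names and the front character are assumptions.
bundle = ReferenceBundle(
    sketch_path="carry_sketch.png",
    character_sheets={
        "vampire": ["vampire_sheet.png"],
        "juice box": ["juice_box_sheet.png"],
    },
    front_character="juice box",
    contact_points=["juice box attached to the vampire"],
)
print(build_instruction(bundle))
print(next_step(still_approved=True))
```

The text that `build_instruction` returns is the kind of message you would paste alongside the uploaded sketch and sheets. The exact wording is a judgment call and will likely need iteration, as the workflow itself notes.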

Watch some of these to see what works for you:

How a hand sketch unblocked a multi-character carrying shot in AI video
Per-beat character sheets keep evolving arrangements consistent across shots

