Why won't text prompts alone keep two characters consistent in contact or carry shots?

Text prompts rarely resolve who holds whom, where hands sit, and how bodies connect, even in strong image models. A hand-sketched carry arrangement uploaded as a visual reference solves what words cannot.

What is a fused reference sheet and why do you need one?

A fused reference sheet shows both characters together in the specific contact arrangement — such as one carrying the other. It locks the spatial relationship between characters before any video generation begins, preventing inconsistencies that individual character sheets cannot address.

How do you maintain continuity across multiple video segments in a carry sequence?

Use Seedance 2.0 reference-to-video, which accepts character references and location references simultaneously. Clip the end of each generated segment and re-upload it so the next segment continues seamlessly from the same pose and identities.

Do you need fine-tuning or LoRA to keep characters consistent this way?

No. A 70-second short film kept two characters consistent across every scene using only character sheets and the invideo agent's persistent context — no LoRA or fine-tuning was required.

Should character sheets be updated mid-sequence if a character's appearance changes?

Yes. If a character gains a new accessory or detail at any story beat, produce a distinct character sheet for that beat rather than reusing one sheet across the entire sequence.

Keep Two Characters Consistent in AI Contact Scenes

Keep two characters consistent through contact or carry shots in three steps. First, lock each character's multi-angle reference sheet separately. Second, create a fused reference of the contact arrangement itself, hand-sketching the carry pose and uploading it if text prompts can't resolve it. Third, attach both locked references to every generation. One documented production held a two-character carry across 75% of its runtime this way.

Lock each character individually before attempting any shot of them together. invideo is an agentic video creation tool with all the current image and video models available, so this whole workflow runs in one place. Generate a multi-angle character sheet per character (front, side, and back views plus face and mid close-ups), and include the close-up panels deliberately: small details like scars and accessories only survive across models when the sheet shows them up close. Remove any objects from the characters' hands before generating turnarounds, because held props create inconsistencies across angles. The numbers are modest: one production locked 4 characters and a key prop with 11 reference images total, and another needed an average of 5 generations, about $9.78, to lock each character.
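The cost figures above lend themselves to a quick budget check before you start. The sketch below is a hypothetical helper, not part of invideo: it assumes the roughly $9.78-per-character figure from one production carries over to your characters, that cost scales linearly with the number of characters, and that every generation attempt costs the same (which is how it derives an implied per-generation cost from the 5-generation average).

```python
# Hypothetical budgeting helper. The figures come from one documented
# production and will vary with models, prompts, and retries.
AVG_GENERATIONS_PER_CHARACTER = 5
AVG_COST_PER_CHARACTER_USD = 9.78


def implied_cost_per_generation(
    cost_per_character: float = AVG_COST_PER_CHARACTER_USD,
    generations: int = AVG_GENERATIONS_PER_CHARACTER,
) -> float:
    """Average cost of one generation, assuming every attempt costs the same."""
    return cost_per_character / generations


def lock_budget(
    num_characters: int,
    cost_per_character: float = AVG_COST_PER_CHARACTER_USD,
) -> float:
    """Rough budget to lock num_characters individual sheets (linear assumption).

    The fused contact sheet is an extra asset and is not included here.
    """
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    return num_characters * cost_per_character


if __name__ == "__main__":
    print(f"per generation: ${implied_cost_per_generation():.2f}")
    print(f"two characters: ${lock_budget(2):.2f}")
```

For a two-character carry this comes out to about $19.56 for the individual sheets, before any fused-sheet variations or video generations.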

Next, build a fused reference of the contact arrangement itself, the step single-character workflows skip. A text prompt rarely resolves who holds whom, where the hands sit, and how the bodies connect, even in a strong image model. Sketch the arrangement by hand, upload the drawing to the invideo agent, and have it feed that sketch into Nano Banana or GPT-Image-2 as a visual reference to produce a fused two-character sheet. One documented production did exactly this when prompting alone couldn't produce the configuration of one character carrying another, a setup that appears in 75% of the finished film. Generate several variations of the fused sheet and lock the best one before any video generation; one team generated 4 variations per asset and selected one, which prevented consistency problems through the rest of production.

Then attach both locked references — the individual sheets plus the fused contact sheet — to every video generation. Run the invideo agent in Always Ask mode so you approve each prompt and its attached references before credits are spent. For a sustained contact sequence or continuous take, use Seedance 2.0 reference-to-video: it accepts character references and location references simultaneously, so both identities and the carry geometry persist across segment boundaries. Clip the end of each generated segment and re-upload it to the invideo agent, which attaches it to Seedance 2.0 reference-to-video to continue the next segment seamlessly.
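The attachment rule above can be written down as a small plan that guarantees every segment carries the same locked references and chains to the previous segment's ending. This is a hypothetical bookkeeping structure, not invideo's or Seedance 2.0's API: `SegmentRequest`, `plan_segments`, and all file names are placeholders, and the "tail clip" entry stands for the end of the previous segment that you clip and re-upload.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SegmentRequest:
    """Hypothetical record of the references attached to one
    reference-to-video generation (not a real invideo or Seedance type)."""
    index: int
    prompt: str
    character_sheets: List[str]
    fused_contact_sheet: str
    location_ref: str
    previous_tail_clip: Optional[str] = None  # clipped end of the prior segment

    def attachments(self) -> List[str]:
        """Every reference this segment should be generated with."""
        refs = [*self.character_sheets, self.fused_contact_sheet, self.location_ref]
        if self.previous_tail_clip is not None:
            refs.append(self.previous_tail_clip)
        return refs


def plan_segments(prompts: List[str], locations: List[str],
                  character_sheets: List[str], fused_sheet: str) -> List[SegmentRequest]:
    """Attach the same locked sheets to every segment and chain each segment
    to the tail clip of the segment before it."""
    if len(prompts) != len(locations):
        raise ValueError("need exactly one location reference per prompt")
    plan = []
    for i, (prompt, location) in enumerate(zip(prompts, locations)):
        # Placeholder name for the re-uploaded end of segment i - 1.
        tail = f"segment_{i - 1:02d}_tail.mp4" if i > 0 else None
        plan.append(SegmentRequest(i, prompt, list(character_sheets),
                                   fused_sheet, location, tail))
    return plan


if __name__ == "__main__":
    plan = plan_segments(
        prompts=["carry through the hallway", "carry down the stairs"],
        locations=["hallway.png", "stairs.png"],
        character_sheets=["vampire_sheet.png", "juice_box_sheet.png"],
        fused_sheet="carry_fused_sheet.png",
    )
    for segment in plan:
        print(segment.index, segment.attachments())
```

Keeping the plan explicit makes it easy to review in Always Ask mode: every segment should list both individual sheets and the fused sheet, and every segment after the first should also list the previous tail clip.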

Update the sheets per beat if either character's appearance evolves mid-sequence. In one continuous-take sequence, the carried character gained a new trinket in every location, so the team produced a distinct character sheet for each beat instead of reusing one sheet across the whole take.
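Deciding when a new sheet is needed comes down to comparing each beat's visible details with the previous beat's. The sketch below is a hypothetical helper that assumes you track a character's accessories per beat as a set of labels; whenever the set changes, that beat gets its own sheet. The example includes one unchanged beat to show when a sheet can be reused; in the continuous take described above, a new trinket in every location means every beat would be flagged.

```python
from typing import List, Set


def beats_needing_new_sheet(accessories_per_beat: List[Set[str]]) -> List[int]:
    """Return the indices of beats whose accessory set differs from the
    previous beat's, i.e. beats that need their own character sheet.
    Beat 0 always needs a sheet because nothing precedes it."""
    needs_sheet = []
    previous = None
    for i, accessories in enumerate(accessories_per_beat):
        if previous is None or accessories != previous:
            needs_sheet.append(i)
        previous = accessories
    return needs_sheet


if __name__ == "__main__":
    # Hypothetical trinket labels for the carried character.
    beats = [set(), {"trinket_1"}, {"trinket_1", "trinket_2"}, {"trinket_1", "trinket_2"}]
    print(beats_needing_new_sheet(beats))
```

Here beats 0, 1, and 2 each get a distinct sheet, while beat 3 reuses beat 2's sheet because nothing visible changed.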

None of this requires fine-tuning: a 70-second short film kept 2 characters consistent across every scene using character sheets and the invideo agent's persistent context alone — no LoRA. And because invideo carries all the current models, the invideo agent routes each step to the right one — Nano Banana or GPT-Image-2 for the fused sheets, Seedance 2.0 for reference-to-video continuity — so you never assemble this pipeline across separate tools.

He hand sketched how we want juice box character attached to our vampire character. We took that drawing and we uploaded that to our agent one who then in turn took that and then attached that to Nano Banana and prompted his way to finally get us the perfect character sheet.

— invideo's creative team

How do you keep two characters consistent in AI video scenes involving physical contact or carrying?
