How do you keep two characters consistent in AI video scenes involving physical contact or carrying?
Last updated June 26, 2026
Keep two characters consistent through contact or carry shots by locking each character's multi-angle reference sheet separately, then creating a fused reference of the contact arrangement itself — hand-sketch the carry pose and upload it if text prompts can't resolve it — and attaching both locked references to every generation. One documented production held a two-character carry across 75% of its runtime this way.
Lock each character individually before attempting any shot of them together. invideo is an agentic video creation tool with all the current image and video models available, so this whole workflow runs in one place. Generate a multi-angle character sheet per character (front, side, and back views plus face and mid close-ups) and include the close-up panels deliberately: small details like scars and accessories only survive across models when the sheet shows them up close. Remove any objects from the characters' hands before generating turnarounds, because held props create inconsistencies across angles. The reference count and cost stay modest: one production locked 4 characters and a key prop with 11 reference images total, and another averaged 5 generations to lock one character at about $9.78 per character.
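The sheet requirements above can be tracked as a simple pre-lock checklist. The sketch below is a planning aid only and assumes nothing about invideo's interface: `CharacterSheet`, `sheet_problems`, the panel labels, and the example prop are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Panels the workflow asks for on every character sheet (labels are our own).
REQUIRED_PANELS = {"front", "side", "back", "face_closeup", "mid_closeup"}


@dataclass
class CharacterSheet:
    """Hypothetical record of what one character sheet currently contains."""
    name: str
    panels: set[str] = field(default_factory=set)
    held_props: list[str] = field(default_factory=list)


def sheet_problems(sheet: CharacterSheet) -> list[str]:
    """Return the reasons a sheet is not yet ready to lock (empty list = ready)."""
    problems = [f"missing panel: {p}" for p in sorted(REQUIRED_PANELS - sheet.panels)]
    problems += [f"remove held prop before turnaround: {p}" for p in sheet.held_props]
    return problems


# Example: a sheet that lacks the mid close-up and still shows a held prop.
vampire = CharacterSheet(
    name="vampire",
    panels={"front", "side", "back", "face_closeup"},
    held_props=["umbrella"],  # illustrative prop, not from any real production
)
print(sheet_problems(vampire))
```

A sheet is worth locking only once the returned list is empty, which mirrors the two rules in the text: include every panel, and keep the hands empty.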
Next, build a fused reference of the contact arrangement itself — the step single-character workflows skip. A text prompt rarely resolves who holds whom, where the hands sit, and how the bodies connect, even in a strong image model. Sketch the arrangement by hand, upload the drawing to the invideo agent, and have it feed that sketch into Nano Banana or GPT-Image-2 as a visual reference to produce a fused two-character sheet. One documented production used exactly this when prompting alone couldn't produce the configuration of one character carrying another — a setup that appears in 75% of the finished film. Generate several options of the fused sheet and lock the best one before any video generation; one team generated 4 variations per asset and selected one, which prevented consistency problems through the rest of production.
Then attach both locked references — the individual sheets plus the fused contact sheet — to every video generation. Run the invideo agent in Always Ask mode so you approve each prompt and its attached references before credits are spent. For a sustained contact sequence or continuous take, use Seedance 2.0 reference-to-video: it accepts character references and location references simultaneously, so both identities and the carry geometry persist across segment boundaries. Clip the end of each generated segment and re-upload it to the invideo agent, which attaches it to Seedance 2.0 reference-to-video to continue the next segment seamlessly.
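One way to keep the attachment rule honest is to write out the reference list for every segment before approving any prompt. The sketch below only builds that plan as plain data; it does not call invideo or Seedance 2.0, and `plan_segments`, the file names, and the tail-clip naming scheme are assumptions made for illustration.

```python
def plan_segments(prompts: list[str], character_sheets: list[str], fused_sheet: str) -> list[dict]:
    """Build a per-segment plan in which every generation carries all locked references."""
    plan = []
    for index, prompt in enumerate(prompts):
        # Individual sheets plus the fused contact sheet go on every segment.
        references = [*character_sheets, fused_sheet]
        if index > 0:
            # From the second segment on, also attach the clipped end of the
            # previous segment so the next one continues from it.
            references.append(f"segment_{index - 1:02d}_tail.mp4")
        plan.append({"prompt": prompt, "references": references})
    return plan


# Hypothetical file names and prompts for a two-segment carry sequence.
plan = plan_segments(
    prompts=["Vampire lifts the juice box", "Vampire carries the juice box down the hall"],
    character_sheets=["vampire_sheet.png", "juice_box_sheet.png"],
    fused_sheet="carry_fused_sheet.png",
)
for segment in plan:
    print(segment["prompt"], "->", segment["references"])
```

Reviewing a plan like this alongside Always Ask mode makes it easy to spot a prompt whose attached references are missing a sheet before credits are spent.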
Update the sheets per beat if either character's appearance evolves mid-sequence. In one continuous-take sequence, the carried character gained a new trinket in every location, so the team produced a distinct character sheet for each beat instead of reusing one sheet across the whole take.
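When appearance changes per beat, the risk is quietly reusing an earlier beat's sheet. A minimal sketch of a guard against that, assuming you track sheets in a dictionary keyed by beat; `missing_beat_sheets`, the beat names, and the file names are hypothetical.

```python
def missing_beat_sheets(beats: list[str], sheets_by_beat: dict[str, str]) -> list[str]:
    """Return the beats, in order, that do not yet have their own character sheet."""
    return [beat for beat in beats if beat not in sheets_by_beat]


# Illustrative beats for a continuous take; names are not from a real production.
beats = ["kitchen", "hallway", "rooftop"]
sheets_by_beat = {
    "kitchen": "juice_box_kitchen_sheet.png",
    "rooftop": "juice_box_rooftop_sheet.png",
}
print(missing_beat_sheets(beats, sheets_by_beat))
```

Any beat the function returns needs its own sheet generated and locked before that segment is produced.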
None of this requires fine-tuning: a 70-second short film kept 2 characters consistent across every scene using character sheets and the invideo agent's persistent context alone — no LoRA. And because invideo carries all the current models, the invideo agent routes each step to the right one — Nano Banana or GPT-Image-2 for the fused sheets, Seedance 2.0 for reference-to-video continuity — so you never assemble this pipeline across separate tools.
Watch some of these to see what works for you:
He hand sketched how we want juice box character attached to our vampire character. We took that drawing and we uploaded that to our agent one who then in turn took that and then attached that to Nano Banana and prompted his way to finally get us the perfect character sheet.
— invideo's creative team