Why does AI video fail when two characters touch or hug?

AI video models have no collision logic, so they treat bodies in contact as one blended shape. This causes fused limbs, merged skin tones, and extra arms unless you guide the model with references and precise prompts.

What kind of prompt language works best for character contact shots?

Use anatomical, body-part-specific language instead of vague terms like "they hug." Describe exactly which limb does what and where boundaries stay, and add a negative prompt targeting fused limbs, body merging, and skin blending.

Should I generate a hug or carry as one clip or multiple shots?

Break it into separate generations: one for the approach, one for the moment of contact, and one for the hold. Reattach character sheets each time and stitch the best segments together in editing.

Which AI video model handles two-character contact best?

Seedance 2.0 reference-to-video handles carries and sustained contact best because it accepts both character reference sheets and a reference video simultaneously, preserving spatial context across the clip.

How does a phone mock or hand sketch help with character contact shots?

Filming the pose on your phone gives the model a real spatial map of how two bodies fit together. For angles a phone cannot capture, a hand sketch of the arrangement can be uploaded to anchor the video generation instead.

How to Prompt AI Video With Two Characters Touching

Two-character contact breaks AI video faster than almost any other shot because the models have no collision logic — they treat bodies in contact as one blended shape. Solve it with a four-part stack: lock both characters as references, stage the contact with a phone-shot mock or hand sketch, prompt the physics in body-part-specific language, and frame the contact to avoid full-body overlap.

Start by locking each character as its own reference sheet before you attempt any contact shot. Generate four options per character, pick one, and include close-up panels — not just wides — so small details (skin tone, sleeve edge, hairline) stay distinct when two bodies overlap. The invideo agent holds these sheets in context and reattaches them on every generation, which is what stops the model from collapsing two characters into one body.
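To make the reattachment rule concrete, here is a minimal Python sketch of a reference-sheet record and a request builder that refuses contact shots until every sheet has close-up panels. The names `CharacterSheet` and `build_generation_request`, the panel labels, and the request dictionary are all hypothetical. This is not the invideo agent's API. It only illustrates the rule "attach every sheet to every generation."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterSheet:
    """Hypothetical record of one locked character reference sheet."""

    name: str
    chosen_option: int        # which of the four generated options you kept (1-4)
    panels: tuple[str, ...]   # e.g. ("wide", "closeup_face", "closeup_sleeve")

    def __post_init__(self) -> None:
        if not 1 <= self.chosen_option <= 4:
            raise ValueError("pick one of the four generated options (1-4)")

    def has_closeups(self) -> bool:
        # Close-up panels keep skin tone, sleeve edge, and hairline distinct.
        return any(p.startswith("closeup") for p in self.panels)


def build_generation_request(prompt: str, sheets: list[CharacterSheet]) -> dict:
    """Reattach every character sheet to every generation (hypothetical request shape)."""
    missing = [s.name for s in sheets if not s.has_closeups()]
    if missing:
        raise ValueError(f"add close-up panels before contact shots: {missing}")
    return {"prompt": prompt, "character_refs": [s.name for s in sheets]}


if __name__ == "__main__":
    vampire = CharacterSheet("vampire", 2, ("wide", "closeup_face", "closeup_sleeve"))
    juice_box = CharacterSheet("juice box", 4, ("wide", "closeup_label"))
    print(build_generation_request("the vampire lifts the juice box", [vampire, juice_box]))
```

The builder is called for every shot, not once per scene, which mirrors the point that the sheets travel with each generation.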

Then give the model a physical reference for the contact itself instead of prompting it cold. Act the shot out on your phone (one person carrying, hugging, or touching another) and upload that clip as a reference video; the invideo agent routes it to Seedance 2.0 reference-to-video along with your character sheets, so the model has a real spatial map of how the two bodies fit together. For configurations a phone can't capture, such as a character carried piggyback at an odd angle or two characters tangled in a prop, hand-sketch the arrangement and upload the drawing; the agent feeds it into the image model to build a fused character sheet that anchors the video generation. One documented production, in which a vampire carries a juice-box character through 75% of the film, used exactly this loop: phone mocks for the POVs, a hand sketch for the carry rig, then reference-to-video with both character sheets attached.
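The choice between a phone mock and a hand sketch can be written as a simple branch. This sketch assumes a hypothetical `ContactShot` record with a `phone_capturable` flag that you set by judgment. The returned steps paraphrase the workflow above and are not calls to any real tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactShot:
    description: str
    phone_capturable: bool  # can two people act this out in front of a phone camera?


def plan_contact_reference(shot: ContactShot) -> list[str]:
    """Return ordered steps for anchoring a contact shot (hypothetical workflow)."""
    if shot.phone_capturable:
        # Real bodies give the model a spatial map of how the two fit together.
        return [
            f"film phone mock: {shot.description}",
            "upload the clip as a reference video",
            "generate with reference-to-video plus both character sheets",
        ]
    # Odd angles or tangled props: draw it, then build a fused sheet from the drawing.
    return [
        f"hand-sketch the arrangement: {shot.description}",
        "feed the sketch to the image model to build a fused character sheet",
        "generate video anchored on the fused sheet plus both character sheets",
    ]


if __name__ == "__main__":
    for shot in (
        ContactShot("side hug on a park bench", phone_capturable=True),
        ContactShot("piggyback carry seen from above", phone_capturable=False),
    ):
        print(plan_contact_reference(shot))
```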

Write the contact in anatomical, body-part-specific language — never "they hug." Spell out which limb does what and where the boundary stays: "the taller character's left arm wraps around the shorter character's shoulders, fingers naturally extended on the upper back, distinct skin tones and sleeve boundaries maintained, no body merging." Pair this with a negative prompt aimed at the exact failure mode: "no fused limbs, no body merging, no skin blending between characters, no extra arms, no morphing." Hridaye, invideo's creative director, puts the underlying problem plainly: "Multi-character consistency (ropes, props, bodies in contact) breaks models faster than anything else" — which is why the prompt has to do the collision work the model won't.
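A small prompt builder keeps the anatomical structure consistent from shot to shot. This is a hypothetical helper, not part of any video model's API: `ContactBeat` holds who moves which limb onto which body part, and the negative terms are the failure modes named in the paragraph above. With the example values it produces the same wording as the example prompt.

```python
from dataclasses import dataclass

# Failure modes named in the article, reused as the negative prompt for every contact shot.
CONTACT_NEGATIVES = (
    "fused limbs",
    "body merging",
    "skin blending between characters",
    "extra arms",
    "morphing",
)


@dataclass(frozen=True)
class ContactBeat:
    """One body-part-specific action: who uses which limb on which part of whom."""

    actor: str        # e.g. "the taller character"
    limb: str         # e.g. "left arm"
    verb: str         # e.g. "wraps around"
    target: str       # e.g. "the shorter character"
    target_part: str  # e.g. "shoulders"
    detail: str = ""  # e.g. "fingers naturally extended on the upper back"


def build_contact_prompt(beat: ContactBeat) -> tuple[str, str]:
    """Return (positive, negative) prompt strings for one contact beat."""
    parts = [f"{beat.actor}'s {beat.limb} {beat.verb} {beat.target}'s {beat.target_part}"]
    if beat.detail:
        parts.append(beat.detail)
    # Spell out the boundary the model would otherwise blur.
    parts.append("distinct skin tones and sleeve boundaries maintained")
    parts.append("no body merging")
    positive = ", ".join(parts)
    negative = ", ".join(f"no {term}" for term in CONTACT_NEGATIVES)
    return positive, negative


if __name__ == "__main__":
    hug = ContactBeat(
        "the taller character", "left arm", "wraps around",
        "the shorter character", "shoulders",
        "fingers naturally extended on the upper back",
    )
    positive, negative = build_contact_prompt(hug)
    print(positive)
    print(negative)
```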

Break the interaction into separate generations instead of asking for the whole motion in one clip. Generate the approach as one shot, the moment of contact as a second, and the carry or hold as a third, each with its own prompt and with the character sheets reattached. This is also where Frankenstein shot assembly earns its keep on contact scenes: across documented productions, a usable shot takes an average of 3 generations, and roughly 40% of final shots stitch together segments from 2 or more generations. For a hug or a carry, you'll often keep the approach from one generation, the contact beat from another, and the hold from a third.
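The beat split and the Frankenstein pick can be sketched the same way. Everything here is an illustration under assumptions: the three beat names come from the paragraph above, `Take.score` stands for your own rating after reviewing each segment, and `generation_budget` only multiplies the number of beats by the reported average of 3 generations per usable shot, so treat its result as a rough planning figure rather than a guarantee.

```python
from dataclasses import dataclass

BEATS = ("approach", "contact", "hold")


@dataclass(frozen=True)
class Take:
    beat: str         # which beat this segment covers
    generation: int   # which generation produced it
    score: float      # your own 0-1 rating after review (hypothetical)


def plan_beats(prompts: dict[str, str], sheets: list[str]) -> list[dict]:
    """One generation request per beat, each with the character sheets reattached."""
    return [{"beat": b, "prompt": prompts[b], "character_refs": list(sheets)} for b in BEATS]


def assemble(takes: list[Take]) -> list[Take]:
    """Frankenstein assembly: keep the best-rated take per beat, from any generation."""
    chosen = []
    for beat in BEATS:
        candidates = [t for t in takes if t.beat == beat]
        if not candidates:
            raise ValueError(f"no usable take yet for beat: {beat}")
        chosen.append(max(candidates, key=lambda t: t.score))
    return chosen


def generation_budget(beats: int = len(BEATS), gens_per_usable_shot: float = 3.0) -> float:
    """Rough planning figure: beats times the reported average generations per usable shot."""
    return beats * gens_per_usable_shot


if __name__ == "__main__":
    takes = [
        Take("approach", 1, 0.8), Take("approach", 2, 0.6),
        Take("contact", 2, 0.4), Take("contact", 3, 0.9),
        Take("hold", 1, 0.5), Take("hold", 2, 0.7),
    ]
    print([(t.beat, t.generation) for t in assemble(takes)])
    print(generation_budget())
```

Picking per beat rather than per clip is what lets the final cut combine segments from different generations.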

Use framing to dodge the hardest frames entirely. Cut on the moment of contact rather than asking the model to resolve full-body overlap; shoot the touch from behind one character so only one body is fully visible; use shallow depth of field to soften the contact zone; or play the touch as a reaction shot on the other character's face. These are the same choices a cinematographer makes on set, and they remove the frames where models fail.

On model routing: Seedance 2.0 reference-to-video handles two-character carries and sustained contact best because it accepts character references and a reference video simultaneously, carrying spatial context across the clip. Kling 3.0 handles natural motion well for embraces with less rigid contact. The invideo agent has every current video model available and routes each shot to the right one — so you don't pick a platform per model, you describe the shot and let the agent send the carry to Seedance 2.0 and the gentler embrace to whichever model the context favors.
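If you script shot lists outside the agent, the guidance above reduces to a lookup table. This is a hypothetical hint table that mirrors the two recommendations in this section. It is not how the invideo agent actually routes, and any contact type not listed is left to the agent.

```python
# Hypothetical routing hints that mirror this section's guidance,
# not the invideo agent's actual routing logic.
ROUTING_HINTS = {
    # Accepts character sheets and a reference video at the same time,
    # so spatial context carries across the clip.
    "carry": "Seedance 2.0 reference-to-video",
    "sustained": "Seedance 2.0 reference-to-video",
    # Natural motion for embraces with less rigid contact.
    "embrace": "Kling 3.0",
}


def route_contact_shot(contact: str) -> str:
    """Return a model hint for a contact type; unlisted types fall back to the agent."""
    return ROUTING_HINTS.get(contact, "agent's choice")


if __name__ == "__main__":
    for kind in ("carry", "embrace", "handshake"):
        print(kind, "->", route_contact_shot(kind))
```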

If you're running this inside an agent crew, give the storyboard agent the contact shot first so the geometry is settled before any video gen — who is taller, which arm goes where, what the camera sees — then hand the locked storyboard plus both character sheets to the DOP agent for the actual prompts. Settling the geometry on paper costs nothing; settling it in video costs credits.
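The storyboard-first rule works as a gate: nothing goes to the DOP agent until the geometry is written down. The sketch below assumes a hypothetical `ContactGeometry` record covering the questions from the paragraph above (who is taller, where the arm goes, what the camera sees). The handoff dictionary is illustrative, not an agent-crew API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactGeometry:
    """Hypothetical storyboard record for one contact shot."""

    taller: str
    shorter: str
    arm_placement: str  # e.g. "taller's left arm around shorter's shoulders"
    camera_sees: str    # e.g. "from behind the shorter character"

    def missing_fields(self) -> list[str]:
        # Any blank answer means the geometry is not settled yet.
        return [name for name, value in vars(self).items() if not value.strip()]


def hand_off_to_dop(geometry: ContactGeometry, sheets: list[str]) -> dict:
    """Refuse the handoff until geometry is settled and both character sheets are attached."""
    missing = geometry.missing_fields()
    if missing:
        raise ValueError(f"settle on the storyboard first: {missing}")
    if len(sheets) < 2:
        raise ValueError("attach both character sheets")
    return {"geometry": geometry, "character_sheets": list(sheets)}


if __name__ == "__main__":
    plan = ContactGeometry(
        taller="vampire",
        shorter="juice box",
        arm_placement="vampire's right arm under the juice box",
        camera_sees="side profile, both bodies visible",
    )
    print(hand_off_to_dop(plan, ["vampire", "juice box"]))
```

Failing fast here is the point: a blank field costs nothing to fix on paper, while the same gap discovered after generation costs credits.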

