How many reference images do you need per character for a two-character AI video?

Build a 4-angle head-to-toe character sheet plus a headshot per character, then lock one version from four generated options before any video work begins. One documented short used 11 total reference images across its full cast before generating a single video frame.

How do you handle physical-contact shots like hugs or fights between two AI characters?

Feed a hand-sketched or photographed mock of the contact configuration alongside character sheets so the model sees the geometry, or split the action into a wider lead-in and a tighter coverage shot. Overlapping subjects accelerate identity drift faster than any other shot type.

How many generations should you expect per usable two-character shot?

Plan for an average of three generations per usable shot, with each 15-second clip typically yielding only four to seven usable seconds. About 25% of generated clips make a final cut, so budget for overgeneration from the start.
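The averages above translate into a rough budget before any credits are spent. The sketch below is only illustrative arithmetic: the function name and parameters are hypothetical, and the default values are the figures quoted in this answer (three generations per usable shot, 15-second clips, four to seven usable seconds per clip, about 25% of clips reaching a final cut), which come from one documented production and may not hold for yours.

```python
import math


def estimate_budget(usable_shots, gens_per_usable_shot=3.0, clip_seconds=15,
                    usable_seconds_range=(4, 7), final_cut_rate=0.25):
    """Rough overgeneration budget (hypothetical helper, not an invideo API)."""
    # Round up: a fractional generation still costs a full generation.
    generations = math.ceil(usable_shots * gens_per_usable_shot)
    low, high = usable_seconds_range
    return {
        "generations": generations,
        "generated_seconds": generations * clip_seconds,
        "candidate_seconds_low": generations * low,
        "candidate_seconds_high": generations * high,
        # Separate statistic: share of generated clips that reach the final cut.
        "expected_final_clips": generations * final_cut_rate,
    }


budget = estimate_budget(usable_shots=10)
for key, value in budget.items():
    print(f"{key}: {value}")
```

For a 10-shot sequence this works out to 30 generations and 450 generated seconds, of which roughly 120 to 210 seconds are candidates for the edit.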

Can you keep two AI characters consistent without LoRA fine-tuning?

Yes. In one documented short, the locked character sheet workflow held two characters consistent across 70 seconds with no LoRA. LoRA fine-tuning with 20 to 30 references per character is the higher-effort ceiling for long-form or hero characters that need maximum fidelity.

Film Two-Character Scenes with Consistent Faces in AI Video

How do you prevent face-merging when two AI characters share the same frame?

Use a model that accepts multiple references, like Seedance 2.0 or Kling, and pass both locked character sheets plus a costume reference into every two-character prompt. Avoid plain text-to-video for shared-frame shots, as that is where identity contamination happens most.

Lock each character separately before they ever share a frame: build a multi-angle character sheet per person (front, 3/4, profile, close-up, plus head-to-toe in costume), generate 4 options and lock one, then feed BOTH locked sheets plus a costume reference into every two-character generation. Generate in 15-second chunks, approve shot by shot, and stitch the best seconds across takes.
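Stitching the best seconds across takes amounts to building a small cut list. The sketch below assumes a reviewer has already marked which seconds of each take are clean (both faces and both costumes correct). The `Segment` type, the take labels, and `build_cut_list` are hypothetical names for illustration, not part of any tool mentioned here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    take: str     # hypothetical take label, e.g. "shot04_gen1"
    start: float  # seconds into the generated clip
    end: float

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError("segment must have positive duration")

    @property
    def duration(self) -> float:
        return self.end - self.start


def build_cut_list(approved, target_seconds):
    """Take approved segments in story order until the shot length is covered.

    The last segment is trimmed so the total never exceeds target_seconds.
    """
    cut, total = [], 0.0
    for seg in approved:
        remaining = target_seconds - total
        if remaining <= 0:
            break
        if seg.duration > remaining:
            seg = Segment(seg.take, seg.start, seg.start + remaining)
        cut.append(seg)
        total += seg.duration
    return cut, total


approved = [
    Segment("shot04_gen1", 2.0, 6.0),   # clean seconds from the first take
    Segment("shot04_gen3", 8.0, 13.0),  # clean seconds from the third take
]
cut, total = build_cut_list(approved, target_seconds=7.0)
for seg in cut:
    print(seg.take, seg.start, seg.end)
```

Here a 7-second shot is assembled from four seconds of one generation and three seconds of another, which is the composite pattern described above.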

Start with the four pre-production answers the invideo agent requires before any pixel is generated: who Character A is, who Character B is, what each is wearing or holding, and your delivery format. The invideo agent is an agentic video tool that holds current video and image models (Recraft, Nano Banana / Nano Banana Pro, Seedance 2.0, Veo, Kling, Runway) behind a single context, so the same locked character data flows from image generation into video generation without you reattaching it every time.

1. Build a separate reference pack per character. Generate headshots in Recraft, which renders realistic skin texture (pores, lines, stubble) and stops faces from looking like the same default AI mannequin. Then build a 4-angle, head-to-toe character sheet per character in Nano Banana Pro at 4K (front, 3/4, profile, back, plus a face close-up). One documented two-character short produced 11 total reference images across its full cast (headshots and head-to-toe refs for every character and key prop) before generating a single video frame. Remove props from hands during turnaround generation; prop inconsistency across angles is a common break.

2. Generate 4 options per asset, pick one, then lock it. For each character sheet and each costume, generate four variations and choose the strongest before any video work begins. This single step is what prevents drift across the rest of the film: the same documented short used this 4-options-then-lock pass and held two characters consistent across 70 seconds with no LoRA. For long-form or hero characters where you need maximum fidelity across dozens of scenes, LoRA fine-tuning (20–30 refs per character) is the higher-effort ceiling, but the locked-sheet workflow gets you there for most short-form work.

3. For shared-frame shots, route to a model that accepts multiple references. This is where model choice matters and where the invideo agent does the routing for you. Seedance 2.0 reference-to-video accepts character references plus location references in one call, which is what you want for two people in one frame — it carries both identities into the generated clip rather than re-imagining them. Kling's multi-reference inputs (up to four reference images) work for the same purpose. Pass both locked character sheets, the costume reference, and a location plate into the same prompt. Avoid plain text-to-video for two-character shots — that's where face-merging and identity contamination (Character A's features bleeding into Character B) happen most.

4. Generate in 15-second chunks with shot-by-shot approval. Use the invideo agent in Always Ask mode so every prompt and every attached reference is approved before credits spend. Attach both character sheets and the costume reference to EVERY two-character prompt — not once at project start, every time. The repetition is the consistency mechanism.

5. Plan to stitch, because most usable two-character shots are composites. One documented production averaged 3 generations per usable shot, and 17 of its final shots were stitched from 2+ generations. This is Frankenstein shot assembly: you take the strongest seconds of one generation and the strongest seconds of another and cut them into one shot. Each 15-second clip typically yields 4–7 candidate seconds; pick the ones where both faces and both costumes are correct and discard the rest. Plan overgeneration as a budget line, not a failure: about 25% of generated clips make a final cut.

6. Handle physical-contact shots as a special case. Hugs, fights, handshakes, carries — overlapping subjects accelerate identity drift faster than any other shot type. As one documented production noted, "multi-character consistency (ropes, props, bodies in contact) breaks models faster than anything else." When prompting alone fails, two fixes work: (a) generate the shot with a stronger reference — feed a hand-sketched or photographed mock of the configuration alongside the character sheets so the model sees the geometry, and (b) split the action into a wider lead-in shot and a tighter coverage shot rather than holding both faces sharp through the contact moment. A continuity error in the result doesn't mean re-rolling the whole shot — ask the invideo agent to inspect the character sheet, find the panel with the error (it identifies which specific panel), fix it there, and only the affected shots regenerate.

7. For dialogue between the two characters, shoot coverage. Generate the two-shot wide once, then ask for the opposite angle in the same conversational session (over-the-shoulder on A, then over-the-shoulder on B) so you get a matched coverage pair. Cutting between singles is also how you mask any residual drift — if Character B's collar drifts slightly between shots, a cut to Character A's coverage hides it.
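The rule from the steps above, that both character sheets and the costume reference are attached to every two-character prompt, can be checked mechanically before a generation is approved. The following is a minimal sketch under stated assumptions: `ShotRequest`, `check_shot`, the `sheet:<name>` and `costume` role keys, and the file paths are all hypothetical, and nothing here calls a real model or the invideo agent. The optional `max_references` argument stands in for a per-model limit, such as the four reference images mentioned for Kling.

```python
from dataclasses import dataclass, field


@dataclass
class ShotRequest:
    shot_id: str
    characters: list                                 # names of characters in frame
    references: dict = field(default_factory=dict)   # role -> file path


def check_shot(shot, max_references=None):
    """Return a list of problems; an empty list means the prompt is ready."""
    # Every character in frame needs a locked sheet, plus one costume reference.
    required = [f"sheet:{name}" for name in shot.characters] + ["costume"]
    problems = [f"missing {role}" for role in required
                if role not in shot.references]
    if max_references is not None and len(shot.references) > max_references:
        problems.append(
            f"{len(shot.references)} references exceeds limit of {max_references}"
        )
    return problems


shot = ShotRequest(
    shot_id="sc02_two_shot",
    characters=["A", "B"],
    references={"sheet:A": "refs/a_sheet.png", "costume": "refs/costume.png"},
)
print(check_shot(shot, max_references=4))  # Character B's sheet was not attached
```

Running the check per prompt, rather than once at project start, mirrors the point that repetition is the consistency mechanism.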

As Hridaye, invideo's creative director, puts it: "Seventy seconds. Two characters. The same person across every scene. No LoRA needed." The mechanism behind that is four practices from the steps above (locked sheets, multi-reference routing, chunked approval, and composite assembly) held in one agent context so you never re-explain who these two people are.


