What is the best pose for a character reference sheet used in AI video generation?
Last updated June 26, 2026
The best pose is a neutral, prop-free, front-facing full-body stance — T-pose for maximum limb clarity, A-pose for organic characters, or a relaxed neutral stand for stylized ones. Pair that with side and back views on the same sheet, and remove anything the character is holding before generating turnarounds.
Generate the sheet to these specs. Each item exists because AI image and video models can misread held objects as body parts and can read crossed or dynamic limbs as different anatomy from one angle to the next, which is what causes character drift once the character is in motion:
- Pose: neutral T-pose, A-pose, or relaxed stand — never dynamic or action.
- Framing: full body, character centered, shot at eye level.
- Arms: away from the torso so no limb overlaps the body silhouette.
- Hands: open and empty — no props, no weapons, no accessories held.
- Costume: fully visible head to toe, nothing occluded.
- Background: plain and neutral, no environment detail.
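The checklist above can be turned into a simple pre-flight check before a sheet is accepted. The following is a minimal Python sketch, not part of invideo or any image model's API: `SheetSpec` and `violations` are hypothetical names invented for illustration, and the boolean fields assume a person has already judged each property by eye.

```python
from __future__ import annotations

from dataclasses import dataclass

# Poses the checklist allows; anything else is treated as dynamic or action.
ALLOWED_POSES = {"t-pose", "a-pose", "relaxed stand"}


@dataclass
class SheetSpec:
    """Hypothetical record of how a candidate sheet scores against the checklist."""
    pose: str
    full_body_centered_eye_level: bool
    arms_clear_of_torso: bool
    hands_empty: bool
    costume_fully_visible: bool
    plain_background: bool


def violations(spec: SheetSpec) -> list[str]:
    """Return one message per checklist item the sheet fails."""
    problems = []
    if spec.pose.strip().lower() not in ALLOWED_POSES:
        problems.append(f"pose '{spec.pose}' is not neutral")
    checks = [
        (spec.full_body_centered_eye_level, "framing is not full body, centered, eye level"),
        (spec.arms_clear_of_torso, "a limb overlaps the body silhouette"),
        (spec.hands_empty, "hands are holding something"),
        (spec.costume_fully_visible, "part of the costume is occluded"),
        (spec.plain_background, "background has environment detail"),
    ]
    problems.extend(message for passed, message in checks if not passed)
    return problems
```

Calling `violations(SheetSpec("running", True, True, False, True, True))` would flag both the action pose and the occupied hands, while a fully compliant T-pose sheet returns an empty list.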
Choose your pose by character type. T-pose (arms straight out at shoulder height) is the least ambiguous: every limb is fully visible, nothing occludes the torso, and it's the cleanest reference for downstream rigging or motion. A-pose (arms angled roughly 45° downward) reads more naturally on organic, human characters and avoids the shoulder distortion T-pose can introduce on realistic anatomy. A relaxed neutral stand (arms loosely at the sides, weight even) is acceptable for stylized or chibi characters where a T-pose or A-pose looks unnatural, but only if the arms stay clear of the torso silhouette.
Build the sheet as a multi-angle grid, not a single image. The standard layout is front, 3/4, side, and back views of the same neutral pose, plus a face close-up and a mid-angle close-up so small details (scars, accessories, costume seams) survive the model's compression. In documented productions, four angles per character were generated at 4K through the invideo agent using Nano Banana (with Nano Banana Pro preferred where character fidelity matters most), and Recraft handled the photoreal face portrait, including the skin-level imperfections (pores, lines, stubble) that keep the face from looking plastic.
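To make the grid concrete, the six panels described above can be written out as one text brief each. This is a minimal sketch under stated assumptions: `panel_briefs` is a hypothetical helper, and the brief wording is illustrative only, not a prompt format required by the invideo agent or any image model.

```python
# The six panels named in the text: four turnaround angles plus two close-ups.
TURNAROUND_VIEWS = ["front", "3/4", "side", "back"]
CLOSE_UPS = ["face close-up", "mid-angle close-up"]


def panel_briefs(character: str, pose: str) -> list[str]:
    """Build one text brief per panel; the wording is illustrative only."""
    briefs = [
        f"{character}, {view} view, {pose}, full body, hands empty, plain background"
        for view in TURNAROUND_VIEWS
    ]
    # Close-ups drop the full-body framing but keep the plain background.
    briefs += [f"{character}, {shot}, plain background" for shot in CLOSE_UPS]
    return briefs
```

Keeping the pose string identical across the four turnaround briefs reflects the point above: every angle should show the same neutral pose, with only the view changing.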
Remove props before the turnaround, then sheet them separately. Anything in the character's hands — a weapon, a phone, a toy — gets stripped out of the neutral sheet and given its own reference. One documented production hand-sketched a complex physical arrangement between two characters, uploaded the drawing to the invideo agent, and had it routed into the image model to produce a fused sheet that text prompting alone couldn't visualize. Same logic: solve identity first, then layer the prop interaction as a separate brief.
Add a per-beat sheet when the character changes. If your character picks up a trinket, swaps a costume, or evolves across the film, generate a distinct neutral sheet for each beat. In one production the character accumulated a new trinket in every sequence, which required a fresh sheet per beat — without that, the model averages the variants and identity drifts. Action poses live on a separate deliverable from the neutral turnaround, never on the same sheet.
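One way to plan per-beat sheets is to list each beat with what the character visibly wears or carries, then generate a sheet only when a new combination first appears. The sketch below assumes that approach; `beats_needing_sheets`, the beat names, and the trinkets are all hypothetical examples, not details from the productions described.

```python
from __future__ import annotations


def beats_needing_sheets(beats: list[tuple[str, frozenset[str]]]) -> list[str]:
    """Return the first beat at which each distinct appearance shows up.

    Each beat is (name, appearance), where appearance is the set of visible
    costume pieces and trinkets. A beat that repeats an earlier appearance
    can reuse that earlier sheet.
    """
    needed = []
    seen = set()
    for name, appearance in beats:
        if appearance not in seen:
            seen.add(appearance)
            needed.append(name)
    return needed


beats = [
    ("opening", frozenset({"coat"})),
    ("market", frozenset({"coat", "brass key"})),
    ("chase", frozenset({"coat", "brass key"})),  # unchanged, reuses the market sheet
    ("finale", frozenset({"coat", "brass key", "feather"})),
]
print(beats_needing_sheets(beats))  # ['opening', 'market', 'finale']
```

Each name in the result marks a beat that gets its own neutral turnaround; action poses for those beats would still live on a separate deliverable.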
Lock the sheet before any video generation. Generate four options per sheet, pick the best, and store it in the invideo agent's context so every downstream shot inherits the same identity. Across documented productions, locking one character this way took roughly five generation attempts at about $9.78 per character; eleven total reference images covered four characters and one prop on a 3-minute episode. Once locked, the invideo agent routes that sheet into the right video model per shot — Seedance 2.0 for reference-to-video continuity, Kling or Veo where their strengths fit — without you switching tools. If a continuity error shows up later, ask the invideo agent to inspect the sheet rather than re-rolling the shot: in one documented case it identified the exact panel containing the error, fixed it at the source, and every subsequent shot inherited the correction.
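For rough budgeting, the figures above can be scaled to a cast. The sketch below rests on assumptions the text does not confirm: that the quoted $9.78 covers all of the roughly five attempts for one character, that cost scales linearly with cast size, and that props are excluded because no prop cost is given. `estimate_character_budget` is a hypothetical helper.

```python
from decimal import Decimal

COST_PER_LOCKED_CHARACTER = Decimal("9.78")  # approximate figure quoted in the text
ATTEMPTS_PER_CHARACTER = 5                   # "roughly five" in the text


def estimate_character_budget(num_characters: int) -> Decimal:
    """Scale the quoted per-character cost linearly; props are not included."""
    return COST_PER_LOCKED_CHARACTER * num_characters


print(estimate_character_budget(4))                        # 39.12
print(COST_PER_LOCKED_CHARACTER / ATTEMPTS_PER_CHARACTER)  # 1.956
```

Under those assumptions, a four-character cast comes to about $39 for locked sheets, or a little under $2 per generation attempt. Treat both numbers as order-of-magnitude estimates.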
As Hridaye, invideo's creative director, puts it: "the AI always needs to see what the character is exactly, right? Or else it'll kind of hallucinate and imagine something that's under the cap. So, we don't want to do that. We always want the character to be seen as we see it on the character sheet." That's the whole reason the pose has to be neutral and the hands have to be empty — the sheet is the model's only ground truth for who this character is.