What is the mock-shot reference technique and how do you use it in AI video generation?

Last updated June 26, 2026

The mock-shot reference technique means physically acting out and filming a difficult shot on your phone (a POV walk, a tricky camera move, a specific piece of blocking), then uploading that clip as a reference video alongside your prompt, so the AI model anchors on real motion and framing instead of guessing from text alone.

Use it whenever prompting alone keeps missing the shot — POVs, unusual angles, multi-character physical contact, or camera moves the model won't hold. Text prompts describe; a mock clip shows the model the motion arc, framing, eye-line, and pacing in one input. The invideo agent is an agentic video creation tool that lets you upload reference video alongside your prompt and routes the generation to the model that handles reference-to-video best (Seedance 2.0 carries the most context from an uploaded clip; Kling and Veo handle different motion characteristics).

The four-step workflow:

1. Stage and film the mock on your phone. Act out the shot in the office or wherever you are — walk the POV, swing the camera, block the action. Frame it roughly in your film's aspect ratio and hold it for the full duration you need generated. Quality doesn't matter; motion and framing do.

2. Strip it down to what you want preserved. Decide what the reference is carrying — the motion arc, the camera angle, the eye-line, the pacing — and what you want the model to replace — the subject, costume, setting, lighting. Be explicit about both when you upload.

3. Upload to the invideo agent with character and world references attached. Hand the mock clip to the agent together with your locked character sheets and location plates, and direct it conversationally: "hold this camera move and pacing, swap the person for our vampire character, relight in the world's palette." The agent feeds the clip into reference-to-video generation with full project context — character identity, lighting, palette — already loaded.

4. Iterate and select. Expect about three generations per usable shot on average. Across one documented production, 17 of the final shots ended up stitched from two or more generations of the same prompt — the mock anchors the model, but you still pick the strongest seconds across attempts.
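
Steps 2 through 4 can be sketched as a small planning helper. This is only an illustration under stated assumptions: `MockShotBrief`, `direction`, `generation_budget`, and the file name are hypothetical names invented here, not part of the invideo agent or any model's API. The sketch only assembles the plain-language direction you would type and applies the rough three-generations-per-shot figure from step 4.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MockShotBrief:
    """Hypothetical note of what the mock clip carries and what gets replaced."""

    clip_path: str
    preserve: list[str]                      # e.g. camera move, pacing, eye-line
    replace: dict[str, str] = field(default_factory=dict)  # old element -> new element

    def direction(self) -> str:
        """Build a conversational direction that names both sides explicitly."""
        if not self.preserve:
            raise ValueError("name at least one thing the reference should carry")
        keep = "hold this " + " and ".join(self.preserve)
        swaps = [f"swap the {old} for {new}" for old, new in self.replace.items()]
        return ", ".join([keep, *swaps])


def generation_budget(shots: int, per_usable_shot: float = 3.0) -> int:
    """Rough count of generations to plan for, rounded up.

    The default of 3.0 is the article's 'about three per usable shot' average,
    not a guarantee for any single shot.
    """
    return math.ceil(shots * per_usable_shot)


brief = MockShotBrief(
    clip_path="mock_pov_walk.mp4",  # hypothetical file name
    preserve=["camera move", "pacing"],
    replace={"person": "our vampire character"},
)
print(brief.direction())
print(generation_budget(shots=12))
```

Writing the preserve and replace lists down before uploading is the point of the sketch: if either list is empty or fuzzy, the direction you give the agent will be too.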

A documented production used exactly this to crack a POV shot that text prompts kept botching: the team filmed a quick mock on a phone in the office, uploaded it as reference video, and the model finally held the POV. As Hridaye, invideo's creative director, puts it: "The lesson for the day truly is that when the models get stuck you draw, you shoot, you bring your hands in and you get it done. And that's when agent one meets you there and takes it over the line."

Where it's strongest: POV shots, complex camera movement (orbits, top-downs, push-ins through tight spaces), multi-character contact shots where bodies, ropes, or props touch, and any blocking the model keeps mangling from text. Where you don't need it: simple coverage and static framings — those resolve from prompt + character sheet alone.
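
The triage above can be written as a quick checklist when planning a shot list. This is a minimal sketch: the trait labels and the `needs_mock` function are hypothetical names chosen for illustration, and the rule simply mirrors the categories listed in the previous paragraph.

```python
# Shot traits the article lists as strong candidates for a mock reference.
MOCK_WORTHY = {
    "pov",
    "complex_camera_move",        # orbits, top-downs, push-ins through tight spaces
    "multi_character_contact",    # bodies, ropes, or props touching
    "blocking_fails_from_text",   # blocking the model keeps mangling
}

# Traits the article says resolve from prompt and character sheet alone.
PROMPT_ONLY = {"simple_coverage", "static_framing"}


def needs_mock(traits: set[str]) -> bool:
    """Return True if any trait of the shot falls in the mock-worthy group."""
    return bool(traits & MOCK_WORTHY)


shot_list = {
    "hallway POV walk": {"pov"},
    "wide establishing": {"static_framing"},
    "rope struggle": {"multi_character_contact", "complex_camera_move"},
}
for name, traits in shot_list.items():
    print(name, "->", "film a mock" if needs_mock(traits) else "prompt only")
```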

Watch some of these to see what works for you:

Watch the mock-shot method crack a POV shot text prompts couldn't fix
See how the invideo agent solves shots video models refuse to generate

