Reference-to-Video vs start/end frame chaining — which produces better AI video continuity across multiple clips?

Last updated June 26, 2026

Reference-to-Video produces better continuity. Start/end frame chaining gives the model only two still images as context, so camera trajectory, lighting, and spatial logic drift between clips because nothing carries them across. Reference-to-Video ingests the entire previous clip plus character and location references in the same generation pass, so movement, atmosphere, and identity hold across segment boundaries.

The reason start/end frame chaining drifts is structural: when you export the last frame of a clip and re-upload it as the start frame of the next, the model knows nothing about the shot except that single still. As one documented production put it, "it had no context of anything apart from the frame that you are uploading" — each chained clip re-derives camera movement, lighting direction, and location logic from scratch. That is why drift in frame-chained sequences shows up not just in character faces but in camera speed, atmosphere, and spatial geography, and why the standard fix in frame-chaining workflows is to keep re-anchoring to an original reference sheet.

Reference-to-Video changes what the model actually sees. You upload the full prior clip, not a single frame, and Seedance 2.0 reads context from the end of that video to continue the next shot, so camera movement and stitching carry across the cut. Just as important, it accepts character references and location references in the same generation pass. Neither start/end frame chaining nor a plain extend operation can do that: extend takes no character or location references at all. That simultaneous multi-reference input (prior clip, character sheet, and location plate together) is the discriminating factor in this comparison.

invideo is an agentic video creation tool with all the current video models available, and in practice you run this as a chained loop there. Clip the usable end of each segment and re-upload it to the invideo agent, which attaches it to Seedance 2.0 Reference-to-Video with your locked references to continue the take. A documented 3-person production used exactly this loop to complete a multi-city continuous shot in a 2.5-hour window.
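
For planning purposes, the chained loop described above can be sketched as a list of per-segment inputs. This is a minimal illustration only: `GenerationRequest`, `plan_chain`, and the file names are hypothetical and do not correspond to any invideo or Seedance API. The sketch just makes explicit that every pass carries the prior clip together with the same character and location references.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GenerationRequest:
    """Hypothetical bundle of inputs for one Reference-to-Video pass (not a real API)."""
    prompt: str
    prior_clip: Optional[str]  # usable tail of the previous segment, None for the first
    character_refs: List[str] = field(default_factory=list)
    location_refs: List[str] = field(default_factory=list)


def plan_chain(prompts, character_refs, location_refs):
    """Build one request per segment; each continues from the clip before it."""
    requests = []
    prior_clip = None  # the first segment has no previous clip to continue from
    for index, prompt in enumerate(prompts):
        requests.append(
            GenerationRequest(
                prompt=prompt,
                prior_clip=prior_clip,
                # The same locked references are attached to every pass.
                character_refs=list(character_refs),
                location_refs=list(location_refs),
            )
        )
        # Assumed naming for the trimmed tail exported from this segment.
        prior_clip = f"segment_{index:02d}_tail.mp4"
    return requests
```

Calling `plan_chain(["shot one", "shot two"], ["character_sheet.png"], ["location_plate.png"])` would give a first request with no prior clip and a second request that points at the first segment's trimmed tail, with both requests carrying the same two references.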

Reference-to-Video chaining is not lossless either: small deviations can still accumulate over many chained segments. Control drift by re-anchoring every segment to the same locked references rather than relying on the prior clip alone. Attach the character sheet and location reference to each pass. If your character's appearance evolves mid-take (costume changes, accumulating props), generate a separate character sheet for each beat of the sequence; one production needed a distinct sheet per city because the character picked up a new trinket in each one.
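
One way to keep per-beat references straight is to resolve them from a lookup before generating anything. The sketch below is an assumption about how you might organize this yourself, not a feature of any tool: `assign_references`, the beat labels, and the file names are all hypothetical. It fails early if any segment's beat has no locked character sheet or location reference.

```python
from typing import Dict, List, Tuple


def assign_references(
    segment_beats: List[str],
    character_sheets: Dict[str, str],
    location_refs: Dict[str, str],
) -> List[Tuple[str, str]]:
    """Return the (character sheet, location reference) pair for each segment."""
    # Collect every beat that lacks a locked reference before doing any work.
    missing = sorted(
        {
            beat
            for beat in segment_beats
            if beat not in character_sheets or beat not in location_refs
        }
    )
    if missing:
        raise KeyError(f"no locked references for beats: {missing}")
    # Segments in the same beat share one sheet; a new beat switches sheets.
    return [(character_sheets[beat], location_refs[beat]) for beat in segment_beats]
```

With beats labeled per city, two consecutive segments in the first city would resolve to the same sheet, and the first segment in the second city would switch to that city's sheet.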

Start/end frame chaining keeps one legitimate use case: a single clip where you know the exact opening and closing composition and nothing needs to carry beyond those two frames. For continuity across multiple clips, Reference-to-Video is the stronger architecture — and since Seedance 2.0 runs inside invideo alongside the other current models, the invideo agent handles the routing without you switching platforms.

See Reference-to-Video vs. legacy extend, live in production

Because you're uploading the entire video, Seedance seemingly takes some more context from the end of that video to continue the next shot. So even in terms of camera movement, stitching and things like that, it just feels way more seamless compared to the older way of doing the one-take with AI.

— invideo's creative team