What is the true cost per usable AI video clip when you account for failed generations?
Last updated June 26, 2026
The true cost per usable AI video clip is roughly 3–4x the per-generation sticker price. Documented productions average 3 generations per usable shot, and editorial yield runs about 25% — on one 3-minute animated episode that put the all-in cost near $23 per final-cut clip versus under $6 per raw generation.
Per-generation pricing tells you what a clip costs to attempt; cost per usable clip is what it costs to get one into your edit, and the gap between the two is the failure multiplier. Documented production data puts that multiplier at 3–4x.
The yield math. One animated episode tracked this end to end: "Out of 164, 41 videos made the cut, and on average only 5 seconds of each 15-second clip was used. That's how 41 clips became a 3-minute episode," per invideo's creative team. At a $950 all-in spend, that works out to under $6 per raw generation. A ~90-second horror short shows the same pattern at higher volume: roughly 400 video generations plus 30 image generations for $870 — about $2 per attempt — to finish 90 seconds. Even reference assets carry the multiplier: locking one character's visual identity took ~5 generation attempts, about $9.78 per character.
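The arithmetic behind the episode figures can be sketched in a few lines. This is a minimal illustration, not a tool from the production: the function name `yield_stats` and its parameters are hypothetical, and the only inputs are the numbers quoted above ($950 all-in, 164 generations, 41 clips in the final cut).

```python
def yield_stats(total_spend, raw_generations, usable_clips):
    """Return per-attempt cost, per-usable-clip cost, yield rate, and failure multiplier."""
    return {
        # What one generation costs to attempt.
        "cost_per_attempt": total_spend / raw_generations,
        # What it costs to get one clip into the edit.
        "cost_per_usable": total_spend / usable_clips,
        # Share of generations that made the cut.
        "yield_rate": usable_clips / raw_generations,
        # Generations spent per usable clip (the failure multiplier).
        "multiplier": raw_generations / usable_clips,
    }


# Figures from the 3-minute animated episode described above.
episode = yield_stats(total_spend=950, raw_generations=164, usable_clips=41)

print(f"per attempt: ${episode['cost_per_attempt']:.2f}")     # per attempt: $5.79
print(f"per usable clip: ${episode['cost_per_usable']:.2f}")  # per usable clip: $23.17
print(f"yield: {episode['yield_rate']:.0%}, multiplier: {episode['multiplier']:.0f}x")
```

For this episode the multiplier comes out at the top of the 3–4x range, because exactly one in four generations reached the final cut.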
Cost per finished minute, across documented productions:
| Production | Finished length | All-in cost | Cost per finished minute |
|---|---|---|---|
| 3-minute animated episode (2-person team) | 3:00 | $950 | ~$317 |
| ~90-second horror short | 1:30 | $870 | ~$580 |
| 70-second short film | 1:10 | $750 | ~$643 |
| 2-minute brand promo | 2:00 | $1,500 | $750 |
The spread is roughly $317–$750 per finished minute, with totals from $750 to $5,000 across five documented productions (the top end a 4-person short with multiple locations and VFX on 20,000 credits). The spread is expected: team size, style complexity, and shot types all move the multiplier.
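The per-minute column is simply all-in cost divided by finished runtime in minutes. The sketch below recomputes it from the table's own inputs. The helper name `cost_per_finished_minute` is hypothetical, and the runtimes are the nominal lengths listed, so a result can differ by a few dollars from a figure based on an exact runtime.

```python
# (production, finished length in seconds, all-in cost in USD), taken from the table above.
productions = [
    ("3-minute animated episode", 180, 950),
    ("~90-second horror short", 90, 870),
    ("70-second short film", 70, 750),
    ("2-minute brand promo", 120, 1500),
]


def cost_per_finished_minute(all_in_cost, finished_seconds):
    """All-in cost divided by finished runtime expressed in minutes."""
    return all_in_cost / (finished_seconds / 60)


rates = {
    name: cost_per_finished_minute(cost, seconds)
    for name, seconds, cost in productions
}

for name, rate in rates.items():
    print(f"{name}: ${rate:,.0f} per finished minute")
```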
Failed generations are partially recoverable. More than 40% of that animated episode's final shots — 17 shots — were Frankenstein shots: the strongest seconds from 2 or more generations of the same prompt stitched into one composite. Each 15-second generation typically contains 4–7 usable shot candidates, so before re-rolling, mine the takes you already paid for — it raises effective yield without spending new credits.
Lowering the multiplier before you generate. invideo is an agentic video creation tool with all the current video models — Veo, Kling, Seedance 2.0 — available, and the invideo agent routes each shot to the right model so a model mismatch doesn't become a failed generation. Run it in Always Ask mode to approve every prompt and attached reference before credits are spent. Lock character sheets and environment references before any video generation — that single step prevents the consistency failures behind most discarded clips. Let the invideo agent flag model limits up front: in one production it caught that an 18-cuts-in-15-seconds scene exceeded what the model could deliver and recommended splitting it before any credits were burned. And when a good clip has one continuity error, trace and fix the source character sheet and regenerate only what's needed instead of re-rolling the whole shot.
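Budgeting rule. Plan raw generation volume at 3–4x the number of shots in your final cut, and treat the overage as a deliberate budget line, not waste. The yield rates above are the norm with current models, not a failure of technique. Measured in footage the ratio is higher still: the animated episode's 164 fifteen-second generations add up to 41 minutes of raw video for a 3-minute cut.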
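One way to turn the rule into a budget line is sketched below. It assumes the multiplier is counted per final-cut shot and that you already know a rough cost per attempt. The function `plan_generations` and its parameter names are hypothetical, and the example inputs (41 shots at about $5.79 per attempt) are borrowed from the animated episode above purely for illustration.

```python
import math


def plan_generations(final_shots, cost_per_attempt,
                     multiplier_low=3.0, multiplier_high=4.0):
    """Estimate a low/high range of raw generations and spend for a target shot count."""
    low = math.ceil(final_shots * multiplier_low)
    high = math.ceil(final_shots * multiplier_high)
    return {
        "generations": (low, high),
        "budget": (low * cost_per_attempt, high * cost_per_attempt),
    }


# Illustrative inputs: 41 final shots at roughly $5.79 per attempt.
plan = plan_generations(final_shots=41, cost_per_attempt=5.79)

low_gens, high_gens = plan["generations"]
low_cost, high_cost = plan["budget"]
print(f"plan for {low_gens}-{high_gens} generations, about ${low_cost:,.0f}-${high_cost:,.0f}")
```

The upper bound lands close to the episode's actual $950 spend, which is what you would expect given its 25% yield. Mining existing takes for composite shots, as described above, is what moves a production toward the lower bound.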