Make an AI animated short film in six steps: pick a dialogue-light story and test animation styles, lock the style with a batch of reference frames, load your full script, lock character sheets before any video, generate in short approved chunks, then edit for yield. One documented 2-person team finished a 3-minute animated episode in 2 days for ~$950.
Start by choosing a story that plays to the strengths of current video models: short, dialogue-light, with a clear visual arc. Multi-character physical contact (ropes, props, bodies touching) breaks models faster than almost anything else, so minimize those beats at the script stage. Before committing to a look, generate the same two or three script frames in each candidate animation style and compare them side by side (one production tested Ghibli against 3D this way), then decide from the actual frames, not a hunch. invideo is an agentic video creation tool with all the current image and video models available, so every step below runs in one place.
Step 1 — Lock the visual style in one batch. Upload a large batch of reference frames from your target aesthetic in a single message and tell the invideo agent to save it to context — one production uploaded 64 frames from its reference show before generating anything. Make the style block explicit about what the output must NOT be: one team's block specified hand-painted brushstroke texture and prohibited live-action and photorealistic results to stop drift. Every generation prompt afterward starts with that style block.
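As a minimal sketch of "every prompt starts with the style block", the snippet below builds prompts from one shared constant. The names STYLE_BLOCK and build_prompt are hypothetical, the style wording only paraphrases the constraints described above, and nothing here calls an invideo API; the resulting string is what you would paste or send to the agent.

```python
# Hypothetical style block; the wording paraphrases the constraints described
# in the text (hand-painted texture, no live-action, no photorealism).
STYLE_BLOCK = (
    "Style: hand-painted animation with visible brushstroke texture. "
    "Must NOT be live-action. Must NOT be photorealistic."
)


def build_prompt(shot_description: str, style_block: str = STYLE_BLOCK) -> str:
    """Return a generation prompt that always leads with the style block."""
    shot = shot_description.strip()
    if not shot:
        raise ValueError("shot_description must not be empty")
    return f"{style_block}\n\nShot: {shot}"


# Example shot text is invented for illustration.
prompt = build_prompt("Wide shot of a lighthouse at dusk, waves below.")
print(prompt)
```

Keeping the block in one constant means a later style tweak changes every prompt at once instead of drifting between hand-edited copies.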
Step 2 — Load the full script. Give the invideo agent the complete screenplay before any generation so it holds characters, arc, and themes for every downstream task. On longer projects, work act by act — complete storyboarding, generation, and editing for one act before starting the next — so the context never degrades; a 7-minute animated short was built in 25% increments this way.
Step 3 — Lock characters and props before any video. Generate multi-angle character sheets (front, side, back, plus face close-ups — close-up panels are what keep small details like scars and accessories consistent), produce several options per asset, and pick one before moving on. One animated production locked 4 characters and 1 prop with just 11 reference images; locking a character's identity took about 5 generations at roughly $9.78 per character. Nano Banana handles character sheets well, while Recraft or GPT-Image-2 work for portrait-level reference images — all available inside invideo. A 70-second short kept 2 characters consistent across every scene using only character sheets and the invideo agent's context — no LoRA fine-tuning required.
Step 4 — Generate the film in short chunks with approval on. Have the invideo agent build a shot list from the script, then generate clip by clip in 15-second segments in your film's aspect ratio, attaching the character sheets and style block to every prompt. Run the invideo agent in Always Ask mode so you approve each prompt and its references before credits are spent. On model choice: Seedance 2.0 reference-to-video carries character and location references across clips, and Kling 3.0 generates multi-shot sequences natively — the invideo agent routes each shot to the right model, so you never have to pick a platform per model.
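One way to picture the chunking in this step is a small planner that splits a scene's runtime into segments of at most 15 seconds and attaches the same references to each one. This is only a sketch: ClipRequest, plan_clips, and the file names are hypothetical, and the output is a plan to review by hand, not a call into invideo.

```python
from dataclasses import dataclass, field
from typing import List

MAX_CLIP_SECONDS = 15  # segment length named in the text


@dataclass
class ClipRequest:
    """One generation request: a time slice of a scene plus its references."""
    shot_id: str
    start: float
    duration: float
    references: List[str] = field(default_factory=list)


def plan_clips(shot_id: str, total_seconds: float, references: List[str],
               max_len: float = MAX_CLIP_SECONDS) -> List[ClipRequest]:
    """Split a scene into consecutive segments no longer than max_len."""
    if total_seconds <= 0 or max_len <= 0:
        raise ValueError("total_seconds and max_len must be positive")
    clips = []
    start = 0.0
    while start < total_seconds:
        duration = min(max_len, total_seconds - start)
        # Copy the list so each request carries its own references.
        clips.append(ClipRequest(shot_id, start, duration, list(references)))
        start += duration
    return clips


# Hypothetical reference file names for illustration.
refs = ["hero_sheet.png", "style_block.txt"]
for clip in plan_clips("act1_scene3", 40, refs):
    print(clip.shot_id, clip.start, clip.duration, clip.references)
```

A 40-second scene comes out as segments of 15, 15, and 10 seconds, each carrying the character sheet and style block, which mirrors the approve-one-clip-at-a-time workflow described above.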
Step 5 — Edit for yield, not for single perfect takes. Budget for overgeneration: one animated episode generated 164 clips and used 41 — a ~25% selection rate, averaging 3 generations per usable shot and only 5 seconds kept from each 15-second clip. Use Frankenstein shot assembly — stitching the best seconds from two or more generations of the same prompt into one shot — as a standard tool, not a fallback; 17 of that episode's final shots were composited this way. Assemble the cut in your editor (Premiere Pro or DaVinci Resolve both work), then upload the rough cut back to the invideo agent with an open "what's working, what's not" prompt — it catches pacing and register problems, and skipping this review is the most common mistake in AI-directed filmmaking.
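The overgeneration budget is simple arithmetic, sketched below with the episode's own counts (164 generated, 41 used). The function names are hypothetical, and the assumption that a new project will see the same selection rate is exactly that, an assumption. Note that 164 / 41 works out to 4 generated clips per clip used, which is a per-clip count and not necessarily the same thing as the per-shot average quoted above.

```python
import math


def selection_rate(generated: int, used: int) -> float:
    """Fraction of generated clips that end up in the final cut."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    return used / generated


def clips_to_generate(clips_needed: int, rate: float) -> int:
    """Clips to budget for, assuming the given selection rate holds."""
    if not 0 < rate <= 1:
        raise ValueError("rate must be in (0, 1]")
    return math.ceil(clips_needed / rate)


rate = selection_rate(generated=164, used=41)
print(f"selection rate: {rate:.0%}")        # 25%
print(clips_to_generate(41, rate))           # 164, the documented episode
print(clips_to_generate(20, rate))           # 80 for a shorter piece
```

Multiplying the planned clip count by the per-clip generation cost of your chosen model then gives a rough credit budget before any generation starts.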
Cost and timeline to plan around. Documented productions ran $750–$5,000 and 2–5 days depending on team size and ambition, or $315–$750 per finished minute across the productions with known length and cost. A 2-person animated episode landed near the low end with no pre-production at all, while a 4-person multi-location short with VFX spent ~$5,000 over a 5-day sprint. The spread tracks scope: more locations, more characters, and more iterations push the number up.
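The per-minute figure can be checked against the 3-minute, ~$950 episode mentioned at the top, and the same range gives a rough bracket for a new project. This is a sketch under stated assumptions: the function names are hypothetical, and treating $315–$750 per finished minute as a planning range for your own film assumes it resembles the documented productions.

```python
from typing import Tuple

LOW_PER_MINUTE = 315   # USD per finished minute, low end quoted in the text
HIGH_PER_MINUTE = 750  # USD per finished minute, high end quoted in the text


def cost_per_finished_minute(total_cost_usd: float, runtime_minutes: float) -> float:
    """Total spend divided by final runtime."""
    if runtime_minutes <= 0:
        raise ValueError("runtime_minutes must be positive")
    return total_cost_usd / runtime_minutes


def budget_range(runtime_minutes: float) -> Tuple[float, float]:
    """Rough (low, high) budget bracket in USD for a planned runtime."""
    if runtime_minutes <= 0:
        raise ValueError("runtime_minutes must be positive")
    return (runtime_minutes * LOW_PER_MINUTE, runtime_minutes * HIGH_PER_MINUTE)


episode = cost_per_finished_minute(950, 3)
print(f"${episode:.2f} per finished minute")   # $316.67, near the $315 low end
print(budget_range(2))                          # (630, 1500) for a 2-minute short
```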
Exact style-ingestion prompt used in production by invideo's creative team:
"I want you to deeply understand this art style and save it into context for further generations. All of these attached images are the art style that I want for this entire project."