How do you write AI video prompts differently for Runway, Kling, Sora, and Veo?
Last updated June 26, 2026
Each model rewards a different prompt shape: Runway Gen-4 wants short, motion-first prose describing how things behave; Kling wants the four-part formula Subject + Action + Scene + Style (camera, lighting, atmosphere); Veo wants natural cinematic language with structured JSON-style fields; Seedance 2.0 wants long, causally chained narrative with explicit reference tags.
invideo is an agentic video creation tool with every current model — Runway, Kling, Veo, Seedance 2.0 — available in one workspace, so you write to the model that fits the shot instead of switching platforms. Here is what each one actually rewards.
Runway Gen-4 — write behavior, not appearance. Keep prompts tight, often 30 precise words instead of 100 vague ones. Lead with the camera move and the motion verb, then the subject's action — "slow dolly-in as she turns her head toward the window, breath fogging the glass." In image-to-video mode, never re-describe what is already in the image; the model already sees it. Only describe what moves and how. Long descriptive paragraphs about clothes, faces and backgrounds dilute Runway's motion model.
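The motion-first ordering and the word budget above can be sketched as a small helper. This is a minimal illustration, not part of any Runway API: the function name, the "as" joiner, and the hard 30-word limit are assumptions drawn from the rule of thumb in the paragraph above.

```python
def runway_prompt(camera_move: str, action: str, max_words: int = 30) -> str:
    """Put the camera move first, then the subject's action, and enforce a word budget."""
    # Hypothetical convention: camera move leads, joined to the action with "as".
    prompt = f"{camera_move} as {action}"
    word_count = len(prompt.split())
    if word_count > max_words:
        raise ValueError(f"{word_count} words; trim to {max_words} or fewer")
    return prompt


prompt = runway_prompt(
    camera_move="slow dolly-in",
    action="she turns her head toward the window, breath fogging the glass",
)
print(prompt)
```

The budget is a guardrail against drifting into appearance description, not a limit the model itself imposes; adjust `max_words` to taste.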
Kling — the four-element formula. Build every prompt as Subject + Action + Scene + Style (camera language, lighting, atmosphere). Short, concrete action sentences land better than mood writing: "A woman in a red trench coat (subject) walks quickly through neon rain (action), narrow Tokyo alley at night (scene), low handheld tracking shot, sodium-vapor key light, wet-pavement reflections, moody (style)." Always include the camera spec — omitting it is one of Kling's most common failure modes, alongside overloaded prompts, open-ended motion (which causes the generation to hang near completion), and vague spatial language. Kling 3.0 also accepts multi-shot prompts — label up to six shots in one prompt to storyboard a sequence in a single generation.
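One way to keep every Kling prompt in the four-element shape is to build it from named parts and refuse to render when one is missing. The sketch below is illustrative only: `KlingShot` and `to_prompt` are hypothetical names, and the comma-joined output is an assumed layout rather than a format Kling requires.

```python
from dataclasses import dataclass, fields


@dataclass
class KlingShot:
    subject: str
    action: str
    scene: str
    style: str  # camera language, lighting, atmosphere

    def to_prompt(self) -> str:
        # Reject empty elements so the camera/style spec cannot be silently omitted.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"missing element: {f.name}")
        return f"{self.subject} {self.action}, {self.scene}, {self.style}."


shot = KlingShot(
    subject="A woman in a red trench coat",
    action="walks quickly through neon rain",
    scene="narrow Tokyo alley at night",
    style="low handheld tracking shot, sodium-vapor key light, "
          "wet-pavement reflections, moody",
)
print(shot.to_prompt())
```

Failing loudly on an empty `style` guards against the missing-camera-spec failure described above.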
Veo — natural cinematic prose with structured fields. Veo reads ordinary cinematic English well, but it responds best when key parameters are pulled out as named fields. Write the scene as prose, then append a structured block: camera: 35mm, slow push-in. motion: subject walks frame-left to frame-right. lighting: golden hour backlight, soft fill. style: anamorphic, shallow depth of field, film grain. Treating camera, motion, lighting and style as discrete parameters makes Veo's output far more predictable than burying them in prose.
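The prose-plus-fields pattern can be sketched as a helper that appends named parameters to a scene description. This is a minimal sketch under assumptions: `veo_prompt` is a hypothetical name, the `name: value.` layout mirrors the example above rather than a documented Veo schema, and the prose sentence is invented for illustration.

```python
def veo_prompt(prose: str, **fields: str) -> str:
    """Append 'name: value.' fields to a prose scene description."""
    # Keyword arguments keep their call order, so fields render as written.
    block = " ".join(
        f"{name}: {value.strip().rstrip('.')}." for name, value in fields.items()
    )
    return f"{prose.strip()} {block}".strip()


prompt = veo_prompt(
    "A courier crosses an empty plaza at sunset.",  # illustrative prose only
    camera="35mm, slow push-in",
    motion="subject walks frame-left to frame-right",
    lighting="golden hour backlight, soft fill",
    style="anamorphic, shallow depth of field, film grain",
)
print(prompt)
```

Keeping the four parameters as separate arguments makes it easy to change one (say, the lighting) between takes without touching the prose.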
Seedance 2.0: long, causal, reference-tagged. Seedance 2.0 rewards the opposite of Runway: longer is usually better, and it handles cause-and-effect chains that other models drop. Describe the world's logic, not just visual states: "the candle gutters because the door swings open; the shadow on the wall stretches as she steps forward." Its real edge inside the invideo agent is reference-to-video: tag character and location references directly in the prompt so the model carries identity, lighting and camera context across multi-shot sequences. Hridaye, invideo's creative director, frames the shift this way: the more you treat the agent like a real crew member, the more it behaves like one. That matters because the invideo agent routes your direction to the right model and writes the model-shaped prompt for you.
The same scene across all four. Take one shot — "a vampire carries a small character across a cracked desert at dusk, slow handheld follow." Runway: "handheld follow, slow walk, dust kicks at every step, character's head bobbing with the gait." Kling: "A vampire (subject) carries a small juicebox-shaped companion across cracked desert ground (action), dusk light, distant mountains (scene), handheld follow shot, low-angle, warm amber backlight, dust particles in air (style)." Veo: prose + camera: handheld follow, low. motion: forward walk, slight sway. lighting: dusk backlight, amber. style: 2.40:1, anamorphic, fine grain. Seedance 2.0: a paragraph with @character_ref and @location_ref tags and the causal beats — why he's tired, what the wind does, how the light fails.
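For the Seedance 2.0 variant, a quick pre-flight check can confirm that the reference tags actually made it into the prompt. The sketch below assumes the `@name` tag form shown above; `missing_reference_tags` is a hypothetical helper, the sample prompt is invented for illustration, and the exact tag syntax the model accepts may differ.

```python
import re

# Assumed tag form: "@" followed by letters, digits, or underscores.
TAG_PATTERN = re.compile(r"@(\w+)")


def missing_reference_tags(prompt: str, required: set[str]) -> set[str]:
    """Return the required tag names that do not appear in the prompt."""
    found = set(TAG_PATTERN.findall(prompt))
    return required - found


prompt = (
    "@character_ref carries his small companion across @location_ref at dusk; "
    "he slows because the wind pushes dust into his face, and his shadow "
    "lengthens as the light fails."
)
missing = missing_reference_tags(prompt, {"character_ref", "location_ref"})
print(missing or "all reference tags present")
```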
Two principles that hold across all four. Lead with cinematography vocabulary: shot type and camera move come first, then subject, then style. Every model rewards this. And use negative prompts to fence off known failure modes ("no extra fingers, no morphing, no live-action skin texture") instead of hoping the positive prompt is enough.
Picking the model before writing the prompt. Inside the invideo agent, route by shot type: Runway for clean single-action motion shots from an image; Kling for stylized multi-shot beats and choreographed action; Veo for predictable cinematic coverage where camera and lighting need to lock; Seedance 2.0 for narrative continuity across a sequence with locked characters and locations. Pick the model first, then write the prompt in that model's shape — the invideo agent will hold your project context (script, character sheets, style block) and adapt the prompt accordingly.
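The routing rule above amounts to a lookup from shot type to model. The sketch below only illustrates that mapping; it is not how the invideo agent is implemented, and `ShotType`, `MODEL_FOR_SHOT`, and `pick_model` are hypothetical names.

```python
from enum import Enum


class ShotType(Enum):
    SINGLE_ACTION_FROM_IMAGE = "clean single-action motion shot from an image"
    STYLIZED_MULTI_SHOT = "stylized multi-shot beats and choreographed action"
    LOCKED_CINEMATIC_COVERAGE = "cinematic coverage where camera and lighting need to lock"
    NARRATIVE_CONTINUITY = "sequence continuity with locked characters and locations"


# Mapping taken from the routing guidance above.
MODEL_FOR_SHOT = {
    ShotType.SINGLE_ACTION_FROM_IMAGE: "Runway",
    ShotType.STYLIZED_MULTI_SHOT: "Kling",
    ShotType.LOCKED_CINEMATIC_COVERAGE: "Veo",
    ShotType.NARRATIVE_CONTINUITY: "Seedance 2.0",
}


def pick_model(shot_type: ShotType) -> str:
    """Return the model name to write the prompt for."""
    return MODEL_FOR_SHOT[shot_type]


print(pick_model(ShotType.NARRATIVE_CONTINUITY))
```

Choosing the model first, as the lookup forces you to, determines which prompt shape to write next.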
These are the patterns that hold across most shots — your exact wording will shift with the scene, the model version, and what you're routing from.
My little secret is that agent one is kind of tuned for serious filmmakers and serious creatives. So the more you treat it like a real crew member, the more it behaves like one.
— Hridaye, invideo's creative director