AI Video Motion Design Prompting: The 2026 Playbook
Camera moves, kinetic typography, slow-mo, hyper-lapse and subject motion — the prompt patterns that produce broadcast-grade motion design from AI video models in 2026.
Motion is the part of AI video that fails most often and most loudly. A perfect-looking still that becomes a wobbling, drifting, melting clip is the universal failure mode of 2026 AI video models. The fix is not waiting for better models — Kling 3.0, VEO 3.1, SORA 2 and Hailuo can all produce broadcast-grade motion right now if you prompt for it correctly.
This is the working playbook for motion design prompting — camera moves, subject motion, slow-mo, hyper-lapse, and kinetic typography. Every pattern has been validated on real client work in the last 90 days.
Section 1: The motion vocabulary AI video models actually understand
Before any prompt template, you need the vocabulary. AI video models in 2026 respond to specific cinematographic terms with surgical precision — and they ignore vague motion language entirely.
Camera moves
- Dolly in / dolly out — physical forward or backward camera movement, smooth, on rails or gimbal.
- Truck left / truck right — sideways physical movement, parallel to subject.
- Pan left / pan right — camera rotates horizontally on a fixed axis.
- Tilt up / tilt down — camera rotates vertically on a fixed axis.
- Pedestal up / pedestal down — camera physically rises or lowers without tilting.
- Crane shot — sweeping vertical and horizontal combined, large arc.
- Orbit / arc — camera circles the subject.
- Whip pan — extremely fast horizontal pan, often with motion blur.
- Zoom in / zoom out — lens focal length change, not camera position change.
- Push in / pull out — colloquial dolly equivalents, widely understood by all 2026 models.
- Locked off — completely static tripod shot.
- Handheld — small organic camera drift, varying intensity.
- Gimbal — smooth tracking with subtle inertia.
- Steadicam — smooth tracking with light operator-style cadence.
Speed cues
Always pair a move with a speed: "slow," "deliberate," "fast," "snap," "creeping." A naked camera move ("dolly in") will default to a generic mid-speed.
Subject motion verbs
- Walking, running, sprinting, jogging — clearly distinguished by all 2026 models.
- Turning, pivoting, rotating, spinning — choose precision over generic "moves."
- Reaching, grabbing, holding, releasing — hand action verbs need specificity.
- Falling, dropping, collapsing, tumbling — physics verbs trigger the model's physics engine.
- Splashing, dripping, pouring, cascading — fluid verbs activate fluid simulation paths in SORA 2 especially.
Section 2: Camera move prompt patterns
A camera move prompt has three required parts:
[move type] + [speed] + [subject relationship]
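If you batch-generate variations, this three-part structure can be assembled programmatically. The sketch below is illustrative only — the function and parameter names are our own, not any model's API:

```python
def camera_move_prompt(move, speed, subject_relation, extras=None):
    """Assemble a camera-move prompt from the three required parts.

    move             -- named move, e.g. "dolly in", "whip pan"
    speed            -- pacing cue, e.g. "slow deliberate", "fast"
    subject_relation -- how the camera relates to the subject
    extras           -- optional lens/style cues appended at the end
    """
    parts = [f"{speed} {move}", subject_relation]
    parts.extend(extras or [])
    return ", ".join(parts)

prompt = camera_move_prompt(
    "dolly in", "slow deliberate",
    "toward the subject's face, ending at extreme close-up",
    extras=["35mm lens", "smooth gimbal motion"],
)
```

The point of the helper is that the speed cue can never be dropped by accident — it is a required argument, matching the rule that a naked camera move defaults to generic mid-speed.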
Slow dolly in on a subject
Slow deliberate dolly in toward the subject's face,
starting at medium close-up and ending at extreme close-up,
35mm lens, smooth gimbal motion, locked subject framing.
Whip pan transition
Fast whip pan from left to right, heavy motion blur during
the pan, starting on [SUBJECT_A] and landing on [SUBJECT_B],
24mm wide lens, kinetic music video aesthetic.
Orbit around a hero product
Slow 360-degree orbit around a [PRODUCT] sitting on a black
marble surface, camera at eye level with the product,
constant radius, smooth gimbal, 50mm lens,
shallow depth of field with subject locked in focus.
Crane up reveal
Camera starts at ground level looking at the subject's feet,
slow crane up to a high overhead shot revealing the full
environment, ending on a top-down god's eye view, 24mm lens.
Locked off contemplative wide
Locked off tripod wide shot of [SUBJECT] in [LOCATION],
no camera movement at all, subject moves within static frame,
35mm lens, anamorphic 2.39:1, contemplative pacing.
Kling 3.0 and Runway Gen-4 are the most precise on camera moves. VEO 3.1 is excellent on slow dolly and crane shots. SORA 2 handles handheld documentary moves better than any other model. Test the same prompt across models in Versely's AI video generator to feel the difference.
Section 3: Slow-motion and hyper-lapse prompt patterns
Time manipulation is one of the highest-payoff motion design techniques and one of the most prompt-sensitive.
Slow motion (120fps to 240fps look)
SORA 2 leads on slow motion in 2026, particularly for fluid and physics interactions. Kling 3.0 is a close second.
Slow motion shot at 240fps of [ACTION],
extreme detail in fluid dynamics, frozen droplets in mid-air,
cinematic super slow-mo aesthetic, hard backlight catching
the splash, 100mm macro lens, shallow depth of field.
For a person in slow motion:
Slow motion shot at 120fps of a [PERSON_DESCRIPTION]
[ACTION], hair and clothing moving in detailed slow time,
soft natural light, 50mm lens, shallow depth of field,
cinematic slow-mo, no jitter.
Always include "no jitter" or "smooth slow-mo" as negative pressure — slow-mo is where models most often introduce motion artifacts.
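One way to make sure those negative-pressure terms are never forgotten is to append any missing ones automatically before submitting a slow-mo prompt. A minimal sketch — the phrase list is just the cues recommended above, and the helper name is our own:

```python
# Negative-pressure cues recommended for slow-mo prompts.
SLOW_MO_NEGATIVES = ["no jitter", "no morphing", "smooth slow-mo"]

def with_negative_pressure(prompt, negatives=SLOW_MO_NEGATIVES):
    """Append any negative-pressure cues the prompt is missing."""
    missing = [n for n in negatives if n not in prompt.lower()]
    if not missing:
        return prompt
    return ", ".join([prompt.rstrip(", ")] + missing)
```

Cues already present in the prompt are not duplicated, so the helper is safe to run on every prompt in a batch.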
Hyper-lapse
Hyper-lapse is the inverse — speeding up time while moving the camera. Wan 2.7 and Kling 3.0 handle this best in 2026.
Hyper-lapse moving forward through [LOCATION],
camera tracks rapidly through space at walking pace
but time moves at 30x speed, clouds streaking across the sky,
cars and people blurring past, 24mm wide lens,
locked horizon, smooth tracking motion, urban time-lapse aesthetic.
Time-lapse (locked off)
Locked off time-lapse of [LOCATION] over the course of one day,
sun arcs across the sky, shadows rotate around the subject,
cars and pedestrians appear as motion-blur streaks,
24mm wide lens, no camera movement, classic time-lapse aesthetic.
Speed ramp (slow then fast)
Speed ramps are tricky in single-clip generation but achievable in models that respect duration cues:
Shot starts in extreme slow motion at 240fps for 2 seconds
showing [SLOW_ACTION], then ramps to real-time speed
for the final 2 seconds showing the resolution of the action,
cinematic speed ramp aesthetic, 50mm lens.
Generate with a longer clip duration (8-10 seconds) for speed ramps to land cleanly.
Section 4: Subject motion choreography
Subject motion is where physics, anatomy and timing all collide. The prompt patterns below sidestep most of the common failure modes.
Single subject, single action
A [SUBJECT_DESCRIPTION] [ACTION_VERB] [DIRECTIONAL_CUE],
[SPEED_CUE], maintaining anatomical consistency,
no morphing limbs, no extra fingers, hands clearly defined.
Always include the negative pressure on hands and limbs. It is the single biggest source of distracting AI video failures in 2026.
Subject entering frame
Frame begins empty, [SUBJECT_DESCRIPTION] walks into frame
from the right at the 1-second mark, crosses to center,
stops and turns to face camera at the 3-second mark,
holds position for the final second, locked-off wide shot.
Specifying the timing in seconds gives Kling 3.0 and VEO 3.1 the structure they need to land choreography reliably.
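If you write these choreography prompts often, the second-by-second beats can be expressed as data and flattened into a prompt string, which makes the timing structure hard to omit. This is an illustrative sketch under our own naming, not a model feature:

```python
def choreography_prompt(subject, beats, framing="locked-off wide shot"):
    """Flatten (second, action) beats into a timed choreography prompt.

    subject -- subject description string
    beats   -- list of (timestamp_seconds, action) pairs, in order
    framing -- shot framing appended at the end
    """
    timed = [f"{subject} {action} at the {t}-second mark"
             for t, action in beats]
    return ", ".join(timed + [framing])

p = choreography_prompt(
    "a woman in a red coat",
    [(1, "walks into frame from the right"),
     (3, "stops and turns to face camera")],
)
```

Keeping beats as a list also makes it easy to verify the last timestamp fits inside the clip duration you plan to generate.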
Two-subject interaction
[SUBJECT_A_DESCRIPTION] sits at a table on the left side of frame.
[SUBJECT_B_DESCRIPTION] enters from the right, walks to the table,
sits across from [SUBJECT_A], they make eye contact and exchange
a brief nod. Locked off medium-wide shot, 35mm lens,
both subjects maintain consistent appearance throughout.
Two-subject prompts fail more than single-subject ones — keep the action simple and the framing wide enough that the model has space to reason.
Object physics
SORA 2 is unmatched on object physics in 2026.
A glass bottle falls from a wooden table, hits the tile floor,
shatters into multiple pieces, water splashes outward in slow
motion, hard top light, locked-off low-angle shot, 50mm lens,
realistic glass shatter physics, no morphing.
Section 5: Template motion design prompt library
Copy, paste, swap variables. These are workflow-ready patterns we use weekly.
Template 1: Brand product hero (Kling 3.0)
A [PRODUCT] sits on a [SURFACE], slow 360-degree orbit shot
at eye level, constant radius, smooth gimbal motion,
hard rim light from above-left, deep black background,
shallow depth of field with product locked in focus,
100mm macro lens, 4K commercial aesthetic, no camera shake.
Template 2: Kinetic social opener (Hailuo or PixVerse V6)
Fast push-in toward [SUBJECT], starting wide and ending in
close-up over 2 seconds, hard flash-style lighting,
vibrant saturated color, 24mm lens, slight motion blur
during push, energetic music video aesthetic, 9:16 vertical.
Template 3: Cinematic documentary handheld (VEO 3.1)
Slow handheld follow shot at hip height tracking
[SUBJECT_DESCRIPTION] as they walk through [LOCATION],
natural light, 35mm lens at T2, shallow depth of field,
shot on Arri Alexa Mini LF, contemporary documentary aesthetic,
organic camera drift but no jitter.
Template 4: Slow-mo physics moment (SORA 2)
Slow motion shot at 240fps of [ACTION], extreme detail in
fluid dynamics or particle physics, frozen movement in mid-air,
hard backlight catching the motion, 100mm lens,
shallow depth of field, cinematic super slow-mo aesthetic.
Template 5: Hyper-lapse cityscape (Wan 2.7)
Hyper-lapse moving forward through [CITY_LOCATION], camera
tracks at walking pace but time moves at 30x speed, clouds
streak across sky, traffic blurs past, 24mm wide lens,
locked horizon, smooth tracking motion, urban aesthetic.
Template 6: Kinetic typography overlay shot
AI video models in 2026 cannot reliably render text — composite typography in post. But you can prompt for the live-action plate:
[SUBJECT_DESCRIPTION] stands centered against a clean colored
backdrop with negative space on the left half of frame for
typography compositing, locked off medium shot, soft even
lighting, 50mm lens, 4K, 16:9 aspect ratio.
Then drop your kinetic type into the negative space using After Effects, Cavalry, or Versely's overlay system.
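A simple way to keep a library like this workflow-ready is to store each template with named placeholders and fill them per job. Here is a minimal sketch using Python's stdlib string.Template — the template text echoes Template 1 above, and the variable names are our own convention:

```python
from string import Template

# One entry per template; placeholders use $name syntax.
TEMPLATES = {
    "product_hero": Template(
        "A $product sits on a $surface, slow 360-degree orbit shot "
        "at eye level, constant radius, smooth gimbal motion, "
        "hard rim light from above-left, deep black background, "
        "shallow depth of field with product locked in focus, "
        "100mm macro lens, 4K commercial aesthetic, no camera shake"
    ),
}

def render(name, **variables):
    """Fill a template; substitute() raises KeyError on a missing placeholder."""
    return TEMPLATES[name].substitute(**variables)

prompt = render("product_hero",
                product="ceramic watch",
                surface="black marble slab")
```

Using substitute() rather than safe_substitute() is deliberate: a forgotten variable fails loudly instead of shipping a prompt with a literal "$product" in it.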
Section 6: Mistakes that ruin AI motion
- No camera move named. Defaults to mid-speed generic drift. Always name the move and the speed.
- Conflicting moves. "Slow dolly in with fast whip pan." Pick one.
- Subject motion without pacing cues. "She walks across the frame" with no timing. Add "at the 1-second mark" or "for the first 2 seconds."
- Asking for live text rendering. AI video models butcher text in 2026. Composite typography in post.
- Slow-mo without negative pressure. Without "no jitter," "no morphing," slow-mo introduces artifacts.
- Two-subject prompts in tight framing. Keep the framing wide enough that the model has space to reason about both subjects.
- Hand close-ups without hand cues. Add "hands clearly defined, five fingers, no morphing." Always.
- Frame rate ignorance. A 24fps cinematic look and a 60fps slow-mo look will never feel cohesive. Pick one per project.
- Skipping motion verbs. AI video defaults to near-static if you do not name a verb. Always include one.
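Several of these mistakes are mechanical enough to catch before you spend a generation credit. A rough prompt linter along these lines — the word lists and function name are our own, and the substring matching is deliberately crude:

```python
import re

# Simplified vocabularies drawn from the lists in Section 1.
CAMERA_MOVES = ["dolly", "truck", "pan", "tilt", "pedestal", "crane",
                "orbit", "zoom", "push in", "pull out", "locked off",
                "handheld", "whip pan"]
SPEED_CUES = ["slow", "fast", "deliberate", "snap", "creeping", "rapid"]

def lint_motion_prompt(prompt):
    """Flag common motion-prompt mistakes; returns a list of warnings."""
    p = prompt.lower()
    warnings = []
    if not any(m in p for m in CAMERA_MOVES):
        warnings.append("no camera move named")
    if not any(s in p for s in SPEED_CUES):
        warnings.append("no speed cue")
    # Live text rendering is a known failure mode unless the prompt
    # explicitly reserves negative space for compositing instead.
    if re.search(r"\btext\b|typography|lettering", p) and "negative space" not in p:
        warnings.append("asks for live text rendering")
    return warnings
```

An empty list means the prompt clears the mechanical checks; conflicting moves and anatomy cues still need a human eye.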
FAQ
What is the best AI video model for camera move precision in 2026?
Kling 3.0 and Runway Gen-4 are the most precise on named camera moves. VEO 3.1 leads on slow dolly and crane shots. SORA 2 leads on handheld and documentary motion. Test all three for any motion-critical project.
How do I get reliable slow motion from AI video?
SORA 2 first, Kling 3.0 second. Always specify a frame rate ("at 240fps"), name the slow-mo register ("super slow motion"), and include negative pressure ("no jitter, no morphing"). Generate at the maximum clip duration the model supports.
Can AI video models render kinetic typography?
Not reliably in 2026. Generate a clean live-action plate with negative space for type, then composite the kinetic typography in After Effects, Cavalry, or your editor of choice.
How do I prompt for a hyper-lapse?
Use the words "hyper-lapse" and specify both camera motion ("tracks forward at walking pace") and time speed ("time moves at 30x speed"). Wan 2.7 and Kling 3.0 handle this best.
Why does my subject's hands always look wrong?
Default model behavior. Always include "hands clearly defined, five fingers, no morphing" in the prompt or in negative prompts. For close-ups of hands, drop to a wider framing or generate multiple takes and pick the cleanest.
The takeaway
Motion design from AI video in 2026 is a vocabulary problem. Name the camera move. Name the speed. Name the subject's verb. Name the timing in seconds. Add negative pressure for hands, limbs, jitter and morphing.
The models will produce broadcast-grade motion if you give them the structure. They will produce wobbling sludge if you do not.
For prompt structure fundamentals that underpin these patterns, see our advanced video prompt engineering guide. For maintaining a visual identity across motion-heavy sequences, see AI video style transfer techniques. For B-roll workflows that combine static and motion shots, try the AI b-roll generator.