AI Models

    Lyria Music Generation: Google's Music Model Hands-On Guide for 2026

    A practitioner's guide to Google Lyria in 2026: how it sounds vs Suno V5.5 and Stable Audio, prompt structure, layered composition, and where it fits in video workflows.

    Versely Team · 13 min read

    Suno V5.5 makes a song. Lyria makes a score. After three months of running both inside the same video pipelines, that is the cleanest one-line summary I can give you. Lyria is what you reach for when the music is not the center of attention but the floor — the bed under a 90-second narrative, the cinematic rise behind a product reveal, the layered ambient under an audiobook chapter. Suno is what you reach for when the song is the deliverable.

    Google DeepMind shipped Lyria 2 publicly in March 2026 as the consumer-facing successor to the research-only Lyria from 2024 and the limited Lyria RT used inside YouTube Shorts' soundtrack tools through 2025. This is the working guide: what it is, how to prompt it, how it stacks against Suno V5.5 and Stable Audio 3, and where it fits in real video and audio workflows.

    Studio audio gear and synthesizers illustrating AI music generation

    What Lyria actually is

    Lyria 2 is Google DeepMind's foundation music model, integrated with the Gemini text encoder the same way VEO is. That integration is the thing nobody talks about and the reason its prompt adherence is unusually strong: when you write "minor key, 3/4 waltz feel, solo cello, played slightly behind the beat," the encoder reads that the way a session musician reads a chart, and the conditioning passed to the audio diffusion backbone is dense and structurally aware.

    Three architectural facts that shape what you can ask of it:

    • Stem-aware generation. Lyria does not just render a stereo mix. It models discrete instrumental layers internally — piano, strings, percussion, bass, lead — and exports them separately on Pro tier. You can mute, swap, or rebalance a layer without re-rendering the whole track.
    • Long-form coherence. Up to 5-minute single-pass generation, with motivic memory: the cello phrase you hear at 0:30 actually returns at 2:45 with subtle development, not a fresh melodic random walk.
    • Cinematic vocabulary fluency. "John Williams-style horn fanfare" is rejected. "Brass-led orchestral fanfare in the style of late-Romantic film scoring, ascending fifths, cymbal swell on the downbeat" is honored exactly.

    The license matters more than people expect. Lyria 2 outputs are commercially clearable on the Pro tier (with SynthID watermarking baked in), and individual creators get an "owned outputs" clause that has been notably cleaner than competitors' terms in this first year of release.

    How Lyria differs from Suno V5.5

    The most useful comparison, because most creators reach for Suno first.

    | Dimension | Lyria 2 (Pro) | Suno V5.5 |
    | --- | --- | --- |
    | Strongest at | Instrumental beds, cinematic, ambient, classical | Songs with vocals, pop/rock/hip-hop |
    | Vocals | Limited: "vocalise" textures, no lyric sync | Full lyrics, multiple voice tones, strong |
    | Maximum length | 5 minutes single-pass | 4 minutes per song (extendable) |
    | Stem export | Yes, native (drums/bass/keys/strings/lead) | Yes, but separation quality varies |
    | Genre breadth | Strong cinematic / orchestral / ambient | Broader pop catalog, weaker on orchestral |
    | Loop and seamless tile | Native, gapless | Possible with edits |
    | Controllability | High: tempo/key/instrumentation/dynamics | Moderate: vibe-led prompting |
    | License clarity (commercial) | Strong on Pro tier | Strong on Pro tier |
    | Best-fit job | Score under a video, podcast bed, game ambient | Brand jingle, song, viral hook |

    Pick by job type. The score under a 90-second narrative piece is a Lyria job; the catchy 30-second song that becomes a brand earworm is a Suno job. For a deep head-to-head on the song side of the comparison, the best AI music generators: Suno vs Udio vs Stable Audio breakdown is the reference post.

    Lyria, Suno, and Stable Audio side by side

    The third corner of this triangle is Stable Audio 3, which sits closer to Lyria than to Suno on the use-case map but with different strengths.

    | Dimension | Lyria 2 | Suno V5.5 | Stable Audio 3 |
    | --- | --- | --- | --- |
    | Vocals | Vocalise only | Full | None |
    | Instrumental quality | Excellent | Good | Excellent |
    | Sound design / FX-adjacent | Strong | Weak | Strongest |
    | Prompt adherence on structural music | Highest | Moderate | High |
    | Maximum length | 5 min | 4 min | 3 min (extendable) |
    | Stem export | Native | Yes | Yes |
    | Real-time / interactive modes | No (planned 2026) | No | Yes (DAW integration) |
    | Pricing (per minute, retail Pro) | ~$0.40 | ~$0.30 | ~$0.50 |

    Stable Audio shines for sound design, drones, and texture-heavy ambient. Lyria shines for melodic, motivic, structurally-aware music. Suno shines for songs with hooks and lyrics. A serious creator uses all three.

    Prompt structure that actually works

    Lyria is structurally biased, so write structurally. After several hundred generations, this is the eight-slot template that consistently produces clean output:

    Genre + Mood + Instrumentation + Tempo + Key/mode + Structural form + Dynamic arc + Production notes

    Each slot is one clause. Skip a slot and Lyria fills it from a default that may not match your other choices.
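    If you script prompt assembly, the template above can be enforced mechanically. A minimal sketch; the helper and slot names are my own convention, not part of any Lyria SDK:

```python
# Assemble the eight-slot Lyria prompt from named clauses.
# Refusing to run with an unfilled slot avoids Lyria's defaults,
# which may not match your other choices.

SLOTS = [
    "genre", "mood", "instrumentation", "tempo",
    "key_mode", "form", "dynamic_arc", "production",
]

def build_prompt(**clauses: str) -> str:
    """Join one clause per slot, in template order."""
    missing = [s for s in SLOTS if s not in clauses]
    if missing:
        raise ValueError(f"unfilled slots: {missing}")
    return ", ".join(clauses[s] for s in SLOTS) + "."

prompt = build_prompt(
    genre="Cinematic orchestral underscore",
    mood="contemplative and slightly melancholy mood",
    instrumentation="solo cello lead with sustained string pads",
    tempo="68 BPM",
    key_mode="D minor",
    form="ABA form with a brief swell at 0:45",
    dynamic_arc="rising from pp to mp then settling back",
    production="dry production with minimal reverb, no percussion",
)
```

    The point is less the code than the discipline: every generation gets all eight clauses, every time.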

    Worked example 1 — narrative score under a 90-second video

    Cinematic orchestral underscore, contemplative and slightly melancholy mood, solo cello lead with sustained string pads and gentle low piano, 68 BPM, D minor, ABA form with a brief swell at 0:45 and a return to quiet at 1:15, dynamic arc rising from pp at 0:00 to mp at 0:50 then settling back, dry production with minimal reverb, no percussion.

    First-pass output landed: cello carrying the melody, string pad sitting under it without crowding, the swell hitting near the prompted timestamp. I used this exactly as scored under a Sora 2 Pro Storyboard piece and only had to nudge level by 1.5 dB in the mix.

    Worked example 2 — product hero ambient bed

    Modern ambient electronic, optimistic but understated, layered analog synth pads with a delicate plucked synth lead and subtle low sub-bass, 100 BPM, F major, four-bar repeating motif that builds with added layers every 16 bars, dynamic arc rising slowly from soft to medium across 60 seconds, clean production with light tape saturation, no drums until the final 8 bars where soft kick and shaker enter.

    Lyria respects the "until the final 8 bars" timing instruction reliably. This is the kind of thing Suno tends to ignore.

    Worked example 3 — game/animation cinematic stinger

    Brass-led orchestral fanfare in late-Romantic film-scoring tradition, heroic and confident mood, full brass section with timpani and string ostinato underneath, 132 BPM, C major, two-phrase form with a cymbal swell into the second phrase, building from mp to ff over 12 seconds, full orchestral production with theatrical reverb tail, no electronic elements.

    This is the kind of cue Suno cannot produce convincingly. Lyria does it on the first pass roughly 70% of the time. The other 30% needs one regenerate with the brass section restated more emphatically.

    Worked example 4 — podcast / audiobook bed

    Quiet acoustic ambient, intimate and reflective, fingerpicked nylon-string guitar with sparse muted piano notes and a low bowed string drone underneath, 60 BPM, G major, gently looping eight-bar phrase that varies subtly each repeat, near-static dynamic arc at pp throughout, dry intimate production as if recorded in a small room, absolutely no percussion or rhythmic elements that would distract from spoken voice.

    The "absolutely no percussion" inline negative is honored. Lyria's negative-prompt handling on instrumentation is significantly better than Suno's.

    Sound recording session with mixing board and headphones

    Instruments and tempo control

    Three areas where Lyria is unusually controllable, and where the controls actually do what they say:

    Tempo: specify in BPM, and Lyria locks to within ~2 BPM. You can request tempo changes ("starts at 60 BPM, accelerates to 90 BPM by the 2:00 mark") and the model honors the structural arc. Suno does not respect tempo numerically — it interprets "slow" and "fast."

    Instrumentation lock: name instruments and Lyria uses those instruments and only those instruments. "Solo cello and piano, no other instruments" produces solo cello and piano. Suno often adds drums anyway because its training distribution is pop-heavy.

    Key and mode: specify key (D minor, F# major, C Mixolydian) and Lyria honors it. This matters for video work where you may need the score to match a recorded element (a doorbell, a phone ring, an existing musical fragment).
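    To match a key to a recorded element, you first need the element's pitch. A minimal 12-TET lookup, assuming A4 = 440 Hz; the function is illustrative, not Lyria tooling:

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz: float, a4: float = 440.0) -> str:
    """Map a measured frequency (e.g. a doorbell tone) to the nearest
    equal-tempered pitch, so you can hand Lyria a matching key."""
    semitones = round(12 * math.log2(freq_hz / a4))  # distance from A4
    midi = 69 + semitones                            # MIDI note number
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"
```

    A doorbell measured around 660 Hz comes back as E5, so an E-centered key (E minor, E major) will sit against it cleanly.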

    Layered composition and stem export

    The Pro-tier feature most underused: stem export.

    Lyria can export the master stereo mix plus separate stems for drums, bass, keys, strings, and lead. The stems are clean — not source-separated from a mixed master, but rendered directly from the model's internal layer representation. This means:

    • Mute the drums if a scene goes intimate and you want only the bed.
    • Swap the bass line by rendering only the bass stem with a different prompt and dropping it back in.
    • Side-chain duck the score under dialogue cleanly because the lead stem is isolated.
    • Build interactive music for games or interactive experiences where layers come in and out based on state.
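    The dialogue-ducking point can be sketched in code. A crude offline side-chain duck over raw sample arrays, assuming mono stems at the same sample rate; a DAW compressor does this better, with proper attack and release control:

```python
import numpy as np

def duck(score: np.ndarray, dialogue: np.ndarray, sr: int = 48000,
         reduction_db: float = -12.0, window_s: float = 0.05) -> np.ndarray:
    """Lower the score wherever the dialogue track is active."""
    win = max(1, int(sr * window_s))
    # Moving RMS of the dialogue as a crude activity envelope in [0, 1].
    power = np.convolve(dialogue ** 2, np.ones(win) / win, mode="same")
    rms = np.sqrt(power)
    env = np.clip(rms / (rms.max() + 1e-12), 0.0, 1.0)
    # Full gain when dialogue is silent, reduction_db when it peaks.
    gain = 10 ** ((reduction_db * env) / 20.0)
    return score * gain
```

    Because Lyria's lead stem is isolated, you can apply this to the lead only and leave the pad untouched, which sounds far less pumpy than ducking the whole mix.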

    A serious mix workflow: render in Lyria with stems, drop the stems into your DAW (or the AI movie maker, which now accepts stem imports), do your ducking and balancing there, then render the final.

    Cinematic scoring use cases

    Where Lyria is the right reach:

    • Narrative video scores — 60 to 180 seconds, with motivic development, where the music has to feel composed and not loop-pasted.
    • Product launch films — the rising orchestral arc behind a hero shot. Lyria's dynamic arc control is what makes this clean.
    • Documentary and explainer beds — quiet, motif-driven, doesn't compete with VO.
    • Game soundtracks — stem-based, layer-able, length-flexible.
    • Audiobook and podcast scoring — sparse, motif-aware, mood-respecting.
    • Animation cues — short stingers (10–20 seconds) with specific dramatic shape.

    Where Lyria is not the right reach: pop songs with vocals, viral hooks, anything where lyrics are the deliverable. Use Suno V5.5 for those — see the best AI music generators for the song-side guidance.

    Integration into video workflows

    The single highest-leverage Lyria workflow on Versely:

    1. Generate or shoot your video first. Lock the cut.
    2. Identify your music structure: where the music enters, where it swells, where it ducks under dialogue, and where it ends.
    3. Write the Lyria prompt with the dynamic arc tied to specific timestamps.
    4. Generate at the exact runtime needed.
    5. Export with stems if you need a duck under dialogue.
    6. Drop into AI movie maker or your DAW. Mix.
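    Step 3, tying the dynamic arc to timestamps, is easy to script once the cut is locked. A small helper that turns cut markers (in seconds) into prompt clauses; the marker names here are hypothetical:

```python
def arc_clauses(markers: dict[str, float]) -> str:
    """Convert edit markers into timestamped clauses for the Lyria prompt."""
    def ts(sec: float) -> str:
        # Format seconds as m:ss, matching the timestamps Lyria honors.
        return f"{int(sec) // 60}:{int(sec) % 60:02d}"
    return ", ".join([
        f"music enters at {ts(markers['enter'])}",
        f"swell at {ts(markers['swell'])}",
        f"duck under dialogue at {ts(markers['duck'])}",
        f"resolve and end at {ts(markers['end'])}",
    ])
```

    Reading markers straight from the locked cut keeps the score honest to the edit instead of to your memory of it.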

    For a cleaner one-prompt path, Versely routes Lyria as the default music model for narrative video projects under 3 minutes and routes Suno V5.5 for branded/lyrical jobs. The Versely AI models guide covers which model the router picks for which prompt shape.

    For sound effects layered over the score — footsteps, doors, ambient texture — the AI sound effects guide covers Suno SFX, Stable Audio FX, and ElevenLabs SFX in detail.

    Common failure modes and the fixes

    • Tempo drift in long pieces. On 4–5 minute generations, tempo can wander by ~3 BPM by the end. Fix: generate in two 2:30 segments at locked tempo, crossfade.
    • Default reverb too lush. Lyria leans cathedral. Specify "dry production" or "minimal reverb" if you want intimacy.
    • Brass overpowering strings. Brass-heavy prompts win the mix. Add "strings prominent in the mix, brass supportive" to rebalance.
    • Vocalise creeping in. If you want pure instrumental, write "no vocals, no vocalise textures, fully instrumental." Lyria respects it.
    • Motif disappearing at scene change. If you have a written scene structure, anchor the motif with timestamps: "main cello motif returns at 1:45 in the higher octave."
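    The segment-and-crossfade fix for tempo drift can be sketched as an equal-power crossfade, assuming two mono arrays rendered at the same locked tempo and a splice aligned on a bar boundary:

```python
import numpy as np

def crossfade(a: np.ndarray, b: np.ndarray, sr: int,
              fade_s: float = 2.0) -> np.ndarray:
    """Join two segments with an equal-power (cos/sin) crossfade."""
    n = int(sr * fade_s)
    t = np.linspace(0.0, np.pi / 2, n)
    # a fades out on cos, b fades in on sin; summed power stays roughly flat.
    blend = a[-n:] * np.cos(t) + b[:n] * np.sin(t)
    return np.concatenate([a[:-n], blend, b[n:]])
```

    Two 2:30 segments joined this way stay within the ~2 BPM lock of each half, which is tighter than a single 5-minute pass.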

    Pricing and access (May 2026)

    | Access path | Who it's for | What you get | Cost (approx.) |
    | --- | --- | --- | --- |
    | Lyria Free (Gemini) | Hobbyists | 30s clips, reduced audio quality, watermarked | Free |
    | Gemini Advanced | Individual creators | 5-minute clips, full quality, no stems | $20/month (shared with VEO) |
    | Lyria Studio (Google AI Studio) | Developers | API, full controls, stems | Metered |
    | Vertex AI | Teams, production | Enterprise, SLAs, commercial license | ~$0.30–0.50/min |
    | Versely | Creators | Routed access, mixed with Suno/Stable Audio | Included in Versely tiers |

    For most individual creators, Versely's AI music generator is the cleanest entry point because it routes Lyria, Suno V5.5, and Stable Audio from one prompt and lets you A/B in a click.

    FAQ

    Is Lyria better than Suno?

    For instrumental, cinematic, and ambient music — yes, clearly. For songs with vocals — no, Suno is still ahead. They solve different problems.

    Can Lyria generate vocals?

    Lyria 2 generates "vocalise" — wordless vocal textures (oohs, aahs, choral pads) — but not lyric-synced singing. For lyrics, use Suno V5.5.

    How long can a Lyria track be?

    Up to 5 minutes single-pass on Pro. Practical clean-output ceiling is around 4 minutes before tempo or motif coherence drifts.

    Can I use Lyria-generated music commercially?

    Yes on Pro tier, Vertex AI, and Versely-routed access, with SynthID watermarking applied. The free Gemini tier is non-commercial. Always check current Terms — Google updated music licensing in February 2026.

    Does Lyria export stems?

    Yes on Pro tier. Native stem export for drums, bass, keys, strings, and lead. The free tier exports stereo master only.

    How does Lyria fit with Suno on the same project?

    Use Lyria for the score, Suno for the song. A common pattern: Suno generates the title-track song, Lyria generates the underscore for the rest of the video. Both pipe into the AI movie maker for the final mix.

    Bottom line

    Lyria is the music model that respects structure — tempo, key, motif, dynamic arc, instrumentation. That makes it the right tool for video scoring, ambient beds, and any music where the composition has to support rather than steal the scene. Pair it with Suno for vocal-led work and Stable Audio for sound design, route everything through Versely, and you have an audio stack that covers the full music surface from a 12-second cinematic stinger to a 5-minute narrative score.

    Tags: Google Lyria, Lyria music AI, Lyria vs Suno, AI music generator Lyria, Google music model, Versely, 2026