This guide condenses the official Google Lyria 3 capabilities into a creator-friendly workflow. It covers Clip vs Pro, custom lyrics, timestamp structure, image-to-music, instrumental prompts, language control, output parsing, and practical guardrails.
Why this page exists
The builder is powered by Google Lyria 3, but the workflow is shaped by our agent layer: structured prompting, cleaner lyric and timing controls, stronger generation defaults, async orchestration, and reusable track management. It is not a thin shell around a single API call.
Lyria 3 Clip (lyria-3-clip-preview)
Best for: fast tests, hooks, loops, previews
Duration: always 30 seconds
Output: MP3

Lyria 3 Pro (lyria-3-pro-preview)
Best for: fuller songs with verses, choruses, and bridges
Duration: a couple of minutes, guided by your prompt
Output: MP3 or WAV, selected by the model
Use Clip when you want to explore ideas fast. Use Pro when you already know the direction and want a longer, more structured piece.
Clip is fixed at 30 seconds, so it is ideal for testing genres, moods, and hooks.
Pro is better when you need verses, choruses, bridges, or a longer emotional arc.
A strong workflow is Clip first, Pro second.
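The Clip-or-Pro decision can be sketched as a small helper. The two model IDs come from this guide; the function name and the rule (Pro whenever you need more than 30 seconds or real song structure) are illustrative assumptions, not part of any official API.

```python
# Hypothetical helper: choose a Lyria 3 model ID from what the track needs.
# Model IDs are taken from this guide; the decision rule is an assumption.
CLIP_MODEL = "lyria-3-clip-preview"
PRO_MODEL = "lyria-3-pro-preview"
CLIP_SECONDS = 30  # Clip output is always 30 seconds


def choose_model(target_seconds: int, needs_song_structure: bool) -> str:
    """Return Clip for short sketches and Pro for longer or structured pieces."""
    if needs_song_structure or target_seconds > CLIP_SECONDS:
        return PRO_MODEL
    return CLIP_MODEL
```

For example, choose_model(30, needs_song_structure=False) picks Clip for a quick hook test, while choose_model(150, needs_song_structure=True) picks Pro once the direction is settled.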
Lyria performs best when you describe the actual musical brief instead of a vague vibe.
Mention genre or genre blend: lo-fi hip hop, cinematic orchestral, indie pop, jazz fusion.
Name instruments: Rhodes, strings, brass, 808, acoustic guitar, vocal harmonies.
Set tempo and key when relevant: 85 BPM, D minor, G major.
Describe the mood and energy: nostalgic, aggressive, dreamy, uplifting, tense.
For Pro, mention desired length in the prompt when duration matters.
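One way to make sure every brief covers these elements is to assemble it from named fields. This is a minimal sketch: the MusicBrief class and the "Genre: … Tempo: …" sentence layout are assumptions for illustration, since Lyria reads plain prose and does not require any particular field syntax.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MusicBrief:
    """Hypothetical container for the brief elements listed above."""
    genre: str
    instruments: list[str] = field(default_factory=list)
    mood: str = ""
    bpm: Optional[int] = None
    key: Optional[str] = None
    length: Optional[str] = None  # mainly useful for Pro, e.g. "about 2 minutes"

    def to_prompt(self) -> str:
        """Join the filled-in fields into one plain-text brief."""
        parts = [f"Genre: {self.genre}."]
        if self.instruments:
            parts.append("Instruments: " + ", ".join(self.instruments) + ".")
        if self.bpm is not None:
            parts.append(f"Tempo: {self.bpm} BPM.")
        if self.key:
            parts.append(f"Key: {self.key}.")
        if self.mood:
            parts.append(f"Mood and energy: {self.mood}.")
        if self.length:
            parts.append(f"Length: {self.length}.")
        return " ".join(parts)
```

MusicBrief("lo-fi hip hop", ["Rhodes", "808"], "nostalgic, dreamy", 85, "D minor").to_prompt() yields "Genre: lo-fi hip hop. Instruments: Rhodes, 808. Tempo: 85 BPM. Key: D minor. Mood and energy: nostalgic, dreamy." Empty fields are simply left out, which keeps short Clip briefs short.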
If you already have lyrics or a lyric draft, paste it clearly and keep it separate from the production instructions.
Use section tags such as [Verse], [Chorus], [Bridge], [Intro], [Outro].
Keep your musical direction above the lyrics so the model sees both intent and words.
If you want no vocals, leave out lyrics entirely and state explicitly that the track is instrumental only.
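The ordering rule (musical direction first, tagged lyrics after) is easy to enforce in code. In this sketch the function name and the "Lyrics:" divider line are assumptions; the section tags are the ones this guide lists.

```python
def build_lyrics_prompt(direction: str, sections: list[tuple[str, str]]) -> str:
    """Place musical direction above the lyrics, one tagged block per section.

    `sections` holds (tag, text) pairs such as ("Verse", "..."); each tag is
    rendered in square brackets, e.g. [Verse].
    """
    if not direction.strip():
        raise ValueError("add musical direction above the lyrics")
    lines = [direction.strip(), "", "Lyrics:"]  # divider keeps intent and words apart
    for tag, text in sections:
        lines.append(f"[{tag}]")
        lines.append(text.strip())
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip()
```

build_lyrics_prompt("Indie pop, 110 BPM, bright guitars, uplifting.", [("Verse", "City lights..."), ("Chorus", "We run...")]) puts the production brief on top, then [Verse] and [Chorus] blocks underneath.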
When you need precise pacing, tell the model what should happen in each time window.
Example (Pro, since it runs past Clip's 30 seconds): [0:00 - 0:10] Intro, [0:10 - 0:30] Verse, [0:30 - 0:50] Chorus.
Use timestamps to control energy lifts, instrument entrances, vocal timing, and fade-outs.
This is especially useful for trailers, scene music, and directed builds.
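Timestamp windows are easy to get wrong by hand (gaps, overlaps, or a Clip plan that runs past 30 seconds). The sketch below renders contiguous windows in the same "[M:SS - M:SS] Section" form used in the example above. The function names, the requirement that windows start at 0:00 and touch end to end, and the optional max_seconds check are assumptions for illustration.

```python
from typing import Optional


def fmt_time(seconds: int) -> str:
    """Format whole seconds as M:SS, e.g. 70 -> '1:10'."""
    return f"{seconds // 60}:{seconds % 60:02d}"


def build_timeline(windows: list[tuple[int, int, str]],
                   max_seconds: Optional[int] = None) -> str:
    """Render contiguous (start, end, description) windows as timestamp tags.

    Pass max_seconds=30 when targeting Clip, whose output is fixed at 30 seconds.
    """
    rendered = []
    expected_start = 0
    for start, end, description in windows:
        if start != expected_start:
            raise ValueError(f"window at {fmt_time(start)} leaves a gap or overlaps")
        if end <= start:
            raise ValueError(f"window at {fmt_time(start)} must end after it starts")
        if max_seconds is not None and end > max_seconds:
            raise ValueError(f"{fmt_time(end)} exceeds the {fmt_time(max_seconds)} limit")
        rendered.append(f"[{fmt_time(start)} - {fmt_time(end)}] {description}")
        expected_start = end
    return ", ".join(rendered)
```

build_timeline([(0, 10, "Intro"), (10, 30, "Verse"), (30, 50, "Chorus")]) reproduces the example string exactly; the same call with max_seconds=30 raises, flagging that the plan needs Pro.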
Google Lyria 3 supports multimodal music generation. You can provide up to 10 images and ask the music to follow their mood, colors, and story.
Use moodboards, concept art, cover sketches, scene stills, or product visuals.
Add images only when visual direction really matters; otherwise, keep the request text-only.
Images work best when your prompt also explains what musical feeling the visuals should produce.
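A request that mixes text and images can be assembled as a list of parts. This sketch assumes Lyria 3 accepts the same part layout as the Gemini API's REST format (a "text" part plus "inline_data" parts with "mime_type" and base64 "data"); that layout and the helper name are assumptions, while the 10-image cap comes from this guide.

```python
import base64

MAX_IMAGES = 10  # upper limit stated in this guide


def build_image_request_parts(prompt: str,
                              images: list[tuple[bytes, str]]) -> list[dict]:
    """Pair a text prompt with up to MAX_IMAGES inline images.

    `images` holds (raw_bytes, mime_type) pairs, e.g. (png_bytes, "image/png").
    The dict layout mirrors Gemini REST `parts`; treat it as an assumption.
    """
    if not prompt.strip():
        raise ValueError("describe the musical feeling the images should produce")
    if len(images) > MAX_IMAGES:
        raise ValueError(f"at most {MAX_IMAGES} images allowed, got {len(images)}")
    parts: list[dict] = [{"text": prompt}]
    for data, mime_type in images:
        if not mime_type.startswith("image/"):
            raise ValueError(f"expected an image MIME type, got {mime_type!r}")
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(data).decode("ascii"),
            }
        })
    return parts
```

The text part is required on purpose: images alone leave the model guessing which musical feeling the visuals should produce.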
For background music, trailers, games, and beats, tell Lyria explicitly that you want no vocals.
Use a phrase such as "Instrumental only, no vocals."
This should appear directly in the prompt, not just as an implied preference.
Clip is often enough for instrumental concept testing before moving to Pro.
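To guarantee the instruction is stated rather than implied, a small guard can append the phrase when it is missing and reject lyrics on instrumental requests. The function name and the exact phrase check are illustrative assumptions.

```python
from typing import Optional

INSTRUMENTAL_PHRASE = "Instrumental only, no vocals."


def instrumental_prompt(brief: str, lyrics: Optional[str] = None) -> str:
    """Return the brief with an explicit no-vocals instruction.

    Raises if lyrics are supplied, since lyrics invite the model to sing.
    """
    if lyrics and lyrics.strip():
        raise ValueError("instrumental requests should not include lyrics")
    if INSTRUMENTAL_PHRASE.lower() in brief.lower():
        return brief.strip()  # already explicit; don't add it twice
    return f"{brief.strip()} {INSTRUMENTAL_PHRASE}"
```

instrumental_prompt("Cinematic orchestral trailer cue, tense, 120 BPM.") returns the brief with "Instrumental only, no vocals." appended, and calling it again on that result leaves it unchanged.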
Lyria adapts vocal style and pronunciation to the language of your prompt.
If you want French lyrics, prompt in French.
If you want English vocals with Japanese section tags or notes, make that explicit.
Language control works better when you avoid mixing too many languages in one request.
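For the mixed case (sung lyrics in one language, tags or notes in another), a fixed sentence that names both languages keeps the instruction explicit. This helper and its wording are hypothetical; for a single language, the advice above still applies: write the whole prompt in that language.

```python
def mixed_language_note(vocal_language: str, notes_language: str) -> str:
    """Spell out which language is sung and which is only for tags and notes."""
    if vocal_language.strip().lower() == notes_language.strip().lower():
        raise ValueError("same language: write the whole prompt in it instead")
    return (
        f"Sing all lyrics in {vocal_language}. "
        f"Section tags and production notes are written in {notes_language} "
        f"and should not be sung."
    )
```

Keeping the note to exactly two languages also follows the advice against mixing too many languages in one request.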
The model returns multiple parts: some contain text, others contain audio bytes.
Do not assume the first part is always lyrics or always audio.
Iterate through all returned parts and detect text versus inline audio data.
The text output can contain lyrics, structure notes, or other written material alongside the audio.
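The parsing rule above can be sketched as a loop that classifies each part by what it carries, never by its position. This assumes each part exposes a `text` attribute or an `inline_data` object with `data` (raw bytes) and `mime_type`, as parts do in the `google-genai` Python SDK; the MIME-to-extension table is an assumption. Because Pro may return either MP3 or WAV, the file extension is derived from the MIME type instead of being hard-coded.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable

# MIME type -> file extension. The exact MIME strings returned are an
# assumption; unknown audio types fall back to ".bin" so no audio is dropped.
AUDIO_EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/mp3": ".mp3",
    "audio/wav": ".wav",
    "audio/x-wav": ".wav",
    "audio/wave": ".wav",
}


@dataclass
class LyriaOutput:
    texts: list[str] = field(default_factory=list)
    audio: list[tuple[bytes, str]] = field(default_factory=list)  # (bytes, extension)


def parse_parts(parts: Iterable[Any]) -> LyriaOutput:
    """Sort response parts into text and audio without assuming their order."""
    out = LyriaOutput()
    for part in parts:
        text = getattr(part, "text", None)
        if text:
            out.texts.append(text)  # lyrics, structure notes, etc.
        blob = getattr(part, "inline_data", None)
        mime = (getattr(blob, "mime_type", None) or "").lower()
        data = getattr(blob, "data", None)
        if data and mime.startswith("audio/"):
            out.audio.append((data, AUDIO_EXTENSIONS.get(mime, ".bin")))
    return out
```

With the Python SDK, the parts would typically come from response.candidates[0].content.parts, and each audio entry can then be saved with something like Path(f"track_{i}{ext}").write_bytes(data).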
This tool is built on Google Lyria 3 and applies the same kinds of safety guardrails used across leading creative AI products. Avoid copyrighted lyrics, imitation of named artists, and requests to clone a recognizable performer. Focus instead on original briefs: genre, arrangement, instrumentation, emotion, language, lyrics, and structure.