Use prompts, lyrics, timing, BPM, and images the right way
This guide condenses Google Lyria 3 capabilities into a creator-friendly workflow. It covers Clip vs Pro, custom lyrics, timestamp structure, image-to-music, instrumental prompts, language control, output parsing, and practical guardrails.
Why this page exists
The builder is powered by Google Lyria 3, but the workflow is shaped by our product layer: structured prompting, cleaner lyric and timing controls, stronger generation defaults, async orchestration, and reusable track management.
Lyria 3 Clip (lyria-3-clip-preview)
Best for: Fast tests, hooks, loops, previews
Duration: Always 30 seconds
Output: MP3
Lyria 3 Pro (lyria-3-pro-preview)
Best for: Fuller songs with verses, choruses, and bridges
Duration: A couple of minutes, guided by your prompt
Output: Model-selected audio + text
1. Start with the right model
Use Clip when you want to explore ideas fast. Use Pro when you already know the direction and want a longer, more structured piece.
Clip is fixed at 30 seconds, so it is ideal for testing genres, moods, and hooks.
Pro is better when you need verses, choruses, bridges, or a longer emotional arc.
A strong workflow is Clip first, Pro second: test the idea as a 30-second clip, then move the direction that works to Pro.
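The choice above can be captured in a tiny helper. This is only a sketch: the two model identifiers are the ones listed on this page, while the function name and its single flag are hypothetical and not part of any SDK.

```python
# Model identifiers as listed on this page.
CLIP_MODEL = "lyria-3-clip-preview"
PRO_MODEL = "lyria-3-pro-preview"


def pick_model(needs_full_structure: bool) -> str:
    """Hypothetical helper: Pro for verses/choruses/bridges, Clip for quick tests."""
    return PRO_MODEL if needs_full_structure else CLIP_MODEL


# Explore with Clip first, then switch to Pro once the direction is settled.
draft_model = pick_model(needs_full_structure=False)
final_model = pick_model(needs_full_structure=True)
```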
2. Write a musically specific prompt
Lyria performs best when you describe the actual musical brief instead of a vague vibe.
Mention genre or genre blend: lo-fi hip hop, cinematic orchestral, indie pop, jazz fusion.
Name instruments: Rhodes, strings, brass, 808, acoustic guitar, vocal harmonies.
Set tempo and key when relevant: 85 BPM, D minor, G major.
Describe the mood and energy: nostalgic, aggressive, dreamy, uplifting, tense.
For Pro, mention desired length in the prompt when duration matters.
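One way to keep prompts musically specific is to assemble them from named fields. The sketch below is illustrative only: `build_prompt` and its labels ("Genre:", "Tempo:", and so on) are assumptions made for this example, not a format Lyria requires.

```python
def build_prompt(genre, instruments, mood, bpm=None, key=None, length_hint=None):
    """Hypothetical helper: join the musical brief into one prompt string."""
    parts = [
        f"Genre: {genre}.",
        f"Instruments: {', '.join(instruments)}.",
    ]
    # Tempo and key are optional; include them only when relevant.
    if bpm is not None:
        parts.append(f"Tempo: {bpm} BPM.")
    if key:
        parts.append(f"Key: {key}.")
    parts.append(f"Mood: {mood}.")
    # For Pro, a length hint can be stated when duration matters.
    if length_hint:
        parts.append(f"Length: {length_hint}.")
    return " ".join(parts)


prompt = build_prompt(
    genre="lo-fi hip hop",
    instruments=["Rhodes", "808"],
    mood="nostalgic",
    bpm=85,
    key="D minor",
)
```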
3. Use custom lyrics when words matter
If you already know the lyric direction, paste it clearly and separate it from production instructions.
Use section tags such as [Verse], [Chorus], [Bridge], [Intro], [Outro].
Keep your musical direction above the lyrics so the model sees both intent and words.
If you want no vocals, do not provide lyrics and explicitly say instrumental only.
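The layout described above (musical direction first, tagged lyrics below) could be assembled like this. It is a minimal sketch: the function name and the "Lyrics:" separator line are assumptions chosen for readability, not something the model is documented to require.

```python
def with_lyrics(direction: str, sections: list[tuple[str, str]]) -> str:
    """Hypothetical helper: put production notes above section-tagged lyrics."""
    lines = [direction.strip(), "", "Lyrics:"]
    for tag, text in sections:
        lines.append(f"[{tag}]")      # e.g. [Verse], [Chorus], [Bridge]
        lines.append(text.strip())
    return "\n".join(lines)


request_text = with_lyrics(
    "Indie pop, acoustic guitar and vocal harmonies, uplifting.",
    [
        ("Verse", "Morning light across the floor"),
        ("Chorus", "We are wide awake"),
    ],
)
```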
4. Control timing and structure with timestamps
When you need precise pacing, tell the model what should happen in each time window.
Example: [0:00 - 0:10] Intro, [0:10 - 0:30] Verse, [0:30 - 0:50] Chorus.
Use timestamps to control energy lifts, instrument entrances, vocal timing, and fade-outs.
This is especially useful for trailers, scene music, and directed builds.
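If you plan sections by duration rather than by clock time, a small helper can turn them into the timestamp format shown in the example. This is a sketch with hypothetical function names; only the `[m:ss - m:ss] Label` layout comes from this page.

```python
def fmt(seconds: int) -> str:
    """Format a second count as m:ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


def timeline(sections: list[tuple[str, int]]) -> str:
    """Hypothetical helper: (label, duration_seconds) pairs -> timestamp string."""
    start = 0
    windows = []
    for label, duration in sections:
        end = start + duration
        windows.append(f"[{fmt(start)} - {fmt(end)}] {label}")
        start = end  # the next section begins where this one ends
    return ", ".join(windows)


structure = timeline([("Intro", 10), ("Verse", 20), ("Chorus", 20)])
```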
5. Add images when visuals should influence the song
Google Lyria 3 supports multimodal music generation. You can provide up to 10 images and ask the music to follow their mood, colors, and story.
Use moodboards, concept art, cover sketches, scene stills, or product visuals.
Only add images when visual direction really matters. Otherwise keep the request simpler.
Images work best when your prompt also explains what musical feeling the visuals should produce.
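A simple pre-flight check can enforce these guardrails before a request is sent. The sketch below assumes nothing about the upload API itself; `check_images` is a hypothetical helper that only validates the 10-image limit stated above and insists on an accompanying musical description.

```python
from pathlib import Path

MAX_IMAGES = 10  # limit stated on this page


def check_images(paths: list[str], prompt: str) -> list[Path]:
    """Hypothetical guardrail: validate image count and require a musical brief."""
    if len(paths) > MAX_IMAGES:
        raise ValueError(f"At most {MAX_IMAGES} images are supported, got {len(paths)}.")
    if paths and not prompt.strip():
        raise ValueError("Explain what musical feeling the images should produce.")
    return [Path(p) for p in paths]


images = check_images(
    ["moodboard.png", "cover_sketch.jpg"],
    "Warm, slow, cinematic strings that follow the sunset colors.",
)
```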
6. Force instrumental output when needed
For background music, trailers, games, and beats, tell Lyria explicitly that you want no vocals.
Use a phrase like: Instrumental only, no vocals.
This should appear directly in the prompt, not just as an implied preference.
Clip is often enough for instrumental concept testing before moving to Pro.
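To make sure the phrase is never forgotten, it can be appended programmatically. This is a minimal sketch; `force_instrumental` is a hypothetical helper, and the phrase is the one suggested above.

```python
INSTRUMENTAL_PHRASE = "Instrumental only, no vocals."


def force_instrumental(prompt: str) -> str:
    """Hypothetical helper: append the explicit no-vocals phrase if it is missing."""
    cleaned = prompt.strip()
    if "instrumental only" in cleaned.lower():
        return cleaned  # already explicit, leave the prompt alone
    if cleaned and cleaned[-1] not in ".!?":
        cleaned += "."
    return f"{cleaned} {INSTRUMENTAL_PHRASE}".strip()


beat_prompt = force_instrumental("Lo-fi hip hop beat, Rhodes and 808, 85 BPM")
```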
7. Match the prompt language to the lyric language
Lyria adapts vocal style and pronunciation to the language of your prompt.
If you want French lyrics, prompt in French.
If you want English vocals with Japanese section tags or notes, make that explicit.
Language control works better when you avoid mixing too many languages in one request.
8. Understand the response correctly
The model returns multiple parts. Some parts are text and some parts are audio bytes.
Do not assume the first part is always lyrics or always audio.
Iterate through all returned parts and detect text versus inline audio data.
The text output can contain lyrics, structure notes, or other written material alongside the audio.
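The parsing advice above can be sketched as a loop that sorts parts by type instead of by position. The attribute names used here (`text`, and `inline_data` with `data` and `mime_type`) are an assumption modeled on how Google's Gen AI SDK exposes response parts; check them against the client you actually use. The stand-in objects below exist only to make the example runnable.

```python
from types import SimpleNamespace


def split_parts(parts):
    """Separate text parts from inline audio parts without assuming any order."""
    texts, audio = [], []
    for part in parts:
        text = getattr(part, "text", None)
        inline = getattr(part, "inline_data", None)
        if text:
            texts.append(text)
        elif inline is not None and getattr(inline, "data", None):
            # Keep the MIME type so the caller can pick the right file extension.
            audio.append((inline.mime_type, inline.data))
    return texts, audio


# Stand-in response parts (assumed shape), with audio deliberately first.
fake_parts = [
    SimpleNamespace(text=None, inline_data=SimpleNamespace(mime_type="audio/mpeg", data=b"\x00\x01")),
    SimpleNamespace(text="[Verse] Morning light across the floor", inline_data=None),
]
texts, audio = split_parts(fake_parts)
```

Writing each audio entry to disk is then a matter of choosing an extension from the MIME type and saving the bytes.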