SSML (Speech Synthesis Markup Language)
Using SSML allows developers to control aspects of speech synthesis such as pronunciation, volume, pitch, and rate of speech. Here's a guide on how to use SSML effectively:
Understanding SSML Basics
- SSML is an XML-based markup language used to control text-to-speech synthesis.
- It provides tags to control various aspects of speech synthesis, including pronunciation, prosody, volume, and more.
- SSML is supported by many speech synthesis systems, including Amazon Polly, Google Text-to-Speech, and others.
Basic SSML Tags
- <speak>: This is the root element of an SSML document and indicates the start and end of the speech content.
- <break>: Inserts a pause into the speech synthesis. You can specify the duration of the pause using the time attribute.
- <emphasis>: Emphasizes a portion of the text. You can specify the level of emphasis using the level attribute.
- <prosody>: Modifies aspects of speech such as pitch, rate, and volume. Attributes include pitch, rate, and volume.
- <phoneme>: Specifies the pronunciation of a word using phonetic alphabet symbols.
- <say-as>: Indicates how a particular piece of text should be pronounced, such as numbers, dates, or currency.
- <audio>: Embeds audio files into the speech output.
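As an illustration of how the tags above combine, here is a minimal Python sketch that assembles a small SSML document with the standard library's xml.etree.ElementTree module. The function name build_demo_ssml and the sample sentences are made up for this example. The attribute values (500ms, strong, slow, low) are common SSML values, but check which ones your TTS engine accepts.

```python
import xml.etree.ElementTree as ET


def build_demo_ssml() -> str:
    """Build a small SSML document using <break>, <emphasis>, and <prosody>."""
    # <speak> is the root element of every SSML document.
    speak = ET.Element("speak")
    speak.text = "Welcome."

    # <break> inserts a pause; the time attribute sets its duration.
    pause = ET.SubElement(speak, "break", {"time": "500ms"})
    pause.tail = " This part is "

    # <emphasis> stresses the enclosed text; level sets how strongly.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = "important"
    emphasis.tail = ". "

    # <prosody> adjusts rate and pitch for the enclosed text.
    prosody = ET.SubElement(speak, "prosody", {"rate": "slow", "pitch": "low"})
    prosody.text = "This sentence is spoken slowly."

    # Serialize to a string that can be sent to a TTS engine.
    return ET.tostring(speak, encoding="unicode")


ssml = build_demo_ssml()
print(ssml)
```

Building the document through an XML library rather than string concatenation keeps the markup well-formed and escapes special characters in the text automatically.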
Using SSML in Code
When using SSML in your code, wrap the SSML markup within <speak> tags.
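Example (the <w role='amazon:VBD'> tag is an Amazon Polly extension, so other engines may ignore or reject it):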
<speak>
Here is a number <w role='amazon:VBD'>read</w>
as a cardinal number:
<say-as interpret-as='cardinal'>12345</say-as>.
Here is a word spelled out:
<say-as interpret-as='spell-out'>hello</say-as>.
</speak>
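When the SSML is assembled in application code, plain text usually needs XML escaping before it is wrapped in <speak> tags. The following Python sketch shows one way to do that. The helper names text, say_as, and speak are hypothetical and not part of any SDK, and the sample sentence is invented for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def text(value: str) -> str:
    """Escape &, <, and > so plain text is safe inside SSML."""
    return escape(value)


def say_as(value: str, interpret_as: str) -> str:
    """Return a <say-as> fragment, e.g. interpret_as='cardinal' or 'spell-out'."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(value)}</say-as>"


def speak(*fragments: str) -> str:
    """Join already-escaped fragments and wrap them in the <speak> root element."""
    return "<speak>" + "".join(fragments) + "</speak>"


ssml = speak(
    text("Tom & Jerry counted "),
    say_as("12345", "cardinal"),
    text(" items, spelled "),
    say_as("hello", "spell-out"),
    text("."),
)

# Parsing raises xml.etree.ElementTree.ParseError if the markup is not well-formed.
ET.fromstring(ssml)
print(ssml)
```

Parsing the result before sending it is a cheap way to catch unescaped characters or unclosed tags early, although it does not guarantee that a given TTS engine supports every tag used.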
Voice Tag Support
Support For Voice Elements in SSML
The speak tag can be supplied in one of two ways:
- Empty speak tag without attributes: <speak>
  - In this case, Kore Voice Gateway constructs the voice and language elements on its own, based on the values supplied in the call control parameters.
- Customised speak tag with attributes: <speak version="1.0" xml:lang="en-US" xmlns="W3C Speech Synthesis namespace ">
  - In this case, Kore Voice Gateway sends the SSML to the TTS engine without any modifications. Use this second option if you want the voice element to work.
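To check on the client side which of the two cases a given SSML string falls into, a sketch like the following may help. It is only an illustration of the rule described above, not the gateway's actual implementation. The function name speak_tag_is_customised is hypothetical, and it assumes "customised" means the root <speak> element carries any attribute or namespace declaration.

```python
import xml.etree.ElementTree as ET


def speak_tag_is_customised(ssml: str) -> bool:
    """Return True if the root <speak> tag has attributes or a namespace."""
    root = ET.fromstring(ssml)

    # ElementTree folds an xmlns declaration into the tag name as
    # "{namespace}speak", so strip that prefix to get the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "speak":
        raise ValueError("SSML root element must be <speak>")

    # Ordinary attributes (version, xml:lang) appear in root.attrib;
    # a default namespace only shows up as a "{...}" prefix on the tag.
    return bool(root.attrib) or root.tag.startswith("{")


bare = "<speak>Hello there.</speak>"
custom = '<speak version="1.0" xml:lang="en-US">Hello there.</speak>'

print(speak_tag_is_customised(bare))    # False
print(speak_tag_is_customised(custom))  # True
```

A check like this can be useful in tests to confirm that SSML containing a voice element is always sent with a customised speak tag.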
Example: