SSML (Speech Synthesis Markup Language)

SSML lets developers control aspects of speech synthesis such as pronunciation, volume, pitch, and speaking rate. This guide covers the basics of using SSML effectively:

Understanding SSML Basics

  • SSML is an XML-based markup language used to control text-to-speech synthesis.
  • It provides tags to control various aspects of speech synthesis, including pronunciation, prosody, volume, and more.
  • SSML is supported by many speech synthesis systems, including Amazon Polly, Google Cloud Text-to-Speech, and others.

Basic SSML Tags

  • <speak>: This is the root element of an SSML document and indicates the start and end of the speech content.
  • <break>: Inserts a pause into the speech synthesis. You can specify the duration of the pause using the time attribute.
  • <emphasis>: Emphasizes a portion of the text. You can specify the level of emphasis using the level attribute.
  • <prosody>: Modifies the pitch, rate, and volume of speech through the pitch, rate, and volume attributes.
  • <phoneme>: Specifies the pronunciation of a word using phonetic alphabet symbols.
  • <say-as>: Indicates how a piece of text should be interpreted when spoken, for example as a number, a date, or a currency amount. The interpretation is set with the interpret-as attribute.
  • <audio>: Embeds audio files into the speech output.
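To make the tags above concrete, here is a minimal Python sketch that combines <break>, <emphasis>, and <prosody> in one document and checks that the result is well-formed XML. The sample sentences are made up for illustration, and the attribute values are common SSML choices; individual TTS engines may accept a different set of values.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text. The attribute values are typical SSML values,
# but each TTS engine decides which ones it actually supports.
ssml = (
    "<speak>"
    "Your order has shipped."
    '<break time="500ms"/>'
    'It should arrive <emphasis level="strong">tomorrow</emphasis>.'
    '<prosody rate="slow" volume="loud">Thank you for shopping with us.</prosody>'
    "</speak>"
)

# ET.fromstring raises ET.ParseError if the markup is not well-formed XML,
# which catches unclosed tags before the text reaches a TTS engine.
root = ET.fromstring(ssml)

# Collect the SSML tags that appear directly under <speak>.
tags_used = [child.tag for child in root]
print(tags_used)
```

A well-formedness check like this does not confirm that a given engine supports every tag; it only guards against broken markup.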

Using SSML in Code

When using SSML in your code, wrap the SSML markup within <speak> tags.

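Example (the <w> tag with an amazon: role is an Amazon Polly extension and may not be recognized by other engines):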

<speak>
    Here is a number <w role='amazon:VBD'>read</w>
    as a cardinal number:
    <say-as interpret-as='cardinal'>12345</say-as>.
    Here is a word spelled out:
    <say-as interpret-as='spell-out'>hello</say-as>.
</speak>
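When the SSML is assembled in application code rather than written by hand, the text placed inside the tags usually needs XML escaping so that characters such as & do not break the markup. The following Python sketch shows one way to do this; the helper names say_as and speak and the sample sentence are hypothetical, not part of any SSML library.

```python
from xml.sax.saxutils import escape, quoteattr


def say_as(text: str, interpret_as: str) -> str:
    """Wrap text in a <say-as> element (hypothetical helper)."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


def speak(*parts: str) -> str:
    """Join already-escaped fragments and wrap them in <speak> tags."""
    return "<speak>" + " ".join(parts) + "</speak>"


ssml = speak(
    escape("Your PIN & account number is"),  # plain text must be escaped
    say_as("12345", "cardinal"),
)
print(ssml)
```

Escaping only the plain-text fragments, and not the finished markup, keeps the tags themselves intact.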

Voice Tag Support

Support For Voice Elements in SSML

The speak tag can be supplied in one of two ways:

  1. Empty speak tag without attributes: <speak>
    1. In this case, Kore Voice Gateway constructs the voice and language elements on its own, based on the values supplied in the call control parameters.
  2. Customized speak tag with attributes: <speak version="1.0" xml:lang="en-US" xmlns="http://www.w3.org/2001/10/synthesis">
    1. In this case, Kore Voice Gateway sends the SSML to the TTS engine without any modifications. Use this option if you want the voice element to take effect.

Example:

<speak version="1.0" xml:lang="en-US" xmlns="http://www.w3.org/2001/10/synthesis"><voice name="en-US-AvaNeural">Hi, I'm eBay's automated phone assistant. Let me know in a few words how I can help.</voice></speak>
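The customized form can also be generated programmatically. The Python sketch below builds a speak tag with the same attributes as the example above and includes a small check that tells the two forms apart. The function names build_voice_ssml and is_bare_speak are hypothetical, and the check only mirrors the distinction described in this section; how Kore Voice Gateway actually inspects the tag is not documented here and is an assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace taken from the example above.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_voice_ssml(text: str, voice_name: str, lang: str = "en-US") -> str:
    """Build a customized <speak> document containing one <voice> element."""
    return (
        f'<speak version="1.0" xml:lang={quoteattr(lang)} xmlns="{SSML_NS}">'
        f"<voice name={quoteattr(voice_name)}>{escape(text)}</voice>"
        "</speak>"
    )


def is_bare_speak(ssml: str) -> bool:
    """Return True for a <speak> tag with no attributes and no namespace.

    Assumption: this only approximates the "empty speak tag" case above.
    """
    root = ET.fromstring(ssml)
    # ElementTree folds xmlns into the tag name as "{namespace}speak",
    # so a bare tag is exactly "speak" with an empty attribute dict.
    return root.tag == "speak" and not root.attrib


custom = build_voice_ssml("How can I help?", "en-US-AvaNeural")
print(is_bare_speak("<speak>How can I help?</speak>"))
print(is_bare_speak(custom))
```

A check like is_bare_speak can be useful in tests to confirm that prompts meant to carry their own voice element are using the customized form.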