
Call Control Parameters

Call control parameters are general-purpose parameters that can modify a call's behavior, including ASR/STT & TTS configurations.

Note

Automatic Speech Recognition (ASR) and Speech-to-Text (STT) are two terms that refer to the same technology. Both involve converting spoken language into written text by analyzing and interpreting audio input. The terms are used interchangeably, describing the same function—transforming speech into readable, actionable text.

There are two ways to define the Call Control Parameters: Node Level and Channel Level.

Node Level Call Control

The call control section is available in the Entity, Message, and Confirmation nodes under IVR Properties > Advanced Controls. Learn more.

Channel Level Call Control

For information on configuring the Call Control Parameters at the channel level, refer to Define the Call Control Parameters.

Supported Speech Engines

Kore.ai supports the following third-party service providers for ASR/STT and TTS. Learn more.

| Speech Engine | ASR Name | TTS Name | Supported Environment |
|---|---|---|---|
| Microsoft Azure | microsoft | microsoft | On-Premise, Cloud |
| Google | google | google | On-Premise, Cloud |
| Nvidia (Riva) | nvidia | nvidia | On-Premise |
| Amazon (AWS) | aws | polly | Cloud |
| Deepgram | deepgram | Not Supported | Cloud |
| ElevenLabs | Not Supported | elevenlabs | Cloud |
| Whisper | Not Supported | whisper | Cloud |
| AmiVoice | amivoice | | Cloud |

Common ASR Parameters

| Parameter | Type | Supported STT/TTS | Description | Examples |
|---|---|---|---|---|
| alternativeLanguages | Array of objects | Google, Microsoft, Deepgram | An array of alternative languages that the speaker may be using. Based on the user utterance, the transcript is returned in one of the selected languages. | alternativeLanguages = [ { "language": "de-DE", "voiceName": "de-DE-KatjaNeural" }, { "language": "fr-FR", "voiceName": "fr-FR-DeniseNeural" } ] |
| hints (with phrase-level hintsBoost) | Array of objects | Google, Nvidia | Lists phrases or words passed to the speech-to-text service as "hints" to improve recognition accuracy. For example, "weather" and "whether" have the same pronunciation, so a hint of ['weather'] makes the service take "weather" as input. A boost factor can be specified at the phrase level (Kore VG key = hints). Place this array in the Grammar section of the bot builder. | "hints" = [ {"phrase": "benign", "boost": 50}, {"phrase": "malignant", "boost": 10}, {"phrase": "biopsy", "boost": 20} ] |
| hints (with a separate hintsBoost) | Array of strings plus Number | Google, Microsoft, Nvidia | Same as above, but a single boost factor is applied to all hint phrases. | "hints": ["benign", "malignant", "biopsy"], "hintsBoost": 50 |
| sttMinConfidence | Number (0.1 to 0.9) | All | If set and the transcript generated by the ASR falls below this confidence threshold, the Voice Gateway disregards the input and plays the timeout prompt. This ensures that only sufficiently accurate speech recognition results are processed, improving the quality of the interaction. | sttMinConfidence = 0.5 — any ASR transcript with a confidence score below 0.5 is ignored, and the system plays the timeout prompt. |
| sttDisablePunctuation | Boolean | Google, Microsoft | Controls whether the ASR adds punctuation to the user transcript (for example, periods, commas, and question marks); by default, punctuation is added. true: remove the punctuation. false: add the punctuation. | sttDisablePunctuation = true |
| vadEnable | Boolean | All | If true, delays connecting to the cloud recognizer until speech is detected. | |
| vadVoiceMS | Number (ms) | All | If VAD is enabled, the number of milliseconds of speech required before connecting to the cloud recognizer. | |
| vadMode | Number (0–3) | All | If VAD is enabled, governs the sensitivity of the voice activity detector; the value must be between 0 and 3 inclusive, and lower numbers mean more sensitive. | |
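
Several of the common ASR parameters above can be passed together in a single call control object. The following is a minimal, illustrative sketch; the confidence threshold, VAD timings, and hint values are examples only, not recommendations:

{
  "sttMinConfidence": 0.5,
  "alternativeLanguages": [
    { "language": "de-DE", "voiceName": "de-DE-KatjaNeural" }
  ],
  "hints": ["benign", "malignant", "biopsy"],
  "hintsBoost": 50,
  "sttDisablePunctuation": true,
  "vadEnable": true,
  "vadVoiceMS": 250,
  "vadMode": 2
}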

Microsoft ASR

| Parameter | Type | Description |
|---|---|---|
| azureSpeechSegmentationSilenceTimeoutMs | Number | A timeout that can be set between phrases. It is similar to Continuous ASR; the only difference is that Continuous ASR is handled by the Voice Gateway, while Azure speech segmentation is handled by the Azure ASR itself, so accuracy is higher compared to Continuous ASR. More info. |
| sttEndpointID | String | Custom service endpoint to connect to, instead of the hosted Microsoft regional endpoint. |
| azurePostProcessing | String | Improves the final transcript, for example through text normalization (adjusting punctuation, casing, etc.) or custom handling specific to the needs of the application. |
| azureSpeechRecognitionMode | String (enum: AtStart, Continuous) | "AtStart": starts recognizing speech as soon as audio input is detected and stops when the speaker finishes; suitable for short, one-time speech recognition tasks. "Continuous": continuously listens to and transcribes speech; ideal for longer audio streams or uninterrupted speech sessions such as meetings or dictation. Example: azureSpeechRecognitionMode = "Continuous" |
| profanityOption | String (enum) | Masks profane words in the transcript. It has three values: masked, removed, or raw. Default: raw. Example: profanityOption = "masked" |
| initialSpeechTimeoutMs | Number (ms) | Initial speech timeout in milliseconds. |
| requestSnr | Boolean | Request signal-to-noise information. |
| outputFormat | String | simple or detailed. Default: simple. |
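
A minimal sketch of a Microsoft (Azure) ASR configuration using the parameters above; the provider and language keys follow the Voice Gateway properties described later in this document, and all values are illustrative:

{
  "sttProvider": "microsoft",
  "sttLanguage": "en-US",
  "azureSpeechRecognitionMode": "Continuous",
  "azureSpeechSegmentationSilenceTimeoutMs": 1000,
  "profanityOption": "masked",
  "outputFormat": "simple"
}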

Google ASR

| Parameter | Type | Description |
|---|---|---|
| sttProfanityFilter | Boolean | A profanity filter that provides a few options for dealing with profane words in the transcription. Default: false |
| singleUtterance | Boolean | If true, returns only a single utterance/transcript. |
| sttModel | String | Speech recognition model to use. Default: phone_call |
| sttEnhancedModel | Boolean | Use the enhanced model. |
| words | Boolean | Enable word offsets. |
| diarization | Boolean | Enable speaker diarization. |
| diarizationMinSpeakers | Number | Set the minimum speaker count. |
| diarizationMaxSpeakers | Number | Set the maximum speaker count. |
| interactionType | String | Set the interaction type: discussion, presentation, phone_call, voicemail, professionally_produced, voice_search, voice_command, dictation |
| naicsCode | Number | Set an industry NAICS code that is relevant to the speech. |
| googleServiceVersion | String | v1 or v2. Specifies the version of Google's ASR API in use to ensure compatibility. |
| googleRecognizerId | String | Identifies the specific speech recognition model for processing the input. |
| googleSpeechStartTimeoutMs | Number | Sets the time (in milliseconds) to wait for the speaker to start speaking before timing out. |
| googleSpeechEndTimeoutMs | Number | Defines how long to wait (in milliseconds) for silence before determining the end of speech. |
| googleEnableVoiceActivityEvents | Boolean | Enables detection of when the user starts or stops speaking during recognition. |
| googleTranscriptNormalization | Array | Adjusts the transcript to make it more readable, applying corrections like punctuation and casing. |
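
A minimal, illustrative sketch of a Google ASR configuration combining the parameters above (values are examples only):

{
  "sttProvider": "google",
  "sttLanguage": "en-US",
  "sttModel": "phone_call",
  "sttEnhancedModel": true,
  "sttProfanityFilter": false,
  "googleSpeechStartTimeoutMs": 5000,
  "googleSpeechEndTimeoutMs": 1000
}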

AWS ASR

| Parameter | Type | Description |
|---|---|---|
| awsAccessKey | String | The AWS access key for authenticating requests. |
| awsSecretKey | String | The corresponding secret key, used with the access key for AWS service authentication. |
| awsSecurityToken | String | A temporary security token (optional) for requests that use AWS Security Token Service (STS). |
| awsRegion | String | Specifies the AWS region where the service requests will be sent (for example, us-west-2, eu-central-1). |
| awsVocabularyFilterName | String | The name of the vocabulary filter used to filter certain words or phrases during transcription. |
| awsVocabularyFilterMethod | String (enum: "remove", "mask", "tag") | Specifies how words in the vocabulary filter are handled: "remove" completely removes the word from the transcription; "mask" masks the word (for example, replaces it with asterisks); "tag" adds tags to identify the filtered word. |
| awsLanguageModelName | String | The name of a custom language model to apply during transcription for better accuracy with domain-specific language. |
| awsPiiEntityTypes | Array | A list of PII (Personally Identifiable Information) entity types to be detected (for example, ["NAME", "EMAIL", "SSN"]). This helps the system identify and protect sensitive information during transcription. |
| awsPiiIdentifyEntities | Boolean | A flag that indicates whether to identify and highlight PII entities within the transcribed text. If true, PII entities are detected and processed according to the configuration. |
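
A minimal sketch of an AWS ASR configuration; the region and vocabulary filter name are illustrative placeholders, and credentials are omitted from this sketch:

{
  "sttProvider": "aws",
  "sttLanguage": "en-US",
  "awsRegion": "us-west-2",
  "awsVocabularyFilterName": "my-filter",
  "awsVocabularyFilterMethod": "mask",
  "awsPiiEntityTypes": ["NAME", "EMAIL", "SSN"],
  "awsPiiIdentifyEntities": true
}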

Nvidia ASR

| Parameter | Type | Description |
|---|---|---|
| nvidiaRivaUri | String | gRPC endpoint (ip:port) on which Nvidia Riva is listening. |
| nvidiaMaxAlternatives | Number | The number of alternatives to return. |
| nvidiaProfanityFilter | Boolean | Indicates whether to remove profanity from the transcript. |
| nvidiaWordTimeOffsets | Boolean | Indicates whether to provide word-level detail. |
| nvidiaVerbatimTranscripts | Boolean | Indicates whether to provide verbatim transcripts. |
| nvidiaCustomConfiguration | Object | An object of key-value pairs that can be sent to Nvidia for custom configuration. |
| nvidiaPunctuation | Boolean | Indicates whether to provide punctuation in the transcripts. |
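
A minimal sketch of an Nvidia Riva ASR configuration; the gRPC endpoint is a placeholder and all values are illustrative:

{
  "sttProvider": "nvidia",
  "sttLanguage": "en-US",
  "nvidiaRivaUri": "10.0.0.12:50051",
  "nvidiaMaxAlternatives": 1,
  "nvidiaPunctuation": true,
  "nvidiaProfanityFilter": false
}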

Deepgram ASR

| Parameter | Type | Description |
|---|---|---|
| deepgramApiKey | String | Deepgram API key to authenticate with (overrides the setting in the Kore VG portal). |
| deepgramTier | String | Deepgram tier you would like to use ('enhanced', 'base'). |
| sttModel | String | Deepgram model used to process the submitted audio ('general', 'meeting', 'phonecall', 'voicemail', 'finance', 'conversationalai', 'video', 'custom'). Example: nova-2-phonecall |
| deepgramCustomModel | String | Id of the custom model. |
| deepgramVersion | String | Deepgram version of the model used. |
| deepgramPunctuate | Boolean | Indicates whether to add punctuation and capitalization to the transcript. |
| deepgramProfanityFilter | Boolean | Indicates whether to remove profanity from the transcript. |
| deepgramRedact | String (enum: 'pci', 'numbers', 'true', 'ssn') | Whether to redact information from transcripts. |
| deepgramDiarize | Boolean | Whether to assign a speaker to each word in the transcript. |
| deepgramDiarizeVersion | String | If set to '2021-07-14.0', the legacy diarization feature is used. |
| deepgramNer | Boolean | |
| deepgramMultichannel | Boolean | Indicates whether to transcribe each audio channel independently. |
| deepgramAlternatives | Number | The number of alternative transcripts to return. |
| deepgramNumerals | Boolean | Indicates whether to convert numbers from written format (for example, one) to numerical format (for example, 1). |
| deepgramSearch | Array | An array of terms or phrases to search for in the submitted audio. |
| deepgramReplace | Array | An array of terms or phrases to search for in the submitted audio and replace. |
| deepgramKeywords | Array | An array of keywords to which the model should pay particular attention, boosting or suppressing them to help it understand the context. |
| deepgramEndpointing | Boolean or Number | The number of milliseconds of silence Deepgram uses to determine whether a speaker has finished saying a word or phrase. The value must be either a number of milliseconds or 'false' to disable the feature entirely. Note: Deepgram's default endpointing value is 10 milliseconds. You can set a higher value to require more silence before a final transcript is returned, but a value of 1000 (one second) or less is suggested, as strange behaviors have been observed with higher values. If you wish to allow more time for pauses during a conversation before returning a transcript, use the deepgramUtteranceEndMs parameter instead. |
| deepgramVadTurnoff | Number | |
| deepgramTag | String | A tag to associate with the request. Tags appear in usage reports. |
| deepgramUtteranceEndMs | Number | Configures the ASR to detect the end of speech in live-streaming audio. |
| deepgramShortUtterance | Boolean | Causes a transcript to be returned as soon as the Deepgram is_final property is set. This should only be used in scenarios where you are expecting a very short confirmation or a directed command and you want minimal latency. |
| deepgramSmartFormatting | Boolean | Indicates whether to enable Deepgram's Smart Format feature, which applies additional formatting to transcripts to optimize them for human readability. Smart Format capabilities vary between models. When Smart Format is turned on, Deepgram always applies the best-available formatting for your chosen combination of model, model option, and language. |
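
A minimal sketch of a Deepgram ASR configuration using a few of the parameters above; the model name and values are illustrative:

{
  "sttProvider": "deepgram",
  "sttLanguage": "en-US",
  "sttModel": "nova-2-phonecall",
  "deepgramPunctuate": true,
  "deepgramSmartFormatting": true,
  "deepgramEndpointing": 500,
  "deepgramKeywords": ["benign", "malignant", "biopsy"]
}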

Common TTS Parameters

| Parameter | Type | Supported STT/TTS | Description | Examples |
|---|---|---|---|---|
| disableTtsCache | Boolean | All | Controls whether cached audio is reused when the same statement or word is found, instead of calling the TTS engine again. | |
| ttsEnhancedVoice | String | AWS | Amazon Polly has four voice engines that convert input text into lifelike speech: Generative, Long-form, Neural, and Standard. Use this parameter to select the engine for an Amazon Polly voice. | "standard", "neural", "generative", "long-form" |
| ttsGender | String | Google | MALE, FEMALE, NEUTRAL | |
| ttsLoop | Number / String | All | Controls the repeated playback of a TTS-generated message. When ttsLoop is enabled, the specified TTS message is played multiple times in a loop, which is useful when you want to ensure the message is heard clearly or when the user might need more time to process the information. | ttsLoop = 2 — the text is played twice |
| earlyMedia | Boolean | All | Controls the playback of audio prompts or messages before the call is fully connected. This feature is typically employed in telecommunication systems, allowing messages to be played while the call is still in the "early" phase, that is, before the recipient answers the call. | |
| ttsOptions | Object | PlayHT, Deepgram, ElevenLabs, Whisper | Used to fine-tune the TTS (see TTS Options in Kore VG below). | |
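
A minimal sketch combining the common TTS parameters above with an AWS (Polly) voice; the voice name and values are illustrative only:

{
  "ttsProvider": "aws",
  "ttsLanguage": "en-US",
  "voiceName": "Joanna",
  "ttsEnhancedVoice": "neural",
  "ttsLoop": 2,
  "earlyMedia": false
}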

TTS Options in Kore VG

Kore VG now supports a ttsOptions parameter that allows bot developers to customize Text-to-Speech (TTS) messages by passing dynamic objects tailored to the specific TTS provider. Depending on the provider, these options can be used to fine-tune aspects like voice settings, speed, and other properties.

Note

Each TTS provider will have its own set of customizable parameters. For more detailed information on the parameters they support, refer to their official websites.

Structure of ttsOptions

The ttsOptions object contains provider-specific settings in a key-value format. Below are examples of different TTS providers:

ElevenLabs

  • optimize_streaming_latency: Adjusts the latency during streaming.
  • voice_settings: Includes various voice customization options like stability, similarity_boost, and use_speaker_boost. Learn more.
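
A sketch of ttsOptions for ElevenLabs built only from the options listed above; the values are illustrative and should be verified against ElevenLabs' documentation:

ttsOptions = {
   "optimize_streaming_latency": 3,
   "voice_settings": {
      "stability": 0.5,
      "similarity_boost": 0.75,
      "use_speaker_boost": true
   }
}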

PlayHT

  • quality: Sets the quality of the audio output.
  • speed: Controls the playback speed.
  • emotion, voice_guidance, style_guidance, and text_guidance: Allow further customization of the voice's emotional tone and style. Learn more.
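
A sketch of ttsOptions for PlayHT using the options listed above; the values are illustrative and should be verified against PlayHT's documentation:

ttsOptions = {
   "quality": "high",
   "speed": 1.0,
   "emotion": "female_happy",
   "voice_guidance": 3,
   "style_guidance": 20,
   "text_guidance": 1.5
}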

Deepgram

Apart from generic parameters like ttsLanguage and voiceName, which are common across most TTS engines, Deepgram offers a few additional parameters that enhance customization:

  • encoding (string): You can specify the desired encoding format for the output audio file, such as mp3 or wav.
  • model (enum): Defines the AI model to be used for synthesizing the text into speech. The default model is aura-asteria-en, optimized for natural-sounding English voice output.
  • sample_rate (string): This enables you to set the sample rate of the audio output, offering control over the quality and clarity of the sound produced.
  • container: Allows you to specify the desired file format wrapper for the output audio generated through text-to-speech synthesis.

These parameters provide additional flexibility for developers to fine-tune the audio output to meet their specific needs. All these parameters will be set inside ttsOptions. Learn more.
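
Putting these together, a ttsOptions sketch for Deepgram might look like the following; the encoding, sample rate, and container combination is illustrative and should be verified against Deepgram's documentation:

ttsOptions = {
   "model": "aura-asteria-en",
   "encoding": "linear16",
   "sample_rate": "16000",
   "container": "wav"
}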

AWS

Apart from generic parameters like ttsLanguage and voiceName, which are common across most TTS engines, AWS offers a few additional parameters that enhance customization, such as ttsEnhancedVoice, also known as the engine.

Amazon Polly has four voice engines that convert input text into lifelike speech: "standard", "neural", "generative", and "long-form".

ttsEnhancedVoice = “neural”

Open AI (Whisper)

Apart from generic parameters like ttsLanguage and voiceName, which are common across most TTS engines, Whisper offers a few additional parameters that enhance customization, such as model.

For real-time applications, the standard tts-1 model provides the lowest latency but at a lower quality than the tts-1-hd model. Due to how the audio is generated, tts-1 is likely to generate more static content in certain situations than tts-1-hd. In some cases, the audio may not have noticeable differences depending on your listening device and the person.

ttsOptions = {
   "model": "tts-1"
}

Primary and Fallback ASR/TTS

ASR/TTS fallback functionality can be implemented at various levels within the system, such as the application level, the experience flow level, or the call control parameter level. This mechanism ensures that if there is an error or failure with the primary ASR (Automatic Speech Recognition) or TTS (Text-to-Speech) service, the system automatically switches to a secondary, or fallback, ASR/TTS configuration. The fallback prevents interruptions in the service and ensures a seamless user experience, regardless of issues with the primary configuration. For optimal performance, it is advised to configure the fallback with the same vendor in a different region/label.

Configure Primary and Fallback ASR/TTS

Location 1 - Global Setting

In SmartAssist: Configurations > System Setup > Language & Speech > Voice Preferences > Show Advanced Settings.

Location 2 - Call Control Parameters

In SmartAssist: Automation > Select bot > Conversational Skills > Dialog Tasks > Select Dialog Task > Select the node you want to configure > IVR Properties > Advanced Controls > Call Control Parameters.

Location 3 - Experience Flows

In SmartAssist: Configurations > Experience Flows > Update/New Experience Flow > Speech Recognition Engine (ASR/TTS) > Show Advanced Settings.

Location 4 - Start Node in Experience Flow

Note

  • This feature is available only in ‘SmartAssist’ and is not yet implemented in ‘XO11’. It will be implemented in upcoming releases.
  • For now, you can add Primary & Fallback ASR/TTS from the same vendor only.
    • Example: If you have selected the ‘Microsoft Azure Speech Services’ vendor as the ASR, you can enter a label name from the Microsoft vendor itself, such as ‘my_azure-US’.
    • You can configure the label name in the Primary ASR/TTS configuration and the Fallback ASR/TTS configuration under Show Advanced Settings.
    • The Fallback ASR/TTS configuration should not be the same as the Primary ASR/TTS configuration.
    • Both the Primary and Fallback ASR/TTS configurations should be available in SAVG Speech Services; otherwise, you will not be able to configure them in SmartAssist.
    • The Credential Status of the speech services configured in SAVG should be verified. If the credential status is Failed, ASR/TTS conversations will fail.
  • In call control parameters:
    • You can configure the fallback for different vendors, but for optimal performance it is advised to configure the fallback with the same vendor in a different region.
    • Call control parameters do not validate duplicate values for the Primary and Fallback configurations, so pay close attention to spelling mistakes (see the sketch after this list). Learn more.
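
A minimal sketch of a call-control-level fallback configuration, assuming the Microsoft vendor and illustrative label names; a fuller example appears under Labels and Fallback Providers below:

{
  "sttProvider": "microsoft",
  "sttLanguage": "en-US",
  "sttLabel": "my_azure-US",
  "sttFallbackProvider": "microsoft",
  "sttFallbackLanguage": "en-US",
  "sttFallbackLabel": "my_azure_Europe",
  "ttsProvider": "microsoft",
  "ttsLanguage": "en-US",
  "voiceName": "en-US-AmberNeural",
  "ttsLabel": "my_azure-US",
  "ttsFallbackProvider": "microsoft",
  "ttsFallbackLanguage": "en-US",
  "ttsFallbackLabel": "my_azure_Europe"
}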

Voice Gateway Properties

Provider related parameters

Speech-to-text and text-to-speech services interface with the user in a selected language (for example, English US, English UK, or German). Text-to-speech services also use a selected voice (for example, female or male) to speak to the user. Speech-to-text is used by the recognizer and text-to-speech by the synthesizer: sttProvider (google, microsoft) selects the recognizer, and ttsProvider (google, microsoft, aws) selects the synthesizer. For example:

{
   "sttProvider": "google",
   "sttLanguage": "en-IN",
   "ttsProvider": "google",
   "ttsLanguage": "en-IN",
   "voiceName": "en-IN-Wavenet-A"
}

To apply the parameters below, the STT engine must always be used as the recognizer; otherwise, the default that was set at the bot level or in the Kore VG/SmartAssist application is applied.

Note: Provider properties are applied at the session level.

| Parameter | Type | Supported STT/TTS | Description | Examples |
|---|---|---|---|---|
| sttProvider | String | All | Sets the speech-to-text engine. At any stage of the call, the bot can dynamically change the speech provider (speech-to-text or text-to-speech) of the call. The provider change can be made for the entire call duration (starting from the current text/audio that is played by the bot). | sttProvider = "google" |
| sttLanguage | String | All | Sets the STT language for recognizing the user's voice; the transcript is returned according to sttLanguage. Defines the language of the bot conversation (for example, "en-ZA" for South African English) used by the speech-to-text service. | sttLanguage = "zh-CN" — all transcripts are returned in Chinese. sttLanguage = "en-US" |
| ttsProvider | String | All | Similar to sttProvider. | ttsProvider = "microsoft" |
| ttsLanguage | String | All | Similar to sttLanguage; required to set the TTS language. | ttsLanguage = "en-US" |
| voiceName | String | All | Mandatory for text-to-speech conversion and used only for the bot's TTS responses. The voice name must correctly align with ttsLanguage. | { "ttsProvider": "microsoft", "ttsLanguage": "en-AU", "voiceName": "en-AU-NatashaNeural" } |
| enableSpeechInput | Boolean | All | If false, only DTMF input is allowed. Defaults to true and can be used in Entity nodes. Do not use this in the channel override script; it is meant to be used only through the call control parameters. | enableSpeechInput = false |

Labels and Fallback Providers

Label: Assign/create a label only if you need to create multiple speech services from the same vendor. Then, use the label in your application to specify which service to use.

To configure a label:

1) Add a speech service inside the Speech tab.
2) Select a provider and add a label with a unique name.
3) Use the same label in the call control parameters.
4) At the node where you use the fallback call control parameters, the primary recognizer and synthesizer must also be passed.

Example: sttProvider = "google", sttLanguage = "en-US", sttLabel = "google-stt-2"

STT example:
"sttProvider": "microsoft",
"sttLabel": "my_azure-US",
"sttLanguage": "en-US"

TTS example:
"ttsProvider": "microsoft",
"ttsLanguage": "en-US",
"voiceName": "en-US-AmberNeural",
"ttsLabel": "my_azure-US"

Fallback example:
"sttProvider": "microsoft", "sttLabel": "my_azure-US", "sttLanguage": "en-US",
"ttsProvider": "microsoft", "ttsLanguage": "en-US", "voiceName": "en-US-AmberNeural", "ttsLabel": "my_azure-US",
"sttFallbackProvider": "microsoft", "sttFallbackLanguage": "en-US", "sttFallbackLabel": "my_azure_Europe",
"ttsFallbackProvider": "microsoft", "ttsFallbackLanguage": "en-US", "ttsFallbackLabel": "my_azure_Europe", "ttsFallbackVoiceName": "en-US-AmberNeural"

Note: At the node where you use the fallback call control parameters, the primary recognizer and synthesizer must also be passed. The best practice is to keep the same ASR engine in the fallback with a different label. If the current provider fails, Kore VG picks the fallback provider. Similarly, a fallback can be added for the TTS provider.
Note: Fallback properties will be applied at the session level.
| Parameter | Type | Description |
|---|---|---|
| sttLabel | String | Uniquely identifies the ASR engine in Kore VG. |
| sttFallbackLabel | String | If fallback is enabled in Kore VG at the application level, then in case of any error the ASR switches to the fallback configuration. It is recommended to have a fallback to the same vendor in a different region. |
| sttFallbackProvider | String | Fallback provider details. |
| sttFallbackLanguage | String | Fallback language details. |
| ttsLabel | String | Uniquely identifies the TTS engine in Kore VG. |
| ttsFallbackLabel | String | Fallback label details. |
| ttsFallbackProvider | String | Fallback provider details. |
| ttsFallbackLanguage | String | Fallback language details. |
| ttsFallbackVoice | String | Fallback voice details. |

Continuous ASR

Continuous ASR (automatic speech recognition) is a feature that allows speech-to-text (STT) recognition to be tuned for the collection of things like phone numbers, customer identifiers, and other strings of digits or characters, which, when spoken, often have pauses between utterances.
Note: For Microsoft only, Azure provides an ASR property that works the same way as Continuous ASR: azureSpeechSegmentationSilenceTimeoutMs (see Microsoft ASR above). Because silence is detected directly by the ASR engine instead of the Voice Gateway detecting and merging the responses, it is more accurate than Continuous ASR. Learn more.
Note: Continuous ASR / azureSpeechSegmentationSilenceTimeoutMs is applied at the session level. It remains active throughout the call, and the developer can adjust the value at different nodes based on the requirement.

| Parameter | Type | Supported STT/TTS | Description | Examples |
|---|---|---|---|---|
| continuousASRTimeoutInMS | Number (ms) | All | The duration of silence, in milliseconds, to wait after a transcript is received from the STT vendor before returning the result. If another transcript is received before this timeout elapses, the transcripts are combined and recognition continues. The combined transcripts are returned once the timeout between utterances exceeds this value. | continuousASRTimeoutInMS = 5000 for 5 seconds |
| continuousASRDigits | DTMF digit (for example, * or #) | All | A DTMF key which, if entered, also terminates the gather operation and immediately returns the collected results. | continuousASRDigits = "#" |
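
For example, a sketch of Continuous ASR tuning through call control parameters (values are illustrative):

{
  "continuousASRTimeoutInMS": 3000,
  "continuousASRDigits": "#"
}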

Barge-IN

The Barge-In feature controls Kore VG behavior in scenarios where the user starts speaking or dials DTMF digits while the bot is playing its response to the user. In other words, the user interrupts ("barges-in") the bot.
Note: Barge-in is applied at the node level.

| Parameter | Type | Supported STT/TTS | Description |
|---|---|---|---|
| listenDuringPrompt | Boolean (true or false) | All | If false, does not listen for user speech until the bot has finished playing its response to the user. Defaults to true. Similar to barge-in. |
| bargeInMinWordCount | Number | All | If barge-in is true, the prompt is only killed when this many words are spoken. Defaults to 1. |
| bargeInOnDTMF | Boolean | All | If true, DTMF input is enabled and audio playback is killed when the caller enters DTMF; the caller can then speak their utterance. |
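
For example, a sketch of barge-in tuning through call control parameters (values are illustrative):

{
  "listenDuringPrompt": true,
  "bargeInMinWordCount": 2,
  "bargeInOnDTMF": true
}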

Timeout related parameters

Note: All timeout parameters are applied at the node level.

| Parameter | Type | Supported STT/TTS | Description | Examples |
|---|---|---|---|---|
| userNoInputTimeoutMS | Number (ms; 1 sec = 1000) | All | Defines the maximum time (in milliseconds) that Kore VG waits for input from the user. If userNoInputTimeoutMS = 0, Kore VG waits indefinitely for user input. | userNoInputTimeoutMS = 20000 |
| dtmfCollectInterDigitTimeoutMS | Number (ms) | All | Defines the timeout that Kore VG waits for the user to press another digit before it sends all the digits to the bot. | |
| dtmfCollectSubmitDigit | Number | All | Defines a special DTMF "submit" digit that, when received from the user, causes Kore VG to immediately send all the collected digits to the bot (as a DTMF message), without waiting for the timeout to expire or for the maximum number of expected digits. | |
| dtmfCollectMaxDigits | Number | All | The maximum number of DTMF digits expected to gather. | If dtmfCollectMaxDigits = 5 and the user enters 1234567, the bot takes only 12345. |
| dtmfCollectminDigits | Number | All | The minimum number of DTMF digits expected to gather. Defaults to 1. | |
| dtmfCollectnumDigits | Number | All | The exact number of DTMF digits expected to gather. | |
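
For example, a sketch of timeout and DTMF collection tuning through call control parameters (values are illustrative):

{
  "userNoInputTimeoutMS": 20000,
  "dtmfCollectInterDigitTimeoutMS": 3000,
  "dtmfCollectminDigits": 4,
  "dtmfCollectMaxDigits": 10
}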