VAD Parameters

Voice Activity Detection (VAD)

The [VADParams] section controls how the Speech Engine handles voice activity. It is part of a plain-text configuration file that lets speech application developers fine-tune how their application responds to incoming audio.

You can load LV_Platform.ini by going to View > Settings and clicking Voice Activity Detection Settings.

Altering the values of these parameters can be an important part of tuning a speech recognition application. Most of the parameters are application-specific, and optimal values cannot be universally prescribed. In other words, speech application developers will need to experiment with many of them to find the ideal values for their applications.

We recommend the following settings for telephony applications:

Parameter           Recommended Value
EOSDelay            1800 (ms)
WindBackTime        256 (ms)
NoiseSensitivity    50
VolumeSensitivity   50

All time values are in milliseconds. All other values are internal values and do not correspond to any real-world units.
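
As an illustration, the recommended telephony values from the table above could be written out as a [VADParams] section. The sketch below uses Python's standard configparser module. The "name = value" layout and the helper name build_vad_section are assumptions made for this example, so check an actual LV_Platform.ini for the exact syntax before copying anything.

```python
import configparser
import io

# Recommended telephony values from the table above.
RECOMMENDED = {
    "EOSDelay": "1800",         # milliseconds
    "WindBackTime": "256",      # milliseconds
    "NoiseSensitivity": "50",   # internal units
    "VolumeSensitivity": "50",  # internal units
}


def build_vad_section(values):
    """Return INI text for a [VADParams] section (the layout is an assumption)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the parameter names' original casing
    parser["VADParams"] = values
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


ini_text = build_vad_section(RECOMMENDED)
print(ini_text)
```

Generating the section from a dictionary keeps the values in one place, which makes it easier to compare tuning experiments against the recommended starting point.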

The most commonly changed parameters are VolumeSensitivity, for dealing with poor echo cancellation, and EOSDelay, for giving callers enough time to perform tasks such as reading digits. You may also find it useful to enable prompt normalization (EnableNorm) to combat barge-in problems.

The following parameters can be set:

EOSDelay

The amount of time, in milliseconds, between when a caller stops speaking and when the Engine detects end of speech. Shorter values allow faster responses. Longer values help prevent the system from cutting callers off when they pause between words, for example while reading digit strings.
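
The trade-off can be seen in a small simulation. This is a toy model, not the Engine's actual algorithm: it assumes audio arrives as fixed-length frames already labeled as speech or silence, and it declares end of speech once the silence following speech has lasted at least the configured delay. The function name end_of_speech_ms and the 100 ms frame size are hypothetical.

```python
def end_of_speech_ms(frames, frame_ms, eos_delay_ms):
    """Return the time (ms) at which end of speech is declared, or None.

    frames: sequence of booleans, True where a frame contains speech.
    Toy model: end of speech is declared once the silence that follows
    speech has lasted at least eos_delay_ms.
    """
    heard_speech = False
    silence_ms = 0
    for index, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= eos_delay_ms:
                return (index + 1) * frame_ms
    return None


# A caller reading digits: speech, a 1000 ms pause, more speech, then silence.
FRAME_MS = 100
frames = [True] * 5 + [False] * 10 + [True] * 5 + [False] * 20

short_delay = end_of_speech_ms(frames, FRAME_MS, 800)   # 1300: cut off mid-pause
long_delay = end_of_speech_ms(frames, FRAME_MS, 1800)   # 3800: waits for the real end
print(short_delay, long_delay)
```

With the 800 ms delay the caller is cut off during the pause between digits. With 1800 ms the whole utterance is captured, at the cost of a slower response after the caller finishes.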

WindBackTime

The length of time, in milliseconds, that the Speech Engine should wind back the audio after speech is detected. This helps in cases where the first few sounds a caller makes are not loud enough or distinct enough to trigger voice detection.
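
The arithmetic behind winding back can be sketched as follows. This is an illustration only: the function name wound_back_start and the 8000 Hz sample rate (typical for telephony) are assumptions, and the Engine's internal buffering is not described here.

```python
def wound_back_start(detect_ms, wind_back_ms, sample_rate_hz=8000):
    """Return (start_ms, start_sample) for the audio kept after wind-back.

    Toy model: the start point moves wind_back_ms earlier than the moment
    speech was detected, but never before the beginning of the audio.
    """
    start_ms = max(0, detect_ms - wind_back_ms)
    start_sample = start_ms * sample_rate_hz // 1000
    return start_ms, start_sample


# Speech detected 1000 ms into the audio, with the recommended 256 ms wind-back.
print(wound_back_start(1000, 256))  # (744, 5952)

# Detection very early in the audio: the start is clamped to zero.
print(wound_back_start(100, 256))   # (0, 0)
```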

NoiseSensitivity

Determines how much louder the speaker must be than the background noise in order to trigger barge-in. The smaller this value, the easier it is to trigger barge-in.

VolumeSensitivity

The volume required to trigger barge-in. The smaller the value, the more sensitive barge-in will be. This is primarily used to deal with poor echo cancellation. When this value is set higher (less sensitive), prompts that are not properly cancelled are less likely to falsely trigger barge-in.
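
To show how the two thresholds interact, here is a deliberately simplified model. It is not the Engine's real algorithm: the function triggers_barge_in is hypothetical, all levels are arbitrary numbers on the same scale as the settings, and the real values are internal units with no real-world equivalent.

```python
def triggers_barge_in(level, noise_floor, noise_sensitivity, volume_sensitivity):
    """Toy barge-in check in arbitrary units (not the Engine's real algorithm).

    The incoming audio must be at least volume_sensitivity loud, and it
    must exceed the background noise by at least noise_sensitivity.
    """
    loud_enough = level >= volume_sensitivity
    above_noise = (level - noise_floor) >= noise_sensitivity
    return loud_enough and above_noise


# Poorly cancelled prompt echo at level 55, quiet line (noise floor 0).
echo_default = triggers_barge_in(55, 0, noise_sensitivity=50, volume_sensitivity=50)
echo_raised = triggers_barge_in(55, 0, noise_sensitivity=50, volume_sensitivity=70)

# A caller actually speaking at level 80 still triggers barge-in.
caller_raised = triggers_barge_in(80, 0, noise_sensitivity=50, volume_sensitivity=70)

print(echo_default, echo_raised, caller_raised)  # True False True
```

In this model, raising VolumeSensitivity from 50 to 70 stops the echo from triggering barge-in while a caller who speaks clearly above that level is still detected.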

Sensitivity

This value controls both NoiseSensitivity and VolumeSensitivity.

EnableNorm

Turning this option on normalizes (equalizes the volume of) the prompts that are played to callers. This is particularly useful for echo cancellation; prompts that are too loud can sometimes trigger barge-in if the prompt audio echoes back into the system.

Scale

This controls the level at which prompts are played when prompt normalization is enabled. If the scale is too high, prompts can trigger barge-in. It is set on a scale of 1-100, with 100 being the loudest.
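
One way to picture the effect of Scale is simple peak normalization, sketched below. This is a stand-in technique chosen for illustration, since the method the Engine actually uses is not described here. The function normalize_prompt and the assumption of 16-bit samples are hypothetical.

```python
def normalize_prompt(samples, scale, full_scale=32767):
    """Peak-normalize 16-bit samples so the loudest one sits at scale percent.

    Peak normalization is an assumption made for this sketch; it is not
    necessarily how the Engine normalizes prompts.
    """
    if not 1 <= scale <= 100:
        raise ValueError("Scale must be between 1 and 100")
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    target = full_scale * scale / 100
    return [round(s * target / peak) for s in samples]


quiet_prompt = [1000, -2000, 500]
loudest = normalize_prompt(quiet_prompt, 100)  # peak raised to full scale
softer = normalize_prompt(quiet_prompt, 50)    # peak at about half of full scale
print(loudest, softer)
```

A lower Scale leaves more headroom between the prompt and the barge-in thresholds, which is why an overly high value can cause prompts to trigger barge-in.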

© 2012 LumenVox LLC. All rights reserved.