VAD Parameters
Voice Activity Detection (VAD)
The [VADParams] section of LV_Platform.ini controls how the Speech Engine deals with voice activity.
LV_Platform.ini is a text file that allows speech application developers to fine-tune how their
application responds to incoming audio.
You can load LV_Platform.ini by going to View > Settings and clicking Voice Activity Detection Settings.
Altering the values of these parameters can be an important part of tuning a speech recognition application.
Most of the parameters are very application-specific, so optimal values cannot be universally prescribed.
Speech application developers should expect to experiment with different values to find the ideal settings
for their applications.
We recommend the following settings for telephony applications:
Parameter         | Recommended Value |
EOSDelay          | 1800 (ms)         |
WindBackTime      | 256 (ms)          |
NoiseSensitivity  | 50                |
VolumeSensitivity | 50                |
All time values are in milliseconds. All other values are internal values and do not correspond to any
real-world units.
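As a rough illustration of how the recommended telephony values might be read and checked programmatically, here is a minimal Python sketch. The `key = value` layout of the `[VADParams]` section is an assumption made for this example and is not confirmed by this page; `SAMPLE_INI` and `load_vad_params` are hypothetical names.

```python
import configparser

# Hypothetical file contents: the "key = value" layout is an assumption.
# The values are the telephony recommendations from the table above.
SAMPLE_INI = """
[VADParams]
EOSDelay = 1800
WindBackTime = 256
NoiseSensitivity = 50
VolumeSensitivity = 50
"""


def load_vad_params(text):
    """Parse the [VADParams] section into a dict of integers."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(text)
    return {key: int(value) for key, value in parser["VADParams"].items()}


vad = load_vad_params(SAMPLE_INI)
print(vad)
```

A script like this can be handy for comparing the settings of several test deployments while experimenting with values.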
The most commonly changed parameters are VolumeSensitivity, for dealing with poor echo cancellation,
and EOSDelay, to allow callers to perform tasks such as reading digits. You may also find it useful to enable
prompt normalization (EnableNorm) to combat barge-in problems.
The following parameters can be set:
EOSDelay
The amount of time, in milliseconds, between when a caller stops speaking and when the Engine
detects end of speech. Shorter values allow faster responses. Longer values help prevent the system
from cutting off callers who pause for a long time between words, for example while reading digit
strings.
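The trade-off can be illustrated with a small simulation. This is only a sketch of the general idea (end of speech is declared once silence has lasted for the configured delay), not the Speech Engine's actual detection logic. The function name, the 100 ms frame size, and the per-frame speech/silence labels are all assumptions made for the example.

```python
def find_end_of_speech(frames, frame_ms, eos_delay_ms):
    """Return the index of the frame where end of speech is declared, or None.

    frames: sequence of booleans, True meaning the frame contains speech.
    """
    silence_ms = 0
    heard_speech = False
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silence_ms = 0  # any speech resets the silence timer
        elif heard_speech:
            silence_ms += frame_ms
            if silence_ms >= eos_delay_ms:
                return i
    return None


# A caller says some digits, pauses for 1000 ms, says more digits, then stops.
frames = [True] * 5 + [False] * 10 + [True] * 5 + [False] * 20

long_delay = find_end_of_speech(frames, frame_ms=100, eos_delay_ms=1800)
short_delay = find_end_of_speech(frames, frame_ms=100, eos_delay_ms=800)
print(long_delay)   # 37: end of speech is declared after the final digits
print(short_delay)  # 12: the caller is cut off during the mid-utterance pause
```

With the 1800 ms delay the 1000 ms pause is tolerated, at the cost of waiting longer after the caller has actually finished.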
WindBackTime
The length of time, in milliseconds, that the Speech Engine should wind back the audio after the
detection of speech. This helps when the first few sounds a caller makes are not loud enough or
distinct enough to trigger voice detection.
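To make the arithmetic concrete, the sketch below converts a wind-back time into a number of audio samples and computes where the recognized audio would start. The 8000 Hz sample rate (typical for telephony) and the function name are assumptions for illustration. How the Engine buffers audio internally is not described here.

```python
def windback_start(detect_index, sample_rate_hz, windback_ms):
    """Return the sample index where audio should start after winding back."""
    samples_back = sample_rate_hz * windback_ms // 1000
    # Never wind back past the beginning of the buffered audio.
    return max(0, detect_index - samples_back)


# At 8000 Hz, 256 ms corresponds to 2048 samples.
start_mid_stream = windback_start(10000, sample_rate_hz=8000, windback_ms=256)
start_near_beginning = windback_start(1000, sample_rate_hz=8000, windback_ms=256)
print(start_mid_stream)      # 7952
print(start_near_beginning)  # 0
```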
NoiseSensitivity
Determines how much louder the speaker must be than the background noise in order to trigger
barge-in. The smaller this value, the easier it will be to trigger barge-in.
VolumeSensitivity
The volume required to trigger barge-in. The smaller the value, the more sensitive barge-in will be.
This is primarily used to deal with poor echo cancellation. By setting this value higher (less sensitive),
prompts that are not properly cancelled will be less likely to falsely trigger barge-in.
Sensitivity
This value controls both NoiseSensitivity and VolumeSensitivity.
EnableNorm
Turning this option on normalizes the prompts that are played to callers (equalizes their volume). This
is particularly useful when echo cancellation is poor; prompts that are too loud can sometimes trigger
barge-in if the prompt audio echoes back into the system.
Scale
This controls the level prompts are played at if prompt normalization is enabled. If the scale is too high,
prompts can trigger barge-in. It is set on a scale of 1 to 100, with 100 being the loudest.
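The effect of normalization and Scale can be pictured with a simple peak-normalization sketch. This is an assumption-laden illustration: the Engine's actual normalization method is not described here, and the mapping of Scale to a percentage of full-scale 16-bit amplitude, along with the function name, is hypothetical.

```python
def normalize_prompt(samples, scale):
    """Peak-normalize 16-bit samples so the loudest peak sits at scale % of full scale.

    Assumption: Scale maps linearly to a percentage of the 16-bit maximum (32767).
    """
    if not 1 <= scale <= 100:
        raise ValueError("scale must be between 1 and 100")
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence: nothing to scale
    gain = (32767 * scale / 100) / peak
    return [int(round(s * gain)) for s in samples]


quiet_prompt = [100, -200, 50]
loud_prompt = [10000, -30000, 5000]

# After normalization, both prompts peak at roughly the same level.
quiet_out = normalize_prompt(quiet_prompt, scale=50)
loud_out = normalize_prompt(loud_prompt, scale=50)
print(max(abs(s) for s in quiet_out), max(abs(s) for s in loud_out))
```

Under this model, lowering the scale reduces the peak level of every prompt equally, which is why a lower value would make echoed prompt audio less likely to trigger barge-in.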