The LumenVox Speech Engine on Asterisk is configured primarily through the lumenvox.conf file located in /etc/asterisk/. The file is divided into several sections.
The [general] section contains a few general values:
servers
Specifies the Speech Engine servers that should perform decodes. By default, this is set to 127.0.0.1, the loopback address of the local machine.
save_sound_files
When enabled, saves response files (.callsre files) in the Engine installation directory under /Lang/Responses/, in subfolders organized by date. These files contain the audio of each call along with detailed recognition information for use with the LumenVox Speech Tuner.
Note: In order for these files to be written, the user running Asterisk must have permission to write to the Responses directory.
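A minimal sketch of a [general] section using the two settings described above. The value format for save_sound_files is an assumption (a boolean-style 1 is shown here); check the sample lumenvox.conf shipped with your installation for the values it actually accepts.

```ini
[general]
; Speech Engine server that performs decodes (the default described above)
servers=127.0.0.1
; Assumption: a boolean-style value turns on saving of .callsre response files
save_sound_files=1
```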
The [grammars] section contains a list of grammars to load when starting the res_speech_lumenvox.so module. Each line should be in the following format:
grammarname=path
The grammarname is the identifier you will use to refer to the grammar within your Asterisk speech applications. The path is the full path to the grammar file.
Grammars do not have to be listed in this file; individual applications can also load and unload them as needed.
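As an illustration, a [grammars] section that preloads a single grammar might look like the following. The identifier yesno and the file path are hypothetical placeholders.

```ini
[grammars]
; grammarname=path (both values are hypothetical)
yesno=/etc/asterisk/grammars/yesno.gram
```

Alternatively, an application can load and unload the grammar itself. The sketch below assumes the standard Asterisk speech dialplan applications (SpeechCreate, SpeechLoadGrammar, SpeechUnloadGrammar, SpeechDestroy) and the comma argument separator used by Asterisk 1.6 and later; the context name is hypothetical.

```ini
[grammar-demo]
exten => s,1,Answer()
; Create the speech object for this channel
exten => s,n,SpeechCreate()
; Load the grammar under the name "yesno" for this application only
exten => s,n,SpeechLoadGrammar(yesno,/etc/asterisk/grammars/yesno.gram)
; ... activate the grammar and run recognition here ...
exten => s,n,SpeechUnloadGrammar(yesno)
exten => s,n,SpeechDestroy()
exten => s,n,Hangup()
```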
The [default] section contains the default values used for voice activity detection (VAD), which is how the Speech Engine distinguishes a caller's speech from background noise.
By adding further sections to the file, you can define several different profiles and switch between them quickly within your Dial Plan using the SPEECH_ENGINE Dial Plan function. For instance, if you had a profile named [custom], you could set all of the VAD values to the ones in [custom] with Set(SPEECH_ENGINE(profile)=custom).
You may also adjust any individual parameter in the same way, e.g. Set(SPEECH_ENGINE(vad_bargein_level)=60).
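The profile switch described above could be used in extensions.conf roughly as follows. This is a sketch, not a tested Dial Plan: the context name, the grammar name yesno (assumed to be loaded already), the prompt, and the 1200 ms value are all hypothetical, and the comma argument separator assumes Asterisk 1.6 or later.

```ini
[profile-demo]
exten => s,1,Answer()
; A speech object must exist before SPEECH_ENGINE can be set
exten => s,n,SpeechCreate()
; Switch all VAD values to the ones defined in the [custom] profile
exten => s,n,Set(SPEECH_ENGINE(profile)=custom)
; Override one parameter on top of the profile (illustrative value)
exten => s,n,Set(SPEECH_ENGINE(vad_eos_delay)=1200)
exten => s,n,SpeechActivateGrammar(yesno)
; Play a prompt and listen for speech, waiting up to 10 seconds
exten => s,n,SpeechBackground(beep,10)
exten => s,n,NoOp(Heard ${SPEECH_TEXT(0)} with score ${SPEECH_SCORE(0)})
exten => s,n,SpeechDeactivateGrammar(yesno)
exten => s,n,SpeechDestroy()
exten => s,n,Hangup()
```

Settings applied with SPEECH_ENGINE belong to the speech object created by SpeechCreate, so they need to be set again for each new speech object.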
The variables are as follows:
vad_snr_sensitivity
Determines how much louder the speaker must be than the background noise in order to trigger barge-in. The smaller this value, the easier it will be to trigger barge-in. This is set on a scale of 1-100.
vad_volume_sensitivity
The volume required to trigger barge-in. The smaller the value, the more sensitive barge-in will be. This is primarily used to deal with poor echo cancellation: with a higher (less sensitive) value, prompts that are not properly cancelled are less likely to falsely trigger barge-in. This is set on a scale of 1-100.
vad_eos_delay
The amount of time, in milliseconds, between when a caller stops speaking and when the Engine detects end of speech. If callers are speaking with large pauses between words, increasing the end of speech delay may help prevent the system from cutting them off. Shorter values allow faster responses. Longer values are useful for things such as reading digit strings, where callers are likely to have long pauses between words.
end_of_speech_timeout
The total amount of time, in milliseconds, that a caller has to speak. If the Engine has not detected end of speech by this time, the recognition will be stopped. If you are asking questions that will take a while for callers to answer (e.g. reading lengthy account numbers), set this value higher.
vad_wind_back
The length of time, in milliseconds, that the Speech Engine should wind back the audio after speech is detected. This helps when the first few sounds a caller makes are not loud enough or distinct enough to trigger voice detection.
vad_burst_threshold
The length of time, in milliseconds, that speech must be detected before barge-in is triggered. Raising this value helps keep short bursts of noise from triggering barge-in.
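Putting the variables together, a [default] section and an additional profile might be laid out as in the sketch below. Every number here is an illustrative assumption, not a shipped default or a recommendation, and the [custom] profile name is taken from the earlier example. The second profile lengthens the end-of-speech settings, as might suit callers reading digit strings.

```ini
[default]
; 1-100 scale: smaller values make barge-in easier to trigger
vad_snr_sensitivity=50
vad_volume_sensitivity=50
; Values below are in milliseconds
vad_eos_delay=800
end_of_speech_timeout=20000
vad_wind_back=480
vad_burst_threshold=40

[custom]
; Same sensitivities, but tolerate longer pauses and longer answers
vad_snr_sensitivity=50
vad_volume_sensitivity=50
vad_eos_delay=1500
end_of_speech_timeout=30000
vad_wind_back=480
vad_burst_threshold=40
```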