lumenvox.conf

The LumenVox Speech Engine on Asterisk is configured primarily through the lumenvox.conf file located in /etc/asterisk/. The file is divided into several sections.

[general]

The [general] section contains a few general values:

servers

Specifies the Speech Engine servers that should perform decodes. By default, this is set to 127.0.0.1, the loopback address of the local machine.

save_sound_files

Enables saving response files (.callsre files) under /Lang/Responses/ in the Engine installation directory, in subfolders organized by date. These files contain the audio of each call along with detailed recognition information for use with the LumenVox Speech Tuner.

Note: In order for these files to be written, the user running Asterisk must have permission to write to the Responses directory.
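
As a minimal sketch of a [general] section, the fragment below points Asterisk at a Speech Engine on another machine and turns on response file saving. The address 192.0.2.10 is a placeholder chosen for illustration, not a value from this document; substitute the address of your own Speech Engine server.

```ini
; /etc/asterisk/lumenvox.conf
[general]
; Hypothetical remote Speech Engine; the default is 127.0.0.1
servers=192.0.2.10
; Save .callsre response files for later analysis in the Speech Tuner.
; The user running Asterisk needs write access to the Responses directory.
save_sound_files=yes
```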

[grammars]

The [grammars] section contains a list of grammars to load when starting the res_speech_lumenvox.so module. Each line should be in the following format:

grammarname=path

The grammarname is an identifier for the grammar you will use within your Asterisk speech applications. The path is the full path to the grammar file.

It is not necessary to keep grammars in this file as they can be loaded and unloaded by individual applications.
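
The two approaches can be sketched as follows. The first fragment preloads a grammar at module startup. The second is a hypothetical extensions.conf context (the name speech-demo and the beep prompt are assumptions for illustration) that loads and unloads the same grammar from the dialplan using the standard Asterisk speech applications.

```ini
; lumenvox.conf: preload a grammar named "digits" at startup
[grammars]
digits=/etc/asterisk/grammars/english_digits.gram

; extensions.conf: load the same grammar only for the duration of one call
[speech-demo]
exten => s,1,Answer()
 same => n,SpeechCreate()
 same => n,SpeechLoadGrammar(digits,/etc/asterisk/grammars/english_digits.gram)
 same => n,SpeechActivateGrammar(digits)
 same => n,SpeechBackground(beep,10)
 same => n,Verbose(1,Heard: ${SPEECH_TEXT(0)} (score ${SPEECH_SCORE(0)}))
 same => n,SpeechDeactivateGrammar(digits)
 same => n,SpeechUnloadGrammar(digits)
 same => n,SpeechDestroy()
 same => n,Hangup()
```

If the grammar is already listed under [grammars], the SpeechLoadGrammar and SpeechUnloadGrammar steps should not be needed; the application would only activate and deactivate it.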

[default]

The [default] section contains the default values used for voice activity detection (VAD), which is how the Speech Engine distinguishes speech from background noise.

By adding further contexts, you can define several profiles and switch between them quickly in your dialplan using the SPEECH_ENGINE dialplan function. For instance, if you had a profile named [custom], you could apply all of its VAD values with Set(SPEECH_ENGINE(profile)=custom).

You may also adjust any specific parameter via this method, e.g. Set(SPEECH_ENGINE(vad_bargein_level)=60)
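
A sketch of how such a profile might look is given here. The [custom] values are illustrative assumptions, not recommendations. This document does not say whether parameters missing from a profile fall back to [default], so the sketch lists each VAD parameter explicitly. The dialplan lines assume the speech object must exist before SPEECH_ENGINE is set, so they follow SpeechCreate().

```ini
; lumenvox.conf: a second VAD profile alongside [default]
[custom]
vad_snr_sensitivity=50
vad_volume_sensitivity=60      ; less sensitive, to tolerate poor echo cancellation
vad_eos_delay=2000             ; allow longer pauses between words
vad_wind_back=750
end_of_speech_timeout=30000    ; give callers more total time to speak

; extensions.conf: apply the whole profile for this call
[profile-demo]
exten => s,1,Answer()
 same => n,SpeechCreate()
 same => n,Set(SPEECH_ENGINE(profile)=custom)
```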

The variables are as follows:

vad_snr_sensitivity

Determines how much louder the speaker must be than the background noise in order to trigger barge-in. The smaller this value, the easier it will be to trigger barge-in. This is set on a scale of 1-100.

vad_volume_sensitivity

The volume required to trigger barge-in. The smaller the value, the more sensitive barge-in will be. This is primarily used to deal with poor echo cancellation. By setting this value higher (less sensitive), prompts that are not properly cancelled will be less likely to falsely trigger barge-in. This is set on a scale of 1-100.

vad_eos_delay

The amount of time, in milliseconds, between when a caller stops speaking and when the Engine detects end of speech. If callers are speaking with large pauses between words, increasing the end of speech delay may help prevent the system from cutting them off. Shorter values allow faster responses. Longer values are useful for things such as reading digit strings, where callers are likely to have long pauses between words.

end_of_speech_timeout

The total amount of time, in milliseconds, that a caller has to speak. If the Engine has not detected end of speech by this time, the recognition will be stopped. If you are asking questions that will take a while for callers to answer (e.g. reading lengthy account numbers), set this value higher.

vad_wind_back

The length of time, in milliseconds, that the Speech Engine should wind back the audio after speech is detected. This helps when the first few sounds a caller makes are not loud enough or distinct enough to trigger voice detection.

vad_burst_threshold

Controls the amount of time, in milliseconds, that a voice must be detected before barge-in is triggered. Raising this value helps prevent short bursts of noise from triggering barge-in.
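
To illustrate how these variables interact, the hypothetical extensions.conf context below overrides two of them for a single prompt that collects a long digit string. The context name, the beep prompt, and the numeric values are assumptions for illustration, and the sketch assumes a grammar named "digits" was preloaded through the [grammars] section.

```ini
[collect-account]
exten => s,1,Answer()
 same => n,SpeechCreate()
 ; Callers pause between groups of digits, so wait longer before
 ; deciding that they have finished speaking.
 same => n,Set(SPEECH_ENGINE(vad_eos_delay)=2000)
 ; Allow more total speaking time for a lengthy account number.
 same => n,Set(SPEECH_ENGINE(end_of_speech_timeout)=30000)
 same => n,SpeechActivateGrammar(digits)
 same => n,SpeechBackground(beep,10)
 same => n,Verbose(1,Account: ${SPEECH_TEXT(0)})
 same => n,SpeechDeactivateGrammar(digits)
 same => n,SpeechDestroy()
 same => n,Hangup()
```

Overriding individual values this way keeps [default] tuned for short answers while giving one slow prompt extra time.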

Sample lumenvox.conf file

; LumenVox configuration file

[general]
servers=127.0.0.1 ; Speech Engine Servers to use.
save_sound_files=no ; Set to yes to save sound files for use with Speech Tuner

; Pre-loaded grammars
; Any grammars specified here will be loaded automatically at startup.
; Specify grammars with the following syntax:
; name=pathtogrammar
; E.g. a grammar called "digits" might be loaded as follows:
; digits=/etc/asterisk/grammars/english_digits.gram

[grammars]

; LumenVox profiles
; A tuning profile can be applied with the SPEECH_ENGINE dialplan function.
; For example, to apply a profile called "custom" you would use the function:
; Set(SPEECH_ENGINE(profile)=custom)

[default]

; The settings within the [default] context are loaded automatically.
; You may also set individual values within an application by
; using SPEECH_ENGINE(name)=value, e.g. Set(SPEECH_ENGINE(vad_eos_delay)=500)

; Signal-to-noise sensitivity
; Determines how much louder the speaker must be than the background noise in
; order to trigger barge-in. The smaller this value, the easier it will
; be to trigger barge-in. This is set on a scale of 1-100.

vad_snr_sensitivity=50

; Volume sensitivity
; The volume required to trigger barge-in. The smaller the value, the more
; sensitive barge-in will be. This is primarily used to deal with poor echo
; cancellation. By setting this value higher (less sensitive), prompts that are
; not properly cancelled will be less likely to falsely trigger barge-in.
; This is set on a scale of 1-100.

vad_volume_sensitivity=50

; End-of-speech delay
; This is the amount of time, specified in milliseconds, that the
; Engine must detect silence after speech before it begins processing
; the utterance. Set this value lower to capture short utterances such as
; single words. Set it higher for longer utterances that are likely to have many
; pauses in between speech, such as long digit strings.

vad_eos_delay=1250

; Wind-back length
; The length of audio to be wound back at the beginning of voice activity. This
; is used primarily to counter instances where barge-in does not accurately
; capture the very start of speech. The resolution of this parameter
; is 1/8 of a second.

vad_wind_back=750

; End of speech timeout
; This is the total amount of time to listen for speech after barge-in has
; been detected. This differs from the end-of-speech delay as this is
; the total time a speaker has to speak, not the length of time between
; individual words. This parameter should not usually need to be changed.

end_of_speech_timeout=15000

; Whether to use the out-of-vocabulary filter during decode.

use_oov_filter=no

© 2012 LumenVox LLC. All rights reserved.