The client_property.conf file controls settings related to the speech client, including most of the speech recognition options.
It is located by default in /etc/lumenvox/ on Linux and in C:\Program Files\LumenVox\Engine\config\ on Windows. See Configuration Files for more information about other configuration files.
The following parameters can be set. Each parameter is written in the configuration file in the format PROPERTY_NAME = VALUE.
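To make the PROPERTY_NAME = VALUE format concrete, here is a minimal Python sketch of a reader for it. It is not how LumenVox itself parses the file. It assumes that section headers look like [NAME] (as with the [AUTHENTICATION] section described at the end of this page), that properties before any header belong to an unnamed section, and that lines without "=" can be ignored; comment syntax is not covered on this page, so the sketch does not handle it.

```python
def parse_conf(text):
    """Parse PROPERTY_NAME = VALUE lines into {section_name: {name: value}}.

    Sketch assumptions: section headers look like [NAME], properties that
    appear before any header go under the "" key, and lines without "=" are
    skipped. Comment syntax is not described on this page, so it is not handled.
    """
    sections = {"": {}}
    current = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank line
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()  # e.g. "AUTHENTICATION"
            sections.setdefault(current, {})
            continue
        if "=" not in line:
            continue  # not a PROPERTY_NAME = VALUE line
        name, value = line.split("=", 1)  # split on the first "=" only
        sections[current][name.strip()] = value.strip()
    return sections


sample = """
LICENSE_SERVERS = 127.0.0.1:7569
LOGGING_VERBOSITY = 1

[AUTHENTICATION]
AUTHENTICATION_USERNAME=alice
AUTHENTICATION_PASSWORD=alicespassword
"""
conf = parse_conf(sample)
print(conf[""]["LOGGING_VERBOSITY"])                      # 1
print(conf["AUTHENTICATION"]["AUTHENTICATION_USERNAME"])  # alice
```

All values come back as strings; converting numeric settings such as LOGGING_VERBOSITY to integers is left to the caller.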
This section contains global configuration settings for both SRE (Speech Engine) and TTS (text-to-speech server).
VERSION
Description: Contains information about the version of the software that created the configuration file. Do not modify this.
Possible Values: This should not be modified by users.
LICENSE_SERVERS
Description: The IP address or hostname and (optionally) the port of the License Server to use.
Possible Values: A comma-separated list of IP addresses or hostnames, each optionally followed by a colon and a port number. For example, 127.0.0.1:7569,10.0.0.1:4971 specifies two License Servers: the first at 127.0.0.1 on port 7569, the second at 10.0.0.1 on port 4971.
Default Value: 127.0.0.1:7569
Note: Prior to the 10.0 release of the software, this property was known as LIC_SERVER_HOSTNAME
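The LICENSE_SERVERS format can be illustrated with a small Python sketch that splits a value into (host, port) pairs. The function name is hypothetical. When an entry has no port, the sketch returns None rather than guessing, because this page does not state which port the client falls back to. It also only handles IPv4 addresses and hostnames, since it splits on the first colon.

```python
def parse_license_servers(value):
    """Split a LICENSE_SERVERS value (comma-separated host[:port]) into pairs.

    The port is None when omitted; the fallback port is not stated on this
    page, so the sketch does not invent one. IPv6 literals are not handled.
    """
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        host, sep, port = entry.partition(":")
        servers.append((host, int(port) if sep else None))
    return servers


print(parse_license_servers("127.0.0.1:7569,10.0.0.1:4971"))
# [('127.0.0.1', 7569), ('10.0.0.1', 4971)]
```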
LICENSE_CACHE_PERIOD
Description: Licenses acquired from a License Server during port creation are normally released back to the License Server when the port is destroyed. When license caching is enabled, such licenses are not released during port destruction; instead, they are held in a cache and reused during a subsequent port creation. This setting controls how long a license is held in the cache. If a cached license is not reused before this period elapses, it is released back to the License Server automatically. Caching improves performance by reducing the amount of communication with the License Server.
Possible Values: A non-negative integer
Default Value: 30
Note: To disable license caching, set the value to zero.
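The caching behavior described above can be modeled as a toy Python class. This is a sketch of the described rules, not LumenVox code, and the class and method names are hypothetical. Times are plain numbers in the same unit as LICENSE_CACHE_PERIOD, which this page does not state; a period of zero disables caching, so licenses are released immediately.

```python
class LicenseCacheModel:
    """Toy model of the LICENSE_CACHE_PERIOD behavior (not LumenVox code).

    Times are plain numbers in whatever unit LICENSE_CACHE_PERIOD uses;
    the unit is not stated on this page.
    """

    def __init__(self, cache_period):
        self.cache_period = cache_period
        self.cached = []   # expiry times of licenses currently held in the cache
        self.released = 0  # licenses returned to the License Server so far

    def destroy_port(self, now):
        if self.cache_period == 0:
            self.released += 1  # caching disabled: release immediately
        else:
            self.cached.append(now + self.cache_period)

    def expire(self, now):
        """Release cached licenses whose hold period has elapsed."""
        still_held = [t for t in self.cached if t > now]
        self.released += len(self.cached) - len(still_held)
        self.cached = still_held

    def create_port(self, now):
        """Return True if a cached license is reused, False if a new license
        would have to be requested from the License Server."""
        self.expire(now)
        if self.cached:
            self.cached.pop(0)  # reuse the license closest to expiring
            return True
        return False


cache = LicenseCacheModel(cache_period=30)
cache.destroy_port(now=0)
print(cache.create_port(now=10))  # True: reused within the period
cache.destroy_port(now=20)
print(cache.create_port(now=60))  # False: the license was released at time 50
print(cache.released)             # 1
```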
LOGGING_VERBOSITY
Description: Controls the verbosity of event logging, increasing or decreasing the amount of information the application logs. Increasing the logging verbosity also increases CPU usage, so higher levels should be avoided in production environments where optimal performance is critical.
Possible Values: 1 - 3
1 = Minimal logging. Logs only errors and critical issues.
2 = Medium logging. Logs all non-debug information, including the types covered by Minimal logging.
3 = Maximum logging. Logs all types of events. This will include any and all informational and debugging activity.
Default Value: 1
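The three verbosity levels are cumulative, which a short Python sketch can show. The category names here are hypothetical: this page distinguishes only errors and critical issues, other non-debug information, and debugging activity, so "info" stands in for all non-debug information beyond errors.

```python
# Hypothetical event categories mapped to the lowest LOGGING_VERBOSITY that logs them.
CATEGORY_MIN_VERBOSITY = {
    "critical": 1,
    "error": 1,
    "info": 2,   # any non-debug information beyond errors and critical issues
    "debug": 3,
}


def is_logged(verbosity, category):
    """Return True if an event of this category is logged at this verbosity."""
    if verbosity not in (1, 2, 3):
        raise ValueError("LOGGING_VERBOSITY must be 1, 2, or 3")
    return verbosity >= CATEGORY_MIN_VERBOSITY[category]


for level in (1, 2, 3):
    print(level, [c for c in CATEGORY_MIN_VERBOSITY if is_logged(level, c)])
# 1 ['critical', 'error']
# 2 ['critical', 'error', 'info']
# 3 ['critical', 'error', 'info', 'debug']
```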
This section contains configuration settings for use with the Speech Engine.
SRE_SERVERS
Description: This property sets which Speech Engine servers are used for processing decodes.
Possible Values: A list of IP addresses and optional ports separated by semicolons. For instance, 127.0.0.1;10.0.0.1:5721 specifies a server at 127.0.0.1 using the default port of 5730, and a server at 10.0.0.1 using the port 5721.
Default Value: 127.0.0.1
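The SRE_SERVERS format differs from LICENSE_SERVERS in two ways: entries are separated by semicolons, and an omitted port falls back to the documented default of 5730. A minimal Python sketch (the function name is hypothetical; IPv6 literals are not handled):

```python
SRE_DEFAULT_PORT = 5730  # default port stated for SRE_SERVERS on this page


def parse_sre_servers(value, default_port=SRE_DEFAULT_PORT):
    """Split a semicolon-separated host[:port] list into (host, port) pairs."""
    servers = []
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing semicolon
        host, sep, port = entry.partition(":")
        servers.append((host, int(port) if sep else default_port))
    return servers


print(parse_sre_servers("127.0.0.1;10.0.0.1:5721"))
# [('127.0.0.1', 5730), ('10.0.0.1', 5721)]
```

Because TTS_SERVERS uses the same semicolon-separated format with a default port of 7579, the same function applies there with default_port=7579.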
LICENSE_TYPE
Description: The license type to use when creating a port.
Possible Values: VoxLite, SpeechPort, SLM, or Auto.
If the value is set to VoxLite, the client gets the license from the Lite license pool (these licenses allow only up to 500 vocabulary items per recognition). If the value is set to SpeechPort, the client gets the license from the Full license pool. By default (Auto), the client picks the license automatically, using up Full licenses before Lite licenses.
Default Value: Auto
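The selection rules above can be sketched in Python as follows. The function is hypothetical and covers only VoxLite, SpeechPort, and Auto; SLM is not described on this page and is not modeled. Returning None stands in for whatever error the client actually reports when no suitable license is free.

```python
def choose_license_pool(license_type, full_available, lite_available):
    """Return "Full", "Lite", or None (no suitable license free).

    Models only VoxLite, SpeechPort, and Auto; SLM is not modeled.
    """
    if license_type == "VoxLite":
        return "Lite" if lite_available > 0 else None
    if license_type == "SpeechPort":
        return "Full" if full_available > 0 else None
    if license_type == "Auto":
        # Auto uses up Full licenses before falling back to Lite licenses.
        if full_available > 0:
            return "Full"
        if lite_available > 0:
            return "Lite"
        return None
    raise ValueError(f"LICENSE_TYPE not modeled in this sketch: {license_type}")


print(choose_license_pool("Auto", full_available=2, lite_available=5))  # Full
print(choose_license_pool("Auto", full_available=0, lite_available=5))  # Lite
```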
MAX_NBEST_RETURNED
Description: Specifies the maximum number of n-best results to be returned by the Engine.
Possible Values: Number of n-best results.
Default Value: 1
DECODE_TIMEOUT
Description: In a non-blocking decode, this is the timeout value, in milliseconds, used by the LV_SRE_WaitForDecode and LVSpeechPort::WaitForDecode functions. In a blocking decode, this is the time to wait before the decode times out and LV_SRE_Decode or LVSpeechPort::Decode returns an error.
Possible Values: Time in milliseconds.
Default Value: 20000
LOAD_GRAMMAR_TIMEOUT
Description: Specifies how long, in milliseconds, the client should wait for a grammar to load. If the timeout is reached before the grammar is loaded, the LoadGrammar function returns error code -37, LV_LOAD_GRAMMAR_TIMEOUT.
Possible Values: Time in milliseconds.
Default Value: 200000
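The relationship between LOAD_GRAMMAR_TIMEOUT and error code -37 can be illustrated with a generic Python sketch. It does not call the LumenVox API: the function name and the SUCCESS value of 0 are hypothetical, and load_fn stands in for whatever work loading a grammar involves. Note the unit conversion, since the setting is in milliseconds while Python timeouts are in seconds.

```python
import concurrent.futures
import time

LV_LOAD_GRAMMAR_TIMEOUT = -37  # error code named on this page
SUCCESS = 0                    # hypothetical success code for this sketch


def load_grammar_with_timeout(load_fn, timeout_ms):
    """Run load_fn, returning LV_LOAD_GRAMMAR_TIMEOUT if it exceeds timeout_ms."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(load_fn)
    try:
        future.result(timeout=timeout_ms / 1000.0)  # milliseconds -> seconds
        return SUCCESS
    except concurrent.futures.TimeoutError:
        return LV_LOAD_GRAMMAR_TIMEOUT
    finally:
        # Do not block on a load that is still running; this sketch simply
        # abandons it in the background.
        executor.shutdown(wait=False)


def slow_grammar_load():
    time.sleep(0.5)  # stands in for a grammar that takes about 500 ms to load


print(load_grammar_with_timeout(slow_grammar_load, timeout_ms=100))  # -37
print(load_grammar_with_timeout(lambda: None, timeout_ms=200000))    # 0
```

Whether the real client cancels or continues a timed-out load is not stated on this page; the sketch leaves it running.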
STRICT_SISR_COMPLIANCE
Description: Controls whether LumenVox strictly implements the final SISR 1.0 standard for adding tags to grammars. By default, LumenVox runs in strict mode: it applies the final SISR 1.0 standard unless the grammar's tag-format is declared as lumenvox/1.0. If strict compliance is disabled, LumenVox treats a tag-format declaration of semantics/1.0 in a backwards-compatibility mode, using the older draft of SISR.
Possible Values: 0 (disabled) or 1 (enabled)
Default Value: 1
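The decision described for STRICT_SISR_COMPLIANCE can be written as a small Python function. This is a reading of the description above, not LumenVox code; in particular, the assumption that tag formats other than lumenvox/1.0 and semantics/1.0 always get the final SISR 1.0 behavior is not stated explicitly on this page.

```python
def sisr_mode(tag_format, strict_sisr_compliance=1):
    """Return which tag semantics apply to a grammar, per the description above."""
    if tag_format == "lumenvox/1.0":
        return "lumenvox/1.0"
    if tag_format == "semantics/1.0" and strict_sisr_compliance == 0:
        return "SISR draft (backwards compatibility)"
    # Assumption: every other case uses the final SISR 1.0 standard.
    return "SISR 1.0 final"


print(sisr_mode("semantics/1.0"))     # SISR 1.0 final
print(sisr_mode("semantics/1.0", 0))  # SISR draft (backwards compatibility)
```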
TRIM_SILENCE_VALUE
Description: Controls how aggressively the Engine trims leading silence in incoming audio.
Possible Values: A number ranging from 0 (very aggressive) to 1000 (no silence trimmed).
Default Value: 970
SAVE_SOUND_FILES
Description: Controls whether the application saves .callsre files for use with the LumenVox Speech Tuner. Turn this on to capture audio and additional information related to each decode. By default, these files are saved to /var/log/lumenvox/client/responses/ on Linux and C:\Program Files\LumenVox\Engine\Lang\Responses\ on Windows.
Possible Values: 0 - 3
0 = NONE
1 = BASIC
2 = ADVANCED
3 = ALL
Default Value: 0
NOISE_REDUCTION_ENABLE
Description: Specifies the noise reduction algorithm the Engine uses to strip background noise from the audio it processes. For most users, the Default algorithm works best. Under certain noise conditions, the Alternate algorithm has shown better results, so advanced users can try switching to it to see whether it improves performance in noisy conditions. The Adaptive algorithm works best only when the noise is constantly changing, such as car or highway noise; for more stationary noise, such as a fan, the Default algorithm performs best.
Possible Values: 0 - 3
0 = No Noise Reduction
1 = Default
2 = Alternate
3 = Adaptive
Default Value: 1
CLIENT_CACHE_ENABLE
Description: Enables or disables client-side (SpeechPort) grammar caching. Because processed grammars are cached in memory and on disk, this can significantly reduce grammar load times.
Possible Values: 0 (caching disabled) or 1 (enabled)
Default Value: 1
CLIENT_CACHE_EXPIRATION
Description: The amount of time, in minutes, to allow an unused grammar to remain in memory. After a grammar has remained unused for this period of time, it will be unloaded from memory, but will remain in the disk cache, allowing fast reactivation and reloading if needed.
Possible Values: A number between 2 (minimum) and 1000000 (maximum)
Default Value: 1440
CLIENT_CACHE_MAX_NUMBER
Description: The maximum number of cached grammar entries to hold in memory at any time.
Possible Values: A number between 2 (minimum) and 1000000 (maximum)
Default Value: 100
CLIENT_CACHE_MAX_MEMORY
Description: The maximum size of memory to utilize for caching grammars.
Possible Values: A number of bytes between 100000 (min) and 536870912 (max)
Default Value: 268435456 (256 MB)
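The documented ranges for the CLIENT_CACHE_* settings can be checked before deployment with a short Python sketch. The function name is hypothetical, and the page does not say what the client does with an out-of-range value (clamp it, ignore it, or reject it), so the sketch only reports problems.

```python
# (min, max) ranges as listed on this page
CLIENT_CACHE_LIMITS = {
    "CLIENT_CACHE_ENABLE": (0, 1),
    "CLIENT_CACHE_EXPIRATION": (2, 1000000),        # minutes
    "CLIENT_CACHE_MAX_NUMBER": (2, 1000000),        # cached grammar entries
    "CLIENT_CACHE_MAX_MEMORY": (100000, 536870912), # bytes (max = 512 MB)
}


def check_client_cache_settings(settings):
    """Return a list of messages for settings outside their documented range."""
    problems = []
    for name, (low, high) in CLIENT_CACHE_LIMITS.items():
        if name not in settings:
            continue  # unset: the documented default applies
        value = int(settings[name])
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}..{high}")
    return problems


defaults = {
    "CLIENT_CACHE_ENABLE": "1",
    "CLIENT_CACHE_EXPIRATION": "1440",
    "CLIENT_CACHE_MAX_NUMBER": "100",
    "CLIENT_CACHE_MAX_MEMORY": str(256 * 1024 * 1024),  # 268435456
}
print(check_client_cache_settings(defaults))  # []
```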
NOTE: The following two parameters are left out of the configuration file by default. They may be added in and used, but in most cases should not be touched.
LANGUAGE
Description: Used only with concept/phrase grammars in a language other than English. With SRGS grammars, this value has no effect.
Possible Values: A string specifying the language name.
Default Value: (null)
SEARCH_BEAM_WIDTH
Description: Changes the size of the speech search beam.
Possible Values: A number between 0 and 1.
Default Value: 1e-6
This section contains global configuration settings for the text-to-speech server. Many of these settings will not be used by most users; they are included to reflect the requirements of the Speech Synthesis Markup Language (SSML) standard. Users who want more information about the various prosody settings should read the SSML specification.
TTS_SERVERS
Description: This property sets which TTS Servers are used for processing synthesis requests.
Possible Values: A list of IP addresses and optional ports separated by semicolons. For instance, 127.0.0.1;10.0.0.1:5721 specifies a server at 127.0.0.1 using the default port of 7579, and a server at 10.0.0.1 using the port 5721.
Default Value: 127.0.0.1
TTS_REQUEST_TIMEOUT
Description: The amount of time to wait, in milliseconds, for a response from the TTS Server after sending a request for speech synthesis.
Possible Values: An amount of time, in milliseconds.
Default Value: 10000 (10 seconds)
SYNTHESIS_LANGUAGE
Description: The default language to use for synthesis.
Possible Values: A valid language and country code joined by a hyphen: the language is two lowercase letters and the country code is two uppercase letters (e.g. en-US). To synthesize speech in the specified language, you need a license for it and the corresponding language pack installed on the TTS server(s).
Default Value: en-US
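The language-code format described above can be checked with a one-line regular expression. This Python sketch (function name hypothetical) checks only the format; it cannot confirm that the matching license and language pack are installed on the TTS server(s).

```python
import re

# Two lowercase letters, a hyphen, then two uppercase letters (e.g. en-US).
LANGUAGE_CODE = re.compile(r"[a-z]{2}-[A-Z]{2}")


def is_valid_synthesis_language(value):
    """Format check only; licensing and installed language packs are not checked."""
    return LANGUAGE_CODE.fullmatch(value) is not None


print(is_valid_synthesis_language("en-US"))  # True
print(is_valid_synthesis_language("en_US"))  # False
```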
SYNTHESIS_SAMPLING_RATE
Description: The default sampling rate (in Hz) to use for synthesized speech.
Possible Values: A valid sampling rate. You will need the appropriate license; in almost all cases this value should not be changed.
Default Value: 8000
SYNTHESIS_SOUND_FORMAT
Description: The default codec/format for the synthesized audio.
Possible Values: A value from 1 to 3, representing the following values:
LOG_TTS_EVENTS
Description: Whether the application will generate TTS event log files for use with the Speech Tuner. Similar to the SAVE_SOUND_FILES option for Speech events.
Possible Values: 0 (off) or 1 (on).
Default Value: 0
SYNTH_PROSODY_PITCH
Description: The pitch of the audio being synthesized.
Possible Values: A number followed by "Hz", a relative change, or one of the following values: "x-low", "low", "medium", "high", "x-high", or "default". See the SSML standard for more detail.
Default Value: default
SYNTH_PROSODY_CONTOUR
Description: The contour of the audio being synthesized.
Possible Values: Please refer to the SSML standard on pitch contour for details.
Default Value: blank (uses the default setting)
SYNTH_PROSODY_RANGE
Description: The range of the audio being synthesized.
Possible Values: A number followed by "Hz", a relative change, or one of the following values: "x-low", "low", "medium", "high", "x-high", or "default". See the SSML standard for more detail.
Default Value: default
SYNTH_PROSODY_RATE
Description: The speaking rate of the audio being synthesized.
Possible Values: A relative change or "x-slow", "slow", "medium", "fast", "x-fast", or "default". See the SSML standard for more detail.
Default Value: default
SYNTH_PROSODY_DURATION
Description: The length of time the synthesized text will take to play.
Possible Values: A time, such as "250ms" or "3s".
Default Value: default
SYNTH_PROSODY_VOLUME
Description: The volume of the audio being synthesized.
Possible Values: A number, a relative change or one of: "silent", "x-soft", "soft", "medium", "loud", "x-loud", or "default". See the SSML specification for more information.
Default Value: default
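Since the SYNTH_PROSODY_* settings mirror the attributes of the SSML prosody element, a Python sketch can show how a set of values would look as SSML. The property-to-attribute mapping is an assumption based on the shared names, and this page does not say whether the TTS server builds SSML this way. Blank and "default" values are omitted, which in SSML leaves the attribute at its default.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from configuration properties to SSML <prosody> attributes.
PROSODY_ATTRS = {
    "SYNTH_PROSODY_PITCH": "pitch",
    "SYNTH_PROSODY_CONTOUR": "contour",
    "SYNTH_PROSODY_RANGE": "range",
    "SYNTH_PROSODY_RATE": "rate",
    "SYNTH_PROSODY_DURATION": "duration",
    "SYNTH_PROSODY_VOLUME": "volume",
}


def prosody_element(settings, text):
    """Build an SSML <prosody> element, skipping blank or "default" values."""
    elem = ET.Element("prosody")
    for prop, attr in PROSODY_ATTRS.items():
        value = settings.get(prop, "").strip()
        if value and value != "default":
            elem.set(attr, value)
    elem.text = text
    return elem


ssml = prosody_element({"SYNTH_PROSODY_RATE": "slow", "SYNTH_PROSODY_VOLUME": "loud"}, "Hello")
print(ET.tostring(ssml, encoding="unicode"))
# <prosody rate="slow" volume="loud">Hello</prosody>
```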
SYNTH_VOICE_GENDER
Description: The gender of the voice that will be used for synthesis.
Possible Values: Either neutral (which uses the default), male, or female.
Default Value: neutral
SYNTH_VOICE_AGE
Description: The age of the voice used for synthesis.
Possible Values: A non-negative integer.
Default Value: default
SYNTH_VOICE_VARIANT
Description: The preferred voice variant to be used in the synthesis.
Possible Values:
Default Value: default
SYNTH_VOICE_NAME
Description: The name of the voice to be used in the synthesis.
Possible Values: A name of a valid TTS voice. This will vary depending on the TTS licenses and voice packs you have installed. If left blank, it will default to the first voice for which the system has a license and an installed voice pack.
Default Value: blank
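In the same way, the SYNTH_VOICE_* settings correspond to attributes of the SSML voice element. The Python sketch below rests on that assumed mapping and omits blank or "default" values so the server's own choice applies. It passes "neutral" through unchanged, since SSML accepts it as a gender value.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from configuration properties to SSML <voice> attributes.
VOICE_ATTRS = {
    "SYNTH_VOICE_GENDER": "gender",
    "SYNTH_VOICE_AGE": "age",
    "SYNTH_VOICE_VARIANT": "variant",
    "SYNTH_VOICE_NAME": "name",
}


def voice_element(settings, text):
    """Build an SSML <voice> element, skipping blank or "default" values."""
    elem = ET.Element("voice")
    for prop, attr in VOICE_ATTRS.items():
        value = settings.get(prop, "").strip()
        if value and value != "default":
            elem.set(attr, value)
    elem.text = text
    return elem


voice = voice_element(
    {"SYNTH_VOICE_GENDER": "female", "SYNTH_VOICE_AGE": "30", "SYNTH_VOICE_NAME": ""},
    "Hi",
)
print(ET.tostring(voice, encoding="unicode"))
# <voice gender="female" age="30">Hi</voice>
```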
SYNTH_EMPHASIS_LEVEL
Description: The strength of the emphasis used in the voice during synthesis.
Possible Values: One of: "strong", "moderate", "none" or "reduced".
Default Value: moderate
To use License Authentication, you must add a section called [AUTHENTICATION] at the bottom of the client_property.conf file.
This section must have two (and only two) key/value pairs:
AUTHENTICATION_USERNAME
Description: The username the client uses when requesting licenses from a License Server. There should be a corresponding entry in the license_authentication.conf file on the server side.
Possible Values: A string specifying the username.
Default Value: (null)
AUTHENTICATION_PASSWORD
Description: The password the client uses when requesting licenses from a License Server. There should be a corresponding entry in the license_authentication.conf file on the server side.
Possible Values: A string specifying the password.
Default Value: (null)
For example, if your username is alice and your password is alicespassword, your [AUTHENTICATION] section should look like this:
[AUTHENTICATION]
AUTHENTICATION_USERNAME=alice
AUTHENTICATION_PASSWORD=alicespassword
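The two rules for this section (it must be at the bottom of the file, and it must contain exactly the two keys above) can be checked with a Python sketch like the following. The function name is hypothetical, and "at the bottom of the file" is interpreted here as "the last section header in the file".

```python
REQUIRED_AUTH_KEYS = {"AUTHENTICATION_USERNAME", "AUTHENTICATION_PASSWORD"}


def check_authentication_section(text):
    """Return a list of problems with the [AUTHENTICATION] section (empty if OK)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    headers = [i for i, line in enumerate(lines) if line.startswith("[") and line.endswith("]")]
    auth = [i for i in headers if lines[i] == "[AUTHENTICATION]"]
    if not auth:
        return ["no [AUTHENTICATION] section"]
    start = auth[0]
    if start != headers[-1]:
        return ["[AUTHENTICATION] is not the last section in the file"]
    keys = [line.split("=", 1)[0].strip() for line in lines[start + 1:] if "=" in line]
    if len(keys) != 2 or set(keys) != REQUIRED_AUTH_KEYS:
        return [f"expected exactly {sorted(REQUIRED_AUTH_KEYS)}, found {keys}"]
    return []


sample = """LICENSE_SERVERS = 127.0.0.1:7569

[AUTHENTICATION]
AUTHENTICATION_USERNAME=alice
AUTHENTICATION_PASSWORD=alicespassword
"""
print(check_authentication_section(sample))  # []
```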