As of version 9.0, the LumenVox Speech Engine can use two types of acoustic models: semi-continuous models and higher-quality continuous models. Before version 9.0, LumenVox used only semi-continuous acoustic models.
The two model types differ in recognition accuracy and in the amount of processing needed to complete an audio decode. The continuous models have shown an accuracy increase across various domains, at the cost of approximately 15-20% more processing time. In some cases the semi-continuous decoder proved more accurate. The continuous decoder does, however, use roughly one-third of the memory used by the semi-continuous decoder.
The table below shows some of the results of our accuracy testing between the two models during their development.
| Test | Continuous Model | Semi-Continuous Model |
|---|---|---|
| Natural Number | 92.71 | 90.69 |
| Digits | 88.38 | 82.75 |
| YesNo | 98.95 | 98.72 |
| Date | 93.90 | 91.53 |
| Dollar Amount | 93.66 | 89.56 |
| Name at Agency | 90.32 | 89.17 |
| Restaurant | 91.81 | 88.24 |
| CityState | 88.20 | 82.97 |
| Call Router Untuned | 85.67 | 86.25 |
| Call Router Tuned | 96.49 | 96.76 |
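To make the comparison easier to read, the short Python sketch below computes the per-test difference between the two columns of the table above. The figures are copied from the table; the names `RESULTS` and `accuracy_deltas` are illustrative only and are not part of any LumenVox tooling.

```python
# Accuracy figures copied from the table above: (continuous, semi-continuous).
RESULTS = {
    "Natural Number": (92.71, 90.69),
    "Digits": (88.38, 82.75),
    "YesNo": (98.95, 98.72),
    "Date": (93.90, 91.53),
    "Dollar Amount": (93.66, 89.56),
    "Name at Agency": (90.32, 89.17),
    "Restaurant": (91.81, 88.24),
    "CityState": (88.20, 82.97),
    "Call Router Untuned": (85.67, 86.25),
    "Call Router Tuned": (96.49, 96.76),
}


def accuracy_deltas(results):
    """Return {test: continuous - semi_continuous}, rounded to two decimals."""
    return {test: round(cont - semi, 2) for test, (cont, semi) in results.items()}


deltas = accuracy_deltas(RESULTS)

# Tests where the semi-continuous model scored higher (negative difference).
semi_wins = [test for test, delta in deltas.items() if delta < 0]

for test, delta in deltas.items():
    print(f"{test:<22}{delta:+.2f}")
print("Semi-continuous ahead on:", ", ".join(semi_wins))
```

The continuous model's lead is largest on the Digits and CityState tests, while the semi-continuous model is slightly ahead on both Call Router tests.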
Not all of our languages are supported in the continuous model yet. This is because each acoustic model (language) needs to be retrained for processing in a fully continuous mode.
Currently, LumenVox only has continuous models for American English and Australian English.
This means that if you are using another language, such as one of the Spanish dialects, you must switch the decoder back to semi-continuous mode by modifying the sre_server.conf file, which is located in /etc/lumenvox/.
That file contains a parameter called HMM_TYPE. Set it to SEMI to use semi-continuous models, or to CONT to use continuous models.
You must then restart the Speech Engine service for the changes to take effect.
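The edit described above can be sketched as a small Python helper. This is a minimal sketch that assumes the configuration file stores the parameter on a single line of the form `HMM_TYPE = VALUE`; that layout, the sample text, and the function name `set_hmm_type` are assumptions for illustration, so check your own sre_server.conf before adapting it. The helper works on the file contents as a string and does not write the file or restart the service.

```python
import re

# The two values the article documents for HMM_TYPE.
VALID_HMM_TYPES = {"SEMI", "CONT"}

# Assumption: the parameter sits on one line as "HMM_TYPE = VALUE".
_HMM_LINE = re.compile(r"^([ \t]*HMM_TYPE[ \t]*=[ \t]*).*$", re.MULTILINE)


def set_hmm_type(conf_text, hmm_type):
    """Return conf_text with HMM_TYPE set to hmm_type ("SEMI" or "CONT")."""
    if hmm_type not in VALID_HMM_TYPES:
        raise ValueError(f"HMM_TYPE must be one of {sorted(VALID_HMM_TYPES)}")
    # Keep the key and spacing (group 1) and swap only the value.
    new_text, count = _HMM_LINE.subn(lambda m: m.group(1) + hmm_type, conf_text)
    if count == 0:
        raise KeyError("HMM_TYPE not found in configuration text")
    return new_text


# Hypothetical file contents, used only to demonstrate the helper.
sample = "; speech engine settings\nHMM_TYPE = CONT\nOTHER_SETTING = 1\n"
updated = set_hmm_type(sample, "SEMI")
print(updated)
```

In practice you would read the text from the configuration file, write the result back, and then restart the Speech Engine service so the change takes effect.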
In almost all cases, LumenVox recommends using the continuous model when it is available. The only time we recommend using the semi-continuous models and decoder is when fast decode time is more important than higher accuracy.
In addition to the choice between semi-continuous and continuous models, in version 9.1 LumenVox supports various resolutions for the American English model. These settings only work when the Engine is in continuous mode.
Three models are available: low-, medium-, and high-resolution versions. Higher-resolution models offer better accuracy, but use more memory and CPU time.
Three new settings in sre_settings.conf specify which models are loaded: LOAD_LOW_RES_MODEL, LOAD_MED_RES_MODEL, and LOAD_HIGH_RES_MODEL. Set any of them to 1 to load the corresponding model.
You must set at least one of these to 1 in order to start the Engine. If more than one is set to 1, all the specified resolutions will be loaded. When doing a decode, the Engine will default to the lowest resolution model that it loaded; this can be changed via SetPropertyEx and the PROP_EX_ACOUSTIC_MODEL_RESOLUTION parameter.
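The loading rules above can be expressed as a short Python sketch: at least one of the three settings must be 1, and the default resolution for a decode is the lowest one that was loaded. The setting names come from this article; the dictionary representation and the function names `loaded_resolutions` and `default_resolution` are hypothetical and only model the described behavior, not the Engine's actual code.

```python
# Setting names from the article, ordered from lowest to highest resolution.
RESOLUTION_SETTINGS = [
    ("low", "LOAD_LOW_RES_MODEL"),
    ("medium", "LOAD_MED_RES_MODEL"),
    ("high", "LOAD_HIGH_RES_MODEL"),
]


def loaded_resolutions(settings):
    """Return the resolutions whose setting is 1, lowest first."""
    return [
        name
        for name, key in RESOLUTION_SETTINGS
        if str(settings.get(key, "0")).strip() == "1"
    ]


def default_resolution(settings):
    """Return the resolution a decode would use by default (the lowest loaded)."""
    loaded = loaded_resolutions(settings)
    if not loaded:
        # Mirrors the rule that the Engine cannot start with no model loaded.
        raise ValueError("At least one LOAD_*_RES_MODEL setting must be 1")
    return loaded[0]


# Hypothetical configuration: medium and high models loaded, low skipped.
example = {
    "LOAD_LOW_RES_MODEL": "0",
    "LOAD_MED_RES_MODEL": "1",
    "LOAD_HIGH_RES_MODEL": "1",
}
print(loaded_resolutions(example))
print(default_resolution(example))
```

With this example configuration, decodes would default to the medium-resolution model unless a different resolution is requested through SetPropertyEx and PROP_EX_ACOUSTIC_MODEL_RESOLUTION.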