By setting the language identifier in a grammar, you can load different acoustic models and perform recognition in various languages.
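For example, in a standard SRGS/GRXML grammar the language identifier is carried by the `xml:lang` attribute on the root `grammar` element. A minimal sketch (the rule name and vocabulary here are illustrative, not taken from a shipped grammar):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal SRGS grammar; xml:lang tells the Engine which language
     (and therefore which acoustic model) to use. Changing it to,
     e.g., "es-MX" would request Mexican Spanish instead. -->
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" mode="voice" root="yesno">
  <rule id="yesno" scope="public">
    <one-of>
      <item>yes</item>
      <item>no</item>
    </one-of>
  </rule>
</grammar>
```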
In order to recognize sounds from different languages, we "train" the Speech Engine on large sets of transcribed audio from each language. The result of this process is an acoustic model, a large file that contains information about the way words in a language sound.
The LumenVox Speech Engine includes support for American English, Australian English, Indian English, U.K. English, Mexican Spanish, South American Spanish, and Canadian French.
The Speech Engine comes complete with all of the acoustic models you need in order to use the supported languages. By default, the Speech Engine will only recognize American English. To use other languages, you must first copy them into the proper folder.
On Windows, inside the Engine's installation directory is a Lang folder, and inside of that is a directory called OtherLanguages which contains the various acoustic models. On Linux, these models are in /etc/lumenvox/Lang/ by default. Simply copy the models you wish to use into the Dict directory in the Lang folder. You must stop and restart the Speech Engine service for these new models to be available.
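On Linux, the copy step might look like the following sketch. It uses a scratch directory that mirrors the layout described above, since the exact folder names under /etc/lumenvox/Lang can vary by version; "en-UK" stands in for whichever model folder you actually need, and the restart command shown in the comment is an assumption to adapt to your system.

```shell
# Scratch directory standing in for /etc/lumenvox/Lang on a real install.
LANG_DIR="$(mktemp -d)"
mkdir -p "$LANG_DIR/OtherLanguages/en-UK" "$LANG_DIR/Dict"

# The actual step: copy the acoustic model folder into Dict.
cp -r "$LANG_DIR/OtherLanguages/en-UK" "$LANG_DIR/Dict/"

# Then stop and restart the Speech Engine service so the model is
# picked up, e.g. (command name may differ on your distribution):
#   service lumenvox_sre restart

ls "$LANG_DIR/Dict"
```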
It is very important to note that you cannot have two acoustic models active at the same time. Within an application you could switch between English and Spanish, but you could not decode a single utterance against both models at once. This means you cannot have two grammars with different languages active simultaneously; doing so will cause an error.
Each acoustic model uses a significant amount of memory, so you should not load models you will not be using. If you need models that use less memory (at the cost of reduced accuracy), see our instructions for less memory-intensive models.
The LumenVox Speech Engine can use either of two types of acoustic model for recognizing patterns in speech: continuous or semi-continuous.
In most cases, more accuracy can be gained from the continuous model, but at the expense of processing time. The continuous model typically uses 15-20% more processing time than the semi-continuous.
Currently, LumenVox only has continuous models for American English and Australian English.
When working with our continuous models, you must make sure that the acoustic model you’re using is compatible with the continuous decoder. This can be verified by checking this help page.
If you’re using an acoustic model that does not have continuous mode support, then you must switch to the semi-continuous mode. Otherwise the Speech Engine will not function correctly.
If you are looking to use a continuous mode-supported acoustic model (en-AU Australian English, for example), and you’ve already declared the continuous mode in your sre_server.conf, then you only need to load the acoustic model as normal.
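As a sketch of what that declaration might look like in sre_server.conf — note that the section and key names below are hypothetical placeholders for illustration only; consult the Continuous vs. Semi-Continuous Model page for the actual setting:

```
; sre_server.conf (fragment)
; HYPOTHETICAL key name -- check the Continuous vs. Semi-Continuous
; Model page for the real parameter before editing your config.
[decoder]
USE_CONTINUOUS_MODEL=1
```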
We’re working on adding more language support in the Continuous model type. Please check back later if you do not see your language supported here.
All languages are supported with the Semi-Continuous decoder.
Please see our Continuous vs. Semi-Continuous Model page for help in switching between the model types.

Which one you should use for your application will usually be obvious. A call router in America is best served by American English, and one in Australia by Australian English.
But there are cases where it's not so clear. What if you were developing an application for South African speakers, who speak an English dialect for which there is currently no acoustic model?
In that case you may want to try U.K. English first, since South African English is heavily influenced by it, and then try the other models. The same logic applies to other English or Spanish dialects: to recognize Spanish speakers from Spain, for instance, you would try both our Mexican and South American Spanish models to see which works best.
You may need to experiment a little bit to see which model works best for your speakers. Try different models and add phonetic spellings for words that are commonly misrecognized.