A good speech recognition application depends on a well-designed grammar. A grammar that contains very similar-sounding words (like "bit" and "pit") is inefficient and hurts both accuracy and speed. The Engine takes longer because it must test the competing words against the audio, and the resulting match has a lower confidence because of the similar alternatives.
The confidence score is a rough measure of how closely the speech matched the phrases in the grammar. The score ranges from 0 to 1000. The higher the score, the higher the estimated probability that the result is correct. A score of 500 indicates the Engine is 50 percent sure the result is correct. Typically, an application designer will use the confidence score to make decisions about the quality of a recognition result.
For instance, results of 600 or higher might always be accepted, results from 200 to 599 might trigger a confirmation, and results below 200 might be rejected outright. The thresholds to use depend largely on the grammar that is being used. Along with the grammars themselves, an application's confidence thresholds should be one of the first things to tune.
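The three-way decision described above can be sketched in a few lines of Python. This is only an illustration, not part of the Engine's API: the function name `classify_result` and its parameters are hypothetical, the default thresholds are the example values from the text, and treating a score of exactly 600 as accepted is an assumption.

```python
def classify_result(confidence: int,
                    accept_at: int = 600,
                    confirm_at: int = 200) -> str:
    """Map a 0-1000 confidence score to an application action.

    Hypothetical helper: the thresholds are example values and
    should be tuned for each grammar.
    """
    if not 0 <= confidence <= 1000:
        raise ValueError("confidence must be between 0 and 1000")
    if confidence >= accept_at:
        return "accept"    # trust the result as-is
    if confidence >= confirm_at:
        return "confirm"   # ask the caller to confirm the result
    return "reject"        # discard the result and re-prompt


for score in (850, 450, 120):
    print(score, classify_result(score))
```

Keeping the thresholds as parameters makes it easy to use different values for different grammars, since the right cut-offs depend largely on the grammar in use.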
Smaller grammars work better. The practical limit is 10,000 phrases, but the smaller the grammar, the greater the accuracy.
Longer phrases also work better. When you need to recognize a phrase like "How do I" or "transfer me to", enter it as a single phrase, not as individual words. Avoid short single words, except where the expected response really is a single word (like "Yes" or "No").
Also, attempt to cover all the words you believe a normal user will speak. If a word or phrase is not in the grammar, the Engine will not be able to identify it.
Another key thing to try is tuning the Engine parameters, such as the voice activity detection settings. See Recommended Engine Settings for more details.
This is related to a bug currently in the Engine.
The results are in fact listed in the order the Engine believes is best; only the displayed confidence scores are wrong. The confidence score for the first result is correct, and the scores for all the other results are shown too high.
The Engine calculates an initial round of scores, which it then uses to sort the N-Best results (highest score first). Then it applies weighting to those scores, issuing penalties for certain things in the recognition. That weighted score is what it is supposed to display. Unfortunately, a bug causes the Engine to show the weighted score only for the first result, so all the other results are shown with their unweighted scores.
You are most likely to encounter this issue when speaking out-of-grammar words. In most cases, the penalty will be slight enough that the first result will still have the highest confidence score.
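To make the effect concrete, the following Python sketch imitates the behavior described above. It is not the Engine's actual code: the `Hypothesis` type, the `displayed_scores` function, the sample phrases and numbers, and the modeling of weighting as a simple subtraction are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Hypothesis(NamedTuple):
    text: str
    raw_score: int   # initial (unweighted) score
    penalty: int     # hypothetical penalty applied by weighting


def displayed_scores(hypotheses: List[Hypothesis]) -> List[Tuple[str, int]]:
    """Imitate the described bug: sort by raw score, then show the
    weighted score for the first result only."""
    ranked = sorted(hypotheses, key=lambda h: h.raw_score, reverse=True)
    shown = []
    for rank, hyp in enumerate(ranked):
        weighted = hyp.raw_score - hyp.penalty
        # Only the top result gets its weighted score displayed;
        # the rest keep their higher, unweighted scores.
        shown.append((hyp.text, weighted if rank == 0 else hyp.raw_score))
    return shown


n_best = [
    Hypothesis("call rob", raw_score=650, penalty=120),
    Hypothesis("call bob", raw_score=700, penalty=150),
]
print(displayed_scores(n_best))
# [('call bob', 550), ('call rob', 650)]
```

In this made-up example the weighted scores would be 550 and 530, so the ordering is still correct, but the second result is displayed with its unweighted 650 and appears to outrank the first.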