Ultimately, the Speech Engine is just a probability machine. Inside the Engine are large tables that store information about phonemes and the speech sounds that correspond to them.
When the Engine decodes audio input, it compares the sounds in the audio to its phoneme tables to figure out which phonemes are contained in the audio. Using the grammars as a guide, the Engine comes up with probabilities that a series of sounds in the audio matches a word in the grammar.
You can modify the probabilities in an SRGS grammar by applying weights to words, phrases, and rules. By weighting parts of the grammar, you can make the Engine more or less likely to match audio to specific grammar items.
As an example, suppose we have a grammar that recognizes a person speaking a number that is four digits long.
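In SRGS ABNF form, such a grammar might look roughly like the sketch below. Only $single_digits is referred to later in this section; the other three rule names and their expansions are purely illustrative.

```
#ABNF 1.0;
language en-US;
mode voice;
root $four_digit_number;

// A caller might say a four-digit number in any of several styles.
public $four_digit_number = $single_digits
                          | $double_digits
                          | $hundreds
                          | $thousands;

$digit = oh | zero | one | two | three | four | five
       | six | seven | eight | nine;

$teen  = ten | eleven | twelve | thirteen | fourteen | fifteen
       | sixteen | seventeen | eighteen | nineteen;

$tens  = twenty | thirty | forty | fifty | sixty | seventy
       | eighty | ninety;

$pair  = $teen | $tens [$digit];

// "one two three four"
$single_digits = $digit $digit $digit $digit;

// "twelve thirty-four"
$double_digits = $pair $pair;

// "twelve hundred thirty-four"
$hundreds = $pair hundred [$pair];

// "one thousand two hundred thirty-four"
$thousands = $digit thousand [$digit hundred] [$pair];
```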
This is a flexible grammar, but if you used it in practice you might be disappointed. You might notice that phrases like "four three" are too often misrecognized as "forty." In general, your callers may be speaking a sentence that matches $single_digits the majority of the time, but the Engine too frequently returns a result that matches one of the other three rules.
You can help the Engine get the right answer more frequently by adding a weight to predispose it to choose the $single_digits rule.
Weights are numeric, and are entered into a grammar between two forward-slashes (the / character). They apply to the item immediately following them. Weights specify how much more or less likely one item is to be matched than another; in this sense weights are relative to other weights. Items are assumed to have a weight of 1 if no weight is specified.
So if one item is given a weight of 2 and a second item a weight of 1, the first item is twice as likely to be recognized as the second. Likewise, you could assign the items weights of 200 and 100 and get the same effect.
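For instance, in ABNF form a weighted pair of alternatives might look like this (the rule name and words here are only illustrative):

```
// "credit" is twice as likely to be matched as "debit".
// Weights of /200/ and /100/ would behave identically.
$payment_type = /2/ credit | /1/ debit;
```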
Suppose that callers match the $single_digits rule five times as often as the other rules. We could weight the grammar to reflect this.
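Using the illustrative rule names from the sketch above, the weighted root rule might look like this:

```
// Callers match $single_digits about five times as often as the other
// rules, so it gets five times their weight.
public $four_digit_number = /5/ $single_digits
                          | /1/ $double_digits
                          | /1/ $hundreds
                          | /1/ $thousands;
```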
Now, in cases where the Engine has a borderline decision between matching $single_digits and one of the other rules, it will more frequently choose $single_digits. We weighted the rules at a 5:1 ratio because we had actual data showing that our callers were matching one rule five times as often as the others.
Weights are most useful when two items sound similar and are thus likely to be confused; applied properly, they affect the outcome of a recognition only when the Engine has a close choice between two items. For this reason, it is a good idea to avoid very high or very low weights unless you are weighting all the rules proportionally. If you were to weight one rule at 10,000 and leave all the other rules at the default weight of 1, the Engine would likely match every utterance to the heavily weighted rule, regardless of what was said.
If you give rules weights below 1, it can become very difficult for the Engine to match them; a weight below 1 effectively acts as a penalty. In addition to trying to match sounds to the phonemes in a grammar, the Engine also tries to match audio to noise, which it discards. If you penalize rules too heavily, the Engine can end up almost always favoring the noise match over the penalized rules.
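A weight below 1 is written with the same slash syntax. In this hypothetical fragment, the penalized alternative will rarely win a borderline decision, and a much smaller weight could cause it to lose to the noise match almost every time:

```
// $rarely_said is half as likely to be matched as $commonly_said.
// Weights far below 1 risk the rule losing even to the noise match.
$command = /1/ $commonly_said | /0.5/ $rarely_said;
```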
Applying grammar weights should never be the first thing you do to your grammar. Initially, you don't know how often each rule will be matched, so you are better off letting all rules be treated equally. Only after you have a compelling amount of data to suggest that applying grammar weights will help the application, as we did above, should you apply them. And after you do apply them, you must test their effects on real call data. Badly applied weights are worse than no weights at all.
For more advanced SRGS topics, see Using Phonetic Spellings or return to our SRGS Introduction.