Welcome to Part 3 of Asterisk Speech Recognition 101. In our last video, we talked about the Asterisk speech recognition interface. In order to use that interface, we first have to design grammars.
Grammars are files that contain a list of words and phrases to be recognized. They provide rules and constraints for the speech engine. Smaller grammars tend to give better accuracy and shorter decode times, because they create a smaller "search space," which makes it more likely that the engine finds the correct answer. Provide just enough coverage to handle the majority of your callers. You don't want to include obscure phrases that people have only a small chance of saying. A classic example is swearing. If you put curse words into your grammar, people might get misrecognized as swearing when they really aren't, and they might get offended if the system accuses them of it.
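The grammar spec (called SRGS, the Speech Recognition Grammar Specification) describes two formats: an XML form and an ABNF (Augmented Backus-Naur Form) form. The examples here use ABNF.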
Grammars consist of a list of rules. Each rule contains the tokens that the caller's speech is matched against. A token is essentially anything the engine can match and turn into a phonetic equivalent. It can be a word, a series of words, or raw sounds (phonemes).
Every grammar has a special rule called the root rule. It must be matched for the grammar as a whole to be matched.
#ABNF 1.0 UTF-8;
language en-US;
mode voice;
root $yesno;
$yesno = $yes | $no;
$yes = yes [please] | yeah;
$no = no [thanks] | nope;
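To make the rule expansion concrete, here is a minimal Python sketch that enumerates every phrase the yes/no grammar above accepts. It is only an illustration: a real speech engine decodes audio against the grammar rather than comparing strings, and the GRAMMAR table, ROOT, expand, and matches names are all hypothetical, made up for this example.

```python
from itertools import product

# Hypothetical in-memory version of the yes/no grammar (not an engine API).
# Each rule maps to a list of alternatives; each alternative is a sequence of
# items: ("tok", word), ("opt", word) for an optional word, or
# ("ref", rule_name) for a reference to another rule.
GRAMMAR = {
    "yesno": [[("ref", "yes")], [("ref", "no")]],
    "yes": [[("tok", "yes"), ("opt", "please")], [("tok", "yeah")]],
    "no": [[("tok", "no"), ("opt", "thanks")], [("tok", "nope")]],
}
ROOT = "yesno"


def expand(rule):
    """Return every phrase (as a string) that the named rule can match."""
    phrases = []
    for alternative in GRAMMAR[rule]:
        # Collect the possible word sequences for each item in the sequence.
        choices = []
        for kind, value in alternative:
            if kind == "tok":
                choices.append([value])
            elif kind == "opt":
                choices.append([value, ""])  # word present or absent
            else:  # "ref": substitute everything the other rule can match
                choices.append(expand(value))
        for combo in product(*choices):
            phrases.append(" ".join(word for word in combo if word))
    return phrases


def matches(utterance):
    """True if the utterance is one of the phrases the root rule accepts."""
    return utterance in expand(ROOT)


if __name__ == "__main__":
    print(expand(ROOT))
```

Even this tiny grammar expands to six phrases, and each optional word doubles the phrases under its alternative, which is one way to see how quickly the search space grows as a grammar gets bigger.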
By default, a speech engine will return raw text or a parse tree from a grammar. This is fine for simple grammars, but it's not going to tell us very much. For instance, there are a lot of different ways to say "yes." If I say "yeah," the engine will return "yeah." If I say "yes please," the engine will return "yes please," when really these both mean the same thing. What I would prefer is what the user meant, not what they actually said. You have to derive the meaning from the user's input. This is called semantic interpretation. One way you can do this is to put it in your application. But since semantic interpretation must be done somewhere for most applications, it makes sense to keep it in the grammar, next to the phrases it interprets (encapsulation).
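As a rough sketch of the "put it in your application" option, the Python below maps the raw text an engine might return to what the caller meant. The MEANINGS table and the interpret function are hypothetical names for this illustration, not part of Asterisk or any speech engine.

```python
# Hypothetical lookup table kept in the application: raw text -> meaning.
MEANINGS = {
    "yes": "yes",
    "yes please": "yes",
    "yeah": "yes",
    "no": "no",
    "no thanks": "no",
    "nope": "no",
}


def interpret(raw_text):
    """Map the engine's raw text to what the caller meant, or None if unknown."""
    return MEANINGS.get(raw_text.strip().lower())


if __name__ == "__main__":
    for heard in ("yes please", "yeah", "nope"):
        print(heard, "->", interpret(heard))
```

The drawback of this approach is that the table has to be kept in sync with the grammar by hand: add a phrase to one and forget the other, and the application gets text it cannot interpret.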
Semantic Interpretation for Speech Recognition (SISR) is the standard method of putting semantic interpretation into grammars. SI tags are placed into grammar rules. These tags contain bits of ECMAScript (JavaScript) that are executed when the rule is matched, turning each rule into a function.
#ABNF 1.0 UTF-8;
language en-US;
mode voice;
tag-format;
root $yesno;
$yesno = ($yes | $no) {out=rules.latest()};
$yes = (yes [please] | yeah) {out="yes"};
$no = (no [thanks] | nope) {out="no"};
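The idea that tags turn each rule into a function can be sketched in Python as an analogy. This is not how an engine evaluates tags (it runs the ECMAScript during the parse); the function names below are hypothetical, and plain string comparison stands in for the actual speech match.

```python
def yes_rule(text):
    """Stands in for $yes: on a match, the tag sets out to "yes"."""
    if text in ("yes", "yes please", "yeah"):
        return "yes"
    return None  # rule did not match


def no_rule(text):
    """Stands in for $no: on a match, the tag sets out to "no"."""
    if text in ("no", "no thanks", "nope"):
        return "no"
    return None  # rule did not match


def yesno_rule(text):
    """Stands in for the root rule $yesno.

    Like out=rules.latest(), it passes up the result of whichever
    child rule was matched.
    """
    for rule in (yes_rule, no_rule):
        out = rule(text)
        if out is not None:
            return out
    return None  # grammar did not match at all


if __name__ == "__main__":
    print(yesno_rule("yes please"))
    print(yesno_rule("nope"))
```

Whatever the caller actually said, the application only ever sees "yes" or "no" coming back from the root rule.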
In the next part, we'll talk about writing speech applications on Asterisk, as well as the dialplan functions and their uses.