When building an application using speech recognition, it is often not enough to know what the user said -- rather, we need the meaning of an utterance. One caller may ask for "Customer Support" while another asks for "Customer Service," but as far as a call router is concerned, these are really the same thing.
Assigning meaning to raw text is a process called semantic interpretation.
Creating a grammar and examining the parse tree generated by the Engine from a user's speech is the first step toward semantic interpretation. But it is often not enough to just read off the values of the tree; significant post-processing of the tree is necessary to extract meaning.
As an example, here is an SRGS/ABNF grammar that matches spoken numbers from zero to nine hundred and ninety-nine (it is by no means complete; for instance, it cannot recognize "two forty six" for 246):
If the Engine recognizes "two hundred twelve," the parse tree looks like this:
But if your application needs to determine whether the speaker said a number larger than 500, the parse tree alone is not enough; all you have is a structure of words. You need to write code to transform the tree into the number 212.
The logic for this transformation is closely tied to the grammar's rules. For instance, within the $hundred rule, you have to know that there is an optional $base rule whose value has to be multiplied by 100. But in the $twenty_to_ninetynine rule, the value of the optional $base has to be added to the total of the number you are building.
Because of the close relationship between a grammar's rules and the semantic interpretation process, it is convenient to put the semantic interpretation logic directly into the grammar.
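To make this rule-dependent logic concrete, here is a minimal sketch of such post-processing written as ordinary application code. The tree shape (objects with `rule` and `children` fields) is an assumption made for illustration and is not the Engine's actual parse-tree format. The rule names `base`, `hundred` and `twenty_to_ninetynine` come from the text above, while `number` and `teen` are hypothetical.

```javascript
// Hypothetical parse-tree shape: { rule: "name", children: [node | "word", ...] }.
// This is NOT the Engine's real data structure; it only illustrates the idea.
const BASE = { one: 1, two: 2, three: 3, four: 4, five: 5,
               six: 6, seven: 7, eight: 8, nine: 9 };
const TEEN = { ten: 10, eleven: 11, twelve: 12, thirteen: 13, fourteen: 14,
               fifteen: 15, sixteen: 16, seventeen: 17, eighteen: 18, nineteen: 19 };
const TENS = { twenty: 20, thirty: 30, forty: 40, fifty: 50,
               sixty: 60, seventy: 70, eighty: 80, ninety: 90 };

// Return the first child node produced by the named rule, if any.
function findChild(node, rule) {
  return node.children.find(c => typeof c === "object" && c.rule === rule);
}

// Return the first literal word directly under this node.
function firstWord(node) {
  return node.children.find(c => typeof c === "string");
}

function interpret(node) {
  switch (node.rule) {
    case "base":
      return BASE[firstWord(node)];
    case "teen": // assumed rule name
      return TEEN[firstWord(node)];
    case "hundred": {
      // The optional $base is MULTIPLIED by 100; bare "hundred" means 100.
      const base = findChild(node, "base");
      return (base ? interpret(base) : 1) * 100;
    }
    case "twenty_to_ninetynine": {
      // The optional $base is ADDED to the tens value.
      const base = findChild(node, "base");
      return TENS[firstWord(node)] + (base ? interpret(base) : 0);
    }
    case "number": // assumed root rule: sum the values of its sub-rules
      return node.children
        .filter(c => typeof c === "object")
        .reduce((sum, c) => sum + interpret(c), 0);
    default:
      throw new Error("Unknown rule: " + node.rule);
  }
}

// "two hundred twelve" in the assumed tree shape.
const tree = {
  rule: "number",
  children: [
    { rule: "hundred", children: [{ rule: "base", children: ["two"] }, "hundred"] },
    { rule: "teen", children: ["twelve"] },
  ],
};

const value = interpret(tree);
console.log(value, value > 500); // prints: 212 false
```

Note that every `case` mirrors one grammar rule, so any change to the grammar forces a matching change in this code.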
Using a standard called Semantic Interpretation for Speech Recognition (SISR), SRGS grammars can contain logic that aids in semantic interpretation. Using SISR, you place "tags" into grammars. These tags contain ECMAScript code (also known as JavaScript) that is executed when a rule (or a portion of a rule) is matched.
The LumenVox Speech Engine supports the W3C's first approved recommendation for SISR, version 1.0 (adopted April 2007). You may refer to the complete specification for more details. The Engine also supports earlier drafts of the specification for backwards compatibility with older grammars.
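As a rough illustration of the idea, the sketch below imitates in plain JavaScript what a tag attached to each rule might compute. In SISR 1.0, a tag assigns the current rule's result to `out` and reads the results of referenced rules through `rules`; those two names come from the specification. The wrapper functions `baseTag` and `hundredTag` are hypothetical stand-ins, not Engine API or actual grammar syntax.

```javascript
// Plain-JavaScript imitation of per-rule tags; not real grammar syntax.
const DIGITS = { one: 1, two: 2, three: 3, four: 4, five: 5,
                 six: 6, seven: 7, eight: 8, nine: 9 };

// Stand-in for a tag on $base: runs when a single digit word is matched.
function baseTag(word) {
  const out = DIGITS[word];
  return out;
}

// Stand-in for a tag on $hundred: runs after the optional $base
// and the word "hundred" have been matched.
function hundredTag(rules) {
  let out = 100;                 // bare "hundred"
  if (rules.base !== undefined) {
    out = rules.base * 100;      // "two hundred" gives 200
  }
  return out;
}

// Simulate matching "two hundred": $base runs first, then $hundred.
const rules = { base: baseTag("two") };
const hundredValue = hundredTag(rules);
console.log(hundredValue); // prints: 200
```

Each rule only needs to know about the rules it references directly, which keeps the interpretation logic next to the grammar structure it depends on.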
The basic idea behind LumenVox's implementation of SISR is this:
To get started learning about SISR, see SISR Basics. For more information, see Rule Variables and SI Script by Example. Our documentation on Getting The Return Value describes how to access SISR results in your application.
If you are familiar with SISR using older versions of the standard, see Converting From Older SISR.