In addition to the phonetic spellings that can be placed directly in the grammar, multiple custom pronunciations can be grouped into a single file and referenced from an SRGS grammar. A collection of pronunciations like this is known as a lexicon.
Lexicons introduce a degree of modularity into your grammars, allowing separation between the specification of pronunciation and the rest of the grammar, including the rules and tags. You may reuse a single lexicon across multiple grammars, and fix errors or add words to the lexicon in a single place without modifying the grammars that reference it.
Grammars may reference more than one lexicon. See the Using Lexicons section below for a description of how to use multiple lexicons in a single grammar.
In an ABNF grammar, a lexicon is declared in the header using the 'lexicon' keyword followed by an ABNF URI.
In a GrXML grammar, 'lexicon' elements are declared as immediate children of the 'grammar' element. Each 'lexicon' element must have a 'uri' attribute.
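As a minimal sketch, the two declaration forms can be generated as shown here. The ABNF form (`lexicon <uri>;`) and the GrXML form (`<lexicon uri="..."/>`) follow the SRGS specification. The helper names and the example URI are hypothetical and exist only for illustration.

```python
from xml.sax.saxutils import quoteattr


def abnf_lexicon_declaration(uri: str) -> str:
    """Return an ABNF header line declaring a lexicon (SRGS form: lexicon <uri>;)."""
    return f"lexicon <{uri}>;"


def grxml_lexicon_element(uri: str) -> str:
    """Return a GrXML <lexicon> element, to be placed directly under <grammar>."""
    # quoteattr escapes &, < and > and wraps the value in quotes.
    return f"<lexicon uri={quoteattr(uri)}/>"


# Hypothetical lexicon location, used only for illustration.
EXAMPLE_URI = "http://www.example.com/names.xml"

print(abnf_lexicon_declaration(EXAMPLE_URI))
print(grxml_lexicon_element(EXAMPLE_URI))
```

The first line printed belongs in the header of an ABNF grammar. The second belongs directly inside the 'grammar' element of a GrXML grammar.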
The lexicon file is an XML document with a single <lexicon> element. The <lexicon> element contains one or more <entry> elements, each of which contains one or more <definition> elements.
The lexicon element must include an 'xml:lang' attribute and an 'alphabet' attribute. The 'xml:lang' attribute specifies the language of the words in the lexicon, given as a language code. The 'alphabet' attribute specifies the format of the pronunciations in the 'definition' elements. Currently, the SAMPA format is supported, with an optional localization modifier. For the most reliable behavior across engine versions, specify 'localization=lumenvox'.
The entry elements define the words for which custom pronunciations are provided. The required 'key' attribute specifies the spelling of the word. There can be one or many entry elements within the lexicon element.
Each entry element contains one or more definition elements. The required 'value' attribute specifies a pronunciation of the word named by the parent entry element, written in the alphabet given by the 'alphabet' attribute of the lexicon element (currently SAMPA).
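The structure described above can be sketched by generating a small lexicon file with Python's standard library. This is an illustration under stated assumptions, not a reference file. The exact spelling of the 'alphabet' value (in particular how the localization modifier is attached) is an assumption, the pronunciation strings are placeholders rather than real SAMPA, and `build_lexicon` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# ElementTree addresses the xml:lang attribute through the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_lexicon(lang, alphabet, entries):
    """Build a <lexicon> document from a {word: [pronunciation, ...]} mapping."""
    root = ET.Element("lexicon", {XML_LANG: lang, "alphabet": alphabet})
    for word, pronunciations in entries.items():
        # One <entry> per word; 'key' holds the spelling.
        entry = ET.SubElement(root, "entry", {"key": word})
        for pronunciation in pronunciations:
            # One <definition> per pronunciation; 'value' holds the phonemes.
            ET.SubElement(entry, "definition", {"value": pronunciation})
    return ET.tostring(root, encoding="unicode")


# ASSUMPTION: the exact form of this string is a guess; check it against
# the product reference before using it.
ALPHABET = "sampa;localization=lumenvox"

# ASSUMPTION: PRON_A and PRON_B are placeholders, not verified SAMPA strings.
xml_text = build_lexicon("en-US", ALPHABET, {"tomato": ["PRON_A", "PRON_B"]})
print(xml_text)
```

The output is a single lexicon element holding one entry for "tomato" with two definition children, one per alternate pronunciation.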
A grammar may reference multiple lexicons, and giving multiple pronunciations for the same word is defined behavior. Priority between multiple lexicons is controlled by the 'type' query property in the lexicon URI. By default, all pronunciations in a lexicon are added as alternates, and no existing pronunciations are removed. This is the behavior of a 'backup' lexicon, which can also be requested explicitly by adding 'type=backup' to the query string.
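A minimal sketch of attaching the 'type' property to a lexicon URI, using Python's standard `urllib.parse`. The helper name `with_lexicon_type` and the example URI are hypothetical. The sketch also assumes that the property is an ordinary `name=value` query parameter, and that the value for the 'primary' type (described in the next paragraph) is spelled `primary`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# ASSUMPTION: both values are spelled exactly as the text names them.
LEXICON_TYPES = ("backup", "primary")


def with_lexicon_type(uri: str, lexicon_type: str) -> str:
    """Return uri with its 'type' query property set to lexicon_type."""
    if lexicon_type not in LEXICON_TYPES:
        raise ValueError(f"unknown lexicon type: {lexicon_type!r}")
    parts = urlsplit(uri)
    # Keep any other query properties, dropping an existing 'type'.
    query = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name != "type"
    ]
    query.append(("type", lexicon_type))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical lexicon location, used only for illustration.
print(with_lexicon_type("http://www.example.com/names.xml", "backup"))
# prints: http://www.example.com/names.xml?type=backup
```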
A lexicon may instead be designated as 'primary'. If a primary lexicon contains a pronunciation for a word, all existing pronunciations for that word are ignored in favor of the pronunciations from the primary lexicon.
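The difference between the two types can be modeled in a few lines. This is only a model of the behavior described above, not the engine's implementation. `apply_lexicon` is a hypothetical helper, and the pronunciation strings are placeholders.

```python
def apply_lexicon(pronunciations, lexicon, lexicon_type="backup"):
    """Return a new word -> pronunciations mapping after applying one lexicon.

    backup:  the lexicon's pronunciations are added as alternates.
    primary: the lexicon's pronunciations replace existing ones for that word.
    """
    if lexicon_type not in ("backup", "primary"):
        raise ValueError(f"unknown lexicon type: {lexicon_type!r}")
    # Copy so the caller's mapping is left untouched.
    result = {word: list(prons) for word, prons in pronunciations.items()}
    for word, new_prons in lexicon.items():
        if lexicon_type == "primary":
            result[word] = list(new_prons)
        else:
            result.setdefault(word, []).extend(new_prons)
    return result


# Placeholder pronunciations, not real SAMPA strings.
existing = {"tomato": ["PRON_A"]}
custom = {"tomato": ["PRON_B"]}

print(apply_lexicon(existing, custom, "backup"))   # {'tomato': ['PRON_A', 'PRON_B']}
print(apply_lexicon(existing, custom, "primary"))  # {'tomato': ['PRON_B']}
```

Words that appear only in the existing set are unaffected by either type, since a primary lexicon overrides only the words it actually defines.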
SpeechWorks format types are also supported, so the type may be written with an 'SWI' prefix.