LumenVox

Speech Engine

The LumenVox Speech Engine is an accurate, standards-based speech recognizer that supports multiple languages and can recognize speech in audio from any source. Running on Linux or Windows, the speaker-independent and hardware-independent Speech Engine powers speech solutions and platforms deployed in enterprise and SMB environments worldwide.

It also gives speech application developers an efficient development and runtime platform, with dynamic control over language, grammar, audio format, and logging so they can customize every step of their application. Grammars can be entered as a simple list of words or pronunciations, or in the industry-standard Speech Recognition Grammar Specification (SRGS) format.

Use the Speech Recognition Engine's API in 5 easy steps!

  1. Open a new Speech Port representing your connection to the Speech Engine.
  2. Load a grammar; multiple grammars can be loaded at once.
  3. Load the audio data into a sound channel.
  4. Tell the Engine to decode the audio.
  5. Get the recognized results!

Engine Functionality

How Speech Recognition Works

Speech engines use this process to figure out what a speaker said:

The engine loads a list of words to be recognized. This list of words is called a grammar.

Audio from a speaker is captured by a microphone or telephone. This audio is turned into a waveform, a mathematical representation of sound.

The engine looks at features (distinct characteristics of sound) derived from the waveform and compares them with its own acoustic model. The engine searches its acoustic space, using the grammar to guide this search.

It then determines which words in the grammar the audio most closely matches and returns a result.


The Speech Engine compares audio with loaded grammars to produce recognized text.

Licensing

The Speech Engine is licensed per channel, that is, per simultaneous use of a speech recognition resource. Both the Speech Tuner and the Speech Platform are included at no cost when you purchase the Speech Engine.

There are two versions of the LumenVox Speech Engine:

Speech Engine Lite is licensed for up to 500 pronunciations per interaction.
Please contact us to discuss the practical limitations for your specific application.

LumenVox offers affordable development packages that include training and tech support to help you get started building your application. Please call or e-mail LumenVox for pricing.

LumenVox recognizes that the speech industry must work together to develop solutions for businesses. As an important step, LumenVox speech recognition technology proudly supports and complements the following standards.

MRCP
Media Resource Control Protocol

Speech synthesizers, audio recorders, DTMF recognizers, speech recognizers, speech verifiers: a fully functioning, media-rich application needs many components to work together. Traditionally, all of these components had to come from a single vendor, or they required extensive custom programming to integrate. MRCP changes this. The Media Resource Control Protocol lets you manage diverse media resources seamlessly and provides a common language for communicating with all of these devices. With MRCP, vendors can compete on the basis of their strengths instead of trying to build an all-inclusive but potentially mediocre package. You can then take the best product from each vendor and assemble a speech application package tailored to your particular needs.

In essence, MRCP is a protocol designed specifically for client control of media processing resources such as speech recognition and TTS engines. LumenVox supports MRCP versions 1 and 2 (NLSML). For more information visit: www.ietf.org

SISR
Semantic Interpretation for Speech Recognition

LumenVox has implemented the W3C's SISR working draft, which is also part of the VXML 2.0 specification. SISR lets grammar authors embed snippets of JavaScript (ECMAScript) code in their SRGS grammars. These snippets automatically transform what a speaker says into a format an application understands. With LumenVox's semantic tags, a caller can say "September thirteenth two thousand four," and your application will receive "2004-09-13."

SRGS
Speech Recognition Grammar Specification

The W3C defined a syntax called the Speech Recognition Grammar Specification (SRGS) for representing grammars used in speech recognition, so that developers can specify the words and patterns of words a speech recognizer should listen for. The grammar syntax is presented in two forms: an Augmented BNF (ABNF) form and an XML form. The specification makes the two representations mappable, so grammars can be transformed automatically between them. The LumenVox Speech Engine supports SRGS as defined by the W3C. For more information visit: www.w3.org/TR/speech-grammar/

VXML
Voice Extensible Markup Language

Voice Extensible Markup Language (VXML) is a markup language for coding speech applications, and it shares many architectural components with HTML. VoiceXML platforms combine speech recognition engines, text-to-speech synthesis, telephony interfaces, and VoiceXML interpreter software to process a call. To interface with VXML, a speech engine must understand SRGS and SISR.

LumenVox's Speech Engine is compliant with what VXML expects, and our Speech Engine powers the speech recognition portion of many VXML platforms.

VXML Forum

The VoiceXML Forum is an industry organization formed to create and promote VoiceXML. With the backing and contributions of its diverse membership, including key industry leaders, the VoiceXML Forum has successfully driven market acceptance of VoiceXML through a wide array of speech-enabled applications. LumenVox is a proud member of the Forum. For more information visit: www.voicexml.org.


Server-Side Grammar

LumenVox offers even more efficient support for large speech recognition grammars by allowing clients to pre-load grammars onto the server. Clients can send a grammar once, before any decode requests, instead of sending it with each request.

Voice Activity Detection

In our mobile society, we rarely make calls while sitting in a quiet room. Whether a call comes from a crowded restaurant or a speeding car, one tricky task for speech recognition software is separating speech from background noise.

The Speech Engine uses a technology called Voice Activity Detection (VAD) to distinguish between actual speech and other sounds. Human speech has qualities that make it distinguishable from other sounds. VAD listens to the incoming audio for these qualities. These include:

  • Energy Level (volume)
  • Frequency (pitch)
  • Changes in frequency
  • Duration

NBest Results

Instead of returning only the top-scoring result, you can instruct the Speech Engine to return several of the highest-scoring, most likely answers, often called NBest results. NBest results are particularly effective when callers need to spell names, street addresses, or e-mail addresses. Suppose a caller spells a name beginning with "N," but the engine returns a low confidence score. Without NBest results, the caller would be asked to repeat the letter. Because "N" sounds so similar to "M," the second attempt would likely receive a similarly low confidence score. With NBest results, the system can prompt the caller with each likely result in turn, such as "Did you mean 'M,' as in 'Mary'?" When the caller responds "No," the system moves to its next option: "Perhaps you meant 'N,' as in 'Nancy'?"

Speech Engine Sample Code

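#include <iostream>
// Also include the LumenVox SDK header that declares LVSpeechPort;
// its name and location depend on your SDK installation.

// Decodes a buffer of 8 kHz u-law audio against a yes/no grammar and
// prints each semantic interpretation ("true" or "false").
void RecognizeSpeech (void* SoundData, int SoundDataLength)
{
  // SRGS ABNF grammar with SISR semantic tags. The root rule $yesno
  // returns the result of whichever sub-rule matched.
  const char* GrammarString =
  "#ABNF 1.0;\n"
  "language en-US;\n"
  "mode voice;\n"
  "tag-format <semantics/1.0>;\n"
  "root $yesno;\n"
  "$yes = (yes | yeah | okay) {out = 'true';};\n"
  "$no = (nope | no) {out = 'false';};\n"
  "$yesno = $yes {out = rules.latest();} | $no {out = rules.latest();};\n";

  LVSpeechPort Port;
  Port.OpenPort ();                                  // 1. open a speech port
  Port.LoadGrammarFromBuffer (0, GrammarString);     // 2. load the grammar
  Port.LoadVoiceChannel (0, SoundData, SoundDataLength, ULAW_8KHZ);  // 3. load the audio

  // 4. decode, requesting semantic interpretation and blocking until done
  Port.Decode (0, 0, LV_DECODE_SEMANTIC_INTERPRETATION | LV_DECODE_BLOCK);

  // 5. print every interpretation returned for the decode
  int NumInterpretations = Port.GetNumberOfInterpretations (0);
  for (int i = 0; i < NumInterpretations; ++i)
    std::cout << Port.GetInterpretationString (0, i) << std::endl;

  Port.ClosePort ();
}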

Speech Engine Comparison

We are often asked how accurate our Speech Engine is. When we released new acoustic models for the Speech Engine, we wanted to validate the accuracy improvements and gather comparative data to share with our clients and partners.


What Is Accuracy?

What is generally meant by "accuracy" is the correct recognition of in-grammar utterances: how successfully the speech recognition engine recognizes a word or phrase that is an expected spoken response.

How Did We Compare?

For our Accuracy Study, we compared the LumenVox Speech Engine with one other leading speech engine, "Competitor O," using third-party test data.* We tested five different utterance types, or "domains": Yes/No, Name & Department (e.g., a company directory), Date (day, month, and year), Numbers (such as "two thousand one hundred"), and Digits (a string of digits).

It should be noted that the test set is a raw sampling of data from telephones and cell phones and includes a reasonable percentage of "noisy" data.

Results

Our Speech Engine fared extremely well, with comparatively high accuracy marks in almost all categories.

What Does This Mean?

LumenVox speech recognition software has always been the smart choice in terms of affordability, and after reviewing the analysis, you can be confident of our accuracy, performance, and position in the speech industry.


*No engine was optimized for the test utterances, applications, or grammars. To keep the test objective, we also made no effort to tune or adjust any of the software.

Tuned Versus Untuned Applications

The accuracy graphs above reflect accuracy for untuned applications only. The grammars in the tested applications (the lists of words a speech engine will recognize) had not been optimized based on actual call data.

Because those accuracy figures are for untuned applications, they do not represent the final accuracy results users can expect from deployed speech recognition applications.

The graph to the right shows the accuracy results for the speech-enabled call router LumenVox uses. The router contains approximately 70 names.

Before tuning, the router's grammar contained only proper names. After examining call data with our Speech Tuner, LumenVox added employees' alternative pronunciations and nicknames to the grammar. As you can see, this relatively simple act of tuning raised accuracy from 85.54% to 96.21%.

The following table presents results from testing the Windows version of our Speech Engine on a machine with an Intel Core 2 Duo 2.4 GHz processor and 2 GB of memory. The grammar contained 500 proper names, and each audio sample was 1.5 seconds long. Each test ran for 280 seconds in total, with each interaction taking 14 seconds.

Number of Ports   Memory (MB)   Number of Decodes   Processor Utilization (%)
       0              395               0                      0
       1              399              20                      2
       4              437              80                      6
       8              437             160                     10
      16              440             320                     15
      24              441             480                     25
      48              443             960                     45
      96              447            1920                     81

Finally, a fully integrated speech solution for all Asterisk-based applications is now available.

Go ahead: Speech-enable your Call Router, give the freedom of hands-free phone interactions to your callers, or simply provide an automated interface for customer service.

The Speech Engine is directly and seamlessly integrated with the Asterisk PBX platform and Dial Plan through a unique connector bridge from Digium.

Now you can easily build speech-enabled IVRs using the familiar Dial Plan scripting language or the C API.

Supported Linux distributions include recent versions of rPath (Asterisk Now/Pound Key), Fedora Core, Red Hat Enterprise Server, CentOS (through our Red Hat build), and Debian.

Technical Support

LumenVox provides limited complimentary technical support to help you properly install the LumenVox Speech Engine and configure your copy of Asterisk to get a simple speech recognition application working. Note that this support applies only to users running current versions of our software on officially supported Linux distributions.

Call 877-977-0707 and ask for "Support"

E-mail support@lumenvox.com

For more advanced technical issues, including help troubleshooting your speech application, LumenVox support contracts are available at $175/hour with a minimum two-hour purchase.

Bulk rates are available: 5 hours for $750 or 10 hours for $1400.

LumenVox technical support is available weekdays between 9 a.m. and 5 p.m. Pacific time. General information on our support policy, help files, and other useful resources is available on the LumenVox website.


Supported Operating Systems

The LumenVox Speech Engine runs on Windows, Linux, and Solaris.

  • Linux distributions:
      • Fedora Core
      • Debian
      • CentOS
      • Red Hat Enterprise
      • rPath (Asterisk Now/Pound Key)
  • Windows versions:
      • 2000
      • XP
      • 2003 Server

Testing Environment

  • 3rd-party test data
  • Standard Grammar Format (SRGS)
  • Tested on a dual Xeon 2.4 GHz machine

Asterisk Speech Recognition Resources

Asterisk Application Zone

Visit the Asterisk Application Zone, where we showcase application code, tools, and sample grammars to help you build speech applications on Asterisk. Enter your application in our speech application contest, or simply browse the Asterisk speech recognition resources.
