Uses of Speech Recognition

Speech recognition applications are different from any other kind of computer application. Speech recognition opens up a world of possibilities for developers, especially those building IVRs and other telephony applications, but it also poses some challenges.

Rather than pressing buttons or interacting with a computer screen, users must speak to the computer, and this means there will be a level of uncertainty associated with their input, as automatic speech recognition only returns probabilities, not certainties. Before discussing the many ways speech recognition is useful, it is important to consider its unique strengths and weaknesses.

The most obvious weakness is the one mentioned above, namely the potential for misrecognition. No matter how much effort and care are put into developing a piece of speech recognition software, there will always be times when the application misrecognizes user input.

Because of this, speech applications need more extensive error handling than other applications. If the confidence score on a specific recognition is low, the application should confirm what the user said, and it may have to ask users to repeat themselves. Sometimes a given user will just not be understood, perhaps because he or she is in a noisy environment. If a speech engine returns low confidence values for the same user several times, it may be best to transfer that user to a human operator so the user can complete the transaction.
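The escalation described above can be sketched in a few lines of Python. This is only an illustration, not the behavior of any particular speech engine: the 0-to-1 confidence scale, the threshold values, the limit of three low-confidence turns, and the names DialogState and next_action are all assumptions made for the example. It also assumes that both a confirmation and a reprompt count as a low-confidence turn.

```python
from dataclasses import dataclass

# Hypothetical values: real thresholds depend on the engine and must be tuned.
ACCEPT_THRESHOLD = 0.80        # at or above this, trust the recognition
CONFIRM_THRESHOLD = 0.50       # between the two thresholds, confirm with the user
MAX_LOW_CONFIDENCE_TURNS = 3   # low-confidence turns in a row before transferring


@dataclass
class DialogState:
    """Tracks how many low-confidence results this caller has had in a row."""
    low_confidence_turns: int = 0


def next_action(confidence: float, state: DialogState) -> str:
    """Return 'accept', 'confirm', 'reprompt', or 'transfer' for one result."""
    if confidence >= ACCEPT_THRESHOLD:
        state.low_confidence_turns = 0  # a good result resets the counter
        return "accept"

    state.low_confidence_turns += 1
    if state.low_confidence_turns >= MAX_LOW_CONFIDENCE_TURNS:
        return "transfer"  # hand the caller to a human operator
    if confidence >= CONFIRM_THRESHOLD:
        return "confirm"   # e.g. "Did you say Boston?"
    return "reprompt"      # ask the caller to repeat


if __name__ == "__main__":
    state = DialogState()
    for score in (0.90, 0.60, 0.20, 0.30):
        print(score, next_action(score, state))
```

Keeping the counter in a per-caller state object means one noisy call escalates to an operator without affecting other callers. A production system would likely also handle silence and timeouts, which this sketch leaves out.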

Speech recognition is also affected by the quality of the input. If a user is calling a system, a bad cell phone connection or overly compressed Internet audio may throw off recognition. Handling these sorts of cases becomes very important when designing speech recognition applications.

Despite these weaknesses, speech recognition is the best way to handle many applications. Traditional DTMF (Touch-Tone) phone applications require users to navigate long and complicated menus and submenus. At any given time a user is limited to a handful of possible choices and must remember the proper number to press.

A speech-enabled IVR gives users much greater flexibility. Speech systems are based around asking users questions and allowing them to answer in a way that is natural and intuitive. Speech applications can also present users with more options at any given time, as they are not limited by the number of keys on a phone keypad, nor do users have to remember obscure numerical choices. Users can simply say what they want and get through their interactions much faster.

Speech recognition also opens up new types of applications. Call routers become easier to use, since callers can simply say a name without needing to know how to spell it. Users who are driving or otherwise unable to look at a keypad can also interact with a system more easily.

Users can provide open-ended input that would not be possible in standard DTMF systems: specifying the city and state for a phone number directory, picking a specific color or make of car, choosing toppings on a pizza, dialing a number by saying a person's name, and looking up addresses are all examples of responses that would not be easy in traditional IVR applications.

Speech applications are better able to convey a company's unique brand, as users identify more with a computer system they talk to. By using quality voice talent that conveys specific emotions and personality, designers can build systems that connect with users in ways that other software never could.

