Converting Touch Tone Applications To Speech Recognition

DTMF to Speech: The Final Frontier

Best Practices for Converting Touch Tone® Applications To Speech Recognition

Speech recognition technology is now considered the preferred way to interact with a telephony application. Speech recognition is also now feasible from a cost and ROI perspective: the price of core technology has gone down, and software standards have opened up options for the consumer. So what is the best way to approach a "speech overhaul" for your DTMF (sometimes called Touch Tone) application? And what are some of the compelling reasons to speech-enable it?

In this paper we will cover some techniques and tips you can follow to convert your DTMF application to speech — and delve into the reasons for taking the plunge in the first place! With careful planning and a methodical approach, you can leverage speech recognition technology to bring increased caller satisfaction and improved customer service to your organization.

Why Speech?

At this point you may be asking yourself: what are the advantages of speech recognition, and is it the best move for my company? To illustrate the advantages, let's take a look at an application whose DTMF version goes something like this:

Umbrellas-R-Us Speech Application: DTMF to Speech

Application: "Thank you for calling Umbrellas-R-Us. To get store hours, press one. To find the location of the closest store, press two. To order umbrellas over the phone, press three."

Caller Input: 3
Application: "We have 5 choices of colors for umbrellas. Press 1 for green, 2 for red, 3 for yellow, 4 for blue and 5 for purple. Press 6 to repeat this menu."

Caller Input: 4
Application: "We have two sizes of umbrellas, press 1 for compact and 2 for full-size."

Caller Input: 2

Now, let's take a look at how a well-designed speech application handles the same transaction:

Application: "Thank you for calling Umbrellas-R-Us. Would you like to order an umbrella or get store hours and locations?"

Caller: Order an umbrella
Application: "Great, would you like a green, red, yellow, blue or purple umbrella?"

Caller: Blue
Application: "Would you like a compact or full-sized umbrella?"

Caller: Full-size

What did we achieve? Speech has transformed a passive DTMF menu structure into a more interactive and natural experience for the caller. Instead of simply listing menu items, the application asks the caller questions. This caller managed to remember which number to press for which umbrella color, but another caller could easily get confused, need to hear the options a second time, and start to get frustrated. In general, long and complex menus make any DTMF system difficult and cumbersome to navigate. Other benefits of speech include:

  • More "personality" injected into the recorded prompts than a touch-tone system allows.
  • Increased customer satisfaction, because callers engage in a true dialogue with your company.
  • An opportunity to have a branded voice and personality. Customer loyalty increases when your customers feel like they "know" the person behind the application.
  • The ability to offer more global options (help, return, main menu, cancel, etc.) than simply pressing zero for the operator.
  • The ability to take your application where DTMF could never go. For example, imagine asking a caller to key in the city and state where they live: it's practically impossible! With speech, they can simply say "San Diego, California."

Having said all that, make sure the application you choose to convert is actually best served by speech recognition. Consider starting with the application that has the lowest call completion rate, or with one where you'd like to collect additional information from callers but simply can't today through DTMF.

A High Level View, or How to Get Started

The goal of any telephony application is to route a call, provide information, complete a transaction, or some combination of these. If implemented well, speech can smooth all of these processes and move a call forward more quickly.

So, before you begin, keep in mind the following:

  • Don't just retrofit your existing DTMF application. Reusing the same menus without re-evaluating the call flow wastes the effort.
  • Start small and simple. Don't try to take on the most complex application.
  • If you are the developer, don't give in to management pressure to quickly build the application or use the existing menu structures. Make a case for the proper time to be invested in designing a dynamic speech solution.
  • Design a great Voice User Interface (VUI). There are some techniques below that will help you with this.
  • Choose standards-based software. Don't get locked into proprietary core technology; a best-of-breed solution gives you options and future flexibility.
  • Partner with a company that provides excellent service and knowledge transfer. Tackling the speech recognition learning curve alone is no fun. Make sure the company you partner with has technical resources and support plans that meet your needs.

About Grammars and Prompts

Before getting into grammars and prompts, let's discuss the difference between Directed Dialog and Natural Language speech applications. Directed Dialog is a way of developing the application to "guide" the caller to use specific phrases in their utterances. For example, the application would ask the caller, "Would you like sales, support or accounting?" Natural Language applications simply ask, "How may I help you?"

Building your application with Directed Dialog is highly recommended, because the speech engine recognizes utterances more accurately, and with higher confidence scores, against the focused grammar of a Directed Dialog solution. In addition, the time and resources needed to build an effective Natural Language application are often cost-prohibitive for most organizations: you would need to load the grammar with every possible response (very difficult!), which in turn increases development time and cost.
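
The umbrella order from earlier shows why a focused grammar helps. The following Python sketch models the order path as a Directed Dialog: each question carries its own small grammar, and an answer outside that grammar simply triggers the same question again. It rests on loud assumptions: a real speech engine scores audio against the grammar, whereas here "recognition" is simulated by an exact match on lower-cased text, and the names `Step`, `ORDER_FLOW`, `recognize`, and `run_dialog` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One Directed Dialog question plus the small grammar that answers it."""
    name: str
    prompt: str
    grammar: frozenset[str]  # every phrase accepted at this step (lower-case)


# The umbrella order path from the example, one focused grammar per question.
ORDER_FLOW = [
    Step("color", "Great, would you like a green, red, yellow, blue or purple umbrella?",
         frozenset({"green", "red", "yellow", "blue", "purple"})),
    Step("size", "Would you like a compact or full-sized umbrella?",
         frozenset({"compact", "full-size", "full-sized"})),
]


def recognize(utterance: str, grammar: frozenset[str]) -> str | None:
    """Simulated recognition: exact match of normalized text against the grammar.

    A real engine scores audio against the grammar; this only illustrates
    that anything outside the grammar is rejected.
    """
    text = utterance.strip().lower()
    return text if text in grammar else None


def run_dialog(flow: list[Step], utterances: list[str]) -> dict[str, str]:
    """Ask each question in turn, re-asking it when the answer is out of grammar."""
    answers: dict[str, str] = {}
    steps = iter(flow)
    step = next(steps, None)
    for utterance in utterances:
        if step is None:
            break  # every question has been answered
        print("Application:", step.prompt)
        print("Caller:", utterance)
        result = recognize(utterance, step.grammar)
        if result is None:
            continue  # out of grammar: the same focused question is asked again
        answers[step.name] = result
        step = next(steps, None)
    return answers


print(run_dialog(ORDER_FLOW, ["Blue", "Huge", "Full-size"]))
```

Because each step accepts only a handful of phrases, an unexpected answer such as "Huge" is easy to reject and re-ask. A Natural Language "How may I help you?" step would instead need one grammar covering every phrasing of every request.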

Grammars
Grammars refer to the list of expected responses from your callers. For example, for a front-end auto attendant for your company, the grammar would include the list of names of the employees or departments, plus words or phrases such as "Main Menu", "Operator," or "Cancel."

Other points about grammars:

  • Callers tend to answer exactly the question they are asked, so keep your questions focused.
  • Include the most common responses. The testing phase of your development will help you narrow down this list, or perhaps add a few words or phrases that you simply did not think of during your initial development.
  • In other words, think about how the average person would respond, not the anomaly.
  • Avoid using similar sounding words or phrases, and be sure to take out anything that could be confusing.
  • Remove grammar items that are rarely or never used. You will find these words or phrases during the testing and tuning process.
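
During tuning, the last two points can be partly automated. This Python sketch assumes you have a list of grammar entries and a log of which entry was recognized on each call; it lists entries that were rarely or never hit, and flags pairs of entries with similar spellings. Spelling similarity (via the standard library's `difflib.SequenceMatcher`) is only a rough stand-in for sounding alike; actual acoustic confusability depends on your speech engine, so treat flagged pairs as candidates to listen to, not verdicts. The function names, the sample data, and the 0.8 threshold are illustrative assumptions.

```python
from __future__ import annotations

from collections import Counter
from difflib import SequenceMatcher
from itertools import combinations


def rarely_used(grammar: list[str], recognized: list[str], min_hits: int = 1) -> list[str]:
    """Return grammar entries recognized fewer than `min_hits` times in the logs."""
    hits = Counter(recognized)
    return [entry for entry in grammar if hits[entry] < min_hits]


def similar_pairs(grammar: list[str], threshold: float = 0.8) -> list[tuple[str, str]]:
    """Flag entry pairs whose spellings are close.

    Spelling similarity is only a rough proxy for sounding alike; listen to
    real calls before removing or renaming anything.
    """
    return [
        (a, b)
        for a, b in combinations(grammar, 2)
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
    ]


# Hypothetical auto attendant grammar and one tuning cycle's recognition log.
attendant_grammar = ["Sales", "Support", "Accounting", "Dawn Miller", "Don Miller", "Operator"]
recognition_log = ["Sales", "Support", "Sales", "Operator", "Dawn Miller", "Support"]
print(rarely_used(attendant_grammar, recognition_log))  # entries nobody said this cycle
print(similar_pairs(attendant_grammar))                 # entries that may be confused
```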

Prompts
Audio prompts are where the rubber meets the road for a speech solution: they are how you direct your callers through a successful transaction. Speech applications ask questions, and each question needs to match the responses its grammar expects. Learn from past speech application mistakes and do not write "say" menus. For example: "To get account status, say Account Status; for the main menu, say Main Menu." This frustrates the caller and does not use the strengths of speech. Simply write the prompt as "You can get Account Status or return to the main menu."

The grammar can then include optional filler words. For example:

$AccountStatus = [get] account status;

This enables the user to say either "get account status" or "account status."
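
To see exactly which phrases an optional element adds, here is a tiny Python sketch that expands a rule body written with square-bracket optional words. It assumes a deliberately reduced syntax (flat `[...]` optionals only; no alternatives, nesting, repeats, or `$Rule =` prefix), so it illustrates the idea and is not a parser for any real grammar format; `expand_optional` is a hypothetical name.

```python
from __future__ import annotations

import itertools
import re


def expand_optional(rule_body: str) -> set[str]:
    """List every phrase a rule body accepts when [ ... ] marks optional words.

    Deliberately reduced syntax: flat [ ... ] optionals only, with no
    alternatives (|), nesting, repeats, or "$Rule =" prefix.
    """
    choices: list[list[str]] = []
    # Each match is either an optional group or a run of required words.
    for optional, required in re.findall(r"\[([^\]]*)\]|([^\[\]]+)", rule_body):
        if optional.strip():
            choices.append([optional.strip(), ""])  # say it or skip it
        elif required.strip():
            choices.append([required.strip()])      # must be said
    return {
        " ".join(word for word in combo if word)
        for combo in itertools.product(*choices)
    }


print(sorted(expand_optional("[get] account status")))
# ['account status', 'get account status']
```

Each optional element doubles the number of accepted phrases, which is one reason to keep fillers limited to the words callers actually use.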

Your speech prompt's menu items do not need to follow the same order as your DTMF menu. They can be ordered by common usage or to make the prompt sound better. You'll have more freedom ordering speech prompts, because DTMF prompts are almost always ordered "For A press 1, for B press 2 ... and for Z press 0."
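
Ordering by common usage can be sketched in a few lines. This hypothetical Python helper assumes you already have per-option usage counts (for example, from your current DTMF call reports) and builds a prompt that lists the most-used option first, in the "You can ..." style recommended earlier in this section. The counts in the example are made up.

```python
from __future__ import annotations


def build_menu_prompt(usage: dict[str, int]) -> str:
    """Build a "You can ..." prompt listing options from most to least used.

    `usage` maps each option's spoken phrase to how often callers chose it.
    Ties keep their original (insertion) order because Python's sort is stable.
    """
    if not usage:
        raise ValueError("need at least one menu option")
    options = sorted(usage, key=usage.get, reverse=True)
    if len(options) == 1:
        listed = options[0]
    else:
        # No serial comma, matching prompts such as "green, red, yellow, blue or purple".
        listed = ", ".join(options[:-1]) + " or " + options[-1]
    return f"You can {listed}."


print(build_menu_prompt({"return to the main menu": 120, "get account status": 480}))
# You can get account status or return to the main menu.
```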

One last thing about prompts: if at all possible, use recorded prompts rather than text-to-speech.

Other Tips

  • It is possible to use semantic interpretation in a grammar to translate the utterance into a DTMF string.

    Example: $MainMenu = (Main Menu):"1"

  • Semantic interpretation will return "1" when the caller says "Main Menu," so, in theory, your existing DTMF code can be reused unchanged.
  • Decide whether to keep DTMF input, or at least the ability to "zero out" to an operator, in addition to global grammars such as "main menu" and "operator."
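
The semantic interpretation tip amounts to a thin adapter in front of your existing touch-tone logic. The Python sketch below simulates it: the `SPEECH_TO_DTMF` table stands in for what the grammar's semantic interpretation would return (in a real deployment that mapping lives in the grammar, as in the `$MainMenu` example, and the engine hands back the digit string), and `existing_dtmf_handler` is a hypothetical placeholder for your current code. Keypresses and spoken phrases, including an "operator" zero-out, both end up in the same unchanged handler.

```python
from __future__ import annotations

# Hypothetical stand-in for the grammar's semantic interpretation. In a real
# deployment this mapping lives in the grammar itself (as in the $MainMenu
# example) and the speech engine returns the digit string directly.
SPEECH_TO_DTMF = {
    "main menu": "1",
    "account status": "2",
    "operator": "0",  # global "zero out" command
}


def existing_dtmf_handler(digits: str) -> str:
    """Hypothetical placeholder for the application's current touch-tone logic."""
    routes = {"1": "main_menu", "2": "account_status", "0": "transfer_to_operator"}
    return routes.get(digits, "reprompt")


def handle_input(speech_result: str | None = None, dtmf: str | None = None) -> str:
    """Send a keypress or a recognized phrase through the unchanged DTMF handler."""
    if dtmf is not None:
        return existing_dtmf_handler(dtmf)  # the keypad still works
    if speech_result is not None:
        digits = SPEECH_TO_DTMF.get(speech_result.strip().lower())
        if digits is not None:
            return existing_dtmf_handler(digits)
    return "reprompt"  # nothing usable: ask again


print(handle_input(speech_result="Main Menu"))  # main_menu
print(handle_input(dtmf="0"))                   # transfer_to_operator
```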

About Testing and Tuning

It cannot be overemphasized: once you've built your speech application, you are only halfway there! A significant part of your time should go to testing and tuning the application. Deploy the application, gather call data, listen to calls, and adapt the application to your callers, not your callers to the application.

Test with real people, not people who are trying to break the application or who are intimately familiar with the call flow. Establish a repeatable tuning cycle, weekly or monthly, and follow through. The more you learn about how real callers react to your system, the better the application will become. Learn more about Speech Tuning Strategies and tools that will help you through the tuning process.
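
A repeatable tuning cycle needs a repeatable measurement. As a minimal sketch, assuming you have transcribed what callers actually said at one prompt during the cycle and have that prompt's grammar as a set of lower-case phrases, this Python function reports what fraction of utterances the grammar covered and which out-of-grammar phrases came up repeatedly (candidates to add). The function name, the `min_count` cutoff, and the sample transcripts are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def tuning_report(
    transcripts: list[str], grammar: set[str], min_count: int = 2
) -> tuple[float, list[tuple[str, int]]]:
    """Summarize one tuning cycle for a single prompt.

    Returns the fraction of transcribed utterances the grammar covers, plus the
    out-of-grammar phrases heard at least `min_count` times, most frequent first.
    `grammar` is assumed to hold lower-case phrases.
    """
    heard = [t.strip().lower() for t in transcripts]
    if not heard:
        return 0.0, []
    covered = sum(1 for t in heard if t in grammar)
    misses = Counter(t for t in heard if t not in grammar)
    candidates = [(phrase, n) for phrase, n in misses.most_common() if n >= min_count]
    return covered / len(heard), candidates


# Hypothetical transcripts from the main menu prompt.
main_menu_grammar = {"order an umbrella", "store hours", "locations"}
cycle_transcripts = ["Order an umbrella", "buy an umbrella", "Store hours",
                     "buy an umbrella", "what time do you close"]
print(tuning_report(cycle_transcripts, main_menu_grammar))
# (0.4, [('buy an umbrella', 2)])
```

Running the same report on the same prompt every cycle shows whether grammar and prompt changes are actually helping.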

Conclusion

In this paper we covered the basic considerations when planning to port your application from DTMF to speech.

Call LumenVox today at 1-877-977-0707 to discuss your project and learn about ways we can partner to make your speech application a success.