The Art of Speech and a Conversational IVR

Speech technology can truly bring the customer experience to life, but this requires a unique blend of creativity, technology, and hardware. We asked Shaun McThomas, a Software Engineer at LumenVox, to share his perspectives on the art of integrating speech technology with IVRs to enhance the customer experience.

What are the biggest issues facing customers and IVRs today?

One of the biggest issues with current Interactive Voice Response (IVR) systems is that callers are forced to follow a rigid script. It’s not a conversation; it’s an interrogation. Callers are asked for one piece of information, then another, and another. The flow is stilted and feels nothing like a natural conversation. It is just a series of “painful, tiny steps” that make the whole process uncomfortable for the caller.

Another problem is “IVR jail”, where the caller has no escape route. This happens when customers must listen to an entire prompt, all the way to the end, before they can ask to go back to the main menu.

How can contact centers address these pain points?

Most of these issues are easily resolved with an artful blend of good design and modern speech recognition technologies, an approach that LumenVox calls “Speech Art”. If you listen to the very best contact center agents within a business and model how they question callers and solve issues, you’ll understand how callers really ask questions. That puts you in a better position to create a natural language IVR environment that provides lifelike responses. By following this model, you can produce frictionless, intuitive, and personalized interactions with callers to radically improve their experience.

The very first thing a good speech IVR system should do is quickly identify and confirm who the caller is. You can use a blend of technologies to simplify and speed up this process. Start by looking up the phone number they are calling from in your back-end systems to see whether it determines their identity. Then add speech recognition and voice biometric authentication to fast-track the process.
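The phone-number lookup described above can be sketched in a few lines. This is a minimal illustration, not a LumenVox API: the `Customer` record, the `CUSTOMERS_BY_PHONE` table, and the function names are all hypothetical stand-ins for your own back-end systems, and the voice biometric step is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    customer_id: str
    name: str


# Hypothetical stand-in for a back-end customer database keyed by phone number.
CUSTOMERS_BY_PHONE = {
    "+15550100": Customer("C-1001", "Alex Rivera"),
}


def normalize(number: str) -> str:
    """Drop spaces and punctuation so differently formatted numbers match."""
    digits = "".join(ch for ch in number if ch.isdigit())
    return "+" + digits


def identify_caller(caller_number: str) -> Optional[Customer]:
    """Return the customer on file for this number, or None if unknown."""
    return CUSTOMERS_BY_PHONE.get(normalize(caller_number))


def greeting_for(caller_number: str) -> str:
    customer = identify_caller(caller_number)
    if customer is None:
        return "Welcome. Could you tell me your name or account number?"
    # Confirm rather than assume: a phone can be shared by several people.
    return f"Hi, am I speaking with {customer.name}?"
```

Note that the lookup only produces a candidate identity. The greeting still asks the caller to confirm it, which is where speech recognition and voice biometrics would take over.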

Once you’ve identified the caller, use data available from your back-end systems to anticipate the reason for calling and personalize the next steps.

For example, if you are a power company and a customer’s home is in the middle of a known power outage, you can assume that is the reason for their call. Likewise, if you are an airline and the caller has a flight booked with you that departs within the next 24 hours, they are likely calling about that flight.

Now that you’ve made an assumption, confirm that’s the reason they are calling with a simple yes/no prompt. If they answer ‘yes’, provide them with appropriate information. If they are not calling for that reason, ask them why they have called and allow them to use natural language to answer. Importantly, always give them a way to correct themselves.
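As a rough sketch of this anticipate-then-confirm flow, the logic might look like the following. The `CallerContext` fields, the reason labels, and the prompt wording are assumptions made up for illustration; a real system would fill the context from its own back-end data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CallerContext:
    # Hypothetical fields that a back-end lookup might populate.
    in_known_outage: bool = False
    hours_until_next_flight: Optional[float] = None


def predict_reason(ctx: CallerContext) -> Optional[str]:
    """Guess why the caller is calling, or return None if there is no clue."""
    if ctx.in_known_outage:
        return "power_outage"
    hours = ctx.hours_until_next_flight
    if hours is not None and 0 <= hours <= 24:
        return "upcoming_flight"
    return None


CONFIRM_PROMPTS = {
    "power_outage": "Are you calling about the power outage in your area?",
    "upcoming_flight": "Are you calling about your upcoming flight?",
}
OPEN_PROMPT = "How can I help you today?"


def opening_prompt(ctx: CallerContext) -> str:
    """Ask a yes/no question when we have a guess, else an open question."""
    return CONFIRM_PROMPTS.get(predict_reason(ctx), OPEN_PROMPT)


def next_step(predicted_reason: str, answer: str) -> str:
    """Route the call based on the caller's reply to the yes/no prompt."""
    normalized = answer.strip().lower()
    if normalized in {"yes", "yeah", "yep"}:
        return f"handle:{predicted_reason}"
    if normalized in {"no", "nope"}:
        return "ask_open_question"  # let the caller answer in natural language
    return "reprompt"  # unclear answer: give the caller a chance to correct it
```

The `reprompt` branch is the sketch's version of always giving callers a way to correct themselves: an unclear answer never commits the call to either path.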

How does Conversational IVR work, exactly?

Conversational IVRs work by leveraging three key technologies: Text-to-Speech (TTS), Automatic Speech Recognizers (ASRs), and Natural Language Understanding (NLU). These aren’t the only pieces to the puzzle, but they are important ones. 

Let’s talk a little about each…

  • Text-to-Speech:
    This turns text into speech, which allows you to ask questions quickly. Using TTS instead of recordings is critical, as it enables you to personalize questions. For example, when someone first calls in and you want to verify them, you can use their name and directly ask if it’s them.
  • Automatic Speech Recognizers:
    An ASR’s job is to take speech, recognize it as something meaningful, and then turn it into something useful like text. There are many types of ASRs. LumenVox’s new transcription ASR uses machine learning techniques such as deep neural networks, which are effective for transcribing human speech into text. Before this sort of technology existed, you had to constrain your recognizers to a limited set of words (called a grammar). Modern models can recognize a much larger set of words, so callers can speak naturally and the transcribed text is more accurate.
  • Natural Language Understanding:
    This takes the raw transcribed text and converts it into meaning, expressed as intents and slots. For example, the caller can say: “I want to fly from New York to LA.” From that we parse out the intent, “to fly”, and two slots: the origin, “New York”, and the destination, “LA”.
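To make intents and slots concrete, here is a deliberately tiny sketch that handles only the flight example above, using a regular expression. Production NLU engines use trained models rather than hand-written patterns; the pattern, the `fly` intent label, and the `parse_utterance` name are assumptions chosen purely for illustration.

```python
import re

# Matches utterances shaped like "... fly from <origin> to <destination>".
FLIGHT_PATTERN = re.compile(
    r"\bfly\b.*?\bfrom\s+(?P<origin>.+?)\s+to\s+(?P<destination>.+?)[.!?]?$",
    re.IGNORECASE,
)


def parse_utterance(text: str) -> dict:
    """Turn transcribed text into an intent plus named slots."""
    match = FLIGHT_PATTERN.search(text.strip())
    if match is None:
        return {"intent": None, "slots": {}}
    return {
        "intent": "fly",
        "slots": {
            "origin": match.group("origin"),
            "destination": match.group("destination"),
        },
    }


print(parse_utterance("I want to fly from New York to LA."))
# {'intent': 'fly', 'slots': {'origin': 'New York', 'destination': 'LA'}}
```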

Using these three technologies, we can create a conversation with a caller rather than a scripted interrogation. First, we would use TTS to ask the caller a question, then an ASR to get text back from the caller’s response, and NLU to understand that response. Finally, we would use that understanding to figure out whether we can process the request or ask for additional information from the caller.
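The ask, listen, understand cycle described above might be sketched as a loop like this one. The three callables stand in for real TTS, ASR, and NLU engines; their signatures, the slot names, and the prompts are assumptions for the sake of the example, and the demo feeds in canned caller responses rather than real audio.

```python
from typing import Callable, Dict, Optional

REQUIRED_SLOTS = ["origin", "destination"]
SLOT_QUESTIONS = {
    "origin": "Which city are you flying from?",
    "destination": "Which city are you flying to?",
}


def run_dialogue(
    speak: Callable[[str], None],                  # stand-in for TTS
    listen: Callable[[], str],                     # stand-in for ASR
    understand: Callable[[str], Dict[str, str]],   # stand-in for NLU
    max_turns: int = 5,
) -> Optional[Dict[str, str]]:
    """Keep asking until every required slot is filled or turns run out."""
    slots: Dict[str, str] = {}
    prompt = "Where would you like to fly?"
    for _ in range(max_turns):
        speak(prompt)
        text = listen()
        slots.update(understand(text))
        missing = [name for name in REQUIRED_SLOTS if name not in slots]
        if not missing:
            return slots  # enough information to process the request
        prompt = SLOT_QUESTIONS[missing[0]]  # ask only for what is missing
    return None  # give up, e.g. hand the call to a human agent


def demo():
    spoken = []
    caller_says = iter(["I want to go to LA", "New York"])
    canned_nlu = {
        "I want to go to LA": {"destination": "LA"},
        "New York": {"origin": "New York"},
    }
    result = run_dialogue(
        spoken.append,
        lambda: next(caller_says),
        lambda text: canned_nlu.get(text, {}),
    )
    return spoken, result
```

In the demo, the caller volunteers the destination first, so the loop skips straight to asking for the origin instead of marching through a fixed script.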

Where does LumenVox come in?

At LumenVox, we’re creating a Configurable AI Gateway that makes it easy to integrate many different NLU engines with our ASR. This approach makes it possible for you to use widely available NLU platforms from IBM, Google, Microsoft, Amazon and others with your existing speech-enabled IVR—along with LumenVox ASR, TTS, and Voice Biometrics.
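One general way to make NLU engines interchangeable is to put them behind a shared interface. The sketch below shows that idea only; it is not the LumenVox gateway or its API. The `NluEngine` protocol, the `Gateway` class, and the toy `KeywordEngine` are hypothetical, and a real adapter would wrap a vendor's own client library behind the same `parse` method.

```python
from typing import Dict, Optional, Protocol


class NluEngine(Protocol):
    """Anything with a parse() method can be plugged in."""

    def parse(self, text: str) -> Dict[str, Optional[str]]: ...


class KeywordEngine:
    """Toy engine used here in place of a real vendor adapter."""

    def __init__(self, keywords: Dict[str, str]) -> None:
        self.keywords = keywords

    def parse(self, text: str) -> Dict[str, Optional[str]]:
        lowered = text.lower()
        for keyword, intent in self.keywords.items():
            if keyword in lowered:
                return {"intent": intent}
        return {"intent": None}


class Gateway:
    """Routes an ASR transcript to whichever engine is selected by name."""

    def __init__(self) -> None:
        self._engines: Dict[str, NluEngine] = {}

    def register(self, name: str, engine: NluEngine) -> None:
        self._engines[name] = engine

    def understand(self, engine_name: str, transcript: str) -> Dict[str, Optional[str]]:
        if engine_name not in self._engines:
            raise ValueError(f"unknown NLU engine: {engine_name}")
        return self._engines[engine_name].parse(transcript)
```

With this shape, switching engines is a configuration choice (the name passed to `understand`) rather than a code change in the IVR application.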

Many technology vendors don’t offer choices in the combinations of ASR, NLP, and NLU that you can use to build a solution. Their entire suite of technology and tools is often proprietary and therefore involves the use of expensive, dedicated professional service teams. At LumenVox, we want to be able to easily integrate existing technologies with our speech recognition, text-to-speech, and voice biometrics software as part of the solution stack. In other words: we want to take the technology that’s already out there and make it easier for our customers to use.

Are you ready to learn more about conversational IVR? Contact us today!
