Video Transcription
Part 1 Speech Recognition Basics
Speech Recognition Basics
- Overview
- Speech Recognition vs. Voice Recognition
- Reasons for Using Speech
Overview
- Speech is basically just another user interface, an input method, like using a mouse
or a keyboard.
- Speech recognition recognizes what you say, not what you mean. Much like when you
click the button on your mouse, the computer doesn't really know your intent. When the
button is clicked the computer's response is in accordance with its programming. The
same thing is true with speech recognition. The user says words, the words are
recognized by the engine, and the system responds as it has been programmed.
- It is not artificial intelligence. You cannot carry on a viable conversation with a
computer just yet. There is a misconception that speech recognition has a science
fiction quality which would enable you to carry on an intelligent, natural conversation
with the computer.
- Speech recognition is only as good as the application built around it. For example,
in some early Windows point and click applications the icons may have been confusing and
the behavior of the application foreign at first. One would not blame the
mouse for the operating system's behavior simply because it was the mouse that was
clicked. Likewise, if you've had some negative experiences with speech recognition,
the fault may lie less with the underlying technology than with an early application
that wasn't really built to take advantage of its input method.
Speech recognition is currently a more mature technology and we have a better idea of
how to design better applications. Here at LumenVox.com you will be able to find various
examples of how to design applications well.
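The point that the system only responds "as it has been programmed" can be illustrated
with a minimal sketch. It assumes the engine has already turned the audio into text; the
phrase table and the respond function are hypothetical names made up for illustration,
not part of any LumenVox product.

```python
# Hypothetical table of programmed responses, keyed by the recognized words.
RESPONSES = {
    "check balance": "Reading your account balance.",
    "transfer funds": "Starting a funds transfer.",
}


def respond(recognized_text: str) -> str:
    """Return the programmed response for the words the engine recognized."""
    # Normalize case and spacing; no attempt is made to infer intent.
    key = " ".join(recognized_text.lower().split())
    return RESPONSES.get(key, "Sorry, I did not recognize that.")
```

In this sketch, "check balance" gets its programmed response, while "how much money do I
have" falls through to the fallback even though it means the same thing: the words are
matched, not the meaning.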
Speech Recognition vs. Voice Recognition
Speech recognition and voice recognition are two terms that are frequently used, and to
most people the terms are interchangeable. The press uses the terms interchangeably, as
do people who don't have much experience in the industry. However, within the speech
recognition industry and academic circles (linguists, scholars and computer scientists who
study speech), there is a very large distinction.
Speech Recognition
This is the ability of a computer to understand the words that are spoken. It is the
translation of vocal sounds into predefined words to be recognized.
Voice Recognition
This is the ability to recognize a speaker based upon that speaker's style. We all have
specific characteristics about our individual styles, somewhat like a fingerprint. Voice
recognition technology allows computers to recognize distinct characteristics of our voice.
It is used mainly for biometrics (authenticating for security purposes) and dictation. Here at
LumenVox we do speech recognition technology, so we recognize what you said, not who
said it.
Reasons for Using Speech
What are some of the reasons to use speech recognition? What makes it better or
different than other input methods?
- More natural interaction
You don't have to be trained on how to speak since you've been trained in speech from
the very moment that you were born. Using a mouse or a keyboard or learning to dial a
telephone is not as natural. It's a pleasant and natural ability to simply state what
one wants, as opposed to point and click or typing keys.
- Convenient
If you're on the phone with your bank while driving, you may not be able to safely
reach your phone or key in your account numbers, so speech in this instance is great.
You simply say what you need and you don't have to worry about the location of your
cell phone or taking your eyes off the road. For many people, speech is the best way
to interact with systems for convenience.
- Open-ended questions
As an application designer, speech is great because it allows you to really open up
your applications and make them easier for people to use, more user friendly. Think
about certain prompts and questions you can incorporate with speech recognition that
you can't have with DTMF touch tone applications on the telephone:
- City and State. For example, a directory assistance application would need
to know the city and state for the required phone number. You cannot type in a
city and state. Perhaps a ZIP code can be keyed in, but there may be many ZIP
codes in a single city and most people may not know the ZIP code for a
particular area. With speech, the caller can be simply prompted for city and
state, and respond "San Diego, California," which is a much, much
simpler approach going far beyond DTMF applications.
- Call Router.
With speech applications, the menus are greatly improved. With DTMF you commonly
hear "Press one for a particular function, or person, or department, press two
for another." You may generally listen to the entire menu of choices just to
make sure you don't miss the choice that is most specific to your needs. This can
be frustrating and is not convenient to the user at all. With speech recognition
you can simply say where you would like to go without worrying what option is best.
Also with speech, you can cut down on the number of menus used. You don't have to
have the caller press one and go to another menu, then press five and be taken to
yet another menu, and so on. With speech you can have as many options as needed in
a single menu as opposed to the 10 or so options allotted by the number of keys on
a telephone keypad.
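The call router contrast above can be sketched roughly as follows. This is only an
illustration under stated assumptions: the menu contents, department names, and the
route_dtmf and route_speech functions are hypothetical, and the caller's speech is
assumed to have already been recognized as text.

```python
from typing import Optional

# Hypothetical DTMF menu tree: each level is limited to keypad choices,
# so reaching a department takes one key press per menu level.
DTMF_MENU = {
    "1": {"1": "billing", "2": "payments"},
    "2": {"1": "technical support", "2": "new accounts"},
}

# Hypothetical flat speech menu: any number of phrases in a single menu.
SPEECH_MENU = {
    "billing": "billing",
    "payments": "payments",
    "make a payment": "payments",
    "technical support": "technical support",
    "new accounts": "new accounts",
}


def route_dtmf(keys: str) -> Optional[str]:
    """Walk the menu tree one key press at a time."""
    node = DTMF_MENU
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None  # invalid key, or keys pressed past a destination
        node = node[key]
    # Only a complete path ends on a department name.
    return node if isinstance(node, str) else None


def route_speech(utterance: str) -> Optional[str]:
    """Look the recognized phrase up in the single flat menu."""
    return SPEECH_MENU.get(" ".join(utterance.lower().split()))
```

With these assumed menus, the DTMF caller has to press "1" and then "2" to reach
payments, while the speech caller says "make a payment" once and is routed directly.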
All of the above are just some of the reasons you'll want to use speech recognition.