Transcripts entered with the Transcriber tool are useful for tuning a speech application. They are used to run tests against the application in order to pinpoint problems.
While transcribing, you should be mainly concerned with audio that is appropriate for the speech application and that is recognizable by humans. It is a subjective process and transcribers should strive to maximize their efficiency.
For instance, if the application records a caller talking to another person in the background, that speech can simply be marked as garbage and discarded while doing tests. Likewise, if a caller says something that is unintelligible to a human, there is no way the Engine can be expected to understand it and thus it can be marked as garbage as well.
What should be transcribed are utterances that are appropriate for the application. This includes intelligible out-of-grammar responses when those responses make sense as being valid responses to the prompt.
Noise tags, such as ++NOISE++ or ++COUGH++, are used to denote specific sounds that are out-of-vocabulary. They differ from the radio buttons that control quality: those are used to denote the general audio quality of the call, e.g. if the call is muddled or full of static. The noise tags are just for specific noises, and are used when creating transcripts to help the Engine better align words and noises.
Noise tags are not required when transcribing utterances, though their use will probably help generate slightly more accurate statistics. Mainly they are for building new acoustic models -- for most purposes, you can just transcribe what callers said and largely leave noise tags alone.
For perfect transcripts, transcribe what a speaker said verbatim, without correcting grammatical errors or mispronunciations. If you need perfect, very detailed transcripts, the following rules are useful:
Guidelines For Detailed Transcriptions
Grammatical errors and mispronunciations:
For transcription purposes there are no such things as grammatical or mispronunciation errors. Transcribe precisely what the caller said. If the caller says "I seen him", then transcribe "I SEEN HIM".
Caller | Transcription |
---|---|
naw (no) | NAW |
nah (no) | NAH |
gonna (going to) | GONNA |
wanna (want to) | WANNA |
y'all (you all) | Y'ALL |
Hyphenating:
Never hyphenate.
Compound words:
Unless there is an obvious pause between two words, all compound words should be transcribed as one word when such a word exists in the dictionary. "Everyday" should not be transcribed as "EVERY DAY" for instance.
Abbreviations:
Never abbreviate, except when the speaker says the abbreviation. If the caller says "Doctor" then transcribe "DOCTOR" and not "Dr." However, if the caller says "Ave" instead of "Avenue" then transcribe "AVE" with no period.
Punctuation:
No punctuation should be used in transcriptions. Do not put in periods, commas, question marks, etc. However, if a word is possessive or a contraction you may use the apostrophe. Never use double quotes or the "+", "<", or ">" symbols. These symbols are used in the underlying code in order to analyze the gathered data.
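The punctuation rule can be checked mechanically before a transcript is saved. The following is a minimal sketch, not part of the Transcriber tool: the function name is hypothetical, and it assumes that noise tags appear in the text as an uppercase name wrapped in double plus signs (as in ++NOISE++ or ++COUGH++), so those are removed before the remaining characters are checked.

```python
import re

# Assumption: a noise tag is an uppercase name wrapped in "++", e.g. ++NOISE++.
NOISE_TAG = re.compile(r"\+\+[A-Z]+\+\+")


def find_disallowed_characters(transcript: str) -> list:
    """Return each disallowed character once, in order of first appearance.

    Letters, spaces, and apostrophes are allowed. Everything else
    (punctuation, digits, hyphens, stray "+", "<", ">") is reported.
    """
    # Drop well-formed noise tags so their plus signs are not flagged.
    text = NOISE_TAG.sub(" ", transcript)
    found = []
    for ch in text:
        if ch.isalpha() or ch in "' ":
            continue
        if ch not in found:
            found.append(ch)
    return found


if __name__ == "__main__":
    print(find_disallowed_characters("SUSAN'S BOOK"))      # nothing to report
    print(find_disallowed_characters('WHAT? "NO", 455'))   # punctuation and digits
```

An empty result means the transcript contains only characters the guidelines permit. Digits and hyphens are reported as well, since numbers are written as words and hyphens are never used.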
Common Misspellings:
Watch for common spelling confusions. For instance, "they're", "there", and "their" all sound the same, so choose the spelling that fits the context of what the caller said.
Numbers:
Numbers should be transcribed as words. If the caller says "Four hundred and fifty five" then the transcription should read "FOUR HUNDRED AND FIFTY FIVE" and not "455".
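As an illustration of the expected written form, the sketch below spells out whole numbers from 0 to 999 in the same style as the example above (uppercase, no hyphens, "AND" after the hundreds). The function name and the supported range are assumptions made for this example. It produces only one possible reading: if the caller phrases the number differently, for example "four fifty five", the transcript should follow what was actually said.

```python
ONES = [
    "ZERO", "ONE", "TWO", "THREE", "FOUR", "FIVE", "SIX", "SEVEN", "EIGHT",
    "NINE", "TEN", "ELEVEN", "TWELVE", "THIRTEEN", "FOURTEEN", "FIFTEEN",
    "SIXTEEN", "SEVENTEEN", "EIGHTEEN", "NINETEEN",
]
TENS = ["", "", "TWENTY", "THIRTY", "FORTY", "FIFTY", "SIXTY", "SEVENTY",
        "EIGHTY", "NINETY"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in uppercase words, without hyphens."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0 through 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"
    hundreds, rest = divmod(n, 100)
    head = f"{ONES[hundreds]} HUNDRED"
    # Join the remainder with "AND", matching the example in the guideline.
    return head if rest == 0 else f"{head} AND {number_to_words(rest)}"


if __name__ == "__main__":
    print(number_to_words(455))
```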
Letter sequences:
Spell out letter sequences.
Acronyms:
Transcribe acronyms as they are said. "NATO" is transcribed as "NATO" with no spaces or periods.
Initialisms:
Transcribe initialisms as they are said. "CIA" is transcribed as "C I A" with spaces to denote that each letter is pronounced individually.
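The difference between the acronym and initialism rules comes down to how the caller pronounced the abbreviation. A minimal sketch of that decision follows; the function and its `spoken_as_word` flag are hypothetical, and the judgment about how the caller said it still has to come from the transcriber.

```python
def transcribe_abbreviation(letters: str, spoken_as_word: bool) -> str:
    """Format an abbreviation according to how the caller pronounced it.

    spoken_as_word=True  -> acronym, written as one word ("NATO")
    spoken_as_word=False -> initialism, one letter at a time ("C I A")
    """
    # Keep letters only, so periods or spaces in the input are dropped.
    cleaned = "".join(ch for ch in letters if ch.isalpha()).upper()
    return cleaned if spoken_as_word else " ".join(cleaned)


if __name__ == "__main__":
    print(transcribe_abbreviation("NATO", spoken_as_word=True))     # NATO
    print(transcribe_abbreviation("C.I.A.", spoken_as_word=False))  # C I A
```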
Possessives:
Use standard punctuation rules to denote possession. "Susan's book" is transcribed simply as "SUSAN'S BOOK" and "The drivers' cars" is transcribed "THE DRIVERS' CARS".
Filler noise:
Depending on the type of filler noise, it should be transcribed as either a noise tag or a word.
Caller | Transcription |
---|---|
uh, ah, um, hm | ++UM++ |
huh | HUH |
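The table above amounts to a small lookup: a fixed set of hesitation sounds maps to the ++UM++ noise tag, and anything else is kept as a word. The sketch below encodes that under stated assumptions: the function name is hypothetical, the set holds only the four sounds listed in the table, and words are uppercased to match the other transcription examples in these guidelines.

```python
# Hesitation sounds that the table maps to the ++UM++ noise tag.
FILLER_SOUNDS = {"uh", "ah", "um", "hm"}


def transcribe_filler(sound: str) -> str:
    """Return ++UM++ for a listed hesitation sound, else the word in uppercase."""
    normalized = sound.strip().lower()
    if normalized in FILLER_SOUNDS:
        return "++UM++"
    return normalized.upper()


if __name__ == "__main__":
    for heard in ["uh", "um", "huh"]:
        print(heard, "->", transcribe_filler(heard))
```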
Yes/no sounds:
For anything resembling sounds of assent or denial, transcribe them as they sound.
Caller | Transcription |
---|---|
uhhuh (yes) | UHHUH |
hum um (no) | HUM UM |
yeah | YEAH |
yep | YEP |
Gender:
Pick the appropriate gender for what the speaker sounds like.