Creating Accurate Transcripts

Transcripts entered with the Transcriber tool help you tune a speech application. They are used to run tests against the application in order to pinpoint problems.

While transcribing, concentrate on audio that is appropriate for the speech application and that a human can recognize. Transcription is a subjective process, and transcribers should strive to maximize their efficiency.

For instance, if the application records a caller talking to another person in the background, that speech can simply be marked as garbage and discarded while doing tests. Likewise, if a caller says something that is unintelligible to a human, there is no way the Engine can be expected to understand it and thus it can be marked as garbage as well.

What should be transcribed are utterances that are appropriate for the application. This includes intelligible out-of-grammar responses when those responses make sense as being valid responses to the prompt.

About Noise Tags

Noise tags, such as ++NOISE++ or ++COUGH++, are used to denote specific sounds that are out-of-vocabulary. They differ from the radio buttons that control quality: those are used to denote the general audio quality of the call, e.g. if the call is muddled or full of static. The noise tags are just for specific noises, and are used when creating transcripts to help the Engine better align words and noises.

Noise tags are not required when transcribing utterances, though their use will probably help generate slightly more accurate statistics. Mainly they are for building new acoustic models -- for most purposes, you can just transcribe what callers said and largely leave noise tags alone.

For perfect transcripts, transcribe what a speaker said verbatim, without correcting grammatical errors or mispronunciations. If you need perfect, very detailed transcripts, the following guidelines are useful:

Guidelines For Detailed Transcriptions

Go slowly and do a minimum of four listens.
Get a feel for what the speaker is saying.
Transcribe the words.
Transcribe the noises.
Check the words and noises.
If any changes are made to the transcription, listen again.

It is absolutely critical that you get the words, the noises, and their placements correct so that the Tuner knows which sounds correspond to which word or noise in the transcription.

Transcription Rules

Grammatical errors and mispronunciations:

For transcription purposes there are no such things as grammatical or mispronunciation errors. Transcribe precisely what the caller said. If the caller says "I seen him", then transcribe "I SEEN HIM".

Caller	Transcription
naw (no)	NAW
nah (no)	NAH
gonna (going to)	GONNA
wanna (want to)	WANNA
y'all (you all)	Y'ALL

Hyphenating:

Never hyphenate.

Compound words:

Unless there is an obvious pause between two words, all compound words should be transcribed as one word when such a word exists in the dictionary. "Everyday" should not be transcribed as "EVERY DAY" for instance.

Abbreviations:

Never abbreviate, except when the speaker says the abbreviation. If the caller says "Doctor" then transcribe "DOCTOR" and not "Dr." However, if the caller says "Ave" instead of "Avenue" then transcribe "AVE".

Punctuation:

No punctuation should be used in transcriptions. Do not put in periods, commas, question marks, etc. However, if the word is possessive or a contraction you may use the apostrophe. Never use double quotes or the "+", "<", or ">" symbols. These symbols are used in the underlying code in order to analyze the gathered data.
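The punctuation rule can be checked mechanically before a transcript is saved. The following is a minimal sketch, not part of the Transcriber tool: the function name `find_forbidden_characters` is hypothetical, and it assumes that noise tags always take the form ++NAME++ and are the only place where "+" may appear.

```python
import re

# Assumption: noise tags look like ++NAME++ (for example ++NOISE++ or ++COUGH++).
NOISE_TAG = re.compile(r"\+\+[A-Z]+\+\+")


def find_forbidden_characters(transcript):
    """Return every character that is not a letter, digit, apostrophe, or space.

    Noise tags are removed first so their "+" characters are not reported.
    """
    without_tags = NOISE_TAG.sub(" ", transcript)
    return [ch for ch in without_tags if not (ch.isalnum() or ch in "' ")]


print(find_forbidden_characters("SUSAN'S BOOK"))
print(find_forbidden_characters("HELLO, WORLD?"))
```

An empty list means the transcript contains only letters, digits, apostrophes, spaces, and noise tags. Any other character, including a stray "+", "<", ">", or double quote, is returned so that it can be corrected.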

Common misspellings:

Watch for common spelling confusions. For instance, "they're", "there", and "their" all sound the same, so choose the spelling that matches what the caller meant.

Numbers:

Numbers should be transcribed as words. If the caller says "Four hundred and fifty five" then the transcription should read "FOUR HUNDRED AND FIFTY FIVE" and not "455".
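A simple check can flag transcripts that still contain digits. This is a sketch only, and the helper name `find_digit_tokens` is hypothetical. It deliberately reports digits instead of converting them, because the same number can be spoken in more than one way ("four five five" or "four hundred and fifty five") and the transcript has to match what the caller actually said.

```python
import re

# Any run of one or more digits counts as a violation of the numbers rule.
DIGIT_RUN = re.compile(r"\d+")


def find_digit_tokens(transcript):
    """Return each run of digits found in the transcript, in order."""
    return DIGIT_RUN.findall(transcript)


print(find_digit_tokens("FOUR HUNDRED AND FIFTY FIVE"))
print(find_digit_tokens("ROOM 455 ON 3RD"))
```

A transcriber would then listen again and replace each reported run with the words that were spoken.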

Letter sequences:

Spell out letter sequences.

Transcribe a spoken spelling by separating each letter with a space. For example, if a caller says "My name is spelled S-U-S-A-N", then transcribe "MY NAME IS SPELLED S U S A N".
When a letter sequence is used as part of an inflected word, add the inflection to the end of the sequence with an apostrophe. If a caller says, "The witness IDed him", then transcribe "THE WITNESS ID'ED HIM".

Acronyms:

Transcribe acronyms as they are said. "NATO" is transcribed as "NATO" with no spaces or periods.

Initialisms:

Transcribe initialisms as they are said. "CIA" is transcribed as "C I A" with spaces to denote that each letter is pronounced individually.
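The difference between the acronym rule and the initialism rule comes down to whether the caller pronounced the letters one by one. A minimal sketch of that decision follows. The function `transcribe_abbreviation` and its `said_letter_by_letter` flag are hypothetical names for illustration, and the flag must come from the transcriber's own judgment of the audio.

```python
def transcribe_abbreviation(abbreviation, said_letter_by_letter):
    """Format an acronym or initialism for a transcript.

    Periods, hyphens, and spaces are dropped. Letters are uppercased and
    joined with spaces only when the caller pronounced each letter.
    """
    letters = [ch.upper() for ch in abbreviation if ch.isalpha()]
    separator = " " if said_letter_by_letter else ""
    return separator.join(letters)


print(transcribe_abbreviation("N.A.T.O.", said_letter_by_letter=False))
print(transcribe_abbreviation("CIA", said_letter_by_letter=True))
```

The same letters can therefore produce two different transcriptions, depending only on how they were spoken.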

Possessives:

Use standard punctuation rules to denote possession. "Susan's book" is transcribed simply as "SUSAN'S BOOK" and "The drivers' cars" is transcribed "THE DRIVERS' CARS".

Filler noise:

Depending on the type of filler noise, it should be transcribed as either a noise tag or a word.

Caller	Transcription
uh, ah, um, hm	++UM++
huh	HUH
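The filler table can be expressed as a small lookup. This is a sketch under stated assumptions: the name `tag_fillers` is hypothetical, the input is a list of words as the transcriber heard them, and non-filler words are uppercased to follow the convention used in the other transcription examples.

```python
# Filler sounds from the table above that map to the ++UM++ noise tag.
FILLER_SOUNDS = {"uh", "ah", "um", "hm"}
FILLER_TAG = "++UM++"


def tag_fillers(words):
    """Replace filler sounds with ++UM++ and uppercase every other word."""
    return [FILLER_TAG if w.lower() in FILLER_SOUNDS else w.upper() for w in words]


print(" ".join(tag_fillers(["um", "i", "want", "uh", "sales"])))
print(" ".join(tag_fillers(["huh"])))
```

Because "huh" is not in the filler set, it is kept as a word and not replaced with a noise tag.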

Yes/no sounds:

Transcribe anything resembling a sound of assent or denial as it sounds.

Caller	Transcription
uhhuh (yes)	UHHUH
hum um (no)	HUM UM
yeah	YEAH
yep	YEP

Gender:

Pick the gender that best matches how the speaker sounds.