Annotation Project
Task
Cut a section of clear human speech from the audio and transcribe it
into text. AI-recognized text is available.
Process
At the top, you see a sound track with a gray highlighted section. The gray highlighted part is
the target audio. The rest of the audio is there to give you context.
Below the sound track, there are 5 buttons:
o start cut-s: "s" is the shortcut to mark the cutting start point
o end cut-e: "e" is the shortcut to mark the cutting end point
o play cut-a: "a" is the shortcut to start playing the gray section from start
o play-1: "1" is the shortcut to play the audio
o pause-2: "2" is the shortcut to pause the audio
Ignore the "default cut" and "current cut" labels, as they just indicate where the cut starts
and ends.
Right above the text box, there is a single-choice audio class selector. By default, "speech" is
selected. If you find that the audio isn't intelligible, select "discard" and then submit;
the audio will be marked as an invalid task.
The text box: Here you enter the speech. You'll see auto-recognized speech there. Listen to
the audio and correct the speech. A few things to note:
1. Download and sign up on Lark. Ask your vendor manager to invite you to the related Lark
organization.
2. Download and sign up on TCS using the Lark account you've created.
That's all the preparation you'll need to do. However, a lot has
happened under the hood:
Your vendor manager has created a Lark team or a Lark department to manage all
transcribers participating in this project;
With the Lark team registration issue, we granted this Lark team permission to access our
internal permission management portal, Hodor;
Hodor is the place where we grant Lark teams permission to TCS queues. On Hodor, we
assign different queues to Lark teams so that you, as a transcriber, can work on the project.
At this point, you're all set to start working on this project. Here are
the next steps you'll follow:
1. Ask your manager for the domain name. For this project, it'll be tcs-sg.bytelemon.com for
everyone;
2. Ask your vendor manager for the queue IDs; then search for a queue ID to find the queue and
click on the star icon to subscribe to it. You'll then be able to find it under the My Interests
tab;
3. Now go to My Interests tab and you'll see a queue there. Click on
1. Play the audio, listen to it, correct the text errors, adjust the audio start and end points,
and finally click "Submit" to go to the next task;
2. If, after listening to the audio, you find that it contains an unintelligible part, set the
audio class to "Discard" and then submit; you'll go to the next task;
3. When you're done editing the text and audio and don't want to continue working,
click "Submit and Leave" to go back to the previous page.
1. On the TCS app, if you see the error message "User not found...", it means you're not logged in
on Lark;
2. If you can't find your queue by searching the queue ID, check whether you're using the wrong
domain: the correct domain is sg (tcs-sg.bytelemon.com), not va (tcs-va.bytelemon.com);
3. If steps 1 and 2 don't fix your login or access issue, ask your vendor manager to check
whether they added you to the Lark team.
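The domain check in step 2 can be sketched as a small script. This is only an illustration: the function name and the example paths are hypothetical, and only the two host names come from the steps above.

```python
from urllib.parse import urlparse

# Host names taken from the troubleshooting steps above.
EXPECTED_HOST = "tcs-sg.bytelemon.com"

def is_correct_domain(url):
    """Return True if the URL points at the sg domain used by this project.

    Assumes the URL includes a scheme such as https://; without one,
    urlparse finds no host name and the function returns False.
    """
    return urlparse(url).hostname == EXPECTED_HOST

# The paths below are made up for the example.
print(is_correct_domain("https://tcs-sg.bytelemon.com/some/page"))
print(is_correct_domain("https://tcs-va.bytelemon.com/some/page"))
```

The first call reports the sg address as correct and the second reports the va address as wrong, which is the same comparison you make by eye in the browser's address bar.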
June 25 ~ October 31
Action Steps
1. Listen to the clipped audio highlighted in gray and classify it as either [speech]
or [discard]:
Annotation Guidelines
1. Categorize Audio
Speech:
Discard:
2. Cut Speech
If part of the speech is not in English, cut it out and transcribe the clear English speech.
If noise or unintelligible sound is at the beginning or the end of the audio, cut it out.
If the noise is in the middle and does not disturb the clear human speech, it is ok to ignore it.
If part of the audio is a song with lyrics (at the beginning or the end of the audio), cut it out
and transcribe the clear English speech.
o If 2 or more speakers say the same words simultaneously and the words sound
clear, transcribe them directly.
o If 2 or more speakers talk about different things simultaneously, cut that part out and keep
only the clear English speech.
o If there is a main voice in a group conversation and the others are low or fuzzy, cut
out the unclear part and keep only the clear speech of the main speaker.
Silent part
Note:
1.
DOs
Transcribe the audio based on the AI recognized text.
ONLY transcribe the gray highlighted section (default cut). The rest of the audio can be
referred to as context.
Transcribe the speech word for word, including obvious grammatical mistakes.
Words in the text should preferably conform to American spelling.
Space is required between words.
Capitalize the first letter of names of people and places. Write common abbreviations in all
capitals (e.g. USA/FBI/CPC).
DON'Ts
DON'T use punctuation! (Except for the apostrophe and hyphen, such as I'm or I've,
"COVID-Nineteen" or "nose-picking").
DON'T break the text into paragraphs!
Numbers
Write numbers in full English words. If you hear "one", write "one" instead of "1".
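The formatting rules above (no punctuation except apostrophe and hyphen, no paragraphs, numbers as words, single spaces between words) can be sketched as an automatic check. This is a hypothetical helper for illustration, not a feature of TCS, and it assumes apostrophes are typed as the plain ' character.

```python
import re

def check_transcript(text):
    """Return a list of formatting problems found in a transcript string.

    Hypothetical helper: it only checks the mechanical rules (line breaks,
    digits, punctuation, spacing), not whether the words match the audio.
    """
    problems = []
    if "\n" in text:
        problems.append("contains a line break")
    if re.search(r"\d", text):
        problems.append("contains digits")
    # Anything that is not a letter, digit, whitespace, apostrophe or hyphen.
    if re.search(r"[^\w\s'\-]|_", text):
        problems.append("contains punctuation other than apostrophe or hyphen")
    if "  " in text or text != text.strip():
        problems.append("contains extra spaces")
    return problems

print(check_transcript("I'm at the nose-picking contest"))
print(check_transcript("well, I have 1 dog"))
```

The first example passes with an empty list; the second is flagged for both the digit and the comma.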
Half-pronounced words
o If the half-pronounced word is not a separate word, cut it out. For example, in "I am a
stu...(student)", the word "student" is half-pronounced and "stu" is not a separate
word. The transcribed text should be "I am a".
o If the half-pronounced word is a separate word, transcribe it directly. For example, in
"well, that is the super...(supermarket)", the word "supermarket" is half-pronounced
and "super" is a separate word. The transcribed text should be "well that is the
super".
o If a word is half-pronounced because of a stammer and the fragment is not a separate
word, correct the word. For example, in "I still mi..mi..miss you", the word "miss" is
half-pronounced because of a stammer but "mi" is not a separate word. The transcribed
text should be "I still miss you".
o If a word is half-pronounced because of a stammer and the fragment is a separate word,
transcribe the speech word for word. For example, in "The super..supermarket is over
there", the word "supermarket" is half-pronounced because of a stammer but "super" is a
separate word. The transcribed text should be "The super supermarket is over there".
If the speaker whispers or misses certain syllables of a word and you are sure what the word
is, transcribe the correct word in the text.
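The four half-pronounced cases above follow one decision pattern, sketched below. The function and the small word set are hypothetical stand-ins: in practice the transcriber decides whether a fragment is a real word.

```python
def words_to_transcribe(fragment, completed_word, known_words):
    """Return the words to write down for a half-pronounced word.

    fragment: the part actually pronounced, e.g. "stu" or "super".
    completed_word: the full word if the speaker goes on to finish it
        (a stammer), or None if the word is left unfinished.
    known_words: hypothetical set of real words, standing in for the
        transcriber's own judgment.
    """
    is_separate_word = fragment.lower() in known_words
    if completed_word is None:
        # Unfinished word: keep the fragment only if it is a word itself.
        return [fragment] if is_separate_word else []
    # Stammer: always keep the finished word; keep the fragment as well
    # only if it is a word on its own.
    if is_separate_word:
        return [fragment, completed_word]
    return [completed_word]

known = {"super"}
print(words_to_transcribe("stu", None, known))
print(words_to_transcribe("super", None, known))
print(words_to_transcribe("mi", "miss", known))
print(words_to_transcribe("super", "supermarket", known))
```

An empty result means the fragment is left out of the text, which corresponds to cutting that part of the audio.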
Repeat
Repeated words and sentences must be transcribed strictly according to how many times
they are repeated (except for modal words).
Modal words
Transcribe the modal words according to the number of times they are repeated, e.g. when you
hear 3 "ha" in the middle of the audio, the transcribed text should be "ha ha ha".
Spelling: oops, oh, gee, geez, um, wow, uh, ahem, yoo hoo, hooray, mmm, ouch, yuck, eew,
ugh, phew, aha, gosh, my, eh, hey, ah, ok.
Proper nouns
Transcribe the common proper nouns (name of a person/place/product/organization) if you
are sure what they are. If not, cut this part out.
If the speaker uses simplified, informal, or abusive English words, transcribe them word
for word.
If the pronunciation is not standard but you can tell what the correct word is, transcribe the
correct word.
Listen to the following default cut to confirm what the whole sentence is, then write down the
correct word based on the context.
o e.g. "hole/whole": when you hear "The whole town disagreed with the mayor" in
the audio, write down "whole" according to the context.
If there are multiple homophones and their meanings all match the context, go with any one.
o e.g. "where is my deer/dear." Both words match the meaning of the sentence,
either of them is ok.
Video Demo
Bad cases