
Aragorn Training Document

This document provides instructions for annotating audio files. It describes logging into the Aragorn platform and claiming annotation tasks. It outlines the steps to determine if an audio file is valid or invalid and how to annotate valid files by timestamping, transcribing the text content, and adding tags. Transcribing involves recording exactly what is heard, including proper nouns, numbers, accents, and abbreviations while following punctuation rules.

Uploaded by

Noemi Steri

Project Aragorn

Audio Annotation

Presented by:
Klai Coscolluela
Team Manager

July 13, 2020


August 28, 2020 2

Overview

This is an audio annotation job that requires the following:

• Listening
• Validating
• Timestamping
• Transcription
• Categorization

Getting Started

• To start your task, log into the Aragorn platform.
• Once logged in, go to the "to be annotated" tab. You will be presented with the available order(s). Click the "claim annotation task" button right beside the first order. Please see the screenshots below.

Click OK to proceed.


The claimed task will move to the "annotating" tab. Click "start annotation".


You will then be directed to the platform as shown below.

Steps



Determine whether the audio file is valid or invalid.

For a given speech fragment, judge whether it is valid according to the filtering rules. If it is invalid, set the attribute to "invalid" and the reason to "Other" (please see the screenshot on the next slide). If it is valid, annotate its text content, mark its starting and ending points, and set the associated attributes.

Update: If the audio contains only vocalized pauses or modal words such as ah, um, or hahaha, it can be judged invalid directly. There is no need to transcribe it.

Short audio annotation usually refers to *.wav data that lasts less than 15 s.

In this project, however, if the audio is longer than 15 s and is mostly English, you may trim the audio file to 15 s or less to make it a valid audio.
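As a rough illustration of the two checks above, the following Python sketch flags a clip in which only filler words are heard and reports whether a clip exceeds the 15 s limit. The function names and the FILLERS set (taken from the examples ah, um, hahaha) are illustrative assumptions and are not part of the Aragorn platform.

```python
import re

# Filler examples named in this document; the real project list may be longer (assumption).
FILLERS = {"ah", "um", "hahaha"}
MAX_SHORT_AUDIO_S = 15.0  # short-audio limit stated in this document


def is_filler_only(heard: str) -> bool:
    """Return True when every word heard is a vocalized pause or modal word."""
    words = re.findall(r"[a-z']+", heard.lower())
    return bool(words) and all(w in FILLERS for w in words)


def needs_trimming(duration_s: float) -> bool:
    """Return True when the clip is longer than the short-audio limit."""
    return duration_s > MAX_SHORT_AUDIO_S
```

For example, is_filler_only("um, ah hahaha") is True, so that clip could be marked invalid directly, while needs_trimming(16.2) is True, so a mostly English clip of that length would be trimmed before annotation.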



Invalid Audio File



Invalid Audio Filtering Rules
• The whole audio is non-English, or the non-English part is in the middle of the sentence.
• If the non-English part is at the start or the end of the sentence and can be cut off by timestamping, then transcribe the audio.
• If the whole sentence is non-English with just a few words that sound like English, or the words you hear are meaningless, there is no need to transcribe.
• The whole audio is mute or full of noise.
• The audio is text-to-speech rather than human speech.
• If the audio contains only vocalized pauses or modal words such as ah, um, or hahaha, it can be judged invalid directly. There is no need to transcribe it.
Valid Audio Annotation Rules
Timestamping
1. Click on the spectrum to select the starting point, then drag the mouse to the ending point.
2. A silent margin of 20 ms-30 ms must be kept before and after the valid speech fragment.
1 s = 1000 ms, so 20 ms-30 ms should be around a quarter of one cell.
3. During timestamping, the pronunciation of valid words must not be truncated.

If a word is truncated at the beginning or the end of the sentence and you can determine what was said by ear, or from the consonant/vowel visible on the spectrum, transcribe the truncated word. If the truncated word cannot be determined, place the selection boundary at the exact time point from which the words can be heard clearly.
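The margin rule can be expressed as simple arithmetic. The sketch below is only an illustration, since selection on the platform is done with the mouse; pad_segment and its parameters are hypothetical names. It widens a speech span by a silent margin on each side and clamps the result to the length of the file.

```python
def pad_segment(speech_start_ms: int, speech_end_ms: int,
                audio_length_ms: int, margin_ms: int = 25) -> tuple[int, int]:
    """Widen a speech span by a silent margin on each side, clamped to the file."""
    if not 20 <= margin_ms <= 30:
        raise ValueError("margin should stay within the 20-30 ms range")
    if not 0 <= speech_start_ms < speech_end_ms <= audio_length_ms:
        raise ValueError("speech span must lie inside the audio")
    start = max(0, speech_start_ms - margin_ms)
    end = min(audio_length_ms, speech_end_ms + margin_ms)
    return start, end
```

With speech from 500 ms to 2300 ms in a 4000 ms file, pad_segment(500, 2300, 4000) gives (475, 2325).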


Text Content
Transcription
Recording exactly what you hear
• Strictly follow the principle of RECORDING EXACTLY WHAT YOU HEAR.
• For example, the speaker actually says "where where are we going?"
• Special condition: the word "where" is repeated.
• Wrong record: where are we going?
• Correct record: where where are we going?

Proper nouns
1. Follow these rules for proper nouns, including person names, locations, etc.
a) Person names. The name of a well-known person must be transcribed with its officially recognized spelling, such as the presidents' names "Barack Obama" or "Donald Trump". General names should use the most common spelling. For example, use "Ashley" instead of "Ashlee".
If a person's name can be annotated in English, do not transcribe it in another language. However, when a person's name is of non-English origin, strictly record its original spelling, such as "Aoife"; it cannot be annotated as "Avan" or "Efa" (invalid names).
b) The rules for location names and organization names are similar to those for person names.

2. Record English words correctly and avoid typos.

3. Homophones: when two words sound the same, choose the spelling that fits the grammar and the meaning. For example: He took some lights on a peace of paper -> He took some lights on a piece of paper. ("Peace" obviously does not fit the meaning or the grammar.)

Numbers
• Numbers should be written out in full as the corresponding English words according to their pronunciation. For example, "5256" -> "five thousand two hundred and fifty-six", "2012" -> "twenty twelve" or "two thousand and twelve", "19%" -> "nineteen percent", and so on. The number "0" has two different pronunciations: "zero" and "o". All numbers should be recorded according to the speaker's real pronunciation.
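Because the correct wording depends on what the speaker actually says, the conversion itself cannot be automated safely. A small check can still catch digits left in a transcript by mistake. This is a minimal sketch; find_unspelled_numbers is a hypothetical helper name.

```python
import re


def find_unspelled_numbers(transcript: str) -> list[str]:
    """Return every whitespace-separated token that still contains a digit."""
    return [tok for tok in transcript.split() if re.search(r"\d", tok)]
```

For instance, find_unspelled_numbers("it rose 19% in 2012") returns ["19%", "2012"], which tells the annotator which tokens still need to be written out as heard.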

Accents and Polyphonic Words
Phonetic changes caused by accents or personal habits should be recorded using Standard English spelling. For instance, when pronouncing the word "button", most speakers say [ˈbʌtn], but some drop the [t] and say [ˈbʌn]. We always record this word as the Standard English word "button".

Polyphonic words (words with more than one pronunciation) are also recorded with their standard spelling. For example, whether the word "live" is pronounced [lɪv] (to stay at a place) or [laɪv] (a live broadcast), we always record it as "live".

Abbreviations
For English abbreviations that are pronounced as a single word, annotate them directly. For abbreviations that are spelled out letter by letter, remember to add a space between the letters. For instance, "GDP" -> "G D P", "ICBU" -> "I C B U", and so on.
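The letter-spacing rule is mechanical, so it can be sketched in one line of Python. The helper name spell_out is an assumption, and the function should only be applied to abbreviations that the speaker reads letter by letter.

```python
def spell_out(abbreviation: str) -> str:
    """Insert a single space between the letters of an abbreviation."""
    return " ".join(abbreviation.replace(" ", ""))
```

spell_out("GDP") gives "G D P" and spell_out("ICBU") gives "I C B U", matching the examples above.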

Punctuations
1. Only [,], [.], [?], ['], and [-] can be used.
2. Punctuation should be added at the end of a sentence.
3. A sentence should neither be overloaded with punctuation nor left without any.
4. Punctuation marks cannot be used consecutively. For example, ",..." is not allowed.
5. The apostrophe (') is mandatory in contractions. If you cannot type it in the text content field, type it in Notepad and copy and paste it into the text content field.
e.g., I'm, it's, isn't, didn't, etc.
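Rules 1, 2, and 4 can be checked automatically before submitting. The sketch below is one possible self-check, not a platform feature. It assumes that a sentence ends with [.] or [?], that the tags (noise), (sil), and (~) are ignored here, and that transcripts use plain ASCII letters; punctuation_problems is a hypothetical name.

```python
import re

TAG_PATTERN = re.compile(r"\((?:noise|sil|~)\)")  # tags are not punctuation


def punctuation_problems(transcript: str) -> list[str]:
    """Return a list of rule violations found in the transcript (empty if none)."""
    text = TAG_PATTERN.sub(" ", transcript).strip()
    problems = []
    # Rule 1: anything that is not a letter, whitespace, or an allowed mark.
    bad = sorted(set(re.findall(r"[^A-Za-z\s,.?'\-]", text)))
    if bad:
        problems.append("disallowed characters: " + " ".join(bad))
    # Rule 2: the sentence should end with punctuation (assumed to be . or ?).
    if not text.endswith((".", "?")):
        problems.append("missing end punctuation")
    # Rule 4: two or more of , . ? in a row.
    if re.search(r"[,.?]{2,}", text):
        problems.append("consecutive punctuation")
    return problems
```

A clean transcript such as "where where are we going?" returns an empty list, while "well,... okay." is reported for consecutive punctuation. Rule 3 is a matter of judgment and is left to the annotator.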

Tags
There are three tags in the short audio annotation specification: the noise tag (noise), the silence tag (sil), and the doubt tag (~).

Noise tag - (noise)
Add this tag for noise that is sudden and short in duration. Common cases:
1. Speaker noise: breathing, coughing, laughing, sneezing, and other sounds from the lips.
2. Recording equipment and telecommunication system: telephone keypad tones, telephone busy tones, and other noises from the recording system.
3. Background noise: sudden noise that comes from the background rather than from the speaker, such as clapping, a door closing, a car horn, a dog barking, and other noise caused by going off hook or on hook or by interference.
4. Music: singing (with lyrics and melody), humming (melody without lyrics), whistling, musical instruments, music and singing from a background TV or radio, long-lasting phone ringtones, etc.
Silence tag - (sil)
(1) When there is an obvious pause with silence lasting more than 1 second, mark it with the (sil) tag. *Note: sil = silence
(2) This tag is not used at the beginning or end of the sentence.
(3) The silence tag cannot be used consecutively.

Doubt tag - (~)
(1) Use (~) to mark 1-3 doubtful words in the sentence. (For special cases with more than 3 words, please clarify the specification separately with the task issuer.)
(2) Do not use (~) at the beginning or end of the sentence.
(3) The doubt tag cannot be used consecutively.
(4) Use as few doubt tags as possible, or none at all.

Special attention: The doubt tag exists to increase the amount of usable data. Do not use it every time you encounter difficult or unclear audio.
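The placement rules for (sil) and (~) lend themselves to a quick self-check. The following sketch assumes each tag is typed as a separate token in the transcript and treats standalone punctuation as not counting toward position; tag_problems is a hypothetical name and the platform may apply its own checks.

```python
import re

TOKEN = re.compile(r"\((?:noise|sil|~)\)|[^\s()]+")
PUNCT_ONLY = re.compile(r"[,.?'\-]+")
RESTRICTED = {"(sil)", "(~)"}  # the tags that have placement rules in this document


def tag_problems(transcript: str) -> list[str]:
    """Return violations of the (sil) and (~) placement rules."""
    tokens = [t for t in TOKEN.findall(transcript) if not PUNCT_ONLY.fullmatch(t)]
    problems = []
    for i, tok in enumerate(tokens):
        if tok not in RESTRICTED:
            continue
        if i == 0 or i == len(tokens) - 1:
            problems.append(f"{tok} at the beginning or end of the sentence")
        if i > 0 and tokens[i - 1] == tok:
            problems.append(f"{tok} used consecutively")
    return problems
```

"so (sil) where are we going?" passes, while "(sil) where are we going?" is reported because the silence tag opens the sentence.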



Audio Label Annotation
Validation
Legal values: Valid, Invalid
Meaning: Whether the audio is valid. Determine this according to the filtering rules.

Gender
Legal values: Male, Female
Meaning: The gender of the speaker. When there are both male and female speakers in one speech fragment, usually annotate it as Male. When you cannot tell the gender of a child speaker, select Male. Special conditions should be confirmed separately.

Accent
Legal values: Yes, No
Meaning: Accent / No accent. Only an obvious accent is tagged as "accent".
An American/US English accent is NO.
Asian English accents and the like are YES.
Keep in mind that we identify accent based on the speaker's English fluency and not on location.

Background Noise
Legal values: Yes, No
Meaning: Select Yes when the audio has continuous background noise, including keyboard typing, non-speaker voices, TV background sound, car sounds, etc. The choice depends on whether the noise is clear/obvious.

Special note: If the noise is sudden, only add the (noise) tag; there is no need to set Background Noise to Yes.

Child Tone
Legal values: Yes, No
Meaning: Whether the whole audio is in a child's tone. The choice depends on whether the child tone is obvious.
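The five audio labels and their legal values can be summarized as a small data structure. This is only a sketch for personal record keeping; the class AudioLabels, its field names, and the helper illegal_fields are hypothetical and do not reflect how the platform stores labels.

```python
from dataclasses import dataclass

# Legal values as listed in this document for each label.
LEGAL_VALUES = {
    "validation": {"Valid", "Invalid"},
    "gender": {"Male", "Female"},
    "accent": {"Yes", "No"},
    "background_noise": {"Yes", "No"},
    "child_tone": {"Yes", "No"},
}


@dataclass
class AudioLabels:
    validation: str
    gender: str
    accent: str
    background_noise: str
    child_tone: str

    def illegal_fields(self) -> list[str]:
        """Return the names of fields whose value is not a legal value."""
        return [name for name, legal in LEGAL_VALUES.items()
                if getattr(self, name) not in legal]
```

AudioLabels("Valid", "Female", "No", "Yes", "No").illegal_fields() returns an empty list, while a record with gender "Unknown" would be reported under "gender".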


Acceptance Criteria

Valid and invalid data shall be inspected separately. WER is the criterion for valid data, while SER is the criterion for invalid data.
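This document does not expand the two acronyms or give the acceptance thresholds. Assuming they carry their usual speech-recognition meanings, WER (word error rate) is the word-level edit distance between the reference and the submitted transcript divided by the number of reference words, and SER (sentence error rate) is the share of items judged incorrectly. The sketch below computes both under that assumption; the function names are illustrative, and the project's actual calculation may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def sentence_error_rate(expected: list[str], submitted: list[str]) -> float:
    """Fraction of items whose submitted label differs from the expected one."""
    if not expected or len(expected) != len(submitted):
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(e != s for e, s in zip(expected, submitted))
    return wrong / len(expected)
```

Dropping the repeated word in "where where are we going" costs one deletion out of five reference words, so word_error_rate("where where are we going", "where are we going") is 0.2. This is why missed repetitions and fillers matter for acceptance.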
Main Rework Advices

1. Use apostrophes (') for contractions.
2. Transcribe all clearly heard vocalized pauses/modal words and repeated words; do not miss them.
3. Tag the speaker's laugh as (noise) within the sentence, but noises at the beginning or end of the sentence should not be included in the valid speech segment.
4. Do not keep too much silence before or after the valid speech (see the timestamping rule).
5. Some audio files are of foreigners learning Chinese. Their speech may sound like English, but it is not. For example, the pronunciation of "3" in Chinese sounds like "Sun".
6. If you cannot tell whether a very short audio is English, listen to the neighboring audio files for reference. If the audio contains two or more words that may sound like English but is mainly in a foreign language, and the previous and next audio files are also in a foreign language, you may categorize it as invalid.
7. No is not the default option for Accent; decide based on what you actually hear. Only a neutral American/US English accent is considered NO. Asian, British, Scottish, Australian accents and the like are considered YES.
8. Do not miss words, especially repeated words and vocalized words that you can clearly hear. Make it a habit to proofread before submitting, and always stick to our basic rule of typing exactly what you heard, especially for stutters/repeated words.
9. Use the correct spellings "okay" (not "ok") and "ma'am" (not "mam").
10. Always use punctuation where necessary.
Additional Updates

1. Audio that includes profanity/swear words or mentions of drugs is VALID and should be transcribed.
2. If the audio consists mostly of long stretches of foreign language with only one or two English words in the middle, you may consider it INVALID. BUT if there are meaningful sentences, you still need to transcribe them.
3. Do not transcribe content you are unsure of or content that merely sounds like English, and do not try to cut a few meaningless words out of a non-English sentence. The proportion of invalid audio is high, so it is normal to see a task without any valid audio. Just do not force a transcription.

The modal words need to be unified as below:

ew
huh
hm
lah
oh
uh
um

Attention

Alipay
Feizhu
Yu Ebao
Q&A

Speak now or forever hold your silence.
