Labelling Rules

Uploaded by

hafeezfatima27

Rule document

1. This time we are looking for a quality inspection (QA) supplier: you will perform quality inspection on audio that has already been annotated.
2. The QA standards are identical to the labeling specifications in this document. QA colleagues must listen to each audio file in its entirety. Individual transcription spelling or sentence-segmentation errors should be corrected directly. If the annotator has clearly misunderstood the rules, return the entire file.
3. Annotators whose data has been returned must make the corrections. If any rule is unclear or the standards are inconsistent, contact the person in charge, who will compile the cases into records and send them to Aopeng for evaluation and confirmation.
4. Because the QA standards are the same as the annotation specifications in this document, the QA supplier must also annotate original audio during the trial phase; the trial results will determine whether the supplier is capable of executing the project.

First, read the [Labeling Standards] section below, then open the 'Key Questions and Answers' online form and review it carefully.
The online form is very important!!!!

[Labeling Standards]
Before annotating, the annotator must assess the audio quality:
Confirm that the audio is a two-person conversation in the corresponding language. If it is not, mark the entire audio as invalid; no labeling is needed. (If a whole audio file is not a two-person conversation, the data is invalid!)
Check the background sound: any audio containing BGM (background music), laughter, or applause is judged invalid in its entirety; no labeling is needed.
Accents: accented data does not need to be annotated.
1. English data: if the speaker has an accent other than a European or American one, the entire audio is judged invalid. In practice, the data is invalid whenever the human ear judges the pronunciation not to be standard English (for example, Indian English).
2. Chinese data: if the speaker has a distinct regional accent, the entire audio is judged invalid.

Be careful!! Whether the template is set to valid or invalid, never change the "Valid Data" button at the bottom left of the page! Simply click to confirm completion and load the next file.
This content is for internal use only.
Be careful!! If the entire audio is invalid, select "Invalid", press Ctrl+Delete to clear all audio segments, and set the line attribute to "none" before submitting! (Deleted segments cannot be restored, so use this with caution.)
Pre-transcription of the text is configured in the Appen task: a model automatically recognizes the speech and distinguishes the speakers' roles.
However, the tool's recognition is not entirely accurate. Your task is to manually move and correct the boxes and text the tool has marked incorrectly, and to fix the machine's segmentation (cutting) errors.
Be careful!! Do not let the pre-recognition results interfere with your manual judgment, whether for identifying the speakers or for the completeness and accuracy of each transcribed sentence. Manual annotation, based on human hearing and judgment, prevails. (Note: it is not based on the pre-recognition.)

1. Leading and trailing silence: if the silent segment at the beginning or end of the audio exceeds 500 ms, mark it with # none; silence of 500 ms or less does not need to be labeled.
2. Use the tag # none for silent sections in the middle.
3. In the original audio, if the first or last sentence is cut off by the machine, frame each incomplete sentence as a whole and cut off the invalid data at the beginning and end with the corresponding # tag. Start selecting the speaker role and transcribing only from the first complete sentence.
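The 500 ms rule for leading and trailing silence is mechanical enough to express in code. The sketch below is illustrative only: the `Segment` class, the millisecond timestamps, and the function name `edge_silence_segments` are hypothetical and do not come from the Appen platform. Given where speech first starts and last ends, it returns the # none boxes the rule calls for.

```python
from dataclasses import dataclass

SILENCE_THRESHOLD_MS = 500  # edge silence longer than this gets "# none"


@dataclass
class Segment:
    start_ms: int
    end_ms: int
    tag: str  # e.g. "# none", "# speechr1", "# speechr2"
    text: str = ""


def edge_silence_segments(audio_ms: int, first_speech_ms: int, last_speech_ms: int) -> list[Segment]:
    """Return '# none' boxes for leading/trailing silence that exceeds 500 ms."""
    segments: list[Segment] = []
    # Leading silence: from time 0 until the first speech begins.
    if first_speech_ms > SILENCE_THRESHOLD_MS:
        segments.append(Segment(0, first_speech_ms, "# none"))
    # Trailing silence: from the end of the last speech to the end of the audio.
    if audio_ms - last_speech_ms > SILENCE_THRESHOLD_MS:
        segments.append(Segment(last_speech_ms, audio_ms, "# none"))
    return segments
```

For example, `edge_silence_segments(60_000, 1_200, 59_700)` returns only the leading box, because the trailing 300 ms of silence is within the threshold and needs no label.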

4. Segmentation criteria in the middle of a passage:

The first principle is to segment by speaker and select the corresponding role label (role 1/2).
From the moment speaker 1 starts talking until speaker 2 starts talking, select # speechr1 for the whole interval, without marking any silent section in between; from the moment speaker 2 starts talking until speaker 1's next turn begins, select # speechr2, again without marking any silent section in between.
Note also that a segment may be at most 10 seconds long (no label needs to be selected at the cut; simply transcribe the next sentence directly). Within each segment of up to 10 seconds, use punctuation to break sentences naturally according to meaning (a period if the topic ends, a comma if the topic continues).
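One way to read the role-interval rule together with the 10-second cap is as a greedy packing pass over the sentences the annotator has heard. In this hypothetical sketch (the `Sentence` and `Box` types, the `pack_boxes` function, and millisecond timestamps are assumptions, not the platform's format), each box runs until the next box begins, so no silence is marked between turns; a new box starts when the speaker changes or when extending the current box would exceed 10 seconds. It does not choose punctuation, and a single sentence longer than 10 seconds is left as one over-long box for the annotator to split by hand.

```python
from dataclasses import dataclass

MAX_BOX_MS = 10_000  # longest allowed segment, per the 10-second rule


@dataclass
class Sentence:
    start_ms: int
    end_ms: int
    speaker: int  # 1 or 2
    text: str


@dataclass
class Box:
    start_ms: int
    end_ms: int
    speaker: int
    text: str

    @property
    def tag(self) -> str:
        return f"# speechr{self.speaker}"


def pack_boxes(sentences: list[Sentence], sep: str = " ") -> list[Box]:
    """Group time-ordered sentences into role boxes of at most 10 s."""
    boxes: list[Box] = []
    for i, s in enumerate(sentences):
        # A box extends to the start of whatever follows it, so each sentence
        # "owns" the time up to the next sentence's start (or its own end if last).
        owned_end = sentences[i + 1].start_ms if i + 1 < len(sentences) else s.end_ms
        current = boxes[-1] if boxes else None
        if (
            current is not None
            and current.speaker == s.speaker
            and owned_end - current.start_ms <= MAX_BOX_MS
        ):
            # Same speaker and still within 10 s: extend the current box.
            current.end_ms = owned_end
            current.text += sep + s.text
        else:
            # Speaker changed or the cap would be exceeded: start a new box.
            boxes.append(Box(s.start_ms, owned_end, s.speaker, s.text))
    return boxes
```

For Chinese text, pass `sep=""` so that sentences are joined without spaces.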
Two people speaking at the same time (for example, one interrupting the other): select # Two people speaking at the same time without transcription.
(If the two speakers overlap only slightly, keep the sentence to be transcribed as complete as possible; but if they keep interrupting each other, select the whole sentence as # Two people speaking at the same time without transcription.)
Short, ambiguous content: select # Two people speaking at the same time without transcription.
If there are background sound effects (such as applause or laughter) but no one is speaking, select the tag: # There are background sound effects but no one speaks

In the trial data, some annotators were found to rely entirely on pre-recognition when framing the audio, which is wrong! Segmentation is based on sentence meaning plus the 10-second maximum length, followed by the annotation rules for segmentation. If pre-recognition breaks a complete sentence into scattered segments, manually delete the unnecessary, redundant cuts in the middle. One sentence, one box, is enough.
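Deleting redundant cuts amounts to merging adjacent fragments of the same speaker that belong to one sentence. The hypothetical sketch below (the `Fragment` type and `merge_fragments` function are illustrative, not platform features) uses a crude heuristic that is an assumption, not a rule from this document: a fragment that does not end in sentence-final punctuation is merged with the next fragment from the same speaker, as long as the merged box stays within 10 seconds. It only suggests merges; the annotator's ear still decides.

```python
from dataclasses import dataclass

MAX_BOX_MS = 10_000
# Sentence-final punctuation for Chinese and English (an assumed heuristic).
SENTENCE_END = ("。", "！", "？", ".", "!", "?")


@dataclass
class Fragment:
    start_ms: int
    end_ms: int
    speaker: int
    text: str


def merge_fragments(fragments: list[Fragment], sep: str = " ") -> list[Fragment]:
    """Merge same-speaker fragments split mid-sentence, keeping boxes <= 10 s."""
    merged: list[Fragment] = []
    for frag in fragments:
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev.speaker == frag.speaker
            and not prev.text.rstrip().endswith(SENTENCE_END)
            and frag.end_ms - prev.start_ms <= MAX_BOX_MS
        ):
            # Remove the redundant cut: extend the previous box over this fragment.
            prev.end_ms = frag.end_ms
            prev.text = f"{prev.text}{sep}{frag.text}"
        else:
            # Copy so the caller's fragments are not mutated.
            merged.append(Fragment(frag.start_ms, frag.end_ms, frag.speaker, frag.text))
    return merged
```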

5. In Chinese text, spaces on either side of numbers or letters do not need to be added.

6. If the audio contains other content that conflicts with the company, or conversation introducing the source of the audio, select # Invalid data needs to be cut off.
If the original audio contains inappropriate content such as pornography or violence, select # Invalid data needs to be cut off.

【Key Questions and Answers】 online form address:

https://docs.qq.com/sheet/DZHVSYXZKdVB6RldX?tab=BB08J2

Other instructions
1. Clicking the "little robot" icon in the text box links to the pre-transcribed text, saving time and improving efficiency.
2. Platform function keys: click the "?" in the upper-right corner to view the function guide.

3. When speaker 1 or speaker 2 is selected, the transcription content cannot be empty.
4. A valid segment cannot exceed 10 seconds.
5. Required attributes

6. The first label in the left column, 'none', is the platform default and cannot be deleted or changed. In addition, because the platform is configured so that a tag must be selected, a line cannot be submitted without a selected tag; in other words, every line must carry two tags before it can be submitted for acceptance.
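The empty-transcription, 10-second, and two-tag constraints above can be checked mechanically before submission. This is a hypothetical sketch: the `Row` type, the `check_row` function, the plain tag strings such as "speechr1", and the assumption that the 10-second limit applies to speaker segments are illustrative, not taken from the platform.

```python
from dataclasses import dataclass, field

MAX_SEGMENT_MS = 10_000                  # a valid segment cannot exceed 10 seconds
SPEAKER_TAGS = {"speechr1", "speechr2"}  # hypothetical tag strings


@dataclass
class Row:
    start_ms: int
    end_ms: int
    # Platform default "none" plus the tag the annotator selected.
    tags: list[str] = field(default_factory=list)
    text: str = ""


def check_row(row: Row) -> list[str]:
    """Return the problems that would block submission (empty list if OK)."""
    problems: list[str] = []
    if len(row.tags) != 2:
        problems.append("line must carry two tags")
    if SPEAKER_TAGS & set(row.tags):
        # Speaker segments need text and must respect the 10-second cap.
        if not row.text.strip():
            problems.append("speaker segment has an empty transcription")
        if row.end_ms - row.start_ms > MAX_SEGMENT_MS:
            problems.append("speaker segment exceeds 10 seconds")
    return problems
```

Running `check_row` over every line and submitting only when all the returned lists are empty mirrors the platform's acceptance checks as described above.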
