Short_Audio_Transcription_Guideline(1)

This document outlines the guidelines for a short audio transcription audit project, focusing on identifying valid and invalid audio clips. It specifies rules for invalid audio, valid audio transcription, and acceptance criteria for accuracy. The document emphasizes strict adherence to transcription standards, including the use of Arabic numerals and proper nouns.

Uploaded by

BRIAN MULUTU

Short audio transcription

1. Brief introduction
This is a short audio transcription audit project. All audio clips are shorter than 10 seconds.

First judge whether each audio clip is valid or invalid, then transcribe the valid audio.

(If the audio has been pre-transcribed, check whether the pre-transcribed text matches the audio content. If it does not, correct the text.)

2. Invalid audio rule


Mark the audio as invalid in any of the following cases. Invalid audio does not need to be transcribed.

a. Non-target language. The whole audio is in a language other than the target language.
b. Noise & non-speech. The whole audio is noise or silence.
c. Meaningless sentence. The whole audio consists only of filler words such as ah, um, haha…
d. Illegal content. The audio involves pornography, violence, racial discrimination, etc.
e. Non-native speaker/Wrong accent.
f. Incomplete sentence.
g. Stutter.
h. Wrong gender.
i. Contains 3 or more English words.
j. Does not meet the domain requirement.
The whole project will include the following domains:
General spoken language (TYKY)
Travel and shopping (LYGW)
Education and learning (JYXX)
Sports and entertainment (TYYL)
Number and time (SZSJ)
Clarification for domain “Number and time”:
The audio must contain an exact number, and all numbers MUST be written in Arabic numerals.
Audio that contains any of the following can be valid and needs to be transcribed.
1. Number: one, two, three… should be written as 1, 2, 3…
2. Time: five pm, six am, half past six… should be written as 5:00 pm, 6:00 am, 6:30
3. Date: Monday to Sunday, January to December, 3rd Aug, 27th Sep…

This content is for internal use only


4. Ordinal number: first, second…
Words that cannot be written in Arabic numerals, such as Monday, December, or first, can be written as words in the target language.
However, audio that contains only words representing a period, such as morning, afternoon, evening, month, year, or century, with no exact number, is invalid, and you do not need to transcribe it.
Here are some examples:
Audio 1: I will go back in five hours.
valid
Text 1: I will go back in 5 hours.
Audio 2: I will go back on Monday.
valid
Text 2: I will go back on Monday.
Audio 3: I will go back on eleventh October.
valid
Text 3: I will go back on 11th, Oct.
Audio 4: I will go back this morning.
Invalid, no need to transcribe.
Audio 5: I will go back in several weeks.
Invalid, no need to transcribe.
Audio 6: It has been a long time.
Invalid, no need to transcribe.
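
The six examples above can be reproduced with a rough pre-check. The following is only a sketch, not part of the project tooling: the function name and the English word lists are hypothetical, and a real project would need equivalent lists for its own target language. It treats a sentence as meeting the "Number and time" domain if it mentions a digit, a number word, an ordinal, a weekday, or a month.

```python
import re

# Hypothetical English word lists for illustration only; a real project would
# need equivalent lists for its own target language.
NUMBER_WORDS = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "thirty",
    "forty", "fifty", "sixty", "seventy", "eighty", "ninety", "hundred",
    "thousand",
}
ORDINAL_WORDS = {
    "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth",
}
WEEKDAYS = {
    "monday", "tuesday", "wednesday", "thursday", "friday", "saturday",
    "sunday",
}
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}
EXACT_WORDS = NUMBER_WORDS | ORDINAL_WORDS | WEEKDAYS | MONTHS


def meets_number_and_time_domain(text: str) -> bool:
    """Return True if the text mentions a digit, number word, ordinal,
    weekday, or month. Period words alone (morning, weeks, ...) do not count."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    return any(token.isdigit() or token in EXACT_WORDS for token in tokens)


if __name__ == "__main__":
    for sentence in [
        "I will go back in five hours.",
        "I will go back this morning.",
    ]:
        print(sentence, meets_number_and_time_domain(sentence))
```

Words such as "one", "second", "may", and "march" are ambiguous, so a check like this can only flag candidates for a human auditor; it cannot replace listening to the audio.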

3. Valid audio rule


3.1 Strictly follow the principle of RECORDING EXACTLY WHAT YOU HEAR. DO NOT ADD OR OMIT ANY CONTENT.
Repeated words must be kept, for example:
Audio: where where are we going?

Transcription: where where are we going?

3.2 Transcribe foreign language (English) words as they are pronounced.

3.3 Proper nouns

a) English person name. The name of a well-known person must be transcribed using the officially recognized name, for example Barack Obama, Donald Trump. General names should be written with the most common characters.

b) Brand name. Brand names must follow the officially published spelling.

c) The use of homonyms: When two words are pronounced the same, choose the one that is grammatically and semantically correct. For example, He took some lights on a peace of paper -> He took some lights on a piece of paper. ("Peace" clearly does not fit the meaning or the grammar.)

d) Proper nouns are written in uppercase, half-width English letters, in the order they are pronounced.

For example: VIP MBA NHK TBC


3.4 Numbers

Use Arabic numerals


Example 1: five percent should be transcribed as 5%.
Example 2: sixty-eight should be transcribed as 68.
Example 3: five kilograms should be transcribed as 5kg.

3.5 Punctuation

1. Only [,], [.], [?], [!] can be used.

2. A period or question mark MUST be added at the end of every sentence.

3. Punctuation marks cannot be used consecutively. For example, ",..." is not allowed.

4. Only standard punctuation can be used. For example, ":-)", ">:-(", ":-|" are not allowed.

5. Special characters such as @ % & * + = ~ # ¥ £ $ € ☆ ★ ● ◆ ◇ ℃ ‰ ♀ ♂ ° ※ § № ≥ ≤ ≠ ≈ ± × ÷ ∑ √ ∥ ℉ / \ ^ « » are not allowed. However, if a proper noun contains a special symbol, you can keep it.
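
The five punctuation rules above lend themselves to a mechanical pre-check. The sketch below is illustrative only: `punctuation_problems` and its constants are hypothetical names, not part of any project tool. It also makes three assumptions that the guideline does not state: forms such as "5:00" and "5%" are masked first because the number rules in this guideline produce them, apostrophes and hyphens inside words are tolerated, and the proper-noun exception in rule 5 is left to human review.

```python
import re

ALLOWED_PUNCTUATION = set(",.?!")
FORBIDDEN_SPECIALS = set(
    "@%&*+=~#¥£$€☆★●◆◇℃‰♀♂°※§№≥≤≠≈±×÷∑√∥℉/\\^«»"
)
# Assumption: apostrophes and hyphens inside words are tolerated.
WORD_INTERNAL = set("'-")
# Assumption: "5:00" and "5%" come from the guideline's own number rules,
# so they are masked before the punctuation checks run.
NUMERIC_FORMS = re.compile(r"\d+:\d{2}|\d+%")


def punctuation_problems(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means no problem found."""
    masked = NUMERIC_FORMS.sub("0", text.strip())
    problems = []

    # Rule 2: the sentence must end with a period or question mark.
    if not masked or masked[-1] not in ".?":
        problems.append("sentence must end with a period or question mark")

    # Rule 3: punctuation marks must not be used consecutively.
    if re.search(r"[,.?!]{2,}", masked):
        problems.append("punctuation marks must not be used consecutively")

    # Rule 5: listed special characters are not allowed.
    specials = sorted({ch for ch in masked if ch in FORBIDDEN_SPECIALS})
    if specials:
        problems.append("special characters not allowed: " + " ".join(specials))

    # Rules 1 and 4: any other symbol is non-standard punctuation.
    others = sorted({
        ch for ch in masked
        if not ch.isalnum() and not ch.isspace()
        and ch not in ALLOWED_PUNCTUATION
        and ch not in FORBIDDEN_SPECIALS
        and ch not in WORD_INTERNAL
    })
    if others:
        problems.append("non-standard punctuation: " + " ".join(others))

    return problems


if __name__ == "__main__":
    for sample in ["where where are we going?", "Hello,...", "See you :-)"]:
        print(repr(sample), punctuation_problems(sample))
```

A special symbol that belongs to a proper noun would still be reported here, so any hit under rule 5 should be confirmed by the auditor before the text is changed.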

3.6 Do not use words from any language other than the target language and English.

4. Acceptance criteria
The average sentence accuracy must be higher than 96%.

Sentence accuracy = Number of correct words / Total number of standard words

Average sentence accuracy = Sum of sentence accuracy / Total number of spot-check sentences * 100%
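
The two formulas can be worked through in a few lines of Python. This is a sketch under stated assumptions rather than the project's actual scoring method: the guideline does not say how "correct words" are counted, so the code assumes words are split on whitespace (punctuation stays attached to its word) and that a word is correct when it aligns with the standard text in a `difflib.SequenceMatcher` comparison. All function names are hypothetical.

```python
from difflib import SequenceMatcher


def sentence_accuracy(standard: str, transcript: str) -> float:
    """Correct words / total standard words, as a fraction between 0 and 1."""
    standard_words = standard.split()
    transcript_words = transcript.split()
    if not standard_words:
        raise ValueError("standard text must contain at least one word")
    # Assumption: a word is "correct" if it falls in a matching block of the
    # alignment between the standard text and the transcript.
    matcher = SequenceMatcher(a=standard_words, b=transcript_words, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())
    return correct / len(standard_words)


def average_sentence_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Sum of sentence accuracy / number of spot-check sentences * 100."""
    if not pairs:
        raise ValueError("at least one spot-check sentence is required")
    scores = [sentence_accuracy(standard, transcript) for standard, transcript in pairs]
    return sum(scores) / len(scores) * 100


def passes_acceptance(pairs: list[tuple[str, str]], threshold: float = 96.0) -> bool:
    """The guideline requires the average to be higher than 96%."""
    return average_sentence_accuracy(pairs) > threshold


if __name__ == "__main__":
    spot_check = [
        ("I will go back in 5 hours.", "I will go back in 5 hours."),
        ("I will go back on Monday.", "I will go back on Sunday."),
    ]
    print(round(average_sentence_accuracy(spot_check), 2))
    print(passes_acceptance(spot_check))
```

In this example the second sentence has 5 of 6 standard words correct, so the average of the two sentence scores falls well below the 96% threshold.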
