LOFT System Guidelines

The document outlines guidelines for transcribing audio dialogues in a call center setting, emphasizing speaker identification, transcription accuracy, and the handling of background noise and personally identifiable information (PII). It provides detailed instructions on how to format speaker labels, manage overlapping speech, and categorize audio events. It also includes rules for dealing with difficult cases and specific instructions for transliteration of Hindi and English words.

Uploaded by

rawzinis
Copyright
© All Rights Reserved

hi-IN Call Center Guidelines in LP

Task Objective
You will listen to a dialogue that will likely contain multiple speakers. Your job is to identify and mark when
each speaker is speaking and transcribe the corresponding audio. Some of the audio will contain background
noise, background music, and ringtones; these should also be marked, following the instructions below.

**IMPORTANT**
1) Once you are done transcribing a task, you MUST hit the completed button.
2) Transcribe ALL speech according to the hi_in WDCs. This includes regular speakers, pre-recorded speakers, and
synthesized speakers.
3) For transliteration, please use English.
4) Please read the section “How to Handle Difficult Cases” carefully.
5) Do not skip any tasks unless they are completely silent.
6) Be aware that a segment should NEVER have more than 0.5 seconds of silence (see step 10 of the Guidelines).
7) Turns should not be more than 30 seconds long.
8) Noise, PII, music, ringtones and DTMF should be labeled with the annotation option.
9) Unintelligible and foreign speech should be labeled with a new turn (and not with an annotation).
10) Do not “name” the speakers. Please keep speaker labels as numbers only (speaker 1, speaker 2, pre
recorded speaker 1, pre recorded speaker 2, etc.).
11) All speaker labels should be consistently formatted. Speaker labels should always be in all
lowercase, be spelled correctly, and contain no underscores or hyphens.

Correct Formatting        Incorrect Formatting
speaker 1                 Speaker 1
pre recorded speaker 1    pre-recorded speaker_1

12) Do NOT transcribe PII (definition at the bottom of the document). When PII is heard, add an annotation
and choose “PII” from the drop-down menu. See the illustrated example below.
13) Please transcribe all English words in Latin script and all Hindi words in Devanagari script. If a
‘Hinglish’ word is used, please transcribe it in Latin script. See the Transliteration section for more details.

14) Unintelligible and foreign audio must be entered as a separate turn. Every time you hear
unintelligible or foreign speech, please end the existing turn and create a new turn (for the same speaker) to
label the audio, and add [unintelligible] or [foreign speech] to the transcription box (example below).
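
The speaker label formatting rules above can be checked mechanically. The following is a minimal sketch in Python, not part of the LP tool: the function name `is_valid_speaker_label` and the pattern are illustrative assumptions. It accepts only the numbered forms "speaker N" and "pre recorded speaker N", plus the unnumbered "unidentifiable speaker" label described in guideline 4c.

```python
import re

# Allowed forms, assuming the rules in this document: all lowercase, single
# spaces, no underscores or hyphens, numbering that starts at 1.
# "unidentifiable speaker" is the unnumbered fallback from guideline 4c.
LABEL_PATTERN = re.compile(
    r"(?:pre recorded )?speaker [1-9][0-9]*|unidentifiable speaker"
)


def is_valid_speaker_label(label: str) -> bool:
    """Return True if the whole label matches one of the allowed forms."""
    return LABEL_PATTERN.fullmatch(label) is not None


if __name__ == "__main__":
    for label in ["speaker 1", "pre recorded speaker 1",
                  "Speaker 1", "pre-recorded speaker_1"]:
        print(f"{label!r}: {is_valid_speaker_label(label)}")
```

A check like this only catches formatting problems. It cannot tell whether the same person has been given two different numbers, which still requires listening back through the audio.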

Guidelines
1) Create a new transcription box for the entire section of audio where you hear speech, using the “add
turn” option. Audio events such as noise, PII, music, laughter, ringtone or DTMF should be added to
the transcription as an annotation, using the “add annotation” option.
2) Every time you create a turn, you will have to assign a name to that turn. Use ONLY the options below:
a) speaker #
b) pre recorded speaker #
i) This can be either a recording or a synthesized speaker

3) For the categories below ONLY, please add an annotation and select from the drop-down menu.
Please do not use any category other than the ones listed here. *Please do not use the
unintelligible or foreign speech annotations. Unintelligible and foreign speech should be added
as a separate turn and have a speaker assigned to it (more instructions below). Use only the
categories below:
a) Noise
b) PII
c) Music
d) Ringtone
e) Laughter
f) DTMF
i) Stands for: Dual Tone Multi Frequency. Also referred to as “touchtone.”
ii) Example: If the operator says “Press 1 to speak to a representative” and you hear a
‘beep’ (the caller pressing the button), that beep is what should be labeled as “DTMF”.

4) Identify the speaker by listening to the audio. The first speaker in the audio should be labeled
speaker 1. Every time this speaker speaks throughout the rest of the audio, it should be labeled as
speaker 1.
a) The next ​new​ speaker that is introduced should be labeled as speaker 2. The next different
speaker is speaker 3, then speaker 4, and so on.
b) Pre recorded speakers should be labeled the same way. The first pre recorded speaker that is
heard should be labeled as “pre recorded speaker 1”. The next pre recorded speaker that is
different from “pre recorded speaker 1” is “pre recorded speaker 2”, and so on.
c) Listen back through the audio to be sure you are not creating a duplicate speaker. When in
doubt, use “unidentifiable speaker”. Do not number the unidentifiable speakers.
d) Note that:
i) Lyrics in music should not be labeled. If there is background music and it has lyrics, just
add an annotation and select “MUSIC” from the drop-down.
ii) If there is a pre-recorded greeting/advertisement with music in the background, a turn
should be added as 'pre recorded speaker #' and an annotation “MUSIC” as an overlapping
section (see step 12 for overlapping audio).
5) When to create a new segment?

a) When one speaker stops speaking and a new speaker starts speaking, create a new turn for
the new speaker.
b) When one speaker stops speaking and pauses for more than 0.5 seconds, create a new turn
for when this speaker resumes talking. If the pause is shorter than 0.5 seconds, do not
create a new turn (see step 10 for more info on this).
c) When speech goes over 30 seconds without a pause. A turn should never be longer than 30 seconds,
even when the speech continues without a long pause. In this case, end the turn
at 30 seconds and create a new turn to transcribe the remaining speech.
d) When unintelligible audio or foreign speech is heard, please end the current turn, create a
new turn, and add [unintelligible] or [foreign speech] in the transcription box. Unintelligible and
foreign speech should be linked to the appropriate speaker. *Please do not use the
unintelligible or foreign speech annotations.
e) The only annotations allowed for this project are: noise, PII, music, ringtone, laughter and DTMF.
Enter all PII using the annotation option. When other noise events are heard, add a new
segment by selecting “add annotation” and choose an option from the drop-down menu. PII
should never be transcribed.

6) Edit the transcription time range, using the horizontal red line to help indicate the turn start and
end times you are editing.
a) Note: the horizontal red line will not automatically provide you with the correct start for a
transcription segment. Hitting the “+” button will create a new segment two seconds prior to
where the red line is.
b) Helpful tip: Use the red line to show you the exact timestamp, then press ctrl+; to copy it as the start time.
Use ctrl+’ to copy it as the end time.
c) Manually adjust the start and end times by editing the timestamps in the transcription box.

7) For each new segment you create, you will have to assign a name to the segment. Click the dropdown
menu to select an existing option, or to create a new one.
8) Where it says “text” in the transcription box, transcribe what you hear in the audio for regular speakers
AND pre recorded speakers.

9) For other audio categories such as noise, PII, music, ringtone and DTMF, simply add an
annotation and select from the drop-down menu. PII should also be entered using the annotation
option.
10) A segment should NEVER have more than 0.5 seconds of silence.
a) Use the audio wave to identify periods of silence.
b) Anytime there is more than 0.5 seconds of silence, be sure there is no segment over that time
period.
i) Here is a good example of skipping 0.5 seconds of silence.
ii) Here is a ​bad​ example, do not do this!

11) If the speech is unintelligible, in a foreign language, or sung, create a separate turn to label the
unintelligible speech, foreign speech, or singing. Add [unintelligible], [foreign speech], or [singing] to the
transcription. These segments should be linked to the appropriate speaker label. Please do not create a
separate speaker label for unintelligible or foreign speech. Note that this is not an overlapping turn (see
image below).
If the ENTIRE audio is sung or is in a foreign language, create a [singing] or [foreign speech] speaker
label, create one segment lasting the entire duration of the audio, and mark the task as
completed.

*Please do not use the unintelligible or foreign speech annotations.

12) If there is overlapping speech, or overlapping audio events, the individual segments should overlap
each other. Example:
● Important Notes
○ PLEASE BE SURE SPEAKER LABELING IS CONSISTENTLY FORMATTED.
○ Words should not be cut off when annotating start and end points of an utterance.
○ Speaker names should be distinct. A new speaker id should only be created when a new
speaker is heard.
○ A segment should NEVER have a period of silence greater than 0.5 seconds.
○ When there is unintelligible or foreign speech, end the current turn, create a separate turn, add
[unintelligible] or [foreign speech] to the transcription, and select the appropriate speaker.
○ When there is overlapping speech, create a transcription box for each speaker covering the same
timeframe.
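
The segmentation rules above (a new turn after a pause longer than 0.5 seconds, and no turn longer than 30 seconds) can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how LP works: it assumes you already have one speaker's speech as a list of (start, end) times in seconds, and the names `split_into_turns`, `MAX_PAUSE`, and `MAX_TURN` are hypothetical. It cuts long turns purely by time, so a real transcriber would still need to nudge the boundary so that no word is cut off.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_seconds, end_seconds)

MAX_PAUSE = 0.5   # longest silence allowed inside a single turn, in seconds
MAX_TURN = 30.0   # longest allowed turn, in seconds


def split_into_turns(speech: List[Interval]) -> List[Interval]:
    """Group one speaker's voiced intervals into turns.

    Intervals separated by more than MAX_PAUSE start a new turn. Any turn
    longer than MAX_TURN is then cut into consecutive pieces of at most
    MAX_TURN seconds.
    """
    merged: List[Interval] = []
    for start, end in sorted(speech):
        if merged and start - merged[-1][1] <= MAX_PAUSE:
            # Short pause: keep extending the current turn.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # First interval, or a pause longer than MAX_PAUSE: new turn.
            merged.append((start, end))

    turns: List[Interval] = []
    for start, end in merged:
        while end - start > MAX_TURN:
            turns.append((start, start + MAX_TURN))
            start += MAX_TURN
        turns.append((start, end))
    return turns


if __name__ == "__main__":
    # A 0.3 s pause is absorbed; a 1.0 s pause starts a new turn.
    print(split_into_turns([(0.0, 2.0), (2.3, 4.0), (5.0, 6.0)]))
    # → [(0.0, 4.0), (5.0, 6.0)]
```

Running the function once per speaker keeps overlapping speech intact, because each speaker's turns are computed independently and are free to overlap in time.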

How to Handle Difficult Cases

This section shows rules from the WDCs and then states how such
cases should be treated for this project. If a WDC rule does not appear in the list below, then follow
whatever is stated in the WDCs.

DO NOT SKIP ANY TASKS UNLESS THEY ARE SILENT.

The WDC rule will be in BLACK.
The changed answer will be in RED.
________________________________________________________________________________________

If the prompt cannot be understood, skip it (tag it as [skip]). It is preferable to skip rather than mistranscribe.
● Do not skip; use the [unintelligible] tag, see step #11 above

Skip the utterance if it: contains at least some word(s) that cannot be understood; is in a different language
typically not understood; contains no speech; contains only laughter; contains singing; contains only
synthesized speech (e.g. the voices of Google Now or Siri) and/or pre-recorded speech (e.g. TV or radio).
● Can’t be understood: Use the [unintelligible] tag, see step #11 above
● Different language: Use the [foreign speech] tag, see step #11 above
● Laughter: Use the annotation option and select Laughter from the dropdown menu
● Singing: Use the [singing] tag, see step #11 above

For utterances that contain both user-generated speech and pre-recorded or synthesized speech, transcribe
user-generated speech and ignore the pre-recorded/synthesized speech.
● TRANSCRIBE ALL SPEECH

If a prompt contains nonsense words, search them on the internet. If no clear results are found and the word is
unintelligible (there is no single obvious spelling), [skip] it.
● Do not skip; use the [unintelligible] tag, see step #11 above

If the speaker sings, [skip]. Use the tag [music] if an entire utterance is music from an instrument, radio, TV,
etc.
● Singing: Use the [singing] tag, see step #11 above
● Music: Use the annotation option and select Music from the dropdown menu

[skip] if audio contains only laughter. Ignore laughter that is interspersed with speech (transcribe only the
speech).
● Laughter: Use the annotation option and select Laughter from the dropdown menu

Profanity should be fully transcribed. However, feel free to skip a sentence that you feel uncomfortable
transcribing.
● Profanity should be fully transcribed. Otherwise, use the [unintelligible] tag, see step #11 above

If the context of an alpha-digit sequence suggests it may be a password, credit card number, social security
number, etc., then use [skip].
● For instances of PII, use the annotation option and select PII from the dropdown menu

If an utterance is in a foreign language, tag with [skip], unless it is an easily identifiable media title or a foreign
language phrase commonly understood in the transcription language. Stick to the capitalization and
punctuation conventions of your target language.
● Use the [foreign speech] tag, see step #11 above

If words in a foreign language are included in a sentence of your target language, transcribe only if commonly
understood by speakers of your language. Otherwise, [skip]. Foreign words that are commonly used (and
therefore should be transcribed) can include names of foreign foods or places, pop culture phrases like
"capisce", and greetings or thank yous in prominent world languages.
● Please follow the transliteration instructions below

Only transcribe foreground speech. A user's speech may go from the foreground to the background or vice
versa (determined by change in volume) and can be accompanied by change in speaker audience.
● TRANSCRIBE ALL SPEECH

If one person clearly speaks in the foreground and someone speaks in the background, transcribe the main
speaker and ignore the rest.
● TRANSCRIBE ALL SPEECH

If one person clearly speaks in the foreground and someone interrupts at roughly the same volume with a brief
(less than a second) overlapping speech segment, transcribe the main speaker and ignore the rest.
● TRANSCRIBE ALL SPEECH

If two or more people are speaking at once with no one clearly in the foreground, tag as [overlapping]. Do this
for overlaps longer than one second. Use this tag even when one person is a bit louder than the other(s) and
you can tell what they're saying.
● TRANSCRIBE ALL SPEECH

Transcribe repeated words as many times as uttered, but skip if it is more than 5 times.
● Transcribe all speech exactly as it is heard. Do not leave any words untranscribed.

Write media titles as they are most commonly written. Movie titles and English book titles should be written in
Devanagari.
● Movie titles and English book titles should be written in Devanagari script.
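
The overrides in this section follow one pattern: nothing is ever tagged [skip], and each difficult case is handled either with a separate turn or with an annotation. The lookup table below is a small Python sketch of that pattern. The dictionary keys, the `Handling` type, and the function name `how_to_handle` are hypothetical names chosen for illustration; the labels themselves are the ones used in this document.

```python
from typing import NamedTuple


class Handling(NamedTuple):
    mechanism: str  # "turn" (separate turn) or "annotation" (drop-down)
    label: str      # transcription-box text, or the drop-down option


# Project handling for cases where the WDCs would otherwise say [skip].
HANDLING = {
    "unintelligible": Handling("turn", "[unintelligible]"),
    "foreign speech": Handling("turn", "[foreign speech]"),
    "singing": Handling("turn", "[singing]"),
    "laughter": Handling("annotation", "Laughter"),
    "music": Handling("annotation", "Music"),
    "pii": Handling("annotation", "PII"),
}


def how_to_handle(case: str) -> Handling:
    """Look up the project handling for a difficult case."""
    return HANDLING[case.strip().lower()]


if __name__ == "__main__":
    for case in HANDLING:
        mechanism, label = how_to_handle(case)
        print(f"{case}: {mechanism} -> {label}")
```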

________________________________________________________________________________________

Transliteration
All English words should be transcribed using a Latin keyboard.
All Hindi words should be transcribed using a Devanagari keyboard.
All Hinglish words should be transcribed using a Latin keyboard.

Hinglish definition: Hinglish is a hybrid language that mixes English and Hindi. Hinglish words are
neither English words nor Hindi words. Hinglish can mean different things to
different people, so please use your best judgement when transcribing.

Some examples of Hinglish words are:


● Auntyji, Uncleji — a child's elder relations and close adult contacts
● Pukka — Slang for genuine, or good
● Glassi — Thirsty
● Jungli — Unruly, wild
● Timepass — idle distraction, or languish
All of the above words, and other words like them, should be transcribed in Latin script.

*PII List - DO NOT TRANSCRIBE PII*


PII Category                    Definition
NAME                            First and/or last name
CREDIT_CARD_NUMBER
EMAIL
LOCATION
PHONE_NUMBER
SOCIAL_INSURANCE_NUMBER
DRIVER_LICENSE_NUMBER
NATIONAL_HEALTH_SERVICE_NUMBER
SOCIAL_SECURITY_NUMBER
PASSPORT
TAX_FILE_NUMBER                 A tax file number (TFN) is a unique identifier issued by the
                                Australian Taxation Office (ATO) to each taxpaying entity.
LOCATION_STREET
LOCATION_STREET_NUMBER
MRN                             Medical record number
BANKERS_CUSIP_ID                CUSIP stands for Committee on Uniform Securities Identification
                                Procedures. A CUSIP number identifies most financial instruments,
                                including: stocks of all registered U.S. and Canadian companies,
                                commercial paper, and U.S. government and municipal bonds.
BC_PHN                          Each B.C. resident enrolled with the Medical Services Plan (MSP) is
                                given a unique lifetime identifier for health care called a Personal
                                Health Number (PHN).
OHIP                            Ontario Health Insurance Plan
QUEBEC_HIN                      Quebec Health Insurance Number
CNI NIR                         The French national identity card (French: Carte nationale d'identité
                                or CNI) is an official identity document consisting of a laminated
                                plastic card bearing a photograph, name and address.
IBAN_CODE                       The International Bank Account Number (IBAN) is an internationally
                                agreed system of identifying bank accounts.
SWIFT_CODE                      A SWIFT code is an international bank code that identifies particular
                                banks worldwide. It's also known as a Bank Identifier Code (BIC).
BANK_ROUTING_MICR               The numbers located at the bottom of a check are called the MICR
                                line. MICR means Magnetic Ink Character Recognition. The
                                MICR line is made up of three sets of numbers. The first set is
                                called the ABA bank routing number or routing transit number (RTN).
DEA_NUMBER                      A DEA number (DEA Registration Number) is an identifier assigned
                                to a health care provider (such as a physician, optometrist, dentist,
                                or veterinarian) by the United States Drug Enforcement
                                Administration.
HEALTHCARE_NPI                  A National Provider Identifier or NPI is a unique 10-digit
                                identification number issued to health care providers in the United
                                States by the Centers for Medicare and Medicaid Services (CMS).
MEDICARE_NUMBER
NIE_NUMBER                      The NIE is a tax identification number in Spain.
CPF_NUMBER                      The CPF (Cadastro de Pessoas Físicas or Natural Persons
                                Register) is a number assigned by the Brazilian revenue agency to
                                both Brazilians and resident aliens who are subject to taxes in Brazil.
PAN_INDIVIDUAL                  Permanent Account Number (PAN) is a code that acts as an
                                identification for individuals, families and corporates (Indian or
                                foreign), especially those who pay income tax.
BSN_NUMBER                      Netherlands: The citizen service number (BSN) is a unique personal
                                number allocated to everyone registered in the Personal Records
                                Database (BRP).
ICD_CODE                        International Statistical Classification of Diseases and Related Health
                                Problems (ICD), a medical classification list by the World Health
                                Organization (WHO). It contains codes for diseases, signs and
                                symptoms, abnormal findings, complaints, social circumstances, and
                                external causes of injury or diseases.
FDA_CODE                        Prescription drug
NIF                             Tax Identification Number in Spain
                                http://www.investinspain.org/guidetobusiness/en/2/art_2_3.html
TAXPAYER_REFERENCE
CURP_NUMBER                     CURP is the abbreviation for Clave Única de Registro de Población
                                (translated into English as Unique Population Registry Code or else
                                as Personal ID Code Number). It is a unique identity code for both
                                citizens and residents of Mexico.
RRN                             Receiver Registration Number (RRN) is a 10-character
                                alphanumeric code that can be linked to a bank account, a credit/debit
                                card, a mobile wallet, or home delivery.

Definitions
Task            A random set of letters used to identify the audio wave you are transcribing. The list of
                eligible tasks can be seen in the Task section of LP.

Speaker ID      The speaker ID is used to identify the speaker in the audio. Use the same speaker ID for
                the same person throughout the task.

Speaker turn    One continuous contribution to dialogue by a single speaker. It may consist of a single
                word or multiple utterances.

Category        A category is used when there is audio that does not need to be transcribed, only labeled.
