Karl - MQM Task_Error Annotation

The document outlines a project for annotating translation errors by selecting word spans with corresponding issue types and severities. It emphasizes the importance of evaluating translations against human quality standards, considering context, and applying specific guidelines for identifying major and minor errors. Detailed instructions for the annotation process, including error types, severities, and style conventions, are provided to ensure consistency and accuracy in the evaluation of translations.

Uploaded by

nwabudefrank9
Overview:

In this project, you will annotate translation errors by selecting spans of words and assigning each span an
issue type and a severity.

Project aim and guidelines: The aim is to evaluate the quality of translations of different documents. The
output may be human-translated or machine-translated content. Either way, always review the translation
against the standard of human translation quality. Report every occurrence where the translation falls short
of that standard. The translation should be:

●​ Linguistically correct
●​ Accurate
●​ Readable (fluent, grammatically correct, and natural-sounding)
●​ With terminology appropriate in the context
●​ Consistent
●​ Faithful in tone and register to the source
●​ Appropriately transformed for the target context: cultural references or humor should be substituted
with equivalents in the target language where appropriate (e.g. it's raining cats and dogs → it's
pouring).

Before starting the annotation process, you must carefully read the instructions, including the definitions
for severities and error types. It is important to understand the difference between “Major” and “Minor”
severities and how to label text spans that cover the issue identified.

Be mindful of the following:

● Take context into account when annotating:
    ○ If a translation might be questionable on its own but is acceptable in the context of the document,
      then it should not be considered an error.
    ○ Similarly, it is OK for the translation to use the context to omit some part of the source text that
      is obvious from the context (even though the source states it explicitly). For example, an adjective
      may not have to be repeated in the translated text if it is naturally obvious from the context.
●​ When identifying issues, please be as fine-grained as possible. If a sentence contains multiple words
that are independently mistranslated, separate errors should be recorded.
●​ If the same error occurs more than once in a particular document or even in a sentence, all
occurrences should be reported. For consistency problems, assume that the first instance of a set of
inconsistent entities is correct, and label any subsequent entities that are inconsistent with it as
errors. For example, if a text contains “Doctor Smith … Dr. Smith … Doctor Smith … Dr. Smith”, the 2nd
and 4th occurrences should be marked.
●​ If the whole translation of a sentence is so bad that all or nearly all of it is completely wrong (e.g., word
salad, completely nonsensical output, severely broken syntax), then apply the “Major” severity and pick
the error type “Non-translation!”. Note that picking Non-translation! will automatically select the whole
sentence as the error span, even if you had selected a subspan to start with.
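
The consistency rule above (treat the first instance as correct and mark every later instance that differs from it) can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the annotator has already collected, in document order, the mentions that refer to the same entity.

```python
def inconsistent_occurrences(mentions):
    """Return 0-based positions of mentions that differ from the first one."""
    if not mentions:
        return []
    reference = mentions[0]  # the first instance is assumed to be correct
    return [i for i, mention in enumerate(mentions) if mention != reference]


mentions = ["Doctor Smith", "Dr. Smith", "Doctor Smith", "Dr. Smith"]
flagged = inconsistent_occurrences(mentions)
print([i + 1 for i in flagged])  # 1-based positions: [2, 4]
```

The output matches the example in the guideline: the 2nd and 4th occurrences are the ones to mark.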

Style and convention guidelines

Please follow the general stylistic guidelines for your target language and mark an error for any part of any
sentence that does not comply with them. Also, make sure to perform basic online research to verify any
brand names, acronyms, titles, etc. if unsure. Note that most of the time, stylistic errors have a minor
severity, unless they alter the meaning.

Acronyms: should be translated (with an equivalent acronym in the target language or a spelled out
translated version) where the translated version is more common than the source language version.
Conversely, if the untranslated version is more common, a translated version has not been established or the
abbreviation or acronym constitutes a registered name or trademark, the abbreviation or acronym should be
kept in the source language.

Books, movies, songs, games, research papers, academic journal titles, legal or official documents (foreign
acts and regulations, international agreements, treaties, conventions, resolutions): where an official
translation exists, it should be used. Where no official translation is available, the title should be kept in the
source language or an appropriate translation should be provided, based on what is accepted/most
common in the particular case.

Capitalization: Product names, titles, and proper nouns should be capitalized even if they are not capitalized
in the source. Titles should use sentence case or title case, based on what is used in your target locale. Title
case is used mainly in English.

Company names, brands, and institutions: Company names and brand names should not be
localized/translated. Names of institutions and organizations can be translated if appropriate in the target
language.

Currency: Make sure currency acronyms and symbols are correctly localized and incorporated into the
sentence.

Dates: Make sure the correct date format is used as per your target locale.

Explanations: The translation should not contain added explanations. Do not expect them, and do not log an
error if an explanation in brackets is not provided.

Measurements: Do not expect measurements to be converted between imperial and metric. They will most
likely be just translated. Do not log errors if the measurement units are correctly translated.

Numbers: Make sure the correct decimal and thousand separators are used.

Time: Make sure time is correctly indicated in the format that is acceptable in your target locale in the
context of the sentence (24-hour clock vs. 12-hour clock).
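
To see why date formats and number separators matter, the following Python sketch shows how the same characters can be read differently under different locale conventions. The helper names are hypothetical and the two conventions shown (day-first vs. month-first dates, and swapped thousand/decimal separators) are only examples; always check the conventions of your own target locale.

```python
from datetime import datetime


def read_date(text, day_first):
    """Parse a numeric date under a day-first or a month-first convention."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()


def format_number(value, thousands_sep, decimal_sep):
    """Format a number with two decimals and the given separators."""
    us_style = f"{value:,.2f}"  # e.g. "1,234,567.89"
    swap = str.maketrans({",": thousands_sep, ".": decimal_sep})
    return us_style.translate(swap)


print(read_date("6/1/1969", day_first=True))   # 1969-01-06
print(read_date("6/1/1969", day_first=False))  # 1969-06-01
print(format_number(1234567.891, ",", "."))    # 1,234,567.89
print(format_number(1234567.891, ".", ","))    # 1.234.567,89
```

A wrong date order can silently change the meaning (6 January becomes 1 June), while a wrong separator is usually still recoverable by the reader.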

Step-by-step instructions

1.​ Review the translation against the source, following all the mandatory guidelines.
2.​ Select the span of words affected by the error by clicking on the word/particle where the identified
error “begins”, then clicking on the word/particle where the issue “ends”. If it is only one word, click on
it twice.
3. The span should not include adjoining words that are not directly affected by the identified issue and do
not need to be modified for it to be fixed.
4.​ You can only mark spans within sentences. In the rare case that an error straddles multiple sentences
(e.g., when there is an incorrect sentence break), just mark the first part of the span that lies within a
sentence.
5.​ The shorter the span, the more useful it is.
6.​ When it comes to "Style/Unnatural or awkward" errors, please pinpoint the error rather than extend the
span to an entire clause.
7.​ If a single issue affects words that do not directly follow each other, as is the case with split verbs in
German (“teilte die Feuerwehr auf”) or phrasal verbs in English (“called Mary and her brother up”), log
the issue only for the first part (“teilte”, “called”) and do not log anything for the latter part (“auf”, “up”).
The text between “teilte” and “auf”, or between “called” and “up”, should not be included in the span if
the issue is with the verb only (“aufteilen”, “call up”).
8.​ Errors can appear either on the translation side, or rarely, for the "Source issue" type, on the source
side. When the error is an omission, the error span must be selected on the source side.
9. Select the severity of the issue using the buttons in the rightmost column ("Evaluations") or their
keyboard shortcuts:
    ● Major severity (M)
    ● Minor severity (m)
10. Select the category (also called type) and subcategory (also called subtype) of the error/issue
found. For example: Accuracy > Mistranslation.
11. After annotating all identified issues in a sub-paragraph, use the right arrow key (or the button) to
go to the next sub-paragraph.
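
One way to think about the "shortest useful span" rule is to imagine the corrected sentence and mark only the words that would have to change. The Python sketch below illustrates this idea with the standard-library difflib module. It is an assumption-laden illustration, not a description of the annotation tool: the function name is hypothetical, the corrected sentence is whatever fix the annotator has in mind, and words are split on whitespace only.

```python
from difflib import SequenceMatcher


def minimal_error_spans(translation, corrected):
    """Return inclusive (start, end) word positions in `translation` that differ from `corrected`."""
    words = translation.split()
    fixed = corrected.split()
    matcher = SequenceMatcher(a=words, b=fixed, autojunk=False)
    spans = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "insert" means text is missing from the translation; such omissions
        # have no span on the translation side and are marked on the source side.
        if tag in ("replace", "delete"):
            spans.append((i1, i2 - 1))
    return spans


translation = "Mary slammed the door as he left."
corrected = "Mary slammed the door as she left."
spans = minimal_error_spans(translation, corrected)
print(spans)  # [(5, 5)]
print([translation.split()[s:e + 1] for s, e in spans])  # [['he']]
```

Only the pronoun is selected. Independent errors in the same sentence come out as separate spans, which matches the instruction to record them as separate errors.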

Severities

Major severity: These errors significantly alter the source text's meaning or significantly degrade its quality.

1.​ The translated text says something different from the intent of the source text, or is substantially
difficult to understand, or has some very jarring linguistic flaw.
2.​ Typically, accuracy and terminology errors fall here, as well as egregious style or grammar errors.

Examples of Major issues:

Language pair: EN_DE
Source: makes light of the horrors
Translation: mache die Schrecken leicht
Comments: "Make light of something" means "to treat something as unimportant". The phrase was translated
literally and the translation makes no sense.

Language pair: EN_DE
Source: pro-Remain Tories
Translation: Brexit-freundlichen Tories
Comments: Translated to the opposite meaning (pro-Remain vs. pro-Brexit).

Language pair: EN_DE
Source: ACP
Translation: ACP
Comments: The acronym "ACP" means nothing in German. It should have been expanded to "Assistant
Commissioner of Police" and translated accordingly ("Stellvertretender Polizeikommissar").

Language pair: ZH_EN
Source: 但是这个过程急不来,所以
Translation: But this process is not anxious
Comments: The meaning of the source is "This process cannot be rushed". The translation makes no sense.

Language pair: ZH_EN
Source: 清华迎来110岁生日,校长邱勇:大学之大在于培养大写的人
Translation: Tsinghua celebrates its 110th birthday, President Qiu Yong: The greatness of a university lies in
cultivating capitalized people
Comments: "Capitalized people" makes no sense. It should be "upstanding people".

Language pair: EN_ZH
Source: unmute yourself please
Translation: 请你闭麦
Comments: Mistranslated as "mute yourself".

Language pair: ZH_EN
Source: 帕梅拉的意识,待在贝塔的灵魂世界,透过他的双眼,看到了眼前的景像。
Translation: Pamela's consciousness stayed in Beta's spiritual world. She saw the scene in front of her through
his eyes.
Comments: "In front of her" should read "in front of him". The meaning is substantially altered, since the
source text means that "she sees the scene that he is seeing".

Language pair: DE_EN
Source: Koch-Ottes Haare sind kurz, streng gescheitelt und gekämmt, der Blick energisch.
Translation: Koch-Otte's hair is short, severely parted and combed; his look spirited.
Comments: The choice of pronoun is incorrect for Benita Koch-Otte, a German female textile designer.

Minor severity: These errors are noticeable but minor flaws in the translated text. They do not significantly
alter the source text's meaning or degrade the text's quality.

1.​ Minor severity errors might add, drop, or modify minor details, or they may slightly decrease the
stylistic quality of the text.
2.​ Typically, the kinds of errors that fall under this severity level are grammar, spelling (including
capitalization and whitespace), style, punctuation, locale convention, and creative reinterpretation.

Examples of Minor issues:

Language pair: EN_DE
Source: When cooking for a crowd, Eunsook Pai sears the dumplings a couple of hours in advance and then
steams them just before serving.
Translation: Beim Kochen für eine Menschenmenge braten Eunsook Pai die Teigtaschen ein paar Stunden im
Voraus an und dampft sie dann kurz vor dem Servieren.
Comments: A minor word choice error (a contextually incorrect expression but still understandable).

Language pair: EN_DE
Source: Heat another tablespoon vegetable oil, and saute onion, stirring occasionally, until just softened,
2 to 3 minutes.
Translation: Erhitzen Sie einen weiteren Esslöffel Pflanzenöl und braten Sie die Zwiebel unter gelegentlichem
Rühren 2 bis 3 Minuten an, bis sie gerade weich ist.
Comments: Literal, unidiomatic, but still understandable. The expression "just softened" means "at the point
of becoming soft."

Language pair: EN_DE
Source: In an exclusive interview with Fast Company, Berners-Lee joked that the intent behind Inrupt is
"world domination."
Translation: In einem exklusiven Interview mit Fast Company scherzte Berners-Lee, dass die Absicht hinter
Inrupt die "Weltherrschaft" sei.
Comments: German uses curly quotes, opening down, closing up. The translation uses the same quote style as
the source English, so both of these wrong quotes are minor punctuation errors.

Language pair: EN_DE
Source: The limited-time Taste of Knott's food, beer and wine event runs through Sept. 13 without rides,
coasters or other theme park attractions.
Translation: Die zeitlich begrenzte Veranstaltung Taste of Knotts Essen, Bier und Wein läuft bis zum 13.
September ohne Fahrgeschäfte, Achterbahnen oder andere Attraktionen des Themenparks.
Comments: Awkward syntax.

Language pair: ZH_EN
Source: 因为其中大多数都被抛弃了,希望包含在花粉中的生殖细胞,雄性生殖细胞
Translation: Because most of them are released, the hope is that there is a reproduction cell contained in
the pollen.
Comments: A minor terminological error. It should read "reproductive cell."

Language pair: EN_ZH
Source: This is also explained on the flip side of the widget.
Translation: 这在该插件的另一方面也有解释。
Comments: Inaccurate translation of "widget" and "flip side". Widget usually refers to a small gadget or
mechanical device, especially one whose name is unknown or unspecified. "Flip side" means the opposite side.

Context can be the key in determining whether an error is major or minor. For example, changing the tense of a
standalone sentence may be a minor error, but doing so in the middle of a narrative would be a major error.

Error Types and Subtypes

Each entry below lists the error type, the subtype, the subtype definition, and use cases and examples.

Error type: Accuracy
Definition: The translated text does not accurately reflect the source.

Subtype: Creative Reinterpretation
Subtype definition: The translated text reinterprets the source but preserves its intent. Note that if the
translation reinterprets the source text to such a great degree that it changes the intent, then it should be
marked as using Mistranslation, Addition, or Omission subtypes, as appropriate.
Use cases and examples: (1) When the translation adds text that provides explanations or context that may be
obvious in the source locale but not in the target locale, or omits such text because it is obvious in the
target locale. For example, an added short introduction of an entity not well known in the target locale, or
an omitted introduction of an entity well known in the target locale. (2) When the translation edits the
source creatively, perhaps to make the translated text more fluent, expressive or localized.

Subtype: Mistranslation
Subtype definition: The translated text does not accurately represent the source text.
Use cases and examples: (1) The source text talks about a person A knowing another person B, but the English
translation says "A was intimate with B." (2) The source text states that something never happens, whereas
the translation says it happens "often" or "rarely." (3) Incorrectly gendered pronouns not warranted by the
context, such as "Mary slammed the door as he left." Misgendering errors typically have a major severity as
they significantly alter the source text's meaning.

Subtype: Source language fragment
Subtype definition: Content that should have been translated has been left untranslated.
Use cases and examples: A word, phrase or sentence in a German document has been copied verbatim into the
English translation, but a natural translation is possible.

Subtype: Addition
Subtype definition: The translated text includes information not present in the source.
Use cases and examples: A translated sentence that includes adverbs or adjectives without equivalents in the
original text, even after considering context.

Subtype: Omission
Subtype definition: Content is missing from the translation that is present in the source.
Use cases and examples: A phrase has been dropped from the translation and it is not already implied by
context. This error type needs to be annotated on the source side.

Error type: Fluency
Definition: Issues related to the form or content of translated text, independent of its relation to the
source text; errors in the translated text that make it harder to understand.

Subtype: Inconsistency
Subtype definition: The text shows internal inconsistency (not related to terminology).
Use cases and examples: (1) A person is referred to with a masculine pronoun in one sentence and a feminine
pronoun in the next sentence. This would be a major error. (2) An entity is referred to as "Secretary of
State" in one paragraph but as "Minister of State" in the next. This would be a minor error, unless the
context is one where both "Minister of State" and "Secretary of State" are valid technical terms with
different meanings, in which case it would be a major error.

Subtype: Grammar
Subtype definition: Issues related to the grammar or syntax of the text, other than spelling and orthography.
Use cases and examples: An English text reads "They goes together," or "He could of fixed it." Both examples
have jarring flaws that significantly degrade the text's fluency and would justify a major severity. However,
it's possible that these sentence constructs are present in a context where such colloquial usage would not
be out of place, and in such contexts they may not be errors.

Subtype: Register
Subtype definition: The content uses the wrong grammatical register, such as using informal pronouns or verb
forms when their formal counterparts are required.
Use cases and examples: A formal invitation uses the German informal pronoun "du" instead of "Sie."

Subtype: Spelling
Subtype definition: Issues related to spelling or capitalization of words, and whitespace.
Use cases and examples: The French word "mer" (sea) is used instead of the identically pronounced "maire"
(mayor). This example would merit a major severity, as the meaning is substantially altered.

Subtype: Text-Breaking
Subtype definition: Issues related to paragraph breaks and line breaks. If a sentence ends with an incorrect
or missing paragraph break or line break, then mark the last part of it (word or punctuation) with this error
type.
Use cases and examples: Certain paragraph breaks are very important for establishing the proper flow of the
text: for example, before and after a section heading or a block-quote. If an important paragraph break is
completely missing (there is not even a line break), then that is a major error, as it severely degrades the
quality of the text. If an unwarranted paragraph break is seen in the middle of a sentence, that is also a
major error. Most other errors of type "Fluency / Text-Breaking" are usually minor errors.

Subtype: Punctuation
Subtype definition: Punctuation is used incorrectly (for the locale or style).
Use cases and examples: An English compound adjective appearing before a noun is not hyphenated, as in "dog
friendly hotel." The reader can still grasp the intent quite easily in this case, so this example would have
a minor severity.

Subtype: Character encoding
Subtype definition: Characters are garbled due to incorrect application of an encoding.
Use cases and examples: "ハクサ�ス、ア" and "瓣в眏." See en.wikipedia.org/wiki/Mojibake for more.

Error type: Style
Definition: The text has stylistic problems.

Subtype: Unnatural or awkward
Subtype definition: The text is literal, written in an awkward style, unidiomatic or inappropriate in the
context.
Use cases and examples: (1) A sentence is translated literally, which copies a style not used in the target
language. (2) A sentence is unnecessarily convoluted or too wordy, such as, "The lift traveled away from the
ground floor." This would be a minor error. (3) Grammatically correct but slightly unnatural sounding
sentences such as “From where did he come?” This would also be a minor error.

Subtype: Bad sentence structure
Subtype definition: This error type is related to the arrangement of the sentence structure. The marked span
of text is an unnecessary repetition, or it makes the sentence unnecessarily long, or it would have been
better expressed as a clause in the previous sentence.
Use cases and examples: Repetition: "Alexander had an idea. Alexander had a thought." This example would be a
minor error, unless the context dictates otherwise. Long sentence: "The party, after blaming its losses on
poor leadership, that the spokesman said could have paid more attention to the people's needs, split into two
factions." This sentence could have been split into multiple sentences. Mergeable: "He gave him the money. He
accepted the reward." These two sentences can be phrased better as a single sentence that makes it clearer
who accepted the reward. This example is a minor error, without additional contextual information.

Error type: Terminology
Definition: A term (domain-specific word) is translated with a term other than the one expected for the
domain or otherwise specified.

Subtype: Inappropriate for context
Subtype definition: The translation does not adhere to the appropriate terminology or contains terminology
that does not fit the context.
Use cases and examples: "acide sulfurique" is translated to "acid of sulfur" instead of "sulfuric acid." This
example would have a minor severity level.

Subtype: Inconsistent
Subtype definition: Terminology is used inconsistently in the text.
Use cases and examples: The translation of a phone manual alternates between the terms "front camera" and
"front lens." This example would have a minor severity level.

Error type: Locale convention
Definition: The text does not adhere to locale-specific mechanical conventions and violates requirements for
the presentation of content in the target locale.

Subtype: Address format
Subtype definition: Content uses the wrong format for addresses.
Use cases and examples: "1600 Pennsylvania Ave" is translated to Russian as "1600 Пенсильвания авеню" instead
of "Пенсильвания авеню 1600." This example would have a minor severity level.

Subtype: Date format
Subtype definition: A text uses a date format inappropriate for its locale.
Use cases and examples: The date "1969年1月6日" is shown as "6/1/1969" (instead of "1/6/1969") and the target
locale can be clearly inferred to be U.S. English. For this example, the severity level would be major as the
meaning of the date has been significantly altered.

Subtype: Currency format
Subtype definition: Content uses the wrong format for currency.
Use cases and examples: The dollar symbol is used as a suffix, as in "100$." This example would have a minor
severity level.

Subtype: Telephone format
Subtype definition: Content uses the wrong form for telephone numbers.
Use cases and examples: An Indian phone number such as "xxxx-nnnnnn" is formatted as "(xxx) xnn-nnnn". This
example would have a minor severity level.

Subtype: Time format
Subtype definition: Content uses the wrong form for time.
Use cases and examples: Time is shown as "11.0" instead of "11:00" in a language where the former is a
mistake. This example would have a minor severity level.

Subtype: Name format
Subtype definition: Content uses the wrong form for names.
Use cases and examples: The Chinese name (which lists surname first) "马琳" is translated as "Lin Ma" instead
of "Ma Lin". This example would also have a minor severity level, as the reader can make out the true intent
quite easily.

Error type: Other
Definition: Any other issues (please provide a short description when prompted).

Error type: Non-translation!
Definition: The sentence as a whole is completely not a translation of the source. This rare category, when
used, overrides any other marked errors for that sentence and labels the full translated sentence as the
error span.
Use cases and examples: The translated sentence is completely unrelated to the source sentence or is
gibberish or is such a bad translation that there is virtually no part of the meaning of the source that has
been retained. Only available after choosing a major severity error.

Error type: Source issue
Definition: Any issue in the source.
Use cases and examples: The source has meaning-altering typos or omissions ("The was blue.") or is
nonsensical ("Th,jad jakh ;ih"). Note that even in the presence of source issues, translation errors should
be annotated when possible. If some part of the source is completely garbled, then the corresponding
translation need only be checked for Fluency/Style errors.
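
The pieces of a single annotation described in this document (side, span, severity, error type, subtype) can be pictured as one small record. The Python sketch below is only an illustration of how the rules fit together: the class, field and function names are hypothetical and do not describe the annotation tool's actual data format.

```python
from dataclasses import dataclass

# Error types and subtypes as listed in this document.
SUBTYPES = {
    "Accuracy": {"Creative Reinterpretation", "Mistranslation",
                 "Source language fragment", "Addition", "Omission"},
    "Fluency": {"Inconsistency", "Grammar", "Register", "Spelling",
                "Text-Breaking", "Punctuation", "Character encoding"},
    "Style": {"Unnatural or awkward", "Bad sentence structure"},
    "Terminology": {"Inappropriate for context", "Inconsistent"},
    "Locale convention": {"Address format", "Date format", "Currency format",
                          "Telephone format", "Time format", "Name format"},
    "Other": set(),
    "Non-translation!": set(),
    "Source issue": set(),
}


@dataclass(frozen=True)
class Annotation:
    side: str        # "source" or "translation"
    start: int       # position of the first word in the span
    end: int         # position of the last word in the span (inclusive)
    severity: str    # "Major" or "Minor"
    error_type: str
    subtype: str = ""


def problems(a):
    """Return a list of guideline violations for one annotation (empty if none are found)."""
    found = []
    if a.error_type not in SUBTYPES:
        found.append("unknown error type")
    elif SUBTYPES[a.error_type] and a.subtype not in SUBTYPES[a.error_type]:
        found.append("subtype does not belong to error type")
    if a.severity not in ("Major", "Minor"):
        found.append("unknown severity")
    if a.start > a.end:
        found.append("span ends before it starts")
    if a.subtype == "Omission" and a.side != "source":
        found.append("Omission must be marked on the source side")
    if a.side == "source" and a.subtype != "Omission" and a.error_type != "Source issue":
        found.append("only Omission and Source issue are marked on the source side")
    if a.error_type == "Non-translation!" and a.severity != "Major":
        found.append("Non-translation! requires Major severity")
    return found


ok = Annotation("translation", 5, 5, "Major", "Accuracy", "Mistranslation")
bad = Annotation("translation", 2, 3, "Minor", "Accuracy", "Omission")
print(problems(ok))   # []
print(problems(bad))  # ['Omission must be marked on the source side']
```

The checks encode only rules stated in this document; judging whether a severity or error type is the right one for a given span remains the annotator's decision.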
