
MODULE 5
Machine Translation:
Translating in Low-Resource Situations, MT Evaluation, Bias and Ethical Issues.
Translating in Low-Resource Situations
Machine Translation (MT) relies heavily on large, high-quality parallel corpora—collections of sentence pairs in two languages. However, many languages lack such resources, making them "low-resource languages" in the context of MT.

Challenges in Low-Resource Machine Translation

1. Limited Parallel Corpora: Many languages, especially those spoken in low-income areas or less
widely used languages, do not have extensive parallel corpora—collections of text translated into
multiple languages—available for training MT systems. This scarcity makes it difficult to build
effective translation models because machine learning approaches generally require large datasets
to perform well.

2. Data Sparsity: Even high-resource languages can face challenges when translating into low-resource domains, where very little data may be available. For instance, a particular genre or field may not have a substantial amount of text to train on, leading to similar data sparsity issues.

3. Quality of Available Data: Quality concerns may arise not just from the quantity of data but also
from the nature of the data available. Many parallel corpora can contain incorrect translations,
boilerplate phrases, or repetitive sentences, especially if not enough native speakers were involved
in the content creation or quality checks.
Strategies for Addressing Low-Resource Situations

To tackle these issues, two primary approaches are commonly employed in low-resource MT contexts:

1. Backtranslation:

• Definition: Backtranslation is a data augmentation technique that leverages monolingual corpora to generate synthetic parallel data. It involves training a reverse-direction MT system to create additional training data.

Mechanism:
• Start with a small parallel corpus (bitext) between the source and target languages.
• Train a target-to-source MT model using this bitext.
• Use the trained model to translate monolingual data available in the target language back to the
source language. This creates a synthetic bitext where natural sentences in the target language are
aligned to sentences generated by the MT model.
• This additional synthetic data is then combined with the existing parallel corpus and used to retrain
the original source-to-target MT model.

- Example: If there's a small bitext for translating Navajo to English, but there are plenty of English sentences available, one could train a target-to-source model that translates these English sentences into Navajo, producing a synthetic Navajo-English bitext with which to augment the original training data; a minimal sketch of this loop follows.
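The sketch below walks through the four steps above. Note that `train_mt_model` and the model's `translate` method are hypothetical stand-ins for whatever NMT toolkit is in use (e.g., fairseq or a Hugging Face seq2seq trainer), not a real API:

```python
def backtranslate_and_retrain(bitext, target_monolingual):
    """bitext: list of (source, target) sentence pairs (the small corpus);
    target_monolingual: plentiful target-language sentences.
    train_mt_model / .translate are hypothetical toolkit stand-ins."""
    # 1. Train a reverse (target-to-source) model on the small bitext.
    reverse_pairs = [(tgt, src) for (src, tgt) in bitext]
    reverse_model = train_mt_model(reverse_pairs)

    # 2. Translate target-language monolingual text back into the source
    #    language, creating synthetic source-side sentences.
    synthetic_sources = [reverse_model.translate(t) for t in target_monolingual]

    # 3. Pair synthetic sources with the natural target sentences to form
    #    a synthetic bitext.
    synthetic_bitext = list(zip(synthetic_sources, target_monolingual))

    # 4. Retrain the forward (source-to-target) model on real + synthetic data.
    return train_mt_model(bitext + synthetic_bitext)
```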
2. Multilingual Models:

- Using multilingual models can help MT systems become more robust in low-resource settings. These
models can learn from multiple languages simultaneously, allowing for shared learning across related
languages, which can mitigate the issues of low data availability for any single language.

- By leveraging knowledge from high-resource languages, multilingual models can provide better
translation capabilities even for the low-resource languages they include.
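One widely used recipe for such shared learning is the target-language-token trick from Johnson et al. (2017): a single model is trained on examples from many language pairs, with a tag prepended to each source sentence to select the output language. The sketch below is illustrative only; the `<2xx>` tag format and the example sentences are assumptions, not a specific toolkit's API:

```python
# Sketch of the target-language-token approach to multilingual NMT
# (Johnson et al., 2017): one shared model serves many language pairs.

def tag_example(src_sentence: str, tgt_lang: str) -> str:
    """Prepend a target-language token so the shared model knows which
    language to produce; the <2xx> format here is illustrative."""
    return f"<2{tgt_lang}> {src_sentence}"

# Mixed training data: a low-resource pair (Navajo-English) shares
# encoder/decoder parameters with higher-resource pairs in the same run.
training_examples = [
    (tag_example("Bonjour le monde", "en"), "Hello world"),   # French -> English
    (tag_example("Hello world", "fr"), "Bonjour le monde"),   # English -> French
    (tag_example("Yá'át'ééh", "en"), "Hello"),                # Navajo -> English
]
```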
Sociotechnical Issues
In the context of Translating in Low-Resource Situations, sociotechnical concerns highlight how
human, cultural, and organizational factors influence the quality and equity of MT systems.

1. Lack of Native Speaker Involvement:


One significant hurdle in developing MT systems for low-resource languages is the absence of
native speakers in critical roles such as content curation, data annotation, or evaluation of the
translation systems. This can compromise the quality and relevance of the training data.

2. Quality Concerns in Parallel Corpora:


Studies have suggested that many parallel corpora may contain substantial amounts of low-quality data, often resulting from a lack of involvement from native speakers. For instance, one study found that less than 50% of the sentences in many multilingual datasets were of acceptable quality, indicating the need for better data governance and the inclusion of native perspectives in the dataset curation process.

3. Resource Allocation Bias:


Many MT systems historically focus on high-resource languages, particularly English. This bias in resource allocation leads to disparities in translation quality and availability, as systems trained primarily on high-resource data serve speakers of underrepresented languages far less well.
Machine Translation (MT) Evaluation
1. Dimensions of MT Evaluation
MT is generally evaluated along two key dimensions:
- Adequacy: This measures how well the translation preserves the meaning of the source text. It's
sometimes referred to as "faithfulness" or "fidelity."
- Fluency: This assesses how natural and grammatically correct the translation is in the target language.
It evaluates whether the translation is clear, readable, and coherent.

2. Evaluation Methods
A. Human Evaluation
Human evaluation is considered the gold standard for MT assessment due to its higher accuracy
compared to automatic methods. Human raters evaluate translations based on fluency and adequacy,
typically using a scoring scale (e.g., 1 to 5 or 1 to 100) to rate various aspects:

- Fluency Rater Scale: Raters may score how intelligible, clear, readable, or natural the output is, using
a numerical scale where low scores denote poor quality and high scores denote high quality.
- Adequacy Rater Scale: Bilingual raters may be given both the source sentence and the proposed
target translation to score how much information from the source is preserved in the target translation.

Ranking Method: Alternatively, raters might be asked to rank candidate translations to determine
preferences between two or more outputs.
B. Statistical Methodology for Human Evaluation
Training human raters is crucial, as those without translation expertise may struggle to distinguish
between fluency and adequacy. A common practice includes:
- Removing outlier raters whose scores vary significantly from the group.
- Normalizing scores to ensure consistency across evaluations. Specifically, this involves subtracting the mean from each rater's scores and dividing by the standard deviation, producing comparable z-scores.
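A minimal sketch of this z-score normalization, using NumPy:

```python
import numpy as np

def normalize_rater_scores(scores):
    """Z-score normalize one rater's scores (subtract the mean, divide by
    the standard deviation) so raters with different personal scales
    become comparable."""
    scores = np.asarray(scores, dtype=float)
    return (scores - scores.mean()) / scores.std()

# A harsh rater and a lenient rater yield identical normalized scores
# when they agree on the relative quality of the translations.
print(normalize_rater_scores([1, 2, 2, 3, 1]))  # harsh rater
print(normalize_rater_scores([3, 4, 4, 5, 3]))  # lenient rater
```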

3. Automatic Evaluation Metrics


MT evaluation also utilizes various automatic metrics to provide efficiency and scalability in the
evaluation process. Below are key automatic metrics discussed:

A. BLEU (Bilingual Evaluation Understudy)


- BLEU measures the overlap of n-grams (contiguous sequences of words) between machine translations and references. It calculates precision at various n-gram levels and is widely used to evaluate translations. The BLEU score includes:
- N-gram Precision: Calculate precision for unigrams, bigrams, trigrams, and up to 4-grams.
- Brevity Penalty: Translations shorter than the reference are penalized, since n-gram precision alone would otherwise reward very short, "safe" outputs. A sketch of both components follows.
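Here is a compact sentence-level sketch of clipped n-gram precision plus the brevity penalty. Production implementations such as sacreBLEU add smoothing and corpus-level aggregation, so treat this as illustrative:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU sketch: clipped n-gram precision for n=1..4,
    combined by geometric mean, times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate n-gram's count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on a mat"))  # ~0.54
```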
B. chrF

- chrF focuses on character n-gram overlap, addressing some limitations of BLEU by allowing partial matches and handling morphological complexity better.

Unlike BLEU (which is word-based and precision-oriented), chrF is:

• Character-based
• Built on an F-score, which balances both precision and recall
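A minimal sketch of chrF follows, assuming whitespace is stripped before extracting character n-grams (implementations differ on this detail; sacreBLEU ships the reference implementation):

```python
from collections import Counter

def chrf(candidate, reference, max_n=6, beta=2.0):
    """chrF sketch: average character n-gram precision and recall over
    n = 1..max_n, combined into an F-beta score (beta=2 weights recall
    twice as much as precision)."""
    def char_ngrams(text, n):
        text = text.replace(" ", "")  # assumption: ignore whitespace
        return Counter(text[i:i + n] for i in range(len(text) - n + 1))

    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        cand_counts = char_ngrams(candidate, n)
        ref_counts = char_ngrams(reference, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        precisions.append(overlap / max(sum(cand_counts.values()), 1))
        recalls.append(overlap / max(sum(ref_counts.values()), 1))
    p, r = sum(precisions) / max_n, sum(recalls) / max_n
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)
```

Because matching happens at the character level, near-miss word forms (e.g., inflectional variants in morphologically rich languages) still earn partial credit instead of counting as total misses.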
C. BERTScore

BERTScore is a machine translation evaluation metric that leverages pre-trained contextual embeddings
from BERT (or similar transformer models) to compare the semantic similarity between the candidate
and reference translations. Unlike BLEU and chrF which rely on exact n-gram matches, BERTScore
captures semantic meaning.
Token-level similarity is computed as the cosine similarity between the contextual embeddings of candidate and reference tokens; these similarities are greedily matched and aggregated into precision, recall, and F1.
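A minimal usage sketch with the open-source `bert-score` package (the example sentences are illustrative, and scores depend on the underlying pretrained model):

```python
# pip install bert-score
from bert_score import score

candidates = ["The weather is cold today."]
references = ["It is freezing today."]

# Returns per-sentence precision, recall, and F1 as tensors; the model
# for lang="en" is downloaded automatically on first use.
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.3f}")
```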
4. Statistical Significance Testing
When comparing the performance of two MT systems (e.g., system A and system B), it's vital to
determine if observed differences in their scores are statistically significant.
• A paired bootstrap test can be applied to assess whether the difference in scores is statistically
significant. This involves:
• Creating thousands of pseudo-test sets by randomly sampling with replacement from the original
test set.
• Computing the metric scores (e.g., BLEU, chrF, BERTScore) for each pseudo-test and
determining how frequently one system scores higher than the other.
• This test helps evaluate whether the difference in metrics reflects a true performance improvement or just random variation in the test set. A minimal sketch follows.
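The sketch below assumes per-sentence metric scores are available for both systems on the same test set; for corpus-level metrics such as BLEU, one would instead resample sentence indices and recompute the corpus score on each pseudo-test set:

```python
import random

def paired_bootstrap(scores_a, scores_b, num_samples=10_000):
    """scores_a / scores_b: per-sentence metric scores for systems A and B
    on the same test set. Returns the fraction of resampled pseudo-test
    sets on which A outscores B; values near 1.0 (or 0.0) suggest the
    difference is unlikely to be random variation."""
    n = len(scores_a)
    wins_a = 0
    for _ in range(num_samples):
        idx = [random.randrange(n) for _ in range(n)]  # sample with replacement
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins_a += 1
    return wins_a / num_samples
```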
5. Limitations of Automatic Metrics
• Sensitivity to Word Tokenization:
• BLEU's performance may deteriorate depending on how words are tokenized, especially in morphologically rich languages.
• Local Evaluation:
• chrF and BLEU focus on n-gram overlap and are sensitive to local word or character
sequences but fail to capture sentence-level semantics, coherence, or global logical flow.
• Embedding Dependency (BERTScore):
• BERTScore addresses semantic similarity better by using contextual embeddings, but:
• It is computationally more expensive.
• Its results depend on the pretrained model (e.g., BERT-base vs. RoBERTa-large).
• It may overestimate performance when both systems make similar semantic errors.
• Lack of Interpretability:
• Unlike BLEU and chrF, where n-gram matches can be directly observed, BERTScore relies on opaque embedding similarities that are difficult to inspect or explain.
Bias in Machine Translation
1. Gender Bias
• Machine Translation (MT) systems can perpetuate gender biases present in the training data.
• Example:
• When translating from gender-neutral languages (e.g., Hungarian) into English, MT often
assigns gendered pronouns based on stereotypes.
• A gender-neutral subject like "ő" (Hungarian) may be translated into “he” or “she” depending
on the profession mentioned in the sentence.
• For instance:
• "ő egy ápoló" → "She is a nurse"
• "ő egy vezérigazgató" → "He is a CEO"
• These outputs reflect societal stereotypes and bias in profession-gender associations.
• The issue becomes significant in domains like job applications or media translations, where the
wrong gender assumption can propagate discriminatory ideas.
2. Cultural Stereotypes
• MT systems may reflect and amplify cultural stereotypes because:
• The training data often over-represents dominant cultures.
• Low-diversity datasets fail to capture minority voices or culturally nuanced expressions.
• Bias may appear in:
• Translation of religious, ethnic, or geopolitical content.
• Interpretation of idioms, metaphors, or emotionally charged expressions in a way that
misrepresents the source culture.
Ethical Issues in Machine Translation

1. Representation and Involvement


Native speakers of low-resource languages are often underrepresented in MT development.
Consequences:
• Poor-quality translations.
• Reduced trust and usability.

Studies have found that some multilingual datasets contain low-quality content due to:
• Lack of native speaker input.
• Automated or crowd-sourced translations without verification.
2. Quality of Training Data
Many parallel corpora (used for training MT systems) contain:
• Repetitive phrases
• Incorrect translations
• Noisy or synthetic data

This leads to:
• Systematic translation errors.
• Propagation of low-quality or misleading translations, especially for underrepresented languages.

Ethical concern: reliability and trust in the system’s output.
3. Resource Allocation
MT research is skewed toward high-resource languages (e.g., English, French, Chinese). Low-income or minority language communities receive less attention and funding.

Consequences:
• Lack of MT support for many global languages.
• Widening of the digital divide.
• Inaccessibility of information, education, or services for large populations.

Ethical concern: fairness in access to technology.
THANK YOU
