
MULTILINGUAL TRANSLATION

MINI PROJECT REPORT


for
21CSE356T - NATURAL LANGUAGE PROCESSING

Submitted by

SHRESHTHA SRIVASTAVA [RA2211003010872]


KALP AGARWAL [RA2211003010879]
GORANTLA GAYATRI [RA2211003010880]
P. VIVEKANANDA REDDY [RA2211003010882]
KESARLA ABHIRAM [RA2211003010908]

Under the Guidance of


Mr. S. Prabu
(Assistant Professor, Department of Computing Technologies)

In partial fulfillment of the requirements for the degree of

BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING

DEPARTMENT OF COMPUTING TECHNOLOGIES


SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR- 603 203
MAY 2025
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR – 603 203

BONAFIDE CERTIFICATE

Certified that this Natural Language Processing Mini Project report titled
“MULTILINGUAL TRANSLATION” is the bonafide work of
“SHRESHTH SRIVASTAVA” [RA2211003010872], “KALP AGRAWAL”
[RA2211003010879], “GORANTLA GAYATRI” [RA2211003010880],
“PANNURU VIVEKANANDA REDDY” [RA2211003010882], and
“KESARLA ABHIRAM” [RA2211003010908], who carried out the project work
under my supervision. Certified further that, to the best of my knowledge, the
work reported herein does not form part of any other work.
Mr. S. PRABU
Guide
Assistant Professor
Dept. of Computing Technologies

Dr. NIRANJANA G
Professor & Head
Dept. of Computing Technologies
INDEX

CHAPTER NO    TITLE                                 PAGE NO

1             OBJECTIVE                             1
2             ABSTRACT                              2
3             INTRODUCTION                          3
4             HARDWARE & SOFTWARE REQUIREMENTS      5
5             CONCEPTS / WORKING PRINCIPLE          6
6             PROGRAM                               8
7             OUTPUT                                15
8             CONCLUSIONS                           19
9             REFERENCES                            20
MULTILINGUAL TRANSLATION

OBJECTIVE

The primary objective of this project is to develop an automated system that leverages
Natural Language Processing (NLP) techniques to streamline text tokenization and
multilingual translation. The growing demand for efficient communication across language
barriers presents a significant challenge, particularly in industries where time-sensitive and
accurate translation is essential. Traditional translation methods, especially manual
approaches, often face limitations such as slow turnaround times, high error rates, and
difficulty handling large volumes of data. These challenges become even more complex
when multiple languages are involved, each requiring dedicated human expertise.

This project aims to address these challenges by automating key language processing tasks.
Automating text tokenization and translation will reduce the time and effort involved in
manual translation while improving the accuracy and scalability of the process. Tokenization
is the first step in text processing, where text is broken down into smaller components like
words or phrases. This is essential for understanding the structure of the text and preparing
it for the translation phase. In this project, the system uses spaCy, a powerful NLP library,
to handle tokenization, segmenting the text into manageable units efficiently.

For multilingual translation, the system integrates the MarianMT model from HuggingFace,
a state-of-the-art machine translation model that supports numerous language pairs. By
leveraging deep learning algorithms, this model can translate text accurately and efficiently
between different languages, providing a reliable alternative to traditional methods that are
often slow and prone to errors. The system's ability to automatically translate across multiple
languages makes it scalable and well-suited for a range of applications, from business
communications to personal use.
ABSTRACT

In today’s interconnected and globalized world, language barriers continue to impede
effective communication, limiting interactions between people from different linguistic
backgrounds. The demand for efficient and reliable translation systems is greater than ever,
driven by the growth of international business, scientific collaborations, online education,
and multicultural interactions. Traditional translation methods often rely heavily on manual
effort, making them time-consuming and prone to human error. These challenges are further
exacerbated when large volumes of data need to be processed in a timely manner.

This project addresses these issues by developing an automated Natural Language
Processing (NLP) system that integrates two critical components: text tokenization and
multilingual translation. By utilizing advanced NLP techniques, the system automates key
stages of language processing, enabling faster and more accurate text handling and
translation. The system uses spaCy for text tokenization, breaking down the input text into
smaller units such as words or phrases, which is essential for understanding the structure of
the text and preparing it for the translation phase.

For translation, the system integrates the MarianMT model from HuggingFace, an AI-
powered machine translation model that uses deep learning to perform highly accurate
multilingual translations. The use of MarianMT reduces the time required for translation and
minimizes errors that often occur in human-driven processes. Unlike traditional manual
methods, this approach is more scalable and can easily handle large datasets, making it an
ideal solution for businesses, academic institutions, and multinational organizations. This
system demonstrates how automated translation through NLP can significantly improve
communication across different languages, reducing the barriers to global interaction.
INTRODUCTION:

Language serves as a foundational medium for communication, but when people speak
different languages, effective interaction can become a significant barrier. Manual
translation methods, while useful in limited contexts, are often time-consuming,
inconsistent, and impractical for large-scale or real-time applications. This challenge has led
to the adoption of Natural Language Processing (NLP) as a powerful approach to automate
language-related tasks.

This project, “Multilingual Translation using NLP,” addresses these challenges by
building an end-to-end pipeline for translating text across multiple languages with
automation and accuracy. The solution integrates several modern NLP techniques to deliver
a seamless translation experience. The core components of the system include:

1. Language Detection: Before translation, the input language is automatically detected
using the langdetect library to ensure the appropriate translation model is applied.

2. Text Tokenization: The input text is preprocessed and tokenized using spaCy, a robust
NLP library. Tokenization divides text into meaningful units, such as words or phrases,
preparing it for accurate analysis and translation.

3. Multilingual Translation: Translation is performed using the MarianMT models from
HuggingFace Transformers. These pretrained models support multiple language pairs
and use English as an intermediate language for indirect translation paths.

4. Sentiment Analysis (Optional): The system can also assess the sentiment of English
text using TextBlob, providing additional contextual understanding beyond literal
translation.

By automating each of these stages, the project demonstrates how NLP technologies can be
used to efficiently bridge language gaps and enhance communication in multilingual
environments. It offers a scalable and real-time solution suitable for personal, academic, or
business applications.
HARDWARE/SOFTWARE REQUIREMENTS:

Hardware Requirements:

• A computer or cloud-based platform capable of running Python and accessing
required external libraries and models.
• Minimum of 4GB RAM to run the NLP models effectively.
• A stable internet connection to interact with cloud-based models.

Software Requirements:

• Python: The primary programming language for implementing NLP processing and
integrating the various tools and models.
• spaCy: An industrial-strength NLP library used for tokenization, part-of-speech
tagging, and entity recognition.
• TextBlob: A lightweight library for performing sentiment analysis on English text.
• langdetect: Used to automatically detect the language of the input text before
initiating translation.
• HuggingFace Transformers: Specifically, MarianMT pretrained models are used for
translating text between multiple languages with English as an intermediate.
• Gradio: Used to build a web-based graphical user interface, enabling real-time user
interaction and showcasing the system's capabilities in a simple and accessible format.
• Development Environment: Google Colab (for cloud-based development) or any
local IDE such as VS Code or PyCharm for writing and testing code.
CONCEPTS/WORKING PRINCIPLE

The system operates through a multi-stage pipeline involving language detection,
translation via intermediate English, and natural language processing (NLP) analysis. The
following outlines the detailed working principles:

1. Language Detection

The input text is first analyzed using the langdetect library to identify the source language.
This enables the system to handle diverse input languages dynamically, without requiring
the user to specify the language manually.

Example:

Input: "Bonjour, comment ça va?"

Detected Language: fr (French)

2. Translation via English Intermediate

• The system uses MarianMT models from Helsinki-NLP (hosted on HuggingFace)
to perform translations.
• If the input language is not English, the text is first translated to English using a
source-to-English MarianMT model.
• Then, the English text is translated to the selected target language using an English-
to-target MarianMT model.
• This intermediate English step ensures better coverage and translation accuracy,
even for language pairs that do not have direct translation models.

Example:
Input: "Wie heißt du?" (German)
Intermediate English: "What is your name?"
Final Output (French): "Comment vous appelez-vous ?"
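The pivot scheme above can be sketched as follows. The helper name `pivot_model_names` is a hypothetical convenience, and the specific checkpoints follow the Helsinki-NLP naming convention on HuggingFace; the project's exact code may differ:

```python
# Sketch of the two-stage (pivot) translation via English.
# `pivot_model_names` and the checkpoint names are assumptions based on
# the Helsinki-NLP convention on HuggingFace, not the project's exact code.

def pivot_model_names(src: str, tgt: str) -> list:
    """Return the MarianMT checkpoints needed to go src -> en -> tgt."""
    names = []
    if src != "en":
        names.append(f"Helsinki-NLP/opus-mt-{src}-en")   # source -> English hop
    if tgt != "en":
        names.append(f"Helsinki-NLP/opus-mt-en-{tgt}")   # English -> target hop
    return names

def translate(text: str, model_name: str) -> str:
    """Run one MarianMT hop; heavy imports are kept local to this function."""
    from transformers import MarianMTModel, MarianTokenizer
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)

if __name__ == "__main__":
    text = "Wie heißt du?"                      # German input
    for name in pivot_model_names("de", "fr"):  # de -> en, then en -> fr
        text = translate(text, name)
    print(text)
```

When the input is already English, `pivot_model_names` returns a single checkpoint and only one hop is performed, matching the behavior described above.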
3. Natural Language Processing (NLP) Analysis

Once the English version of the text is obtained (either directly or via translation), it is
processed using the spaCy and TextBlob libraries to extract detailed linguistic insights.

Tokenization:

The text is split into individual words or tokens for further processing.

Example: "Hello world" → ['Hello', 'world']

Part-of-Speech (POS) Tagging:

Each token is labeled with its grammatical role such as noun, verb, adjective, etc.

Dependency Parsing:

The syntactic structure of the sentence is analyzed by identifying dependencies between
tokens, such as subject–verb or object–verb relationships.

Named Entity Recognition (NER):

Entities such as names of people, places, organizations, and dates are identified from the
text.

Sentiment Analysis:

The polarity of the sentence is computed using TextBlob. Based on the polarity score, the
sentence is classified as Positive, Neutral, or Negative.
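The analysis steps above can be sketched together in one helper. The `analyze` function and the zero cut-off for sentiment labels are assumptions (the report gives no thresholds), and spaCy's `en_core_web_sm` model must be downloaded separately:

```python
def classify_sentiment(polarity: float) -> str:
    """Map a TextBlob polarity score in [-1, 1] to a coarse label.
    The cut-off at 0.0 is an assumption; the report states no thresholds."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"

def analyze(text: str) -> dict:
    """Tokenize, tag, parse, and score an English sentence.
    Imports are local; assumes spaCy's en_core_web_sm model is installed."""
    import spacy
    from textblob import TextBlob
    doc = spacy.load("en_core_web_sm")(text)
    return {
        "tokens": [t.text for t in doc],                       # tokenization
        "pos": [(t.text, t.pos_) for t in doc],                # POS tagging
        "deps": [(t.text, t.dep_, t.head.text) for t in doc],  # dependency parsing
        "entities": [(e.text, e.label_) for e in doc.ents],    # NER
        "sentiment": classify_sentiment(TextBlob(text).sentiment.polarity),
    }
```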

4. End-to-End Automation

The system integrates all the above modules using Gradio, which provides a user-friendly
web interface where users can input any sentence, select the target language, and receive
both the translated output and comprehensive NLP insights automatically.
Fig. 5.1: NLP Translation Pipeline - Stages from Data Preprocessing to Evaluation
APPROACH/METHODOLOGY/PROGRAMS:
OUTPUT:
CONCLUSIONS:

This project demonstrates the practical application of Natural Language Processing (NLP)
techniques to build an automated multilingual translation system. By leveraging a modular
pipeline consisting of language detection, intermediate translation, and sentiment analysis, the
system addresses the limitations of manual translation workflows and improves efficiency,
scalability, and accuracy in multilingual communication.

The system utilizes the langdetect library to automatically identify the source language,
enabling seamless support for various input languages. Translation is performed in two stages
using pretrained MarianMT models from HuggingFace, with English serving as the
intermediate language. This ensures greater consistency and compatibility between diverse
language pairs. For analyzing the sentiment of English text, the project integrates the TextBlob
library, which allows classification of the emotional tone before producing the final output in
the target language. Additionally, spaCy is used for linguistic tokenization, named entity
recognition, and part-of-speech tagging to facilitate deeper language understanding.

The entire process is executed within a Gradio-based web interface, enabling real-time
interaction and accessibility without requiring local installations. This end-to-end pipeline—
from language detection and translation to sentiment analysis—provides a robust and
automated approach for cross-lingual communication. The system not only reduces manual
effort but also enhances reliability when dealing with large or dynamic datasets.

Looking forward, the system could be extended to incorporate more complex NLP tasks such
as contextual emotion analysis, multilingual summarization, and real-time chatbot interactions.
Such enhancements would broaden its applicability across domains like customer service,
healthcare, education, and global collaboration. Overall, the project lays a strong foundation
for future advancements in intelligent, language-aware applications.
REFERENCES:

• NLTK – Natural Language Toolkit. Available at: https://www.nltk.org/
• spaCy – Industrial-Strength Natural Language Processing in Python. Available at: https://spacy.io/
• TextBlob – Simplified Text Processing. Available at: https://textblob.readthedocs.io/
• HuggingFace Transformers – MarianMT Models by Helsinki-NLP. Available at: https://huggingface.co/Helsinki-NLP
• langdetect – Port of Google's Language Detection Library. Available at: https://pypi.org/project/langdetect/
