PolyglotCAM

Under the Guidance of
Dr. Meenakshi Sundaram
Professor
Department of AI & ML
NHCE

Presented By
Rayna Halley R (1NH21AI085)
Shoailuddin (1NH21AI096)

21AIM73-Major Project
Outline

• Introduction
• Objective
• Literature survey / Existing systems
• Limitations of Existing Systems
• Proposed system
• System Design
• Tools Used
• Algorithm Details
• How the Algorithm works
• Problem Definition
• Result
• Conclusion
• Future enhancements

Introduction

This system performs Optical Character Recognition (OCR) on images and translates the extracted text into a language chosen by the
user. It uses Tesseract OCR for text extraction and the deep-translator library for translation, both integrated behind a web interface
built with Gradio. The application supports English and major Indian languages, including Hindi, Tamil, Telugu, and Kannada. By
combining OCR and translation in a single tool, the system aims to reduce language barriers when reading printed or photographed
text.

Objectives

• To capture images or upload existing files for text extraction.
• To extract text from images using advanced OCR tools like Tesseract.
• To translate the extracted text into a user-specified target language.
• To integrate translation and text extraction seamlessly in a single platform.
• To provide a simple and intuitive interface for uploading images, selecting languages, and viewing
translation results.

Literature Survey
| Title, Author, Journal, Year, DOI | Methodology | Problems Identified |
|---|---|---|
| Real-time Neural Machine Translation. Ma et al., ACL, 2019, [10.18653/v1/P19-1011](https://doi.org/10.18653/v1/P19-1011) | Explored low-latency translations using on-the-fly decoding and model optimization. | Trade-off between translation speed and accuracy. |
| Multilingual Translation with Extensible Multilingual Pretraining. Conneau et al., ACL, 2020, [10.18653/v1/2020.acl-main.303](https://doi.org/10.18653/v1/2020.acl-main.303) | Developed multilingual models using large-scale pretraining. | Managing resource allocation for multiple languages. |
| Dynamic Convolution for Efficient Real-time Translation. Wu et al., ICLR, 2020, [10.48550/arXiv.1912.04053](https://doi.org/10.48550/arXiv.1912.04053) | Proposed dynamic convolutions as an efficient alternative to self-attention. | Balancing efficiency and model complexity. |
| Simultaneous Translation with Segment-based Consistency. Zheng et al., EMNLP, 2020, [10.18653/v1/2020.emnlp-main.24](https://doi.org/10.18653/v1/2020.emnlp-main.24) | Introduced segment-based approach for maintaining consistency in real-time outputs. | Managing segmentation errors and latency. |

Literature Survey

| Title, Author, Journal, Year, DOI | Methodology | Problems Identified |
|---|---|---|
| Lightweight and Efficient Neural Machine Translation. Kasai et al., ACL, 2021, [10.18653/v1/2021.acl-long.141](https://doi.org/10.18653/v1/2021.acl-long.141) | Developed lightweight NMT models for resource-constrained devices. | Balancing model size and translation quality. |
| Real-time Adaptive Machine Translation. Aharoni et al., ACL, 2020, [10.18653/v1/2020.acl-main.313](https://doi.org/10.18653/v1/2020.acl-main.313) | Examined adaptive methods for real-time NMT, allowing quick adaptation to new languages and domains. | Complexity in dynamic model adaptation. |
| SimulMT: A Toolkit for Simultaneous Neural Machine Translation. Ma et al., ACL, 2020, [10.18653/v1/2020.acl-demo.6](https://doi.org/10.18653/v1/2020.acl-demo.6) | Introduced SimulMT toolkit for simultaneous translation systems development. | Difficulty in evaluating real-time performance. |
| Direct Speech-to-Text Translation with Transformer. Jia et al., Interspeech, 2019, [10.21437/Interspeech.2019-2212](https://doi.org/10.21437/Interspeech.2019-2212) | Discussed direct speech-to-text translation using Transformer models. | Handling noisy input and varied speech patterns. |

Literature Survey
| Title, Author, Journal, Year, DOI | Methodology | Problems Identified |
|---|---|---|
| Fast and Accurate Neural Machine Translation with Low-Rank Attention. Li et al., ACL, 2021, [10.18653/v1/2021.acl-long.220](https://doi.org/10.18653/v1/2021.acl-long.220) | Proposed low-rank attention mechanisms to speed up translation while maintaining high accuracy. | Maintaining translation quality with reduced computational resources. |
| Self-attention with Relative Position Representations. Shaw et al., NAACL, 2019, [10.18653/v1/N19-1154](https://doi.org/10.18653/v1/N19-1154) | Enhanced the Transformer model by incorporating relative position representations. | Increased model complexity and training time. |
| Scaling Neural Machine Translation. Ott et al., EMNLP, 2018, [10.18653/v1/D18-1322](https://doi.org/10.18653/v1/D18-1322) | Explored methods to scale NMT models to handle very large datasets. | Challenges in managing memory and computational resources. |
| Monotonic Infinite Lookback Attention for Simultaneous Machine Translation. Arivazhagan et al., ACL, 2019, [10.18653/v1/P19-1289](https://doi.org/10.18653/v1/P19-1289) | Introduced a novel attention mechanism for simultaneous translation. | Balancing latency and translation accuracy. |

Literature Survey
| Title, Author, Journal, Year, DOI | Methodology | Problems Identified |
|---|---|---|
| Understanding Back-Translation at Scale. Edunov et al., EMNLP, 2018, [10.18653/v1/D18-1365](https://doi.org/10.18653/v1/D18-1365) | Investigated the effects of back-translation on NMT performance. | Handling noise in synthetic data and ensuring quality. |
| Reducing Transformer Depth on Demand with Structured Dropout. Fan et al., ICLR, 2020, [10.48550/arXiv.1909.11556](https://doi.org/10.48550/arXiv.1909.11556) | Proposed a method to dynamically adjust Transformer depth during training. | Maintaining performance with reduced model depth. |
| Pre-trained Models for Natural Language Processing: A Survey. Qiu et al., AI Open, 2020, [10.1016/j.aiopen.2021.01.001](https://doi.org/10.1016/j.aiopen.2021.01.001) | Reviewed various pre-trained models and their applications in NLP tasks. | Challenges in adapting pre-trained models to specific tasks and languages. |


Existing system

Google Translate App
The Google Translate app lets users translate text by typing, speaking, or pointing their camera at it. It supports real-time translation for numerous languages.

Microsoft Translator
Microsoft Translator provides text, voice, and image translation, using AI to handle a range of languages and complex text recognition.

Waygo
Waygo specializes in translating text from Chinese, Japanese, and Korean to English using real-time camera input.

iTranslate
iTranslate offers text, voice, and camera translation in multiple languages. It provides real-time translations and includes a dictionary and phrasebook.

Limitations of Existing Systems

• Limited Language Support: Not all languages are supported equally, and dialect variations can pose challenges.
• Accuracy Issues: OCR and translation accuracy can be affected by poor image quality, complex fonts, or handwriting.
• Processing Speed: Some applications may have noticeable delays between capturing the image and displaying the translated text.
• User Interface Complexity: Some existing systems have interfaces that are not intuitive or user-friendly, leading to a steeper
learning curve for new users.

Proposed system

• Users can upload images containing text for extraction and translation.
• Enhances the uploaded images by converting them to grayscale, correcting orientation, and
improving text visibility.
• Uses Tesseract OCR to extract text accurately from the uploaded images.
• Processes the extracted text through deep-translator to translate it into the user-selected target
language.
• Offers an intuitive Gradio-based interface for easy image uploads, language selection, and
viewing results.
• Supports multiple regional and international languages for both OCR and translation.
• Provides an online platform with a public link for easy access and usage.
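
The flow described above (enhance the image, run OCR, translate the result) could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names are hypothetical, the Pillow operations are one plausible reading of "grayscale, orientation, and text visibility" (here orientation means the EXIF rotation tag only), and the default steps assume that Pillow, pytesseract with the Tesseract binary, and deep-translator are installed.

```python
from typing import Any, Callable, Tuple


def preprocess(image: Any) -> Any:
    """Assumed enhancement step: orientation fix, grayscale, contrast stretch."""
    from PIL import ImageOps  # lazy import; requires the Pillow package

    image = ImageOps.exif_transpose(image)  # apply the EXIF orientation tag, if present
    image = ImageOps.grayscale(image)       # convert to a single-channel image
    return ImageOps.autocontrast(image)     # stretch contrast to make text stand out


def extract_text(image: Any, ocr_lang: str) -> str:
    """Run Tesseract; ocr_lang is a Tesseract code such as 'eng' or 'hin'."""
    import pytesseract  # lazy import; needs the Tesseract binary and language data

    return pytesseract.image_to_string(image, lang=ocr_lang).strip()


def translate_text(text: str, target: str) -> str:
    """Translate with deep-translator's Google backend (needs network access)."""
    from deep_translator import GoogleTranslator  # lazy import

    return GoogleTranslator(source="auto", target=target).translate(text)


def run_pipeline(
    image: Any,
    ocr_lang: str,
    target: str,
    *,
    prepare: Callable[[Any], Any] = preprocess,
    ocr: Callable[[Any, str], str] = extract_text,
    translate: Callable[[str, str], str] = translate_text,
) -> Tuple[str, str]:
    """Return (extracted_text, translated_text); both empty if no text is found."""
    text = ocr(prepare(image), ocr_lang)
    if not text:
        return "", ""  # skip the translation call when OCR found nothing
    return text, translate(text, target)
```

Each step is passed in as a parameter so that it can be swapped out, for example to try a different pre-processing routine or to test the flow without calling the OCR engine or the translation service.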

System Design

Tools used

• OCR: Tesseract
• Translation: deep-translator (Google Translate API)
• UI: Gradio
• Programming: Python
• Image Processing: PIL (Python Imaging Library)
• Environment: Google Colab

Algorithm Details

• Integration of OCR and Translation Algorithms: The system combines Tesseract OCR for text extraction from
images with deep-translator for multilingual translation, forming an end-to-end text processing pipeline.
• Dynamic Language Mapping: The language selected by the user is mapped to the corresponding Tesseract language
code for OCR, so that the matching language model is used for regional and international languages.
• Automated Translation Workflow: Extracted text is passed automatically to deep-translator, which calls a
neural machine translation service (e.g., Google Translate) to convert it into the target language.
• Optimized User Interaction via Gradio: The Gradio interface streamlines user interaction, integrating image
uploads, language selection, and result display into a single intuitive platform.
• Enhanced Text Detection with Pre-Processing: Pre-processing steps such as grayscale conversion and orientation
correction improve OCR accuracy under varying image conditions.
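
As an illustration of the language-mapping step, a lookup table along the following lines would translate a language name into the two codes the pipeline needs. The table layout and the helper names `resolve` and `ocr_lang_arg` are assumptions made for this sketch. The codes themselves are the standard Tesseract language-data names and the two-letter codes accepted by Google Translate.

```python
from typing import Iterable, NamedTuple


class LanguageCodes(NamedTuple):
    tesseract: str   # name of the Tesseract language data, e.g. "hin"
    translator: str  # two-letter code used by the translation service, e.g. "hi"


# Hypothetical table covering the languages named in this presentation.
LANGUAGES = {
    "English": LanguageCodes("eng", "en"),
    "Hindi": LanguageCodes("hin", "hi"),
    "Tamil": LanguageCodes("tam", "ta"),
    "Telugu": LanguageCodes("tel", "te"),
    "Kannada": LanguageCodes("kan", "kn"),
}


def resolve(name: str) -> LanguageCodes:
    """Look up both codes for a language name, ignoring case and padding."""
    key = name.strip().title()
    if key not in LANGUAGES:
        raise ValueError(f"Unsupported language: {name!r}")
    return LANGUAGES[key]


def ocr_lang_arg(names: Iterable[str]) -> str:
    """Join Tesseract codes with '+', its syntax for multi-language OCR."""
    return "+".join(resolve(n).tesseract for n in names)


print(resolve("hindi"))                    # codes for Hindi
print(ocr_lang_arg(["English", "Tamil"]))  # combined OCR language argument
```

Keeping both codes in one record means that adding a language is a single-line change, provided the corresponding Tesseract language data is installed.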

How the Algorithm Works and Why It Is Unique

• Combines OCR and translation into a single automated pipeline, streamlining the entire process.
• Supports a wide range of regional and international languages, with a special focus on Indian languages.
• Intuitive, GUI-driven interface ensures accessibility for both technical and non-technical users.
• Eliminates the need for manual text input or complex language mapping, saving time and effort.
• Simplifies user interaction while providing a comprehensive solution for text extraction and translation in
one workflow.
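
A GUI of this kind could be wired up in Gradio roughly as shown below. This is a sketch under assumptions, not the project's interface: `pipeline` stands for any hypothetical callable that takes an image and two language names and returns the extracted and translated text, and the labels and language list are illustrative.

```python
from typing import Any, Callable, Tuple

LANGUAGE_CHOICES = ["English", "Hindi", "Tamil", "Telugu", "Kannada"]

Pipeline = Callable[[Any, str, str], Tuple[str, str]]


def make_handler(pipeline: Pipeline) -> Callable[[Any, str, str], Tuple[str, str]]:
    """Wrap a pipeline so that the UI shows a message instead of failing."""

    def handle(image: Any, source_language: str, target_language: str) -> Tuple[str, str]:
        if image is None:  # Gradio passes None when no image was uploaded
            return "", "Please upload an image first."
        text, translated = pipeline(image, source_language, target_language)
        if not text:
            return "", "No text was detected in the image."
        return text, translated

    return handle


def build_demo(pipeline: Pipeline):
    """Build the interface; requires the gradio package."""
    import gradio as gr  # lazy import so the handler can be used without Gradio

    return gr.Interface(
        fn=make_handler(pipeline),
        inputs=[
            gr.Image(type="pil", label="Image"),
            gr.Dropdown(choices=LANGUAGE_CHOICES, value="English", label="Language in image"),
            gr.Dropdown(choices=LANGUAGE_CHOICES, value="Hindi", label="Translate to"),
        ],
        outputs=[gr.Textbox(label="Extracted text"), gr.Textbox(label="Translation")],
        title="PolyglotCAM",
    )


# Example launch (commented out so that importing this file has no side effects):
# build_demo(my_pipeline).launch(share=True)  # share=True requests a temporary public link
```

Separating the handler from the interface keeps the input checks testable without starting a web server.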

Problem Definition

• Language Barriers in Global Communication: In today’s globalized world, language barriers can significantly hinder
effective communication, especially for travelers, expatriates, and international business professionals.
• Real-Time Translation Challenges: Existing translation tools often require manual input, which can be cumbersome and
slow. Additionally, achieving accurate translation in real-time is difficult due to varying lighting conditions, different text
orientations, and complex backgrounds.
• Need for User-Friendly Solutions: There is a significant need for intuitive, real-time translation tools that can seamlessly
integrate into everyday life, making communication effortless.

Result

Conclusion

The proposed system combines OCR technology with translation APIs to extract text from images and translate it
into user-specified languages, helping users bridge language barriers. Its intuitive interface and multilingual
support make it practical for real-world applications, such as translating signs, documents, or other visual
content. The system can be further enhanced to support additional languages, offline capabilities, and advanced
pre-processing techniques to meet evolving user requirements.

Future Enhancement

• Offline Capabilities: Integrate pre-trained OCR and translation models to enable offline functionality, reducing reliance on APIs.
• Real-Time Video Processing: Extend functionality to process and translate text from live video feeds for dynamic use cases.
• Speech Integration: Add speech-to-text capabilities to allow voice input and audio output for translations.
• Mobile Optimization: Develop a mobile-friendly version to ensure usability and accessibility on smartphones and tablets.
• Domain-Specific Models: Train and integrate specialized translation models for domains like legal, medical, or technical applications.

Thank You
