
PolyglotCAM

Under the Guidance of
Dr. Meenakshi Sundaram
Professor
Department of AI & ML
NHCE

Presented By
Rayna Halley R (1NH21AI085)
Shoailuddin (1NH21AI096)

21AIM73-Major Project
Outline

• Introduction
• Objective
• Literature survey / Existing systems
• Limitations of Existing Systems
• Proposed system
• System Design
• Tools Used
• Algorithm Details
• How the Algorithm works
• Problem Definition
• Result
• Conclusion
• Future enhancements

Introduction

This system performs Optical Character Recognition (OCR) on images and translates the extracted text into a language chosen by
the user. It uses Tesseract OCR for text extraction and the deep-translator library for translation, both integrated behind a web
interface built with Gradio. The application supports English and major Indian languages, including Hindi, Tamil, Telugu, and
Kannada, among others. By combining OCR and translation in one tool, the system aims to reduce language barriers when reading
printed or displayed text.

Objectives

• To capture images or upload existing files for text extraction.
• To extract text from images using advanced OCR tools like Tesseract.
• To translate the extracted text into a user-specified target language.
• To integrate translation and text extraction seamlessly in a single platform.
• To provide a simple and intuitive interface for uploading images, selecting languages, and viewing translation results.

Literature Survey
Real-time Neural Machine Translation
Ma et al., ACL, 2019. DOI: https://doi.org/10.18653/v1/P19-1011
Methodology: Explored low-latency translations using on-the-fly decoding and model optimization.
Problems identified: Trade-off between translation speed and accuracy.

Multilingual Translation with Extensible Multilingual Pretraining
Conneau et al., ACL, 2020. DOI: https://doi.org/10.18653/v1/2020.acl-main.303
Methodology: Developed multilingual models using large-scale pretraining.
Problems identified: Managing resource allocation for multiple languages.

Dynamic Convolution for Efficient Real-time Translation
Wu et al., ICLR, 2020. DOI: https://doi.org/10.48550/arXiv.1912.04053
Methodology: Proposed dynamic convolutions as an efficient alternative to self-attention.
Problems identified: Balancing efficiency and model complexity.

Simultaneous Translation with Segment-based Consistency
Zheng et al., EMNLP, 2020. DOI: https://doi.org/10.18653/v1/2020.emnlp-main.24
Methodology: Introduced a segment-based approach for maintaining consistency in real-time outputs.
Problems identified: Managing segmentation errors and latency.

Literature Survey

Lightweight and Efficient Neural Machine Translation
Kasai et al., ACL, 2021. DOI: https://doi.org/10.18653/v1/2021.acl-long.141
Methodology: Developed lightweight NMT models for resource-constrained devices.
Problems identified: Balancing model size and translation quality.

Real-time Adaptive Machine Translation
Aharoni et al., ACL, 2020. DOI: https://doi.org/10.18653/v1/2020.acl-main.313
Methodology: Examined adaptive methods for real-time NMT, allowing quick adaptation to new languages and domains.
Problems identified: Complexity in dynamic model adaptation.

SimulMT: A Toolkit for Simultaneous Neural Machine Translation
Ma et al., ACL, 2020. DOI: https://doi.org/10.18653/v1/2020.acl-demo.6
Methodology: Introduced the SimulMT toolkit for simultaneous translation systems development.
Problems identified: Difficulty in evaluating real-time performance.

Direct Speech-to-Text Translation with Transformer
Jia et al., Interspeech, 2019. DOI: https://doi.org/10.21437/Interspeech.2019-2212
Methodology: Discussed direct speech-to-text translation using Transformer models.
Problems identified: Handling noisy input and varied speech patterns.

Literature Survey
Fast and Accurate Neural Machine Translation with Low-Rank Attention
Li et al., ACL, 2021. DOI: https://doi.org/10.18653/v1/2021.acl-long.220
Methodology: Proposed low-rank attention mechanisms to speed up translation while maintaining high accuracy.
Problems identified: Maintaining translation quality with reduced computational resources.

Self-attention with Relative Position Representations
Shaw et al., NAACL, 2019. DOI: https://doi.org/10.18653/v1/N19-1154
Methodology: Enhanced the Transformer model by incorporating relative position representations.
Problems identified: Increased model complexity and training time.

Scaling Neural Machine Translation
Ott et al., EMNLP, 2018. DOI: https://doi.org/10.18653/v1/D18-1322
Methodology: Explored methods to scale NMT models to handle very large datasets.
Problems identified: Challenges in managing memory and computational resources.

Monotonic Infinite Lookback Attention for Simultaneous Machine Translation
Arivazhagan et al., ACL, 2019. DOI: https://doi.org/10.18653/v1/P19-1289
Methodology: Introduced a novel attention mechanism for simultaneous translation.
Problems identified: Balancing latency and translation accuracy.

Literature Survey
Understanding Back-Translation at Scale
Edunov et al., EMNLP, 2018. DOI: https://doi.org/10.18653/v1/D18-1365
Methodology: Investigated the effects of back-translation on NMT performance.
Problems identified: Handling noise in synthetic data and ensuring quality.

Reducing Transformer Depth on Demand with Structured Dropout
Fan et al., ICLR, 2020. DOI: https://doi.org/10.48550/arXiv.1909.11556
Methodology: Proposed a method to dynamically adjust Transformer depth during training.
Problems identified: Maintaining performance with reduced model depth.

Pre-trained Models for Natural Language Processing: A Survey
Qiu et al., AI Open, 2020. DOI: https://doi.org/10.1016/j.aiopen.2021.01.001
Methodology: Reviewed various pre-trained models and their applications in NLP tasks.
Problems identified: Challenges in adapting pre-trained models to specific tasks and languages.


Existing system

Google Translate App
The Google Translate app lets users translate text by typing, speaking, or using their camera. It supports real-time translation for numerous languages.

Microsoft Translator
Microsoft Translator provides text, voice, and image translation features, leveraging AI to handle various languages and complex text recognition.

Waygo
Waygo specializes in translating text from Chinese, Japanese, and Korean to English using real-time camera input.

iTranslate
iTranslate offers text, voice, and camera translation in multiple languages. It provides real-time translations and includes a dictionary and phrasebook.

Limitations of Existing Systems

• Limited Language Support: Not all languages are supported equally, and dialect variations can pose challenges.
• Accuracy Issues: OCR and translation accuracy can be affected by poor image quality, complex fonts, or handwriting.
• Processing Speed: Some applications may have noticeable delays between capturing the image and displaying the translated text.
• User Interface Complexity: Some existing systems have interfaces that are not intuitive or user-friendly, leading to a steeper
learning curve for new users.

Proposed system

• Users upload images containing text for extraction and translation.
• The system pre-processes each uploaded image by converting it to grayscale, correcting its orientation, and
improving text visibility.
• Tesseract OCR extracts the text from the pre-processed image.
• deep-translator translates the extracted text into the user-selected target language.
• A Gradio-based interface handles image upload, language selection, and display of results.
• Multiple regional and international languages are supported for both OCR and translation.
• The application is served online through a public link for easy access.
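
The flow described on this slide can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`preprocess`, `run_tesseract`, `run_translator`, `extract_and_translate`) are hypothetical, and the sketch assumes the Pillow, pytesseract, and deep-translator packages plus a local Tesseract install. Note that `ImageOps.exif_transpose` only corrects orientation recorded in the image's EXIF tag; it does not deskew arbitrarily rotated text.

```python
from typing import Any, Callable, Tuple


def preprocess(image: Any) -> Any:
    """Assumed pre-processing: EXIF orientation fix, grayscale, contrast stretch."""
    from PIL import ImageOps  # imported lazily; only needed for real images

    image = ImageOps.exif_transpose(image)  # apply the rotation stored by the camera
    image = ImageOps.grayscale(image)       # drop colour information
    return ImageOps.autocontrast(image)     # stretch contrast to improve text visibility


def run_tesseract(image: Any, ocr_lang: str) -> str:
    """Extract text with Tesseract; ocr_lang is a Tesseract code such as 'eng'."""
    import pytesseract

    return pytesseract.image_to_string(image, lang=ocr_lang)


def run_translator(text: str, target_lang: str) -> str:
    """Translate text with deep-translator; target_lang is a code such as 'hi'."""
    from deep_translator import GoogleTranslator

    return GoogleTranslator(source="auto", target=target_lang).translate(text)


def extract_and_translate(
    image: Any,
    ocr_lang: str,
    target_lang: str,
    *,
    preprocess_fn: Callable[[Any], Any] = preprocess,
    ocr_fn: Callable[[Any, str], str] = run_tesseract,
    translate_fn: Callable[[str, str], str] = run_translator,
) -> Tuple[str, str]:
    """Run pre-processing, OCR, and translation; return (extracted, translated)."""
    text = ocr_fn(preprocess_fn(image), ocr_lang).strip()
    if not text:
        # Nothing was recognised, so skip the translation request entirely.
        return "", ""
    return text, translate_fn(text, target_lang)
```

Passing the three steps as parameters is a design choice made for this sketch: it lets each stage be swapped or tested without Tesseract or network access.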

System Design

Tools used

• OCR: Tesseract
• Translation: deep-translator (Google Translate backend)
• UI: Gradio
• Programming: Python
• Image Processing: PIL (Python Imaging Library, installed as the Pillow package)
• Environment: Google Colab

Algorithm Details

• Integration of OCR and Translation: The system chains Tesseract OCR (text extraction from images) and
deep-translator (multilingual translation) into a single end-to-end text processing pipeline.
• Dynamic Language Mapping: The language selected by the user is mapped to the corresponding Tesseract language
code before OCR, so that the matching language model is used for regional and international scripts.
• Automated Translation Workflow: Extracted text is passed to deep-translator without manual intervention;
deep-translator forwards it to a neural machine translation service (e.g., Google Translate).
• User Interaction via Gradio: The Gradio interface combines image upload, language selection, and result
display in a single screen.
• Text Detection with Pre-Processing: Grayscale conversion and orientation correction are applied before OCR
to improve recognition accuracy under varying image conditions.
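
As an illustration of the language-mapping step, the sketch below resolves a display name to the codes each tool expects. The table layout and the names `LanguageCodes`, `LANGUAGES`, and `resolve_language` are assumptions made for this example; the codes themselves are the standard Tesseract traineddata names and two-letter Google Translate codes for the languages named in the Introduction. Each Tesseract language also needs its traineddata file installed.

```python
from typing import Dict, NamedTuple


class LanguageCodes(NamedTuple):
    tesseract: str   # Tesseract traineddata name, e.g. "hin"
    translator: str  # two-letter code used by the translation backend, e.g. "hi"


# Hypothetical lookup table covering the languages named in the Introduction.
LANGUAGES: Dict[str, LanguageCodes] = {
    "English": LanguageCodes("eng", "en"),
    "Hindi": LanguageCodes("hin", "hi"),
    "Tamil": LanguageCodes("tam", "ta"),
    "Telugu": LanguageCodes("tel", "te"),
    "Kannada": LanguageCodes("kan", "kn"),
}


def resolve_language(name: str) -> LanguageCodes:
    """Map a user-facing language name to its OCR and translation codes."""
    key = name.strip().title()  # tolerate stray whitespace and case differences
    try:
        return LANGUAGES[key]
    except KeyError:
        supported = ", ".join(sorted(LANGUAGES))
        raise ValueError(
            f"Unsupported language {name!r}; choose one of: {supported}"
        ) from None
```

For example, `resolve_language("Kannada").tesseract` gives the value to pass as the OCR language, and `.translator` gives the translation target.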

How the Algorithm Works and Why It Is Unique

•Combines OCR and translation into a single automated pipeline, streamlining the entire process.
•Supports a wide range of regional and international languages, with a special focus on Indian languages.
•Intuitive, GUI-driven interface ensures accessibility for both technical and non-technical users.
•Eliminates the need for manual text input or complex language mapping, saving time and effort.
•Simplifies user interaction while providing a comprehensive solution for text extraction and translation in
one workflow.
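
A GUI of this kind could be wired up in Gradio along the following lines. This is a hedged sketch rather than the project's interface: `make_handler` and `build_demo` are hypothetical helper names, the labels are invented, and `pipeline` stands for any callable that takes an image and two language names and returns the extracted and translated text.

```python
from typing import Any, Callable, Sequence, Tuple

Pipeline = Callable[[Any, str, str], Tuple[str, str]]


def make_handler(pipeline: Pipeline) -> Pipeline:
    """Wrap the pipeline so a missing upload yields a message instead of an error."""

    def handle(image: Any, source_language: str, target_language: str) -> Tuple[str, str]:
        if image is None:
            return "Please upload an image.", ""
        return pipeline(image, source_language, target_language)

    return handle


def build_demo(pipeline: Pipeline, languages: Sequence[str]):
    """Build a Gradio interface with one image input and two language dropdowns."""
    import gradio as gr  # imported lazily; only needed when the UI is built

    choices = list(languages)
    return gr.Interface(
        fn=make_handler(pipeline),
        inputs=[
            gr.Image(type="pil", label="Image containing text"),
            gr.Dropdown(choices=choices, value=choices[0], label="Language in image"),
            gr.Dropdown(choices=choices, value=choices[0], label="Translate to"),
        ],
        outputs=[
            gr.Textbox(label="Extracted text"),
            gr.Textbox(label="Translation"),
        ],
        title="PolyglotCAM",
    )
```

Calling `build_demo(pipeline, ["English", "Hindi"]).launch(share=True)` starts the app and, in environments such as Google Colab, prints a temporary public link.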

Problem Definition

• Language Barriers in Global Communication: In today’s globalized world, language barriers can significantly hinder
effective communication, especially for travelers, expatriates, and international business professionals.
• Real-Time Translation Challenges: Existing translation tools often require manual input, which can be cumbersome and
slow. Additionally, achieving accurate translation in real-time is difficult due to varying lighting conditions, different text
orientations, and complex backgrounds.
• Need for User-Friendly Solutions: There is a significant need for intuitive, real-time translation tools that can seamlessly
integrate into everyday life, making communication effortless.

Result


Conclusion

The proposed system combines OCR technology with translation APIs to extract text from images and translate it
into a user-specified language, helping to bridge language barriers. Its Gradio interface and multilingual support
make it practical for real-world tasks such as translating signs, documents, or other visual content. The system
can be further enhanced with additional languages, offline capabilities, and more advanced pre-processing
techniques to meet evolving user requirements.

Future Enhancement

•Offline Capabilities:
Integrate pre-trained OCR and translation models to enable offline functionality, reducing reliance on APIs.
•Real-Time Video Processing:
Extend functionality to process and translate text from live video feeds for dynamic use cases.
•Speech Integration:
Add speech-to-text for voice input and text-to-speech for audio output of translations.
•Mobile Optimization:
Develop a mobile-friendly version to ensure usability and accessibility on smartphones and tablets.
•Domain-Specific Models:
Train and integrate specialized translation models for domains like legal, medical, or technical applications.

Thank You
