
Automatic Speech Recognition (ASR) Systems for Indic Languages

Presented By: Drishti Singh, MT23117

ENG 599 - Winter 2024 Presentation


Contents

1. Introduction to ASRs
2. Motivation to create ASRs for Indic languages
3. Challenges
4. Past Literature Review
5. Methodology
6. Results
7. Conclusion
8. Future Scope
9. References
Introduction
Automatic Speech Recognition (ASR) marks a significant milestone in the evolution of Human-Computer
Interaction. From its humble beginnings in basic digit identification, ASR has progressed to sophisticated
systems capable of transcribing continuous speech in real-time. This advancement has driven the
automation of tasks across various levels of complexity, fueling research efforts towards developing
optimal systems that effectively comprehend and analyze speech for seamless task execution.

● ASR technology has undergone substantial transformation, accelerated by deep learning and, more recently, LLMs, enabling computers to transcribe speech into text with increasing accuracy and efficiency.

● The evolution of ASR has been fueled by the growing need to automate tasks and enhance
Human-Machine interaction, driving extensive research and development in the field.

● ASR systems play a crucial role in enabling hands-free operation of devices, supporting voice interaction for everything from simple commands to complex queries.
Motivation
The linguistic diversity of countries like India presents unique challenges and opportunities for ASR
technology. In India, where numerous languages are spoken and a significant portion of the population
faces literacy challenges, the need for ASRs tailored to Indic languages is particularly pressing.

● India's vast linguistic diversity and substantial illiterate population underscore the importance of
ASRs customized for Indic languages.

● Large Language Models (LLMs) like ChatGPT and Gemini have opened new avenues for bridging
language gaps through AI, offering solutions to societal challenges faced by rural communities.

● ASR technology holds the potential to empower individuals in rural areas with limited literacy
skills, enabling them to interact effortlessly with computers and devices, thereby unlocking
opportunities for education, information access, and socio-economic empowerment.
Challenges
1. Linguistic Diversity: Indic languages exhibit significant linguistic diversity in terms of
phonetics, grammar, and vocabulary. Developing ASRs that accurately capture these
linguistic nuances across different Indic languages poses a considerable challenge.
2. Limited Data Availability: Unlike widely spoken languages such as English, data
availability for training ASRs in Indic languages is often limited. This scarcity of labeled
audio data hampers the ability to train robust and accurate ASR models.
3. Variability in Pronunciation: Indic languages often exhibit variability in
pronunciation due to regional accents, dialects, and socio-cultural factors. ASR models
must account for this variability to ensure accurate transcription of speech.
4. Code-Switching and Multilingualism: Many speakers of Indic languages engage in
code-switching, where they seamlessly switch between multiple languages within a
single conversation. ASRs need to effectively handle code-switching and multilingual
speech to accurately transcribe such interactions.
Past Literature Review

Research in Automatic Speech Recognition (ASR) has predominantly centered around three
key areas: Multilingual ASR, Monolingual ASR, and Code-Switching ASR, each posing distinct
challenges and garnering significant attention in advancing speech recognition technology.

1. Multilingual ASR - To handle many languages at the same time, a single deep neural network is
trained jointly on the combined corpus of all languages, with an output vocabulary formed from
the union of the languages' character sets.
2. Monolingual ASR - Monolingual ASR consists of various pre-trained models, each trained on one
language separately and fine-tuned accordingly. A language identification (LID) module attached
to such an ASR recognises the language and routes the audio to the corresponding model.
3. Code-Switching ASR - When bilingual speakers switch between languages depending on the
situation, it is called code-switching. This phenomenon occurs when someone speaks two
languages successively in the same conversation, and the mixed speech can differ from how
each language is spoken individually.
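The monolingual-with-LID setup above can be sketched as a simple routing step. The `detect_language` function and the per-language "models" below are hypothetical stubs for illustration, not a real ASR or LID API:

```python
# Minimal sketch of LID-based routing to per-language monolingual models.
# All names here are illustrative placeholders.

def detect_language(lid_scores):
    """Toy LID module: pick the language with the highest score.
    A real LID module would classify acoustic features directly."""
    return max(lid_scores, key=lid_scores.get)

# One fine-tuned recogniser per language (stubbed for illustration).
MONOLINGUAL_MODELS = {
    "hi": lambda utt: f"[Hindi transcript of {utt}]",
    "ta": lambda utt: f"[Tamil transcript of {utt}]",
}

def transcribe(utterance_id, lid_scores):
    lang = detect_language(lid_scores)    # 1. identify the language
    model = MONOLINGUAL_MODELS[lang]      # 2. route to that language's model
    return lang, model(utterance_id)      # 3. transcribe with it

lang, text = transcribe("utt_001", {"hi": 0.92, "ta": 0.08})
```

The design choice this illustrates: each model only ever sees one language, so the routing step, not the recogniser, bears the burden of handling linguistic diversity.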
Benchmark Datasets for ASR Training:

1. Vistaar Benchmark Set:
a. Kathbath-Hard: Diverse dataset from speakers representing 12 Indian languages, including
background noise for a challenging benchmark.
b. FLEURS: Dataset of translated Wikipedia content with broad language coverage.
c. CommonVoice: Crowdsourced speech dataset from diverse sources encompassing 8 languages.

[Figure: Vistaar dataset description]

2. MUCS Dataset: Included in the MUCS 2021 competition, featuring multilingual aspects and
code-switching transcripts. Combines manually curated gold-standard labels with weakly
curated labels from crowdsourcing platforms.

[Figure: MUCS dataset description]
Methodology

Methodologies for Indian ASR research encompass diverse approaches, including self-supervised learning
and encoder-decoder architectures, driving advancements in speech recognition technology.

● Wav2Vec 2.0:
  ● Utilizes self-supervised learning with extensive unlabeled speech data.
  ● Two stages: pretraining with masking and a contrastive task, then fine-tuning with labeled data.
  ● Extracts robust contextualized representations from speech audio for multilingual ASR.
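The masking step in the pretraining stage above can be sketched as follows. The masking probability, span length, and the zero "mask embedding" are simplifications for illustration, not the actual Wav2Vec 2.0 hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_spans(frames, mask_prob=0.065, span=10):
    """Span masking in the style of wav2vec 2.0 pretraining: choose each
    latent frame as a span start with probability mask_prob, then mask that
    frame and the following span - 1 frames. Masked positions would be
    replaced by a learned mask embedding; here we use zeros."""
    T, _ = frames.shape
    is_masked = np.zeros(T, dtype=bool)
    starts = rng.random(T) < mask_prob
    for t in np.flatnonzero(starts):
        is_masked[t:t + span] = True
    masked = frames.copy()
    masked[is_masked] = 0.0   # stand-in for the learned mask embedding
    return masked, is_masked

frames = rng.standard_normal((200, 16))   # 200 latent frames, feature dim 16
masked, idx = mask_spans(frames)
```

During pretraining, the model is then asked (via the contrastive task) to identify the true latent at each masked position among distractors, which is what forces it to learn contextualized representations.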

● IndicWhisper Model:
  ● Builds on OpenAI's Whisper models for monolingual training and combination.
  ● Trains an individual model for each language with a Transformer-based architecture.
  ● Fine-tunes the Whisper-medium model for each language.

● Seq2Seq Model:
  ● Utilized for multilingual speech recognition.
  ● An encoder-decoder architecture maps the input audio to an intermediate representation and
    generates the output sequence from it.
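The encoder-decoder idea above can be sketched in a few lines. All dimensions, weights, and the toy state update are illustrative assumptions, not a trained model; a real ASR decoder would also condition on previously emitted tokens:

```python
import numpy as np

# Toy Seq2Seq sketch: encoder compresses a variable-length feature
# sequence into a fixed intermediate representation; decoder emits tokens.
rng = np.random.default_rng(1)
D_IN, D_CTX, VOCAB = 16, 8, 5
W_enc = rng.standard_normal((D_IN, D_CTX)) * 0.1
W_dec = rng.standard_normal((D_CTX, VOCAB)) * 0.1

def encode(audio_frames):
    """Map a (T, D_IN) feature sequence to one context vector of size D_CTX
    (here by mean-pooling a single projected layer)."""
    return np.tanh(audio_frames @ W_enc).mean(axis=0)

def decode(context, max_len=4):
    """Greedily emit one token per step from the context, updating a toy
    state so successive steps can differ."""
    state, tokens = context, []
    for _ in range(max_len):
        tok = int(np.argmax(state @ W_dec))   # pick the highest-scoring token
        tokens.append(tok)
        state = np.tanh(state + 0.1 * tok)    # toy state update
    return tokens

tokens = decode(encode(rng.standard_normal((30, D_IN))))
```

The key property being illustrated: the input length (30 frames) and output length (4 tokens) are decoupled by the intermediate representation, which is what makes the architecture suitable for speech-to-text.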
Results

● The IndicWav2Vec and IndicWhisper models generated almost the same WER (Word Error Rate)
and therefore showed similar performance.

[Figure: histogram showing that, for almost all Indian languages, the multilingual approach
outperformed the monolingual approach in terms of pre-training loss, on the Kathbath dataset.]

[Figure: diagram portraying different ways of training multilingual ASRs, compared by their
relative WER change (lower is better).]
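Since the comparisons above are reported in WER, it may help to recall how Word Error Rate is typically computed: word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal self-contained sketch:

```python
def wer(reference, hypothesis):
    """Word Error Rate = (substitutions + insertions + deletions) / number
    of reference words, via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the bat sat"))  # one substitution out of 3 words
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why "relative WER change" (as in the comparison above) is often a more interpretable quantity across systems.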
Conclusion

The study underscores the pivotal role of speech recognition in machine learning,
driven by humans' innate propensity for speech comprehension. It highlights the
challenges researchers face, notably the scarcity of large audio datasets,
particularly for diverse Indian languages lacking sufficient corpora and labeled
data. However, the emergence of robust benchmarks like Kathbath and Vistaar
signals a promising avenue for developing more efficient ASRs across all Indian
languages, given access to diverse datasets. Furthermore, leveraging Large
Language Models (LLMs) can facilitate the generation of synthetic datasets,
enabling weakly supervised training during the initial stages.
Future Scope

Future research endeavors in ASR should focus on addressing dataset scarcity, fostering
collaboration, and exploring novel techniques to enhance performance, paving the way for
inclusive and effective speech recognition systems.

● Address challenges of dataset scarcity and label annotation for Indian languages through
collaborative efforts between academia, industry, and government bodies.
● Foster collaboration to curate and expand existing datasets, ensuring inclusivity and
representation of all Indian languages.
● Explore novel techniques such as active learning and transfer learning to enhance ASR
performance, especially for low-resource languages.
● Advance Large Language Models (LLMs) and synthetic-data generation techniques to
improve ASR robustness and scalability.
● Develop inclusive and effective speech recognition systems for all languages spoken in
India and beyond, leveraging advancements in technology and data accessibility.
References

[1] T. Javed, K. S. Bhogale, A. Raman, A. Kunchukuttan, P. Kumar, and M. M. Khapra, "IndicSUPERB: A
Speech Processing Universal Performance Benchmark for Indian languages." arXiv, Dec. 15, 2022.
Accessed: Mar. 21, 2024. [Online]. Available: http://arxiv.org/abs/2208.11761

[2] A. Gupta et al., "CLSRIL-23: Cross Lingual Speech Representations for Indic Languages." arXiv, Jan.
13, 2022. Accessed: Mar. 21, 2024. [Online]. Available: http://arxiv.org/abs/2107.07402

[3] H. S. Chadha, P. Shah, A. Dhuriya, N. Chhimwal, A. Gupta, and V. Raghavan, "Code Switched and
Code Mixed Speech Recognition for Indic languages." arXiv, Jun. 13, 2022. Accessed: Mar. 21, 2024.
[Online]. Available: http://arxiv.org/abs/2203.16578

[4] K. S. Bhogale, S. Sundaresan, A. Raman, T. Javed, M. M. Khapra, and P. Kumar, "Vistaar: Diverse
Benchmarks and Training Sets for Indian Language ASR." arXiv, Aug. 02, 2023. Accessed: Mar. 21,
2024. [Online]. Available: http://arxiv.org/abs/2305.15386

[5] V. Pratap et al., "Massively Multilingual ASR: 50 Languages, 1 Model, 1 Billion Parameters." arXiv,
Jul. 07, 2020. Accessed: Mar. 21, 2024. [Online]. Available: http://arxiv.org/abs/2007.03001
