Zeroth Review - 2021
M.Vignesh
221003105
IV-CSE A
221003105@sastra.ac.in
S.Mahadevan
221003057
IV-CSE A
221003057@sastra.ac.in
Guided By
Ms. Bhavani R
APII/CSE/SRC/SASTRA
PROJECT OVERVIEW
• Visual Speech Recognition
• Extract lip features
• Train a neural network model on lip-movement sequences
• Transcribe lip movements into text and evaluate the result
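The "extract lip features" step can be sketched as cropping a mouth region from facial landmarks. This is a minimal illustration, not the project's actual pipeline: it assumes landmarks come from some detector using the common 68-point face layout (mouth at indices 48 to 67), and the names `mouth_box` and `crop` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# Assumption: the common 68-point face layout, where indices 48..67 outline the mouth.
MOUTH_START, MOUTH_END = 48, 68


def mouth_box(landmarks: List[Point], margin: int = 5) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) around the mouth points, padded by margin."""
    mouth = landmarks[MOUTH_START:MOUTH_END]
    if not mouth:
        raise ValueError("landmark list does not contain any mouth points")
    xs = [x for x, _ in mouth]
    ys = [y for _, y in mouth]
    # Clamp the top-left corner at 0 so the box never leaves the frame.
    return (max(min(xs) - margin, 0), max(min(ys) - margin, 0),
            max(xs) + margin, max(ys) + margin)


def crop(frame: List[List[int]], box: Tuple[int, int, int, int]) -> List[List[int]]:
    """Cut the boxed region out of a grayscale frame stored as a list of rows."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]
```

Applying `crop` to every frame of a clip yields the lip sequence that a neural network would then be trained on.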
INTRODUCTION
• Many recognition systems recognize words from audio features.
• Lip reading is a developing technology.
• It aims to recognize words from visual features alone, without audio.
• Words are classified and recognized from viseme movements.
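A small sketch of why viseme-based recognition is hard: several phonemes share one visible mouth shape, so different words can look identical on the lips. The phoneme symbols and viseme groups below are a simplified, illustrative assumption, not a standard table and not taken from the slides.

```python
from collections import defaultdict

# Illustrative, simplified phoneme-to-viseme groups (an assumption for this sketch).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "n": "alveolar",
    "ae": "open",
}

# Hypothetical mini-lexicon with ARPAbet-like phoneme spellings.
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fat": ["f", "ae", "t"],
}


def to_visemes(phonemes):
    """Map a phoneme sequence to its viseme sequence."""
    return tuple(PHONEME_TO_VISEME[p] for p in phonemes)


def group_by_visemes(lexicon):
    """Group words whose viseme sequences are identical."""
    groups = defaultdict(list)
    for word, phonemes in lexicon.items():
        groups[to_visemes(phonemes)].append(word)
    return dict(groups)
```

Here "pat", "bat", and "mat" fall into the same group, while "fat" stays separate. Restricting the task to a small, visually distinct vocabulary limits this ambiguity.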
PROBLEM STATEMENT
• Noisy Environment
• Speech Speed
• Accent
• Pronunciation
• Facial Features
OBJECTIVE
• Extract textual or speech data from facial features
• Train a neural network to process viseme sequences
• Develop a speaker-independent system
• Recognize and classify ten different words
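One common way to test speaker independence is to keep every test speaker out of the training set. The sketch below assumes each sample is a dict with `speaker` and `word` keys; the speaker IDs, word labels, and the function name are hypothetical placeholders.

```python
def speaker_independent_split(samples, test_speakers):
    """Split samples so that no test speaker appears in the training set."""
    held_out = set(test_speakers)
    train = [s for s in samples if s["speaker"] not in held_out]
    test = [s for s in samples if s["speaker"] in held_out]
    return train, test


# Hypothetical toy data: 5 speakers, each uttering 10 word classes once.
SAMPLES = [
    {"speaker": f"S{spk:02d}", "word": f"word_{w}"}
    for spk in range(1, 6)
    for w in range(10)
]

train_set, test_set = speaker_independent_split(SAMPLES, test_speakers=["S05"])
```

A random split over clips would let the same face appear in both sets, which tends to overstate accuracy on unseen speakers.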
S/W - H/W REQUIREMENTS
SOFTWARE REQUIREMENTS:
• Anaconda
• System: 64-bit OS, x64 processor
HARDWARE REQUIREMENTS:
• 4 GB RAM
• GPU (recommended for better performance)
EXISTING VS PROPOSED SYSTEM
Existing System:
• Uses the BBC LRS2 dataset
• Recognizes and classifies only ASCII characters, then decodes words
• Complex processing; gives better accuracy only after 2000 epochs
Proposed System:
• Uses the MIRACL-VC1 dataset
• Recognizes and classifies ten different words
• Simple; gives better accuracy after 200 epochs
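Because both systems are compared on accuracy, it helps to pin down how accuracy over a small word vocabulary could be computed. This is a minimal sketch under the assumption that true and predicted word labels are available as parallel lists; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def word_accuracy(true_labels, predicted_labels):
    """Return (overall accuracy, per-word accuracy) for parallel label lists."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    overall = sum(correct.values()) / len(true_labels)
    # Counter returns 0 for words that were never predicted correctly.
    per_word = {word: correct[word] / count for word, count in totals.items()}
    return overall, per_word


# Hypothetical labels, for illustration only.
truth = ["start", "stop", "next", "stop"]
preds = ["start", "stop", "stop", "stop"]
overall, per_word = word_accuracy(truth, preds)
```

Per-word accuracy shows which of the ten words are confused with each other, which overall accuracy alone hides.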
LITERATURE SURVEY
1. Lip-Reading Driven Deep Learning Approach for Speech Enhancement
   Authors: Ahsan Adeel, Mandar Gogate, Amir Hussain, and William M. Whitmer
   Publication: IEEE Transactions, 2019
   Methodology applied: LSTM-driven audio-visual mapping approach
   Merits and demerits:
   • Increased accuracy
   • Autonomous speech enhancement
   • Poor performance for real-time speech
2. An audio-visual corpus for multimodal automatic speech recognition
   Authors: Andrzej Czyzewski, Bozena Kostek, Piotr Bratoszewski, Jozef Kotus, Marcin Szykulski
   Publication: Springer, 2017
   Methodology applied: Active Appearance Model (AAM) and Hidden Markov Models (HMM)
   Merits and demerits:
   • Recognizes speech in street noise
   • Babble noise dramatically worsens the accuracy of speech recognition
3. Extraction of Visual Features for Lipreading
   Authors: Iain Matthews, Timothy F. Cootes, J. Andrew Bangham, Stephen Cox, Richard Harvey
   Publication: IEEE Transactions, 2017
   Methodology applied: Active Shape Model (ASM) and Point Distribution Model (PDM)
   Merits and demerits:
   • Accuracy improves when a noisy audio signal is augmented with visual information
   • Poor performance in babble noise
4. Audio-visual speech recognition using deep learning
   Authors: Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata
   Publication: Springer, 2014
   Methodology applied: Hidden Markov Model (HMM)
   Merits and demerits:
   • Increased performance
   • Problems with reverberation, illumination, and facial orientation occur
5. Speaker-Independent Speech Recognition using Visual Features
   Authors: Pooventhiran G., Sandeep A.
   Publication: IEEE, 2020
   Methodology applied: 3D-CNN model
   Merits and demerits:
   • Improved accuracy
   • Complex
PROPOSED ARCHITECTURE
THANK YOU
