
SRES’ SANJIVANI COLLEGE OF ENGINEERING, KOPARGAON 423 603 (M.S.)

Department of Mechatronics Engineering
2021-2022

MICRO PROJECT REPORT
ON
“Speech Recognition using Python”

GROUP NO: 09

CERTIFICATE
This is to certify that this report on the Mini Project entitled,

Speech recognition using Python


Submitted by,
Srno. Roll no. Name PRN No.
01 17 Pranavkumar Pravin Chavan UMX20M1017
02 28 Kalyan Jayant Kale UMX20M1028
03 37 Rajratna Avinash Naranje UMX20M1037
04 49 Omkar Anil Shirke UMX20M1049
for the partial fulfilment of the requirements of the Second Year (Mechatronics Engineering)
degree of Sanjivani COE, Kopargaon, embodies the work done by them under our
guidance and supervision in the academic year 2021-2022.

Dr. Naveen Kumar

(Guide)

Prof. R. A. Kapgate
Head, Dept. of Mechatronics Engg.

Dr. A. G. Thakur
Director, SCOE

DECLARATION
We declare that this written submission represents our ideas in our own words; we
have adequately cited and referenced the original sources. We also declare that we have
adhered to all principles of academic honesty and integrity and have not misrepresented or
fabricated any idea/data/fact/source in our submission. We understand that any violation of
the above may cause disciplinary action by the Institute and can also evoke penal action from
the sources which have not been properly cited.

Date: 08/06/2023
Place: Kopargaon
Sr.No Name Signature
01 Pranavkumar Pravin Chavan
02 Kalyan Jayant Kale
03 Rajratna Avinash Naranje
04 Omkar Anil Shirke

Acknowledgement

The micro project "Speech Recognition using Python" has been an opportunity to express ourselves technically. It has proven to be a stepping stone that will be of immense help to us as we enter the industry. We want to express our gratitude to everyone who supported us morally and helped us resolve our difficulties; everyone contributed immensely, right up to the completion of the project.

We take this opportunity to express our deep sense of gratitude towards the Head of the Department of Mechatronics Engineering, Dr. R. A. Kapgate, and our esteemed guide, Dr. Naveen Kumar, for his expert guidance during the preparation of this project. He made himself available whenever we required his help, and in the true sense of the word we are grateful to him. We are also highly grateful to our subject coordinator, Prof. Sidhant S. Kulkarni, for extending all the facilities needed to complete this project.

We would like to extend our sincere thanks to all staff members of the Electronics & Telecommunication Department who have helped us, directly or indirectly, in preparing this project. We would also like to thank all our friends who helped us and initiated discussions during the work. Last but not least, we want to thank our beloved parents, who have taken great pains for our education.

1. Pranavkumar Pravin Chavan


2. Kalyan Jayant Kale
3. Rajratna Avinash Naranje
4. Omkar Anil Shirke

Speech Recognition software


Introduction:

Today we are in a new era of knowledge and computing, and artificial intelligence (AI) is helping humans in their everyday lives by making many tasks easier, whether in healthcare, defence, education, agriculture, or elsewhere. According to International Business Machines (IBM), AI can be defined as a technology that leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind.

Speech recognition also falls under the umbrella of artificial intelligence. Speech recognition software is capable of recording and recognising spoken words: it collects voice from the surroundings and stores it in an audio file with a .wav extension. Because it adds the ability to understand spoken human words, many desktop automation applications have been built on top of it. With speech recognition software we can give a command simply by speaking; the software records the sound in an audio format, recognises the spoken words captured in that audio, applies filters to remove background noise, interprets the command, and then takes the desired action to automate the task. A very popular example of speech recognition is the Google Assistant.

Procedure:
Here, we have developed the speech recognition software using Python. The steps followed to build it are given below:
Step 1: Import the necessary libraries such as speech_recognition, pyttsx3, datetime, wikipedia, webbrowser, os, win32print, etc.
Step 2: Initialise the important functions for accessing the microphone, recording speech, converting files, printing, speaking and taking commands (mic, record_speech, file_convert, printer, speak, takeCommand, etc.).
Step 3: Initialise the conditions using simple if-elif statements inside a while loop; for example, if the user says "open google", the assistant replies "ok, opening google" and then opens it.
Step 4: Initialise any further conditions you want, as in the source code given later in this report.
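
As a rough illustration of Steps 1-3, the skeleton below shows how the pieces fit together. This is only a minimal sketch, not the full project code (which appears in the Source code section); the helper names mirror the project but the branches shown are just examples.

# Minimal skeleton of the assistant loop (illustrative sketch only)
import speech_recognition as sr
import pyttsx3
import webbrowser

engine = pyttsx3.init()

def speak(text):
    # Speak the given text aloud
    engine.say(text)
    engine.runAndWait()

def takeCommand():
    # Listen on the microphone and return the recognised text in lower case
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        audio = r.listen(source)
    try:
        return r.recognize_google(audio, language='en-in').lower()
    except (sr.UnknownValueError, sr.RequestError):
        return ""

while True:
    query = takeCommand()
    if 'open google' in query:
        speak("ok, opening google")
        webbrowser.open('google.com')
    elif 'stop' in query:
        speak("goodbye")
        break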

How does Speech Recognition work?


Speech recognition, also known as automatic speech recognition (ASR), is a
technology that converts spoken language into written text. The process of speech recognition
involves several steps, including digital signal processing (DSP) techniques. Here's a high-
level overview of how speech recognition works with respect to DSP:
1. Signal acquisition: The first step is to capture the speech signal using a microphone or
other audio input device. The analog signal is then converted into a digital signal through an
analog-to-digital converter (ADC). This digitized speech signal serves as the input for further
processing.
2. Preprocessing: The digitized speech signal may contain noise, variations in volume,
and other distortions. Preprocessing techniques are applied to enhance the quality of the
signal and improve its suitability for recognition. DSP algorithms such as filtering, noise
reduction, and normalization may be used to clean up the signal.
3. Feature extraction: In this step, relevant features of the speech signal are extracted to
represent the speech content. The most commonly used feature in speech recognition is the
Mel-frequency cepstral coefficients (MFCCs). MFCCs capture the spectral characteristics of
the speech signal by analyzing short frames of the signal and extracting the cepstral
coefficients that represent the shape of the vocal tract (a short extraction sketch is given at the end of this section).
4. Acoustic modeling: Acoustic modeling involves creating statistical models that
represent the relationship between the extracted speech features and the corresponding
phonetic units (such as phonemes or subword units). Hidden Markov Models (HMMs) are
commonly used in acoustic modeling. These models learn the statistical patterns of speech
units and are trained using large amounts of labeled speech data.
5. Language modeling: Language modeling is used to predict the sequence of words or
phrases that are most likely to occur given a specific speech input. Language models
incorporate linguistic knowledge and statistical analysis of language patterns. They are used
to improve the accuracy of recognizing spoken words in the context of a particular language
or application domain.
6. Decoding: The decoding process involves matching the acoustic and language models
to find the most likely sequence of words or phrases that correspond to the input speech. This
is done using search algorithms such as the Viterbi algorithm. The decoded output represents
the recognized text or a sequence of recognized words.
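
As a toy illustration of the decoding step, the sketch below runs the Viterbi algorithm over a small hand-built HMM. The probability arrays and their sizes are assumptions made for the example only; real recognisers search far larger graphs built from the acoustic and language models.

# Toy Viterbi decoder (illustrative only)
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    # obs: observation indices; start_p (S,), trans_p (S, S), emit_p (S, O) are probability arrays
    n_states, T = len(start_p), len(obs)
    delta = np.zeros((T, n_states))           # best score of any path ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers for path recovery
    delta[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, T):
        for s in range(n_states):
            scores = delta[t - 1] * trans_p[:, s]
            psi[t, s] = np.argmax(scores)
            delta[t, s] = scores[psi[t, s]] * emit_p[s, obs[t]]
    # Backtrack from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.insert(0, int(psi[t, path[0]]))
    return path

# Example with 2 states and 2 observation symbols:
# viterbi([0, 1, 0], np.array([0.6, 0.4]),
#         np.array([[0.7, 0.3], [0.4, 0.6]]),
#         np.array([[0.9, 0.1], [0.2, 0.8]]))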
It's important to note that speech recognition is a complex and challenging task, and DSP is
just one aspect of the overall process. Other techniques such as machine learning, neural
networks, and natural language processing (NLP) are also integral to modern speech
recognition systems.
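
To make the feature-extraction step (step 3 above) concrete, here is a short sketch using the librosa library. librosa is not used in the project code, and the file name and parameter values below are only illustrative.

# Extracting MFCC features from a recorded wav file (librosa assumed as an extra dependency)
import librosa

y, fs = librosa.load("mun.wav", sr=16000)              # load and resample the audio
mfccs = librosa.feature.mfcc(y=y, sr=fs, n_mfcc=13)    # 13 coefficients per short frame
print(mfccs.shape)                                     # (13, number_of_frames)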

Advantages:
Speech recognition offers several advantages that make it a valuable technology in various
applications. Here are some key advantages of speech recognition:

1. Hands-free and convenient: One of the primary advantages of speech recognition is its
hands-free operation. Users can interact with devices and applications using their voice,
eliminating the need for manual input such as typing or navigating menus. This convenience
is particularly beneficial in situations where manual input is difficult or not feasible, such as
when driving or cooking, or for users with physical impairments.

2. Increased productivity: Speech recognition can significantly enhance productivity by
allowing users to dictate text or commands instead of typing them. It can be much faster to
speak than to type, enabling users to create documents, emails, or messages more efficiently.
This advantage is particularly relevant for professionals who need to generate large volumes
of text regularly, such as writers, journalists, or transcriptionists.

3. Accessibility and inclusivity: Speech recognition plays a crucial role in making
technology accessible to individuals with disabilities or those who have difficulty using
traditional input methods. It enables people with mobility impairments or conditions like
arthritis or carpal tunnel syndrome to interact with devices and computers effectively.
Additionally, individuals with visual impairments can benefit from speech recognition as it
provides an alternative means of input and control.

Limitations:
While speech recognition technology offers numerous benefits, it also has some limitations
that can affect its performance and usability. Here are a few limitations of speech recognition:
1. Accuracy and error rate: Speech recognition systems may still have challenges in
accurately recognizing and transcribing speech, especially in noisy environments or with
speakers who have strong accents or speech impairments. Errors can occur in the recognition
process, leading to inaccuracies in transcriptions or misunderstood commands. Although
advances in technology have improved accuracy, achieving 100% accuracy in all scenarios
remains challenging; in practice, accuracy is usually quantified as the word error rate (WER), as illustrated after this list.
2. Vocabulary limitations: Speech recognition systems typically work best with
predefined vocabularies or specific domains. Recognizing words or phrases outside of the
system's trained vocabulary can lead to errors or incorrect transcriptions. Systems may
struggle with uncommon or specialized terminology, acronyms, or names that are not part of
their training data. Expanding vocabulary coverage and adapting to new words or terms can
be a complex task.
3. Contextual understanding: Speech recognition systems may face difficulties in
understanding the context in which words or phrases are spoken. Language is often
ambiguous, and words can have multiple meanings depending on the context.
Disambiguating words and accurately interpreting the speaker's intended meaning can be
challenging, especially when relying solely on speech input without additional context cues.
4. Speaker dependence: Some speech recognition systems require initial training or
adaptation to a specific speaker's voice characteristics. This speaker dependence means that
the system may not perform as well for different speakers or require retraining when used by
multiple individuals. Handling speaker variations, accents, and speech idiosyncrasies across a
diverse user base remains a challenge.
5. Privacy concerns: Speech recognition involves capturing and processing audio data,
raising privacy concerns. Users may be apprehensive about their voice data being stored,
analyzed, or potentially misused. Ensuring the security and privacy of speech data is a critical
consideration for speech recognition technology providers.
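
The accuracy limitation in point 1 is usually measured as the word error rate (WER): the number of substituted, deleted and inserted words divided by the number of words in the reference transcript. The small, self-contained computation below illustrates this; the example sentences are made up.

# Word error rate via edit distance between reference and hypothesis word sequences
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit-distance table
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("open the google translator", "open a google translate"))  # 0.5 (2 errors over 4 words)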

It's important to note that ongoing research and development efforts are continuously
addressing these limitations, and the performance of speech recognition systems has
improved significantly over time.

Result:
Whenever we gave the software a command during our experiments, it performed the corresponding task to automate desktop applications, such as reading out information from Wikipedia about a given topic, opening Google, Google Translate or the Discovery Channel website, and introducing itself, all from the spoken voice command.

Source code:
# import the required modules
import pyttsx3
import datetime
import speech_recognition as sr
import wikipedia
import webbrowser
import os
import urllib.request
import re
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
import win32print
import tempfile
import sounddevice
from scipy.io.wavfile import write
import soundfile
import tkinter as Tk
from tkinter import *
from tkinter import messagebox
from tkinter import scrolledtext, filedialog
from tkinter import font
import ctypes, sys
import pywhatkit
def texteditor():
    # take a voice command before opening the editor
    mic()
    root = Tk()
    root.title("Text Editor")
    root.geometry("660x660")
    # Create main frame
    my_frame = Frame(root)
    my_frame.pack(pady=5)
    # Scrollbar
    text_scroll = Scrollbar(my_frame)
    text_scroll.pack(side=RIGHT, fill=Y)
    # Create a text box
    my_text = Text(my_frame, width=97, height=25,
                   selectbackground="yellow", selectforeground="black",
                   undo=True, yscrollcommand=text_scroll.set)
    my_text.pack()
    # Configure the scrollbar
    text_scroll.config(command=my_text.yview)
    root.mainloop()
def mic():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print("Recognizing...")
        data = r.recognize_google(audio, language='en-in')
        print(f"User said: {data}\n")
    except Exception as e:
        print(e)
        print("Say that again please")
        return "None"
    return data
def record_speech():
    fs = 44100    # sampling rate in Hz
    second = 10   # recording duration in seconds
    print("recording.....")
    record_voice = sounddevice.rec(int(second * fs), samplerate=fs, channels=2)
    sounddevice.wait()
    write("wuw.wav", fs, record_voice)


def file_convert():
    # Convert the recording to 16-bit PCM so speech_recognition can read it
    data, samplerate = soundfile.read('wuw.wav')
    soundfile.write('mun.wav', data, samplerate, subtype='PCM_16')

def create_wordfile():
    r = sr.Recognizer()
    audio_file = sr.AudioFile("mun.wav")
    with audio_file as source:
        r.adjust_for_ambient_noise(source)
        audio = r.record(source)
    result = r.recognize_google(audio)
    # Save the recognised text to a file for printing
    with open("mw.pdf", "w") as file:
        file.write(result)


# Taking the printout
def printer():
    os.startfile("mw.pdf", "print")

# initialisation of pyttsx3
engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
print(voices[1].id)
engine.setProperty('voice', voices[1].id)


def speak(audio):
    engine.say(audio)
    engine.runAndWait()


def wishMe():
    hour = int(datetime.datetime.now().hour)
    if hour >= 0 and hour < 12:
        speak('Good morning sir, i am nova, how can i help you')
    elif hour >= 12 and hour < 18:
        speak('Good afternoon sir, i am nova, how can i help you')
    else:
        speak('Hello sir, I am NOVA, how can I help you')
# initialisation of speech recognition
def takeCommand():
    # It takes microphone input and returns string output
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)

    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')
        print(f"User said: {query}\n")
    except Exception as e:
        print(e)
        print("Say that again please")
        return "None"
    return query

if __name__ == "__main__":
    wishMe()

    while True:
        query = takeCommand().lower()

        if 'nova' in query:
            print(query)

        # Searching the result on wikipedia
        elif 'wikipedia' in query:
            speak('Searching wikipedia')
            query = query.replace("wikipedia", "")
            results = wikipedia.summary(query, sentences=2)
            speak("According to wikipedia")
            print(results)
            speak(results)

        # Open the text editor
        elif "editor" in query:
            speak("Ok, opening the text editor")
            texteditor()

        # Create the document
        elif "create document" in query:
            speak("okay, creating the document")
            record_speech()
            file_convert()

        elif "create file" in query:
            speak("creating")
            create_wordfile()

        elif "record audio" in query:
            speak("okay, recording the audio")

        # Take a printout (hard copy)
        elif 'print' in query:
            speak("okay, printing the document")
            printer()

        # Open youtube
        elif 'open youtube' in query:
            speak('opening youtube')
            webbrowser.open('youtube.com')

        # Open google
        elif 'open google' in query:
            speak('opening google')
            webbrowser.open('google.com')

        # Open the stack overflow website
        elif 'open stack overflow' in query:
            speak('opening stack overflow')
            webbrowser.open('stackoverflow.com')

        # Open gmail
        elif 'open gmail' in query:
            speak('opening gmail')
            webbrowser.open('gmail.com')

        # Open google maps
        elif 'open map' in query:
            speak('opening map')
            webbrowser.open('maps.google.com')

        # Tell the time
        elif 'the time' in query:
            strTime = datetime.datetime.now().strftime("%H:%M:%S")
            speak(f"sir, the time is {strTime}")

        # Open google translator
        elif 'open translator' in query:
            speak('opening google translator')
            webbrowser.open('translate.google.co.in')

        # Open google colab
        elif 'colab' in query:
            speak('opening google colab')
            webbrowser.open('research.google.com')

        # Open the jupyter notebook site
        elif 'jupiter' in query:
            speak('opening jupyter')
            webbrowser.open('jupyter.org')

        # Open google news
        elif 'news' in query:
            speak('opening google news')
            webbrowser.open('news.google.com')

        # Open the movies verse website
        elif 'movie' in query:
            speak('opening moviesverse')
            webbrowser.open('themoviesverse.co')

        # Open the official website of discovery channel
        elif 'discovery' in query:
            speak('opening official website of discovery channel')
            webbrowser.open('discovery.com')

        # Introduction of nova
        elif 'introduce yourself' in query:
            engine.setProperty("rate", 150)
            speak('Hello! I am nova, which means new origin of virtual assistant. '
                  'This is the age of computers, robotics and knowledge, and AI is playing '
                  'a very big role in helping humans with their everyday tasks to make things easier. '
                  'So the developer has developed me to help users solve their everyday problems '
                  'and make life easy. I hope you will enjoy working with me. Thank you!')

        # Search on google
        elif 'google search' in query:
            speak('searching on google')
            query = query.replace("google search", "")
            query = query.replace("google", "")
            speak("sir, i found some results on the internet related to your search")
            pywhatkit.search(query)
            try:
                # googleScrap is referenced in the original code but never imported;
                # the bare except keeps the assistant running if it is unavailable
                result = googleScrap.summary(query, 3)
                speak(result)
            except:
                pass

        # elif 'LinkedIn' in query:
        #     speak("ok sir, opening your linkedin profile")
        #     webbrowser.open("https://in.linkedin.com/")

Some Of The Main Functions And Features In The Code:

texteditor(): This function opens a text editor using the Tkinter library.

mic(): This function uses speech recognition to convert speech input from the user into text.

record_speech(): This function records audio from the microphone using the sounddevice
module.

file_convert(): This function converts the recorded audio file to a different format using the
soundfile module.

create_wordfile(): This function converts the recorded audio file into text and saves the text to a
file that can then be printed.

printer(): This function opens and prints a PDF document.

speak(): This function uses the pyttsx3 module to convert text into speech.

wishMe(): This function greets the user based on the current time.

takeCommand(): This function uses speech recognition to take voice input from the user.

The main part of the code uses a while loop to continuously listen for user commands and
perform the corresponding actions based on the recognized commands.

Reference:

https://realpython.com/python-speech-recognition/
https://www.ibm.com/topics/artificial-intelligence#:~:text=Artificial%20intelligence%20leverages%20computers%20and,capabilities%20of%20the%20human%20mind
https://www.youtube.com/watch?v=Lp9Ftuq2sVI
https://www.wikipedia.com
