Mini Project Report
Department of
Mechatronics Engineering
2021-2022
CERTIFICATE
This is to certify that this report on Mini Project entitled,
(Guide)
DECLARATION
We declare that this written submission represents our ideas in our own words; we
have adequately cited and referenced the original sources. We also declare that we have
adhered to all principles of academic honesty and integrity and have not misrepresented or
fabricated any idea/data/fact/source in our submission. We understand that any violation of
the above may cause disciplinary action by the Institute and can also evoke penal action from
the sources which have not been properly cited.
Date: 08/06/2023
Place: Kopargaon
Sr.No Name Signature
01 Pranavkumar Pravin Chavan
02 Kalyan Jayant Kale
03 Rajratna Avinash Naranje
04 Omkar Anil Shirke
Acknowledgement
"Microproject title” has been the opportunity to express ourselves technically. This
has proven to be a stepping stone which will be of immense help to us as we enter market.
We want to express our gratitude to everyone who helped us by giving moral support and by
solving our difficulties. Everyone has contributed immensely and helped us for the same unto
the completion of project.
We take this opportunity to express our deep sense of gratitude towards head of
department of Mechatronics Engineering Dr. R. A. Kapgate and our esteemed guide Dr.
Naveen Kumar for his expert guidance during preparation of this seminar. He has received
us whenever we required his help. In true sense of word we are grateful to him. We are highly
grateful to our subject coordinator Prof. Sidhant S. Kulkarni for extending all the facilities
in completing this seminar.
We would like to place our sincere thanks to all staff members of Electronics &
Telecommunication Department who have helped us directly or indirectly for our seminar
preparation. We would also like to thank all our friends, who helped us and initiated
discussion during the seminar. Last but not least; we want to acknowledge our beloved
parents, who have taken great pains for our education.
Introduction:
Today we live in a new era of knowledge, computing, and artificial intelligence (AI), in which intelligent machines assist humans in everyday life and make it easier, whether in health care, defence, education, agriculture, or other fields. According to International Business Machines (IBM), AI can be defined as a technology that leverages computers and machines to mimic the problem-solving and decision-making capabilities of the human mind.
Speech recognition also comes under the umbrella of artificial intelligence. Speech recognition software has the capability to record and recognise spoken words: it captures voice from the surroundings and stores it in an audio file, typically with a .wav extension. Because it gives computers the ability to understand spoken human words, many desktop automation applications have been built on top of it. With speech recognition software we can issue a command simply by speaking; the software records the sound in an audio format, recognises the spoken words captured in that audio, applies filters to remove background noise, and then takes the desired action to automate the task. A very popular example of speech recognition is the Google Assistant.
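The source code later in this report records audio with the sounddevice library and saves it with scipy before recognition. A minimal sketch of that recording step is shown below; the sample rate, duration, and filename are illustrative choices, not the project's exact values.

# Minimal sketch: record a few seconds from the microphone and save it as a .wav file.
# The sample rate, duration, and filename here are illustrative.
import sounddevice
from scipy.io.wavfile import write

fs = 44100                      # sample rate in Hz
seconds = 5                     # illustrative recording length

print("recording...")
recording = sounddevice.rec(int(seconds * fs), samplerate=fs, channels=2)
sounddevice.wait()              # block until the recording is finished
write("command.wav", fs, recording)   # save the captured audio for later recognition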
Procedure:
Here, we have developed the speech recognition software using Python. The steps followed to build it are given below:
Step 1: Import the necessary libraries such as speech_recognition, pyttsx3, datetime, wikipedia, webbrowser, os, win32print, etc.
Step 2: Define the important functions for accessing the microphone, recording speech, converting files, printing, speaking, and taking commands (mic, record_speech, file_convert, printer, speak, takeCommand, etc.).
Step 3: Define the command conditions: for example, if the user says "open Google", the assistant replies "ok, opening Google" and then opens it, using simple if-elif statements inside a while loop (a minimal sketch of this loop follows these steps).
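The sketch below shows the shape of the if-elif command loop described above. The helper names (take_command, speak), phrases, and replies are illustrative simplifications; the project's full source code appears later in this report.

# Minimal sketch of the voice command loop (illustrative names and phrases).
import webbrowser
import pyttsx3
import speech_recognition as sr

engine = pyttsx3.init()

def speak(text):
    # speak the reply aloud
    engine.say(text)
    engine.runAndWait()

def take_command():
    # listen once on the microphone and return the recognised text (lower case)
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio = r.listen(source)
    try:
        return r.recognize_google(audio, language='en-in').lower()
    except Exception:
        return ""

while True:
    query = take_command()
    if 'open google' in query:
        speak("ok, opening google")
        webbrowser.open("https://www.google.com")
    elif 'stop' in query:
        speak("goodbye")
        break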
signal and improve its suitability for recognition. DSP algorithms such as filtering, noise
reduction, and normalization may be used to clean up the signal.
3. Feature extraction: In this step, relevant features of the speech signal are extracted to
represent the speech content. The most commonly used feature in speech recognition is the
Mel-frequency cepstral coefficients (MFCCs). MFCCs capture the spectral characteristics of
the speech signal by analyzing short frames of the signal and extracting the cepstral
coefficients that represent the shape of the vocal tract (a short MFCC-extraction sketch follows this list).
4. Acoustic modeling: Acoustic modeling involves creating statistical models that
represent the relationship between the extracted speech features and the corresponding
phonetic units (such as phonemes or subword units). Hidden Markov Models (HMMs) are
commonly used in acoustic modeling. These models learn the statistical patterns of speech
units and are trained using large amounts of labeled speech data (a small HMM-fitting sketch follows this list).
5. Language modeling: Language modeling is used to predict the sequence of words or
phrases that are most likely to occur given a specific speech input. Language models
incorporate linguistic knowledge and statistical analysis of language patterns. They are used
to improve the accuracy of recognizing spoken words in the context of a particular language
or application domain (a toy bigram example follows this list).
6. Decoding: The decoding process involves matching the acoustic and language models
to find the most likely sequence of words or phrases that correspond to the input speech. This
is done using search algorithms such as the Viterbi algorithm. The decoded output represents
the recognized text or a sequence of recognized words (a compact Viterbi sketch follows this list).
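As an illustration of the feature-extraction step, the snippet below computes MFCCs from a saved .wav file using the librosa library. librosa is not part of this project's code, and the filename and parameter values are assumptions for the example.

# Illustrative only: extract MFCC features from a .wav file with librosa.
import librosa

y, sample_rate = librosa.load("command.wav", sr=16000)        # load and resample the audio
mfccs = librosa.feature.mfcc(y=y, sr=sample_rate, n_mfcc=13)  # 13 coefficients per frame
print(mfccs.shape)                                            # (13, number_of_frames)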
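For the acoustic-modeling step, the sketch below fits a Gaussian HMM to a feature matrix using the third-party hmmlearn library. Neither hmmlearn nor this training step is part of the project's code, and random numbers stand in for real MFCC frames.

# Illustrative only: fit a Gaussian HMM to (pretend) MFCC frames for one speech unit.
import numpy as np
from hmmlearn import hmm

features = np.random.randn(200, 13)        # stand-in for 200 frames of 13 MFCCs

model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=50)
model.fit(features)                        # learn state transitions and emission distributions
print(model.score(features))               # log-likelihood of the data under the model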
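As a toy illustration of language modeling, the snippet below estimates bigram probabilities from a tiny made-up corpus. Real recognisers use far larger corpora or neural language models; the corpus and word pairs here are invented for the example.

# Illustrative only: a toy bigram language model built from word-pair counts.
from collections import Counter

corpus = "open the browser open the editor close the editor".split()
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(prev, word):
    # P(word | prev) estimated from counts; 0.0 for unseen pairs
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(bigram_prob("open", "the"))    # 1.0 in this toy corpus
print(bigram_prob("the", "close"))   # 0.0, the pair never occurs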
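The decoding step can be illustrated with a compact Viterbi search over a toy two-state model; the states, probabilities, and observations below are invented for the example and are far simpler than a real recogniser's models.

# Illustrative only: Viterbi search for the most likely state sequence.
import numpy as np

states = ["silence", "speech"]
start = np.array([0.6, 0.4])                # initial state probabilities
trans = np.array([[0.7, 0.3], [0.2, 0.8]])  # state transition probabilities
emit = np.array([[0.9, 0.1], [0.3, 0.7]])   # P(observation | state); observations: 0=quiet, 1=loud

def viterbi(obs):
    v = start * emit[:, obs[0]]             # best path probability ending in each state, frame 0
    backpointers = []
    for o in obs[1:]:
        scores = v[:, None] * trans         # score of moving from every state to every state
        backpointers.append(scores.argmax(axis=0))
        v = scores.max(axis=0) * emit[:, o]
    path = [int(v.argmax())]                # best final state, then trace back
    for bp in reversed(backpointers):
        path.append(int(bp[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 1, 1, 0]))                # prints the most likely state for each frame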
It's important to note that speech recognition is a complex and challenging task, and DSP is
just one aspect of the overall process. Other techniques such as machine learning, neural
networks, and natural language processing (NLP) are also integral to modern speech
recognition systems.
Advantages:
Speech recognition offers several advantages that make it a valuable technology in various
applications. Here are some key advantages of speech recognition:
1. Hands-free and convenient: One of the primary advantages of speech recognition is its
hands-free operation. Users can interact with devices and applications using their voice,
eliminating the need for manual input such as typing or navigating menus. This convenience
is particularly beneficial in situations where manual input is difficult or not feasible, such as
when driving or cooking, or for users with physical impairments.
This advantage is particularly relevant for professionals who need to generate large volumes
of text regularly, such as writers, journalists, or transcriptionists.
Limitations:
While speech recognition technology offers numerous benefits, it also has some limitations
that can affect its performance and usability. Here are a few limitations of speech recognition:
1. Accuracy and error rate: Speech recognition systems may still have challenges in
accurately recognizing and transcribing speech, especially in noisy environments or with
speakers who have strong accents or speech impairments. Errors can occur in the recognition
process, leading to inaccuracies in transcriptions or misunderstood commands. Although
advances in technology have improved accuracy, achieving 100% accuracy in all scenarios
remains challenging.
2. Vocabulary limitations: Speech recognition systems typically work best with
predefined vocabularies or specific domains. Recognizing words or phrases outside of the
system's trained vocabulary can lead to errors or incorrect transcriptions. Systems may
struggle with uncommon or specialized terminology, acronyms, or names that are not part of
their training data. Expanding vocabulary coverage and adapting to new words or terms can
be a complex task.
3. Contextual understanding: Speech recognition systems may face difficulties in
understanding the context in which words or phrases are spoken. Language is often
ambiguous, and words can have multiple meanings depending on the context.
Disambiguating words and accurately interpreting the speaker's intended meaning can be
challenging, especially when relying solely on speech input without additional context cues.
4. Speaker dependence: Some speech recognition systems require initial training or
adaptation to a specific speaker's voice characteristics. This speaker dependence means that
the system may not perform as well for different speakers or require retraining when used by
multiple individuals. Handling speaker variations, accents, and speech idiosyncrasies across a
diverse user base remains a challenge.
5. Privacy concerns: Speech recognition involves capturing and processing audio data,
raising privacy concerns. Users may be apprehensive about their voice data being stored,
analyzed, or potentially misused. Ensuring the security and privacy of speech data is a critical
consideration for speech recognition technology providers.
It's important to note that ongoing research and development efforts are continuously
addressing these limitations, and the performance of speech recognition systems has
improved significantly over time.
Result:
As we experimented with the software, whenever we gave it a voice command it performed the corresponding task to automate desktop applications, such as reading out information from Wikipedia on a given topic, opening Google, Google Translate, or the Discovery Channel website, and introducing itself, all triggered by the spoken command.
Source code:
# import the required modules
import pyttsx3
import datetime
import speech_recognition as sr
import wikipedia
import webbrowser
import os
import urllib.request
import re
import time
import win32print
import tempfile
import ctypes, sys
import sounddevice
import soundfile
import pywhatkit
from scipy.io.wavfile import write
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from tkinter import *
from tkinter import messagebox, scrolledtext, filedialog, font


def texteditor():
    # take one spoken input, then open a simple Tkinter text editor window
    mic()
    root = Tk()
    root.title("Text Editor")
    root.geometry("660x660")

    # create the main frame
    my_frame = Frame(root)
    my_frame.pack(pady=5)

    # scrollbar
    text_scroll = Scrollbar(my_frame)
    text_scroll.pack()

    # create a text box
    my_text = Text(my_frame, width=97, height=25,
                   selectbackground="yellow", selectforeground="black",
                   undo=True, yscrollcommand=text_scroll.set)
    my_text.pack()

    # configure our scrollbar
    text_scroll.config(command=my_text.yview)
    root.mainloop()


def mic():
    # listen on the microphone and return the recognised text
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing...")
        data = r.recognize_google(audio, language='en-in')
        print(f"User said: {data}\n")
    except Exception as e:
        print(e)
        print("Say that again please")
        return "None"
    return data


def record_speech():
    # record 10 seconds of audio from the microphone and save it as wuw.wav
    fs = 44100
    second = 10
    print("recording.....")
    record_voice = sounddevice.rec(int(second * fs), samplerate=fs, channels=2)
    sounddevice.wait()
    write("wuw.wav", 44100, record_voice)


def file_convert():
    # re-encode the recording as 16-bit PCM so the recogniser can read it
    data, samplerate = soundfile.read('wuw.wav')
    soundfile.write('mun.wav', data, samplerate, subtype='PCM_16')


def create_wordfile():
    # transcribe the recorded audio, save the text and send it to the printer
    r = sr.Recognizer()
    audio_file = sr.AudioFile("mun.wav")
    with audio_file as source:
        r.adjust_for_ambient_noise(source)
        audio = r.record(source)
    result = r.recognize_google(audio)
    with open("mw.pdf", "w") as file:
        file.write(result)
    os.startfile("mw.pdf", "print")


# initialise the pyttsx3 text-to-speech engine
engine = pyttsx3.init('sapi5')
voices = engine.getProperty('voices')
print(voices[1].id)
engine.setProperty('voice', voices[1].id)


def speak(audio):
    # speak the given text aloud
    engine.say(audio)
    engine.runAndWait()


def wishMe():
    # greet the user according to the time of day
    hour = int(datetime.datetime.now().hour)
    if hour >= 0 and hour < 12:
        speak('Good morning sir, I am Nova, how can I help you')
    elif hour >= 12 and hour < 18:
        speak('Good afternoon sir, I am Nova, how can I help you')
    else:
        speak('Hello sir, I am Nova, how can I help you')


# initialisation of speech recognition
def takeCommand():
    # take microphone input from the user and return it as a string
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-in')
        print(f"User said: {query}\n")
    except Exception as e:
        print(e)
        print("Say that again please")
        return "None"
    return query


if __name__ == "__main__":
    wishMe()
    # main loop: keep listening for commands and act on them
    while True:
        query = takeCommand().lower()

        if 'nova' in query:
            print(query)

        elif "editor" in query:
            speak("Ok, opening the text editor")
            texteditor()

        elif "audio" in query:
            speak("okay recording the audio")
            # assumed follow-up based on the helper functions defined above
            record_speech()
            file_convert()
            create_wordfile()

        # open Google Colab
        elif 'colab' in query:
            speak('opening google colab')
            webbrowser.open('research.google.com')

        # search the internet for the spoken query
        elif 'search' in query or 'google' in query:
            query = query.replace("search", "")
            query = query.replace("google", "")
            speak("sir, I found some results on the internet related to your search")
            pywhatkit.search(query)
            # elif 'LinkedIn' in query:
            #     speak("ok sir opening your linkedin profile")
            #     webbrowser.open("https://in.linkedin.com/")
            try:
                # googleScrap is an external helper referenced by the original code
                result = googleScrap.summary(query, 3)
                speak(result)
            except:
                pass
Some Of The Main Functions And Features In The Code:
texteditor(): This function opens a text editor using the Tkinter library.
mic(): This function uses speech recognition to convert speech input from the user into text.
record_speech(): This function records audio from the microphone using the sounddevice
module.
file_convert(): This function converts the recorded audio file to a different format using the
soundfile module.
create_wordfile(): This function converts the recorded audio file into text, saves the text to a file, and sends it to the printer.
speak(): This function uses the pyttsx3 module to convert text into speech.
wishMe(): This function greets the user based on the current time.
takeCommand(): This function uses speech recognition to take voice input from the user.
The main part of the code uses a while loop to continuously listen for user commands and
perform the corresponding actions based on the recognized commands.
Reference:
https://realpython.com/python-speech-recognition/
https://www.ibm.com/topics/artificial-intelligence#:~:text=Artificial%20intelligence%20leverages%20computers%20and,capabilities%20of%20the%20human%20mind
https://www.youtube.com/watch?v=Lp9Ftuq2sVI
https://www.wikipedia.com