Project Report Sample 1
Term Project
Faculty:
Lecturer
ECE Department
Spring, 2021
Acknowledgement
In this section, write 2-3 lines acknowledging the individuals, websites, or organizations who have helped you build your project or provided you with data or any other required information (if any).
For example –
First of all, we would like to express our profound gratitude to our honorable course instructor, Dr.
Hasan Uz Zaman, for his constant and meticulous supervision, valuable suggestions, his patience
and encouragement to complete the thesis work. We would also like to thank the ECE department
of North South University for providing us with the opportunity to have an industrial level design
experience as part of our curriculum for the undergraduate program. Finally, we would like to thank
our families and everybody who supported us and provided us with guidance for the completion of this
project.
Abstract
In this section, write the summary of your project, i.e. why you are doing this project, what the probable applications of this project are, and what kind of engineering problem you are solving with your project (e.g. if you build a filter, it will help you reduce noise, get rid of unwanted signals, and so on; P.S. do not use my example exactly if your project is a filter design).
For example -
In this report we present a robot with twelve degrees of freedom which has the capability of transporting limited-sized objects from one place to another with pick-up and drop capabilities. The robot's responses are based on speech recognition of verbal commands. In our project we have used the Google speech recognition module, as well as our own speech processing software, in order to understand verbal commands, and we then compared the results of the two. We also categorized objects into six specific categories according to the amount of gripping force required to lift them; the categories are divided according to object stiffness. An android application was used to communicate with the robot through Bluetooth communication. The application decodes the human speech into an array of characters which are transmitted to the robot using Bluetooth technology. The robot uses a microcontroller which decodes the messages into executable functions. In this project we have designed the robot such that it understands only fifteen distinct verbal commands and ignores all others. We also created a device which can locate the source of a sound. The key idea is to locate the sound source in a three-dimensional coordinate system using the inverse square law of sound propagation. We used the inverse square law for distance calculation by measuring amplitude at three microphones. We calculated the frequency of the sound using digital signal processing and autocorrelation, and this frequency was applied in the inverse square law of sound propagation to find the distance of the source from a specific microphone. Being a speech-responsive mobile robot, it can be effectively used to move objects from one place to another through verbal commands.
Appendices
Appendix A: Copy your entire code (if any)
List of Figures (if any)
List of Tables (if any)
Chapter 1
Project Overview
1.1 Introduction
A speech responsive robot is capable of responding to human voice. Human speech is challenging to
interpret as it requires both speech processing and artificial intelligence and is a continuous process
through continuous learning. Google speech recognition is one of the finest speech recognition
software available in the market which is also open source. Almost all internet users have used
Google speech recognition system. Most of the uses of Google Speech recognition is to search online
through voice. But very few have used Google Speech Recognition API for mechanical control,
robot control etc. In our paper we present a human speech responsive robot with multiple
functionality. We used both Google Speech Recognition module and developed our own software for
speech recognition, and compared the results of both. We used an android application to
communicate with the robot. We also categorized the force required to lift the objects into specific
six categories. The robot has the capability to transport objects from one place to another with the
help of a gripper. The robot can pick and drop limited sized objects and move according to speech
command. The android application processes the speech and sends the message to the
microcontroller which is processed to executable functions. The robot is also capable of locating the
The study of speech signals and their processing methods using digital signal processing is called speech processing. Aspects of speech processing include the acquisition, manipulation, storage, transfer and output of speech signals. The input is called speech recognition and the output is called speech synthesis.
Speech recognition involves techniques for recognizing and translating spoken language into text. It is the ability of a machine or program to identify words and phrases in spoken language and convert them into a machine-readable format. Basic speech recognition software has a limited vocabulary of words and phrases, and it may only identify these if they are spoken very clearly.
Speech recognition works using acoustic and language modeling. Acoustic modeling represents the relationship between linguistic units of speech and audio signals; language modeling matches sounds with word sequences to help distinguish between words that sound similar. Often, hidden Markov models are used as well to recognize temporal patterns in speech and improve accuracy. Speech recognition allows users to interact with technology without having to use a keyboard, a mouse, or press any buttons. Today, automatic speech recognition programs are used in many industries, including healthcare, the military (e.g. F-16 fighter jets), telecommunications, and personal computing (i.e. hands-free computing). The most frequent applications of speech recognition within the enterprise include call routing and speech-to-text processing, and the technology is being continuously developed. The advantages of speech recognition software are that it is easy to use and readily available; it is now frequently installed in computers and mobile devices. The downsides of speech recognition include its inability to capture words accurately due to variations in pronunciation, its lack of support for most languages outside of English, and its inability to sort through background noise.
1.2.4 Performance
Speech recognition performance is measured by accuracy and speed. Accuracy is measured with the word error rate (WER). WER works at the word level and identifies inaccuracies in transcription, although it cannot identify how an error occurred. Speed is measured with the real-time factor. A variety of factors can affect computer speech recognition performance, including pronunciation, accent, pitch, and background noise.
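For reference, the standard definitions of these two measures (these are textbook formulas rather than results reported in the project) are

$$\mathrm{WER} = \frac{S + D + I}{N}, \qquad \mathrm{RTF} = \frac{T_{\text{processing}}}{T_{\text{audio}}},$$

where $S$, $D$ and $I$ are the numbers of substituted, deleted and inserted words, $N$ is the number of words in the reference transcript, and a real-time factor below 1 means the recognizer keeps up with the incoming audio.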
Open-source speech recognition engines include:
CMU Sphinx
Julius
Kaldi
Open-source applications that provide convenient user interfaces for the above:
Simon
Jasper project
Voice Notebook
SpeechTexter
Speechnotes
Trint
Many cell-phone handsets have basic dial-by-voice features built in, and smartphones such as iPhones and BlackBerrys also support this. A number of third-party apps have implemented natural-language virtual assistants, for example:
Indigo: a virtual assistant for Android, iOS, and Windows Phone, by Artificial Solutions
Windows also has built-in speech recognition software, such as Cortana (mentioned above) and Windows Speech Recognition, along with other add-ons and third-party apps for voice recognition.
There are very few robots at the moment which can understand natural human speech and act accordingly. There are some robots that are remote-controlled; however, a robot with voice control is very rare. The robot we built acts according to the spoken command given by a human, and it responds in real time.
Locate the source of the sound, i.e. locate where the speaker is
1.3.2 Difficulty
The level of difficulty of this project was very high, as speech recognition alone is a huge project, and understanding speech and acting on it in real time is not easy. We did not plan to use any existing speech processing software, since none offered sufficient accuracy; rather, we planned to build our own software, which we eventually did. First we developed the software for speech processing, and then we built the hardware components. We also had to develop the theory of how we could detect the location of the speaker using three microphones. We built the robot from scratch by ourselves, and we also designed and built a unique gripper for picking up and dropping objects. Finally, we had to synchronize the hardware and software parts in order to build the robot. We also implemented the system using the Google Speech Recognition module and compared the two approaches.
1.4 Motivation
Technology has always developed very rapidly by and for able-bodied people. Upon closer observation, we can see that most modern-day innovations are for the able-bodied, and physically disabled people have often been left on the sidelines. Our objective was to design, develop and build a robot that would provide a useful and efficient means of doing daily work for disabled people. A person who is unable to move can give voice commands so that the robot does the work, carries limited-sized objects, and helps the person.
Also, a robot that can help us do our work easily without moving our hands would be industrially
profitable. It can even aid people in difficult and unsafe situations such as rescue operations in
disaster zones, and it can be used to defuse bombs or to help doctors as a third hand in medical
procedures.
Development in this field can open up boundless possibilities and a new era in robotics. It can result
in many new applications that can be very useful and have a great impact on the lives of people.
1.5 Summary
In this chapter, we have briefly described the basics of speech recognition, existing speech
recognition software, and the main idea on which our project was built. We have described the
capabilities of the robot, what motivated us to design and build this system, and our
accomplishments. The following chapters describe the theory and details of the components
used, the mechanical description, designs, and the overall structure of the system.
Chapter 2
Related work
2.1 Introduction
The existing work related to mobile robots with speech recognition and to sound source localization that we discovered and found useful is described in this chapter. As we searched for similar systems, we found very few existing systems where both audio processing and robot implementation are synchronized together. We went through some similar projects which gave us some insights. We also searched for systems and existing papers on localizing the sound source, in
order for the robot to locate the source of the verbal command. However, most of the papers we
found described locating the direction of the sound source, but did not give the distance.
Studies on speech recognition for robot control were very few. One of the studies (Robot-by-voice: Experiments on commanding an industrial robot using the human voice) used Microsoft speech recognition to conduct speech communication with the robot [1]. The robot was stationary and was based solely on a single mechanical hand which has the ability to move, grip, pick up and drop objects in a specific area near the robot.
Voice Automated Mobile Robot [2] presents the idea of a voice-automated mobile robot. That research was based on the idea of communicating with a robot through voice. Most of the paper was theoretical, giving a general idea of machine speech recognition, but it hardly discussed any practical implementation.
Some research other than robot control has also been conducted based on speech recognition. Design of a Voice-controlled Smart Wheelchair used CMU Sphinx to decode human speech and used Google Glass for communication [3]. Their technology was focused on communication with the wheelchair and used CMU Sphinx for speech processing. They achieved partial success in building the system.
2.2.1 Robot-by-voice: Experiments on commanding an
industrial robot using the human voice
This paper reports a few results of an ongoing research project that aims to explore ways to command an industrial robot using the human voice. A demonstration is presented using two industrial robots and a personal computer (PC) equipped with a sound board and a headset microphone. The demonstration was coded using Microsoft Visual Basic and C# .NET 2003 and associated with two simple robot applications: one capable of picking-and-placing objects and going to predefined positions, and the other capable of performing a simple linear weld on a work-piece. The speech recognition grammar is specified using the grammar builder from the Microsoft Speech SDK 5.1. The paper also introduces the concepts of text-to-speech translation and voice recognition, and shows how these features can be used with robotic applications.

2.2.2 Voice Automated Mobile Robot
In this work, a mobile robot is controlled through connected speech input. The language input allows a user to interact with the robot in a way that is familiar to most people. The advantages of speech-activated robots are hands-free and fast data input operations. In the future, it is expected that speech recognition systems will be used as the man-machine interface for robots in rehabilitation, entertainment, etc. In view of this, the aforementioned system is a learning process for a mobile robot which takes speech input as commands and performs navigation tasks through a distinct man-machine interaction applying that learning. The speech recognition system is trained in such a way that it recognizes defined commands, and the designed robot navigates based on the instructions given through speech commands. The medium of interaction between humans and computers is the processing of speech (words uttered by the person). The complete system consists of three sub-systems: the speech recognition system, a central controller, and the robot. We have studied various factors
such as noise, which interferes with speech recognition, and the distance factor. The results demonstrate the feasibility of the system.

2.2.3 Design of a Voice-controlled Smart Wheelchair
The proposed design supports a voice activation system for physically disabled persons, incorporating manual operation. An Arduino microcontroller and a speaker-dependent voice recognition processor have been used to support the navigation of the wheelchair. The direction and velocity of the chair are controlled by pre-defined Arabic voice commands. The speaker-dependent, isolated word recognition system (IWRS) for a definite utterance of Arabic words to suit the patient's requirements has been programmed and successfully demonstrated. The techniques of speech signal processing for extraction of sound parameters, noise removal, intensity and time normalization, and feature matching are handled by the speech processor HM2007, which is embedded efficiently in real time. The Arduino receives the coded digital signals from the IWRS, which properly recognizes voice commands, in order to control the function of the chair accordingly; the wheelchair does not respond to a false speech command. The overall mechanical assembly is driven through a reduction gear with built-in locking control. The system is tested using a speech password to start operation and seven Arabic commands to control motion: "Amam (forward), Saree' (fast), Batee' (slow), Khalf (backward), Yameen (right), Yesar (left), Tawaqaf (stop)". It proved to be a good working system.
The position of a source can be located through visual means using triangulation and other positioning techniques, but locating it using only sound is more challenging. Some research has been conducted to locate the source of a sound. One significant piece of research was done using the time-delay-of-arrival (TDOA) technique [1]. This paper (Robust Sound Source Localization Using a Microphone Array on a Mobile Robot) calculated the arrival time of sound and gives an accuracy of 3 degrees. However, it has a limitation of 3 meters, i.e. it can locate sound sources only within that range.
A microphone array positioned in a special orientation was used to locate the sound source in Localization Estimation of Sound Source by Microphones Array [3]. However, its limitation was that it does not locate the source position with respect to the microphone array; the paper gave us an estimation of the direction of the sound source but did not provide any data on distance.
Another paper, using a spherical microphone array, employed a Graphics Processing Unit (GPU) and the SRP-PHAT algorithm for sound source localization [2]. This paper provides a better estimation of the sound source compared to the previous ones, but the process is very expensive, as multiple arrays of microphones were used to create the spherical microphone and it requires a GPU to do the estimation. It has very high accuracy in localizing the direction of the sound source but is limited in localizing distance.
Blind source estimation was also used to estimate the sound source in another study [4]. That paper was titled Real-time multiple sound source localization using a circular microphone array based on single-source confidence measures. It also has some limitations: it only focuses on direction and uses circular arrays, but does not locate the distance.

2.3.1 Robust Sound Source Localization Using a Microphone Array on a Mobile Robot
In this paper, a method to localize sound sources using an array of microphones was presented. The method is based on time-delay-of-arrival estimation. Results show that a mobile robot can localize, in real time, different types of sound sources over a range of 3 meters.
2.3.2 Localization Estimation of Sound Source by
Microphones Array
In this paper, the authors study the estimation of sound source angle and distance using a planar microphone array. The microphones are placed on the vertices of an equilateral triangle and a square, and the sound source angle and the distance from the source to the microphones are estimated according to the different delays from the source to each microphone. The authors develop an orientation segmentation method by analyzing the delay characteristics, and a quick estimation algorithm to reduce the computational cost while improving estimation accuracy. The system can be used for counter-terrorism, etc. The method is discussed theoretically and verified with experimental data.
Spherical microphone arrays are used in many systems for applications in sound field analysis, beamforming, spatial audio, etc. The positioning of target and interfering sound sources is a crucial step in many of these applications. Therefore, 3D sound source localization is a highly relevant topic in the acoustic signal processing field. However, spherical microphone arrays are usually composed of many microphones, and running signal processing localization methods in real time is an important issue. Some works have already shown the potential of Graphics Processing Units (GPUs) for developing high-end real-time signal processing systems. New embedded systems with integrated GPU accelerators providing low power consumption are becoming increasingly relevant. These novel systems play a very important role in the new era of smartphones and tablets, opening further possibilities for the design of a localization system using a spherical microphone array fully implemented on an embedded GPU. The real-time capabilities of these platforms are analyzed, providing also a performance analysis of the localization method.
Another approach uses a circular array, in order to suppress the localization ambiguities faced with linear arrays, and assumes a weak sound source sparsity which is derived from blind source separation methods. The proposed method performs very well both in simulations and in real conditions at 50% real-time.
2.4 Summary
The existing work related to speech recognition on mobile robots and to sound source detection that we found useful has been briefly described in this chapter. The next chapter elaborates on the theory behind our system.
Chapter 3
Theory
3.1 Introduction
The details of the theory behind our system are discussed in this chapter. The theoretical explanation begins with the speech recognition process.
1. Speech recognition
Converting spoken words into on-screen text or a computer command requires a series of complex steps. When you speak, you create vibrations in the air. The analog-to-digital converter
(ADC) translates this analog wave into digital data that the computer can understand. To do this,
it samples, or digitizes, the sound by taking precise measurements of the wave at frequent intervals.
The system filters the digitized sound to remove unwanted noise, and sometimes separates it into different bands of frequency (frequency is the rate of vibration of the sound waves, heard by humans as differences in pitch). It also normalizes the sound, or adjusts it to a constant volume level. It may
also have to be temporally aligned. People don't always speak at the same speed, so the sound must
be adjusted to match the speed of the template sound samples already stored in the system's
memory.
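As a small illustration (this is not part of the project's own code; the buffer format and target level are assumptions), peak normalization of a digitized speech buffer can be sketched in C++ as follows:

#include <vector>
#include <cmath>
#include <algorithm>

// Scale a buffer of samples so its largest absolute value reaches a target peak.
// Samples are assumed to be floating-point values in the range [-1.0, 1.0].
void normalizePeak(std::vector<float>& samples, float targetPeak = 0.9f) {
    float peak = 0.0f;
    for (float s : samples) {
        peak = std::max(peak, std::fabs(s));   // find the current peak level
    }
    if (peak > 0.0f) {
        float gain = targetPeak / peak;        // gain needed to reach the target level
        for (float& s : samples) {
            s *= gain;                         // apply the same gain to every sample
        }
    }
}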
Next the signal is divided into small segments as short as a few hundredths of a second, or even
thousandths in the case of plosive consonant sounds -- consonant stops produced by obstructing
airflow in the vocal tract – like "p" or "t." The program then matches these segments to known phonemes in the appropriate language. A phoneme is the smallest element of a language – a representation of the sounds we make and put together to form meaningful expressions. There are roughly 40 phonemes in the English language (different linguists have different opinions on the exact number).
The next step seems simple, but it is actually the most difficult to accomplish and is the focus of
most speech recognition research. The program examines phonemes in the context of the other
phonemes around them. It runs the contextual phoneme plot through a complex statistical model
and compares them to a large library of known words, phrases and sentences. The program then
determines what the user was probably saying and either outputs it as text or issues a computer
command.
Early speech recognition systems tried to apply a set of grammatical and syntactical rules to speech.
If the words spoken fit into a certain set of rules, the program could determine what the words were.
However, human language has numerous exceptions to its own rules, even when it's spoken
consistently. Accents, dialects and mannerisms can vastly change the way certain words or phrases
are spoken. Imagine someone from Boston saying the word "barn." He wouldn't pronounce the "r"
at all, and the word comes out rhyming with "John." Or consider the sentence, "I'm going to see the
ocean." Most people don't enunciate their words very carefully. The result might come out as "I'm
goin' da see tha ocean." They run several of the words together with no noticeable break, such as
"I'm goin'" and "the ocean." Rules-based systems were unsuccessful because they couldn't handle
these variations. This also explains why earlier systems could not handle continuous speech – you
had to speak each word separately, with a brief pause in between them.
Today's speech recognition systems use powerful and complicated statistical modeling systems.
These systems use probability and mathematical functions to determine the most likely outcome.
According to John Garofolo, Speech Group Manager at the Information Technology Laboratory of
the National Institute of Standards and Technology, the two models that dominate the field today
are the Hidden Markov Model and neural networks. These methods involve complex mathematical
functions, but essentially, they take the information known to the system to figure out the information hidden from it – the speech.
The Hidden Markov Model is the most common, so we'll take a closer look at that process. In this
model, each phoneme is like a link in a chain, and the completed chain is a word. However, the
chain branches off in different directions as the program attempts to match the digital sound with
the phoneme that's most likely to come next. During this process, the program assigns a probability
score to each phoneme, based on its built-in dictionary and user training.
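To make the chain-of-phonemes idea concrete, here is a minimal Viterbi decoding sketch in plain C++. The two-state phoneme model, its probabilities and the observation sequence are invented purely for illustration; they are not taken from the project or from any real recognizer.

#include <cstdio>
#include <vector>
#include <cmath>

// Toy HMM: states stand for phonemes, observations are indices of acoustic "symbols".
int main() {
    const int N = 2;                                            // number of phoneme states
    double logInit[N]     = {std::log(0.6), std::log(0.4)};     // initial state probabilities
    double logTrans[N][N] = {{std::log(0.7), std::log(0.3)},
                             {std::log(0.4), std::log(0.6)}};   // transition probabilities
    double logEmit[N][2]  = {{std::log(0.8), std::log(0.2)},
                             {std::log(0.3), std::log(0.7)}};   // emission probabilities
    std::vector<int> obs = {0, 1, 1};                           // observed symbol sequence

    std::vector<std::vector<double>> v(obs.size(), std::vector<double>(N));
    std::vector<std::vector<int>> back(obs.size(), std::vector<int>(N, 0));
    for (int s = 0; s < N; ++s)
        v[0][s] = logInit[s] + logEmit[s][obs[0]];
    for (size_t t = 1; t < obs.size(); ++t) {
        for (int s = 0; s < N; ++s) {
            double best = -1e300; int arg = 0;
            for (int p = 0; p < N; ++p) {
                double cand = v[t - 1][p] + logTrans[p][s];     // score of arriving from state p
                if (cand > best) { best = cand; arg = p; }
            }
            v[t][s] = best + logEmit[s][obs[t]];
            back[t][s] = arg;
        }
    }
    // Trace back the most probable phoneme sequence.
    std::vector<int> path(obs.size());
    path.back() = (v.back()[0] > v.back()[1]) ? 0 : 1;
    for (int t = (int)obs.size() - 1; t > 0; --t)
        path[t - 1] = back[t][path[t]];
    for (int s : path) std::printf("%d ", s);
    std::printf("\n");
    return 0;
}

Real recognizers do the same bookkeeping over thousands of states and word-level language models, but the probability-scoring and back-tracing steps are exactly the ones sketched here.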
This process is even more complicated for phrases and sentences – the system has to figure out
where each word stops and starts. The classic example is the phrase "recognize speech," which
sounds a lot like "wreck a nice beach" when you say it very quickly. The program has to analyze the
phonemes using the phrase that came before it in order to get it right.
If a program has a vocabulary of 60,000 words (common in today's programs), a sequence of three words could be any of 216 trillion possibilities. Obviously, even the most powerful computer can't search through every possibility without some help.
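The figure follows directly from the vocabulary size:

$$60{,}000^{3} = 2.16 \times 10^{14} \approx 216 \text{ trillion possible three-word sequences.}$$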
These statistical systems need lots of exemplary training data to reach their optimal performance –
megabytes of text. These training data are used to create acoustic models of words, word lists, and
multi-word probability networks. There is some art in how one selects, compiles and prepares this training data for "digestion" by the system, and in how the system models are "tuned" to a particular application. These details can make the difference between a well-performing system and a poorly performing one.
Chapter 4
Structure of the system
4.1 Introduction
The structure of the system and how it works are discussed in this chapter. We have designed and built a robot with the capability to pick up and drop limited-sized objects, applying force according to the object's stiffness. The robot is capable of responding to 15 distinct verbal commands. In our project we have used the Google speech recognition module in order to understand verbal commands. We also categorized the objects into six specific categories according to the amount of gripping force required to lift them. An android application was used to communicate with the robot through Bluetooth. The application decodes the human speech into an array of characters which are transmitted to the robot using Bluetooth technology. The robot uses a microcontroller which decodes the messages into executable functions. The working procedure of the system is explained first. For the sake of clarity, we have also explained the fifteen verbal commands the robot responds to.
4.2.1 Procedure
First, human speech command is picked up through an android application that uses Google speech
recognition to convert the verbal command into an array of characters. This array of characters is
sent to the microcontroller on the robot using a Bluetooth module. The microcontroller receives the
data, and processes it into an executable function. There are 15 functions that we have already
defined for the robot to execute, such as moving forward or backward, picking up and dropping
objects with only the force required, and so on. If the verbal command matches one of these
predefined functions, that function is performed by the robot. If it does not match any of the predefined functions, the command is ignored.
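As an illustrative sketch only (the pin assignments, the end-of-command marker and the handler names are assumptions, not the project's actual code), the receive-and-dispatch step described above might look like this on the microcontroller:

#include <SoftwareSerial.h>

SoftwareSerial bt(10, 11);           // assumed RX/TX pins connected to the HC-05 module

void moveForward()  { /* drive both wheel motors forward */ }
void moveBackward() { /* drive both wheel motors in reverse */ }
void stopRobot()    { /* stop all motors */ }
// ... one handler per remaining predefined command

void setup() {
  bt.begin(9600);                    // assumed data-mode baud rate; adjust to the module's setting
}

void loop() {
  static String cmd = "";
  while (bt.available()) {
    char c = bt.read();              // characters arrive one by one from the android app
    if (c == '\n') {                 // assumed end-of-command marker
      if      (cmd == "move forward")  moveForward();
      else if (cmd == "move backward") moveBackward();
      else if (cmd == "stop")          stopRobot();
      // ... match the remaining predefined commands here
      // an unrecognized command falls through and is simply ignored
      cmd = "";
    } else {
      cmd += c;
    }
  }
}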
4.2.2 Functions
The 15 functions that the robot is capable of performing are as follows.
1. Move forward
The robot will start moving forward from its current position when it receives this command. It
will keep on moving until it is at a distance of 12 cm from any object or obstacle, which is within
its range of distance needed for it to pick up any object. The robot will continuously check the
distance in front of it using the ultrasonic sensor placed in front of the robot body.
2. Move backward
The robot will move backwards a distance of 10 cm every time it receives this command. This
will allow it to move forward and come within its pick-up zone for being able to hold an object if
needed.
3. Move left
The robot will move left by an angle of 15° every time it receives the command to “move left”.
4. Move right
The robot will move right by an angle of 15° every time it receives the command to “move
right”, and then can be commanded to move in that direction through the “move forward”
command.
5. Stop
Whenever the robot receives the verbal command “stop”, it will come to a halt from a state of
motion.
6. Move forward X cm
Upon receiving this command, the robot computes the distance it needs to move, and will then move. The robot has
an ultrasonic sensor attached in front of the body using which it can determine whether there are
any obstacles in front of it or not and also the distance between the obstacles and itself. When the
robot is in steady state it measures the distance in front of it. When the command is given to
move forward X centimeters it subtracts the X centimeters from the distance of the obstacle
received by the ultrasonic sensor and traverses forward until that distance is achieved.
7. Pick up the object
When this command is given and an object is about 12 cm in front of the robot, the robot grabs it and picks it up. The reason for the object to be about 12 cm in front is that the gripper has a grabbing range of 8 cm to 15 cm in front of the robot. When this grabbing command is given, the robot will lower the arm using a DC motor for a specific time; when the arm is sufficiently lowered, the gripper will grab the object with another motor, and the arm motor will then move in the opposite direction to lift the object upward so that the path between the sonar and obstacles remains open. To clarify the details of these operations, the mechanical diagrams are elaborately discussed in Chapter 8.
8. Drop the object
When this command is given, to drop the object the robot will lower the arm using the arm motor, the gripper motor will release the object, and the arm motor will pull the arm up again. After that the robot will move backward 10 centimeters. The required distance to traverse is calculated using the ultrasonic sensor's reading from the object. After that the robot will remain steady and wait for the next command.
The following seven functions allow the robot to categorize the weight of the object it needs to pick
up, to determine the force it needs to exert to hold the object and lift it. There are six categories
which we have defined for classifying the weight of the object. The maximum weight the robot is
capable of lifting is 1 kg, and this is defined as “very very heavy”, which is the first category of the
object’s weight. The last category is “very very light” which is for objects within the range of 0 g to
170 g. The verbal command will have to specify the category of the object’s weight.
9. Object is very very light
When this command is given, the robot will consider the weight of the object to be in the “very
very light” category, which is for objects of weight within the range of 0 g to 170 g. When the
robot is commanded to pick up an object afterwards, it will exert only the force required to pick
up an object of this weight, and will not exert any more force, so that the object is not crushed or
deformed.
10. Object is very light
When this command is given, the robot will consider the weight of the object to be in the "very light" category, which is for objects within the range of 171 g to 340 g. It will then exert only the force appropriate to pick up an object within this range of weight when it is commanded to do so, and will not exert a force any greater or smaller than that.
Commands 11 through 14 follow the same pattern for the remaining weight categories, up to the "very very heavy" category for objects weighing up to 1 kg.
15. Object is X kg
This command tells the robot the weight of the object, so that it can decide which category of weight the object belongs to and exert the appropriate force when it is commanded to grab it.
4.3 Workflow and Algorithms
This section describes the sequence of events that result in a verbal command being processed and a corresponding function being executed by the robot. The android application uses the Google speech recognition module to convert the human speech command into an array of characters and sends the processed data to the microcontroller on the robot, via Bluetooth.
The microcontroller processes the data into an executable function and then the robot executes the
command. This action is explained through the flow diagram in Fig. 4.1.
Fig. 4.1. Flow of a verbal command from human speech, through the android application and the Bluetooth link, to the microcontroller.
4.3.2 Explanation of the algorithm for command execution
Fig. 4.2 explains the algorithm of the robot’s execution of verbal commands. After the
microcontroller accepts the character array, it checks whether or not it is an acceptable command string. If not, it checks whether or not all previous commands have been finished. If there are no commands remaining to be executed, the robot is stopped and all commands are set as finished. However, if there are any commands remaining, the running commands are executed.
If it is an acceptable command, the program checks whether the command is “stop” or not. If yes,
the robot's motion is stopped, and all commands are set as finished. The system then goes back to checking for another array of characters.
If the command that is being processed is not "stop" and belongs to one of the other 14 predefined commands, it is selected and executed. After completion of execution of this command, the robot is stopped and all commands are set as finished. Then the program again starts checking for another
array of characters.
While running commands are being executed, if the command is either "move
forward” or “move forward X cm”, a check is made for whether there are any obstacles within 12 cm
of the robot’s front. If an obstacle is detected, the robot is stopped, and all commands set as finished.
Then the program again starts checking for another array of characters.
In case of no obstacle being detected, the robot keeps moving forward either indefinitely or until it
reaches X cm, depending on which command it was given, and the program again goes back to
checking for another array of characters (i.e. it waits for a new command).
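A minimal sketch of the "move forward X cm" logic described above follows. The pin numbers, trigger timing and motor helper are assumptions introduced for illustration, not the project's actual code.

const int TRIG_PIN = 7;              // assumed sonar trigger pin
const int ECHO_PIN = 8;              // assumed sonar echo pin

// Read the frontal distance in centimeters from an HC-SR04 style sonar.
long readDistanceCm() {
  digitalWrite(TRIG_PIN, LOW);  delayMicroseconds(2);
  digitalWrite(TRIG_PIN, HIGH); delayMicroseconds(10);   // at least 10 us trigger pulse
  digitalWrite(TRIG_PIN, LOW);
  long duration = pulseIn(ECHO_PIN, HIGH);               // echo high time in microseconds
  return duration / 58;                                   // roughly 58 us per centimeter (round trip)
}

void driveForward(bool on) { /* switch the wheel motors on or off via the motor driver */ }

// Move forward by x centimeters, never entering the 12 cm safety / pick-up margin.
void moveForwardXcm(long x) {
  long target = readDistanceCm() - x;    // obstacle distance expected once we have moved x cm
  if (target < 12) target = 12;
  driveForward(true);
  while (readDistanceCm() > target) {
    // keep polling the sonar while the robot moves
  }
  driveForward(false);
}

void setup() {
  pinMode(TRIG_PIN, OUTPUT);
  pinMode(ECHO_PIN, INPUT);
}

void loop() {
  // example: moveForwardXcm(20);
}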
Whenever the "Pick up object" command is given, the robot moves forward by a distance of 5 cm with its gripper opened so that the object becomes situated within the pickup zone of the robot. The robot then
picks the object according to its category of weight. By default the object category is set to “very
very light”, which is for objects of weight within 0 g to 170 g. There are six specific commands for
object categorization. These six commands can be given during execution of both “move forward”
and "move forward X cm" functions, or after the "stop" function is executed. "Object is X kg" sets the object category according to the weight it belongs to, where X is any value between 1 and 1000 g.
The Force Sensitive Resistor (FSR) is set at the gripper. Whenever the robot is required to pick up
the object according to its weight category, it measures the amount of force required to pick up the
object. We have set some specific values of FSR for different weight categories. As we are using a
microcontroller to measure the force amount, we have applied analog input values to set the force
category. The robot will continue to grip the object until it reaches the required force set for its category. For example, if we give the command "Object is heavy" and then give the command "Pick up the object", the robot will squeeze the gripper holding the FSR until it gets an analog input value between 480 and 500, which is given by the Force Sensitive Resistor and read by the microcontroller. More explanation of the FSR and its connection configuration with the microcontroller is given in the next section.
Weight category (gram):      0-170     171-340   341-510   511-680   681-850   851-1000
FSR value (analog input):    950-970   780-820   600-620   480-500   320-340   180-200
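The grip-until-threshold behaviour can be sketched as follows. The pin number, the motor helpers and the loop structure are assumptions for illustration; only the analog bands come from the table above.

const int FSR_PIN = A0;                    // assumed analog pin for the FSR voltage divider

// Upper bound of the FSR analog band for each weight category, ordered from
// "very very light" to "very very heavy" (values taken from the table above).
const int FSR_UPPER[6] = {970, 820, 620, 500, 340, 200};

void closeGripperStep() { /* advance the gripper motor a small amount */ }
void stopGripper()      { /* stop the gripper motor */ }

// Squeeze the gripper until the FSR reading drops into the band for the given category.
// In this setup the analog reading falls as the applied force rises, so heavier
// categories correspond to lower target values.
void gripForCategory(int category) {        // category: 0 (lightest) .. 5 (heaviest)
  while (analogRead(FSR_PIN) > FSR_UPPER[category]) {
    closeGripperStep();                      // keep squeezing until the band is reached
  }
  stopGripper();                             // required force for this category reached
}

void setup() {}

void loop() {
  // example: gripForCategory(3);            // "heavy": stop once the reading is at most 500
}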
4.4 Equipment and Schematic Diagrams
The Schematic diagram of the microcontroller part is divided into two blocks (as shown in Fig. 4.3),
and schematic diagrams of the two individual blocks are also given below (Fig. 4.4 and Fig. 4.5).
Block A shows the connections between microcontroller and Force Sensitive Resistor (FSR),
Bluetooth Module, Sonar module (Fig. 4.4). Block B shows the connections between microcontroller
and the two motor drivers which control Wheel motors and Gripper motors (Fig. 4.5).
The ultrasonic sonar module (HC-SR04) is used to measure the frontal distance of any object from the robot. It helps the robot navigate and informs us if any object is within the vicinity of the robot. It also confirms whether the object is within the pickup zone of the robot. The Bluetooth module (HC-05) establishes a connection between the android application and the microcontroller to transmit data from the app as an array of characters. An FSR 400 is used to measure the amount of force applied while grabbing an object, in order to pick it up. Two L293D motor drivers are used to control the four motors.
4.5 Summary
In this chapter, we have described how the system works. The sequence of events resulting in a
verbal command being processed, sent to the microcontroller, and then being executed has been
described in detail in this chapter. We have also given diagrams of the workflow and the algorithm
of command execution, the robot circuit and the equipment used. Detailed descriptions of the modules used in the system are given in the following chapters.
Chapter 7
Modules used in this system
7.1 Introduction
The different modules used in our system and their functions are described in this chapter.
The following modules have been used in the construction of the robot: the sonar module (HC-SR04), the Bluetooth module (HC-05), the force sensitive resistor (FSR 400), and the motor driver (L293D).
7.2 Ultrasonic Sonar Sensor (HC-SR04)
The HC-SR04 ultrasonic sensor provides non-contact distance measurement, and its ranging accuracy can reach 3 mm. The module includes an ultrasonic transmitter, a receiver and a control circuit. Its basic working principle is:
(1) The module is triggered by a high-level pulse of at least 10 microseconds on the Trig pin.
(2) The module then automatically sends eight 40 kHz pulses and detects whether a pulse signal comes back.
(3) If the signal comes back, the duration of the high level on the Echo pin is the time from sending the ultrasonic pulse to receiving its echo.
There are only four pins on the HC-SR04 that need to be connected: VCC (5 V supply), Trig (trigger), Echo (output), and GND (0 V ground).
Fig. 7.1. Sonar sensor module hc-sr04
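The echo time measured on the Echo pin converts to distance using the speed of sound (approximately 343 m/s at room temperature; the exact value depends on ambient conditions):

$$d = \frac{t_{\text{echo}} \times 343\ \text{m/s}}{2} \quad\Longrightarrow\quad d\,[\text{cm}] \approx \frac{t_{\text{echo}}\,[\mu\text{s}]}{58},$$

where the division by two accounts for the round trip of the pulse.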
7.3 Bluetooth Module (HC-05)
The HC-05 is pre-configured as a slave Bluetooth device. Once it is paired to a master Bluetooth device such as a PC, smartphone or tablet, its operation becomes transparent to the user: no user code specific to the Bluetooth link is required.
The HC-05 supports two work modes: Command mode and Data mode. The work mode of the HC-05 can be switched by the onboard push button. The HC-05 is put in Command mode if the push button is activated. In Command mode, the user can change the system parameters (e.g. pin code, baud rate, etc.) using the host controller itself or a PC running terminal software through a serial-to-TTL converter. Any changes made to system parameters are retained even after power is removed. Power-cycling the HC-05 will set it back to Data mode, in which transparent UART data transfer with a connected remote device is available.
The HC-05 can be re-configured by the user to work as a master Bluetooth device using a set of AT commands. Once configured as master, it can automatically pair with an HC-05 in its default slave mode.
The HC-05 works with a supply voltage of 3.6 VDC to 6 VDC; however, the logic level of the RXD pin is 3.3 V and is not 5 V tolerant. It can be damaged if connected directly to a 5 V device (e.g. Arduino Uno and Mega), so a logic level converter is recommended to protect the HC-05.
Features:
Bluetooth v2.0+EDR
Supported baud rate: 9600, 19200, 38400 (default), 57600, 115200, 230400, and 460800.
Passkey: 1234
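A common way to exercise the Command mode described above is a simple serial pass-through sketch. The pin numbers are assumptions, and the 38400 baud rate matches the default listed above for AT-command use.

#include <SoftwareSerial.h>

SoftwareSerial bt(10, 11);     // assumed pins: RX from HC-05 TXD, TX to HC-05 RXD (via level shifter)

void setup() {
  Serial.begin(9600);          // serial monitor
  bt.begin(38400);             // HC-05 command-mode baud rate (hold the push button while powering up)
}

void loop() {
  if (bt.available())     Serial.write(bt.read());   // relay module responses to the monitor
  if (Serial.available()) bt.write(Serial.read());   // relay typed AT commands to the module
}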
7.4 Force Sensitive Resistor (FSR 400)
The model 400 FSR is a single-zone Force Sensing Resistor optimized for use in human touch
control of electronic devices such as automotive electronics, medical systems, and in industrial
and robotics applications. FSRs are two-wire devices. They are robust polymer thick film
(PTF) sensors that exhibit a decrease in resistance with increase in force applied to the surface
of the sensor. It has a 5.1mm diameter active area and is available in 4 connection options.
Features:
Cost-effective
Ultra-thin
Applications:
Detect liquid blockage
7.5 Motor Driver (L293D)
7.5.1 Working principle
Motor drivers act as current amplifiers, since they take a low-current control signal and provide a higher-current signal that can drive a motor. The L293D contains two built-in H-bridge driver circuits. In its common mode of operation, two DC motors can be driven simultaneously, both in forward and reverse direction. The operation of the two motors is controlled by input logic at pins 2 & 7 and 10 & 15. Input logic 00 or 11 will stop the corresponding motor; logic 01 and 10 will rotate it in clockwise and anticlockwise directions, respectively.
Enable pins 1 and 9 (corresponding to the two motors) must be high for the motors to start operating. When an enable input is high, the associated driver is enabled; as a result, the outputs become active and work in phase with their inputs. Similarly, when the enable input is low, that driver is disabled, and its outputs are off and in the high-impedance state. An H-bridge allows a DC motor to be driven in either direction; such circuits are often used in robotics and other applications to allow DC motors to run forward and backward.
7.5.2 Table of operation
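The table below is reconstructed from the description in section 7.5.1 rather than copied from the datasheet, and the sketch after it uses assumed Arduino pin numbers for illustration.

Enable    Input A    Input B    Motor action
High      Low        Low        Stop
High      High       High       Stop
High      Low        High       Rotate clockwise
High      High       Low        Rotate anticlockwise
Low       X          X          Off (outputs in high impedance)

const int EN1 = 9;     // assumed Arduino pin wired to L293D enable 1
const int IN1 = 2;     // assumed pin wired to L293D input 1
const int IN2 = 3;     // assumed pin wired to L293D input 2

void motorClockwise()     { digitalWrite(EN1, HIGH); digitalWrite(IN1, LOW);  digitalWrite(IN2, HIGH); }
void motorAnticlockwise() { digitalWrite(EN1, HIGH); digitalWrite(IN1, HIGH); digitalWrite(IN2, LOW);  }
void motorStop()          { digitalWrite(IN1, LOW);  digitalWrite(IN2, LOW); }   // input logic 00 stops the motor

void setup() {
  pinMode(EN1, OUTPUT); pinMode(IN1, OUTPUT); pinMode(IN2, OUTPUT);
}

void loop() {
  motorClockwise();     delay(1000);
  motorAnticlockwise(); delay(1000);
  motorStop();          delay(1000);
}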
7.6 Summary
In this chapter, we have described the different modules used in our system – sonar module,
Bluetooth module, force sensitive resistor and motor driver L293D, as clearly and briefly as
possible. The following chapter describes the mechanical construction of the robot.
Chapter 8
Mechanical Description
8.1 Introduction
The mechanical construction of the robot is described in detail in this chapter. The mechanical part
of the robot has been designed and made from scratch entirely by us. We designed a unique gripper
that is capable of gripping, lifting and dropping objects. The 1st model of the robot was built using PVC sheets; we used 5 mm and 3 mm thick PVC boards for the construction of the body. The final model was built using wooden boards, which are quite flexible, and the robot was not fully well built. The mechanical measurements and explanations of the construction are discussed
below.
The robot has two bases, namely upper base and lower base, situated 6.3 cm apart from each other.
The two bases are connected via base joints. The electrical components are placed on the lower base.
The gripper needs two N20 (100 rpm screw-shaft) motors. One motor (Gripper motor 1 in Fig. 8.1) is used for gripping the object and the other motor (Gripper motor 2 in Fig. 8.1) helps in lifting the object. The length of the screw shaft of the 100 rpm motor is around 3.3 cm. When the screw shaft of Gripper motor 2 rotates, the screw moves forward or backward, which in turn moves Joint 1. When Joint 1 moves forward or backward, Lift Hand 1 moves up or down respectively, which creates a rotation of Lift Hand 2 about Pivot 2, lifting the object up or down as required. Lift Hand 1 and Lift Hand 2 are connected by Pivot 1.
Fig. 8.1. Side view of the robot (Diagram drawn using SketchUp)
Lift Hand 2 is connected to the upper base of the robot at the Pivot 2 joint. As there is a 6.3 cm gap
between the two bases, there is sufficient space for the electrical equipment to be placed between the
upper base and the lower base. One castor ball is placed at the front to allow free movement of the
robot.
The sonar module placed on the lower base determines the distance of any object from the robot. Under normal conditions, if the distance of an object is around 5 cm from the sonar, that object is selected as an eligible candidate for the lifting operation. There are also two 200 rpm N20 motors for wheel rotation. In Fig. 8.2 we can see the upper view of the robot. The width of the robot is 18 centimeters and the length of the robot is 22 centimeters without the gripper. With the gripper, the length of the robot is around 35 centimeters.
Fig. 8.2. Upper view of the robot.
While researching for the mechanical part of our robot, we came across many existing gripper
constructions. One such construction has the capability to grasp irregular objects [4]; however, the construction of such a gripper requires more space and mechanical work. Another paper proposed a pneumatic-muscle-actuated gripper [5], but it had limitations in grasping irregular objects. Here, in Fig. 8.3, we propose a very simple gripper design which has the capability of holding limited-sized irregular objects. In Fig. 8.3, the motor is situated in the gripper base.
Whenever the screw shaft of the motor rotates, the screw moves forward and backward, which in
turn makes Hand 3 move up and down. Hand 3 is connected to Hand 1 at Pivot 2. Whenever
Hand 3 moves, it makes Hand 1 rotate around Pivot 1, which makes Hand 2 move accordingly.
Fig. 8.3 The gripper construction (Diagram drawn using SketchUp)
Hand 2 has foam grip attached to it which provides more gripping friction to allow the gripper to
grab objects better. Hand 2 is flexible and rotatable around Pivot 3 which helps in gripping irregular
objects. As the construction is symmetrical, this creates gripping action even towards irregular
objects.
Fig. 8.4 Model 1 trying to pick up object.
The first model had a total weight of 11 kg. It was mechanically heavy and was able to maneuver on sand due to its tank-track system. The mechanical arm was built using wood. The second model had seven degrees of freedom: it had one arm, and the arm itself had five degrees of freedom. The 3rd model had 12 degrees of freedom, with each arm having five degrees of freedom. Other than the arms, the robot had two strong motors on wheels for locomotion. The 2nd model was around 18 kg in total weight.
Fig. 8.5. The robotic arm, 2nd version: upper view and side view.
An explanation of how each of the mechanical components works has also been given here.
Chapter 11
Results and Discussion
11.1 Introduction
The results and findings of our project are discussed in this chapter. We have solved the equations derived from the inverse square law of sound propagation (sound attenuation) to find the distance of the sound source in a 3D coordinate system using three microphones. Also, we have compared the
speech recognition performance of our self-developed software with that of Google Speech
Recognition API.
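As a sketch of the relations involved (the symbols below are introduced here for illustration and are not taken from the report's own derivation): if the source radiates acoustic power $P$ and microphone $k$ at position $(x_k, y_k, z_k)$ measures intensity $I_k$, the inverse square law gives

$$I_k = \frac{P}{4\pi r_k^{2}} \quad\Longrightarrow\quad r_k = \sqrt{\frac{P}{4\pi I_k}}, \qquad k = 1, 2, 3,$$

and each distance $r_k$ defines a sphere

$$(x - x_k)^2 + (y - y_k)^2 + (z - z_k)^2 = r_k^2,$$

whose common intersection (trilateration) yields the source position $(x, y, z)$. If $P$ is unknown, the intensity ratios $I_1 : I_2 : I_3$ still constrain the ratios of the distances.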
We calculated the autocorrelation for each of the three microphones and then applied the Fourier transform in order to get a clear frequency, which gave us significantly similar frequencies from the three microphones.
In Fig. 11.2, the topmost plot is the result data from microphone 1, the middle one is the result data from microphone 2, and the bottom one is the result data from microphone 3.
At first we calculated the frequency through autocorrelation and fast Fourier transformation. We then took the average of the three per-microphone frequencies as the resultant frequency.
Recovered values from the corresponding results table: 50 51 52 50; 75 72 70 67.
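A minimal sketch of frequency estimation by autocorrelation, in plain C++ and written for illustration only (the sampling rate and the synthetic test signal are assumptions, not the project's recorded data):

#include <vector>
#include <cmath>
#include <cstdio>

const double PI = 3.14159265358979;

// Estimate the dominant frequency by searching for the lag with the strongest
// autocorrelation, restricted to a plausible frequency range.
double estimateFrequency(const std::vector<double>& x, double fs,
                         double minFreq, double maxFreq) {
    int minLag = (int)(fs / maxFreq);              // smallest lag considered
    int maxLag = (int)(fs / minFreq);              // largest lag considered
    if (maxLag >= (int)x.size()) maxLag = (int)x.size() - 1;
    int bestLag = 0;
    double bestVal = -1e300;
    for (int lag = minLag; lag <= maxLag; ++lag) {
        double r = 0.0;
        for (int i = 0; i + lag < (int)x.size(); ++i)
            r += x[i] * x[i + lag];                // autocorrelation at this lag
        if (r > bestVal) { bestVal = r; bestLag = lag; }
    }
    return bestLag > 0 ? fs / bestLag : 0.0;
}

int main() {
    const double fs = 8000.0;                      // assumed sampling rate in Hz
    std::vector<double> samples(1024);
    for (int i = 0; i < (int)samples.size(); ++i)
        samples[i] = std::sin(2.0 * PI * 50.0 * i / fs);   // synthetic 50 Hz test tone
    std::printf("estimated frequency: %.1f Hz\n",
                estimateFrequency(samples, fs, 40.0, 500.0));
    return 0;
}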
Then we calculated the coordinates from the three microphone readings and obtained the following result: for an actual source position of (40, 50, 30), the calculated position was (40, 52, 32).
From the above findings we can conclude that accuracy decreases with greater distance, yet we can still locate the sound source with reasonable precision.
We also tried other existing speech recognition software, but Google's speech response was comparatively better, which is why we used it. Some of the research we came across included robot interaction for speech development [7], and other research included robotic arm control using speech processing [10]. There was also research on controlling operating systems using speech recognition [12]; examples included opening and closing the CD drive and other basic operations. Out of all of these, we chose Google speech recognition because evaluations of Google speech were better compared to the other speech recognition systems [14]. We tested 300 commands from our command library, spoken by 10 people. Out of these, 267 commands were recognized accurately, giving us an accuracy rate of 89%.
The robot's response to objects of different weights was not the same: due to the different weights, the time required to lift an object differed from object to object using the same motor, and the current rating differed between categories. Taking 10 objects from each weight category, the time required to lift the objects was measured and averaged to obtain the time required by the motor to lift objects in each category.
Object category:               Very very light   Very light   Light     Heavy     Very heavy   Very very heavy
Weight category (gram):        0-170             171-340      341-510   511-680   681-850      851-1000
Time required to lift (sec):   3                 6            8         9         11           13
Other than a few exceptions, the robot was able to lift almost all objects. Our limitation on object size is 15 × 15 × 20 cm, i.e. the object must fit on a square base with an area of less than 225 cm², and the height of the object must be less than 20 cm, to qualify as a candidate for the lifting operation. In Fig. 11.3 we can see the robot lifting a "very heavy" category object. In Fig. 11.4 we see the robot lifting a "very very light" category object, which it was easily able to transport from one place to another through verbal commands. In Fig. 11.5 we see the robot lifting an object from another weight category.
Fig. 11.3. Robot lifting “very heavy” category object
Fig. 11.4. Robot trying to lift “very very light” category object
11.3.1 Limitations
The robot has some limitations while lifting objects: it cannot lift all objects. As discussed before, the maximum weight of object this robot can lift is 1 kg. Also, the object to be lifted must fit within 15 × 15 × 20 centimeters, i.e. the object must fit on a square base with an area of less than 225 square centimeters, and the height of the object must be less than 20 centimeters.
The robot has been made using PVC sheet. As PVC sheets are not strong enough, the sheets sometimes bend while lifting heavy objects. We intend to build the robot using stronger materials in the future.
The robot uses DC motors and measures distance using SONAR. While in motion, the robot does not always stop at the exact instant at which it is required to stop, due to inertia; however, at the robot's normal speed this deviation is small.
11.4 Summary
In this chapter, we have described and discussed the results of our project, which involved locating
the sound source and the construction of a speech-responsive robot with object categorization.
Chapter 12
Conclusion
In this project, we have focused on two things – sound source localization and building a speech-responsive robot.
Firstly, we were able to locate the sound source in a 3D coordinate system. There is scope for extending this research in the future; for example, an autonomous robot could detect the position of a speaking person relative to the robot, even in darkness, through sound analysis alone.
We have also successfully constructed the robot, and designed and built a unique gripper for the
robot’s hands. The robot has the capability to respond to speech and transport limited sized objects
using Google Speech recognition. The accuracy rate for successful response towards speech was
89%. The accuracy rate for successful completion of a function was 94%. There were hardly any accidents, as the robot navigated using sonar: whenever there was an object in very close vicinity, the robot stopped.
We hope to build a stronger, bigger and more robust robot in the future which can lift heavier objects. This robot can be very useful for the disabled, as it will provide them with a useful means of doing daily work with ease, simply by giving verbal commands for the transportation of objects according to their needs. Also, a robot that can help us do our work easily without moving our hands would be industrially profitable. It can even aid people in difficult situations such as rescue operations.
Thus, development in this field could open up boundless possibilities and new applications that can have a great impact on the lives of people.
Bibliography
[1] Norberto Pires, J. (2005). Robot-by-voice: Experiments on commanding an industrial robot using the human voice.
[2] Jain, R., & Saxena, S. K. (2011). Voice Automated Mobile Robot. International Journal of
Appendices
Appendix A
Codes Related to Arduino
For example –
Codes Related to Arduino
This code is in C++.
#include <SoftwareSerial.h>
#include <Servo.h>
Servo myservo;
#define RxD 53