Text-to-Speech Device For Visually Impaired People: International Journal of Pure and Applied Mathematics July 2018
1 author:
Shirly Edward
SRM Institute of Science and Technology
All content following this page was uploaded by Shirly Edward on 06 July 2018.
Abstract - People with low vision or visual impairment cannot clearly see words and letters in ordinary newsprint, books and magazines. This makes reading difficult, which can disturb the learning process and slow a person's intellectual development. A device is therefore needed to help them read. We have developed one such device that can scan and read any kind of text by converting it to a voice message. The device takes images, PDFs, documents, textbooks and newspapers as input and produces voice as output. It contains separate modules for image processing and voice processing, and it can play and stop the output while reading. It has a low error rate, a short processing time and a low cost. A Raspberry Pi 3 was used to develop the device. The device acts as an artificial eye for visually impaired people and does not need any human supervision.

I. Introduction

Based on a survey by the World Health Organization in 2010, the total population of India is 1181.4 million, out of which 152.238 million people suffer from blindness, low vision and visual impairment [1]. Figure 1 shows the number of people who are blind, with low vision and visually impaired (in thousands) per million population.

Figure 1. Ratio of people with visual impairment (courtesy: WHO, 2010)

According to Dr. Bjorn, impaired vision can have negative effects on learning and social interaction. It can affect the natural development of intelligence and of academic, social and professional ability [2]. The sight of people who are visually impaired cannot be restored with glasses. As a result, people with low vision cannot read normally printed paper; they can only read if the characters or letters are big enough. This condition lengthens the reading process and tires the eyes. To help improve the quality of life for people with low vision, a tool for reading text is needed. The degree of vision impairment varies between individuals with low vision. Therefore, the device developed in this work uses another sensory function to receive information from a text. The device is specifically designed for people with low vision, so that they can easily use it without having to ask others for help and can use it to support their academic and intellectual ability.

II. Hardware and Software

A. Raspberry Pi 3

The Raspberry Pi is a small computer that fits in the palm of one's hand. It runs Linux as well as a few other low-power operating systems. It was created by the Raspberry Pi Foundation. Raspbian is the official operating system of the Raspberry Pi. Other operating systems exist, but they are mostly made for specific projects. The board is a single-board computer powered by a Broadcom BCM2837 64-bit quad-core processor running at 1.2 GHz with 1 GB of RAM. The Raspberry Pi has gone through several iterations since its launch in 2012. The latest version is the Raspberry Pi 3, which is available for 3200 INR. It is a complete computer that includes an HDMI output, up to four USB ports, Wi-Fi and Bluetooth.
International Journal of Pure and Applied Mathematics Special Issue
C. Python IDLE
Binarization is the process of converting a grey-scale image into a binary image, which consists only of black and white pixels. Binarization is done by thresholding: every pixel with a value greater than 170 turns white (gets the value 255) and every pixel with a value less than 170 becomes black (gets the value 0).

(c) Image Processing module using Optical Character Recognition

This module consists of OCR, or Optical Character Recognition. It targets typewritten text, one glyph or character at a time. OCR uses an optical mechanism to recognize characters automatically. The technology imitates the human sense of sight: the camera replaces the eye, and the image is processed in the computer as a substitute for the human brain. The OCR engine requires a setup state and initial steps in order to receive the best possible input and to reduce the limitations of the engine. The setup state is adapted to the specifications of the target device, so that the output has a minimum error rate and a short processing time. This module does not change the OCR algorithm, but adds a preparatory state to give the OCR the best input. OCR is generally an "offline" process, which analyses a static document.

Tesseract OCR is a type of OCR engine with matrix matching. The Tesseract engine was selected because of its wide acceptance, its flexibility and extensibility, and the fact that many researchers are actively developing it [4]. Defects in machines, such as distortion at the edges and dim lighting, make it difficult for most OCR engines to obtain high-accuracy text.

Tesseract OCR Implementation

The input image captured by the camera has a size of 8 MP or 215 ppi (pixels per inch) [5]. As per the specifications of the Tesseract OCR engine, the minimum character size that can be read is 20 pixels for uppercase letters. Tesseract OCR accuracy decreases with a font size of 8 pt.

The software processes the input image and converts it into text format. The image is taken by the user via a GPIO pin that is connected to a tactile key, making use of an interrupt function [5]. The picture is captured using the raspistill program with its sharpness mode to sharpen the image. The resulting image is in .jpg format with a resolution of 3280*2464 pixels.
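The fixed-threshold binarization described in this section can be illustrated with a minimal sketch. The function name and the list-of-rows image representation are assumptions made for illustration, and treating a pixel exactly equal to 170 as black is also an assumption, since the text only specifies values above and below the threshold.

```python
THRESHOLD = 170  # threshold value given in the text


def binarize(gray_rows, threshold=THRESHOLD):
    """Return a black-and-white copy of a grey-scale image.

    gray_rows is a list of rows, each a list of 0-255 intensities.
    Pixels above the threshold become 255 (white); all others become 0
    (black). Mapping a pixel equal to the threshold to black is an
    assumption; the text does not cover that case.
    """
    return [[255 if pixel > threshold else 0 for pixel in row]
            for row in gray_rows]


# A tiny 2x3 "image" with values on both sides of the threshold.
sample = [[0, 169, 170],
          [171, 200, 255]]
binary = binarize(sample)
```

On the device itself the same rule would more likely be applied with an image library, but the per-pixel logic is the same.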
[Figure: OCR processing flow. Input (gray or color image) -> Adaptive Thresholding -> Binary image -> Connected Component Analysis -> Character Outlines -> Character Outlines organized into words -> Recognize word pass 1 -> Recognize word pass 2 -> Extracted Text]
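The capture and recognition steps described in this section can be sketched as two command lines, one for raspistill and one for the Tesseract command-line tool. This is only an illustration under stated assumptions: the helper names, the file names and the sharpness value of 50 are hypothetical, since the paper mentions a sharpness mode but does not give the setting that was used.

```python
def raspistill_command(output_path, sharpness=50, width=3280, height=2464):
    """Build a raspistill command that captures one sharpened JPEG.

    The default resolution matches the 3280*2464 image reported in the
    text. The default sharpness is an assumed value.
    """
    if not -100 <= sharpness <= 100:
        raise ValueError("raspistill sharpness must be between -100 and 100")
    return [
        "raspistill",
        "-o", output_path,        # output file (JPEG by default)
        "-w", str(width),         # image width in pixels
        "-h", str(height),        # image height in pixels
        "-sh", str(sharpness),    # sharpness setting
    ]


def tesseract_command(image_path, output_base, language="eng"):
    """Build a tesseract command; the text is written to output_base + '.txt'."""
    return ["tesseract", image_path, output_base, "-l", language]


capture_cmd = raspistill_command("page.jpg")
ocr_cmd = tesseract_command("page.jpg", "page")
```

On a Raspberry Pi with the camera and Tesseract installed, each list could be passed to `subprocess.run(cmd, check=True)`; the sketch only builds the commands so that it can be read and checked without the hardware.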
languages. In this research work, an English TTS system is used for reading the text.

Flite is a lighter version of Festival built specifically for embedded systems. It has commands that make it easier to use than Festival on the command line [6], and it runs faster than Festival. eSpeak is a compact open-source software speech synthesizer for English and other languages, for operating systems such as Linux and Windows. eSpeak uses a "formant synthesis" method, which allows many languages to be provided in a small size. The speech is clear and can be used at high speeds.

TTS (Text-to-Speech) is a system that converts text input into speech. In principle, a Text-to-Speech system consists of two subsystems:

a). Text to Phoneme converter
The text to phoneme converter converts a sentence, given as text in a particular language, into a series of codes that usually represent the phonemes together with their duration and pitch. This part is language dependent.

b). Phoneme to Speech converter
The phoneme to speech converter accepts as input the phoneme codes, together with the pitch and duration of the phonemes, produced by the previous part, and converts them into speech.

4. Design Implementation

The function of the main program is to provide various commands to retrieve, process and convert the input image into a sound signal. Various GPIO pins are allotted to control module operations such as capture, play, pause and stop, and to switch off the voice output.
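A main program that maps GPIO buttons to capture, play, pause and stop might be organized as in the following sketch. It is not the authors' code: the BCM pin numbers, the class and function names, the state names and the eSpeak speed are all hypothetical, and the hardware binding assumes the RPi.GPIO package is available on the device.

```python
# Hypothetical BCM pin numbers; the paper does not list the actual wiring.
BUTTON_PINS = {17: "capture", 27: "play", 22: "pause", 23: "stop"}


class ReaderController:
    """Tracks the device state in response to button presses."""

    def __init__(self, recognize):
        self.recognize = recognize  # callable returning text (camera + OCR)
        self.state = "idle"         # idle, ready, playing or paused
        self.text = ""

    def on_button(self, pin):
        """Handle one button press and return the new state."""
        action = BUTTON_PINS.get(pin)
        if action == "capture":
            self.text = self.recognize()
            self.state = "ready"
        elif action == "play" and self.text:
            self.state = "playing"
        elif action == "pause" and self.state == "playing":
            self.state = "paused"
        elif action == "stop":
            self.state = "ready" if self.text else "idle"
        return self.state


def espeak_command(text, speed=150, voice="en"):
    """Build an eSpeak command line that speaks the given text."""
    return ["espeak", "-v", voice, "-s", str(speed), text]


def bind_buttons(controller):
    """Attach interrupt callbacks to the buttons (Raspberry Pi only)."""
    import RPi.GPIO as GPIO  # imported here so the sketch loads off-device
    GPIO.setmode(GPIO.BCM)
    for pin in BUTTON_PINS:
        GPIO.setup(pin, GPIO.IN, pull_up_down=GPIO.PUD_UP)
        # The callback receives the pin number; no polling loop is needed.
        GPIO.add_event_detect(pin, GPIO.FALLING,
                              callback=controller.on_button, bouncetime=200)


controller = ReaderController(recognize=lambda: "hello world")
```

Keeping the state logic separate from the GPIO binding means the button handling can be exercised on any machine, while `bind_buttons` is only called on the Raspberry Pi.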
An interrupt-based script is the best way to implement a soft-shutdown feature for the Raspberry Pi power supply switch. Interrupts also improve the efficiency of the code and minimize the load on the CPU compared with polling in a while loop. Figure 6 shows the flow chart of the power supply switch for the Raspberry Pi.

Figure 6 Flow chart of power supply switch

5. RESULT

The testing was done using the Raspberry Pi platform with the following specifications:

a) SBC Raspberry Pi 3 900 MHz Quad Core ARM
b) Cortex-A7
d) Bootable SanDisk Ultra 16GB micro SD Card

From the experimental results, the image processing module is known to have the following restrictions: the maximum size of the input image is that of an image taken from a magazine; any input image that uses block-letter fonts will work fine; and the minimum font size is 8 point.

We tested each module to see the effectiveness of every step. Table 1 presents the results of accuracy testing and the average processing time after the image enters the image processing module.

Table 1 Results of accuracy testing

Distance (cm)   Total words   Errors   Processing time (s)   Percentage of error
30              254           5        54                    2.36
15              78            1        23                    1.28
18              156           1        38                    0.64
25              234           2        47                    0.85
26              312           4        54                    0.64
20              213           5        52                    1.27
24              213           6        45                    2.81

6. CONCLUSION

In this research work, a Text-to-Speech device for visually impaired people that converts a text image input into sound is implemented. The performance of the device is high: it achieves a readability tolerance of less than 2%, with an average processing time of less than one minute for various paper and font sizes. With good lighting, the average error rate of the image processing module is lower. The device is portable, does not require an internet connection, and can be used independently by people with low vision or visual impairment. It also has a user interface that allows people to interact with it easily.

[1] Global data on Visual impairments, World Health Organization, 2010.
[2] R. Mengko and A. Ayuningtyas. "Indonesian TTS system using syllable concatenation: Speech Optimization", Proc. International Conference on Instrumentation, Communication, Information Technology, and Biomedical Engineering (ICICI-BME), November 2013, pp. 412-415.
[3] Archana A. Shinde and Dilip Chougule. "Text Pre-processing and Text Segmentation for OCR", International Journal of Computer Science Engineering and Technology, Vol. 2, Issue 1 (2012), pp. 810-812.
[4] R. Mithe, S. Indalkar and N. Divekar. "Optical Character Recognition", International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume 2, Issue 1 (2013).
[5] R. Smith. "An Overview of the Tesseract OCR Engine", USA: Google Inc.
[6] Samuel Thomas, Hema A. Murthy and C. Chandra Sekhar. "Distributed Text to Speech Synthesis for Embedded Systems - An Analysis", Proceedings of the Eleventh National Conference on Communications, NCC 2005, pp. 1-5.