
Text-to-Speech Device For Visually Impaired People: International Journal of Pure and Applied Mathematics July 2018



Text-to-Speech Device for Visually Impaired People

Article in International Journal of Pure and Applied Mathematics · July 2018


1 author:

Shirly Edward
SRM Institute of Science and Technology

All content following this page was uploaded by Shirly Edward on 06 July 2018.



International Journal of Pure and Applied Mathematics
Volume 119 No. 15 2018, 1061-1067
ISSN: 1314-3395 (on-line version)
url: http://www.acadpubl.eu/hub/
Special Issue

Text-To-Speech Device for Visually Impaired People

Shirly Edward. A (1), Jothimani. A (2), Jayaprakash. V (3), Joe Benhur Xavier. F (4)

(1), (2) Assistant Professor, Dept of ECE, SRMIST, Vadapalani, Chennai
(3), (4) U.G Student, Dept of ECE, SRMIST, Vadapalani, Chennai

Abstract - People with low vision or visual impairment cannot clearly see words and letters in ordinary newsprint, books and magazines. This makes reading difficult, which can disturb the learning process and slow a person's intellectual development. A device is therefore needed to help them read. We have developed such a device, which can scan and read any kind of text by converting it to a voice message. The device takes images, PDFs, documents, textbooks and newspapers as input and produces voice as output. It contains separate modules for image processing and voice processing, and it can play and stop the output while reading. It has a low error rate, a short processing time and a low cost. A Raspberry Pi 3 was used to develop the device. The device acts as an artificial eye for visually impaired people and does not need any human supervision.

I. Introduction

According to a survey by the World Health Organization in 2010, the total population of India is 1181.4 million, of which 152.238 million people suffer from blindness, low vision and visual impairment [1]. Figure 1 shows the number of people who are blind, with low vision and visually impaired (in thousands) per million population.

According to Dr. Bjorn, impaired vision can have negative effects on learning and social interaction. It can affect the natural development of intelligence and of academic, social and professional ability [2]. The vision of people who are visually impaired cannot be corrected with glasses. As a result, people with low vision cannot read normally printed paper; they can read only if the characters or letters are big enough. This condition lengthens the reading process and tires the eyes. To help improve the quality of life of people with low vision, a tool that reads text for them is needed. The degree of vision impairment varies between individuals with low vision. Therefore, the device developed in this work uses another sensory function, hearing, to receive information from a text. The device is specifically designed for people with low vision, so that they can use it easily without having to ask others for help and can use it to support their academic and intellectual development.

Figure 1. Ratio of people with visual impairment (courtesy: WHO, 2010)

II. Hardware and Software

A. Raspberry Pi 3

The Raspberry Pi is a small computer which can fit in the palm of one's hand. It runs Linux and a few other low-power operating systems. It was created by the Raspberry Pi Foundation. Raspbian is the official operating system of the Raspberry Pi. Other operating systems exist, but they are mostly made for specific projects. The board used here is a single-board computer powered by a Broadcom BCM2837 64-bit quad-core processor running at 1.2 GHz with 1 GB of RAM. The Raspberry Pi has gone through several iterations since its launch in 2012. The latest version is the Raspberry Pi 3, which is available for 3200 INR. It is a complete computer that includes an HDMI output, up to four USB ports, Wi-Fi and Bluetooth.


B. Raspberry Pi Camera Module V2.1

The Raspberry Pi Camera Module V2 has a high-quality 8-megapixel Sony IMX219 image sensor with a fixed-focus lens; it is a custom-designed add-on board for the Raspberry Pi. It can acquire 3280 x 2464 pixel static images and also supports 1080p30, 720p60 and 640x480p60/90 video. The add-on board attaches to the Pi via one of the small sockets on the upper surface of the board and uses the dedicated CSI interface, which is designed especially for interfacing with cameras. The board measures 25 mm x 23 mm x 9 mm and weighs just over 3 g, making it well suited to mobile or other applications where size and weight are important. Figure 2 shows the Raspberry Pi 3 Model B.

Figure 2. Raspberry Pi 3 Model B

C. Python IDLE

Python, created by Guido van Rossum, is a high-level programming language mainly used for general-purpose programming. It provides constructs intended to enable clear programs on both small and large scales. Python features a dynamic type system and automatic memory management, and supports several programming paradigms, including object-oriented, functional and imperative programming. It has a large and extensive standard library. Python is a very useful programming language. It is easy to use, and with the Raspberry Pi it lets us turn a project into a working real-time system. Python syntax is very clean, with an emphasis on readability, and uses standard English keywords. The easiest introduction to Python is through IDLE, a Python Integrated Development Environment, so we start by opening IDLE.

D. System Specification and Design

The device is designed based on the following restrictions:

a) Range of reading distance is 15-30 cm.
b) Character size is a minimum of 8 pt.
c) Maximum size of reading material can be varied.
d) Maximum tilt of the text line is 5 degrees from the vertical.
e) Types of characters include Roman, Egyptian or sans serif types.

The module is designed so that no stand-like structure is needed to carry the Pi camera module; instead, it is fixed over the casing of the board using two L-clamps. The Pi camera lens is adjusted to acquire the script sharply. The distance between the camera module and the script is 15 to 30 cm, the minimal distance that a human eye needs to read a script.

3. System Architecture

Figure 3 Architecture of the Device

Figure 3 shows the architecture of the device. It mainly consists of the Image Correction Module and the TTS Correction and Voice Processing Module. Each of these modules is elaborated in the subsequent sections.

A. Image Correction module

(a) Gray Scaling

Gray scaling is the process of converting a digital (pixel) image into a gray-scale image [3]. Each pixel value is a single sample that carries only intensity information. The image is composed exclusively of shades of gray, ranging from 0 (black, the weakest intensity) to 255 (white, the strongest intensity).

(b) Binarization

Binarization is the process of converting a gray-scale image into a binary image consisting only of black and white. It is done by thresholding: every pixel with a value greater than 170 turns white (gets the value 255) and every pixel with a value less than 170 becomes black (gets the value 0).

(c) Image Processing module using Optical Character Recognition

This module consists of OCR (Optical Character Recognition). It targets typewritten text, one glyph or character at a time. OCR uses an optical mechanism to recognize characters automatically; the technology imitates the human sense of sight, with the camera replacing the eye and the computer processing the image in place of the human brain. The OCR engine requires setup states and initial steps that give it the best possible input and so compensate for its limitations. The setup state is adapted to the specifications of the device, so that the output has a minimum error rate and a short processing time. This module does not change the OCR algorithm, but adds states that give the OCR engine the best input. OCR is generally an "offline" process, which analyses a static document.

Tesseract OCR

Tesseract OCR is a type of OCR engine with matrix matching. Tesseract was selected because it is widely accepted, flexible and extensible, and because many researchers are actively developing it [4]. Defects in machines, such as distortion at the edges and dim lighting, make it difficult for most OCR engines to extract text with high accuracy.

Tesseract OCR Implementation

The input image captured by the camera has a size of 8 MP or 215 ppi (pixels per inch) [5]. According to the specifications of the Tesseract OCR engine, the minimum character size that can be read is 20 pixels for uppercase letters. Tesseract OCR accuracy decreases at a font size of 8 pt.

The software processes the input image and converts it into text format. The user triggers image capture with a tactile key connected to a GPIO pin, using an interrupt function [5]. The picture is captured with the raspistill program, using its sharpness setting to sharpen the image. The resulting image is in .jpg format with a resolution of 3280 x 2464 pixels.
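As a rough illustration of the image correction and OCR steps described above, the following Python sketch applies gray scaling and the fixed threshold of 170 to raw pixel values, and builds the command lines for raspistill and Tesseract. It is not the authors' code. The luma weights, the handling of a pixel exactly equal to 170 (the text does not say), the sharpness value of 100, the image file name page.jpg and all function names are assumptions made for illustration.

```python
# Minimal sketch of gray scaling, binarization and the capture/OCR commands.
# Function names, file names and numeric defaults are illustrative assumptions.

THRESHOLD = 170  # threshold value stated in the Binarization section


def to_gray(pixel):
    """Convert one (R, G, B) pixel to a single intensity in 0..255.

    The weights are the common ITU-R BT.601 luma weights (an assumption;
    the paper does not say which conversion is used).
    """
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(gray_rows, threshold=THRESHOLD):
    """Map every gray value above the threshold to 255 (white), else 0 (black).

    A value exactly equal to the threshold is treated as black here,
    which is an assumption.
    """
    return [[255 if value > threshold else 0 for value in row]
            for row in gray_rows]


def capture_command(image_path="page.jpg", sharpness=100):
    """Build a raspistill command for a full-resolution, sharpened still."""
    return ["raspistill", "-o", image_path,
            "-w", "3280", "-h", "2464",
            "-sh", str(sharpness)]


def ocr_command(image_path="page.jpg", output_base="speech"):
    """Build a Tesseract command; it writes the text to <output_base>.txt."""
    return ["tesseract", image_path, output_base]


if __name__ == "__main__":
    rgb_row = [(20, 20, 20), (200, 200, 200), (255, 255, 255)]
    gray_row = [to_gray(p) for p in rgb_row]
    print(binarize([gray_row]))
    print(" ".join(capture_command()))
    print(" ".join(ocr_command()))
```

On the device, the two command lists could be passed to subprocess.run once the tactile key interrupt fires; here they are only printed so the sketch runs without a camera or Tesseract installed.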

[Figure 4: flow diagram. Input (gray or color image) -> Adaptive Thresholding -> Binary image -> Connected Component Analysis -> Character Outlines -> Find Text Lines and Words -> Character Outlines organized into words -> Recognize word pass 1 -> Recognize word pass 2 -> Extracted Text]

Figure 4 System Design for Image Processing

Figure 4 shows the system design for the image processing module, which is built on Tesseract OCR.

B. TTS Correction and Voice Module

In the TTS correction and voice module, text is converted to speech. The output of the OCR is text, which is stored in a file (speech.txt) [6]. Here, the Flite and eSpeak programs are used to convert the text to wave format. Finally, text.wav can be heard.
Flite and eSpeak are open-source programs that can be installed on the Raspberry Pi and are available in many


languages. In this research work, an English TTS system is used for reading the text.

Flite is a lighter version of Festival built specifically for embedded systems. It has commands that make it easier to use than Festival on the command line [6]. It runs faster than Festival. eSpeak is a compact open-source software speech synthesizer for English and other languages, for operating systems such as Linux and Windows. eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear and can be used at high speeds.

TTS (Text-to-Speech) is a system that converts text input into speech. In principle, a TTS system consists of two subsystems:

a). Text to Phoneme converter
The text-to-phoneme converter converts a sentence, given as text in a particular language, into a series of codes, usually phoneme codes together with their duration and pitch. This section is language dependent.

b). Phoneme to Speech converter
The phoneme-to-speech converter accepts as input the phoneme codes, together with their pitch and duration, produced by the previous section.

4. Design Implementation

a). Import and Initialization

Python's standard library covers a wide range of modules. The voice processing module uses the os package, which provides file and process operations; the pygame package, which provides functions for playing sounds; the RPi.GPIO package, which provides a class to control the GPIO on a Raspberry Pi; and the subprocess package, which allows spawning new processes, connecting to their input/output/error pipes and obtaining their return codes. The isPause and isStop variables are used for the audio player features. They are initialized to False, which means they are not yet active. Figure 5 shows the flowchart of the process flow.

b). Setting

We need to import the Raspberry Pi general-purpose input/output (GPIO) library from the Python libraries available on the Raspberry Pi. Then a number is allotted to each GPIO pin in accordance with the breakout board.

c). Main Program

The function of the main program is to provide various commands to retrieve, process and convert the input image into a sound signal. Various GPIO pins are allotted to control the module operations such as capture, play, pause and stop, and to switch off the voice output.

Figure 5 Flow chart of the process flow

We need to import the pygame module. Pygame serves as the audio controller, which lets us provide various controls for the playback of the voice.

d). Power Supply Management


An interrupt-based script is the best way to implement the soft shutdown feature of the Raspberry Pi power supply switch. Interrupts also improve the efficiency of the code and reduce the load on the CPU compared with polling in a while loop. Figure 6 shows the flow chart of the power supply switch for the Raspberry Pi.

Figure 6 Flow chart of power supply switch

5. RESULT

The testing was done using the Raspberry Pi platform with the following specifications:

a) SBC Raspberry Pi 3, 900 MHz quad-core ARM Cortex-A7
b) Raspberry Pi 8MP Camera Board Module
c) Bootable SanDisk Ultra 16GB micro SD Card

The experimental results show that the image processing module has the following restrictions. The maximum size of the input material is a magazine page. Any input image that uses block letter fonts works fine. The minimum font size is 8 point.

We tested each module to see the effectiveness of every step. Table 1 presents the results of accuracy testing and the average time taken once the image enters the image processing module.

Table 1 Results of accuracy testing

Distance (cm)   Total words   Errors   Processing time (s)   Percentage of error
30              254           5        54                    2.36
15              78            1        23                    1.28
18              156           1        38                    0.64
25              234           2        47                    0.85
26              312           4        54                    0.64
20              213           5        52                    1.27
24              213           6        45                    2.81

The results show that the image processing time is about one minute or less, depending on the number of input words, with a reported error between 0.64% and 2.81%. This is because the additional states and conditions give better input to the OCR engine.

6. CONCLUSION

In this research work, a Text-to-Speech device for visually impaired people that converts a text image input into sound is implemented. The performance of the device is high enough: it achieves a readability tolerance of less than 2%, with an average processing time of less than one minute for various paper and font sizes. With good lighting, the average error rate of the image processing module is better. The device is portable, does not require an internet connection, and can be used independently by people with low vision or visual impairment. The device also has a user interface that allows people to interact with it easily.

REFERENCES

[1] Global Data on Visual Impairments, World Health Organization, 2010.

[2] R. Mengko and A. Ayuningtyas, "Indonesian TTS system using syllable concatenation: Speech Optimization," Proc. International Conference on Instrumentation, Communication, Information Technology, and Biomedical Engineering (ICICI-BME), November 2013, pp. 412-415.


[3] Archana A. Shinde and Dilip Chougule, "Text Pre-processing and Text Segmentation for OCR," International Journal of Computer Science Engineering and Technology, Vol. 2, Issue 1, pp. 810-812, 2012.

[4] R. Mithe, S. Indalkar and N. Divekar, "Optical Character Recognition," International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, Volume 2, Issue 1, 2013.

[5] R. Smith, "An Overview of the Tesseract OCR Engine," USA: Google Inc.

[6] Samuel Thomas, Hema A. Murthy and C. Chandra Sekhar, "Distributed Text to Speech Synthesis for Embedded Systems - An Analysis," Proceedings of the Eleventh National Conference on Communications, NCC 2005, pp. 1-5.
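
Supplementary sketch. The conversion step of the TTS correction and voice module (Section B), in which the OCR output in speech.txt is turned into text.wav by Flite or eSpeak, could look roughly like the Python below. This is a minimal sketch, not the authors' program: it assumes that the flite or espeak command-line tool is installed on the device, and the function names tts_command and synthesize are hypothetical. Only the file names speech.txt and text.wav come from the paper.

```python
# Minimal sketch of the text-to-wave step using Flite or eSpeak.
# tts_command and synthesize are hypothetical helper names.
import shutil
import subprocess


def tts_command(engine, text_path="speech.txt", wav_path="text.wav"):
    """Return the command line that reads text_path and writes wav_path."""
    if engine == "flite":
        # flite: -f <text file>, -o <output wave file>
        return ["flite", "-f", text_path, "-o", wav_path]
    if engine == "espeak":
        # espeak: -f <text file>, -w <output wave file>
        return ["espeak", "-f", text_path, "-w", wav_path]
    raise ValueError("unsupported TTS engine: " + engine)


def synthesize(engine="flite", text_path="speech.txt", wav_path="text.wav"):
    """Run the TTS engine if it is installed; return True when it was run."""
    command = tts_command(engine, text_path, wav_path)
    if shutil.which(command[0]) is None:
        return False  # engine not installed on this machine
    subprocess.run(command, check=True)
    return True


if __name__ == "__main__":
    print(" ".join(tts_command("flite")))
    print(" ".join(tts_command("espeak")))
```

The resulting wave file would then be handed to the audio player for the play, pause and stop controls described in Section 4; that part is left out here because it depends on the GPIO wiring.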
