TIJR_A_1452642.3d (Standard Serif-IETE) (215×280mm) 29-03-2018 8:56

PROOF COVER SHEET


Author(s): Vaibhavi Rajendran and Bharadwaja Kumar
Article title: A Robust Syllable Centric Pronunciation Model for Tamil Text to Speech
Article no: 1452642
Enclosures: 1) Query sheet
2) Article proofs

Dear Author,

1. Please check these proofs carefully. It is the responsibility of the corresponding author to check these and approve or amend them. A
second proof is not normally provided. Taylor & Francis cannot be held responsible for uncorrected errors, even if introduced during the
production process. Once your corrections have been added to the article, it will be considered ready for publication.

Please limit changes at this stage to the correction of errors. You should not make trivial changes, improve prose style, add new material,
or delete existing material at this stage. You may be charged if your corrections are excessive (we would not expect corrections to exceed
30 changes).

For detailed guidance on how to check your proofs, please paste this address into a new browser window:
http://journalauthors.tandf.co.uk/production/checkingproofs.asp

Your PDF proof file has been enabled so that you can comment on the proof directly using Adobe Acrobat. If you wish to do this, please save
the file to your hard disk first. For further information on marking corrections using Acrobat, please paste this address into a new browser
window: http://journalauthors.tandf.co.uk/production/acrobat.asp

2. Please review the table of contributors below and confirm that the first and last names are structured correctly and that the
authors are listed in the correct order of contribution. This check is to ensure that your name will appear correctly online and when
the article is indexed.

Sequence   Prefix   Given name(s)   Surname     Suffix
1                   Vaibhavi        Rajendran
2                   Bharadwaja      Kumar

Queries are marked in the margins of the proofs, and you can also click the hyperlinks below.
Content changes made during copy-editing are shown as tracked changes. Inserted text is in red font and revisions have a blue indicator.
Changes can also be viewed using the list comments function. To correct the proofs, you should insert or delete text following the instructions
below, but do not add comments to the existing tracked changes.

AUTHOR QUERIES
General points:

1. Permissions: You have warranted that you have secured the necessary written permission from the appropriate copyright owner for
the reproduction of any text, illustration, or other material in your article. Please see
http://journalauthors.tandf.co.uk/permissions/usingThirdPartyMaterial.asp.
2. Third-party content: If there is third-party content in your article, please check that the rightsholder details for re-use are shown
correctly.
3. Affiliation: The corresponding author is responsible for ensuring that address and email details are correct for all the co-authors.
Affiliations given in the article should be the affiliation at the time the research was conducted. Please see
http://journalauthors.tandf.co.uk/preparation/writing.asp.
4. Funding: Was your research for this article funded by a funding agency? If so, please insert ‘This work was supported by <insert the
name of the funding agency in full>’, followed by the grant number in square brackets ‘[grant number xxxx]’.
5. Supplemental data and underlying research materials: Do you wish to include the location of the underlying research materials (e.g. data,
samples or models) for your article? If so, please insert this sentence before the reference section: ‘The underlying research materials for this article
can be accessed at <full link>/ description of location [author to complete]’. If your article includes supplemental data, the link will also be
provided in this paragraph. See <http://journalauthors.tandf.co.uk/preparation/multimedia.asp> for further explanation of supplemental data and
underlying research materials.
6. The CrossRef database (www.crossref.org/) has been used to validate the references. Resulting changes are tracked in red font.

Q1. AU: Please provide missing city for the affiliation.


Q2. AU: Please spell out TTS at first mention.
Q3. AU: Please check word “inkling” for correctness in the sentence “When we started …”.
Q4. AU: Please spell out POS at first mention.
Q5. AU: The disclosure statement has been inserted. Please correct if this is inaccurate.
Q6. AU: The CrossRef database (www.crossref.org/) has been used to validate the references. Mismatches between the
original manuscript and CrossRef are tracked in red font. Please provide a revision if the change is incorrect.
Q7. AU: Please provide missing city/state for Ref. [2].
Q8. AU: Please provide missing city/state and last name of second author in Ref [3].
Q9. AU: Please provide missing conference name, location and page range for Ref. [4].
Q10. AU: Please provide missing city/state and page range for Ref. [5].
Q11. AU: Please provide missing city/state for Ref. [6].
Q12. AU: Please provide missing city/state for the Ref. [7].
Q13. AU: Please provide missing city/state for Ref. [8].
Q14. AU: Please provide missing city/state and page range for Ref. [10].
Q15. AU: Please provide missing city/state for Ref. [11].
Q16. AU: Please provide missing city/state for Ref. [12].
Q17. AU: Please provide missing city/state for Ref. [16].
Q18. AU: Please provide missing city/state for Ref. [17].
Q19. AU: Please provide missing city/state for Ref. [18].
Q20. AU: Please provide complete information for Ref. [19].
Q21. AU: Please provide complete information for Ref. [21].
Q22. AU: Please provide missing volume number/issue number/page numbers for Ref. [22].
Q23. AU: Please provide missing page range for Ref. [24].
Q24. AU: Please provide missing city/state and page range for Ref. [25].
Q25. AU: Please provide missing city/state for Ref. [27].
Q26. AU: Please provide missing volume number/issue number/page numbers for Ref [28].
Q27. AU: Please provide missing city/state for Ref. [29].
Q28. AU: Please provide missing city/state for Ref. [30].

Q29. AU: Please provide missing city/state for Ref. [31].


Q30. AU: Please provide missing city/state for the Ref. [32].
Q31. AU: Please provide missing page numbers for the Ref. [33].
Q32. AU: Please provide missing city for the Ref. [34].
Q33. AU: Please provide missing issue number for Ref. [35].
Q34. AU: Please provide missing city/state for Ref. [36].

How to make corrections to your proofs using Adobe Acrobat/Reader

Taylor & Francis offers you a choice of options to help you make corrections to your proofs. Your PDF proof file has been enabled so that
you can edit the proof directly using Adobe Acrobat/Reader. This is the simplest and best way for you to ensure that your corrections will
be incorporated. If you wish to do this, please follow these instructions:

1. Save the file to your hard disk.

2. Check which version of Adobe Acrobat/Reader you have on your computer. You can do this by clicking on the “Help” tab, and then
“About”.

If Adobe Reader is not installed, you can get the latest version free from http://get.adobe.com/reader/.
3. If you have Adobe Acrobat/Reader 10 or a later version, click on the “Comment” link at the right-hand side to view the Comments
pane.
4. You can then select any text and mark it up for deletion or replacement, or insert new text as needed. Please note that these will
clearly be displayed in the Comments pane and secondary annotation is not needed to draw attention to your corrections. If you
need to include new sections of text, it is also possible to add a comment to the proofs. To do this, use the Sticky Note tool in the task
bar. Please also see our FAQs here: http://journalauthors.tandf.co.uk/production/index.asp.
5. Make sure that you save the file when you close the document before uploading it to CATS using the “Upload File” button on the
online correction form. If you have more than one file, please zip them together and then upload the zip file.

If you prefer, you can make your corrections using the CATS online correction form.

Troubleshooting

Acrobat help: http://helpx.adobe.com/acrobat.html

Reader help: http://helpx.adobe.com/reader.html

Please note that full user guides for earlier versions of these programs are available from the Adobe Help pages by clicking on the link
“Previous versions” under the “Help and tutorials” heading from the relevant link above. Commenting functionality is available from
Adobe Reader 8.0 onwards and from Adobe Acrobat 7.0 onwards.

Firefox users: Firefox’s inbuilt PDF Viewer is set to the default; please see the following for instructions on how to use this and download
the PDF to your hard drive:
http://support.mozilla.org/en-US/kb/view-pdf-files-firefox-without-downloading-them#w_using-a-pdf-reader-plugin

IETE JOURNAL OF RESEARCH, 2018
VOL. 0, NO. 0, 1–12
https://doi.org/10.1080/03772063.2018.1452642

A Robust Syllable Centric Pronunciation Model for Tamil Text to Speech
Vaibhavi Rajendran and Bharadwaja Kumar
School of Computing Science and Engineering, Vellore Institute of Technology, India

ABSTRACT
The Human–Computer Interaction era has prompted researchers to work on speech and language in order to develop interactive interfaces. A speech synthesizer is one such interface, helping people to engage with the digital era. The present work focuses on developing a Letter-To-Sound mapping for a Tamil speech synthesizer, which is an intriguing task owing to the script-to-sound mapping irregularities in Tamil. Tamil is a syllable-timed language; hence a new syllable centric rule-based approach is formulated in the present work, with a more extended set of rules than the existing rule-bases in the literature. The proposed rule-based system outperforms the existing rule-based systems, with a low Character Error Rate and a high Mean Similarity Score.

KEYWORDS
Grapheme to phoneme; Letter to sound; Letter to syllable; Lexicon; Pronunciation model; Seed lexicon; Syllables; Tamil; Text to speech

© 2018 IETE

1. INTRODUCTION

Tamil is a widely spoken Dravidian language in Tamil Nadu, Pondicherry and Andaman & Nicobar in India, and also in countries such as Sri Lanka, Malaysia and Singapore. Dravidian languages are among the most complex languages in the world at the level of morphology, perhaps comparable only to Finnish and Turkish [1]. This can be attributed to the fact that a significant part of grammar that is handled by syntax in English (and other similar languages) is handled within morphology in Tamil (and other Dravidian languages). One more idiosyncratic characteristic of Tamil is that the Letter-To-Sound (LTS) mapping is non-trivial when compared to other Dravidian languages. Tamil script has fewer consonants and has neither aspirated nor voiced stop consonants in the written script; however, voiced stops are present in the spoken language as allophones. In addition, the voicing of stop consonants is governed by diverse rules and hence decoding all the rules is highly non-trivial. This poses a mammoth challenge in developing a speech synthesizer for enhancing the interaction of a computer with people who speak Tamil. With the increase in circulation of digitized documents on the Internet, demand for a Tamil speech synthesizer to render these digital documents is also increasing. A Tamil speech synthesizer will not only serve as an interface in the HCI (Human Computer Interaction) era but will also help us stride through the digital era with ease.

Tamil speech synthesizers have been an enticing research topic with notable accomplishments in the past decade. The two main factors which determine the performance of a speech synthesizer are intelligibility and naturalness. Intelligibility is a measure of the capability of being understood by the listener, and naturalness is a measure of how similar the synthesizer sounds to a human. A speech synthesizer comprises two main modules: a Natural Language Processing (NLP) module and a Digital Signal Processing (DSP) module. Naturalness and intelligibility depend on both the NLP and DSP modules and hence research on both modules is required. To improve naturalness by means of the DSP module, work on prosody modelling and speech waveform production for Tamil has been predominantly carried out [2,3]. In contrast, less emphasis has been given to the NLP module in a Tamil speech synthesizer, and improvements by means of an LTS mapping model for enhancing naturalness and intelligibility are minimal [4]. An efficient LTS contributes hugely to improving intelligibility and naturalness. The non-triviality of letter to sound correspondence in Tamil further increases the need for a robust LTS.

Words can be spoken by breaking them into smaller sound units such as phones or syllables. The popularly used subword sound unit is a “phoneme” and it is mapped to a “grapheme”, which is the orthographic/textual representation of the sound unit. When we deal with letter to phoneme conversion, the terminology used is Grapheme to Phoneme (G2P). The other subword unit is a syllable. A “syllable” takes the form of CVC and is bigger than a phone [5]. Here, C is a consonant and V is a vowel of the language. When we deal with letter to syllable conversion we generally use the term “LTS”. Indian languages are syllable based [6] and the choice of the syllable as an efficient synthesis unit has already been experimented on and confirmed for Indian languages [7]. Tamil, which is one among the Indian languages, is likewise a syllable-timed language [5]. Hence developing a syllable-based pronunciation dictionary would be more appropriate for Tamil speech synthesis. Moreover, syllables are believed to carry linguistic information which will help in modelling the pronunciation of words efficiently [8]. Syllables hold the acoustic variability of speech and also possess co-articulation information which can be useful for prosody generation in later stages of a text-to-speech (TTS) system [6].

This paper focuses on building a syllable centric pronunciation model, which is essential for speech production in a TTS. The pronunciation model is formulated as an LTS mnemonic mapping problem [9], where letters are to be mapped to their appropriate sound units, and can be given as: Lp[Lc]Ln = Sc. Here, Lc represents the current letter under analysis; Lp represents the previous letter; Ln represents the next letter; Sc is the current letter’s sound representation (mnemonic or symbol/sound unit). This mapping function is explained in detail in Section 4. This kind of mapping function is required as the spelling and pronunciation of words are not always the same.

The next section briefly reviews the related work on LTS, Section 3 elaborates on the characteristics and complexities of Tamil LTS, Section 4 presents our proposed modified Tamil LTS approach, Section 5 discusses the results obtained in comparison with the existing approaches and, finally, Section 6 presents the conclusion and future work.

2. RELATED WORK

Widely used approaches to building an LTS system in the literature fork into either a rule-driven or a data-driven approach. As an initial step towards automated pronunciation generation, linguistic rules were collected to form a rule-based approach. Some of the rule-based approaches which give fairly good results have been discussed in [4] for Tamil, [10] for English and [11] for Urdu.

Decision trees [12] and Bayesian networks [13] were the two initial approaches explored for G2P. A decision tree on syllables was experimented with for English but, due to improper syllable boundary detection, the generated pronunciation of words was not very good when compared to the performance of a phoneme based decision tree [8]. Researchers then explored Pronunciation by Analogy [10] and pronunciation by latent analogy [14] approaches, which are suitable only for pronunciation of rarer context words. One among the many approaches explored is a joint grapheme-phoneme n-gram approach [15], referred to as a joint-sequence model, which generates fairly good pronunciation of words. The major constraint in this joint-sequence model is the need for a huge data-set and some additional models [16,17] such as an Expectation Maximization based alignment model and a translation model. In contrast, the Long Short-Term Memory (LSTM) neural network model completely eliminates the overhead of constructing such additional models and provides similar results [18]. Neural networks and Recurrent Neural Networks [19] have also been proposed for solving LTS problems and they do produce good pronunciation results. But the questionable factor about the LSTM neural network and other neural networks is the amount of time they take to process the data. Though numerous models have been explored, one unsettling fact is that there is still a need for a robust LTS. A syllable-based LTS is yet to receive focus. Some of the work that has been done exclusively for a Tamil LTS and some work which has been adapted for Tamil are discussed in this section.

2.1 Dictionary Look-Up Approach

It is a simple approach where a huge set of words used in the language, along with their pronunciations, is stored in a lexicon. When the synthesizer requires the pronunciation of any word, it simply opens the lexicon, searches through the list of stored words and retrieves the pronunciation. Initially, a dictionary look-up approach was used in a Tamil synthesizer [20]. However, a complete list of words spoken in any language cannot be formed and the total number of words in any language cannot be specified at all. Whenever a word is not present in the dictionary it becomes an Out-Of-Vocabulary word [21]. In addition to this general complexity, Tamil, which is highly inflectional and agglutinative in nature, has a higher rate of word formation from just one root word. The inflectional nature of Tamil is so intense that a Tamil verb can take around 200 forms excluding the auxiliary information. If the auxiliaries are included, one single Tamil verb can have more than 1800 forms [22]. The complexity is further intensified when another word gets combined with these inflected words in the process of agglutination. Framing a lexicon which can hold all these possible combinations of word formations in Tamil is practically impossible.
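The closed nature of a dictionary look-up can be illustrated with a minimal Python sketch. The lexicon entries, the romanized spellings and the helper names (`TOY_LEXICON`, `lookup_pronunciation`, `oov_rate`) are hypothetical and chosen only for illustration; they are not taken from the synthesizer in [20].

```python
from typing import Dict, List, Optional

# Toy lexicon: romanized word -> pronunciation mnemonic.
# Both the words and the mnemonics are invented placeholders.
TOY_LEXICON: Dict[str, str] = {
    "amma": "am maa",
    "appa": "ap paa",
}


def lookup_pronunciation(word: str, lexicon: Dict[str, str]) -> Optional[str]:
    """Return the stored pronunciation, or None for an Out-Of-Vocabulary word."""
    return lexicon.get(word)


def oov_rate(words: List[str], lexicon: Dict[str, str]) -> float:
    """Fraction of input words that the closed lexicon cannot pronounce."""
    if not words:
        return 0.0
    missing = sum(1 for w in words if w not in lexicon)
    return missing / len(words)


if __name__ == "__main__":
    # An inflected form that was never stored falls outside the lexicon.
    for w in ["amma", "ammavin"]:
        print(w, "->", lookup_pronunciation(w, TOY_LEXICON))
    print("OOV rate:", oov_rate(["amma", "ammavin"], TOY_LEXICON))
```

Every inflected or agglutinated form that was not stored in advance returns `None`, which is why a purely lexicon-driven approach scales poorly for a language with as many word forms as Tamil.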

2.2 Rule-based Approach

The nature of the lexicon is “closed” whereas the input word given to the synthesizer is “open”. Closed nature refers to a fixed number of words while open nature refers to the unbounded word possibilities in any language [21]. To deal with the open nature of words, a rule-based approach is modelled on the word formation pattern of the language. The first rule-based system is given in [23] and the second is given in [4]. A rule-set very similar to [23] is used in [24] for Tamil G2P, aiding the development of a TTS for three Indian languages (Tamil, Telugu and Hindi). Work on LTS for speech synthesizers in other Dravidian languages such as Telugu [20], [25] and Malayalam [26] also revolves mostly around rule-based systems. An attempt to provide a unified G2P for Indian languages has also been made with the aid of a language specific rule-based approach together with a generic character set for around 13 Indian languages [6,27]. The unified parser in [27] is capable of rendering both syllable and phoneme based mnemonic representations of pronunciations. The inclusion of syllable-based pronunciation generation clearly indicates that syllables are being experimented with by researchers for providing a better TTS. The performance of the rule-based systems for Tamil in [23], [4] and [27] is further discussed in Section 5.

2.3 Sequitur

Sequitur is an open-source data driven G2P converter. Although Sequitur is not exclusively built for Tamil, this tool gives fair pronunciation results for Tamil words by mapping the Tamil letters to phonemes [28]. The problem arises when letter to “syllable” mapping is carried out. The aligner can take up to a maximum of 256 unique phoneme occurrences only. The total number of phonemes for a common phone set of 20 official Indian languages is only around 50 [20]. Hence the value 256 is adequate for phonemes (G2P), but when syllables (LTS) are considered for pronunciation, it is very far from even an admissible range. The number of syllables in any language is always much greater than the number of phonemes in the language [29]. The Sequitur tool fails to perform when syllables are used, as the aligner is restricted to 256 mappings alone, and this severely deteriorates the performance.

2.4 Phonetisaurus

Phonetisaurus is also a data driven G2P toolkit which contains a framework of open source tools and is centred on phonemes [17]. Phonetisaurus has been used for Tamil G2P and has been stated to work much faster in comparison to Sequitur [28]. When syllables are used instead of phonemes there is an obvious increase in the length of the sound unit. This increase in length of the sound unit means the grapheme grouping also increases. Two parameters, namely MaxX and MaxY, are responsible for managing the length of the units and are set to a value of two in Phonetisaurus [16]. The next concern is that the phoneme prediction model in Phonetisaurus works with the support of a bi-gram prediction model [30]. The reason for setting the length of the sub-string to two and for choosing a bi-gram model for phoneme prediction is that phonemes mostly adhere to a maximum length of two. In contrast, Tamil syllables have an average length of two/three and a maximum length of five. Hence when Phonetisaurus is subjected to letter to syllable mnemonic mapping, the toolkit neglects the additional length of letters in each sub-string. These omissions result in an erroneous pronunciation for Tamil syllables. Even if the values of the MaxX and MaxY parameters are reset from two to a greater value, the bi-gram prediction model will still be a hindrance to the performance of the toolkit for syllables.

2.5 Decision Tree

The decision tree algorithm has been experimented with for resolving Tamil G2P in [31] and in the Festival synthesizer [32]. A decision tree has a root node from which leaves are branched out. The nodes contain a question and, on answering the question with either a yes or a no, we can traverse to the next level until the last level is reached. The last leaf gives the target class of the letter/phoneme under analysis, from which its pronunciation is learnt. It is observed that a decision tree works well if phonetically related questions are given at each node, and complex questions are preferable to simple singleton questions. Although a decision tree is an automated approach, it still requires a manually corrected and annotated dictionary for the training to take place. It also requires clustering of the graphemes based on linguistic characteristics. The decision tree discussed here is centred on phonemes, and a decision tree centred on syllables for Tamil is yet to be developed. Constructing a decision tree based on syllables will definitely result in a bigger tree than the one for phonemes. When the size of the tree increases, the tree goes through over-fragmentation issues and does not work efficiently. Hence a syllable centric decision tree does not seem to be a convincing solution for developing a syllable-based pronunciation model for Tamil.
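The yes/no traversal described above can be sketched in a few lines of Python. This is only an illustrative, hand-built toy tree: the questions, the `TOY_NASALS` set and the leaf sounds are assumptions made for the example and do not reproduce the trained trees of [31] or Festival [32].

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Leaf:
    sound: str  # target class reached at the last level


@dataclass
class Node:
    # A question over (previous letter, current letter, next letter).
    question: Callable[[Optional[str], str, Optional[str]], bool]
    yes: Union["Node", Leaf]
    no: Union["Node", Leaf]


def classify(tree: Union[Node, Leaf], prev: Optional[str], cur: str,
             nxt: Optional[str]) -> str:
    """Answer the question at each node and descend until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(prev, cur, nxt) else node.no
    return node.sound


# Hypothetical romanized nasal symbols, used only by this toy tree.
TOY_NASALS = {"n", "m"}

# Toy tree for the letter "k": word-initial -> "k";
# otherwise preceded by a nasal -> "g"; else "k".
TOY_TREE = Node(
    question=lambda prev, cur, nxt: prev is None,
    yes=Leaf("k"),
    no=Node(
        question=lambda prev, cur, nxt: prev in TOY_NASALS,
        yes=Leaf("g"),
        no=Leaf("k"),
    ),
)

if __name__ == "__main__":
    print(classify(TOY_TREE, None, "k", "a"))
    print(classify(TOY_TREE, "n", "k", "a"))
```

A real tree would be induced from an annotated dictionary rather than written by hand, and its size grows with the number of target units, which is the over-fragmentation concern raised for syllables.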

The need for a Tamil syllable based synthesizer and its dependency on a robust Tamil LTS has already been mentioned in Section 1. The next section states the main complexity involved in developing a Tamil LTS.

3. CHARACTERISTICS AND COMPLEXITIES OF TAMIL LANGUAGE

In the Tamil scripting system, there are 18 consonants, 12 vowels and 1 special symbol (aayudham). Aayudham is named a special symbol as it cannot be categorized as either a vowel or a consonant of the language. Tamil is the longest surviving Dravidian language and is highly agglutinative, inflectional and morphologically very rich. During these years of survival, a lot of foreign language words have been included in Tamil. To write these foreign words in Tamil, a few additional letters called granthas are required. To represent the sound unit of a Tamil character, English letters (romanization) are used. The vowels (uyir ezhuthukal) and the special symbol (aayudham) in Tamil along with their roman representation are given in Figure 1. Similarly, the consonants (mei ezhuthukal) and granthas are given in Figures 2 and 3, respectively. The consonants in Tamil can be split into three categories according to ancient Tamil literature: vallinam or hard consonants, mellinam or soft consonants and idaiyinam or medial consonants, as given in Figure 4. These three categories of consonants can be further classified into phonetic categories based on the place and manner of articulation. Among the hard consonants, except “R” which is an approximant, the other five fall into the stop consonants category; all the soft consonants fall into the nasals category; and all the medial consonants fall into the approximants category [33–35]. An example of mapping between the letters in a Tamil word and a sequence of symbols is given in Figure 5, which illustrates the letter to sound conversion after syllabifying the word.

Figure 1: Tamil vowels and their Roman notation.

Figure 2: Tamil consonants and their Roman notation.

Figure 3: Granthas and their Roman notation.

Figure 4: Categories of Tamil consonants.

The character set in the Tamil scripting system is smaller in comparison with other Indian languages and hence, to compensate for the smaller number of characters relative to sound units, the stop consonants (vallinam) of Tamil tend to have more than one sound unit [4] (given in Figure 6). Therefore, the orthographic correspondence which is quite good in many other Indian languages is not the case in Tamil. This one-to-many mapping of consonant and sound unit increases the LTS mapping complexity in Tamil [21].

4. PROPOSED SYLLABLE CENTRIC RULE-BASED APPROACH

In this paper, a modified rule-based approach has been formulated after a thorough analysis of the existing rule-based approaches. The first rule base in [23] and the second one in [4] are named System-I and System-II, respectively, for our discussion. System-I gives a set of 67 rules and System-II a set of 30 rules. When a grapheme has more than one sound variant, the additional variants are referred to as allophones. Hence rules are to be formulated exclusively for the vallinams, which give rise to allophones in Tamil. Vallinams, which are the unvoiced stop consonants, tend to give rise to the voiced stop consonants in certain cases. The allophonic variants dealt with in these two systems are given in Figures 7 and 8, respectively. Although System-I and System-II have captured the allophonic representations, the mapping of Tamil letters to syllables has proven to be a highly perplexing task. There are words which still cannot be processed by these two systems. Despite the time and labour demanded for performing the intricate linguistic analysis, the fact that no expert system in NLP can figure out all possible relevant cases [31] compelled us to analyse words which degraded the performance of these two systems. The rules in System-I and System-II were unified in the context of syllables and the results were analysed.

4.1 Need for New Rules

The focal point of this paper is the formulation of a subsequent rule-based system, developed after a strenuous analysis, which is named System-III for further discussion. The allophonic variants dealt with by System-III are given in Figure 6 (in Section 3). Every language has its own syllable scripting rules; the spoken syllable scripting rules of Tamil have been followed to syllabify the words. The new syllable centric rule-based system incorporates some of the existing rules along with a vital set of new rules.

A refined set of 29 rules is present in the new rule-base, including an additional set of 11 new rules, while the rest of the rules are retained from System-I and System-II. The proposed rule-based system is given in Figure 9. The first column lists the vallinam stop consonants (unvoiced) which are capable of producing allophonic variations (voiced). These variations raise the need for one-to-many mapping in Tamil LTS. The second column contains the rules and is further sub-divided into three columns: the first represents the letter previous to the letter under analysis (Lp); the second is the letter which is under analysis (Lc); and the third is the letter which occurs next to the letter under analysis (Ln). All three letters are to be considered for framing a rule. In the majority of cases, the influence of the previous and next letters plays a crucial role in deciding the pronunciation of the current letter. The third column exemplifies Tamil words which follow the rule given in column two. The fourth column is the mnemonic pronunciation representation of the words generated by the new system (System-III). The fifth column specifies the appropriate sound variant of the stop consonant (vallinam) in column one. The rules include terms […] is the initial letter of the word and, finally, […] represents that the letter can be followed by any other letter.

For the target letter under analysis (Lc), the rule-list is traversed from beginning to end to find a matching rule. The search operation is similar to a linear search: if the end of the rule-list is reached and none of the rules match the Lp, Lc and Ln set, then the default form of the target letter is chosen as the sound representation. The default sound representations of the letters followed in System-III are given in Figures 1–3 for the vowels, consonants and granthas, respectively.

A detailed analysis was carried out on Tamil words from the novel “Ponniyin Selvan”. Ponniyin Selvan is a 2400-page historical novel with 423,223 words. We have used this novel for analysis as it contains pure Tamil words. The analysis revealed a certain regularity in the allophone production of the current letter due to the influence of the preceding and following letters. This regularity enabled us to formulate rules in addition to the existing rules in the literature. As mentioned in Section 3, the ambiguity in pronunciation prevails mostly around the vallinam (hard) consonants of Tamil. When we started analysing the words with these six hard consonants, the influence of the other consonant groups, mellinam (nasal/soft) and idaiyinam (medial), surfaced unmistakably. The contextual pairing of each vallinam co-occurring with the other consonant groups was rigorously monitored. The analysis of these consonant groupings was inordinately time consuming but certainly beneficial, and provided us with an additional set of 11 rules.

Four rules specific to the consonant “k” have been formulated. The first new rule for consonant “k” is given for words where “k” is followed by a nasal consonant “N”; the rule states that the consonant “k” in such cases is to be produced as “g” instead of “k”. A similar effect is seen when consonant “y” precedes “k”, which is the second
nant (vallinam) in column one. The rules include terms when consonant “D158y”D159 precedes “D160k”D16 which is the second
and notations like “D126nasal”D127, “D128hal”D129, “D130[]”D13 and “D132”D13; where rule. Although rule for “D162y”D163 and “D164k”D165 occurrence together
380 nasal refers to the mellinam category of the consonants has been given in previous systems, there “D16k”D167 is set to be 420
given in Figure 4, hal refers to halant, [] denotes that Lc pronounced as “D168h”D169. The “D170g”D17 variant fitted in more

Figure 5: Spelling to pronunciation mapping.


TIJR_A_1452642.3d (Standard Serif-IETE) (215£280mm) 29-03-2018 8:56

6 V. RAJENDRAN AND B. KUMAR: A ROBUST SYLLABLE CENTRIC PRONUNCIATION MODEL FOR TAMIL TEXT TO SPEECH

Figure 6: Allophones dealt in new syllable centric rule based system.

appropriately than the “D172h”D173 variant. To determine the Conjointly, sometimes both the rules can be applicable
inclusion of this rule, few Tamil native speakers were on the same letter also. Pertaining to such cases, letter
asked to vote on which of the two variations was more doubling rule is always given priority. An example of 455
425 appropriate. They were given a set of words with a repre- one such word where two rules are applicable to the
sentation of both variants and were asked to select the same letter is given in first row of Figure 10. The word
more appropriate variant. All of them unanimously pertains to a rule where a nasal is followed by “D207k”D208 as
agreed to the “D174g”D175 variant. Third rule is applicable when given in Figure 9 and also certainly to the doubling rule
consonant “D176zD17”D178 occurs with halant before “D179k”D180, here “D18k”D182 of consonant “D209k”D210. The adversity here is that both the 460
430 gets pronounced as “D183g”D184. The fourth rule of “D185k”D186 is applica- rules are applicable to the same letter (first occurrence of
ble when two “D187k”D18 occurs together; it is referred to as let- “D21k”D21 in the word), if we apply both the rules or apply
ter doubling, in such cases both “D189k”D190 get pronounced as them in overlapping order the pronunciation will defi-
“D19k”D192 itself. Letter doubling refers to the instances where nitely be incorrect. Hence one among the two rules
two same letters occur next to each other. needs to be prioritized. System-III generates the correct 465
pronunciation of the first word in Figure 9 due to the
435 Moreover, it was observed in our analysis that all words prioritization of doubling rule. The reason for giving pri-
with letter doublings resulted in theD193 same pattern of ority to the doubling rule over other rules is because the
pronunciation. When we added doubling rules in the doubling rule reflects on two letters and sometimes the
rule-set, the pronunciation of words with such occur- pronunciation of two syllables. When two syllables are 470
rences improved remarkably. Consequently, doubling incorrect, obviously the word is not pronounced
440 rules for three more consonants D194– “D195s”D196, “D197D”D198 and “D19th”D20 are appropriately.
added in the new rule-base while doubling rule for the
other two stop consonants D201– “D20p”D203 and “D204R”D205 already exists. Generally, the allophonic variant is a result of the influ-
ence of one consonant which either precedes or follows
In System-I and System-II, less emphasis is given for let- another consonant. But, there exists a special case where 475
ter doubling; in System-III letter doubling rules have the co-occurrence of the consonant “D213D”D214 and “D215s”D216 produ-
445 been given a higher priority. The doubling can occur ces allophonic variants of both the consonants. Thus, for
along with a combination of halant or mathra in both these two consonants two rules with the occurrence of
the first and second letter. One example of occurrence of consonant “D217D”D218 and “D219s”D20 which produces an allophone of
such doublings for each of the vallinam consonants is both the consonants (“D21T”D2 and “D23ch”D24, correspondingly) 480
listed in Figure 10. Now, there could be cases where a let- were given. The inclusion of granthas in the analysis led
450 ter doubling and another rule seemD206 applicable to the to the addition of two grantha-D25specific rules in the new
same word. In such cases, the right order of application rule-base. The reason for inclusion of granthas in Tamil
of the rule will ensure the correct pronunciation. scripting system has already been stated in D26Section D273.

Figure 7: Allophones dealt in System-I.


Figure 8: Allophones dealt in System-II.



Figure 9: Modified Rule base – System-III.
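The rule lookup described for the rule-base (a linear scan over Lp, Lc, Ln patterns with a fall-back to the default sound) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the `Rule` class, the `"*"` wildcard spelling, the small `NASALS` set and the three sample rules are assumptions made for the example, loosely based on the three non-doubling “k” rules described in the text.

```python
from dataclasses import dataclass
from typing import Optional

ANY = "*"                    # assumed wildcard spelling: any neighbouring letter
NASALS = {"N", "n", "m"}     # hypothetical stand-in for the mellinam category


@dataclass(frozen=True)
class Rule:
    prev: str    # pattern for Lp (letter before the target)
    cur: str     # Lc (letter under analysis)
    nxt: str     # pattern for Ln (letter after the target)
    sound: str   # sound variant chosen when the rule matches


def matches(pattern: str, letter: Optional[str]) -> bool:
    """Check one neighbour against a pattern (wildcard, 'nasal' class, or literal)."""
    if pattern == ANY:
        return True
    if pattern == "nasal":
        return letter in NASALS
    return pattern == letter


# Sample rules for "k" only; the real rule-base has 29 rules (Figure 9).
RULES = [
    Rule(ANY, "k", "nasal", "g"),   # "k" followed by a nasal
    Rule("y", "k", ANY, "g"),       # "y" precedes "k"
    Rule("z", "k", ANY, "g"),       # "z" (with halant) precedes "k"
]


def sound_for(prev: Optional[str], cur: str, nxt: Optional[str]) -> str:
    """Scan the rule-list from start to end; fall back to the default form."""
    for rule in RULES:
        if rule.cur == cur and matches(rule.prev, prev) and matches(rule.nxt, nxt):
            return rule.sound
    return cur  # default sound representation (here simply the letter itself)
```

Because the scan stops at the first match, the position of a rule in the list decides which rule wins when several could apply.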



Figure 10: Words with letter doublings.
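The effect of giving the doubling rule priority over a context rule can be illustrated with a small sketch. The letter sequence, the `NASALS` set and the simplified context rule (“k” next to a nasal becomes “g”) are assumptions for illustration only; the doubling behaviour for “k” (both letters stay “k”) is the one stated in the text.

```python
from typing import List, Optional

NASALS = {"N", "n", "m"}   # hypothetical stand-in for the mellinam category
DOUBLING = {"k"}           # only the "k" doubling outcome is stated in the text


def context_sound(prev: Optional[str], cur: str, nxt: Optional[str]) -> str:
    """Simplified context rule: 'k' adjacent to a nasal is voiced to 'g'."""
    if cur == "k" and (prev in NASALS or nxt in NASALS):
        return "g"
    return cur


def transcribe(letters: List[str], doubling_first: bool = True) -> List[str]:
    """Map letters to sounds, optionally checking the doubling rule first."""
    out: List[str] = []
    i = 0
    while i < len(letters):
        prev = letters[i - 1] if i > 0 else None
        cur = letters[i]
        nxt = letters[i + 1] if i + 1 < len(letters) else None
        if doubling_first and cur in DOUBLING and nxt == cur:
            out.extend([cur, cur])   # both letters keep the default sound
            i += 2                   # the doubled pair is consumed as one unit
            continue
        out.append(context_sound(prev, cur, nxt))
        i += 1
    return out
```

With the made-up sequence `["a", "N", "k", "k", "a"]`, checking doubling first leaves both “k” unvoiced, whereas applying the context rule first would voice the first “k” and break the doubled pair.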

The right prioritization and right order in execution of the rules increases the efficiency of the proposed syllable centric rule-based system.

5. RESULTS AND CONCLUSION

In this section, we give comparative results of four systems, where System-I, System-II and System-IV are available from the literature and System-III is the system proposed in the present work. System-I is developed by [23] and System-II is developed by [4]. The unified parser developed in [27] is referred to as System-IV. Since the rules in the system of [24] are incorporated from System-I developed by [23], we have chosen System-I of these two for comparison. Figure 11 shows the comparative results of the four systems used for LTS mnemonic mapping.

The test set consists of a list of randomly selected Tamil words from Tamil Wikipedia. The pronunciation of words given in the test set is termed the “trusted pronunciation”. We have come up with a bench-marked test set with trusted pronunciations scrutinized by linguistic experts in Tamil (for further information about the linguists, please refer to the “Acknowledgement” section). For every word in the test set, its corresponding trusted pronunciation is checked against the pronunciation of the word generated by each rule-based system.

To evaluate and compare the results, a test set in a common romanization format (used in System-III) is formed. Since the sound units used in the pronunciation dictionaries developed by the four systems differ, we need a common criterion [10] to compare the results. We have converted the phonemic sequence or syllable sequence of the pronunciation dictionary to the corresponding word and then compared the four systems. To compare the results generated by the four rule-based systems (I, II, III and IV), a set of three main metrics widely used in the literature [36] has been used: Character Error Rate (CER), Mean Levenshtein Distance (MLD) and Mean Similarity Score (MSS).

To calculate the CER, the number of letters inserted, deleted or substituted in the word compared with the trusted pronunciation is identified and summed up; this value is then divided by the length of the generated word. CER is presented as a percentage scaled from 0 to 100. MLD and MSS are metrics which calculate the distance and similarity of the generated pronunciation against the trusted pronunciation by applying the Levenshtein Distance algorithm [37] to the target and source words, respectively. The MLD metric shows how far the generated pronunciation has drifted from the trusted pronunciation; a lower value of MLD indicates a better performance. The MSS is a metric which shows how similar the pronunciation of the generated word is to the trusted pronunciation. It falls on a scale between 0 and 1, where the higher the similarity, the higher the efficiency. Collectively, a low CER and MLD and a high MSS are a clear indicator of an efficient pronunciation model. The test set comprises 2756 words and the results of pronunciation generated by the four systems (I, II, III and IV) are tabulated in Figure 11, where CP and IP represent the number of words for which the system generated a correct pronunciation and an incorrect pronunciation, respectively. A CP means that the whole word is pronounced correctly without any error; the CER score of such a word is zero (least possible score) and the similarity score is one (maximum possible score). Hence the number of CP generated by a system directly influences the MSS. Out of the 2756 words in the test set, 1902 words are correctly pronounced by System-III. System-III provides a higher number of correct pronunciations than the other systems, and so outperforms Systems-I, II and IV with the highest MSS together with the lowest CER and MLD. Hence, the proposed system (System-III) shows a substantial increase in pronunciation performance due to the extended set of rules.

Figure 11: Evaluation of System-I, System-II and System-III.

To understand the advantage of the proposed system, let us take as an example the pronunciation of the word given in Figure 12, “gaN; ga: Na dhi”. System-III is able to produce the proper pronunciation of the word due to the additional rule for the co-occurrence of consonants “k” and “N”. The rule states that if “k” is followed by “N” then “k” should be pronounced as “g”. The processing of this word and the corresponding rule applied to the letters in the word is shown in Figure 12. The other systems fail to produce the proper pronunciation of the word. This new syllable centric rule-base gives a better performance than existing rule-bases due to the newly formulated rules. The notable increase in the performance of the newly formulated rule-based system can be attributed to two factors. The first is the right prioritization and order of rule application. The second is that some of the new rules have been formulated on characters “k” and “th” (four rules for “k” and two rules for “th”), which have a very high frequency of occurrence in a corpus. A frequency and distribution analysis of three Tamil corpora, namely the Central Institute of Indian Languages corpus, a Wikipedia dump and Ponniyin Selvan, revealed a higher occurrence of characters such as “k” and “th” in words. A higher occurrence of a character indicates a higher usage of the character in communication. Hence, rules formulated on such frequently occurring characters have yielded a large increase in the performance of the proposed rule-based system.

Figure 12: Example processing of a word in System-III.

Although a rule-based approach generates the pronunciation for Tamil words, the accuracy of pronunciation similarity is generally low, and hence the words are generally subjected to hand correction [31] and then used as a training set for any machine learning approach. The proposed syllable centric rule-based system (System-III) provides a mean similarity score of 0.97 when checked against the test set, which is the maximum so far. The proposed rule-based system reduces the need for hand-refinement of the words and increases its reliability. Hence, the proposed rule-base can be used to develop the seed lexicon for the Tamil language, which can form the training data for any machine learning technique in future. This lexicon will hold the word and its syllabaries in a line. This syllable representation is intended to resolve the non-triviality of the letter to sound correspondence, and in fact the syllable itself will provide linguistic information for developing a better TTS based on syllables. Evidence for syllables being able to provide linguistic information has already been confirmed in [6] and [8]. Hence forming a legitimate lexicon through automated rules will prove to be very useful.

6. CONCLUSION

The existing rule-based as well as machine learning based approaches for G2P (phoneme centric) in Tamil are still inadequate to resolve all the complexities, as reviewed in this paper. Moreover, a syllable encompasses linguistic information which will help in modelling the pronunciation of words in a better way. Hence, a syllable centric model is a requisite to enhance the performance of a Tamil TTS, in accordance with the fact that Tamil is a syllable-timed language.

This new syllable centric rule-based approach (System-III) is an efficient syllable ingrained LTS and will also serve to produce a seed lexicon for forming the training set for any machine learning approach. For many other languages, such a dictionary/lexicon is available online, enabling experimentation with machine learning approaches on LTS. A dictionary of this sort is not available for Tamil. The new syllable ingrained rule-based pronunciation model for Tamil improves the pronunciation of Tamil words and also increases the credibility of the system, as it enables the generation of a seed lexicon. The lexicon can then also be annotated with more linguistic information such as the position and number of syllables, the POS of the word and other related details. Such an annotated lexicon will be a step towards improving the linguistic resources for Tamil, which is a low resource language [24].

ACKNOWLEDGMENTS

We express our fervent gratitude to Dr Va.Mu.Se.Muthuramalinga Andavar, associate professor in PG and Research Department of Tamil, Pachaiyappa’s College, Chennai, Tamil Nadu, India; Dr S. Ganesh, assistant professor, Department of Tamil, Arul Anandar College, Karumathur, Madurai, Tamil Nadu, India; and Dr R. Vimala Devi, assistant professor, Department of Tamil, Chellammal Women’s College, Chennai, Tamil Nadu, India, for their valuable help in building the pronunciation test set. We also thank the Tamil native speakers who actively took part and shared their opinion in the analysis of the pronunciation generation.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the authors.

REFERENCES

1. G. Bharadwaja Kumar, K. N. Murthy, and B. B. Chaudhuri, “Statistical analyses of Telugu text corpora,” Int. J. Dravidian Linguist., Vol. 36, no. 2, pp. 71–99, 2007.

2. A. Bellur, K. B. Narayan, K. Raghava Krishnan, and H. A. Murthy, “Prosody modeling for syllable-based concatenative speech synthesis of Hindi and Tamil,” in Proceedings of National Conference on Communications, IEEE, 2011, pp. 1–5.

3. A. Pradhan, S. Aswin Shanmugam, A. Prakash, K. Veezhinathan, and H. Murthy, “A syllable based statistical text to speech system,” in Proceedings of the 21st European Signal Processing Conference (EUSIPCO), IEEE, 2013, pp. 1–5.

4. S. Yuvaraja, V. Keri, S. C. Pammi, K. Prahallad, and A. W. Black, “Building a Tamil voice using HMM segmented labels,” 2010.

5. K. R. Krishnan, S. Aswin Shanmugam, G. R. Anusha Prakash Kasthuri, and H. A. Murthy, “IIT Madras’s submission to the blizzard challenge 2014,” in Proceedings of the Blizzard Challenge Workshop, 2014.

6. H. A. Patil, T. B. Patel, N. J. Shah, H. B. Sailor, R. Krishnan, G. R. Kasthuri, T. Nagarajan, L. Christina, N. Kumar, V. Raghavendra, et al., “A syllable-based framework for unit selection synthesis in 13 Indian languages,” in International Conference in Oriental COCOSDA held jointly with 2013 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), IEEE, 2013, pp. 1–8.

7. K. S. Prahallad and A. W. Black, “Unit size in unit selection speech synthesis,” in Proceedings of the Interspeech, 2003.

8. L. Jiang, H.-W. Hon, and X. Huang, “Improvements on a trainable letter-to-sound converter,” in Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997.

9. J. Lee and G. G. Lee, “A data-driven grapheme-to-phoneme conversion method using dynamic contextual converting rules for Korean TTS systems,” Comput. Speech Lang., Vol. 23, no. 4, pp. 423–34, 2009.

10. R. I. Damper, Y. Marchand, M. J. Adamson, and K. Gustafson, “Comparative evaluation of letter-to-sound conversion techniques for English text-to-speech synthesis,” in Proceedings of the third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis, 1998.

11. S. Hussain, “Letter-to-sound conversion for Urdu Text-To-Speech system,” in Proceedings of the Workshop on Computational Approaches to Arabic Script-Based Languages, Association for Computational Linguistics, 2004, pp. 74–9.

12. A. K. Kienappel and R. Kneser, “Designing very compact decision trees for grapheme-to-phoneme transcription,” in Proceedings of the Interspeech, 2001, pp. 1911–4.

13. C. Ma, M. A. Randolph, and J. Drish, “A support vector machines-based rejection technique for speech
recognition,” in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (ICASSP’01), IEEE, 2001, Vol. 1, pp. 381–4.

14. J. R. Bellegarda, “Unsupervised, language-independent grapheme-to-phoneme conversion by latent analogy,” Speech Commun., Vol. 46, no. 2, pp. 140–52, 2005.

15. M. Bisani and H. Ney, “Joint-sequence models for grapheme-to-phoneme conversion,” Speech Commun., Vol. 50, no. 5, pp. 434–51, 2008.

16. S. Jiampojamarn and G. Kondrak, “Letter-phoneme alignment: An exploration,” in Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, 2010, pp. 780–8.

17. J. R. Novak, N. Minematsu, and K. Hirose, “WFST-based grapheme-to-phoneme conversion: Open source tools for alignment, model-building and decoding,” in Proceedings of the 10th International Workshop on Finite State Methods and Natural Language Processing, Donostia-San Sebastian, Spain, 2012, pp. 45–9.

18. K. Rao, F. Peng, H. Sak, and F. Beaufays, “Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2015, pp. 4225–9.

19. H. Sak, A. Senior, and F. Beaufays, “Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition,” arXiv preprint arXiv:1402.1128, 2014.

20. H. A. Murthy, A. Bellur, V. Viswanath, B. Narayanan, A. Susan, G. Kasthuri, and R. Krishnan, “Building unit selection speech synthesis in Indian languages: An initiative by an Indian consortium,” in Proceedings of the COCOSDA, Kathmandu, Nepal, 2010.

21. A. W. Black, K. Lenzo, and V. Pagel, “Issues in building general letter to sound rules,” 1998.

22. M. A. Kumar, V. Dhanalakshmi, R. U. Rekha, K. P. Soman, and S. Rajendran, “A novel data driven algorithm for Tamil morphological generator,” Int. J. Comput. Appl., 2010.

23. A. G. Ramakrishnan, L. N. Kaushik, and L. Narayanan, “Natural language processing for Tamil TTS,” in Proceedings of the 3rd Language and Technology Conference, Poznan, Poland, pp. 192–6, 2007.

24. A. Parlikar, S. Sitaram, and A. W. Black, “The Festvox Indic frontend for grapheme to phoneme conversion,” in Proceedings of the 3rd Workshop on Indian Language Data: Resources and Evaluation, Portoroz, Slovenia, 2016.

25. K. S. Prahallad, A. Vadapalli, S. Kesiraju, H. A. Murthy, S. Lata, T. Nagarajan, M. Prasanna, H. Patil, A. K. Sao, S. King, et al., “The blizzard challenge,” in Proceedings of the Blizzard Challenge Workshop, 2014.

26. S. Nair, C. R. Rechitha, and C. Santhosh Kumar, “Rule-based grapheme to phoneme converter for Malayalam,” Int. J. Comput. Linguist. Nat. Lang. Proc., 2013.

27. A. Baby, N. L. Nishanthi, A. L. Thomas, and H. A. Murthy, “A unified parser for developing Indian language text to speech synthesizers,” in International Conference on Text, Speech, and Dialogue, Springer, 2016, pp. 514–21.

28. G. Bharadwaja Kumar and M. J. J. Premkumar, “Issues in developing LVCSR system for Dravidian languages: An exhaustive case study for Tamil,” Int. J. Comput. Appl., 2013.

29. E. Veera Raghavendra, S. Desai, B. Yegnanarayana, A. W. Black, and K. Prahallad, “Global syllable set for building speech synthesis in Indian languages,” in IEEE Spoken Language Technology workshop, SLT, 2008, pp. 49–52.

30. S. Jiampojamarn, G. Kondrak, and T. Sherif, “Applying many-to-many alignments and hidden Markov models to letter-to-phoneme conversion,” in HLT-NAACL, Vol. 7, 2007, pp. 372–9.

31. N. Udhyakumar, C. S. Kumar, R. Srinivasan, and R. Swaminathan, “Decision tree learning for automatic grapheme-to-phoneme conversion for Tamil,” in Proceedings of the 9th Conference Speech and Computer, 2004.

32. A. S. Kurian, B. Narayan, N. Madasamy, A. Bellur, R. Krishnan, G. Kasthuri, M. V. Vinodh, H. A. Murthy, and K. Prahallad, “Indian language screen readers and syllable based festival text-to-speech synthesis system,” in Proceedings of the Second Workshop on Speech and Language Processing for Assistive Technologies, Association for Computational Linguistics, 2011, pp. 63–72.

33. S. Karpagavalli and E. Chandra, “A hierarchical approach in Tamil phoneme classification using support vector machine,” Indian J. Sci. Technol., Vol. 8, no. 35, 2015.

34. K. Karunakaran and V. Jeya, Mozhiyiyal (in Tamil). India: Kavitha Pathippakam, 1997.

35. M. Pushpa and S. Karpagavalli, “Multi-label classification: Problem transformation methods in Tamil phoneme classification,” Procedia Comput. Sci., Vol. 115, pp. 572–9, 2017.

36. B. Hixon, E. Schneider, and S. L. Epstein, “Phonemic similarity metrics to compare pronunciation methods,” in Proceedings of the Interspeech, 2011, pp. 825–8.

37. B. Babych, “Graphonological levenshtein edit distance: Application for automated cognate identification,” Baltic J. Modern Comput., Vol. 4, no. 2, pp. 115–28, 2016.

Authors

Vaibhavi Rajendran holds a bachelor’s degree in information technology and a master’s degree in software engineering. She is currently pursuing her PhD degree in the field of computer science and engineering. Her research interests include natural language processing, speech synthesis and artificial intelligence.

Corresponding author. E-mail: [email protected]

G. Bharadwaja Kumar holds a PhD degree in computer science and his research interests include machine learning, data analytics, Internet of things, speech and natural language processing. He is very passionate about developing resources and applications for Indian languages in the areas of natural language processing and speech.

E-mail: [email protected]
