
5th ISCA Speech Synthesis Workshop
ISCA Archive: http://www.isca-speech.org/archive
Pittsburgh, PA, USA, June 14-16, 2004


IMPROVING PRONUNCIATION DICTIONARY COVERAGE OF NAMES BY MODELLING SPELLING VARIATION

Justin Fackrell and Wojciech Skut

Rhetorical Systems Ltd
4 Crichton’s Close
Edinburgh EH8 8DT UK
[email protected]

ABSTRACT

This paper describes an attempt to improve the coverage of an existing name pronunciation dictionary by modelling variation in spelling. This is done by the derivation of string rewrite rules which operate on out-of-vocabulary words to map them to in-vocabulary words. These string rewrite rules are derived automatically, and are “pronunciation-neutral” in the sense that the mappings they perform on the existing dictionary do not result in a change of pronunciation.

The approach is data-driven, and can be used online to make predictions for some (not all) OOV words, or offline to add significant numbers of new pronunciations to existing dictionaries. Offline, the approach has been used to increase dictionary coverage for four domain-based dictionaries for forenames, surnames, streetnames and placenames. For surnames, a model trained on a 23,000-entry dictionary was subsequently able to add 5,000 new entries, improving both type coverage and token coverage of the dictionaries by about 1%. An informal evaluation suggests that the suggested pronunciations are good in 80% of cases.

1. INTRODUCTION

The pronunciation of out-of-vocabulary (OOV) words is one of the main problems in TTS applications such as automated call centres and car navigation systems. Many of the OOV words are proper names, and these are especially hard to pronounce because they often originate in other languages and they don’t behave like other words. The problem is worst for languages like English, whose underlying orthography is also highly irregular.

Traditionally this letter-to-sound (LTS) problem has been attacked by deriving a set of rules. The rules perform a sequence of substitutions, each one replacing a sequence of graphemes by a (possibly empty) sequence of phonemes. The actual substitution mechanism can be based on hand-written string replacement rules [1, 2, 3] or it can be learned automatically from data [4, 5]. Unfortunately, the accuracy of such rules is not particularly high, especially on proper names.

In this paper, we describe a novel method for predicting OOV proper names. It is based on a simple but effective principle: mapping an OOV proper name to an in-vocabulary homophone by changing its spelling. The algorithm automatically learns spelling alternations that lead to such homophones in the domain of proper names. The technique doesn’t “fire” (i.e. make a prediction) for all OOV names, but when it does, it produces predictions which are phonotactically correct, and it does so without needing grapheme-phoneme alignment (a requirement of some other techniques such as those in [4, 5]).

The paper is organised as follows: we first justify our approach by describing the coverage statistics of the dictionaries we used as the starting point for this work, which illustrates why data-driven techniques are attractive. Then we review hierarchical approaches to LTS, and describe the observations which stimulated the current work. The algorithm is then described in detail, followed by quantitative measures of how the coverage improved, and an informal assessment of how good the predictions of the algorithm are. Finally we outline directions in which this work may be developed in future.

1.1. Coverage Requirements

Figure 1 shows how the optimal[1] token coverage and dictionary size are related for four name-and-address domains. The token coverage is calculated using frequency data from an in-house UK postal database of approximately 50 million entries, and the details of each domain sub-database are shown in Table 1. The figure illustrates that small dictionaries of just 1000 entries provide surprisingly large token coverage; the 1000 most common surnames provide over 50% surname token coverage, and the 1000 most common forenames provide over 90% forename token coverage.

[1] “Optimal” here implies that each dictionary contains those entries which cover most tokens.
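The “optimal” token coverage of footnote 1 is a simple function of a type-frequency list: keep the most frequent types and count the tokens they cover. A minimal sketch; the surnames and counts below are hypothetical stand-ins, since the paper’s postal database is in-house:

```python
from collections import Counter

def optimal_token_coverage(token_counts, dict_size):
    """Fraction of tokens covered by a dictionary built from the
    dict_size most frequent types ("optimal" per footnote 1)."""
    total = sum(token_counts.values())
    top = sorted(token_counts.values(), reverse=True)[:dict_size]
    return sum(top) / total

# Hypothetical surname token counts standing in for the frequency data.
counts = Counter({"smith": 500, "jones": 400, "taylor": 300,
                  "smyth": 40, "jonson": 20, "tayler": 10})
print(round(optimal_token_coverage(counts, 3), 3))  # -> 0.945
```

Sweeping `dict_size` over a frequency list of this kind is what produces curves like those in Figure 1.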

5th ISCA Speech Synthesis Workshop - Pittsburgh 121


[Figure 1: plot of % token coverage (50 to 100) against dictionary size (1000 to 1e+06, log scale) for forenames, surnames, streetnames and placenames.]

Fig. 1. Relation between domain-specific dictionary size and optimal token coverage.

However, attaining complete or near-complete token coverage can require very many new types: 100% coverage of surname tokens would require the addition of more than 5,000,000 new entries.

So the number of new dictionary entries that are required to achieve complete coverage is huge, much too large to be added by hand. Automatic methods must therefore be sought which can provide high-quality pronunciation predictions for names.

1.2. A Hierarchical Approach

Liberman and Church [6] recognised that the pronunciation dictionary can be viewed as just the first in a series of filters for predicting the pronunciation of a word. In their approach, if a word is not found in the pronunciation dictionary, then attempts to predict the pronunciation are made with a sequence of linguistically-motivated filters – these include the addition of stress-neutral suffixes, rhyming and morphological decomposition. The first filter that fires produces the pronunciation. What all these filters have in common is that they generally do not produce output for every input – it is only the last link in the chain which must be able to do that.

With such a hierarchical approach in mind, it makes sense to look for new filters which can make sensible predictions for names which are not in the pronunciation dictionary. A new filter does not have to have a very high firing rate. All that is required for it to be useful is that, when it does fire, it produces predictions of a higher accuracy than the links in the chain below it. In the literature, the quality of predictions of automatically trained pronunciation rules is in the region of 70-75% [4, 7], and the best results of other techniques seem to be lower [5]. Therefore any filter with a higher success rate than this has potential for improving the quality of the system. In this paper we propose an automatically trained filter which has a modest firing rate, but which produces predictions which are judged to be good approximately 80% of the time.

1.3. LTS is a Many-to-one Mapping

The current work was motivated by the observation that, within a medium-sized surnames dictionary for RP English, roughly 10% of ways of pronouncing a name have more than one spelling. This is illustrated in Table 1, which shows, for each domain dictionary, the numbers of unique orthographic and phonetic entries.

Table 1. Characteristics of the pronunciation dictionaries used in this paper: the number of dictionary entries (headwords), the number of distinct pronunciations, and the number (percentage in brackets) of pronunciations which have more than one spelling.

  domain       entries  pronunciations  >1 spelling (%)
  forenames    14962    13479           1747 (13.0)
  surnames     23746    21487           2641 (12.3)
  streetnames  16211    15267           1358 (8.9)
  placenames    3668     3680            153 (4.2)

Thus, given a list of names which are not in a particular dictionary, we hypothesize that about 10% of these names do already have a valid pronunciation in the dictionary. The LTS problem for these names is then the task of finding the mapping from OOV to in-vocabulary. In other words, the task is to try to find a homophone entry in the existing dictionary.

This problem is closely related to one in the field of “name retrieval”, in which database queries are made more useful by allowing fuzziness in name matching. In name retrieval, the nearest matches to a search key (i.e. a name) are returned as “hits”. These hits are found using a variety of methods (reviewed in [8, 9]) which typically involve the calculation of a distance between the key and each name in the database.

The oldest of these techniques, Soundex and Phonix, perform the distance measure implicitly by attempting to map each word to a representation shared by its “soundalikes”. Soundex correctly identifies the names “Reynold” and “Reynauld” as soundalikes, but it also pairs “Catherine” and “Cotroneo” [8]. Explicit string edit distances have also been used in name retrieval, primarily for the identification of typing errors [9].

Further developments have seen the combination of explicit string edit distances with phonetically-motivated substring transformations. The link with phonetics was made explicit in Zobel and Dart’s [8] phonometric approach: LTS rules are used to predict pronunciations of search keys, and the distance metric is calculated in the phonetic domain.



While this may provide some improvement for name retrieval systems, the reliance on LTS rules is an obvious weakness in the approach, and the examples provided in [8] suggest that soundalikes identified by this method are phonetically diverse, i.e. that they are rarely homophones.

If a name retrieval technique could be found which only identified homophone matches, then this could be used to find pronunciations of OOV words by identifying their in-vocabulary soundalikes. This is the goal of the current work.

2. THE ALGORITHM

The current work is based on the idea that, within a particular domain (e.g. surnames), there exist universal spelling alternations which are pronunciation-neutral. That is, there are ways in which the spelling of a word can be changed without changing its pronunciation.

The variation in spelling can be modelled by finding string rewrite rules which are pronunciation-neutral in an existing pronunciation dictionary. Given an OOV name, the algorithm tries to find a string rewrite rule which rewrites the name to an in-vocabulary spelling. If it succeeds, then it has found a homophone for the OOV word, and the pronunciation can simply be looked up in the dictionary.

The algorithm will now be described in detail, first by showing how the model for spelling variation is trained from an existing dictionary, and then by discussing how the model is used to make pronunciation predictions for words which are OOV.

2.1. Training

The starting point for training is a dictionary which gives partial coverage of the domain in question. We favour using a domain-specific dictionary for this rather than a general-purpose dictionary, since we suspect that the nature of spelling variation is domain-dependent.

The first stage is to create a reverse dictionary, which maps pronunciations to orthography. All entries in the reverse dictionary which map one pronunciation to just one spelling are then removed. For the remainder, each pair of spellings which share a pronunciation is used to generate a sequence of rewrite rules. Each rewrite rule is of the form A -> B / L _ R, where the pattern A with L as left context and R as right context is replaced with the string B.

Consider an example: the pronunciation /l i1 n . z ii2/ is shared by the spellings linsey and lynsey (linsey=lynsey). Table 2 shows the postulated rewrite rules. The first rewrite rule is obtained by identifying, then removing, the common prefix and suffix between the two strings, to yield a simple context-free substitution string (e.g. i -> y / _ ). The second and subsequent rules are obtained by successively adding extra context information, first to the right, then to the left, where possible.

Table 2. Rules postulated from the soundalike pair linsey=lynsey.

  rule  substitution rule
  0     i -> y / _
  1     i -> y / _ n
  2     i -> y / l _ n
  3     i -> y / l _ ns
  4     i -> y / l _ nse
  5     i -> y / l _ nsey$

The rules at the top of the list will fire most often, but will frequently map names to other names with different pronunciations (e.g. smith -> smyth). Conversely, the rule at the bottom of the list will fire only once, mapping the original word pair linsey=lynsey.

Each of the rules is evaluated on the rest of the dictionary. For each entry in the dictionary, a particular rule will do one of four things:

MISS The pattern doesn’t match (e.g. bilton[2])

OOV The pattern matches, but the resulting mapping is not in the dictionary (e.g. linton -> lynton, but lynton is OOV)

DIFF The pattern matches, the resulting mapping is in the dictionary, but the pronunciations are different (e.g. tin -> tyn, but /t i1 n/ differs from /t ii1 n/)

GOOD The pattern matches, the resulting mapping is in the dictionary, and the pronunciations are the same (e.g. linne -> lynne, and both are pronounced /l i1 n/)

Counting over the whole dictionary, each rule is assigned four scores: the numbers of MISS, OOV, DIFF and GOOD outcomes. Collectively these scores reflect how useful the rule is – how often it can be expected to fire, how often it will map into the dictionary, and how often it makes a pronunciation-neutral mapping.

Of the rules postulated from each spelling pair, just one rule is chosen for inclusion in the rule set. Currently, the heuristic for choosing the best rule from each set is simply to choose the shortest rule which is always pronunciation-neutral when its pattern matches and it maps into the dictionary (i.e. its DIFF count is zero). In future it may be advantageous to add sophistication to this part of the technique.

The above process is repeated for all other spelling pairs, to yield a list of substitution rules.

[2] All examples in this list apply to rule 1 in Table 2.



2.2. Prediction

The substitution rules are scored and then sorted by their relevance – which is simply the count of how many successful mappings they make in the existing dictionary. For any OOV word, we find the highest-scoring substitution rule which maps the OOV word into the dictionary, and then use the pronunciation of that word.

This can be done offline to generate new dictionary entries, or live at synthesis time. In the current work, prediction is done offline, generating phonetic transcriptions for a given list of words that are not in the available pronunciation dictionary. The transcriptions are then added to the pronunciation dictionary.

Two objections can be made to this approach:

1. The offline approach restricts the coverage of the new lookup method to a predefined set of OOV words, although lookup at synthesis time would enable the system to map unseen OOV words to existing pronunciations. However, the application under consideration (UK proper names) means that the domain – although very large – is practically finite and can be covered by a list of words. Furthermore, the approach taken is not guaranteed to perform equally well on material different from proper names.

2. Putting the missing words into the dictionary may be costly in terms of memory. However, memory is generally cheap, and the use of efficient representations such as finite-state machines [10, 11] can mean that this cost is in fact moderate. In the implementation reported in the present paper, a pronunciation dictionary containing over 440K entries was encoded as a finite-state transducer and then minimised, yielding a finite-state transducer with 215,540 states and 549,538 transitions, using less than 8MB of RAM. This figure can be reduced even further by means of automata compression [12].

3. EVALUATION

To evaluate the technique, a set of base dictionaries was used, providing basic coverage of four domains – forenames, surnames, streetnames and placenames.

The algorithm was used to derive rewrite rules on each of the four domains of interest, resulting in four sets of rewrite rules. The sizes of these rule sets, plus some example rules, are shown in Table 3.

Table 3. Rewrite rules trained from base dictionaries, with the number of previously OOV spellings added as a result of each rule.

  domain       no. of rules  highest-scoring rules  added
  forenames    667           a -> / a _             126
                             y -> i / _ l            99
                             gh -> / a _ $           63
                             igh -> y / _ $          59
  surnames     1081          y -> i / l _            94
                             ey -> ai / _            64
                             n -> / o _ n$           60
                             all -> le / _ $         56
  streetnames  702           igh -> y / _ $          57
                             ' -> / _                42
                             s -> 's / _ $           32
                             y -> i / _ l            31
  placenames   49            t -> / _ t$              3
                             e -> / k _ $             3
                             n -> / n _ $             3
                             t -> et / _              2

These rewrite rule sets were then used to make predictions for the remaining OOV words in each domain.

Table 4 shows the percentage improvement in coverage for the dictionaries obtained by using the algorithm. Clearly the change in coverage is only a small improvement, but bear in mind that since the number of types and tokens in the population is very large, this small improvement does in fact represent several thousand new dictionary entries. (As far as token coverage is concerned, a 1% improvement in UK surname coverage means that about half a million people will find their name in the dictionary.)

Table 4. Coverage of dictionary (in %) before and after application of the spelling variation algorithm on the pronunciation dictionaries described in Table 1 (FN=forenames, SN=surnames, ST=streetnames, PL=placenames).

          type                     token
  dom.    before  after  change    before  after  change
  FN       4.3     5.3   +1.0       94.9    95.3  +0.4
  SN       4.4     5.3   +0.9       75.2    76.2  +1.0
  ST      16.9    18.1   +1.2       81.6    82.0  +0.4
  PL      19.2    19.4   +0.2       75.2    75.2   0.0

Further experiments with larger dictionaries suggest that the algorithm remains effective at mapping OOV words into the dictionary even when token coverage is 98% and higher.

To see whether the mappings suggested by the rewrite rule algorithm are actually any good, an evaluation experiment was carried out. For each domain, a random test set was constructed consisting of OOV names for which the respelling algorithm had found new spellings. For placenames, the algorithm only identified 37 new spellings, so all of these were used in the test.



For the other domains, 200 names were used.

Each stimulus consists of a pair of words: an OOV name and the in-vocabulary soundalike identified by the algorithm (e.g. donelly -> donelley). Subjects were shown the spellings of both words, and asked to rate each soundalike with the value 1 (“these two words are pronounced the same”) or 0 (“these two words are not pronounced the same”, or “I don’t know”). Within each domain, the same pairs were shown to each subject. The experiment was carried out by five native British English speakers.

Table 5 shows the results from the listening test. The predictions of the rewrite algorithm are good, with average scores between 80% and 90%. Even if unanimity between all 5 judges is required (the last column in the table), the results remain encouraging.

Table 5. Subjective evaluation of rewrite rules: the number of test words, the percentage of words judged “good”, and the percentage of test words which were judged “good” by all 5 subjects.

  domain       test words  % good  % unanimous
  forenames    200          90.2    70.5
  surnames     200          80.7    61.5
  streetnames  200          88.6    69.5
  placenames    37          85.2    81.5

Table 6 shows examples of successes of the algorithm, in which the mapping was judged “good”. The transformations which occur are undoubtedly simple, and may well be produced by other rule-based approaches such as that of [6]. However, the transformation rules presented here were inferred fully automatically from an existing dictionary, and so the applicability of the technique to other domains and languages appears possible.

Table 6. Examples of rewrites judged “good”.

  forenames    hailee -> hailey, kymberleigh -> kymberley, mycheala -> micheala
  surnames     whatkinson -> watkinson, geoffreys -> jeffreys, casy -> casey
  streetnames  strangways -> strangeways, ailesbury -> aylesbury, macks -> max
  placenames   whelford -> welford, holmer -> homer, lorton -> laughton

Table 7 shows examples of failures of the algorithm. What remains to be investigated is the correlation between the rule relevance (i.e. how much evidence there is for the rule in the current dictionary) and the quality of the predictions it makes.

Table 7. Examples of rewrites judged “bad”.

  forenames    cansey -> kasey, charistos -> christos, jitendera -> jitendra
  surnames     nelon -> nelsen, shazde -> shazad, moli -> morley
  streetnames  beechers -> beeches, bedes -> beds, cloch -> clough
  placenames   ston -> seton, prehen -> preen, longswood -> longwood

4. CONCLUSIONS AND FURTHER WORK

In this paper an algorithm has been proposed which contributes to lexical coverage for names by finding in-vocabulary spelling variants for OOV words. The resulting rule sets do not fire with a high frequency, but in an experiment based on a UK database they are able to improve token coverage by approximately 1%, which corresponds to about half a million people. An informal evaluation suggests that, for those OOV words for which the algorithm does suggest pronunciations, about 80% are good, with a high degree of agreement between the subjects.

One useful property of the technique is that all the predictions it produces are phonotactically correct, since it is mapping new words into the existing dictionary. Some rule-based methods such as CART are not constrained in such a way.

It is hoped that this approach can form part of a battery of letter-to-sound approaches to improve dictionary coverage of names.

The algorithm in its current form is fairly simple, and there is no capacity for more than one rewrite rule to fire on a particular OOV name. This is something which will be investigated in future.

Further experiments are warranted to investigate the behaviour of the algorithm on larger dictionaries, when token coverage is approaching 100%, and work is also required to add sophistication to the context rules.

Finally, the online applicability of the method described in this paper presents a promising research prospect. If, in addition to proper names, the algorithm turns out to perform well on arbitrary input data, applying the rewrite mechanism



at synthesis time will increase the coverage of the method beyond the predefined list of OOV words. For this, an efficient lookup method is needed that would find the best applicable mapping deterministically for a given string. The finite-state framework used to encode the pronunciation dictionary in our system offers several efficient methods for performing this kind of lookup [13, 14].

5. REFERENCES

[1] Honey S. Elovitz, Rodney Johnson, Astrid McHugh, and John E. Shore, “Letter-to-sound rules for automatic translation of English text to phonetics,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. ASSP-24, 1976, pp. 446–459.

[2] Mehryar Mohri and Richard Sproat, “An efficient compiler for weighted rewrite rules,” in Meeting of the Association for Computational Linguistics, 1996, pp. 231–238.

[3] I. Lee Hetherington, “An efficient implementation of phonological rules using finite-state transducers,” in Proceedings of Eurospeech 2001, 2001.

[4] A. Black, K. Lenzo, and V. Pagel, “Issues in building general letter to sound rules,” in Proceedings of the ESCA/COCOSDA Workshop on Speech Synthesis, Jenolan Caves, Australia, 1998.

[5] Yannick Marchand and Robert I. Damper, “A multistrategy approach to improving pronunciation by analogy,” Computational Linguistics, vol. 26, no. 2, pp. 195–219, 2000.

[6] M. Liberman and K. Church, “Text analysis and word pronunciation in text-to-speech synthesis,” in Advances in Speech Signal Processing, S. Furui and M. Sondhi, Eds. Marcel Dekker Inc, 1991.

[7] Ariadna Font Llitjos and Alan W. Black, “Knowledge of language origin improves pronunciation accuracy of proper names,” 2001.

[8] J. Zobel and P. W. Dart, “Phonetic string matching: Lessons from information retrieval,” in Proceedings of the 19th International Conference on Research and Development in Information Retrieval, H.-P. Frei, D. Harman, P. Schäble, and R. Wilkinson, Eds., Zurich, Switzerland, 1996, pp. 166–172, ACM Press.

[9] Ulrich Pfeifer, Thomas Poersch, and Norbert Fuhr, “Searching proper names in databases,” in Hypertext – Information Retrieval – Multimedia, pp. 259–276. Universitätsverlag Konstanz, 1995.

[10] Mehryar Mohri, “Finite-state transducers in language and speech processing,” Computational Linguistics, vol. 23, no. 2, pp. 269–311, 1997.

[11] Stoyan Mihov and Denis Maurel, “Direct construction of minimal acyclic subsequential transducers,” Lecture Notes in Computer Science, vol. 2088, 2001.

[12] Jan Daciuk, “Experiments with automata compression,” Lecture Notes in Computer Science, vol. 2088, pp. 105–112, 2001.

[13] Kemal Oflazer, “Error-tolerant finite-state recognition with applications to morphological analysis and spelling correction,” Computational Linguistics, vol. 22, no. 1, pp. 73–89, 1996.

[14] M. Crochemore and C. Hancart, “Automata for matching patterns,” in Handbook of Formal Languages, G. Rozenberg and A. Salomaa, Eds., vol. 2, pp. 399–462. Springer-Verlag, 1997.

