Conversion of NNLM to Back-Off Language Model in ASR
Dr. S. L. Lahudkar
Abstract: In daily life, automatic speech recognition is widely used, for example in security systems. When converting speech into text with a neural network, the language model is one of the blocks on which the efficiency of speech recognition depends. In this paper we develop an algorithm to convert a Neural Network Language Model (NNLM) into a back-off language model for more efficient decoding. For large-vocabulary systems this conversion gives more efficient results. The efficiency of a language model is measured by its perplexity and Word Error Rate (WER).
Keywords: NNLM, n-gram, back-off model, perplexity, WER
I. INTRODUCTION
$$P(w \mid h) =
\begin{cases}
\tilde{P}(w \mid h), & \text{if } (h, w) \in \mathcal{P} \\
\alpha(h)\, P(w \mid \bar{h}), & \text{otherwise}
\end{cases}$$

where $w$ is the current word; $h$ is the history, i.e. the last $n-1$ words; $\bar{h}$ is the truncated history obtained by dropping the first word of $h$; and $\alpha(h)$ is a back-off weight that enforces normalization. The set $\mathcal{P}$ contains the n-grams for which we keep explicit probability estimates $\tilde{P}(w \mid h)$; all other n-gram probabilities are computed by backing off to a lower-order estimate. The distribution $\tilde{P}(w \mid h)$ as well as the lower-order distributions are represented in similar fashion. Since the majority of probabilities are evaluated using back-off estimates, the model can be represented with a modest number of parameters. Back-off language models have been extensively studied, and very efficient decoding systems have been developed.
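To make the recursion above concrete, here is a minimal sketch of the back-off lookup, assuming the model is held in two plain dictionaries (explicit probabilities for n-grams in the set $\mathcal{P}$ and one back-off weight per history). The dictionary layout and the uniform floor used when no lower order remains are illustrative choices, not part of the formulation above.

```python
# Minimal sketch of a back-off n-gram lookup, assuming the model is stored as
# two dictionaries: explicit_probs[(history, word)] -> probability for n-grams
# kept in the set P, and backoff_weights[history] -> alpha(history).

def backoff_prob(word, history, explicit_probs, backoff_weights):
    """Return P(word | history) using the back-off recursion."""
    key = (history, word)
    if key in explicit_probs:
        # Explicit estimate: the n-gram is in the set P.
        return explicit_probs[key]
    if not history:
        # No lower order left: fall back to a small floor (illustrative choice).
        return 1e-7
    # Back off: drop the first word of the history and rescale by alpha(h).
    alpha = backoff_weights.get(history, 1.0)
    return alpha * backoff_prob(word, history[1:], explicit_probs, backoff_weights)


# Toy usage: a bigram/unigram model over a tiny vocabulary.
explicit = {
    (("the",), "cat"): 0.4,   # P(cat | the)
    ((), "cat"): 0.1,         # unigram P(cat)
    ((), "dog"): 0.1,         # unigram P(dog)
}
alphas = {("the",): 0.5}      # back-off weight alpha("the")

print(backoff_prob("cat", ("the",), explicit, alphas))  # explicit: 0.4
print(backoff_prob("dog", ("the",), explicit, alphas))  # backed off: 0.5 * 0.1
```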
Generally, language models are a critical component in many speech and language processing technologies, such as speech recognition and understanding, voice search, conversational interaction and machine translation. In the last few decades, several advanced language modeling ideas have been proposed. Some of these methods have focused on incorporating linguistic information such as syntax and semantics, whereas others have focused on improved modeling and parameter estimation techniques. While considerable progress has been made in language modeling, n-grams are still very much the state of the art due to the simplicity of the model and the good performance it can achieve. Language models play a significant role in automatic speech recognition.
II. LITERATURE REVIEW
The paper by H. Schwenk and J.-L. Gauvain [3], entitled Training neural network language models on very large corpora, notes that neural network language models had previously been limited to tasks for which only a very limited amount of in-domain training data is available. They present new algorithms to train a neural network language model on very large text corpora. This makes it possible to use the approach in domains where several hundred million words of text are available. The neural network language model is evaluated in a state-of-the-art real-time continuous speech recognizer for French Broadcast News. Word error reductions of 0.5% absolute are reported using only a very limited amount of additional processing time.
Holger Schwenk [4], in the paper entitled Continuous space language models, describes the use of a neural network language model for large vocabulary continuous speech recognition. The underlying idea of this approach is to attack the data sparseness problem by performing the language model probability estimation in a continuous space. Highly efficient learning algorithms are described that enable the use of training corpora of several hundred million words. It is also shown that this approach can be incorporated into a large vocabulary continuous speech recognizer using a lattice rescoring framework at very low additional processing time. The neural network language model was thoroughly evaluated in a state-of-the-art large vocabulary continuous speech recognizer for several international benchmark tasks, in particular the NIST evaluations on broadcast news and conversational speech recognition. The new method is compared to four-gram back-off language models trained with modified Kneser-Ney smoothing, which has often been reported to be the best known smoothing method. Usually the neural network language model is interpolated with the back-off language model. In that way, consistent word error rate reductions for all considered tasks and languages were achieved, ranging from 0.4% to almost 1% absolute.
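The interpolation referred to here is typically a simple linear combination of the two models' probabilities; as a sketch, with an assumed mixing weight $\lambda$ (usually tuned on held-out data to minimize perplexity):

$$P(w \mid h) = \lambda\, P_{\mathrm{NNLM}}(w \mid h) + (1 - \lambda)\, P_{\mathrm{backoff}}(w \mid h), \qquad 0 \le \lambda \le 1$$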
T. Mikolov, S. Kombrink, L. Burget, J. Cernocky, and S. Khudanpur [7], in Extensions of recurrent neural network language model, compared recurrent and feedforward neural networks. Recurrent networks have the possibility to form a short-term memory, so they can better deal with position invariance; feedforward networks cannot do that. Also, recurrent networks can learn to compress the whole history into a low-dimensional space, while feedforward networks compress (project) just a single word. In recurrent networks, the history is represented by neurons with recurrent connections, so the history length is unlimited. In feedforward networks, the history is represented by a context of N-1 words; it is limited in the same way as in N-gram back-off models.
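The contrast in how the two architectures represent history can be sketched in a few lines of code; the NumPy layer sizes and random weights below are illustrative assumptions rather than the configuration used in [7].

```python
import numpy as np

# Illustrative sizes (assumptions, not from the paper).
vocab, embed, hidden, context = 1000, 32, 64, 3   # context = N-1 words

rng = np.random.default_rng(0)

# Feedforward NNLM: the history is a fixed window of N-1 embedded words,
# concatenated into one input vector, so older words are simply dropped.
E = rng.normal(size=(vocab, embed))
W_ff = rng.normal(size=(context * embed, hidden))

def ff_hidden(word_ids):
    x = np.concatenate([E[w] for w in word_ids[-context:]])  # last N-1 words only
    return np.tanh(x @ W_ff)

# Recurrent NNLM: the hidden state is updated word by word, so in principle it
# can compress an unbounded history into a fixed-size vector.
W_in = rng.normal(size=(embed, hidden))
W_rec = rng.normal(size=(hidden, hidden))

def rnn_hidden(word_ids):
    h = np.zeros(hidden)
    for w in word_ids:                      # the whole history, not just N-1 words
        h = np.tanh(E[w] @ W_in + h @ W_rec)
    return h

history = [5, 42, 7, 99, 3]
print(ff_hidden(history).shape, rnn_hidden(history).shape)  # (64,) (64,)
```

The feedforward network only ever sees the last N-1 words, while the recurrent update folds every preceding word into the same fixed-size hidden state.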
III. CONVERSION OF NNLM TO A BACK-OFF LANGUAGE MODEL
$$P(w \mid h) =
\begin{cases}
P_{\mathrm{NNLM}}(w \mid h), & \text{if } (h, w) \in \mathcal{P} \\
\alpha(h)\, P(w \mid \bar{h}), & \text{otherwise}
\end{cases}$$

While we can represent the overall NNLM as a back-off model exactly, it is prohibitively large as noted above. The technique of pruning can be used to reduce the set of n-grams $\mathcal{P}$ for which we explicitly store probabilities $P(w \mid h)$. In this paper, we use entropy-based pruning [9], the most common method for pruning back-off language models.
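The pruning step can be pictured with the simplified sketch below: each explicit n-gram is compared against the probability the model would assign after backing off, and it is dropped when the weighted log-probability difference falls below a threshold. This omits the recomputation of back-off weights that the full entropy-based method of [9] performs, and the history_probs input is an assumed stand-in for the history marginals.

```python
import math

def prune_ngrams(explicit_probs, backoff_weights, history_probs, threshold=1e-6):
    """Simplified entropy-style pruning of explicit n-gram probabilities.

    explicit_probs:  {(history, word): P(word | history)} for n-grams in P
    backoff_weights: {history: alpha(history)}
    history_probs:   {history: P(history)} marginals used to weight the
                     per-n-gram contribution (an assumed input in this sketch)
    """
    pruned = {}
    for (history, word), p in explicit_probs.items():
        if not history:
            pruned[(history, word)] = p        # never prune unigrams
            continue
        # Probability the model would assign if this n-gram were removed.
        alpha = backoff_weights.get(history, 1.0)
        lower = explicit_probs.get((history[1:], word), 1e-10)
        p_backoff = alpha * lower
        # Weighted log-probability difference, standing in for the
        # relative-entropy increase used by full entropy-based pruning.
        delta = history_probs.get(history, 0.0) * p * (math.log(p) - math.log(p_backoff))
        if abs(delta) >= threshold:
            pruned[(history, word)] = p
    return pruned
```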
The result of this step is a pruned back-off NNLM. Note that the proposed hierarchical approach lets us use lower-order NNLMs for backing off and same-order conventional language models for smoothing zero-probability events.
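One way to read this hierarchy is sketched below: pruned probability tables derived from NNLMs of decreasing order are consulted first, accumulating back-off weights on the way down, and a conventional same-order n-gram model catches events to which no table assigns a probability. The function signature and fallback behaviour are assumptions made for illustration, not the exact recipe of the paper.

```python
# Sketch of the hierarchical lookup: tables[0] holds the highest-order pruned
# NNLM probabilities, tables[1] the next lower order, and so on. A conventional
# n-gram model (here just a callable) smooths events none of the tables cover.

def hierarchical_prob(word, history, tables, backoff_weights, conventional_lm):
    weight = 1.0
    h = tuple(history)
    for table in tables:                      # highest order first
        if (h, word) in table:
            return weight * table[(h, word)]
        # Accumulate the back-off weight and move to the lower-order NNLM.
        weight *= backoff_weights.get(h, 1.0)
        h = h[1:] if h else h
    # Zero-probability event for every NNLM table: smooth with the
    # same-order conventional language model.
    return weight * conventional_lm(word, tuple(history))
```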
IV. RESULTS
Fig. 1. Weight metric (entropy and weight error) plotted against neuron count (10 to 50).
The perplexities of both back-off NNLMs are significantly better than the baseline, with the original NNLM achieving lower perplexities than its back-off counterparts.
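Perplexity, the metric used for this comparison, is the exponentiated average negative log-probability that the model assigns to held-out text. A minimal sketch of its computation follows, assuming a generic prob_fn(word, history) callable (for instance the back-off lookup sketched in the introduction); the zero-probability floor is an illustrative choice.

```python
import math

def perplexity(sentences, prob_fn, order=3):
    """Perplexity of tokenized sentences under a language model.

    prob_fn(word, history) should return P(word | history); `order` is the
    n-gram order, so only the last order-1 words are kept as history.
    """
    log_prob, count = 0.0, 0
    for sentence in sentences:
        history = ()
        for word in sentence:
            p = max(prob_fn(word, history), 1e-12)          # guard against zeros
            log_prob += math.log(p)
            count += 1
            history = (history + (word,))[-(order - 1):]    # truncate to n-1 words
    return math.exp(-log_prob / count)
```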
REFERENCES
[1] E. Arisoy, S. F. Chen, B. Ramabhadran, and A. Sethy, "Converting neural network language models into back-off language models for efficient decoding in automatic speech recognition," in Proc. ICASSP, 2013, pp. 8242–8246.
[2] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin, "A neural probabilistic language model," J. Mach. Learn. Res., vol. 3, pp. 1137–1155, 2003.
[3] H. Schwenk and J.-L. Gauvain, "Training neural network language models on very large corpora," in Proc. HLT-EMNLP, 2005, pp. 201–208.
[4] H. Schwenk, "Continuous space language models," Comput. Speech Lang., vol. 21, no. 3, pp. 492–518, Jul. 2007.
[5] H.-K. J. Kuo, E. Arisoy, A. Emami, and P. Vozila, "Large scale hierarchical neural network language models," in Proc. Interspeech, Portland, OR, USA, 2012.
[6] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, and S. Khudanpur, "Recurrent neural network based language model," in Proc. Interspeech, 2010, pp. 1045–1048.
[7] T. Mikolov, S. Kombrink, L. Burget, J. Cernocky, and S. Khudanpur, "Extensions of recurrent neural network language model," in Proc. ICASSP, 2011, pp. 5528–5531.
[8] H. Schwenk and J.-L. Gauvain, "Connectionist language modeling for large vocabulary continuous speech recognition," in Proc. ICASSP, Orlando, FL, USA, 2002, pp. 765–768.
[9] A. Stolcke, "Entropy-based pruning of backoff language models," in Proc. DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, USA, 1998, pp. 270–274.
[10] M. Siu and M. Ostendorf, "Variable n-grams and extensions for conversational speech language modeling," IEEE Trans. Acoust., Speech, Signal Process., vol. 8, no. 1, pp. 63–75, Jan. 2000.
[11] V. Siivola and B. Pellom, "Growing an n-gram model," in Proc. Interspeech, 2005, pp. 1309–1312.
[12] V. Siivola, T. Hirsimaki, and S. Virpioja, "On growing and pruning Kneser-Ney smoothed n-gram models," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 5, pp. 1617–1624, Jul. 2007.
[13] S. Virpioja and M. Kurimo, "Compact n-gram models by incremental growing and clustering of histories," in Proc. Interspeech-ICSLP, Pittsburgh, PA, USA, 2006, pp. 1037–1040.
[14] S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," Comput. Speech Lang., vol. 13, no. 4, pp. 359–394, 1999.
[15] P. F. Brown, V. J. Della Pietra, P. V. deSouza, J. C. Lai, and R. L. Mercer, "Class-based n-gram models of natural language," Comput. Linguist., vol. 18, no. 4, pp. 467–479, 1992.
[16] E. Arisoy, T. N. Sainath, B. Kingsbury, and B. Ramabhadran, "Deep neural network language models," in Proc. NAACL-HLT Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Lang. Model. for HLT, Montreal, QC, Canada, Jun. 2012, pp. 20–28.
[17] J. Goodman, "Classes for fast maximum entropy training," in Proc. ICASSP, 2001, pp. 561–564.
[18] A. Emami, "A neural syntactic language model," Ph.D. dissertation, Johns Hopkins Univ., Baltimore, MD, USA, 2006.