Here, H is the matrix of output vectors of the hidden units:

H = σ(W^T X),    (2)

Λ is the weight matrix constructed to direct the optimization's search direction, and T is the matrix of classification target vectors. We use batch-mode gradient descent to fine tune W, where the gradient of the mean square error E, after constraint (1) is imposed, is given by

∂E/∂W = 2X [ H^T ∘ (1 − H)^T ∘ [ H†(HΛT^T)(TH†) − ΛT^T(TH†) ] ],

and H† = ΛH^T(HΛH^T)^{-1}. More detail of this DCN fine-tuning algorithm is provided in [13].
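To make the update concrete, the following NumPy sketch computes the closed-form upper-layer weights and one gradient step on W for a single DCN module. It assumes Eq. (1) takes the weighted least-squares form U = (HΛH^T)^{-1} HΛT^T with a diagonal Λ over training frames; the shapes, learning rate, and function names are illustrative choices rather than details from the paper.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fine_tune_step(W, X, T, Lam, lr=0.1):
    """One batch-mode gradient step on the lower-layer weights W of one DCN module.

    Illustrative shapes: X (D x N) inputs, T (C x N) targets, W (D x L),
    Lam (N x N) diagonal weighting matrix; H (L x N) hidden outputs.
    """
    H = sigmoid(W.T @ X)                               # Eq. (2)
    Hp = Lam @ H.T @ np.linalg.inv(H @ Lam @ H.T)      # weighted pseudo-inverse of H
    U = Hp.T @ T.T                                     # assumed form of Eq. (1)
    # Gradient of the mean square error E w.r.t. W with U constrained as above.
    G = H.T * (1.0 - H).T * (Hp @ (H @ Lam @ T.T) @ (T @ Hp) - Lam @ T.T @ (T @ Hp))
    grad_W = 2.0 * X @ G
    return W - lr * grad_W, U
```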
4. Experimental Evaluation

4.1. MNIST experiments and results

Comprehensive experiments have been conducted to evaluate the performance of the DCN architecture and the related learning algorithms on the benchmark MNIST database; see [12] for details of this task. In brief, MNIST consists of binary images of handwritten digits and is one of the most common classification tasks for evaluating machine learning algorithms. We only briefly summarize our strong results on MNIST here in Table 1.

Table 1: Classification error rate comparison: DBN vs. DCN

  DBN [3] (Hinton's):                        1.20%
  DBN (MSR's):                               1.06%
  DCN (Fine-tuning):                         0.83%
  DCN (no Fine-tuning):                      0.95%
  Shallow (D)CN (Fine-tuned single layer):   1.10%
4.2. TIMIT experiments

We now turn to our more recent experiments, in which we apply the same DCNs and the related learning algorithms developed on the MNIST task to the TIMIT speech database. Standard MFCC features were used, but with a longer-than-usual context window of 11 frames. This gives a total of 39*11 = 429 elements in each feature vector, which we call a "super-frame", as the input to each module of the DCN. For the DCN output, we used 183 target class labels, or "phone states". These 183 target labels correspond to all the states of the 61 phone-like units defined in TIMIT.
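As an illustration of the input construction, the sketch below stacks a context window of 11 MFCC frames (39 coefficients each) into 429-dimensional super-frames. The edge padding by repeating the first and last frame is our assumption, since the paper does not specify how utterance boundaries are handled.

```python
import numpy as np

def make_super_frames(mfcc, context=11):
    """Stack `context` consecutive frames of a T x 39 MFCC matrix into
    T x (39 * context) super-frames, one per original frame."""
    half = context // 2
    padded = np.concatenate([np.repeat(mfcc[:1], half, axis=0),
                             mfcc,
                             np.repeat(mfcc[-1:], half, axis=0)], axis=0)
    n_frames = mfcc.shape[0]
    return np.concatenate([padded[i:i + n_frames] for i in range(context)], axis=1)

# Example: 300 frames of 39-dim MFCCs -> 300 super-frames of dimension 429.
frames = np.random.randn(300, 39)
print(make_super_frames(frames).shape)   # (300, 429)
```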
The standard TIMIT training set of 462 speakers was used for training the DCN. The total number of super-frames in the training data is about 1.12 million. The standard development set of 50 speakers, with a total of 122,488 super-frames, was used for cross validation. Results are reported on the standard 24-speaker core test set, consisting of 192 sentences with 7,333 phone tokens and 57,920 super-frames.
The algorithms presented in this paper are all batch-mode based. This is because the pseudo-inverse, as an instance of convex optimization with a global optimum, necessarily involves the full training set. However, in our experiments, where the full training set of TIMIT is represented by a very large 429-by-1.12M matrix, the various batch-mode matrix multiplications required by the algorithms easily cause a single computer to run out of memory. (We had not implemented our learning algorithms on parallel machines at the time of carrying out the experiments reported here.) To overcome this CPU memory limitation, we block the training data into many mini-batches and use mini-batch training instead of full-batch training. After the final mini-batch is consumed in each training epoch, we then use a routine for block matrix multiplication and inversion, which incurs some undesirable but unavoidable waste of computation on a single CPU, to combine the full training data when implementing the estimation formula of Eq. (1), in order to approximate the effect of batch-mode training as closely as we can.
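The blockwise routine can be pictured as accumulating the two matrix products needed by the pseudo-inverse over mini-batches and performing a single inversion per epoch. The sketch below again assumes the weighted least-squares form of Eq. (1); the batch interface, per-frame weights, and absence of regularization are illustrative choices, not details from the paper.

```python
import numpy as np

def estimate_U_blockwise(W, batches, n_hidden, n_classes):
    """Accumulate H Lam H^T and H Lam T^T over mini-batches, then solve once.

    `batches` yields (X_b, T_b, lam_b): a D x Nb input block, a C x Nb target
    block, and a length-Nb vector of per-frame weights (the diagonal of Lam).
    """
    A = np.zeros((n_hidden, n_hidden))    # running sum of H Lam H^T
    B = np.zeros((n_hidden, n_classes))   # running sum of H Lam T^T
    for X_b, T_b, lam_b in batches:
        H_b = 1.0 / (1.0 + np.exp(-(W.T @ X_b)))
        A += (H_b * lam_b) @ H_b.T
        B += (H_b * lam_b) @ T_b.T
    return np.linalg.solve(A, B)           # U = (H Lam H^T)^{-1} H Lam T^T
```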
4.3. TIMIT results

The DCN, like the DBN, derives its strength mainly as a static pattern classifier. An HMM or dynamic-programming decoder is a convenient tool for porting the strength of a static classifier to dynamic patterns, as we recently demonstrated with the DNN [5][6]. (We are nevertheless aware that the unique elasticity of the temporal dynamics of speech, as explained in [1], would require temporally correlated models better than the HMM for the ultimate success of ASR; integrating such a model with the DCN to form a coherent dynamic DCN is itself a more challenging research problem beyond the scope of this paper.) Therefore, as our first step of experimentation, we focus here on evaluating the static classification ability of the DCN. To this end, we choose the frame-level phone-state classification error rate as the main evaluation criterion. In this case, we have a total of 183 state classes, three for each of the 61 phone labels defined in the TIMIT training set. The actual state labels are obtained by HMM forced alignment. We also report frame-level phone classification error rates over the 61 phone classes, in which errors among states within the same phone are not counted.
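For scoring, the state-level and phone-level error rates can be computed from the same frame-level decisions. The sketch below assumes the 183 states are numbered so that the three states of each of the 61 phones are consecutive (integer division by 3 giving the phone index); the paper does not spell out its numbering, so this is only one plausible convention.

```python
import numpy as np

def frame_error_rates(state_scores, state_labels, states_per_phone=3):
    """Frame-level state and phone classification error rates.

    state_scores: N x 183 per-frame scores; state_labels: length-N reference
    state indices from HMM forced alignment.  Phone errors ignore confusions
    among the states of the same phone.
    """
    pred = np.argmax(state_scores, axis=1)
    state_err = np.mean(pred != state_labels)
    phone_err = np.mean(pred // states_per_phone != state_labels // states_per_phone)
    return state_err, phone_err
```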
The results in Table 2 are obtained from a typical run of the DCN program with 6,000 hidden units in each module of the DCN, where "X (Y)" in the first column denotes the Xth layer of the DCN (counted from the bottom up) and the Yth epoch of the fine-tuning optimization. The hyper-parameters are tuned on the development set defined in TIMIT. We used a single-hidden-layer RBM, trained in the same way as in [4][5], to initialize the weights W at the lowest module of the DCN before applying fine tuning as described in Section 3. We have found empirically that if random noise is used for the initialization instead, the error rate becomes at least 30% relative higher than presented in Table 2.

Fine-tuned weights from lower modules are used to initialize the weights at higher modules. They are then appended with random weights associated with the output units from the immediately lower module before fine tuning at the current module.
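The stacking step described above can be sketched as follows: the next module's input augments the raw super-frames with the prediction of the module below, and its lower-layer weight matrix starts from the fine-tuned weights of that module, with randomly initialized rows appended for the new input dimensions. The random-weight scale and the helper name are our illustrative choices.

```python
import numpy as np

def init_next_module(W_lower, Y_lower, X, scale=0.1, seed=0):
    """Build the input and initial lower-layer weights for the next DCN module.

    X: D x N raw super-frames; Y_lower: C x N predictions of the module below;
    W_lower: D x L fine-tuned lower-layer weights of the module below.
    """
    rng = np.random.default_rng(seed)
    X_next = np.vstack([X, Y_lower])                       # (D + C) x N input
    W_rand = scale * rng.standard_normal((Y_lower.shape[0], W_lower.shape[1]))
    W_next = np.vstack([W_lower, W_rand])                  # (D + C) x L weights
    return X_next, W_next
```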
Table 2. Frame-level classification error rates of phones (61 classes) and states (183 classes) as a function of the number of stacked DCN modules; an RBM is used to initialize the lowest-level network weights.

Layer (Epoch)   Train State Err %   Dev. State Err %   Test Phone Err %   Test State Err %
1 (1)           27.19               49.50              39.18              49.83
…               …                   …                  …                  …
1 (8)           21.20               46.00              36.12              46.30
2 (1)           13.01               44.44              34.87              44.88
3 (1)            7.96               44.30              34.64              44.70
4 (1)            5.14               44.22              34.67              44.65
5 (1)            3.51               44.11              34.56              44.53
6 (1)            2.57               44.25              34.83              44.70
7 (1)            1.95               44.25              34.74              44.69
The most notable observation from Table 2 is that, as layers are gradually added, the error rates on the training, development, and core test sets continue to drop until over-fitting occurs at Layer 6 in this example. There has been very little published work on frame-level phone or phone-state classification. The closest work we have been able to find reported over 70% phone-state error rate on an easier set of 132 phone-state classes (compared with our 183 state classes), but on a more difficult speech database. We ran the DBN system of [7] on the same TIMIT data and found the corresponding frame-level phone-state error rate to be 45.04% (which gave a 22% phonetic recognition error rate after running a decoder with a standard bi-gram phonetic "language" model, as reported in [7]). This frame-level error rate achieved by the DBN is slightly higher than the DCN's error rate of 44.53% shown in Table 2.

Table 3 summarizes the results obtained with different hyper-parameters than those in Table 2. It shows the dependency of the frame-level classification error rates on the number of hidden units, which is fixed across all modules of the DCN in our current implementation. We also fold the 61 classes in the original TIMIT label set into the standard 39 classes; the corresponding results are also presented in Table 3. These results are obtained without the use of phone-bound state alignment; that is, there is no left-to-right constraint, and decisions are made at the frame level. These results are also obtained without any phone-level "language" model.
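Folding to the 39-class set amounts to mapping each 61-class label through a fixed many-to-one dictionary before scoring. The sketch below shows the mechanism with a few entries of the commonly used folding (e.g., collapsing closures and silence labels); the complete mapping used in the paper is not listed there, so the table here is only illustrative.

```python
# A few entries of a standard TIMIT 61 -> 39 folding, shown only to illustrate
# the mechanism; the complete mapping is not given in the paper.
FOLD_39 = {"ao": "aa", "ax": "ah", "ix": "ih", "el": "l", "zh": "sh",
           "pcl": "sil", "tcl": "sil", "kcl": "sil", "h#": "sil", "pau": "sil"}

def fold61to39(labels):
    """Map a sequence of 61-class TIMIT labels to the folded 39-class set."""
    return [FOLD_39.get(lab, lab) for lab in labels]

print(fold61to39(["ao", "zh", "pcl", "iy"]))   # ['aa', 'sh', 'sil', 'iy']
```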
Table 3. Frame-level classification percent error rates of phones (61 or folded 39 classes) and of phone states (183 classes), as a function of the number of hidden units per layer in the DCN.

Size of Hidden Units   Test Phone Err % (39 classes)   Test Phone Err % (61 classes)   Test State Err % (183 classes)
3000                   27.11                           35.97                           46.08
4000                   26.37                           35.27                           45.39
6000                   25.44                           34.12                           44.24
7000                   25.22                           34.04                           44.04
5. Summary and Conclusions

We recently developed a DNN-based architecture for large-vocabulary speech recognition. While achieving remarkable success with this approach, we face a scalability problem in practical applications, e.g., voice search. In this paper we present a novel DCN architecture aimed at enabling scalability. Experimental results on both the MNIST and TIMIT tasks demonstrate higher classification accuracy than the DBN. The superiority of the DCN over the DBN is particularly strong on the MNIST task, as long as we use a much deeper DCN than could be computationally afforded by the conventional DBN architecture and learning. While the basic module of the DCN reported in this paper is similar to the extreme learning machine in the literature (e.g., [14]), any simple or weak classifier can be embedded in the DCN architecture to make it stronger.

The future directions of our work include: 1) full exploration of the rich flexibility in architecture and module type provided by the basic DCN framework presented in this paper; 2) addition of a dynamic-programming-based decoder on top of the final layer of the DCN to enable continuous phonetic or speech recognition; 3) learning (rather than tuning) of the hyper-parameters in the DCN; 4) development of speaker and environment adaptation techniques for the DCN; and 5) development of a temporal DCN that integrates generative dynamic models of speech (e.g., [15][16]) with the DCN architecture presented in this paper.

6. Acknowledgements

We are grateful for many helpful discussions with, and valuable suggestions and encouragement from, John Platt, Geoff Hinton, Dave Wecker, and Alex Acero. We also thank G.B. Huang for discussions on many possible basic modules of the DCN.

7. References

[1] L. Deng, D. Yu, and A. Acero, "Structured speech modeling," IEEE Trans. on Audio, Speech, and Language Processing, vol. 14, no. 5, pp. 1492-1504, September 2006.
[2] N. Morgan, "Deep and Wide: Multilayers in Automatic Speech Recognition," IEEE Trans. on Audio, Speech, and Language Processing, 2011 (in press).
[3] G. Hinton and R. Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
[4] A. Mohamed, G. Dahl, and G. Hinton, "Deep belief networks for phone recognition," NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, Dec. 2009.
[5] G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-Dependent Pre-trained Deep Neural Networks for Large Vocabulary Speech Recognition," IEEE Trans. on Audio, Speech, and Language Processing, 2011 (in press).
[6] D. Yu, L. Deng, and G. Dahl, "Roles of Pre-Training and Fine-Tuning in Context-Dependent DBN-HMMs for Real-World Speech Recognition," NIPS Workshop on Deep Learning and Unsupervised Feature Learning, December 2010.
[7] A. Mohamed, D. Yu, and L. Deng, "Investigation of Full-Sequence Training of Deep Belief Networks for Speech Recognition," in Proc. Interspeech, September 2010.
[8] L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton, "Binary Coding of Speech Spectrograms Using a Deep Auto-encoder," in Proc. Interspeech, Sept. 2010.
[9] H. Sheikhzadeh and L. Deng, "Waveform-Based Speech Recognition Using Hidden Filter Models: Parameter Selection and Sensitivity to Power Normalization," IEEE Trans. on Speech and Audio Processing, vol. 2, pp. 80-91, 1994.
[10] N. Jaitly and G. Hinton, "Learning a Better Representation of Speech Sound Waves Using Restricted Boltzmann Machines," in Proc. ICASSP, Prague, 2011.
[11] D. Yu, S. Wang, and L. Deng, "Sequential Labeling Using Deep-Structured Conditional Random Fields," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 6, pp. 965-973, Dec. 2010.
[12] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proc. IEEE, vol. 86, pp. 2278-2324, 1998.
[13] D. Yu and L. Deng, "Accelerated Parallelizable Neural Networks Learning Algorithms for Speech Recognition," in Proc. Interspeech, 2011 (accepted).
[14] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme Learning Machine: Theory and Applications," Neurocomputing, vol. 70, pp. 489-501, 2006.
[15] J. Baker et al., "Research Developments and Directions in Speech Recognition and Understanding," IEEE Signal Processing Magazine, vol. 26, pp. 75-80, May 2009.
[16] L. Deng, "Computational Models for Speech Production," chapter in Computational Models of Speech Pattern Processing, pp. 199-213, Springer, 1999.