Final Project – Discriminative Approach for Sequence Labelling Through the Use of CRFs and RNNs
• CRF++: an open-source and customizable implementation of Conditional Random Fields for segmenting/labelling sequential data [2].

… find the template that I used to train the conditional random fields. Table 3.5 shows the results obtained.
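The template file itself is not reproduced in this report. Purely as an illustrative sketch (not necessarily the template used for the reported results), a CRF++ feature template that looks at the current word and a small window around it could be written as follows:

    # Unigram features: the word in column 0, in a window of +/-2 positions
    U00:%x[-2,0]
    U01:%x[-1,0]
    U02:%x[0,0]
    U03:%x[1,0]
    U04:%x[2,0]
    # Word bigrams around the current position
    U05:%x[-1,0]/%x[0,0]
    U06:%x[0,0]/%x[1,0]

    # Bigram template: combines the previous and the current output tag
    B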
… it can only access contextual information in one direction (typically the past).

4.1 General idea

A recurrent neural network can be seen as a multi-layer perceptron (MLP) in which we relax the condition that connections must not form cycles, and allow cyclical connections as well. An MLP can only map from input to output vectors, whereas an RNN can map from the entire history of previous inputs to each output. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. They are able to memorize previous inputs in the network's internal state, which thereby influences the network output.

Here is what a typical RNN looks like:

Figure 4.1: A recurrent neural network and the unfolding in time of the computation involved in its forward computation.

The diagram above shows an RNN being unrolled into a full network. By unrolling we simply mean that we write out the network for the complete sequence.
• Xt is the input at time step t.
• St is the hidden state at time step t. It is the "memory" of the network.
• Ot is the output at time step t.
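To make the roles of Xt, St and Ot concrete, the following Python sketch shows one forward pass of a simple Elman-style RNN. It is illustrative code with my own variable names (U, W, V for the input-to-hidden, hidden-to-hidden and hidden-to-output weights), not the course implementation:

    import numpy as np

    def rnn_forward(x_seq, U, W, V):
        """Forward pass of a simple Elman RNN.
        x_seq: list of input vectors x_t; U, W, V: input-to-hidden,
        hidden-to-hidden and hidden-to-output weight matrices."""
        s = np.zeros(W.shape[0])            # initial hidden state s_0
        outputs = []
        for x in x_seq:
            # the hidden state depends on the current input and the previous state
            s = np.tanh(U @ x + W @ s)
            # output scores over the classes (concept-tags) for the current word
            z = V @ s
            o = np.exp(z - z.max())
            o /= o.sum()                    # softmax over the classes
            outputs.append(o)
        return outputs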
During the course we have seen two types of RNNs that use recurrent connections: the Elman type, which feeds the activation of the hidden layer at the previous time step back in together with the input, and the Jordan type, which feeds the activation of the output layer at the previous time step back in together with the input. For both of them, during training the network is unrolled backwards in time and backpropagation is applied; this is usually called Backpropagation Through Time (BPTT). Words are fed into the neural network using 1-of-n encoding. To make the model more accurate, it is possible to use word embeddings, where the network maps the input words onto a d-dimensional continuous space and learns distributed representations for them (similar words are mapped to nearby positions in that space). In our task the output of the network is a probability for each class (concept-tag) for the current word.
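As an illustration of how the 1-of-n word indices, the context window and the embedding matrix fit together, here is a small Python sketch under my own assumptions (in particular the use of index -1 as a padding symbol); it is not the course code:

    import numpy as np

    def context_windows(word_indices, win):
        """Build a context window of size `win` (odd) around each word,
        padding the sentence borders with a special index (-1)."""
        assert win % 2 == 1
        pad = [-1] * (win // 2)
        padded = pad + list(word_indices) + pad
        return [padded[i:i + win] for i in range(len(word_indices))]

    # hypothetical sizes: vocabulary of 1000 words, 100-dimensional embeddings;
    # one extra row is reserved for the padding index -1
    vocab_size, emb_dimension = 1000, 100
    embeddings = 0.2 * np.random.uniform(-1.0, 1.0, (vocab_size + 1, emb_dimension))

    sentence = [42, 7, 913]                 # 1-of-n indices of a 3-word sentence
    inputs = []
    for window in context_windows(sentence, win=3):
        # each window of indices becomes a concatenated vector of embeddings,
        # which is the actual input x_t fed to the recurrent layer
        inputs.append(embeddings[window].reshape(-1))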
4.2 Implementation

The code for the implementation of the RNNs was given by the course teaching assistant and is based on two papers by Grégoire Mesnil, Xiaodong He and colleagues [5][6]. Using this implementation, it is possible to train and test RNNs of the Jordan and Elman types. It also implements word embeddings. There is a configuration file that allows us to play with the hyper-parameters; I report all of them in Table 4.1.

Table 4.1: Hyper-parameters exposed by the configuration file.

Parameter        Definition
lr               Learning rate value
win              Number of words in the context window
bs               Number of BPTT steps
nhidden          Number of hidden units
seed             Seed for the random number generator
emb_dimension    Dimension of the word embeddings
nepochs          Maximum number of training epochs
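The exact format of the configuration file is not reproduced here. Purely as an illustration, the hyper-parameters could be collected in a Python dictionary such as the hypothetical one below, filled with the values that gave the best Elman results (reported in Figure 4.2 below):

    # Hypothetical grouping of the hyper-parameters; the actual configuration
    # file of the course code may use a different format.
    config = {
        'lr': 0.1,             # learning rate
        'win': 9,              # context window size (words)
        'bs': 5,               # number of BPTT steps
        'nhidden': 100,        # hidden units
        'seed': 38428,         # random seed
        'emb_dimension': 100,  # word embedding dimension
        'nepochs': 25,         # maximum number of training epochs
    }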
4.3 Methodology and results

Also for the RNNs I used the NL-SPARQL dataset. I took 10% of the shuffled training set (300 sentences) to create a validation set, which is needed during training for the stopping criterion.

Standard Recurrent Neural Networks

In this section I report the results obtained using the basic implementation of the RNNs given by the course. I tried different parameter settings and the best performance is reported in the following table.

Table 4.2: Results obtained using Elman-type RNNs.

These results were achieved using Elman-type RNNs. The hyper-parameters used are reported in the following table.

Parameter        Value
lr               0.1
win              9
bs               5
nhidden          100
seed             38428
emb_dimension    100
nepochs          25

Figure 4.2: Configuration that gave me the best performance with the Elman-type RNNs.
LSTM Recurrent Neural Networks

Suppose, for instance, that we are trying to predict the last word in the sentence "the clouds are in the sky": we do not need any further context, as it is pretty obvious that the next word is going to be "sky". In such cases, where the gap between the relevant information and the place where it is needed is small, RNNs can learn to use the past information. But there are also cases where we need more context. Consider trying to predict the last word in the text "I grew up in Italy… I speak fluent Italian." Recent information suggests that the next word is probably the name of a language, but if we want to narrow down which language, we need the context of Italy, from further back. It is entirely possible for the gap between the relevant information and the point where it is needed to become very large. Unfortunately, as that gap grows, RNNs become unable to learn to connect the information. Long Short-Term Memory (LSTM) networks are a special kind of RNN capable of learning these long-term dependencies [7]. They were introduced by Hochreiter & Schmidhuber in 1997. LSTMs are explicitly designed to combat vanishing gradients through a gating mechanism.
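To make the gating mechanism concrete, the sketch below shows a single LSTM time step in plain numpy. It is my own illustrative code (the names and the packing of the parameters are assumptions), not the implementation that was integrated into the course code:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h_prev, c_prev, W, U, b):
        """One LSTM time step. W, U, b hold the stacked parameters of the
        input (i), forget (f), output (o) gates and the candidate update (g)."""
        n = h_prev.shape[0]
        z = W @ x + U @ h_prev + b          # pre-activations, shape (4n,)
        i = sigmoid(z[0:n])                 # input gate: what enters the cell
        f = sigmoid(z[n:2 * n])             # forget gate: what is kept in the cell
        o = sigmoid(z[2 * n:3 * n])         # output gate: what is read from the cell
        g = np.tanh(z[3 * n:4 * n])         # candidate cell update
        c = f * c_prev + i * g              # new cell state
        h = o * np.tanh(c)                  # new hidden state
        return h, c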
I found a simple implementation of LSTMs and integrated the LSTM model into the RNN code given by the course (I edited the Elman model provided). In the following table it is possible to see the results that I obtained.

5 Evaluation and Conclusion

During the spring project and the final project I adopted discriminative models (CRFs and RNNs) and generative models (SFSTs). While generative approaches model the joint distribution p(x, y), discriminative approaches focus solely on the posterior distribution p(y | x). The main difference is that discriminative models only have to model the conditional distribution and completely disregard the prior distribution of the training samples, p(x); this gives discriminative models more freedom to fit the training data, because they only have to tune the parameters that maximize p(y | x). Generally, in a classification task, discriminative models work better than generative ones, but many factors can influence them, such as the size of the training data or the errors it contains. In such cases generative models have the advantage of being able to use the prior probability to identify outlier samples and assign them lower probability.
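The textbook decomposition behind this comparison makes the difference explicit (this is standard probability, not something specific to these experiments):

    p(x, y) = p(y \mid x)\, p(x)
    \quad\Longrightarrow\quad
    \arg\max_y p(y \mid x) = \arg\max_y p(x, y)

Both model families lead to the same decision rule, but a generative model that estimates p(x, y) also has to account for p(x), whereas a discriminative model spends all of its capacity on p(y | x) directly.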
The next table reports the best results obtained during these two projects.

Figure 5.1: Final comparison between all models.
Appendices
The project can be found on GitHub at the following link: https://github.com/feedmari/LUS-Spring-Project
References
[1] C. Sutton, An Introduction to Conditional Random Fields, arXiv:1011.4088.
[2] T. Kudo, CRF++: Yet Another CRF Toolkit, taku910.github.io/crfpp.
[3] C. Raymond and G. Riccardi, Discriminative and Generative Algorithms for Spoken Language Understanding, Proc. Interspeech, Antwerp, 2007.
[4] A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks.
[5] G. Mesnil, X. He, L. Deng and Y. Bengio, Investigation of Recurrent Neural Network Architectures and Learning Methods for Spoken Language Understanding.
[6] G. Mesnil, Y. Dauphin, K. Yao, Y. Bengio, L. Deng, D. Hakkani-Tur, X. He, L. Heck, G. Tur, D. Yu and G. Zweig, Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding.
[7] C. Olah, Understanding LSTM Networks, http://colah.github.io/posts/2015-08-Understanding-LSTMs/.
[8] A. O. Bayer, Recurrent Neural Network for Language Model and SLU, 2017.