Accepted as a conference paper for IJCNN 2016

Writer-independent Feature Learning for Offline Signature Verification using Deep Convolutional Neural Networks

Luiz G. Hafemann, Robert Sabourin
Lab. d’imagerie, de vision et d’intelligence artificielle
École de technologie supérieure
Université du Québec, Montreal, Canada
[email protected], [email protected]

Luiz S. Oliveira
Department of Informatics
Federal University of Parana
Curitiba, PR, Brazil
[email protected]

Abstract—Automatic Offline Handwritten Signature Verification has been researched over the last few decades from several perspectives, using insights from graphology, computer vision, signal processing, among others. In spite of the advancements in the field, building classifiers that can separate genuine signatures from skilled forgeries (forgeries made targeting a particular signature) is still hard. We propose approaching the problem from a feature learning perspective. Our hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data, instead of using hand-crafted features that have no resemblance to the signature generation process. To this end, we use Deep Convolutional Neural Networks to learn features in a writer-independent format, and use this model to obtain a feature representation on another set of users, where we train writer-dependent classifiers. We tested our method on two datasets: GPDS-960 and Brazilian PUC-PR. Our experimental results show that the features learned in a subset of the users are discriminative for the other users, including across different datasets, reaching close to the state-of-the-art in the GPDS dataset, and improving the state-of-the-art in the Brazilian PUC-PR dataset.

I. INTRODUCTION

Biometrics technology is used in a wide variety of security applications. The aim of such systems is to recognize a person based on physiological traits (e.g. fingerprint, iris) or behavioral traits (e.g. voice, handwritten signature) [1]. The handwritten signature is a particularly important type of biometric trait, mostly due to its widespread use to verify a person’s identity in legal, financial and administrative areas. One of the reasons for its extensive use is that the process to collect handwritten signatures is non-invasive, and people are familiar with their use in daily life [2].

Research in signature verification is divided between online (dynamic) and offline (static) scenarios. In the online case, the signature is captured using a special input device (such as a tablet), and the dynamic information of the signature process is captured (pen’s position, inclination, among others). In this work, we focus on the Offline (static) signature verification problem, where the signature is acquired after the writing process is completed, by scanning the document containing the signature. In this case, the signature is represented as a digital image.

Most of the research effort in this area has been devoted to obtaining a good feature representation for signatures, that is, designing good feature extractors. To this end, researchers have used insights from graphology, computer vision, signal processing, among other areas [3]. As with several problems in computer vision, it is often hard to design good feature extractors, and the choice of which feature descriptors to use is problem-dependent. Ideally, the features should reflect the process used to generate the data - for instance, neuromotor models of the hand movement. Although this approach has been explored in the context of online signature verification [4], there is not a widely accepted “best” way to model the problem, especially for Offline (static) signature verification, where the dynamic information of the signature generation process is not available.

In spite of the advancements in the field, systems proposed in the literature still struggle to distinguish genuine signatures and skilled forgeries. These are forgeries made by a person with access to a user’s signature, who practices imitating it (see Figure 1). Experimental results show somewhat large error rates when testing on public datasets (such as GPDS [5]), even when the number of samples for training is around 10-15 (results are worse with 1-3 samples per user, which is a common scenario in banks and other institutions).

In this work we propose using feature learning (also called representation learning) for the problem of Offline Signature Verification, in order to obtain better feature representations. Our hypothesis is that, in the absence of a good model of the data generation process, it is better to learn the features from data, rather than using hand-crafted features that have no resemblance to how the signatures are created, which is the case for the best performing systems proposed in the literature. For example, recent Offline Signature Verification systems are based on texture descriptors, such as Local Binary Patterns [6], interest-point-matching such as SURF [7], among others.

We base our research on recent successful applications of purely supervised learning models for computer vision (such
as image recognition [8]). In particular, we use Deep Convolutional Neural Networks (CNN) trained with a supervised criterion, in order to learn good representations for the signature verification problem. This type of architecture is interesting for our problem, since it scales better than fully connected models for larger input sizes, having a smaller number of trainable parameters. This is a desirable property for the problem at hand, since we cannot rescale signature images too much without risking losing the details that enable discriminating between skilled forgeries and genuine signatures.

The most common formulation of the signature verification problem is called Writer-Dependent classification. In this formulation, one classifier is built for each user in the system. Using a supervised feature learning approach directly in this case is not practical, since the number of samples per user is very small (usually around 1-14 samples). Instead, we propose a two-phase approach: a Writer-Independent feature learning phase followed by Writer-Dependent classification. The feature learning phase uses a surrogate classification task for learning feature representations, where we train a CNN to discriminate between signatures from users not enrolled in the system. We then use this CNN as a feature extractor and train a Writer-Dependent classifier for each user. Note that in this formulation, adding a new user to the system requires training only a Writer-Dependent classifier.

We tested this method using two datasets: the GPDS-960 corpus [5] and the Brazilian PUC-PR dataset [9]. The first is the largest publicly available corpus for offline signature verification, while the second is a smaller dataset that has been used for several studies in the area.

Our main contributions are the following: We propose a two-stage framework for offline signature verification, where we learn features in a Writer-Independent way, and build Writer-Dependent classifiers. Our results show that we do have enough data in signature datasets to learn relevant features for the task, and the proposed method achieves state-of-the-art performance. We also investigate how the features learned in one dataset transfer to another dataset, and the impact in performance of the number of samples available for Writer-Dependent (WD) training.

Figure 1. Samples from the GPDS-960 dataset. Each row contains three genuine signatures from the same user and a skilled forgery. We notice that each genuine signature is different (showing high intra-class variability), while skilled forgeries resemble the genuine signatures to a large extent (showing low inter-class variability).

II. RELATED WORK

Feature learning methods have not yet been broadly researched for the task of offline signature verification. Murshed et al. [10], [11] used autoencoders (called Identity-Mapping Backpropagation in their work) to perform dimensionality reduction followed by a Fuzzy ARTMAP classifier. This work, however, considered only a single hidden layer, with fewer units than the input. In contrast, in recent successful applications of autoencoders, multiple layers of representations are learned, often in an over-complete format (more hidden units than visible units), where the idea is not to reduce dimensionality, but to “disentangle” the factors of variation in the inputs [12]. Ribeiro et al. [13] used unsupervised learning for learning representations, in particular, Restricted Boltzmann Machines (RBMs). In that work, the authors tested with a small subset of users (10 users), and only reported a visual representation of the learned weights, and not the results of using such features to discriminate between genuine signatures and forgeries. Khalajzadeh [14] used Convolutional Neural Networks (CNNs) for Persian signature verification, but did not consider skilled forgeries.

A similar strategy to our work has been used by Sun et al. [15] for the task of face verification. They trained CNNs on a large dataset of faces and used these networks to extract features on another face dataset. In their work, the verification process consisted in distinguishing between faces from different users. In signature verification, distinguishing between different writers is one of the objectives (when we consider “Random Forgeries”), but the main challenge is to distinguish between genuine signatures and skilled forgeries. In this work we evaluate the method for both types of forgery.

The framework we propose is also similar to previous work by Eskander et al. [16]. In their work, a Writer-Independent set is used for feature selection, and a Writer-Dependent set is used for training and evaluation. However, the authors used hand-crafted feature extractors, while in the present work we use the Writer-Independent set for feature learning, instead of feature selection.

III. PROPOSED METHOD

We propose a two-stage approach, considering a writer-independent feature learning phase, followed by writer-dependent classification. We start by partitioning the dataset into two distinct sets: Development set D and Exploitation set E. The set D is used to learn the feature representation for signatures. We consider this as a separate dataset from the enrolled users. The exploitation set E considers the users enrolled to the system. This set is used to train Writer-Dependent classifiers (using only genuine signatures) and for evaluating the performance of the system.

The proposed system is illustrated in Figure 2. We first use the set D to learn the feature representations, by training a
Convolutional Neural Network (detailed in the next section). The result is a function $\phi(\cdot)$, learned from data, that projects the input images $X$ to another feature space: $\phi(X) \in \mathbb{R}^{m}$, where $m$ is the dimensionality of the projected feature space. Our expectation is that the features learned using D will be useful to separate genuine signatures and forgeries from other users. After the CNN is trained, we create a training dataset for each user in set E, using a subset of the user’s genuine signatures, and random forgeries. We use the CNN as a feature extractor, obtaining a feature vector $\phi(X)$ for each signature $X$ in the user’s dataset. This new representation is then used to train a binary classifier $f$. For a new sample $X_{new}$, we first use the CNN to “extract features” (i.e. obtain the feature vector $\phi(X_{new})$) and feed the feature vector to the binary classifier, obtaining a final decision $f(\phi(X_{new}))$. The next sections detail the WI and WD training procedures.

[Figure 2 diagram, three stages. Writer-Independent feature learning: Development dataset, CNN training, CNN model. Writer-Dependent training: Training set, Signature images, Feature extraction, Extracted features, WD classifier, Binary classifier. Generalization: New sample, Signature image, Feature extraction, Extracted features, Verification, Decision (Accept/Reject).]
Figure 2. The proposed architecture for writer-independent feature learning and writer-dependent classification.

A. Pre-processing

For all signatures from both datasets (D and E), we apply the same pre-processing strategy. The signatures from the GPDS dataset have a variable size, ranging from 153 x 258 pixels to 819 x 1137 pixels. Since training a neural network requires all inputs to have the same size, we need to normalize the signature images. We evaluated two approaches:

In the simplest approach, we resized the images to a fixed size, using bi-linear interpolation. We perform rescaling without deformations, that is, when the original image had a different width-to-height ratio, we cropped the excess in the larger dimension.

The second approach consisted in first normalizing the images to the largest image size, by padding the images with white background. In this case, we centered the signatures in a canvas of size 840 x 1360 pixels, aligning the center of mass of the signature to the center of the image, similar to previous approaches in the literature, e.g. [17]. We then rescaled the images to the desired input size of the neural network.

With the first approach, less fine-grained information is lost during the rescaling, especially for the users that have small signatures. On the other hand, the width of the pen strokes becomes inconsistent: for the smaller signatures the pen strokes become much thicker than the pen strokes from the larger signatures.

Besides resizing the images to a standard size, we also performed the following pre-processing steps:
• Removed the background: we used Otsu’s algorithm [18] to find the optimum threshold between foreground and background pixel intensities. Pixels with intensity larger than the threshold were set to white (intensity 255). The signature pixels (with intensity less than the threshold) remain unchanged in this step.
• Inverted the images: we inverted the images so that the white background corresponded to pixel intensity 0. That is, each pixel of the image is calculated as: $I_{inverted}(i, j) \leftarrow 255 - I(i, j)$.
• Normalized the input: we normalized the input to the neural network by dividing each pixel by the standard deviation of all pixel intensities (from all images in D). We do not normalize the data to have mean 0
(another common pre-processing step) since we want the background pixels to be zero-valued.
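The three pre-processing steps listed above (background removal, inversion, normalization) can be sketched in plain Python as follows. This is only an illustrative sketch: the function names `otsu_threshold` and `preprocess`, the nested-list image format, and the `pixel_std` argument (standing in for the standard deviation computed over all images in D) are assumptions made here, not the authors' implementation. Resizing and centering on the canvas are left out.

```python
def otsu_threshold(image):
    """Otsu threshold for a grayscale image given as rows of ints in 0..255."""
    hist = [0] * 256
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0
    for t in range(256):
        w_low += hist[t]              # pixels with intensity <= t
        sum_low += t * hist[t]
        w_high = total - w_low        # pixels with intensity > t
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance (up to a constant factor).
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def preprocess(image, pixel_std):
    """Remove background, invert, and scale by a dataset-wide std (hypothetical helper)."""
    t = otsu_threshold(image)
    # Background pixels (brighter than the threshold) become white; ink is unchanged.
    cleaned = [[255 if p > t else p for p in row] for row in image]
    # Invert so that the background is 0 and ink is positive.
    inverted = [[255 - p for p in row] for row in cleaned]
    # Divide by the standard deviation only; no mean subtraction, so background stays 0.
    return [[p / pixel_std for p in row] for row in inverted]


if __name__ == "__main__":
    toy = [[250, 245, 30], [240, 20, 252]]
    print(preprocess(toy, pixel_std=50.0))
```

Because the mean is not subtracted, every background pixel remains exactly zero after scaling, which is the property the last bullet asks for.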

B. Writer-Independent feature learning

For learning the representation for signatures, we used Deep Convolutional Neural Networks. We note that modeling directly the problem of interest is not feasible in practice: our ultimate goal is to separate genuine signatures from skilled forgeries of the users enrolled in the system, but in a realistic scenario we only have genuine signatures provided during an enrollment phase, and do not have forgeries for these users. Therefore, we need to consider a surrogate classification objective. In this work, we use a separate set of users (the development set D) to learn the features, by learning a classification task, considering each user in D as a different class. The objective function is to minimize a cross-entropy classification loss. The expectation is that by learning to distinguish between signatures from different users in this dataset, the network will learn features that are relevant for our problem of interest - separating genuine signatures and forgeries from the exploitation set E.

We used a CNN architecture similar to the one defined by Krizhevsky et al. [8] for an image recognition problem. Initial tests showed that the capacity of this network seems to be too large for the problem at hand, particularly considering the fully connected layers (that contain most of the weights in the network). We obtained better results with 2 fully-connected layers after the convolutions, instead of three layers from the original model. For the purpose of replicating our experiment, we provide a full list of the parameters used in our tests. Table I lists the definition of the CNN layers. For convolution and pooling layers, we list the size as N x H x W, where N is the number of filters, H is the height and W is the width of the convolution and pooling windows, respectively. Stride refers to the distance between applications of the convolution (or pooling) operation, and pad refers to adding padding (borders) to the input, with value 0. Local Response normalization is applied according to [8], with the parameters listed in the table. For the first two fully-connected layers we use dropout [19], with rate 0.5. We use Rectified Linear Units (ReLUs) as the activation function for all convolutional and fully-connected layers, except the last one. The last layer uses a softmax activation and has N neurons, where N is the number of users in the set D, indicating the probability of the sample belonging to each of the users.

We initialize the weights of the model according to the work of Glorot and Bengio [20], and the biases to 0. We trained the model with Nesterov Momentum for 60 epochs, using momentum rate of 0.9, and mini-batches of size 100. We started with a learning rate of 0.01, and divided it by 10 twice (after 20 epochs, and after 40 epochs). We used L2 regularization with a weight decay factor of 0.0005. These values are consolidated in Table II. The networks were trained using the libraries Theano [21] and Lasagne [22], and took around 5h to train on a GPU Tesla C2050.

Table I
SUMMARY OF THE CNN LAYERS

Layer                        Size        Other Parameters
Convolution                  96x11x11    stride = 4, pad = 0
Local Response Norm.         -           $\alpha = 10^{-4}$, $\beta = 0.75$, $k = 2$, $n = 5$
Pooling                      96x3x3      stride = 2
Convolution                  256x5x5     stride = 1, pad = 2
Local Response Norm.         -           $\alpha = 10^{-4}$, $\beta = 0.75$, $k = 2$, $n = 5$
Pooling                      256x3x3     stride = 2
Convolution                  384x3x3     stride = 1, pad = 1
Convolution                  256x3x3     stride = 1, pad = 1
Pooling                      256x3x3     stride = 2
Fully Connected + Dropout    4096        p = 0.5
Fully Connected + Softmax    N

Table II
TRAINING HYPERPARAMETERS

Parameter                     Value
Initial Learning Rate (LR)    0.01
Learning Rate schedule        LR ← LR ∗ 0.1 (every 20 epochs)
Weight Decay                  0.0005
Momentum                      0.9
Batch size                    100

C. Writer-dependent classification

After the CNN is trained in the set D, we use it to extract features for the Writer-Dependent training. Similar to previous work in transfer learning [23], [24], we use the representation obtained by performing forward propagation of an input image until the last layer before softmax. In the notation defined above, we consider our feature extractor function $\phi(X)$ to be the representation of the network at the last layer before softmax, after forward propagating the input $X$. As noted in Table I, this representation has 4096 dimensions ($\phi(X) \in \mathbb{R}^{4096}$). The hypothesis is that the features learned for the set D, during the CNN training, will be relevant for signatures for other users (from the exploitation set).

For training the Writer-Dependent classifiers, no skilled forgeries are used during training or validation, to simulate the scenario for a real application. Following previous work on Writer-Dependent classification, we create a dataset for each user, consisting of genuine signatures and random forgeries (using signatures from other users, from D).

For each user in E, we build a Writer-Dependent training and testing set. The training set is composed of a subset of genuine signatures for the user (as the positive examples), as well as genuine signatures from other users from the development dataset (as the negative examples). The testing set consists of genuine signatures from the user (not used for training), and the skilled forgeries made for the user. With this dataset, we first use the CNN to extract the features for each signature image (that is, compute $\phi(X)$ for each signature $X$). We then train a standard two-class classifier $f$ for each user.

For the WD classification, we test both linear SVMs and SVMs with the RBF kernel [25]. For the linear SVM, we
used the hyperparameter C = 1, while for the SVM with RBF kernel we optimize the parameters C and γ with a subset of users from the set D using a grid search. We select the hyperparameters that best separate genuine signatures and skilled forgeries from these users.

During generalization, for a new signature $X_{new}$, we first use the CNN to obtain the representation of the signature (i.e. calculate $\phi(X_{new})$), and then feed this representation to the classifier to obtain a final decision on the sample, $f(\phi(X_{new}))$.

IV. EXPERIMENTAL PROTOCOL

Feature learning for complex tasks has been shown to work better with large datasets. The largest publicly available signature dataset is GPDS-960 [5], and therefore it is particularly suitable for our proposed method. This dataset contains 24 genuine signatures and 30 forgeries per user, from 881 users, which were captured in a single session [5]. We also tested with a smaller dataset that has also been extensively used for offline signature verification: the Brazilian PUC-PR dataset [9]. This dataset contains signatures from 168 users, and forgeries for the first 60 users.

The first step is to split the datasets into development set D and exploitation set E. For GPDS, in order to allow comparison with previous work, we tested with the set E consisting of the first 160 users, and the first 300 users (which were previously published as GPDS-160 and GPDS-300). Figure 3 shows how the dataset is split. The remaining users are used for the writer-independent feature learning phase. For the Brazilian set, we consider the first 60 users for the set E, and the remaining 108 users are used as the set D.

[Figure 3 diagram: users on the horizontal axis, samples (Training, Testing) on the vertical axis. Users 1 - 160 (160) or Users 1 - 300 (300); Users 161 - 881 (721) or Users 301 - 881 (581).]
Figure 3. The separation of the GPDS-960 dataset in Development set D and Exploitation set E.

After splitting the dataset into sets D and E, we preprocess the signature images to a standard size of 155 x 220 pixels, considering the two preprocessing options listed in the previous section. This size was chosen to be large enough to keep details from the pen strokes in the signatures, while still small enough to enable training on the GPU. We use the set D to train a CNN that learns to classify input signatures to the different users in this set.

To assess if the learned features generalize to other datasets, we used the CNN trained in the GPDS dataset for extracting features for the Brazilian dataset. This experiment serves two purposes: analyze if the learned features generalize to other datasets, and evaluate if we can obtain a better performance on the Brazilian set (which is smaller) by leveraging data from a larger dataset (GPDS).

For the Writer-Dependent training, we have slightly different protocols for GPDS and the Brazilian dataset, to correspond to protocols used in other work on these datasets. For GPDS, we selected up to 14 genuine signatures as positive samples (from E), and 14 genuine signatures from each user in the set D as negative samples. For testing, we selected 10 genuine signatures from the user, ensuring they were not used for training, and all the 30 skilled forgeries. For the Brazilian dataset, we selected up to 30 genuine samples as positive samples (from E), and 30 genuine samples from the users in set D as negative samples. For testing, we selected 10 genuine signatures from the user, 10 signatures from other users in E (i.e. not used for training) as random forgeries, and all 10 simple forgeries and 10 skilled forgeries available for each user.

To evaluate the impact of different numbers of sample signatures per user, we trained the WD classifiers using a variable number of signatures from the enrolled users. This set-up is summarized in Table III.

For optimizing the hyperparameters for the SVM training (for the WD classifiers), we performed a grid search on the parameters C and γ. We used 10 users from D, building WD classifiers with the same protocol as above. We selected the hyperparameters that performed best in separating genuine signatures and skilled forgeries for these 10 users, by measuring the classification error of each classifier. Before training the SVM models, we rescale the inputs to have a unit standard deviation (in each dimension). This slightly improved performance and significantly decreased the SVM training time. Similar to [16], in order to have a balanced dataset for training, we duplicated the genuine examples in the training set to match the same number of random forgeries (equivalent to having different C for the positive and negative classes).

In this work we conducted experiments with two datasets, and authors from different studies have reported different metrics. For GPDS, some authors report two metrics: False Rejection Rate (FRR) and False Acceptance Rate for skilled forgeries (FAR_skilled). The first metric is the fraction of genuine signatures that were classified as forgery, while the second is the fraction of skilled forgeries that were classified as genuine signatures. Other authors report simply the Equal Error Rate, which is the point in a ROC curve where FAR and FRR are equal. For the results on GPDS, we report these three metrics, and also the mean of the Area Under the Curve (AUC) - that is, we build a ROC curve for each user, and report the average of the AUC. For calculating the EER, we considered the ROC curves created for each user (thresholds specific for each user).

For the Brazilian PUC-PR dataset, authors commonly report FRR and FAR for three types of forgeries: Random, Simple and Skilled. Authors also report an average error rate (AER) which is the average of the four types of error (FRR, FAR_random, FAR_simple, FAR_skilled). To allow comparison with the results on GPDS, we also report metrics considering only
FRR and FAR_skilled: AER_genuine+skilled, EER_genuine+skilled and Mean AUC_genuine+skilled.

Table III
TRAINING AND TESTING SET-UP

                      Training Set                                            Testing Set
Dataset               genuine                     forgeries (random)          genuine       forgeries
Brazilian (PUC-PR)    1, 5, 10, 15, 30 samples    108 x 30 = 3240 samples     10 samples    10 random, 10 simple, 10 skilled
GPDS-160              4, 8, 12, 14 samples        721 x 14 = 10094 samples    10 samples    30 skilled
GPDS-300              4, 8, 12, 14 samples        581 x 14 = 8134 samples     10 samples    30 skilled

Table IV
CLASSIFICATION ERRORS ON GPDS (%) AND MEAN AUC

Dataset     Features        Classifier      FRR      FAR      EER      Mean AUC
GPDS-160    CNN_GPDS        SVM (Linear)    26.62    9.65     14.35    0.9153
GPDS-160    CNN_GPDS        SVM (RBF)       37.25    3.66     14.64    0.9097
GPDS-160    CNN_GPDSnorm    SVM (Linear)    11.12    16.77    11.32    0.9381
GPDS-160    CNN_GPDSnorm    SVM (RBF)       19.81    5.99     10.70    0.9459
GPDS-300    CNN_GPDS        SVM (Linear)    25.43    12.80    16.40    0.8968
GPDS-300    CNN_GPDS        SVM (RBF)       36.27    5.00     16.22    0.9014
GPDS-300    CNN_GPDSnorm    SVM (Linear)    11.93    25.58    16.07    0.8957
GPDS-300    CNN_GPDSnorm    SVM (RBF)       20.60    9.08     12.83    0.9257
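The training set-up in Table III (a few genuine samples per enrolled user against thousands of random forgeries from D, balanced by duplicating the genuine examples) can be sketched as below. The function name `build_wd_training_set`, its arguments and the cycling strategy used for duplication are assumptions for illustration; the paper only states that genuine examples were duplicated to match the number of random forgeries.

```python
def build_wd_training_set(user_genuine, dev_users_genuine, n_genuine, n_per_dev_user):
    """Assemble a balanced writer-dependent training set (hypothetical helper).

    user_genuine: feature vectors of the enrolled user's genuine signatures.
    dev_users_genuine: dict mapping each development-set user to a list of
        feature vectors of that user's genuine signatures.
    """
    positives = user_genuine[:n_genuine]
    # Random forgeries: genuine signatures of other writers from the set D.
    negatives = [x for sigs in dev_users_genuine.values() for x in sigs[:n_per_dev_user]]
    # Duplicate (cycle through) the positives until both classes have equal size.
    repeats = -(-len(negatives) // len(positives))  # ceiling division
    balanced_positives = (positives * repeats)[:len(negatives)]
    features = balanced_positives + negatives
    labels = [1] * len(balanced_positives) + [0] * len(negatives)
    return features, labels


if __name__ == "__main__":
    # GPDS-160 setting from Table III: 721 development users, 14 signatures each.
    dev = {u: [[float(u)]] * 14 for u in range(721)}
    enrolled = [[1.0]] * 24
    X, y = build_wd_training_set(enrolled, dev, n_genuine=12, n_per_dev_user=14)
    print(len(X), y.count(1), y.count(0))
```

With 12 genuine samples in the GPDS-160 setting, the negative class has 721 x 14 = 10094 vectors, so each genuine sample is repeated roughly 841 times to balance the classes.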
V. RESULTS AND DISCUSSION

We first report the results of the search for the best hyperparameters for the SVM with RBF kernel used for Writer-Dependent classification. After training classifiers for 10 users in the development set, we noticed that the best hyperparameters were the same for most users (8/10 users): $\gamma = 2^{-12}$, $C = 1$. For the other two users, this was the second best configuration for the parameters. Therefore we used these hyperparameters for the subsequent experiments.

Table IV presents the results of our experiments with the GPDS dataset. The column “Features” lists the method we used to extract features - in our work this column lists the CNN trained in the set D. We considered both alternatives defined in the Pre-processing section - simply resizing the signature images (CNN_GPDS), and first normalizing the signatures in a canvas with a standard size, before resizing them (CNN_GPDSnorm). We notice that this normalization was essential to obtain good classification results on this dataset, with a boost in performance from 14.64% of EER to 10.70% in the GPDS-160 dataset. We also noticed the best results were achieved with the SVM trained with an RBF kernel. Lastly, we noted a drop in performance between the experiments with GPDS-160 and GPDS-300. This can be partially explained by the fact that we use more data on the set D for GPDS-160.

Table V shows the results of our tests with the Brazilian PUC-PR dataset. We noticed the same characteristics as with the GPDS test, with improved results with the non-linear RBF kernel for the classifier. In this dataset we tested with both a CNN trained on the Brazilian dataset, as well as the CNN trained above for the GPDS dataset. The results were similar, suggesting that the features learned in one dataset generalize well to other datasets. On the other hand, we expected the performance with the CNN trained on GPDS to be better, since the development set for the Brazilian dataset is much smaller (108 users in the Brazilian dataset vs. 721 users for GPDS-160), and therefore there is much more data on GPDS to learn a good feature representation.

We evaluated the performance of the system considering different numbers of samples per user in the exploitation set. For these tests, we used the configuration that performed best on the tests above: using the normalized GPDS development set to learn the features, and using an SVM with RBF kernel for training the WD classifiers. Figures 4 and 5 present the evolution of the AUC and the Equal Error Rate for the GPDS and Brazilian datasets. We notice that even with a small number of samples the performance is reasonable, achieving 15.05% EER with 4 signatures in the GPDS dataset, and 9.83% EER with 5 signatures on the Brazilian dataset. However, we notice that in the extreme case, when a single signature is available, the performance of the entire system is much worse (around 17% EER), and some users have very poor performance (for one user, AUC is below 0.5).

Figure 4. Performance on the GPDS-160 dataset varying the number of samples per user for WD training. The error bars show the smallest and largest AUC of users in the exploitation dataset.

We compare our results with the state-of-the-art in Tables VI and VII. For GPDS, the method achieves state-of-the-art performance in terms of Equal Error Rate, when comparing with systems that used a single feature extractor. However, the performance is worse compared to systems where multiple feature extractors / classifiers are used. Future work can be done in analyzing if the features learned from data are complementary to hand-crafted features.

For the Brazilian PUC-PR dataset, authors use other metrics
Table V
CLASSIFICATION ERRORS ON THE BRAZILIAN PUC-PR DATASET (%) AND MEAN AUC

Features         Classifier      FRR     FAR_random    FAR_simple    FAR_skilled    EER_genuine+skilled    Mean AUC_genuine+skilled
CNN_Brazilian    SVM (Linear)    1.00    0.00          1.67          27.17          7.33                   0.9668
CNN_Brazilian    SVM (RBF)       2.83    0.17          0.17          14.17          4.17                   0.9837
CNN_GPDS         SVM (Linear)    1.83    0.00          1.33          27.83          11.50                  0.9413
CNN_GPDS         SVM (RBF)       6.50    0.17          1.17          15.17          8.50                   0.9601
CNN_GPDSnorm     SVM (Linear)    0.17    0.00          1.67          29.00          6.67                   0.9653
CNN_GPDSnorm     SVM (RBF)       2.17    0.17          0.50          13.00          4.17                   0.9800
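The FRR, FAR and mean AUC columns reported in the tables follow directly from the classifier scores of each user. The sketch below shows one way to compute them, assuming that higher scores mean "more likely genuine" and that a sample is accepted when its score is at or above the threshold. The function names and the pairwise (rank-based) AUC formulation are choices made here for illustration, not taken from the paper.

```python
def frr_far(genuine_scores, forgery_scores, threshold):
    """Return (FRR, FAR) in percent for one user at a given decision threshold."""
    # FRR: fraction of genuine signatures rejected (classified as forgery).
    frr = 100.0 * sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    # FAR: fraction of forgeries accepted (classified as genuine).
    far = 100.0 * sum(s >= threshold for s in forgery_scores) / len(forgery_scores)
    return frr, far


def auc(genuine_scores, forgery_scores):
    """Area under the ROC curve, computed as the probability that a genuine
    sample scores higher than a forgery (ties count as one half)."""
    wins = 0.0
    for g in genuine_scores:
        for f in forgery_scores:
            if g > f:
                wins += 1.0
            elif g == f:
                wins += 0.5
    return wins / (len(genuine_scores) * len(forgery_scores))


def mean_auc(per_user_scores):
    """Average the per-user AUC over (genuine_scores, forgery_scores) pairs."""
    return sum(auc(g, f) for g, f in per_user_scores) / len(per_user_scores)


if __name__ == "__main__":
    genuine = [0.9, 0.4, -0.2, 0.7]
    skilled = [-0.8, 0.1, -0.5, 0.5]
    print(frr_far(genuine, skilled, threshold=0.0))
    print(auc(genuine, skilled))
```

The same `frr_far` function can be applied separately to random, simple and skilled forgeries to fill the three FAR columns.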

Table VI
COMPARISON WITH THE STATE-OF-THE-ART ON THE BRAZILIAN PUC-PR DATASET (ERRORS IN %)

Reference              Features       Classifier  FRR    FAR_random  FAR_simple  FAR_skilled  AER   AER_genuine+skilled  EER_genuine+skilled
Bertolini et al. [26]  Graphometric   SVM (RBF)   10.16  3.16        2.8         6.48         5.65  8.32                 -
Batista et al. [27]    Pixel density  HMM + SVM   7.5    0.33        0.5         13.5         5.46  10.5                 -
Rivard et al. [28]     ESC + DPDF     Adaboost    11     0           0.19        11.15        5.59  11.08                -
Eskander et al. [16]   ESC + DPDF     Adaboost    7.83   0.02        0.17        13.5         5.38  10.67                -
Present work           CNN_GPDSnorm   SVM (RBF)   2.17   0.17        0.50        13.00        3.96  7.59                 4.17
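
The two AER columns of Table VI appear to be unweighted means of the individual error rates: AER over FRR and the three FAR values, and AER_genuine+skilled over FRR and FAR_skilled only. This reading is an assumption inferred from the table values, not a formula stated in the text, and the function name below is hypothetical. A minimal Python sketch that checks it against two rows of the table:

```python
def average_error_rate(*rates):
    """Unweighted mean of the given error rates (in %). Assumed definition of AER."""
    return sum(rates) / len(rates)

# (FRR, FAR_random, FAR_simple, FAR_skilled) copied from Table VI.
rows = {
    "Bertolini et al. [26]": (10.16, 3.16, 2.8, 6.48),
    "Present work": (2.17, 0.17, 0.50, 13.00),
}

aer = {}
aer_genuine_skilled = {}
for name, (frr, far_random, far_simple, far_skilled) in rows.items():
    # AER over all four error types.
    aer[name] = average_error_rate(frr, far_random, far_simple, far_skilled)
    # AER restricted to genuine signatures and skilled forgeries.
    aer_genuine_skilled[name] = average_error_rate(frr, far_skilled)
    print(name, aer[name], aer_genuine_skilled[name])
```

Under this assumption the "Present work" row gives 15.84 / 4 = 3.96 and 15.17 / 2 = 7.585, which agrees with the 3.96 and 7.59 reported in the table.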

Figure 5. Performance on the Brazilian PUC-PR dataset varying the
number of samples per user for WD training. The error bars show the
smallest and largest AUC of users in the exploitation dataset.

Table VII
COMPARISON WITH THE STATE-OF-THE-ART ON GPDS-160 (ERRORS IN %)

Reference            Features            Classifier        FRR    FAR    EER
Hu and Chen [29]     LBP, GLCM, HOG      Adaboost          -      -      7.66
Yilmaz [30]          LBP                 SVM (RBF)         -      -      9.64
Yilmaz [30]          LBP, HOG            Ensemble of SVMs  -      -      6.97
Guerbai et al. [31]  Curvelet transform  OC-SVM            12.5   19.4   -
Present work         CNN_GPDSnorm        SVM (RBF)         19.81  5.99   10.70

to compare: the False Acceptance rates for different types of
forgery and the Average Error Rate among all types of error. Besides
using these metrics, we also compare with an average error rate
considering only genuine signatures and skilled forgeries, which is
more comparable to the results on GPDS. In this dataset, the proposed
method achieves state-of-the-art performance. The large gap between
AER_genuine+skilled and EER_genuine+skilled also shows that
optimization of user-specific decision thresholds is necessary to
obtain a good system: in the present work the decision thresholds
were kept as default (scores larger than 0 were considered
forgeries). We notice that, for GPDS, this default threshold achieved
a large FRR with low FAR, while for the Brazilian dataset we obtained
the opposite. This suggests that a global threshold is not
sufficient, and user-specific thresholds should be considered. Better
user-specific thresholds will be explored in future work.
It is worth noting that in the present work we trained the WD
classifiers with a combination of genuine signatures and random
forgeries. This relies on the hypothesis that separating random
forgeries from genuine signatures will also make the classifier
separate genuine signatures from skilled forgeries. This is a weak
hypothesis, as we expect the skilled forgeries to have much more
resemblance to the genuine signatures, whereas random forgeries
should be quite different. However, given that we only have genuine
signatures available for training, this is a reasonable option, and
has been used extensively in the literature for Writer-Dependent
classification. An alternative is to use one-class classification to
model only the distribution of the genuine signatures (e.g. [31]),
which can be explored as future work.
We would like to point out that, although the EER metric (Equal Error
Rate) is useful to have a single number to compare different systems,
it relies on implicitly selecting the decision thresholds using
information from the test set. Therefore, it considers the error rate
that can be achieved with the optimal decision threshold for each
user. In a real application, the decision thresholds can only be
defined using data from the enrolled users (i.e. using only genuine
signatures from the training/validation set), or in a
writer-independent way (a single global threshold). Therefore,
besides reporting EER, we consider it beneficial to also report FAR
and FRR, stating the procedure used to select the thresholds.
Lastly, we would like to point out that the WD training datasets are
significantly imbalanced. We have only a few positive samples (1-30),
and a large amount of random forgeries (up to 10 thousand for
GPDS-160). Methods better suited to such a scenario can also be
explored in future work to improve the performance of the system.
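
The difference between EER and the FAR/FRR obtained at a fixed threshold, discussed above, can be illustrated with a small Python sketch. It is not the evaluation code used in this work: the scores are made-up toy values, the function names are hypothetical, and the EER is approximated by scanning the observed scores as candidate thresholds. It follows the convention stated in the text, that a score above the threshold is treated as a forgery.

```python
def far_frr(genuine_scores, forgery_scores, threshold):
    """Return (FAR, FRR) in percent; a score above the threshold is called a forgery."""
    false_rejects = sum(s > threshold for s in genuine_scores)
    false_accepts = sum(s <= threshold for s in forgery_scores)
    far = 100.0 * false_accepts / len(forgery_scores)
    frr = 100.0 * false_rejects / len(genuine_scores)
    return far, frr

def equal_error_rate(genuine_scores, forgery_scores):
    """Approximate EER: pick the observed score where |FAR - FRR| is smallest.

    The threshold is chosen using the very scores being evaluated, which is
    the implicit use of test-set information described in the text.
    """
    candidates = sorted(set(genuine_scores) | set(forgery_scores))

    def gap(t):
        far, frr = far_frr(genuine_scores, forgery_scores, t)
        return abs(far - frr)

    best = min(candidates, key=gap)
    far, frr = far_frr(genuine_scores, forgery_scores, best)
    return (far + frr) / 2.0, best

# Toy scores for one user (assumed values, for illustration only).
genuine = [-2.0, -1.0, 0.2, 0.4]
forgery = [0.3, 1.0, 1.5, 2.0]

default_far, default_frr = far_frr(genuine, forgery, 0.0)
eer, eer_threshold = equal_error_rate(genuine, forgery)
print(default_far, default_frr, eer, eer_threshold)
```

With these toy scores the default threshold of 0 gives FAR = 0% and FRR = 50%, while the threshold tuned on the same scores gives an EER of 25%. The single EER figure hides how unbalanced the two error rates are at the threshold actually deployed.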
VI. CONCLUSION

We presented a two-stage framework for offline signature
verification, based on writer-independent feature learning and
writer-dependent classification. This method does not rely on
hand-crafted features, but instead learns them from data in a
writer-independent format. Experiments conducted on GPDS and the
Brazilian PUC-PR datasets demonstrate that this method is promising,
achieving performance close to the state-of-the-art for GPDS and
surpassing the state-of-the-art performance in the Brazilian PUC-PR
dataset. We have shown that the features seem to generalize well, by
learning the features in the GPDS dataset and achieving good results
on the Brazilian PUC-PR dataset. Results with a small number of
samples per user also demonstrated that this method can be effective
even with few samples per user (4-5 samples).
Lastly, we note that although these methods achieve low Equal Error
Rates, the actual False Rejection and False Acceptance rates are very
imbalanced, and not stable across multiple users and datasets. This
highlights the importance of a good method for defining user-specific
thresholds, which we intend to explore in future work.

ACKNOWLEDGMENT

This research has been supported by the CNPq grant #206318/2014-6.

REFERENCES

[1] A. K. Jain, A. Ross, and S. Prabhakar, “An introduction to biometric recognition,” Circuits and Systems for Video Technology, IEEE Transactions on, vol. 14, no. 1, pp. 4–20, 2004.
[2] R. Plamondon and S. N. Srihari, “Online and off-line handwriting recognition: a comprehensive survey,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 22, no. 1, pp. 63–84, 2000.
[3] L. G. Hafemann, R. Sabourin, and L. S. Oliveira, “Offline Handwritten Signature Verification-Literature Review,” arXiv preprint arXiv:1507.07909, 2015.
[4] M. Ferrer, M. Diaz-Cabrera, and A. Morales, “Static Signature Synthesis: A Neuromotor Inspired Approach for Biometrics,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 37, no. 3, pp. 667–680, Mar. 2015.
[5] J. Vargas, M. Ferrer, C. Travieso, and J. Alonso, “Off-line Handwritten Signature GPDS-960 Corpus,” in Document Analysis and Recognition, 9th International Conference on, vol. 2, Sep. 2007, pp. 764–768.
[6] Y. Serdouk, H. Nemmour, and Y. Chibani, “Off-line handwritten signature verification using variants of local binary patterns,” Networking and Advanced Systems, 2nd International Conference on, p. 75, 2015.
[7] S. Pal, S. Chanda, U. Pal, K. Franke, and M. Blumenstein, “Off-line signature verification using G-SURF,” in Intelligent Systems Design and Applications, 12th International Conference on. IEEE, 2012, pp. 586–591.
[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in Advances in Neural Information Processing Systems 25, 2012, pp. 1097–1105.
[9] C. Freitas, M. Morita, L. Oliveira, E. Justino, A. Yacoubi, E. Lethelier, F. Bortolozzi, and R. Sabourin, “Bases de dados de cheques bancarios brasileiros,” in XXVI Conferencia Latinoamericana de Informatica, 2000.
[10] N. A. Murshed, F. Bortolozzi, and R. Sabourin, “Binary image compression using identity mapping backpropagation neural network,” in Electronic Imaging’97. International Society for Optics and Photonics, 1997, pp. 29–35.
[11] N. A. Murshed, R. Sabourin, and F. Bortolozzi, “A cognitive approach to off-line signature verification,” International Journal of Pattern Recognition and Artificial Intelligence, vol. 11, no. 05, pp. 801–825, 1997.
[12] Y. Bengio, “Learning Deep Architectures for AI,” Foundations and Trends in Machine Learning, vol. 2, no. 1, pp. 1–127, Jan. 2009.
[13] B. Ribeiro, I. Gonçalves, S. Santos, and A. Kovacec, “Deep learning networks for off-line handwritten signature recognition,” in Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer, 2011, pp. 523–532.
[14] H. Khalajzadeh, M. Mansouri, and M. Teshnehlab, “Persian Signature Verification using Convolutional Neural Networks,” in International Journal of Engineering Research and Technology, vol. 1. ESRSA Publications, 2012.
[15] Y. Sun, Y. Chen, X. Wang, and X. Tang, “Deep Learning Face Representation by Joint Identification-Verification,” in Advances in Neural Information Processing Systems, 2014, pp. 1988–1996.
[16] G. Eskander, R. Sabourin, and E. Granger, “Hybrid writer-independent-writer-dependent offline signature verification system,” IET Biometrics, vol. 2, no. 4, pp. 169–181, Dec. 2013.
[17] M. R. Pourshahabi, M. H. Sigari, and H. R. Pourreza, “Offline handwritten signature identification and verification using contourlet transform,” in Soft Computing and Pattern Recognition, International Conference of. IEEE, 2009, pp. 670–673.
[18] N. Otsu, “A threshold selection method from gray-level histograms,” Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.
[19] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv e-print 1207.0580, Jul. 2012.
[20] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Artificial Intelligence and Statistics, International conference on, 2010, pp. 249–256.
[21] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, “Theano: a CPU and GPU math expression compiler,” in Proceedings of the Python for scientific computing conference (SciPy), vol. 4. Austin, TX, 2010, p. 3.
[22] S. Dieleman, J. Schlüter, C. Raffel, E. Olson, S. K. Sønderby, D. Nouri, D. Maturana, M. Thoma, E. Battenberg, J. Kelly, J. D. Fauw, M. Heilman, diogo149, B. McFee, H. Weideman, takacsg84, peterderivaz, Jon, instagibbs, D. K. Rasul, CongLiu, Britefury, and J. Degrave, “Lasagne: First release.” Aug. 2015.
[23] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks,” in Computer Vision and Pattern Recognition, IEEE Conference on, Jun. 2014, pp. 1717–1724.
[24] L. G. Hafemann, L. S. Oliveira, P. R. Cavalin, and R. Sabourin, “Transfer Learning between Texture Classification Tasks using Convolutional Neural Networks,” in Neural Networks, The 2015 International Joint Conference on, 2015.
[25] C. Cortes and V. Vapnik, “Support-vector networks,” Machine Learning, vol. 20, no. 3, pp. 273–297, Sep. 1995.
[26] D. Bertolini, L. S. Oliveira, E. Justino, and R. Sabourin, “Reducing forgeries in writer-independent off-line signature verification through ensemble of classifiers,” Pattern Recognition, vol. 43, no. 1, pp. 387–396, Jan. 2010.
[27] L. Batista, E. Granger, and R. Sabourin, “Dynamic selection of generative–discriminative ensembles for off-line signature verification,” Pattern Recognition, vol. 45, no. 4, pp. 1326–1340, Apr. 2012.
[28] D. Rivard, E. Granger, and R. Sabourin, “Multi-feature extraction and selection in writer-independent off-line signature verification,” International Journal on Document Analysis and Recognition, vol. 16, no. 1, pp. 83–103, 2013.
[29] J. Hu and Y. Chen, “Offline Signature Verification Using Real Adaboost Classifier Combination of Pseudo-dynamic Features,” in Document Analysis and Recognition, 12th International Conference on, Aug. 2013, pp. 1345–1349.
[30] M. B. Yilmaz, “Offline Signature Verification With User-Based And Global Classifiers Of Local Features,” Ph.D. dissertation, Sabancı University, 2015.
[31] Y. Guerbai, Y. Chibani, and B. Hadjadji, “The effective use of the one-class SVM classifier for handwritten signature verification based on writer-independent parameters,” Pattern Recognition, vol. 48, no. 1, pp. 103–113, Jan. 2015.
