
Deep Learning-Based Feature Extraction in Iris Recognition: Use Existing Models, Fine-tune or Train From Scratch?

Aidan Boyd, Adam Czajka, Kevin Bowyer


University of Notre Dame
Notre Dame, Indiana
{aboyd3, aczajka, kwb}@nd.edu

Abstract

Modern deep learning techniques can be employed to generate effective feature extractors for the task of iris recognition. The question arises: should we train such structures from scratch on a relatively large iris image dataset, or is it better to fine-tune existing models to adapt them to the new domain? In this work we explore five different sets of weights for the popular ResNet-50 architecture to find out whether iris-specific feature extractors perform better than models trained for non-iris tasks. Features are extracted from each convolutional layer and the classification accuracy achieved by a Support Vector Machine is measured on a dataset that is disjoint from the samples used in training of the ResNet-50 model. We show that the optimal training strategy is to fine-tune an off-the-shelf set of weights to the iris recognition domain. This approach results in greater accuracy than both off-the-shelf weights and a model trained from scratch. The winning, fine-tuned approach also shows an increase in performance when compared to previous work, in which only off-the-shelf (not fine-tuned) models were used in iris feature extraction. We make the best-performing ResNet-50 model, fine-tuned with more than 360,000 iris images, publicly available along with this paper.

1. Introduction

The task of developing reliable feature extractors for iris recognition is still an open research problem. Iris recognition has gained a position as one of the fastest and most accurate biometric recognition methods, deployed in several large-scale national ID [25] and border control [20] programs. The approach of translating the output of Gabor filtering into a binary code, proposed more than 25 years ago [10], dominates current commercial implementations. Using present trends in machine learning and expressing this approach in the language of convolutional neural networks (CNN), Daugman's method of iris code generation can be visualized as a single convolutional layer with neurons having hardlim [3] activation functions. Although this structure seems simple, it is not necessarily a trivial task to find a set of kernels implemented by this single convolutional layer that extract salient iris features. This task does not become significantly simpler even if we restrict ourselves to Gabor wavelets.

Convolutional neural networks, recently very successful in solving various computer vision tasks, have also been shown to serve as good iris feature extractors [21]. These structures are certainly more complex than Daugman's approach, but the fact that there is no need to search for optimal convolutional kernels, and thus off-the-shelf architectures can be directly used in iris recognition, is appealing. However, intuitively, domain-specific image processing methods should perform better than general-purpose ones, as has also been shown for iris recognition [9]. In this paper we present experiments that answer the following two questions:

Q1. Which models perform better in iris recognition: off-the-shelf, i.e., not requiring training with iris data, or trained with iris images?

Q2. If it is better to use trained models, which training strategy is better: training from scratch on a relatively large set of iris images, fine-tuning a model designed for a general image recognition task, or fine-tuning a model used for face recognition?

For that purpose we use the ResNet-50 model [14] and a set of more than 360,000 iris training images. An additional set of 20,000 subject-disjoint iris images is used for classifier training and testing. The fine-tuned models, which achieved higher accuracy than off-the-shelf networks in our experiments, are made available along with the paper.
Figure 1: Conceptual overview of the experiments in this work. Left: iris images from our in-house corpus (more than 370,000 iris images) and from CASIA-Iris-Thousand are segmented with OSIRIS to create the network training and classification training/testing datasets, respectively. Middle: ResNet-50 trained on ImageNet, ResNet-50 trained on VGGFace2, both of these fine-tuned on the 360K+ iris training images, and ResNet-50 trained from scratch on the same 360K+ iris training set are used to generate feature vectors from each of the convolutional layers. Right: classification on the CASIA-Iris-Thousand image set, which is subject-disjoint and cross-sensor relative to the in-house 370K+ set used in CNN training, is used to compute accuracy for comparison of the approaches.

2. Related Work

Convolutional Neural Networks have been employed to achieve state-of-the-art iris recognition performance. Liu et al. proposed DeepIris [15], the first deep learning method for heterogeneous iris verification. The authors proposed a nine-layer architecture including a single convolutional layer, two pooling layers, two normalization layers, two local layers and one fully connected layer. Experimental results validate the effectiveness of applying CNNs to iris recognition by attaining promising results for both cross-resolution and cross-sensor iris validation. Gangwar and Joshi [12] later proposed two deeper architectures for iris recognition. These two networks exhibited superior performance on the ND-IRIS-0405 [6] and ND-CrossSensor-Iris-2013 [5] datasets. Proença and Neves [23] reinforce the capabilities of neural networks by showing that their proposed model achieved state-of-the-art recognition performance on good quality data while also being robust against segmentation errors and large changes in pupil size.

Convolutional Neural Networks have also been shown to perform as effective feature extractors [18, 8, 16]. In a work by Nguyen et al. [21], off-the-shelf weights are explored as feature extractors for the task of iris recognition. Five state-of-the-art network architectures are examined and features are extracted from layers at various stages of each network. Promising results are reported even though the off-the-shelf weights utilized were not trained for the task of iris recognition. Our paper differs in that the results from their five tested architectures all come from off-the-shelf weights; in our paper, we determine whether fine-tuning the weight parameters increases performance.

Minaee et al. [19] also explored the use of deep convolutional features for iris recognition. In their work, the authors extract features from each layer of VGG-Net [24] and show that these features result in high classification accuracy. Our paper differs in that their work does not explore different weight configurations, instead using the ImageNet weights to extract features. In our work, features are extracted in a similar way; however, instead of investigating which layer is most performant, we explore the best way to train the network to achieve the best results.

In a paper by Zanlorensi et al. [26], fine-tuned face weights are used in both the ResNet-50 and VGG models to extract features from the last layer of the network on iris data, studying the impact of iris data augmentation and segmentation. They show that the use of transfer learning leads to the generation of good feature extractors. Our paper differs in that their work extracts features only from the last layer before the classification layer of the architectures. In our work, features are extracted from each of the convolutional layers in the network, and we see that the best performing layers are those from the middle of the network.

Menon and Mukherjee [17] also proposed a method of feature extraction using deep convolutional networks. In their work, they use fine-tuned models starting from ImageNet weights to extract features for the purpose of iris recognition. Features are extracted from the last layer before the classification layer and passed to two single-layer perceptrons. The input to their proposed method is two iris images and the output is whether they come from the same person or not. Our paper differs in that we make use of a one-versus-rest SVM to which we pass a single image, and it outputs which class the image belongs to.
3. Methodology

This section describes the experimental setup for this work. The weights of all trained networks, as well as the random seeds, have been made available [4] so that the tests can be reproduced.

3.1. Databases

The dataset used to train the network is a set of in-house iris data collected by the University of Notre Dame. This set consists of 2000 classes of irises, totalling 373,629 full iris images. All images in this set are live irises without contact lenses. Images in this set were acquired using LG 2200, LG 4000 and IrisGuard AD100 sensors.

The dataset that was used for testing and classification was the CASIA-Iris-Thousand database [1]. This database contains 20,000 images from 1000 subjects, collected using the IKEMB-100 camera from IrisKing. Both left and right iris images were acquired, meaning there are 2000 total classes in this database.

To simplify the explanation of the different data subsets, labels have been assigned. The subset of our in-house data used to train the networks is labelled the network training set. The subset used to train the classifier will be known as the classification training set, and the remaining samples used to test the classifier will be known as the classification test set. The classification training and classification test sets are both independent splits of the CASIA-Iris-Thousand database, and both are subject-disjoint and cross-sensor in comparison to the network training set.

3.2. Segmentation

The tool used to segment all iris images in this work is OSIRIS [22]. OSIRIS locates the pupil and iris boundaries and generates normalized iris images of size 64 × 512. When segmenting the network training set, if the segmentation failed we excluded that sample entirely from the subset. Out of the 373,629 full iris images in the network training set, there were 10,117 failures (about 3 percent), meaning the final network training set consists of 363,512 normalized iris images. The reason for this data curation is to use valid training samples and let the network learn iris-related features.

When segmenting and normalizing the CASIA-Iris-Thousand database, there were only 27 failures, corresponding to less than 0.2% error. For simplicity, failed samples were eliminated from the dataset, meaning that the combined size of the classification training and classification testing sets was 19,973 images from 2000 classes. One possible reason for the difference in failure rates between the network training set and the CASIA-Iris-Thousand database is that the OSIRIS tool was developed and tested using the CASIA-Iris-Thousand database.

The normalized iris images used in network training and classification are by default grayscale images. The ResNet architecture requires that these be converted to RGB. This was done by copying the pixel values from the original single channel across all three channels.
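As a minimal sketch, the channel replication described at the end of Section 3.2 can be done in a couple of lines of NumPy; the function name and shapes are ours, chosen to match the 64 × 512 normalized images:

```python
import numpy as np

def to_three_channels(normalized_iris):
    """(64, 512) grayscale -> (64, 512, 3) by copying the single channel."""
    return np.repeat(normalized_iris[..., np.newaxis], 3, axis=-1)

rgb = to_three_channels(np.zeros((64, 512), dtype=np.uint8))
print(rgb.shape)  # (64, 512, 3)
```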
3.3. Network Architecture

The chosen network architecture for this work is a deep convolutional neural network model based on the Residual Network architecture with 53 convolutional layers (ResNet-50) [14].

ResNet-50 is a fully convolutional architecture. All weights in a convolutional layer are shared between kernels applied at each position of the image, meaning the input image dimensions do not affect the operation of the network. Only the dense layers, located at the end of the network, depend on the number of classes, and since we do not use the classification layers of the off-the-shelf networks, it is acceptable to use any input size greater than the 32 × 32 pixels specified in the Keras ResNet-50 documentation [2]. This is important, as the input to each of the networks in this work is the 64 × 512 × 3 normalized iris image. Although the images used to train the off-the-shelf networks were the default ResNet dimensions of 224 × 224 × 3, these weights are still applicable to images of different sizes.
3.4. Network Training

In this work we examine five different sets of weights for the ResNet-50 architecture. Three of these are trained or fine-tuned using iris images, and the other two are off-the-shelf weights obtained from training on the ImageNet [11] and VGGFace2 [7] datasets. The first trained network is initialized using random weights; we denote this as being trained from scratch. For the second, training is initialized with ImageNet weights and the weight parameters are then tweaked to be domain-specific to iris recognition. The last trained network is initialized using VGGFace2 weights, and the parameters are tweaked as with the ImageNet network to be domain-specific to iris recognition. The off-the-shelf weights are the default ImageNet weights from the Keras ResNet-50 implementation [2] and the set of weights obtained from training on the VGGFace2 dataset using the keras_vggface package [?]. The two off-the-shelf weight sets are used as a comparison to determine whether the parameter fine-tuning process yields better feature extractors.

For network training, the final classification layer of the architecture is removed and replaced with a custom dense layer to match the 2000 iris classes of the network training set that are being classified. A global average pooling layer is placed before this final dense layer to transform the features into a vector of size 2048. The feature vector is of size 2048 because this is the number of channels in the output of the previous layer.
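The head replacement described above might look as follows in Keras. This is a hedged sketch consistent with the description, not the authors' released training code; in particular, the optimizer and loss are our assumptions:

```python
# Sketch: swap ResNet-50's 1000-class ImageNet head for a 2000-class iris head.
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base = ResNet50(weights="imagenet",       # or weights=None to train from scratch
                include_top=False,         # drop the original classification head
                input_shape=(64, 512, 3))  # normalized iris images, 3 channels

pooled = GlobalAveragePooling2D()(base.output)       # 2048-dim feature vector
logits = Dense(2000, activation="softmax")(pooled)   # one unit per iris class

model = Model(inputs=base.input, outputs=logits)
model.compile(optimizer="sgd",                        # assumed, not reported
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Note that `include_top=False` is what allows the non-square 64 × 512 input mentioned in Section 3.3, since only the removed dense head was tied to a fixed input size.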
3.5. Feature Extraction

As the networks are not trained for the classification of the CASIA-Iris-Thousand database, we cannot use them directly as classifiers. Instead, features are extracted from layers of the network in the hope that they generalize to the task of iris recognition. In this work, features are extracted from the output of each of the 53 individual convolutional layers in the network. These features take the form of a vector ranging from size 16,384 to size 524,288, depending on the convolutional layer; these vectors will be referred to as the feature vectors. To make sure all features are on the same scale, Min-Max scaling is performed independently on each feature, mapping it to the range 0 to 1. This scaling preserves inter-feature variance while ensuring that features with larger scales do not dominate the feature selection even though they may not necessarily be the best features for classification.
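A minimal sketch of this per-layer extraction and scaling, assuming a Keras model and scikit-learn, is given below. For brevity it fits the scaler on the same data it transforms; a faithful pipeline would fit the scaler on the classification training set only:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.models import Model

def layer_features(model, images):
    """Yield (layer_name, scaled feature matrix) for every Conv2D layer."""
    for layer in model.layers:
        if not isinstance(layer, Conv2D):
            continue
        extractor = Model(inputs=model.input, outputs=layer.output)
        feats = extractor.predict(images)             # (n, h, w, c) activations
        feats = feats.reshape(len(images), -1)        # flatten to feature vectors
        scaled = MinMaxScaler().fit_transform(feats)  # each feature to [0, 1]
        yield layer.name, scaled
```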
3.6. Feature Space Dimensionality Reduction

Because the feature vectors are so large, we reduce the dimensionality of the feature space prior to classification. For each layer, Principal Component Analysis (PCA) is carried out, and we project all features onto a new subspace having 2000 dimensions. From a classification standpoint, we want to limit the features to those that are most important, while not using so many that we over-fit to the data. Through experimentation, it was found that most feature vectors were reduced to within the 1000-2000 feature range after PCA, and 2000 dimensions was selected as a good number of features for the final experiments. The Singular Value Decomposition (SVD) solver used for PCA was "randomized", as proposed in [13]; this was selected as it was shown to run faster than the default solver. Once the feature vector size was reduced to 2000, further reduction was made by selecting the number of features that corresponds to 90% of the feature variance. In some cases this did not result in any reduction from the 2000 features. PCA is employed mainly due to the fact that an SVM is used as the classifier, which does not perform well with high dimensionality.
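A sketch of this reduction step with scikit-learn's randomized-solver PCA follows; the helper name and the exact truncation logic are our assumptions based on the description above:

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_features(train_feats, test_feats, max_dim=2000, var_kept=0.90):
    """Project onto at most max_dim components (randomized SVD), then keep
    only the leading components explaining var_kept of the variance."""
    pca = PCA(n_components=min(max_dim, min(train_feats.shape)),
              svd_solver="randomized")
    train_p = pca.fit_transform(train_feats)
    test_p = pca.transform(test_feats)
    cum_var = np.cumsum(pca.explained_variance_ratio_)
    k = int(np.searchsorted(cum_var, var_kept)) + 1  # components for 90% variance
    return train_p[:, :k], test_p[:, :k]
```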
3.7. Classification

A one-versus-rest Support Vector Machine (SVM) with a linear kernel is implemented for classification. The classification training set is used to train these SVMs. Once the models have been created, they are tested using the classification testing set. The classification training set is 70% of the CASIA-Iris-Thousand database and the classification testing subset is the remaining 30%. The train/test split is stratified such that if there are 10 images for each class, seven will be used in training and the remaining three will be used for testing. This prevents scenarios where all samples from one class fall into either the training or the test set, which would make it impossible to correctly classify those samples. A unique one-versus-rest classifier is created for each layer, and the accuracy reported is the number of correct classifications made on the test set divided by the total number of samples in the classification test set. Linear kernels were selected as it was found that these performed best and in the least time.
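Assembled with scikit-learn, the classification stage described above might look like the following sketch (names are illustrative; the released code in [4] is authoritative):

```python
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

def classify(X, y):
    """Stratified 70%/30% split, one-versus-rest linear SVM, test accuracy."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=0)
    clf = OneVsRestClassifier(LinearSVC())  # linear kernel, one-vs-rest
    clf.fit(X_tr, y_tr)
    return clf.score(X_te, y_te)  # correct classifications / test samples
```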
4. Evaluation

Figure 2 details the results obtained through experimentation. The x-axis of this graph is the number of the convolutional layer that the result was obtained from, i.e., layer 1 corresponds to the first convolutional layer in the architecture after the input layer, and layer 53 is the final convolutional layer before the dense layer at the end. The names of these layers differ between the ImageNet networks and the VGGFace2 networks; a list mapping layer numbers to names for both the ImageNet and VGGFace2 networks can be found in the repository [4]. The layer names for the trained-from-scratch network are the same as the ImageNet names. The y-axis is the classification accuracy, i.e., the number of correct classifications the SVM made over all classifications. All classifications were made on a single random 70%/30% split of the CASIA-Iris-Thousand database into the classification training and classification testing sets; running these experiments on more than one split was infeasible due to the time required for each. Analysis will now be done on all networks.

Figure 2: Classification accuracy for each convolutional layer of the five networks tested on the CASIA-Iris-Thousand dataset. The x-axis is the convolutional layer number. Out of frame: results of VGGFace2 off-the-shelf for layers 48, 51, 52 and 53, which were 47.4%, 25.75%, 39.87% and 53.81%, respectively.

4.1. Network Trained from Scratch

As stated before, the training process for this network involved random weight initialization, after which all weights were trained on the network training set. It is evident from Figure 2 that this network is the worst performing, with most of its reported accuracies falling beneath the other four networks. Towards the later half of the network, however, the results stabilize and begin to perform consistently better than the off-the-shelf networks. The trained-from-scratch network performs worse than the two fine-tuned networks, as evident from Figure 2. The reason for this may be the size of the network training set: it contains 363,512 images from 2000 classes, which is very small in comparison to the quantity of data used to train the off-the-shelf networks.

One interesting thing to note is the high number of layers achieving similar accuracy, namely in the second half of the network. It can be deduced that, even though the feature vectors for these layers vary in size, they describe features that result in similar classification accuracy.

4.2. Off-the-Shelf Networks

The selected off-the-shelf configurations consisted of the weights used to classify the ImageNet database [11] and the weights used to classify the VGGFace2 database [7]. The ImageNet weights used were the "imagenet" weights from the Keras implementation of ResNet-50 [2], and the VGGFace2 weights were attained using the default implementation of ResNet-50 from the keras_vggface package [?]. The results for these networks outline the similarities between these weight sets at many of the layers. We see that in most cases in the first two thirds of the network, the VGGFace2 off-the-shelf network performs slightly better than the ImageNet off-the-shelf. However, in the last 6 layers of the VGGFace2 off-the-shelf architecture we see a drastic decrease in performance. In these same 6 layers, the ImageNet network performance also drops, but not as extremely as VGGFace2.
After some investigation, it was found that in these final layers of the VGGFace2 network, the PCA feature selection reduced the dimensionality to less than 100 features. It seems that the selected features had the highest variance but did not contribute well to classification. In the layers that performed best, i.e., in the middle of the network, the feature vector size was reduced to between 500 and 2000. The last 6 layers (layers 47 to 53) in both of the off-the-shelf networks present lower and more variable results, and as such can be seen as the least useful feature extractors. As none of the best results come from the last 6 layers in any of the architectures, these poor results in the VGGFace2 off-the-shelf network do not alter this work, since we focus only on the best performing layers for each network. Layers in the middle of the architecture perform better and more stably. This PCA reduction is also the cause of the drop in accuracy seen for all networks in layers 4 and 5. Layers 4 and 5 must offer some features that are not useful for classification, and the fact that this drop happens at the same layers in all 5 networks points to the possibility that it is an inherent feature of the ResNet architecture.

At the early stages of the network, the classification accuracy of both off-the-shelf networks is higher than that of the network trained from scratch, even though no iris domain information was used in the training of these networks. This outlines the generalization capabilities of these networks as feature extractors. It is an interesting result, as it may outline the importance of the size of the network training set: both of these networks were trained on datasets of much larger scale than our network training set, and both datasets used to generate the off-the-shelf weights had high heterogeneity present during training. The ImageNet weights are trained to classify thousands of classes of largely varying subjects, whereas the VGGFace2 weights are trained to classify 9131 classes of faces. Although the accuracy of the off-the-shelf networks was not as high as that of the fine-tuned networks, they are still useful for iris recognition, as multiple layers from both off-the-shelf networks obtained a classification accuracy of over 97.5%.

4.3. Fine-tuned Networks

From Figure 2, it is clear that the fine-tuned networks are the highest performing. Both the fine-tuned ImageNet and fine-tuned VGGFace2 weights perform similarly in many of the layers of the network. As with the network trained from scratch, the accuracy in the second half of the network is stable. Fine-tuning the parameters from the ImageNet and VGGFace2 weights on the iris network training set results in superior performance. One observation to be made here is that if training is done on a large heterogeneous dataset, the result can be fine-tuned to a specific domain through weight retraining and achieve better results than training directly on the domain-specific data.

The results from these fine-tuned networks outline the effectiveness of this network as a feature extractor for iris recognition. These networks were fine-tuned using the network training set, which is subject-disjoint and cross-sensor with respect to the data they were tested on, the CASIA-Iris-Thousand database [1], and classification accuracy as high as 99% is reported for both the fine-tuned ImageNet and VGGFace2 networks. It is evident that the network has learned efficient features that generalize to unseen iris data for recognition purposes. These results also display the benefits of transfer learning: feature extractors from one domain can be effectively transferred to another domain through a process of fine-tuning.

4.4. Comparison of results

Although the purpose of this work is to investigate the optimal strategy for applying an example deep learning-based feature extractor (ResNet-50) to iris recognition, the obtained results can be compared to the current literature to measure the performance of our approach.

In the paper by Nguyen et al. [21], the metric used to measure performance was the true positive rate at a false match rate of 0.1%. To make our results comparable to those in [21], Receiver Operating Characteristic (ROC) curves can be generated and the true positive rate at 0.1% false match rate extracted. In their paper, the CASIA-Iris-Thousand database was used in a 70%/30% split in the same way as in our work, so we compare directly to the results obtained on this database. To do this, the ROC curve for the highest performing layer of each network is created. We denote the highest performing layer as the layer that produced the highest accuracy as seen in Figure 2, i.e., correct classifications/total samples in the test set. The highest performing layers are as follows:

• For the network trained from scratch, the best performing layer was layer 42, attaining an accuracy of 97.03%. The ROC curve for this layer can be seen in Figure 3(a); the true positive rate at an FMR of 0.1% (10^-3 FMR in Figure 3) is 97.93%.

• For the off-the-shelf ImageNet weights, the highest accuracy seen was 98.43%, using layer 25. As per Figure 3(b), the true positive rate of this layer is 98.93%.

• The best performing layer for the off-the-shelf VGGFace2 weights saw an accuracy of 98.41%, using layer 27. This translated into a true positive rate of 98.93%, as shown in Figure 3(c).

• For the network fine-tuned from ImageNet weights, an accuracy of 99.03% was obtained using layer 23. Figure 3(d) depicts the ROC curve for this configuration; the true positive rate for this layer is 99.38%.

• For the network fine-tuned from the VGGFace2 weights, the highest accuracy attained was 99.03%, the same as for the fine-tuned ImageNet network, achieved using layer 27. As per Figure 3(e), the true positive rate of this layer is slightly lower than the fine-tuned ImageNet, at 99.27%.

Figure 3: ROC curves for the five networks investigated in this paper: (a) Trained From Scratch (Layer 42); (b) ImageNet Off-The-Shelf (Layer 25); (c) VGGFace2 Off-The-Shelf (Layer 27); (d) ImageNet Fine-tuned (Layer 23); (e) VGGFace2 Fine-tuned (Layer 27); (f) combined graph of all ROC curves. Annotated values correspond to the true positive rate at the corresponding false match rate. Annotated by a cross in (f) is the peak recognition rate from the work by Nguyen et al. [21].

In the paper by Nguyen et al. [21], the highest recorded recognition rate was 98.8%, using the DenseNet architecture. In their work, they also test a shallower ResNet architecture, attaining a peak recognition rate of 98.5%. The peak recognition rate in our experiments was 99.38%, using the fine-tuned ImageNet network. Figure 3(f) shows all five generated ROC curves superimposed on the same graph, with the peak recognition rate seen in [21] annotated as a black X. It can be seen from this graph that four of the five networks tested in this work perform better than the highest recorded recognition rate of [21]. This additionally suggests that fine-tuning already-trained networks to the iris domain is a good approach to using deep learning-based structures in iris recognition. The use of a deeper network is also shown to be beneficial, as there are more layers to extract features from and hence a higher chance of generating a better feature extractor.
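For reference, the true positive rate at a fixed false match rate can be read off a ROC curve built from genuine and impostor comparison scores. The sketch below is our illustration of that standard computation, not the authors' evaluation code:

```python
import numpy as np

def tpr_at_fmr(genuine_scores, impostor_scores, fmr=1e-3):
    """Pick the decision threshold at the (1 - fmr) quantile of impostor
    scores, then measure the fraction of genuine scores accepted there."""
    threshold = np.quantile(impostor_scores, 1.0 - fmr)
    return float(np.mean(genuine_scores >= threshold))
```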

4.5. Statistical Significance of Results

To check the statistical significance of the obtained results, further analysis was done through the use of a boxplot. For the best performing layer of each network, we took the same 70%/30% split of the classification database and further broke the 30% into 10 different 80%/20% splits. We discarded the 20% and ran the classification on the 80%. This was to check whether different sub-splits of the testing data would yield results similar to those seen on the full 30%. The result of this experiment is shown in Figure 4.

Figure 4: Boxplot showing the results of 10 80%/20% splits of the test data for all five network configurations.
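A sketch of this sub-splitting procedure, assuming a fitted classifier and the held-out 30% test portion as NumPy arrays, is given below; the choice of splitter is our assumption:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

def subsplit_accuracies(clf, X_test, y_test, n_splits=10, seed=0):
    """Score the classifier on 10 random 80% portions of the test set."""
    accs = []
    splitter = ShuffleSplit(n_splits=n_splits, train_size=0.80,
                            random_state=seed)
    for keep_idx, _ in splitter.split(X_test):  # the 20% indices are discarded
        accs.append(clf.score(X_test[keep_idx], y_test[keep_idx]))
    return np.array(accs)  # visualize with matplotlib's plt.boxplot(accs)
```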

From this we see clearly that the network trained from scratch performed the worst over all sub-splits. Figure 4 also displays something interesting: the two off-the-shelf networks performed differently in this experiment. It can be seen that the ImageNet weights perform statistically better than the VGGFace2 weights, even though the results obtained on the full 30% differed by only 0.02% (ImageNet 98.43% vs. VGGFace2 98.41%). This shows that the ImageNet weights actually perform better for this task, as the results are in general slightly higher on the sub-splits. This information would not be attainable using just the full 30% for testing, so the creation of 10 sub-splits gives us more information about the overall performance.

For the two fine-tuned networks, the variance in performance was minimal; the upper and lower quartile ranges for both the fine-tuned ImageNet and VGGFace2 weights are similar and close together, and the results obtained from this sub-splitting did not vary greatly. This could be due to the fine-tuning process tuning the weights to similar values. The upper quartile of the ImageNet off-the-shelf network actually matches that of the two fine-tuned networks; however, its range of results is larger. From this we can affirm our conclusion that the fine-tuned weights are the most performant; however, off-the-shelf weights can also be employed to generate effective feature extractors.

5. Discussion and Conclusions

The results presented in this paper allow us to provide the following answers to the two questions posed in the introduction:

Q1. It is worth using a deep learning-based model trained on domain-specific images in iris recognition.

Q2. It is better to take the best-performing model trained on either general-purpose or face images and fine-tune it to the iris recognition task, rather than train one's own network from scratch.

To answer these questions, we examined five different sets of weights on the popular ResNet-50 architecture and extracted features from each convolutional layer in the architecture. These sets of weights comprised a network trained from scratch using random weight initialization, off-the-shelf ImageNet weights, off-the-shelf VGGFace2 weights, fine-tuned ImageNet weights and fine-tuned VGGFace2 weights.

The reason for the observed results may be that complex and deep structures like ResNet-50 require more samples than we had for the iris recognition domain (around 360,000), and it is thus better to start with a solution to a general-purpose vision problem and then fine-tune it to the specific domain. Although this conclusion seems quite obvious, it was interesting to see that 360,000 training samples appears to be too small for training such structures from scratch. The training dataset size clearly plays a large role in the creation of good feature extractors. Also, starting from non-domain-specific weights and fine-tuning them increases the heterogeneity seen in training. We conclude that weights used to classify natural scenes are a good starting point for network training: the highly variant classes used to generate these weights meant that more generalized feature extractors were created, which, once fine-tuned, perform well for the task of iris recognition, even in the cross-sensor scenario presented in this paper.

In this work, we not only show the optimal training method for iris recognition, we also show that our approach is effective by comparing our attained results to other recent work in the area. Four out of the five weight sets resulted in an increase in recognition rate compared to previous work. Although this was not the primary purpose of this paper, the improved results verify the approach taken.

This paper follows good practices related to reproducibility of research results. We have made the best-performing weights publicly available for those who would like to explore the best deep learning-based iris feature extractor known to us at present [4].

Acknowledgments

The Titan Xp used for this research was donated by the NVIDIA Corporation. We would also like to thank Vítor Albiero for his help with this work.

References

[1] Chinese Academy of Sciences Institute of Automation. http://biometrics.idealtest.org/dbDetailForUser.do?id=4. Accessed: 04-12-2019.
[2] Keras documentation - Applications. https://keras.io/applications/#resnet. Accessed: 04-12-2019.
[3] MATLAB documentation - hardlim. https://www.mathworks.com/help/deeplearning/ref/hardlim.html. Accessed: 04-14-2019.
[4] Repository of supplementary material. https://github.com/BoydAidan/BTAS2019DeepFeatureExtraction. Accessed: 04-13-2019.
[5] University of Notre Dame public datasets. https://cvrl.nd.edu/projects/data/. Accessed: 04-10-2019.
[6] K. W. Bowyer and P. J. Flynn. The ND-IRIS-0405 iris image dataset. arXiv preprint arXiv:1606.04853, 2016.
[7] Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman. VGGFace2: A dataset for recognising faces across pose and age. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), pages 67–74. IEEE, 2018.
[8] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 54(10):6232–6251, 2016.
[9] A. Czajka, D. Moreira, K. Bowyer, and P. Flynn. Domain-specific human-inspired binarized statistical image features for iris recognition. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 959–967, Jan 2019.
[10] J. G. Daugman. High confidence visual recognition of persons by a test of statistical independence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11):1148–1161, November 1993.
[11] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
[12] A. Gangwar and A. Joshi. DeepIrisNet: Deep iris representation with applications in iris recognition and cross-sensor iris recognition. In 2016 IEEE International Conference on Image Processing (ICIP), pages 2301–2305. IEEE, 2016.
[13] N. Halko, P.-G. Martinsson, and J. A. Tropp. Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Review, 53(2):217–288, 2011.
[14] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[15] N. Liu, M. Zhang, H. Li, Z. Sun, and T. Tan. DeepIris: Learning pairwise filter bank for heterogeneous iris verification. Pattern Recognition Letters, 82:154–161, 2016.
[16] A. Mahmood, M. Bennamoun, S. An, and F. Sohel. ResFeats: Residual network based features for image classification. In 2017 IEEE International Conference on Image Processing (ICIP), pages 1597–1601. IEEE, 2017.
[17] H. Menon and A. Mukherjee. Iris biometrics using deep convolutional networks. In 2018 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), pages 1–5. IEEE, 2018.
[18] D. Menotti, G. Chiachia, A. Pinto, W. R. Schwartz, H. Pedrini, A. X. Falcao, and A. Rocha. Deep representations for iris, face, and fingerprint spoofing detection. IEEE Transactions on Information Forensics and Security, 10(4):864–879, 2015.
[19] S. Minaee, A. Abdolrashidiy, and Y. Wang. An experimental study of deep convolutional features for iris recognition. In 2016 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), pages 1–6. IEEE, 2016.
[20] NEXUS: Joint USA and Canada Trusted Traveler Program. US official site: https://www.cbp.gov/travel/trusted-traveler-programs/nexus; Canada official site: http://www.nexus.gc.ca. Accessed: April 1, 2019.
[21] K. Nguyen, C. Fookes, A. Ross, and S. Sridharan. Iris recognition with off-the-shelf CNN features: A deep learning perspective. IEEE Access, 6:18848–18855, 2018.
[22] N. Othman, B. Dorizzi, and S. Garcia-Salicetti. OSIRIS: An open source iris recognition software. Pattern Recognition Letters, 82:124–131, 2016.
[23] H. Proença and J. C. Neves. IRINA: Iris recognition (even) in inaccurately segmented data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 538–547, 2017.
[24] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[25] Unique Identification Authority of India. AADHAAR: http://uidai.gov.in. Accessed: April 1, 2019.
[26] L. A. Zanlorensi, E. Luz, R. Laroca, A. S. Britto, L. S. Oliveira, and D. Menotti. The impact of preprocessing on deep representations for iris recognition on unconstrained environments. In 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pages 289–296. IEEE, 2018.
