Deep Learning-Based Feature Extraction in Iris Recognition: Use Existing Models, Fine-Tune or Train From Scratch?
is the classification accuracy, meaning the number of correct classifications the SVM made over the total number of classifications. All classifications were made on a single random 70%/30% split of the CASIA-Iris-Thousand database into the classification training and classification testing sets. Running these experiments on more than one split was found to be infeasible due to the time required for each run. Analysis is now given for each of the networks.
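This evaluation protocol can be summarized in a short sketch. The following is a minimal, illustrative version assuming scikit-learn, with features already extracted from a given ResNet-50 layer; the variable names and the SVM kernel are assumptions, not the paper's exact configuration.

    # Minimal sketch of the per-layer evaluation protocol described above,
    # assuming `features` (one feature vector per image, after reduction)
    # and integer `labels` are already extracted from a given layer.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.metrics import accuracy_score

    def layer_accuracy(features: np.ndarray, labels: np.ndarray,
                       seed: int = 0) -> float:
        # Single random 70%/30% split into classification training
        # and classification testing sets.
        X_train, X_test, y_train, y_test = train_test_split(
            features, labels, test_size=0.3, random_state=seed,
            stratify=labels)
        svm = SVC(kernel="linear")  # assumed kernel, for illustration only
        svm.fit(X_train, y_train)
        # Classification accuracy: correct classifications / total classifications.
        return accuracy_score(y_test, svm.predict(X_test))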
4.1. Network Trained from Scratch

As stated before, the training process for this network involved random weight initialization, after which all weights were learned from the network training set. It is evident from Figure 2 that this is the worst performing network, with most of the reported accuracies falling beneath those of the other four networks. Towards the latter half of the network, however, these results stabilize and begin to perform consistently better than the off-the-shelf networks. The trained-from-scratch network still performs worse than the two fine-tuned networks, as is also evident from Figure 2. The reason for this performance may be the size of the network training set. This set contains 363,512 images from 2,000 classes, which is minimal in comparison to the quantity of data used to train the off-the-shelf networks. One interesting thing to note is the high number of layers achieving similar accuracy, namely in the second half of the network. It can be deduced that, even though the feature vectors for these layers vary in size, they describe features that result in similar classification accuracy.

4.2. Off-the-Shelf Networks

The selected off-the-shelf configurations consisted of the weights used to classify the ImageNet database [11] and the weights used to classify the VGGFace2 database [7]. The ImageNet weights used were the "imagenet" weights of the Keras implementation of ResNet-50 [2], and the VGGFace2 weights were obtained from the default implementation of ResNet-50 in the keras_vggface package [?].
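For reference, the two sets of off-the-shelf weights can be loaded roughly as follows. This is a sketch assuming the tensorflow.keras API and the third-party keras_vggface package (which in practice may require a matching standalone Keras version); it is not the paper's exact code.

    # Sketch of loading the two off-the-shelf ResNet-50 models.
    from tensorflow.keras.applications import ResNet50
    from keras_vggface.vggface import VGGFace

    # ImageNet-trained ResNet-50 shipped with Keras; include_top=False
    # keeps only the convolutional feature-extraction layers.
    imagenet_model = ResNet50(weights="imagenet", include_top=False,
                              input_shape=(224, 224, 3))

    # VGGFace2-trained ResNet-50 from the keras_vggface package.
    vggface2_model = VGGFace(model="resnet50", include_top=False,
                             input_shape=(224, 224, 3))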
The results for these networks outline the similarities between the two weight sets at many of the layers. In most cases in the first two thirds of the network, the VGGFace2 off-the-shelf network performs slightly better than the ImageNet off-the-shelf network. However, in the last 6 layers of the VGGFace2 off-the-shelf architecture we see a drastic decrease in performance. In these same 6 layers, the ImageNet network's performance also drops, but not as severely as that of VGGFace2. After some investigation, it was found that in these final layers of the VGGFace2 network the PCA feature selection reduced the dimensionality to fewer than 100 features. The selected features had the highest variance, but they did not contribute well to classification. In the layers that performed best, i.e., in the middle of the network, the feature vector size was reduced to between 500 and 2,000. The last 6 layers (layers 47 to 53) in both off-the-shelf networks present lower and more variant results and as such can be seen as the least useful feature extractors. As none of the best results come from the last 6 layers in any of the architectures, these poor results for the VGGFace2 off-the-shelf network do not alter this work, as we focus only on the best performing layers for each network. Layers in the middle of the architecture perform better and more stably. The same PCA reduction is also the cause of the drop in accuracy seen for all networks in layers 4 and 5. Layers 4 and 5 must offer some features that are not useful for classification, and the fact that this drop happens at the same layers in all five networks points to the possibility that it is an inherent feature of the ResNet architecture.
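The PCA step discussed above can be sketched as follows, assuming scikit-learn. The 95% explained-variance threshold is an illustrative assumption, chosen only to show how a deep layer can collapse to very few retained dimensions.

    # Illustrative sketch (not the paper's exact pipeline) of the PCA
    # reduction: the number of retained components depends on the
    # explained-variance threshold, so some layers collapse to very
    # few features.
    import numpy as np
    from sklearn.decomposition import PCA

    def reduce_features(train_feats: np.ndarray, test_feats: np.ndarray,
                        variance_kept: float = 0.95):
        # Keep enough components to explain `variance_kept` of the
        # variance (threshold assumed for illustration).
        pca = PCA(n_components=variance_kept)
        train_red = pca.fit_transform(train_feats)
        test_red = pca.transform(test_feats)
        # High-variance components are not necessarily discriminative:
        # a layer can shrink below 100 dimensions and still classify poorly.
        print(f"retained {pca.n_components_} of {train_feats.shape[1]} dimensions")
        return train_red, test_red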
At the early stages of the network, the classification accuracy of both off-the-shelf networks is higher than that of the network trained from scratch, even though no iris domain information was used in the training of these networks. This outlines the generalization capability of these networks as feature extractors. It is an interesting result, as it may outline the importance of the size of the network training set. Both of these networks were trained on datasets of much larger scale than our network training set, and both datasets used to generate the off-the-shelf weights had high heterogeneity during training. The ImageNet weights are trained to classify thousands of classes of images of largely varying subjects, whereas the VGGFace2 weights are trained to classify 9,131 classes of faces. Although the accuracy of the off-the-shelf networks was not as high as that of the fine-tuned networks, they are still useful for iris recognition, as multiple layers from both off-the-shelf networks obtained a classification accuracy of over 97.5%.
4.3. Fine-tuned Networks

From Figure 2 it is clear that the fine-tuned networks are the highest performing. Both the fine-tuned ImageNet and fine-tuned VGGFace2 weights perform similarly in many of the layers of the network. As with the network trained from scratch, accuracy in the second half of the network is stable. Fine-tuning the parameters from the ImageNet and VGGFace2 weights on the iris network training set results in superior performance. One observation to be made here is that if training is done on a large heterogeneous dataset, the result can be fine-tuned to a specific domain through weight retraining and achieve better results than training directly on the domain-specific data (a minimal sketch of this recipe is given at the end of this subsection).

The results from these fine-tuned networks outline the effectiveness of this network as a feature extractor for iris recognition. These networks were fine-tuned using the network training set, which is subject-disjoint and cross-sensor with respect to the data they were tested on, the CASIA-Iris-Thousand database [1], and classification accuracy as high as 99% is reported for both the fine-tuned ImageNet and VGGFace2 networks. It is evident that the network has learned efficient features that generalize to unseen iris data for recognition purposes. These results also display the benefits of transfer learning: feature extractors from one domain can be effectively transferred to another domain through a process of fine-tuning.
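The fine-tuning recipe referenced above, pretrained weights used as initialization with all layers left trainable, can be sketched as follows. The optimizer, learning rate, and classification head shown here are assumptions for illustration, not the paper's exact hyperparameters.

    # Minimal transfer-learning sketch (assumed tf.keras API): start from
    # ImageNet weights and retrain every layer on the iris training set.
    # NUM_CLASSES, train_ds and all hyperparameters are illustrative.
    import tensorflow as tf

    NUM_CLASSES = 2000  # classes in the network training set

    base = tf.keras.applications.ResNet50(
        weights="imagenet", include_top=False, pooling="avg",
        input_shape=(224, 224, 3))

    # New classification head for the iris classes.
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, outputs)

    # All weights stay trainable: the pretrained weights serve only as
    # initialization, and every layer is retrained on iris data.
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
        loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # model.fit(train_ds, epochs=...)  # train_ds: preprocessed iris images + labels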
4.4. Comparison of Results

Although the purpose of this work is to investigate the optimal strategy for applying an example deep learning-based feature extractor (ResNet-50) to iris recognition, the obtained results can be compared to the current literature to measure the performance of our approach.

In the paper by Nguyen et al. [21], the metric used to measure performance was the true positive rate at a false match rate of 0.1%. To make our results comparable to those in [21], Receiver Operating Characteristic (ROC) curves can be generated and the true positive rate at a 0.1% false match rate extracted (a sketch of this conversion is given after the list below). In their paper, the CASIA-Iris-Thousand database was used with a 70%/30% split in the same way as in our work, so we compare directly to the results obtained on this database. To do this, the ROC curve for the highest performing layer of each network is created. We denote the highest performing layer as the layer that produced the highest accuracy as seen in Figure 2, i.e., correct classifications divided by total samples in the test set. The highest performing layers are as follows:

• For the network trained from scratch, the best performing layer was layer 42, which attained an accuracy of 97.03%. The ROC curve for this layer can be seen in Figure 3(a); the true positive rate at an FMR of 0.1% (10^-3 FMR in Figure 3) is 97.93%.

• For the off-the-shelf ImageNet weights, the highest accuracy seen was 98.43%, using layer 25. As per Figure 3(b), the true positive rate for this layer is 98.93%.

• The best performing layer for the off-the-shelf VGGFace2 weights saw an accuracy of 98.41%, using layer 27. This translates into a true positive rate of 98.93%, as shown in Figure 3(c).

• For the network fine-tuned from the ImageNet weights, an accuracy of 99.03% was obtained using layer 23. Figure 3(d) depicts the ROC curve for this configuration; the true positive rate for this layer is 99.38%.

• For the network fine-tuned from the VGGFace2 weights, the highest accuracy attained was 99.03%, the same as for the fine-tuned ImageNet network, achieved using layer 27. As per Figure 3(e), the true positive rate of this layer is slightly lower than that of the fine-tuned ImageNet network, at 99.27%.
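The conversion from verification scores to the true positive rate at a fixed false match rate, referenced before the list above, can be sketched as follows, assuming scikit-learn and arrays of comparison scores with genuine/impostor ground truth; the input names are illustrative.

    # Sketch of extracting the true positive rate at a fixed false match
    # rate from verification scores (assumed inputs: `scores` for all
    # comparison pairs and binary `same_subject` ground truth).
    import numpy as np
    from sklearn.metrics import roc_curve

    def tpr_at_fmr(same_subject: np.ndarray, scores: np.ndarray,
                   fmr: float = 1e-3) -> float:
        # fpr plays the role of the false match rate (FMR) here,
        # tpr the role of the true positive rate.
        fpr, tpr, _ = roc_curve(same_subject, scores)
        # Interpolate the ROC curve at the requested FMR (e.g., 0.1%).
        return float(np.interp(fmr, fpr, tpr))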
In the paper by Nguyen et al. [21], the highest recorded recognition rate was 98.8%, using the DenseNet architecture. In their work they also test a shallower ResNet architecture, attaining a peak recognition rate of 98.5%. The
[Figure 3 appears here, with panels: (a) Trained From Scratch (Layer 42); (b) ImageNet Off-The-Shelf (Layer 25); (c) VGGFace2 Off-The-Shelf (Layer 27); (d) ImageNet Fine-tuned (Layer 23); (e) VGGFace2 Fine-tuned (Layer 27); (f) Combined graph of all ROC curves.]

Figure 3: ROC curves for the five networks investigated in this paper. Annotated values correspond to the true positive rate seen at the corresponding false match rate. The cross annotated in (f) marks the peak recognition rate from the work by Nguyen et al. [21].