Final Report 1
INSTITUTE OF ENGINEERING
[CODE: EX755]
ON
BY:
SANGEET KHANAL (41172)
LALITPUR, NEPAL
APRIL 2023
REAL-TIME AGE AND GENDER DETECTION
BY:
ABHISHEK PRADHAN (41151)
SANGEET KHANAL (41172)
SUPERVISOR:
ER. DEVENDRA KATHAYAT
TRIBHUVAN UNIVERSITY
CHYASAL, LALITPUR
APRIL, 2023
ACKNOWLEDGEMENT
We would like to thank the Institute of Engineering for including the fourth-year
major project in the curriculum. Moreover, we would like to express our gratitude to the
Department of Electronics, Information and Communication, Himalaya College of
Engineering, for giving us the opportunity to do the project. We are thankful
to the management for providing us the opportunity to explore our interests and
ideas in the field of engineering.
ABSTRACT
Automatic age and gender classification has become relevant to an increasing number
of applications, particularly since the rise of social platforms and social media.
However, the performance of existing methods on real-world images is still significantly
lacking, especially when compared to the tremendous leaps in performance recently
reported for the related task of face recognition. This report shows
that learning representations with deep convolutional neural
networks (DCNN) yields a significant increase in performance on these
tasks. The two-level CNN architecture comprises feature extraction and classification.
Feature extraction extracts the features corresponding to age and gender, while
classification assigns each face image to the correct age group and gender. The system
is built with deep learning and OpenCV, which processes real-time
frames as input and outputs the determined age and gender. The method is
evaluated on the UTKFace dataset and on real-world images using classification rate,
precision, and recall. It achieves a validation accuracy of 87%
for gender detection and a mean absolute error of 7.0851 for age detection.
Contents
ACKNOWLEDGEMENT
ABSTRACT
3.2.4 Testing
3.4.1 Dataset
3.4.4 Rectified Linear Unit (ReLU)
REFERENCES
APPENDIX
LIST OF FIGURES
Figure 3.1: Block Diagram
LIST OF ABBREVIATIONS
AAM : Active Appearance Model
CHAPTER-1: INTRODUCTION
1.1 Background
Facial analysis has gained much recognition in the computer vision community in the
recent past. The human face contains features that determine identity, age, gender,
emotions, and the ethnicity of people. Among these features, age and gender
classification can be especially helpful in several real-world applications including
security and video surveillance, electronic customer relationship management,
biometrics, electronic vending machines, human-computer interaction, entertainment,
cosmetology, and forensic art.
Much research has applied deep learning methods such as ANNs and CNNs to
age and gender estimation. The fundamental facial features considered are the
eyebrows, mouth, nose and eyes. An architecture based on the Convolutional Neural
Network (CNN), one of the best-known deep artificial neural networks, is proposed
here for age and gender classification. CNN-based
models are widely used in classification tasks because of their strong
performance in facial analysis. The CNN includes feature
extraction, which extracts features corresponding to age and gender, and
feature classification, which assigns facial images to the correct age
and gender. Recent work on age and gender classification
shows encouraging progress with deep learning and CNNs; therefore, an end-to-end
deep learning-based classification model is proposed here that predicts the age group
and gender of unfiltered facial images. The age and gender classification task is
formulated as a classification problem in which the CNN model learns to predict the
age and gender from a face.
1.2 Objectives
Age and gender detection and classification has scope in numerous fields. It can be
used for forensic testing, security and video surveillance, human-computer
interaction, cosmetology, electronic vending machines, marketing purposes and so
on.
• Easy detection of age and gender in forensics or biometrics helps reach
conclusions faster
• Age and gender determination can reduce the effort needed to search for a culprit,
and is hence helpful for video surveillance and security
• Useful for marketing purposes, e.g. showing ads on different platforms according to
age and gender
• Can be used to automate access control for adult content sites or any other
platforms with age-limit criteria
• Can be used to restrict access to alcohol from vending machines
1.4 Problem Statement
CCTV footage can show criminal activities but cannot easily identify the culprit.
Customer needs vary with age, so devising a marketing
strategy for different age groups on different platforms is a hurdle. Quick biometric
tests could reduce the effort needed to save a life. Alcohol use among
teens has been a major issue due to easy access to vending machines that dispense
alcoholic beverages without checking whether the buyer is underage.
Several such issues exist at present, and these and many others could
be mitigated. This project aims to address these issues. It can help
enhance marketing policy, reduce the time needed to find a culprit, diagnose
health issues without much delay and so on. Simply detecting age and gender can
assist with numerous problems across numerous fields.
CHAPTER-2: LITERATURE REVIEW
Facial analysis has gained much recognition in the computer vision community in the
recent past due to its wide range of applications. The human face contains
features that determine age, gender, emotions, ethnicity and identity of people.
Among these features, age and gender classification can be especially helpful in
several real-world applications including security and video surveillance, electronic
customer relationship management, biometrics, electronic vending machines, human-computer
interaction, entertainment, cosmetology, and forensic art. However, several
issues in age and gender classification remain open. Age and gender
predictions on unfiltered real-life faces have yet to meet the requirements of
commercial and real-world applications, despite the continued efforts of the computer
vision community and the steady stream of new techniques
that improve the state of the art [1, 2, 3].
Over the past years, a lot of methods have been proposed to solve the classification
issues. Many of those methods are handcrafted which perform unsatisfactorily on the
age and gender predictions of unconstrained in-the-wild images [2, 4]. These
conventional hand-engineered methods relied on the differences in dimensions of
facial features and face descriptors [5, 6, 7] which do not have the ability to handle
the varying degrees of variation observed in these challenging unconstrained imaging
conditions. The images in these categories have some variations in appearance, noise,
pose, and lighting which may affect the ability of those manually designed computer
vision methods to accurately classify the age and gender of the images. Recently,
deep learning-based methods [8, 9] have shown encouraging performance in this field
especially on the age and gender classification of unfiltered face images. In light of
the current works in age and gender classification and encouraging signs of progress
in deep learning and CNN, a deep learning-based classification model that predicts
age group and gender of unfiltered facial images has been proposed in this report. The
age and gender classifications task has been formulated as a classification problem in
which the CNN model learns to predict the age and gender from a face image.
Almost all of the early methods in age and gender classifications were handcrafted,
focusing on manually engineering the facial features from the face and mainly
providing a study on constrained images that were taken from controlled imaging
conditions. To mention a few, in 1999, Kwon and Lobo [10] developed the very first
method for age estimation focusing on geometric features of the face that determined
the ratios among different dimensions of facial features. These geometric features
successfully separated babies from adults but were incapable of distinguishing
between young adults and senior adults. Hence, in 2004, Lanitis et al. [11] proposed an
Active Appearance Model (AAM) based method that included both the geometric and
texture features, for the estimation task. This method was not suitable for the
unconstrained imaging conditions attributed to real-world face images which have
different degrees of variations in illumination, expression, poses, and so forth. From
2007, most of the approaches employed manually designed features for the estimation
task: Gabor [5], Spatially Flexible Patches (SFP) [6], Local Binary Patterns (LBP)
[12], and Biologically Inspired Features (BIF) [13]. Classification methods in [3, 14]
used Support Vector Machine (SVM) based methods for age and gender
classification. Linear regression [7, 15], Support Vector Regression (SVR) [16],
Canonical Correlation Analysis (CCA) [17], and Partial Least Squares (PLS) [18] are
the common regression methods for age and gender predictions. Dileep and Danti
[19] also proposed an approach that used feed-forward propagation neural networks
and 3-sigma control limits approach that classified people’s age into children, middle-
aged adults, and old-aged adults. However, all of these methods were only suitable
and effective on constrained imaging conditions; they couldn’t handle the
unconstrained nature of the real-world images and therefore, couldn’t be relied on to
achieve respectable performance on the in-the-wild images which are common in
practical applications [3].
More recently, an expanding number of researchers started to use CNN for age and
gender classification. It could classify the age and gender of unfiltered face images
relying on its good feature extraction technique [8, 9, 20]. Availability of sufficiently
large data for training and high-end computer machines also helped in the adoption of
the deep CNN methods for the classification task. A CNN model can learn compact and
discriminative facial features, especially when the volume of training images is
sufficiently large, to obtain the relevant information needed for the two
classifications. For example, in 2015, Levi et al. [4] proposed a CNN-based model
comprising five layers, three convolutional and two fully connected,
to predict the age of real-world face images. The model included center-crop and
oversampling methods to handle small misalignments in unconstrained images. Yi
et al. [21], in their paper, applied an end-to-end multitask CNN system that learns a
deeper structure and the parameters needed, to solve the age, gender, and ethnicity
classification task. In [22], the authors investigated a pre-trained deep VGG-Face
CNN approach, for automatic age estimation from real-world face images. The CNN
based model consists of eleven layers, including eight convolutional and three fully
connected layers. The authors in [1] also proposed a novel CNN based method, for
age group and gender estimation: Residual Networks of Residual Networks (RoR).
The model includes an RoR architecture, which was pretrained on gender and
weighted loss layer and then on ImageNet dataset, and finally it was fine-tuned on
IMDb-WIKI-101 dataset. Ranjan et al. in [23] presented a model that simultaneously
solved a set of face analysis tasks, using a single CNN. The end-to-end solution is a
novel multitask learning CNN framework, which shared the parameters from lower
layers of CNN among all the tasks for gender recognition, age estimation, etc. In [2],
the authors proposed a CNN solution for age estimation, from a single face image.
The CNN based solution included a robust face alignment phase that prepared and
preprocessed the face images before being fed to the designed model. The authors
also collected large-scale face images, with age and gender label: IMDb-WIKI
dataset. In 2018, Liu et al. [24] developed a CNN based model that employed a
multiclass focal loss function. The age estimation model was validated on Adience
benchmark for performance accuracy, and it achieved a comparable result with state-
of-the-art methods. Also in [25], Duan et al. introduced a hybrid CNN structure for
age and gender classification. The model included a CNN and Extreme Learning
Machine (ELM). The CNN extracted the features from the input images while the ELM
classified the intermediate results. In [26], the authors proposed a robust estimations
solution (CNN2ELM) that also included a CNN and ELM. The model, an
improvement of the work in [25], is three CNN based solutions for age, gender, and
race classification from face images. The authors in [27] proposed a novel method
based on “attention long short-term memory (LSTM) network” for age estimation in-
the-wild. The method was evaluated on Adience, MORPH-II, FG-NET, LAP15, and
LAP16 datasets for performance evaluation.
CHAPTER-3: METHODOLOGY
To classify unconstrained faces, an image preprocessing stage is required that
preprocesses and prepares the face images before they are input into the proposed
network. The solution is therefore divided into three
major steps: image preprocessing, feature learning, and classification.
Image preprocessing includes resizing the image and converting it to grayscale. Feature
learning uses convolution layers, which apply a set of learnable filters
to the input image to extract relevant features. Classification produces a probability
distribution from which the relevant class is predicted.
The block diagram representing the methodology for our project is shown below:
The camera is used as the input source through which a real-time video is taken for
the system. The video is further processed by the system to detect the face, determine
the age and gender and classify them.
When a frame/video is input, the Haar-Cascade algorithm first searches for faces in
each frame. Once it finds a face in the frame, the face is fed to the CNN architecture used
to determine gender, which has two labels, Male and Female. For age detection,
the same face detected using Haar-cascade is
fed to the CNN architecture used to determine age, where age is determined using
regression. The determined age may fall between 0 and 116 years. Finally, the result,
containing the gender and age, is displayed on the frame using OpenCV. The resulting
frame shows a square box around each face with the estimated gender and
age.
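The pipeline described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `format_label`, `run_webcam`, the 0.5 threshold, the division by 255 and the model objects (anything with a Keras-style `predict` method) are all assumptions made for the example. Only the Haar-cascade detection, the 200*200 input size, the three-channel grayscale input and the 0-116 age range come from the report.

```python
def format_label(gender_prob, age):
    """Turn raw model outputs into the text drawn on the frame.

    gender_prob: sigmoid output in [0, 1]; values >= 0.5 are read as Female
    (assumed threshold). age: regression output, clamped to 0-116 years.
    """
    gender = "Female" if gender_prob >= 0.5 else "Male"
    age_years = int(round(min(max(age, 0.0), 116.0)))
    return f"{gender}, {age_years}"


def run_webcam(gender_model, age_model, input_size=200):
    """Detect faces in webcam frames and annotate each with gender and age."""
    import cv2  # imported lazily so format_label works without OpenCV
    import numpy as np

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    capture = cv2.VideoCapture(0)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            for (x, y, w, h) in faces:
                face = cv2.resize(gray[y:y + h, x:x + w], (input_size, input_size))
                # Repeat the gray channel three times and add a batch axis.
                # Scaling to [0, 1] is an assumption, not stated in the report.
                batch = np.repeat(face[np.newaxis, :, :, np.newaxis], 3, axis=3) / 255.0
                gender_prob = float(gender_model.predict(batch)[0][0])
                age = float(age_model.predict(batch)[0][0])
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, format_label(gender_prob, age), (x, y - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
            cv2.imshow("Age and gender", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
                break
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Keeping the label formatting separate from the capture loop makes the output text easy to check without a camera.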
3.2 Algorithm
Step 1: Collect positive (images that contain face/s) and negative (images that don’t
contain face/s) samples
Step 2: Extract Haar-like features (rectangular patterns that can detect edges, lines
and corners in an image) from the samples
Step 5: Apply the cascade to each region of the image to detect faces
Step 6: Perform post-processing to remove false positives and refine the locations of
the detected faces
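To make the Haar-like features of Step 2 more concrete, the sketch below computes one two-rectangle edge feature in plain Python. It relies on an integral image (summed-area table), the standard trick for evaluating rectangle sums quickly in Viola-Jones style detectors; the report does not mention it, so treat it as background rather than the project's implementation. All function names are hypothetical.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    height, width = len(img), len(img[0])
    table = [[0] * (width + 1) for _ in range(height + 1)]
    for r in range(height):
        row_sum = 0
        for c in range(width):
            row_sum += img[r][c]
            table[r + 1][c + 1] = table[r][c + 1] + row_sum
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def two_rect_edge_feature(table, top, left, height, width):
    """Vertical-edge Haar-like feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(table, top, left, height, half)
    right_sum = rect_sum(table, top, left + half, height, half)
    return left_sum - right_sum


# A bright-to-dark vertical edge gives a large response; a flat patch gives 0.
edge_patch = [[9, 9, 1, 1],
              [9, 9, 1, 1]]
flat_patch = [[5, 5, 5, 5],
              [5, 5, 5, 5]]
edge_response = two_rect_edge_feature(integral_image(edge_patch), 0, 0, 2, 4)
flat_response = two_rect_edge_feature(integral_image(flat_patch), 0, 0, 2, 4)
```

A cascade combines many such features, evaluated at different positions and scales, into successively stricter stages.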
Once a face is detected using the above algorithm, the next step is to identify the gender
from that face. The algorithm used is listed below:
Step 1: Detect faces in the input image using the Haar-cascade algorithm
Step 2: Preprocess the detected faces by resizing and gray scaling them to a fixed size
Step 4: The model extracts features from the input image and makes a prediction of
the gender
Step 5: The output of the model will be a probability distribution over the possible
classes (male or female)
Step 6: The class with the highest probability will be chosen as the predicted gender
3.2.3 Age Detection
Once the gender is detected, the age is estimated as a value in the range 0-116
years. The algorithm is:
Step 1: Detect faces in the input image using the Haar-cascade algorithm
Step 2: Preprocess the detected faces by resizing and gray scaling them to a fixed size
Step 3: Feed the preprocessed faces into a trained model for regression
Step 4: The model extracts features from the input image and outputs a continuous
value representing the estimated age of the detected face
3.2.4 Testing
Step 1: Detect faces in the input image using the Haar-cascade algorithm
Step 2: Preprocess the detected faces by resizing and gray scaling them to a fixed size
3.3 System Flow Chart
3.4 Model Implementation
3.4.1 Dataset
The UTKFace dataset is a large-scale face dataset with a long age span (ranging from 0 to 116
years old). The dataset consists of over 20,000 face images with annotations of age,
gender, and ethnicity. The images cover large variations in pose, facial expression,
illumination, occlusion, resolution, etc. Hence, this dataset has been used for
the project.
The labels of each face image are embedded in the file name, formatted as:
[age]_[gender]_[race]_[date&time].jpg
Of the 23,078 images in the UTKFace dataset, 35% were used for testing and the
remaining 65% for training the model.
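As an illustration of the file-name convention and the 65/35 split, the following sketch parses the labels out of a name and partitions a list of names. The helper names, the fixed random seed and the example file name are assumptions; the report does not say how the split was actually drawn.

```python
import random


def parse_utkface_name(filename):
    """Read age, gender and race from '[age]_[gender]_[race]_[date&time].jpg'."""
    age, gender, race = filename.split("_")[:3]
    return {"age": int(age), "gender": int(gender), "race": int(race)}


def split_train_test(filenames, train_percent=65, seed=0):
    """Shuffle deterministically, then keep train_percent % for training."""
    shuffled = list(filenames)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file name, used only to exercise the parser.
labels = parse_utkface_name("25_1_0_20170116174525125.jpg")
names = [f"{age}_0_0_x.jpg" for age in range(20)]
train_names, test_names = split_train_test(names)
```

Shuffling before the cut avoids a split that follows the directory ordering, which in this dataset is sorted by age.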
3.4.2 CNN
CNN Architecture
• A convolution tool that separates and identifies the various features of the
image for analysis, in a process called feature extraction.
• A fully connected layer that utilizes the output from the convolution process
and predicts the class of the image based on the features extracted in previous
stages.
A fully connected layer comprises flatten and dense layers. The flatten layer takes
the 3D output tensor from the previous layer and converts it into a 1D array, which is
then fed into the dense layer. The dense layer then maps the flattened feature vector to
the target output class using a set of learnable weights and biases.
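The flatten and dense steps can be written out by hand on a tiny tensor, as in the sketch below. This is plain Python for illustration only; the function names and the toy weights are assumptions, and a real model would use a deep learning framework for these layers.

```python
def flatten(tensor):
    """Flatten a 3D nested list into a 1D list, keeping element order."""
    return [value for plane in tensor for row in plane for value in row]


def dense(inputs, weights, biases):
    """One dense layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# A 2 x 2 x 2 tensor standing in for the output of the last pooling layer.
feature_maps = [[[1, 2], [3, 4]],
                [[5, 6], [7, 8]]]
flat = flatten(feature_maps)

# Two output neurons with toy weights: one sums everything, one reads the last value.
toy_weights = [[1] * 8,
               [0] * 7 + [1]]
toy_biases = [0, 0.5]
outputs = dense(flat, toy_weights, toy_biases)
```

Each row of `toy_weights` belongs to one output neuron, which is why a dense layer with 512 neurons on a large flattened vector carries most of a network's parameters.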
Images are initially rescaled to 200*200 pixels and then sent to the convolution
layers. Following that, the five convolutional layers are defined as follows:
• The fourth convolution layer, which has 256 filters with an output of size 256*20*20
and a max pooling output of 256*9*9, is followed by a ReLU and a dropout layer
• The fifth convolution layer, which has 512 filters with an output of size 512*7*7 and
a max pooling output of 512*3*3, is followed by a ReLU, a flatten and a
dropout layer
• Finally, a dense layer with 512 neurons is applied, followed by cross entropy
for prediction of gender
Images are initially rescaled to 200*200 pixels and then sent to the convolution
layers. Following that, the four convolutional layers are defined as follows:
• The third convolutional layer applies a set of 128 filters with an output of size
128*45*45 and a max pooling output of 128*22*22, followed by a ReLU and a max
pooling layer
• The fourth convolution layer, which has 256 filters with an output of size 256*20*20 and
a max pooling output of 256*9*9, is followed by a ReLU, a flatten and a
dropout layer
• Finally, a dense layer with 512 neurons is applied, followed by a linear
function for age estimation
3.4.3 Summary of Model Layer
The CNN architecture for age detection comprises 4 convolutional layers and
fully connected layers, summarized below:
3.4.4 Rectified Linear Unit (ReLU)
The Rectified Linear activation function is a piecewise linear function that will output
the input directly if it is positive, otherwise, it will output zero. It has become the
default activation function for many types of neural networks because a model that
uses it is easier to train and often achieves better performance.
The output of ReLU is the maximum of zero and the input value:
$$f(x) = \max(0, x) \tag{3.1}$$
The output is zero when the input value is negative and equal to the
input value otherwise. Thus, equation 3.1 can be rewritten as follows:
$$f(x) = \begin{cases} 0, & \text{if } x < 0 \\ x, & \text{if } x \geq 0 \end{cases} \tag{3.2}$$
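The piecewise definition above translates directly into code. The sketch below is only an illustration of the formula; the function names are assumptions.

```python
def relu(x):
    """Rectified linear unit: zero for negative inputs, the input otherwise."""
    return max(0.0, x)


def relu_layer(values):
    """Apply ReLU element-wise to a list of activations."""
    return [relu(v) for v in values]


activations = relu_layer([-2.5, 0.0, 3.0])
```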
faster pace. The Adaptive Gradient Algorithm (AdaGrad) maintains a per-parameter
learning rate, which improves performance on problems with sparse gradients (e.g.
natural language and computer vision problems).
The Mean Squared Error measures how close a regression line is to a set of data
points. It is a risk function corresponding to the expected value of the squared error
loss. Mean square error is calculated by taking the average, specifically the mean, of
errors squared from data as it relates to a function. It does this by taking the distances
from the points to the regression line (these distances are the “errors”) and squaring
them. The squaring is necessary to remove any negative signs. It also gives more
weight to larger differences. Mathematically,
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \tag{3.3}$$
To calculate the mean squared error from a set of X and Y values, first find the
regression line and insert the X values into the linear regression equation to find the
new Y values. Subtract the new Y value from the original to get the error and then
square the errors. Add up the errors and find the mean.
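The calculation described above can be checked on a few numbers. In the sketch below the ages and predictions are made-up values chosen for easy arithmetic, and the function name is an assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true ages and predicted ages.
true_ages = [20, 30, 40]
predicted_ages = [22, 30, 37]
mse = mean_squared_error(true_ages, predicted_ages)  # (4 + 0 + 9) / 3
```

Because the errors are squared, the 3-year miss contributes more than twice as much as the 2-year miss.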
The sigmoid function is a mathematical function that maps any input to a value
between 0 and 1. It is often used in machine learning for binary classification tasks
such as gender detection.
Here, the sigmoid function is used to predict the probability that a given image belongs
to a particular gender (e.g., male or female). The sigmoid function takes in the output
of the model's final layer, which is typically a weighted sum of the input features, and
produces a value between 0 and 1.
If the sigmoid output is closer to 0, the model predicts that the input belongs to the
negative class (e.g., male). If the sigmoid output is closer to 1, the model predicts that
the input belongs to the positive class (e.g., female).
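A small sketch of this decision rule follows. The 0.5 threshold and the function names are assumptions; the mapping of values near 0 to male and values near 1 to female follows the example in the text.

```python
import math


def sigmoid(z):
    """Map any real number to the interval [0, 1] in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe for very negative z, where exp(-z) would overflow
    return e / (1.0 + e)


def predict_gender(z, threshold=0.5):
    """Label the weighted sum z from the final layer as 'male' or 'female'."""
    return "female" if sigmoid(z) >= threshold else "male"


midpoint = sigmoid(0.0)
confident_female = predict_gender(2.0)
confident_male = predict_gender(-2.0)
```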
age = m * x + b (3.4)
where age is the predicted age, x is the input feature, m is the slope of the line, and b
is the y-intercept. The slope and y-intercept are learned during the training process
using a dataset of input features and corresponding age labels. Once the slope and y-
intercept have been learned, the model uses them to predict the age of new inputs
based on their input features.
CHAPTER-4: MODEL TRAINING AND TESTING
4.1 Training Gender Model
The model was fitted for 50 epochs. Results from the first 6 epochs and the last 6
epochs are shown below:
• In the first epoch, the loss was 1.3350 with 68% accuracy, and the validation
loss was 0.5017 with 75% validation accuracy.
• By the 50th epoch, the loss was 0.1222 with 94% accuracy, and the validation
loss was 0.3851 with 87% validation accuracy.
Below are the line plots showing accuracy and loss:
Confusion Matrix
Below is the confusion matrix of the testing data yielded by the model for gender
detection:
The above result shows that the accuracy of the gender detection model is 87%.
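Accuracy, precision and recall all follow from the four cells of a binary confusion matrix. The matrix values are not reproduced in this text, so the counts in the sketch below are hypothetical and serve only to show the arithmetic; the function name is likewise an assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical counts, with "female" treated as the positive class.
accuracy, precision, recall = binary_metrics(tp=40, fp=10, fn=5, tn=45)
```

Reporting precision and recall alongside accuracy shows whether the errors are concentrated in one of the two classes.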
4.2 Training Age Model
The model was fitted for 50 epochs. Results from the first 6 epochs and the last 6
epochs are shown below:
• In the first epoch, the loss was 100832.67 with an MAE of 37.29, and the validation
loss was 258.14 with a validation MAE of 12.05.
• By the 50th epoch, the loss was 35.487 with an MAE of 4.48, and the validation
loss was 94.218 with a validation MAE of 7.0851.
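The loss and the MAE above sit on different scales. Assuming the reported loss is the mean squared error, its square root (about 9.7 years for a validation loss of 94.218) is the root mean squared error, which is never smaller than the MAE. The sketch below computes both on made-up ages; the function names are assumptions.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared error, in the same unit as the targets."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(squared)


# Hypothetical true ages and predicted ages.
true_ages = [20, 30, 40]
predicted_ages = [22, 30, 37]
mae = mean_absolute_error(true_ages, predicted_ages)       # (2 + 0 + 3) / 3
rmse = root_mean_squared_error(true_ages, predicted_ages)  # sqrt(13 / 3)
```

A wide gap between RMSE and MAE suggests that a few large errors dominate the loss.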
Below is the line plot showing loss:
CHAPTER-5: RESULTS AND DISCUSSION
The proposed system classifies gender with a validation accuracy of 87% and
estimates age with a validation MAE of 7.0851 years.
The system receives the input picture in real time via the camera. The source image is
preprocessed to improve the efficiency of the matching process. Images are initially scaled
to 200*200, so the input to the convolutional network is 200*200 pixels. The
convolutional layer applies a set of filters to the input image to extract important
features and create a set of output feature maps. These feature maps contain
information about the presence and location of specific features in the image. Each
convolution is followed by a MaxPooling layer, which takes these output feature maps and
reduces their spatial dimensionality by selecting the maximum value in each pooling
window. This operation effectively downsamples the feature maps, reducing their
size while preserving the most important features. The activation function determines the
output values that help build the model for the prediction of age and gender.
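The max pooling operation described above can be shown on a single small feature map. The 2*2 window and stride of 2 in the sketch below are assumptions, since the report does not state the pooling parameters, and the function name is hypothetical.

```python
def max_pool_2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by stride."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, height - size + 1, stride):
        row = []
        for c in range(0, width - size + 1, stride):
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 4],
               [5, 6, 1, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]]
pooled = max_pool_2d(feature_map)
```

The 4*4 map shrinks to 2*2 while the strongest response in each window is kept.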
After the CNN models for age and gender prediction were built and trained on the
UTKFace dataset, a Haar-cascade classifier was used to detect faces in the real-time
video; each detected face is marked with a rectangle and converted
to a grayscale image. The grayscale
image is reshaped into three channels for input to the models. The gender and age
models take their input from the real-time video and predict the age and gender of the face.
Running the system for gender and age detection and classification, we
observed the following results:
Figure 5.1: Results yielded by the system
For the age detection model, the first epoch had a high loss of 100832.67 and an
MAE of 37.29, indicating poor initial performance. The validation loss was 258.14 with a
validation MAE of 12.05. By the 50th epoch, the model had
improved significantly, with a loss of 35.487 and an MAE of 4.48. The validation loss
was 94.218 with a validation MAE of 7.0851. The gap between the training and
validation MAE suggests some overfitting, but the model still reaches a reasonable
level of accuracy on unseen data.
These results indicate that the model estimates the age of the
subjects in the dataset with a mean absolute error (MAE) of approximately 7 years
on the validation set.
On the other hand, the gender detection model had a better performance from the first
epoch, with a loss of 1.3350 and an accuracy of 68%, and a validation loss of 0.5017
and a validation accuracy of 75%. By the 50th epoch, the model improved
significantly, with a loss of 0.1222 and an accuracy of 94%, and a validation loss of
0.3851 and a validation accuracy of 87%.
This indicates that the model was able to accurately detect gender from the data and
perform well on new, unseen data, achieving an accuracy of 87% on the test set.
Figure 5.2: Illustration of system’s capacity to detect multiple faces
The designed system is capable of detecting multiple faces in a frame. In the above
figure, about 75% of the faces in a single frame were detected.
Overall, the results demonstrate that CNN models can be effective for gender and age
detection tasks, with the ability to achieve high accuracy and generalize well to new
data. However, further improvements could be made by using larger and more diverse
datasets, fine-tuning hyperparameters, and incorporating additional techniques such
as attention mechanisms or ensembling.
CHAPTER-6: LIMITATIONS
Below are the limitations of the designed program:
• The input image resolution was not high enough, which degraded the prediction
results.
• The brightness of the surroundings alters the results; input taken in dark
surroundings yields lower accuracy than input taken in bright
surroundings.
CHAPTER-7: CONCLUSION
We tackled the classification of age group and gender of unfiltered real-world face
images. We used the UTKFace dataset to develop the models. The Haar-Cascade
algorithm was used to detect face/s, binary cross-entropy for classification of gender
and a linear regression function for age detection. Training and testing accuracy
were visualized using line plots and a confusion matrix (gender). The image
preprocessing algorithm handled some of the variability observed in typical
unfiltered real-world faces, which supports the model's applicability to age group
and gender classification in the wild.
CHAPTER-8: FUTURE ENHANCEMENTS
The following methods could significantly improve the results of the system:
• Using a high-speed processor and a higher-resolution camera with better focusing
capability could improve the accuracy and speed up
detection of the face/s in the frame.
• Enlarging the dataset to train and test the system with a wider variety of images
would improve the decision-making ability of the system, which could result in
better performance.
• Training the model with a focus on pre-processing, normalization, augmentation
and multi-scale prediction could significantly reduce the effects of surrounding
brightness.
REFERENCES
1. K. Zhang, C. Gao, L. Guo et al., “Age group and gender estimation in the wild
with deep RoR architecture,” IEEE Access, vol. 5, pp. 22492–22503, 2017.
2. R. Rothe, R. Timofte, and L. Van Gool, “Deep expectation of real and apparent
age from a single image without facial landmarks,” International Journal of
Computer Vision, vol. 126, no. 2–4, pp. 144–157, 2018.
3. E. Eidinger, R. Enbar, and T. Hassner, “Age and gender estimation of unfiltered
faces,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 12,
pp. 2170–2179, 2014.
4. G. Levi and T. Hassner, “Age and gender classification using convolutional
neural networks,” in Proceedings of the 2015 IEEE Conference on Computer
Vision and Pattern Recognition Workshops (CVPRW), pp. 34–42, Boston, MA,
USA, June 2015.
5. F. Gao and H. Ai, “Face age classification on consumer images with gabor feature
and fuzzy LDA method,” in Lecture Notes in Computer Science (including
subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 5558, pp. 132–141, LNCS, Springer Science+Business
Media, Berlin, Germany, 2009.
6. S. Yan, M. Liu, and T. S. Huang, “Extracting age information from local spatially
flexible patches,” in Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing, pp. 737–740, Las Vegas, NV, USA,
March 2008.
7. Y. Fu and T. S. Huang, “Human age estimation with regression on discriminative
aging manifold,” IEEE Transactions on Multimedia, vol. 10, no. 4, pp. 578–584,
2008.
8. C. Szegedy, W. Liu, Y. Jia et al., “Going deeper with convolutions,”
in Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 1–9, Boston, MA, USA, June 2015.
9. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
10. Y. H. Kwon and N. Da Vitoria Lobo, “Age classification from facial
images,” Computer Vision and Image Understanding, vol. 74, no. 1, pp. 1–21,
1999.
11. A. Lanitis, C. Draganova, and C. Christodoulou, “Comparing different classifiers
for automatic age estimation,” IEEE Transactions on Systems, Man and
Cybernetics, Part B (Cybernetics), vol. 34, no. 1, pp. 621–628, 2004.
12. A. Günay and V. V. Nabiyev, “Automatic age classification with LBP,”
in Proceedings of the 2008 23rd International Symposium on Computer and
Information Sciences, pp. 6–9, Istanbul, Turkey, 2008.
13. G. Guo, G. Mu, Y. Fu, and T. S. Huang, “Human age estimation using bio-
inspired features,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 1589–1592, Miami, FL, USA, 2009.
14. M. A. Beheshti-nia and Z. Mousavi, “A new classification method based on
pairwise support vector machine (SVM) for facial age estimation,” Journal of
Industrial and Systems Engineering, vol. 10, no. 1, pp. 91–107, 2017.
15. A. Demontis, B. Biggio, G. Fumera, and F. Roli, “Super-sparse regression for fast
age estimation from faces at test time,” Image Analysis and Processing—ICIAP,
Springer, Berlin, Germany, 2015.
16. G. Guo, Y. Fu, C. R. Dyer, and T. S. Huang, “Image-based human age estimation
by manifold learning and locally adjusted robust regression,” IEEE Transactions
on Image Processing, vol. 17, no. 7, pp. 1178–1188, 2008.
17. Y. Fu and G. Guo, “Age synthesis and estimation via faces,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 1955–1976,
2010.
18. G. Guo and G. Mu, “Simultaneous dimensionality reduction and human age
estimation via kernel partial least squares regression,” in Proceedings of the 24th
IEEE Conference on Computer Vision and Pattern Recognition, pp. 657–664,
Colorado Springs, CO, USA, June 2011.
19. M. R. Dileep and A. Danti, “Human age and gender prediction based on neural
networks and three sigma control limits,” Applied Artificial Intelligence, vol. 32, no. 3, pp.
281–292, 2018.
20. M. Lin, Q. Chen, and S. Yan, “Network in network,” 2013.
21. D. Yi, Z. Lei, and S. Z. Li, “Age estimation by multi-scale convolutional
network,” Computer Vision–ACCV 2014, Springer, Berlin, Germany, 2015.
22. Z. Qawaqneh, A. A. Mallouh, and B. D. Barkana, “Deep convolutional neural
network for age estimation based on VGG-face model,” 2017.
23. R. Ranjan, S. Sankaranarayanan, C. D. Castillo, and R. Chellappa, “An all-in-one
convolutional neural network for face analysis,” in Proceedings of the 12th IEEE
International Conference on Automatic Face & Gesture Recognition (FG 2017),
pp. 17–24, Biometrics Wild, Bwild, Washington, DC, USA, June 2017.
24. W. Liu, L. Chen, and Y. Chen, “Age classification using convolutional neural
networks with the multi-class focal loss,” IOP Conference Series: Materials
Science and Engineering, vol. 428, no. 1, 2018.
25. M. Duan, K. Li, C. Yang, and K. Li, “A hybrid deep learning CNN–ELM for age
and gender classification,” Neurocomputing, vol. 275, pp. 448–461, 2018.
26. M. Duan, K. Li, and K. Li, “An ensemble CNN2ELM for age estimation,” IEEE
Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 758–772,
2018.
27. K. Zhang, N. Liu, X. Yuan, S. Member, X. Guo, C. Cao et al., “Fine-grained age
estimation in the wild with attention LSTM networks,” IEEE Transactions on
Circuits and Systems for Video Technology, p. 1, 2019.
APPENDIX
Below are a few observed results: