
Final Report 1

This final year project report describes the development of a real-time age and gender detection system using convolutional neural networks. The system uses OpenCV for face detection and two CNN models for gender and age classification. The gender detection model achieved 87% validation accuracy while the age detection model had a mean absolute error of 7.0851 years. The models were trained on the UTKFace dataset and tested on real-world images with good prediction performance and computation time, demonstrating the effectiveness of the deep learning approach for age and gender estimation.

Uploaded by

190148sandip

TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING

HIMALAYA COLLEGE OF ENGINEERING

[CODE: EX755]

FINAL YEAR PROJECT REPORT

ON

REAL-TIME AGE AND GENDER DETECTION

BY:

ABHISHEK PRADHAN (41151)

DINESH OSTI (41156)

SANDEEP SHRESTHA (41170)

SANGEET KHANAL (41172)

A PROJECT SUBMITTED TO DEPARTMENT OF ELECTRONICS AND COMPUTER
ENGINEERING IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR BACHELOR’S
DEGREE IN ELECTRONICS, COMMUNICATION AND INFORMATION ENGINEERING

DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING

LALITPUR, NEPAL

APRIL 2023
REAL-TIME AGE AND GENDER DETECTION

BY:
ABHISHEK PRADHAN (41151)

DINESH OSTI (41156)

SANDEEP SHRESTHA (41170)

SANGEET KHANAL (41172)

SUPERVISOR:
ER. DEVENDRA KATHAYAT

A REPORT SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR
THE DEGREE OF BACHELOR IN ELECTRONICS, COMMUNICATION AND
INFORMATION ENGINEERING

DEPARTMENT OF ELECTRONICS AND COMPUTER ENGINEERING

HIMALAYA COLLEGE OF ENGINEERING

TRIBHUVAN UNIVERSITY

CHYASAL, LALITPUR

APRIL, 2023
ACKNOWLEDGEMENT
We would like to thank the Institute of Engineering for including the fourth-year
major project in the curriculum. We would also like to express our gratitude to the
Department of Electronics, Information and Communication, Himalaya College of
Engineering, for giving us the opportunity to carry out this project. We are thankful
to the management for giving us the chance to explore our interests and
ideas in the field of engineering.

We would like to acknowledge the respected DHOD of Electronics, Communication
and Information and our project supervisor, Er. Devendra Kathayat. Special thanks to
the project coordinators, Er. Narayan Adhikari Chhetri and Er. Ramesh Tamang, for
their constant support and encouragement. Further, we would like to acknowledge
the teachers who assisted during the research. It is thanks to them that we were able to
make up our minds to do the project. We thank them for being an inspiration to us.

ABSTRACT
Automatic age and gender classification has become relevant to an increasing number
of applications, particularly since the rise of social platforms and social media.
However, the performance of existing methods on real-world images is still significantly
lacking, especially when compared to the tremendous leaps in performance recently
reported for the related task of face recognition. This report shows
that learning representations with deep convolutional neural
networks (DCNN) yields a significant increase in performance on these
tasks. The two-level CNN architecture comprises feature extraction and
classification. Feature extraction extracts the features corresponding to age and gender, while
classification assigns each face image to the correct age and gender. The
system combines deep learning with OpenCV, which processes real-time
frames as input, and outputs the estimated age and gender. The method is
evaluated on the recent UTKFace dataset and on real-world images using
classification rate, precision, and recall. It achieves good prediction results and
computation time, with a validation accuracy of 87% for gender detection and a
mean absolute error of 7.0851 years for age detection.

Index Terms - Convolutional neural network (CNN), deep learning, face
recognition, gender and age estimation, UTKFace

Contents
ACKNOWLEDGEMENT
ABSTRACT
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER-1: INTRODUCTION
1.1 Background
1.2 Objectives
1.3 Scope and Applications
1.4 Problem Statement
CHAPTER-2: LITERATURE REVIEW
CHAPTER-3: METHODOLOGY
3.1 Block Diagram
3.2 Algorithm
3.2.1 Face Detection: Haar-Cascade Classifier
3.2.2 Gender Detection
3.2.3 Age Detection
3.2.4 Testing
3.3 System Flow Chart
3.4 Model Implementation
3.4.1 Dataset
3.4.2 CNN
3.4.3 Summary of Model Layer
3.4.4 Rectified Linear Unit (ReLU)
3.4.5 Adam Optimiser
3.4.6 Mean Squared Error (MSE)
3.4.7 Sigmoid Function
3.4.8 Linear Function
CHAPTER-4: MODEL TRAINING AND TESTING
4.1 Training Gender Model
4.2 Training Age Model
CHAPTER-5: RESULTS AND DISCUSSION
CHAPTER-6: LIMITATIONS
CHAPTER-7: CONCLUSION
CHAPTER-8: FUTURE ENHANCEMENTS
REFERENCES
APPENDIX

LIST OF FIGURES
Figure 3.1: Block Diagram
Figure 3.2: System Flow Chart
Figure 3.3: Sample images from UTKFace dataset
Figure 3.4: CNN Architecture for Gender Detection
Figure 3.5: CNN Architecture for Age Detection
Figure 3.6: Model Layer for Gender Detection
Figure 3.7: Model Layer for Age Detection
Figure 4.1: Model Training process for Gender Detection
Figure 4.2: Training and Validation Accuracy for Gender
Figure 4.3: Training and Validation Loss for Gender
Figure 4.4: Confusion Matrix
Figure 4.5: Model Training process for Age Detection
Figure 4.6: Mean Absolute Error for Age
Figure 4.7: Training loss for age
Figure 5.1: Results yielded by the system
Figure 5.2: Illustration of system’s capacity to detect multiple faces

LIST OF ABBREVIATIONS
AAM : Active Appearance Model

ANN : Artificial Neural Network

BIF : Biologically Inspired Features

CNN : Convolutional Neural Network

CCA : Canonical Correlation Analysis

CCTV : Closed-Circuit Television

DCNN : Deep-Convolutional Neural Network

ELM : Extreme Learning Machine

GPIO : General Purpose Input Output

IMDb : Internet Movie Database

LBP : Local Binary Patterns

LDA : Linear Discriminant Analysis

LSTM : Long Short-Term Memory

MORPH : Craniofacial Longitudinal Morphological Face Database

OpenCV : Open Source Computer Vision Library

PLS : Partial Least Squares

RoR : Residual Networks of Residual Networks

SFP : Spatially Flexible Patches

SVM : Support Vector Machine

SVR : Support Vector Regression

VGG : Visual Geometry Group

CHAPTER-1: INTRODUCTION

1.1 Background

Facial analysis has gained much recognition in the computer vision community in the
recent past. The human face contains features that indicate a person's identity, age,
gender, emotions, and ethnicity. Among these, age and gender
classification is especially helpful in several real-world applications, including
security and video surveillance, electronic customer relationship management,
biometrics, electronic vending machines, human-computer interaction, entertainment,
cosmetology, and forensic art.

Much research has used deep learning methods such as ANN and CNN to
estimate age and gender. The fundamental facial features considered are the
eyebrows, mouth, nose, and eyes. An architecture based on the Convolutional Neural
Network (CNN), one of the well-known deep artificial neural networks, is proposed
here for age and gender classification. CNN-based
models are widely used in classification tasks because of their remarkable
performance in facial analysis. The CNN includes feature
extraction, which extracts the features corresponding to age and gender, and
feature classification, which assigns facial images to the correct age
and gender. Recent work on age and gender classification
shows encouraging progress with deep learning and CNN. Therefore, an end-to-end
deep learning-based classification model is proposed here that predicts the age group
and gender of unfiltered facial images. The task is formulated as a
classification problem in which the CNN model learns to predict the
age and gender from a face.

1.2 Objectives

The major objectives are:

➢ To detect faces in real-time video
➢ To determine the age and gender of the detected faces

1.3 Scope and Applications

Age and gender detection and classification has scope in numerous fields. It can be
used for forensic testing, security and video surveillance, human-computer
interaction, cosmetology, electronic vending machines, marketing, and so
on.

Major applications of the project include:

• Quick detection of age and gender in forensics or biometrics helps reach
conclusions faster
• Age and gender determination can reduce the effort needed to search for a culprit,
which helps both video surveillance and security
• Useful for marketing purposes, e.g. showing ads on different platforms according to
the viewer's age and gender
• Can be used to automate access control for adult content sites or any other
platforms with age-limit criteria
• Can be used to restrict access to alcohol from vending machines
• Useful for editing apps or software related to cosmetology

1.4 Problem Statement

CCTV footage can show criminal activity but cannot easily identify the culprit.
Customers' needs vary with age, so devising a marketing
strategy for different age groups on different platforms is a hurdle. Quick biometric
tests could reduce the effort needed to save a life. Alcohol use among
teenagers has been a major issue because vending machines provide
alcoholic beverages without any means of checking whether the buyer is
underage. Several such issues exist at present, and these and many others may
be avoided. This project focuses on eliminating these issues. It can help
enhance marketing policy, reduce the time needed to find a culprit, diagnose
health issues without much delay, and so on. Simply detecting age and gender can
assist with numerous problems in numerous fields.

CHAPTER-2: LITERATURE REVIEW

Facial analysis has gained much recognition in the computer vision community in the
recent past due to its enormous range of applications and possibilities. The human face
contains features that indicate a person's age, gender, emotions, ethnicity, and identity.
Among these, age and gender classification is especially helpful in
several real-world applications, including security and video surveillance, electronic
customer relationship management, biometrics, electronic vending machines,
human-computer interaction, entertainment, cosmetology, and forensic art. However, several
issues in age and gender classification remain open problems. Age and gender
predictions on unfiltered real-life faces have yet to meet the requirements of
commercial and real-world applications, despite the computer vision
community's continuing efforts to improve the state of the art with new
techniques [1, 2, 3].

Over the past years, many methods have been proposed to solve the classification
problem. Many of those methods are handcrafted and perform unsatisfactorily on
age and gender prediction for unconstrained in-the-wild images [2, 4]. These
conventional hand-engineered methods relied on the differences in dimensions of
facial features and on face descriptors [5, 6, 7], which cannot handle
the varying degrees of variation observed in these challenging unconstrained imaging
conditions. The images in these categories vary in appearance, noise,
pose, and lighting, which may affect the ability of manually designed computer
vision methods to classify age and gender accurately. Recently,
deep learning-based methods [8, 9] have shown encouraging performance in this field,
especially on the age and gender classification of unfiltered face images. In light of
the current works in age and gender classification and encouraging signs of progress
in deep learning and CNN, a deep learning-based classification model that predicts the
age group and gender of unfiltered facial images is proposed in this report. The
age and gender classification task is formulated as a classification problem in
which the CNN model learns to predict the age and gender from a face image.

Almost all of the early methods for age and gender classification were handcrafted,
focusing on manually engineering facial features and mainly
studying constrained images taken under controlled imaging
conditions. To mention a few, in 1999, Kwon and Lobo [10] developed the very first
method for age estimation, which used geometric features of the face, namely
the ratios among different dimensions of facial features. These geometric features
separated babies from adults successfully but could not distinguish
between young adults and senior adults. Hence, in 2004, Lanitis et al. [11] proposed an
Active Appearance Model (AAM) based method that included both geometric and
texture features for the estimation task. This method was not suitable for the
unconstrained imaging conditions of real-world face images, which have
different degrees of variation in illumination, expression, pose, and so forth. From
2007, most approaches employed manually designed features for the estimation
task: Gabor [5], Spatially Flexible Patches (SFP) [6], Local Binary Patterns (LBP)
[12], and Biologically Inspired Features (BIF) [13]. The classification methods in [3, 14]
used Support Vector Machine (SVM) based methods for age and gender
classification. Linear regression [7, 15], Support Vector Regression (SVR) [16],
Canonical Correlation Analysis (CCA) [17], and Partial Least Squares (PLS) [18] are
the common regression methods for age and gender prediction. Dileep and Danti
[19] also proposed an approach that used feed-forward propagation neural networks
and a 3-sigma control limits approach to classify people’s age into children,
middle-aged adults, and old-aged adults. However, all of these methods were
effective only under constrained imaging conditions; they could not handle the
unconstrained nature of real-world images and therefore could not be relied on to
achieve respectable performance on the in-the-wild images that are common in
practical applications [3].

More recently, a growing number of researchers have started to use CNN for age and
gender classification. A CNN can classify the age and gender of unfiltered face images
thanks to its good feature extraction [8, 9, 20]. The availability of sufficiently
large training data and high-end computers also helped the adoption of
deep CNN methods for the classification task. A CNN model can learn compact and
discriminative facial features, especially when the volume of training images is
sufficiently large, to obtain the relevant information needed for the two
classifications. For example, in 2015, Levi et al. [4] proposed a CNN-based model
comprising five layers, three convolutional and two fully connected, to
predict the age of real-world face images. The model included center-crop and
oversampling methods to handle small misalignments in unconstrained images. Yi
et al. [21] applied an end-to-end multitask CNN system that learns a
deeper structure and the parameters needed to solve the age, gender, and ethnicity
classification task. In [22], the authors investigated a pre-trained deep VGG-Face
CNN approach for automatic age estimation from real-world face images. The
CNN-based model consists of eleven layers, including eight convolutional and three fully
connected layers. The authors in [1] also proposed a novel CNN-based method for
age group and gender estimation: Residual Networks of Residual Networks (RoR).
The model includes an RoR architecture, which was pretrained on gender and
weighted loss layer and then on the ImageNet dataset, and was finally fine-tuned on
the IMDb-WIKI-101 dataset. Ranjan et al. [23] presented a model that simultaneously
solved a set of face analysis tasks using a single CNN. The end-to-end solution is a
novel multitask learning CNN framework that shares the parameters of the lower
layers of the CNN among all the tasks, such as gender recognition and age estimation.
In [2], the authors proposed a CNN solution for age estimation from a single face image.
The CNN-based solution included a robust face alignment phase that prepared and
preprocessed the face images before they were fed to the designed model. The authors
also collected a large-scale set of face images with age and gender labels: the IMDb-WIKI
dataset. In 2018, Liu et al. [24] developed a CNN-based model that employed a
multiclass focal loss function. The age estimation model was validated on the Adience
benchmark and achieved results comparable with state-of-the-art
methods. In [25], Duan et al. introduced a hybrid CNN structure for
age and gender classification. The model included a CNN and an Extreme Learning
Machine (ELM). The CNN extracted the features from the input images while the ELM
classified the intermediate results. In [26], the authors proposed a robust estimation
solution (CNN2ELM) that also included a CNN and an ELM. The model, an
improvement of the work in [25], consists of three CNN-based solutions for age, gender,
and race classification from face images. The authors in [27] proposed a novel method
based on an “attention long short-term memory (LSTM) network” for age estimation
in the wild. The method was evaluated on the Adience, MORPH-II, FG-NET, LAP15, and
LAP16 datasets.

Unfortunately, some of the methods mentioned above have been verified
effectively only under constrained imaging conditions; few have studied unconstrained
imaging conditions. It is still challenging to classify unconstrained faces with large
variations in illumination, viewpoint, non-frontal pose, etc. Here, those issues are
addressed by designing a robust image preprocessing algorithm, pretraining the
model on large-scale facial aging benchmarks with noisy age and gender labels, and
regularizing the CNN parameters with a self-designed CNN framework.

CHAPTER-3: METHODOLOGY

To classify unconstrained faces, an image preprocessing stage is required that
prepares the face images before they are input into the proposed
network. The solution is therefore divided into three
major steps: image preprocessing, feature learning, and classification.

Image preprocessing includes resizing the image and converting it to grayscale. Feature
learning uses convolution layers, which apply a set of learnable filters
to the input image to extract relevant features. Classification produces a probability
distribution that is used to predict the relevant class.

3.1 Block Diagram

A block diagram is a visual representation of a system that uses simple, labeled
blocks that represent single or multiple items, entities or concepts, connected by lines
to show relationships between them.

The block diagram representing the methodology for our project is shown below:

Figure 3.1: Block Diagram

The camera is the input source through which real-time video is captured for
the system. The system processes the video to detect faces and to determine
the age and gender of each detected face.

When a frame is input, the Haar-cascade algorithm first searches it for
faces. Once a face is found, it is fed to the CNN that determines gender,
which has two labels: Male and Female. For age detection, the same detected face is
fed to a second CNN, which estimates age by
regression. The estimated age may fall between 0 and 116 years. Finally, the result is
drawn on the frame using OpenCV. The resulting
frame shows a square box around each face together with the estimated gender and
age.
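The per-frame control flow described above can be sketched in plain Python. This is only an illustration under stated assumptions: the names `describe_faces` and `crop` are hypothetical, the detector and the two predictors are passed in as stand-in callables rather than the project's Haar cascade and trained CNNs, and clamping the age to 0-116 is our reading of the stated range.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x, y, width, height) of a detected face
Frame = Sequence[Sequence[int]]      # grayscale frame as rows of pixel values


def crop(frame: Frame, box: Box) -> List[List[int]]:
    """Cut the face region out of the frame."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in frame[y:y + h]]


def describe_faces(
    frame: Frame,
    detect_faces: Callable[[Frame], List[Box]],
    predict_gender: Callable[[List[List[int]]], str],
    predict_age: Callable[[List[List[int]]], float],
) -> List[Tuple[Box, str]]:
    """Run detection, then both predictors, and build one label per face."""
    results = []
    for box in detect_faces(frame):
        face = crop(frame, box)
        gender = predict_gender(face)
        # Assumption: clamp the regression output to the stated 0-116 range.
        age = min(116, max(0, round(predict_age(face))))
        results.append((box, f"{gender}, {age}"))
    return results


# Stand-in components (hypothetical), used only to exercise the control flow.
frame = [[0] * 8 for _ in range(6)]
labels = describe_faces(
    frame,
    detect_faces=lambda f: [(1, 1, 4, 4)],
    predict_gender=lambda face: "Female",
    predict_age=lambda face: 27.4,
)
print(labels)  # [((1, 1, 4, 4), 'Female, 27')]
```

Passing the detector and predictors as arguments keeps the loop independent of any particular library, so the same flow applies whether one or several faces are returned for a frame.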

3.2 Algorithm

An algorithm is a process or set of rules to be followed in calculations or other
problem-solving operations. The four algorithms followed in the project (face
detection, gender detection, age detection, and testing) are explained
below:

3.2.1 Face Detection: Haar-Cascade Classifier

The Haar-cascade algorithm is a machine learning-based approach to object
detection, originally proposed by Viola and Jones in 2001 for detecting
faces in images. It uses a set of Haar-like features and a
cascade classifier to identify objects of interest. The algorithm is:

Step 1: Collect positive samples (images that contain faces) and negative samples
(images that do not contain faces)

Step 2: Extract Haar-like features (rectangular patterns that can detect edges, lines
and corners in an image) from the samples

Step 3: Train a classifier using the AdaBoost algorithm, which
iteratively selects the most informative features and trains weak classifiers on
them; the weak classifiers are then combined into a strong classifier that can accurately
detect faces

Step 4: Create a cascade (series of stages) of boosted classifiers, so that regions
unlikely to contain a face are rejected early

Step 5: Apply the cascade to each region of the image to detect faces

Step 6: Perform post-processing to remove false positives and refine the locations of
the detected faces
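To make Step 2 concrete, the sketch below computes one two-rectangle Haar-like feature using an integral image, the summed-area table that Viola and Jones used to evaluate rectangle sums in constant time. It is a toy illustration in plain Python, not the OpenCV implementation; the function names and the 4x4 patch are made up for the example.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """ii[y][x] holds the sum of img over rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_edge_feature(ii: Image, x: int, y: int, w: int, h: int) -> int:
    """Left half minus right half: large magnitude at a vertical edge."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 4x4 patch: bright left half, dark right half.
patch = [[9, 9, 1, 1] for _ in range(4)]
ii = integral_image(patch)
print(rect_sum(ii, 0, 0, 4, 4))               # 80, the sum of the whole patch
print(two_rect_edge_feature(ii, 0, 0, 4, 4))  # 64, a strong edge response
```

Because every rectangle sum costs four table lookups regardless of its size, a detector can afford to evaluate many such features at many positions and scales.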

3.2.2 Gender Detection

Once a face is detected using the above algorithm, the next step is to identify the gender
from that face. The algorithm used is listed below:

Step 1: Detect faces in the input image using the Haar-cascade algorithm

Step 2: Preprocess the detected faces by converting them to grayscale and resizing
them to a fixed size

Step 3: Feed the preprocessed faces into a trained model

Step 4: The model extracts features from the input image and makes a prediction of
the gender

Step 5: The output of the model is a probability distribution over the possible
classes (male or female)

Step 6: The class with the highest probability is chosen as the predicted gender
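Steps 5 and 6 can be illustrated with a few lines of Python. The sketch assumes the model ends in a single sigmoid unit whose output is the probability of the class labelled 1 (female in UTKFace); that mapping and the helper names `sigmoid` and `decide_gender` are assumptions made for the example, not details confirmed by the steps above.

```python
import math
from typing import Tuple


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def decide_gender(p_female: float) -> Tuple[str, float]:
    """Return (label, probability) for the more likely of the two classes."""
    probabilities = {"Male": 1.0 - p_female, "Female": p_female}
    label = max(probabilities, key=probabilities.get)
    return label, probabilities[label]


# Two made-up raw scores standing in for the network's final pre-activation.
print(decide_gender(sigmoid(1.2)))   # probability above 0.5, so "Female"
print(decide_gender(sigmoid(-0.4)))  # probability below 0.5, so "Male"
```

With only two classes, picking the class with the highest probability is the same as thresholding the sigmoid output at 0.5.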

3.2.3 Age Detection

Once the gender is detected, the age is estimated as a value in the range 0 to
116 years. The algorithm is:

Step 1: Detect faces in the input image using the Haar-cascade algorithm

Step 2: Preprocess the detected faces by converting them to grayscale and resizing
them to a fixed size

Step 3: Feed the preprocessed faces into a trained model for regression

Step 4: The model extracts features from the input image and outputs a continuous
value representing the estimated age of the detected face

3.2.4 Testing

The algorithm for testing is listed below:

Step 1: Detect faces in the input image using the Haar-cascade algorithm

Step 2: Preprocess the detected faces by converting them to grayscale and resizing
them to a fixed size

Step 3: Feed the processed face data into a trained model

Step 4: Load face images from the training directory for prediction

Step 5: Retrieve the matched image’s age from the database

Step 6: Display the result

3.3 System Flow Chart

Figure 3.2: System Flow Chart

3.4 Model Implementation
3.4.1 Dataset

UTKFace is a large-scale face dataset with a long age span (from 0 to 116
years old). It consists of over 20,000 face images with annotations of age,
gender, and ethnicity. The images cover large variations in pose, facial expression,
illumination, occlusion, resolution, etc. Hence, this dataset was used for
the project.

Figure 3.3: Sample images from UTKFace dataset

The labels of each face image are embedded in the file name, formatted as:
[age]_[gender]_[race]_[date&time].jpg

• [age] is an integer from 0 to 116, indicating the age
• [gender] is either 0 (male) or 1 (female)
• [race] is an integer from 0 to 4, denoting White, Black, Asian, Indian, and
Others (like Hispanic, Latino, Middle Eastern)
• [date&time] is in the format yyyymmddHHMMSSFFF, showing the date
and time the image was collected into UTKFace
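As an illustration of the naming scheme, the following sketch splits a file name into typed fields. The helper `parse_utkface_name`, the `FaceLabel` record, and the sample file name are hypothetical; the field meanings follow the list above.

```python
from datetime import datetime
from typing import NamedTuple

RACES = ("White", "Black", "Asian", "Indian", "Others")
GENDERS = ("male", "female")


class FaceLabel(NamedTuple):
    age: int
    gender: str
    race: str
    collected: datetime


def parse_utkface_name(filename: str) -> FaceLabel:
    """Split '[age]_[gender]_[race]_[date&time].jpg' into typed fields."""
    stem = filename.split(".")[0]
    age, gender, race, stamp = stem.split("_")
    if not 0 <= int(age) <= 116:
        raise ValueError(f"age out of range in {filename!r}")
    # yyyymmddHHMMSSFFF: 14 digits of date and time, then 3 of milliseconds.
    collected = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    collected = collected.replace(microsecond=int(stamp[14:17]) * 1000)
    return FaceLabel(int(age), GENDERS[int(gender)], RACES[int(race)], collected)


# Hypothetical file name written in the documented pattern.
label = parse_utkface_name("25_1_2_20170116174525125.jpg")
print(label.age, label.gender, label.race, label.collected)
```

A file name that does not have exactly four underscore-separated fields makes the unpacking raise `ValueError`, which a loading script can catch in order to skip malformed entries.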

Of the 23,078 images in the UTKFace dataset, 35% were used for testing and the
remaining 65% for training the model.
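A 65/35 split of this kind can be sketched as follows. The shuffle, the fixed seed, and the placeholder file names are assumptions made for the example; the report does not say how the images were assigned to each subset.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    items: Sequence[str], test_fraction: float = 0.35, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Shuffle reproducibly, then cut into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder names standing in for the 23,078 image files.
files = [f"img_{i}.jpg" for i in range(23078)]
train, test = split_dataset(files)
print(len(train), len(test))  # 15001 8077
```

Fixing the seed makes the split repeatable, so the same images stay in the test set across training runs.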

3.4.2 CNN

A CNN is a type of feedforward network formed by multiple layers
of convolutional filters alternated with subsampling filters, followed by fully
connected layers. CNNs are a class of deep neural networks that can recognize and
classify particular features in images and are widely used for analyzing visual
imagery. Their applications include image and video recognition, image
classification, medical image analysis, computer vision and natural language
processing. The term “convolution” in CNN denotes the mathematical operation of
convolution, a special kind of linear operation in which two functions are
combined to produce a third function that expresses how the shape of one function
is modified by the other. In simple terms, a small matrix of weights (the filter) is slid
over the image matrix, and at each position the overlapping values are multiplied
element-wise and summed to give an output that is used to extract features from the
image.
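A small worked example may help. The sketch below slides a 3x3 filter over a 4x4 image with no padding and a stride of one, multiplying and summing at each position. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is cross-correlation. The function name and the toy matrices are made up for the illustration.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image; each output is a multiply-and-sum."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 4x4 image with a vertical edge, and a 3x3 vertical-edge filter.
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 0, 1] for _ in range(3)]
print(conv2d_valid(image, kernel))  # [[3, 3], [3, 3]]
```

The output is smaller than the input by the filter size minus one in each dimension, which is why a 4x4 image and a 3x3 filter give a 2x2 result.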

CNN Architecture

There are two main parts to a CNN architecture:

• A convolution tool that separates and identifies the various features of the
image for analysis, in a process called feature extraction.
• A fully connected layer that uses the output of the convolution process
to predict the class of the image based on the features extracted in the previous
stages.

The feature extraction network consists of many pairs of convolutional and pooling
layers. Feature extraction aims to reduce the number of features
present in a dataset. It creates new features that summarize the existing features
contained in the original set of features.

A fully connected layer comprises flatten and dense layers. The flatten layer takes
the 3D output tensor from the previous layer and converts it into a 1D array, which is
then fed into the dense layer. The dense layer then maps the flattened feature vector to
the target output class using a set of learnable weights and biases.
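The flatten and dense steps can be sketched in plain Python. The tensor, weights, and biases below are toy values chosen so the arithmetic is easy to follow; they are not taken from the trained models.

```python
from typing import List


def flatten(tensor: List[List[List[float]]]) -> List[float]:
    """Unroll a 3D tensor into a 1D feature vector."""
    return [v for plane in tensor for row in plane for v in row]


def dense(
    x: List[float], weights: List[List[float]], biases: List[float]
) -> List[float]:
    """One output per weight row: dot(weights[k], x) + biases[k]."""
    return [
        sum(w * v for w, v in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


# Toy 2x2x2 tensor flattened to 8 features, then mapped to 2 outputs.
tensor = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
features = flatten(tensor)
weights = [[1.0] * 8, [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]]
biases = [0.5, -1.0]
print(features)                          # [1.0, 2.0, ..., 8.0]
print(dense(features, weights, biases))  # [36.5, 7.0]
```

During training, the entries of `weights` and `biases` are the learnable parameters that get adjusted; the flatten step itself has none.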

The CNN Architecture for the project is explained hereby:

Figure 3.4: CNN Architecture for Gender Detection

Images are initially rescaled to 200*200 pixels and then sent to the convolution
layers. Following that, the five convolutional layers are defined as follows:

• The first convolutional layer applies 36 filters to the input, producing an
output of size 36*198*198, followed by a rectified linear operator (ReLU), a
max pooling layer that takes the maximum value of 3*3 regions with two-pixel
steps, and a local response normalization layer
• The second convolutional layer, which contains 64 filters, processes the
output of the first max pooling layer and produces an output of size 64*96*96,
which its max pooling layer reduces to 64*47*47. The same hyperparameters as
before are used for ReLU, a max pooling layer, and a local response
normalization layer
• The third convolutional layer applies a set of 128 filters, producing an
output of size 128*45*45 that max pooling reduces to 128*22*22, followed by
ReLU and a max pooling layer
• The fourth convolutional layer, which has 256 filters, produces an output of
size 256*20*20 that max pooling reduces to 256*9*9, and is followed by a ReLU
and a dropout layer
• The fifth convolutional layer, which has 512 filters, produces an output of
size 512*7*7 that max pooling reduces to 512*3*3, followed by a ReLU, a
flatten and a dropout layer
• Finally, a dense layer with 512 neurons is applied, followed by cross entropy
for the prediction of gender
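
The feature-map sizes listed above can be checked with a short calculation. The sketch below assumes 3*3 convolution kernels with stride 1 and no padding; the report does not state the kernel size, but this assumption, together with the stated 3*3 max pooling with two-pixel steps, reproduces every size in the list. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Output width of an unpadded, stride-1 convolution (assumed 3*3 kernel)."""
    return size - kernel + 1


def pool_out(size, window=3, stride=2):
    """Output width of max pooling over 3*3 regions with two-pixel steps."""
    return (size - window) // stride + 1


def trace_shapes(input_size, filters):
    """Return (filters, conv width, pooled width) for each convolution block."""
    shapes = []
    size = input_size
    for n_filters in filters:
        conv = conv_out(size)
        pooled = pool_out(conv)
        shapes.append((n_filters, conv, pooled))
        size = pooled  # the pooled map is the input of the next block
    return shapes


gender_shapes = trace_shapes(200, [36, 64, 128, 256, 512])
for n_filters, conv, pooled in gender_shapes:
    print(f"{n_filters} filters: conv {conv}*{conv}, max pool {pooled}*{pooled}")
```

Under these assumptions the 200*200 input shrinks to 3*3 after the fifth block, so the flatten layer receives 512*3*3 values.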

Figure 3.5: CNN Architecture for Age Detection

Images are initially rescaled to 200*200 pixels and then sent to the convolution
layers. Following that, the four convolutional layers are defined as follows:

• The first convolutional layer applies 36 filters to the input, producing an
output of size 36*198*198, followed by a rectified linear operator (ReLU), a
max pooling layer that takes the maximum value of 3*3 regions with two-pixel
steps, and a local response normalization layer
• The second convolutional layer, which contains 64 filters, processes the
output of the first max pooling layer and produces an output of size 64*96*96,
which its max pooling layer reduces to 64*47*47. The same hyperparameters as
before are used for ReLU and a max pooling layer
• The third convolutional layer applies a set of 128 filters, producing an
output of size 128*45*45 that max pooling reduces to 128*22*22, followed by
ReLU and a max pooling layer
• The fourth convolutional layer, which has 256 filters, produces an output of
size 256*20*20 that max pooling reduces to 256*9*9, followed by a ReLU, a
flatten and a dropout layer
• Finally, a dense layer with 512 neurons is applied, followed by a linear
function for age prediction

3.4.3 Summary of Model Layer

The CNN architecture for gender detection comprises 5 convolutional layers followed
by fully connected layers, summarized below:

• An input 2D convolutional layer (with 36 filters) paired with a 2D MaxPooling
layer
• 4 further 2D convolutional layers with 64, 128, 256 and 512 filters respectively,
each paired with a 2D MaxPooling layer
• 1 Flatten layer and then 1 Dropout layer
• 1 Dense layer with 512 nodes and finally
• An output Dense layer with 2 nodes, which correspond to the labels male and
female

Figure 3.6: Model Layer for Gender Detection

The CNN architecture for age detection comprises 4 convolutional layers followed
by fully connected layers, summarized below:

• An input 2D convolutional layer (with 36 filters) paired with a 2D MaxPooling
layer
• 3 further 2D convolutional layers with 64, 128 and 512 filters respectively,
each paired with a 2D MaxPooling layer
• 1 Flatten layer and then 1 Dropout layer
• 1 Dense layer with 512 nodes and finally
• 1 output Dense layer that outputs the predicted age

Figure 3.7: Model Layer for Age Detection

3.4.4 Rectified Linear Unit (ReLU)

The Rectified Linear activation function is a piecewise linear function that will output
the input directly if it is positive, otherwise, it will output zero. It has become the
default activation function for many types of neural networks because a model that
uses it is easier to train and often achieves better performance.

ReLU is a non-linear activation function that is used in multi-layer neural networks
or deep neural networks. This function can be represented as:

$$f(x) = \max(0, x) \tag{3.1}$$

Where x = an input value

According to equation 3.1, the output of ReLU is the maximum value between zero and
the input value. The output is equal to zero when the input value is negative and to the
input value when the input is non-negative. Thus, we can rewrite equation 3.1 as follows:

$$f(x) = \begin{cases} 0, & \text{if } x < 0 \\ x, & \text{if } x \geq 0 \end{cases} \tag{3.2}$$

Where x = an input value
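
As a small illustration of the ReLU definition, the function can be written directly in Python. The function name `relu` and the sample inputs are hypothetical and are not taken from the project code.

```python
def relu(x):
    """Rectified Linear Unit: return x when it is positive, otherwise zero."""
    return max(0.0, x)


# Negative inputs are clipped to zero; non-negative inputs pass through unchanged.
samples = [-2.0, -0.5, 0.0, 1.5, 3.0]
activations = [relu(x) for x in samples]
print(activations)
```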

3.4.5 Adam Optimiser

Adam is an adaptive learning rate optimization algorithm designed
specifically for training deep neural networks. The name is derived from adaptive
moment estimation: the optimizer uses estimates of the
first and second moments of the gradient to adapt the learning rate for each weight of
the neural network. The algorithm accelerates gradient descent
by taking into consideration the exponentially weighted average of the
gradients. Using averages makes the algorithm converge towards the minima at a
faster pace. Like the Adaptive Gradient Algorithm (AdaGrad), Adam maintains a
per-parameter learning rate, which improves performance on problems with sparse
gradients (e.g. natural language and computer vision problems).
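
The update rule behind this description can be sketched for a single parameter as follows. This is a minimal illustration rather than the optimizer configuration used in the project: the default values of `lr`, `beta1`, `beta2` and `eps` are the commonly used defaults for Adam, the report does not state which values were used, and the quadratic test function is a hypothetical example.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a single parameter and return (theta, m, v)."""
    # Exponentially weighted averages of the gradient (first moment)
    # and of the squared gradient (second moment).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for initializing m and v at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # The step is scaled per parameter by the second-moment estimate.
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def minimize_quadratic(start=5.0, steps=500, lr=0.1):
    """Minimize the hypothetical function f(theta) = theta**2 with Adam."""
    theta, m, v = start, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * theta  # derivative of theta**2
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta


print(minimize_quadratic())
```

Because of the bias correction, the very first step has a magnitude of roughly `lr` regardless of how large the gradient is, which is one reason Adam is less sensitive to gradient scale than plain gradient descent.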

3.4.6 Mean Squared Error (MSE)

The Mean Squared Error measures how close a regression line is to a set of data
points. It is a risk function corresponding to the expected value of the squared error
loss. Mean square error is calculated by taking the average, specifically the mean, of
errors squared from data as it relates to a function. It does this by taking the distances
from the points to the regression line (these distances are the “errors”) and squaring
them. The squaring is necessary to remove any negative signs. It also gives more
weight to larger differences. Mathematically,

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \tag{3.3}$$

To calculate the mean squared error from a set of X and Y values, first find the
regression line and insert the X values into the linear regression equation to find the
new Y values. Subtract the new Y value from the original to get the error and then
square the errors. Add up the squared errors and find the mean.
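
The calculation steps above can be sketched as a short Python function. The mean absolute error (MAE), which is reported for the age model in Chapter 4, is included for comparison. The function names and the sample values are hypothetical and are not taken from the project's data.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted values: the errors are 1, 0 and -3.
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 10.0]
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 9) / 3
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 3) / 3
print(mse, mae)
```

The single error of 3 contributes 9 to the squared sum but only 3 to the absolute sum, which illustrates how MSE gives more weight to larger differences.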

3.4.7 Sigmoid Function

The sigmoid function is a mathematical function that maps any input to a value
between 0 and 1. It is often used in machine learning for binary classification tasks
such as gender detection.

Here, the sigmoid function is used to predict the probability that a given image belongs
to a particular gender (e.g., male or female). The sigmoid function takes in the output
of the model's final layer, which is typically a weighted sum of the input features, and
produces a value between 0 and 1.

If the sigmoid output is closer to 0, the model predicts that the input belongs to the
negative class (e.g., male). If the sigmoid output is closer to 1, the model predicts that
the input belongs to the positive class (e.g., female).

The decision boundary is adjusted by changing the threshold value that determines
whether a sigmoid output is classified as positive or negative.
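
The sigmoid mapping and the thresholding step can be sketched as follows. This is an illustrative example only: the standard formula 1 / (1 + e^(-z)) is used, the default threshold of 0.5 is an assumption (the report does not state the threshold used), and the label order follows the example given above (0 for male, 1 for female).

```python
import math


def sigmoid(z):
    """Map any real number to a value between 0 and 1."""
    # Two algebraically equal forms are used to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def classify_gender(z, threshold=0.5):
    """Label the final-layer output z; the threshold value is an assumed default."""
    return "female" if sigmoid(z) >= threshold else "male"


print(sigmoid(0.0))               # exactly on the default decision boundary
print(classify_gender(2.0))       # sigmoid output above 0.5
print(classify_gender(-2.0))      # sigmoid output below 0.5
print(classify_gender(0.2, 0.6))  # a higher threshold moves the boundary
```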

3.4.8 Linear Function

A linear function is a mathematical equation that represents a straight line on a graph.
Here, a linear function is used to predict a person's age based on certain input features,
such as facial features extracted from an image.

The linear function takes the form:

age = m * x + b (3.4)

Where, age is the predicted age, x is the input feature, m is the slope of the line, and b
is the y-intercept. The slope and y-intercept are learned during the training process
using a dataset of input features and corresponding age labels. Once the slope and y-
intercept have been learned, the model uses them to predict the age of new inputs
based on their input features.
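
To make the roles of the slope m and the intercept b concrete, the sketch below fits equation 3.4 to a handful of points. It is an illustration under stated assumptions: it uses closed-form least squares on a single feature, whereas the network in this project learns its weights by iterative training, and the data points and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimate of slope m and intercept b for y = m * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = s_xy / s_xx
    b = mean_y - m * mean_x
    return m, b


def predict_age(x, m, b):
    """Apply the learned linear function to a new input feature."""
    return m * x + b


# Hypothetical feature values and labels lying exactly on the line y = 2x + 1.
features = [0.0, 1.0, 2.0, 3.0]
ages = [1.0, 3.0, 5.0, 7.0]
m, b = fit_line(features, ages)
print(m, b, predict_age(4.0, m, b))
```

In the dense output layer the same idea extends to many input features: each feature has its own learned weight, and a single bias plays the role of b.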

CHAPTER-4: MODEL TRAINING AND TESTING
4.1 Training Gender Model
The model was trained for 50 epochs. Results from the first six and the last six
epochs are shown below:

Figure 4.1: Model Training process for Gender Detection

• In the first epoch, the training loss was 1.3350 with 68% accuracy, and the
validation loss was 0.5017 with 75% validation accuracy.
• By the 50th epoch, the training loss was 0.1222 with 94% accuracy, and the
validation loss was 0.3851 with 87% validation accuracy.

The line plots below show the accuracy and loss:

Figure 4.2: Training and Validation Accuracy for Gender

Figure 4.3: Training and Validation Loss for Gender

Confusion Matrix

A confusion matrix is a table that visualizes and summarizes the performance of a
classification algorithm by counting how often each true class is predicted as each
class.

Below is the confusion matrix of the testing data yielded by the model for gender
detection:

Figure 4.4: Confusion Matrix

The above result shows that the accuracy of the gender detection model is 87%.
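
The way an accuracy figure is read off a confusion matrix can be sketched as follows. The labels below are hypothetical and are not the project's test data; the helper names are also assumptions made for illustration.

```python
def confusion_matrix(y_true, y_pred, n_classes=2):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Correct predictions lie on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels for eight test images (0 = male, 1 = female).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
matrix = confusion_matrix(y_true, y_pred)
accuracy = accuracy_from_matrix(matrix)
print(matrix, accuracy)
```

The off-diagonal cells show which class is mistaken for which, information that a single accuracy number hides.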

4.2 Training Age Model
The model was trained for 50 epochs. Results from the first six and the last six
epochs are shown below:

Figure 4.5: Model Training process for Age Detection

• In the first epoch, the training loss was 100832.67 with an MAE of 37.29, and
the validation loss was 258.14 with a validation MAE of 12.05.
• By the 50th epoch, the training loss was 35.487 with an MAE of 4.48, and the
validation loss was 94.218 with a validation MAE of 7.0851.

The line plots below show the mean absolute error and the loss:

Figure 4.6: Mean Absolute Error for Age

Figure 4.7: Training loss for age

CHAPTER-5: RESULTS AND DISCUSSION
The proposed system classifies age and gender in real time, reaching 87% validation
accuracy for gender and a validation MAE of 7.0851 years for age.
The system receives the input picture in real time via the camera. The source image is
preprocessed to improve the efficiency of the matching process: images are first
rescaled to 200*200 pixels, which is the input size of the convolutional network. The
convolutional layer applies a set of filters to the input image to extract important
features and create a set of output feature maps. These feature maps contain
information about the presence and location of specific features in the image. Each
convolution is followed by a MaxPooling layer, which takes these output feature maps
and reduces their spatial dimensionality by selecting the maximum value in each pooling
window. This operation effectively downsamples the feature maps, reducing their
size while preserving the most important features. The activation function determines
which feature values are passed on to the later layers that predict age and gender.
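
The downsampling performed by a MaxPooling layer can be illustrated with a small example. This is a sketch, not the project's code: the window of 3*3 with a stride of two follows the architecture description in Chapter 3, while the 5*5 feature map and the function name are hypothetical.

```python
def max_pool2d(feature_map, window=3, stride=2):
    """Keep the maximum value of each window as it slides over the feature map."""
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, height - window + 1, stride):
        row = []
        for j in range(0, width - window + 1, stride):
            row.append(max(feature_map[r][c]
                           for r in range(i, i + window)
                           for c in range(j, j + window)))
        pooled.append(row)
    return pooled


# Hypothetical 5*5 feature map holding the values 0..24 in row-major order.
feature_map = [[5 * r + c for c in range(5)] for r in range(5)]
pooled = max_pool2d(feature_map)
print(pooled)
```

The 5*5 map is reduced to 2*2, and each output cell keeps only the strongest response found in its window.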

After the CNN models for age and gender prediction were built and trained on the
UTKFace dataset, a Haar-cascade classifier is used to detect faces in the real-time
video. Each frame is converted into a grayscale image, and a rectangle is drawn
around each detected face. The grayscale image is reshaped into three channels to
form the input of the model. The gender and age models take this input from the
real-time video and predict the age and gender of the face.

Running the system for gender and age detection and classification respectively, we
were able to observe the following results:

Figure 5.1: Results yielded by the system
For the age detection model, the first epoch had a high training loss of 100832.67 and
an MAE of 37.29, indicating poor initial performance. The validation loss after that
epoch was 258.14 with a validation MAE of 12.05. By the 50th epoch, the model had
significantly improved, with a training loss of 35.487 and an MAE of 4.48. The
validation loss was 94.218 with a validation MAE of 7.0851. The gap between the
training and validation errors suggests some overfitting, but the model still
generalizes to new data with a reasonable level of accuracy for age detection.
These results indicate that the model estimates the age of the subjects in the
dataset with a mean absolute error (MAE) of approximately 7 years on the validation
data.

In contrast, the gender detection model performed better from the first
epoch, with a training loss of 1.3350 and an accuracy of 68%, and a validation loss of
0.5017 and a validation accuracy of 75%. By the 50th epoch, the model had improved
significantly, with a training loss of 0.1222 and an accuracy of 94%, and a validation
loss of 0.3851 and a validation accuracy of 87%.

This indicates that the model detects gender reasonably accurately and
performs well on new, unseen data, achieving an accuracy of 87% on the test set.

Figure 5.2: Illustration of system’s capacity to detect multiple faces

The designed system is capable of detecting multiple faces in a frame. In the above
figure, about 75% of the faces in a single frame are detected.

Overall, the results demonstrate that CNN models can be effective for gender and age
detection tasks, with the ability to achieve high accuracy and generalize well to new
data. However, further improvements could be made by using larger and more diverse
datasets, fine-tuning hyperparameters, and incorporating additional techniques such
as attention mechanisms or ensembling.

CHAPTER-6: LIMITATIONS
Below are the limitations of the designed program:

• The input image resolution is low, which degrades the prediction results.
• The brightness of the surroundings alters the results; input taken in dark
surroundings yields lower accuracy than input taken in bright
surroundings.

CHAPTER-7: CONCLUSION
We tackled the classification of age group and gender of unfiltered real-world face
images. We used the UTKFace dataset to develop the models. The Haar-cascade
algorithm was used to detect faces, binary cross-entropy for the classification of
gender and a linear function for age prediction. Training and validation performance
was visualized using line plots and, for gender, a confusion matrix. The image
preprocessing handled some of the variability observed in typical
unfiltered real-world faces, which supports the applicability of the model to age
group and gender classification in the wild.

Hence, we conclude the report having accomplished the stated objectives. Finally, we
investigated the prediction performance on the UTKFace dataset for age and gender; the
self-trained models achieved a validation accuracy of 87% for gender classification
and a validation MAE of 7.0851 years for age estimation.

CHAPTER-8: FUTURE ENHANCEMENTS

The following methods could significantly improve the results of the system:

• Using a high-speed processor and a higher-resolution camera with better focus
capability could improve the accuracy and speed up the detection of faces in
the frame.
• Enlarging the dataset so that the system is trained and tested on a wider variety
of images would improve the decision-making ability of the system, which could
result in better performance.
• Training the model with a focus on pre-processing, normalization, augmentation
and multi-scale prediction could significantly reduce the effects of the brightness
of the surroundings.

33
REFERENCES
1. K. Zhang, C. Gao, L. Guo et al., “Age group and gender estimation in the wild
with deep RoR architecture,” IEEE Access, vol. 5, pp. 22492–22503, 2017.
2. R. Rothe, R. Timofte, and L. Van Gool, “Deep expectation of real and apparent
age from a single image without facial landmarks,” International Journal of
Computer Vision, vol. 126, no. 2–4, pp. 144–157, 2018.
3. E. Eidinger, R. Enbar, and T. Hassner, “Age and gender estimation of unfiltered
faces,” IEEE Transactions on Information Forensics and Security, vol. 9, no. 12,
pp. 2170–2179, 2014.
4. G. Levi and T. Hassncer, “Age and gender classification using convolutional
neural networks,” in Proceedings of the 2015 IEEE Conference on Computer
Vision and Pattern Recognition Workshops (CVPRW), pp. 34–42, Boston, MA,
USA, June 2015.
5. F. Gao and H. Ai, “Face age classification on consumer images with gabor feature
and fuzzy LDA method,” in Lecture Notes in Computer Science (including
subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics), vol. 5558, pp. 132–141, LNCS, Springer Science+Business
Media, Berlin, Germany, 2009.
6. S. Yan, M. Liu, and T. S. Huang, “Extracting age information from local spatially
flexible patches,” in Proceedings of the IEEE International Conference on
Acoustics, Speech and Signal Processing, pp. 737–740, Las Vegas, NV, USA,
March 2008.
7. Y. Fu and T. S. Huang, “Human age estimation with regression on discriminative
aging manifold,” IEEE Transactions on Multimedia, vol. 10, no. 4, pp. 578–584,
2008.
8. C. Szegedy, W. Liu, Y. Jia et al., “Going deeper with convolutions,”
in Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR), pp. 1–9, Boston, MA, USA, June 2015.

34
9. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
recognition,” in Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), pp. 770–778, Las Vegas, NV, USA, June 2016.
10. Y. H. Kwon and N. Da Vitoria Lobo, “Age classification from facial
images,” Computer Vision and Image Understanding, vol. 74, no. 1, pp. 1–21,
1999.
11. A. Lanitis, C. Draganova, and C. Christodoulou, “Comparing different classifiers
for automatic age estimation,” IEEE Transactions on Systems, Man and
Cybernetics, Part B (Cybernetics), vol. 34, no. 1, pp. 621–628, 2004.
12. A. Günay and V. V. NabIyev, “Automatic age classification with LBP,”
in Proceedings of the 2008 23rd International Symposium on Computer and
Information Sciences, pp. 6–9, Istanbul, Turkey, 2008.
13. G. Guo, G. Mu, Y. Fu, and T. S. Huang, “Human age estimation using bio-
inspired features,” in Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, pp. 1589–1592, Miami, FL, USA, 2009.
14. M. A. Beheshti-nia and Z. Mousavi, “A new classification method based on
pairwise support vector machine (SVM) for facial age estimation,” Journal of
Industrial and Systems Engineering, vol. 10, no. 1, pp. 91–107, 2017.
15. A. Demontis, B. Biggio, G. Fumera, and F. Roli, “Super-sparse regression for fast
age estimation from faces at test time,” Image Analysis and Processing—ICIAP,
Springer, Berlin, Germany, 2015.
16. G. Guo, Y. Fu, C. R. Dyer, and T. S. Huang, “Image-based human age estimation
by manifold learning and locally adjusted robust regression,” IEEE Transactions
on Image Processing, vol. 17, no. 7, pp. 1178–1188, 2008.
17. Y. Fu and G. Guo, “Age synthesis and estimation via faces,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 32, no. 11, pp. 1955–1976,
2010.
18. G. Guo and G. Mu, “Simultaneous dimensionality reduction and human age
estimation via kernel partial least squares regression,” in Proceedings of the 24th

35
IEEE Conference on Computer Vision and Pattern Recognition, pp. 657–664,
Colorado Springs, CO, USA, June 2011.
19. M. R. Dileep and A. Danti, “Human age and gender prediction based on neural
networks and three human age and gender prediction based on neural networks
and three sigma control limits,” Applied Artificial Intelligence, vol. 32, no. 3, pp.
281–292, 2018.
20. . M. Lin, Q. Chen, and S. Yan, “Network in network,” 2013.
21. D. Yi, Z. Lei, and S. Z. Li, “Age estimation by multi-scale convolutional
network,” Computer Vision–ACCV 2014, Springer, Berlin, Germany, 2015.
22. Z. Qawaqneh, A. A. Mallouh, and B. D. Barkana, “Deep convolutional neural
network for age estimation based on VGG-face model,,” 2017.
23. R. Ranjan, S. Sankaranarayanan, C. D. Castillo, and R. Chellappa, “An all-in-one
convolutional neural network for face analysis,” in Proceedings of the 12th IEEE
International Conference on Automatic Face & Gesture Recognition (FG 2017),
pp. 17–24, Biometrics Wild, Bwild, Washington, DC, USA, June 2017.
24. W. Liu, L. Chen, and Y. Chen, “Age classification using convolutional neural
networks with the multi-class focal loss,” IOP Conference Series: Materials
Science and Engineering, vol. 428, no. 1, 2018.
25. M. Duan, K. Li, C. Yang, and K. Li, “A hybrid deep learning CNN–ELM for age
and gender classification,” Neurocomputing, vol. 275, pp. 448–461, 2018.
26. . M. Duan, K. Li, and K. Li, “An ensemble CNN2ELM for age estimation,” IEEE
Transactions on Information Forensics and Security, vol. 13, no. 3, pp. 758–772,
2018.
27. . K. Zhang, N. Liu, X. Yuan, S. Member, X. Guo, C. Cao et al., “Fine-grained age
estimation in the wild with attention LSTM networks,” IEEE Transactions on
Circuits and Systems for Video Technolog, p. 1, 2019.

36
APPENDIX
Below are a few of the observed results:
