AGE AND GENDER DETECTOR USING DEEP LEARNING
by
Vanshika Dravid(38110616)
Shivangi Ashim Sen(38110685)
SCHOOL OF COMPUTING
SATHYABAMA
INSTITUTE OF SCIENCE AND TECHNOLOGY
(DEEMED TO BE UNIVERSITY)
Accredited with Grade “A” by NAAC
JEPPIAAR NAGAR, RAJIV GANDHI SALAI, CHENNAI - 600 119
MARCH - 2022
DECLARATION
We, Vanshika Dravid and Shivangi Ashim Sen, hereby declare that the Project Report entitled Age
and Gender Detector using Deep Learning, done by us under the guidance of Dr. L.
Lakshmanan (M.E., Ph.D.), is submitted in partial fulfillment of the requirements for the award of
the Bachelor of Engineering / Technology degree at Sathyabama Institute of Science and Technology.
BONAFIDE CERTIFICATE
This is to certify that this Project Report is the bonafide work of Vanshika Dravid (38110616) and
Shivangi Ashim Sen (38110685), who carried out the project entitled “AGE AND GENDER
DETECTOR USING DEEP LEARNING” under my supervision from October 2021 to May 2022.
INTERNAL GUIDE
DR. L. LAKSHMANAN (M.E., Ph.D.)
HEAD OF DEPARTMENT
DR. L. LAKSHMANAN (M.E., Ph.D.)
ABSTRACT
Since the advent of social media, there has been increased interest in automatic age
and gender classification from facial images. The process of age and gender
classification is a crucial stage for many applications, such as face verification, aging
analysis, ad targeting, and targeting of interest groups. Yet most age and gender
classification systems still have problems in real-world applications. This work
involves an approach to age and gender classification using multiple convolutional
neural networks (CNNs). The proposed method has five phases: face detection,
background removal, face alignment, multiple CNNs, and a voting system. The multiple-CNN
model consists of three CNNs that differ in structure and depth; the goal of this
difference is to extract different features with each network. Each network is trained
separately on the AGFW dataset, and a voting system then combines their predictions to
produce the result.
CHAPTER 2
LITERATURE SURVEY
2.1 Age and Gender Classification using Multiple Convolutional Neural Network
Abstract
Since the advent of social media, there has been increased interest in automatic age
and gender classification from facial images. The process of age and gender
classification is a crucial stage for many applications, such as face verification, aging
analysis, ad targeting, and targeting of interest groups. Yet most age and gender
classification systems still have problems in real-world applications. This work
involves an approach to age and gender classification using multiple convolutional neural
networks (CNNs). The proposed method has five phases: face detection, background
removal, face alignment, multiple CNNs, and a voting system. The multiple-CNN model
consists of three CNNs that differ in structure and depth; the goal of this difference is to
extract different features with each network. Each network is trained separately on the
AGFW dataset, and a voting system then combines their predictions to produce the result.
Introduction
Age and gender play fundamental roles in social interactions. Languages reserve different
salutations and grammar rules for men and women, and very often different vocabularies
are used when addressing elders compared to young people. Despite the basic roles these
attributes play in our day-to-day lives, the ability to estimate them automatically,
accurately, and reliably from face images is still far from meeting the needs of
commercial applications. This is particularly perplexing when considering recent claims
of super-human capabilities in the related task of face recognition (e.g., [48]).
Past approaches to estimating or classifying these attributes from face images have relied
on differences in facial feature dimensions [29] or “tailored” face descriptors
(e.g., [10, 15, 32]). Most have employed classification schemes designed particularly
for age or gender estimation tasks, including [4] and others. Few of these past methods
were designed to handle the many challenges of unconstrained imaging conditions [10].
Moreover, the machine learning methods employed by these systems did not fully exploit
the massive numbers of image examples and data available through the Internet in order to
improve classification capabilities.
Figure 1. Faces from the Adience benchmark for age and gender classification [10].
These images represent some of the challenges of age and gender estimation from
real-world, unconstrained images: most notably extreme blur (low resolution),
occlusions, out-of-plane pose variations, expressions, and more.
In this paper we attempt to close the gap between automatic face recognition capabilities
and those of age and gender estimation methods. To this end, we follow the successful
example laid down by recent face recognition systems: face recognition techniques
described in the last few years have shown that tremendous progress can be made by the
use of deep convolutional neural networks (CNNs) [31].
All three color channels are processed directly by the network. Images are first rescaled
to 256 × 256 and a crop of 227 × 227 is fed to the network. The three subsequent
convolutional layers are then defined as follows.
1. 96 filters of size 3×7×7 pixels are applied to the input in the first convolutional layer,
followed by a rectified linear operator (ReLU), a max pooling layer taking the
maximal value of 3 × 3 regions with two-pixel strides, and a local response normalization
layer [28].
2. The 96 × 28 × 28 output of the previous layer is processed by the second convolutional
layer, which applies 256 filters of size 96 × 5 × 5 pixels, again followed by ReLU, a max
pooling layer, and local response normalization, producing a 256 × 14 × 14 output.
3. Finally, the third and last convolutional layer operates on the 256 × 14 × 14 blob by
applying a set of 384 filters of size 256 × 3 × 3 pixels, followed by ReLU and a max
pooling layer. The following fully connected layers are then defined by:
4. A first fully connected layer that receives the output of the third convolutional layer
and contains 512 neurons, followed by a ReLU and a dropout layer.
5. A second fully connected layer that receives the 512-dimensional output of the first
fully connected layer and again contains 512 neurons, followed by a ReLU
and a dropout layer.
6. A third, fully connected layer which maps to the final classes for age or gender.
Finally, the output of the last fully connected layer is fed to a soft-max layer that assigns a
probability for each class. The prediction itself is made by taking the class with the
maximal probability for the given test image.
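As a concrete illustration, the following is a minimal PyTorch sketch of the network
described above; the layer sizes follow the text, while the LRN hyper-parameters and the
use of ceil-mode pooling (needed to reproduce the stated 28 × 28 and 14 × 14 intermediate
sizes) are assumptions. During training, the soft-max is normally folded into a
cross-entropy loss.

import torch
import torch.nn as nn

class AgeGenderCNN(nn.Module):
    def __init__(self, num_classes):  # num_classes = 2 (gender) or 8 (age)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=7, stride=4),      # -> 96 x 56 x 56
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, ceil_mode=True),      # -> 96 x 28 x 28
            nn.LocalResponseNorm(5),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # -> 256 x 28 x 28
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, ceil_mode=True),      # -> 256 x 14 x 14
            nn.LocalResponseNorm(5),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),  # -> 384 x 14 x 14
            nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, ceil_mode=True),      # -> 384 x 7 x 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(384 * 7 * 7, 512), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(512, 512), nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(512, num_classes),  # un-normalized class scores
        )

    def forward(self, x):  # x: (N, 3, 227, 227)
        return self.classifier(self.features(x))

logits = AgeGenderCNN(num_classes=8)(torch.randn(1, 3, 227, 227))
probs = torch.softmax(logits, dim=1)  # per-class probabilities; argmax is the prediction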
Initialization. The weights in all layers are initialized with random values from a zero-mean
Gaussian with standard deviation 0.01. To stress this, we do not use pretrained
models for initializing the network; the network is trained from scratch, without using
any data outside of the images and labels available in the benchmark. This, again,
should be compared with CNN implementations used for face recognition, where hundreds of
thousands of images are used for training [48]. Target values for training are represented
as sparse, binary vectors corresponding to the ground truth class, with one entry per class
(two for gender, eight for the eight age classes of the age classification task),
containing 1 in the index of the ground truth and 0 elsewhere; for example, age class 3 of
8 is encoded as [0, 0, 0, 1, 0, 0, 0, 0].
Network training. Aside from our use of a lean network architecture, we apply two
additional methods to further limit the risk of overfitting. First we apply dropout
learning [24] (i.e., randomly setting the output value of network neurons to zero). The
network includes two dropout layers with a dropout ratio of 0.5 (a 50% chance of setting a
neuron’s output value to zero). Second, we use data augmentation by taking a random crop
of 227 × 227 pixels from the 256 × 256 input image and randomly mirroring it in each
forward-backward training pass, similarly to the multiple crop and mirror variations used
by [48]. Training itself is performed using stochastic gradient descent with an image
batch size of fifty images. The initial learning rate is 1e-3, reduced to 1e-4 after 10K
iterations.
Prediction. We experimented with two methods of using the network in order to produce age
and gender predictions for novel faces:
• Center Crop: Feeding the network with the face image, cropped to 227 × 227 around the
face center.
• Over-sampling: We extract five 227 × 227 pixel crop regions, four from the corners of
the 256 × 256 face image and an additional crop region from the center of the face. The
network is presented with all five images, along with their horizontal reflections. Its
final prediction is taken to be the average prediction value across all these variations.
We have found that small misalignments in the Adience images, caused by the many
challenges of these images (occlusions, motion blur, etc.), can have a noticeable impact
on the quality of our results. This second, over-sampling method is designed to compensate
for these small misalignments, bypassing the need for improving alignment quality by
directly feeding the network with multiple translated versions of the same face.
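The over-sampling scheme can be sketched as follows; model and img are placeholders for a
trained network and a 3 × 256 × 256 face tensor, and the crop offsets follow from
256 − 227 = 29 (corners at 0 and 29, center at 14).

import torch

def oversample_predict(model, img):  # img: (3, 256, 256) tensor
    crops = []
    for top, left in [(0, 0), (0, 29), (29, 0), (29, 29), (14, 14)]:
        crop = img[:, top:top + 227, left:left + 227]
        crops += [crop, torch.flip(crop, dims=[2])]  # crop and its horizontal mirror
    batch = torch.stack(crops)                       # (10, 3, 227, 227)
    with torch.no_grad():
        probs = torch.softmax(model(batch), dim=1)
    return probs.mean(dim=0).argmax().item()         # average prediction over all variants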
4. Experiments
Our method is implemented using the Caffe open-source framework [26]. Training was
performed on an Amazon GPU machine with 1,536 CUDA cores and 4GB of video
memory. Training each network required about four hours; predicting age or gender on a
single image using our network requires about 200 ms. Prediction running times can
conceivably be substantially improved by running the network on image batches.
We test the accuracy of our CNN design using the recently released Adience benchmark
[10], designed for age and gender classification. The Adience set consists of images
automatically uploaded to Flickr from smartphone devices. Because these images were
uploaded without prior manual filtering, as is typically the case on media webpages
(e.g., images from the LFW collection [25]) or social websites (the Group Photos set
[14]), viewing conditions in these images are highly unconstrained, reflecting many of the
real-world challenges of faces appearing in Internet images. Adience images therefore
capture extreme variations in head pose, lighting, image quality, and more. The entire
Adience collection includes roughly 26K images of 2,284 subjects. Table 1 lists the
breakdown of the collection into the different age categories. Testing for both age and
gender classification is performed using a standard five-fold, subject-exclusive
cross-validation protocol, defined in [10]. We use the in-plane aligned version of the
faces originally used in [10]. These images are used rather than newer alignment
techniques in order to highlight the performance gain attributable to the network
architecture rather than to better preprocessing. We emphasize that the same network
architecture is used for all test folds of the benchmark and, in fact, for both the gender
and age classification tasks. This is done to ensure the validity of our results across
folds, but also to demonstrate the generality of the network design proposed here; the
same architecture performs well across different, related problems. We compare previously
reported results to the results computed by our network. Our results include both methods
for testing: center-crop and over-sampling (Section 3).
4.2. Results
Table 2 and Table 3 present our results for gender and age classification, respectively.
Table 4 further provides a confusion matrix for our multi-class age classification
results. For age classification, we measure and compare both the accuracy when the
algorithm gives the exact age-group classification and when the algorithm is off by one
adjacent age group (i.e., the subject belongs to the group immediately older or
immediately younger than the predicted group). This follows others who have done so in the
past, and reflects the uncertainty inherent to the task: facial features often change very
little between the oldest faces in one age class and the youngest faces of the subsequent
class. Both tables compare performance with the methods described in [10]. Table 2 also
provides a comparison with [23], which used the same gender classification pipeline of
[10] applied to more effective alignment of the faces; faces in their tests were
synthetically modified to appear facing forward.
Evidently, the proposed method outperforms the reported state of the art on both tasks
with considerable gaps. Also evident is the contribution of the over-sampling approach,
which provides an additional performance boost over the original network. This implies
that better alignment (e.g., frontalization [22, 23]) may provide an additional boost in
performance. We provide a few examples of gender and age misclassifications in Figures 4
and 5, respectively. These show that many of the mistakes made by our system are due to
the extremely challenging viewing conditions of some of the Adience benchmark images. Most
notable are mistakes caused by blur or low resolution and by occlusions (particularly from
heavy makeup). Gender estimation mistakes also frequently occur for images of babies or
very young children, where obvious gender attributes are not yet visible.
Conclusions
Though many previous methods have addressed the problems of age and gender
classification, until recently much of this work focused on constrained images taken in
lab settings. Such settings do not adequately reflect the appearance variations common to
real-world images on social websites and in online repositories. Internet images, however,
are not simply more challenging: they are also abundant. The easy availability of huge
image collections provides modern machine-learning-based systems with effectively endless
training data, though this data is not always suitably labeled for supervised learning.
Taking our example from the related problem of face recognition, we explore how well deep
CNNs perform on these tasks using Internet data. We provide results with a lean
deep-learning architecture designed to avoid overfitting due to the limitation of limited
labeled data. Our network is “shallow” compared to some of the recent network
architectures, thereby reducing the number of its parameters and the chance of
overfitting. We further inflate the size of the training data by artificially adding
cropped versions of the images in our training set. The resulting system was tested on the
Adience benchmark of unfiltered images and shown to significantly outperform the recent
state of the art. Two important conclusions can be made from our results. First, CNNs can
be used to provide improved age and gender classification results, even considering the
much smaller size of contemporary unconstrained image sets labeled for age and gender.
Second, the simplicity of our model implies that more elaborate systems using more
training data may well be capable of substantially improving results beyond those reported
here.
Acknowledgments
This research is based upon work supported in part by the Office of the Director of
National Intelligence (ODNI), Intelligence Advanced Research Projects Activity
(IARPA), via IARPA 2014-14071600010. The views and conclusions contained herein
are those of the authors and should not be interpreted as necessarily representing the
official policies or endorsements, either expressed or implied, of ODNI, IARPA, or the
U.S. Government. The U.S. Government is authorized to reproduce and distribute
reprints for Governmental purposes notwithstanding any copyright annotation thereon.
2.2 Human Age And Gender Classification using Convolutional Neural Network
Abstract
Pattern recognition and automatic classification are very active research areas whose main
objective is to develop intelligent systems able to learn and recognize objects
efficiently. An essential share of these applications belongs to biometrics, which is used
for security purposes in general. The facial modality, as a fundamental biometric
technology, has become increasingly important in research. The goal of this work is to
develop a gender prediction and age estimation system based on convolutional neural
networks for a face image or a real-time video. In this paper, three CNN models with
different architectures (number of filters, number of convolution layers, etc.) were
created and validated on the IMDB and WIKI datasets; the results obtained show that CNN
networks greatly improve the performance of the system as well as the recognition
accuracy.
INTRODUCTION
The CNN models proposed in Table 1 were built using Keras, which has many advantages for
improving the efficiency of the model. We input 2,500 images each of males and females:
2,000 images for training and 500 for testing. The CNN models were trained for 1,500
epochs; after every epoch the accuracy was calculated, which is the count of predictions
where the predicted value equals the true value, typically expressed as a percentage. The
input is passed through a stack of convolutional and max-pooling layers with the
non-linear activation function (ReLU); at the output we applied a sigmoid function, as
shown in Table 1. For all models, RMSprop was used as the optimizer.
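As an illustration of this setup, the following is a minimal Keras sketch, not the paper's
exact model: the 16 filters in the first layer follow CNN 1 as described in the Discussion
below, while the second layer's filter count, the input resolution, and the dense-layer
width are assumptions, since the paper's Table 1 is not reproduced here.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid'),  # probability of one gender class
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, epochs=1500, validation_data=(x_test, y_test))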
D. Discussion
In this work, we aimed to automate a system for gender prediction and age estimation using
CNNs and deep learning techniques. First, we built three models, CNN 1, CNN 2, and CNN 3,
as described in Table 1, and trained them on the IMDB dataset. We noticed that CNN 3
presents the best results compared to CNN 2 and CNN 1, due to the depth of the network. In
CNN 3 we used three convolutional layers, while in CNN 2 and CNN 1 we used only two layers
of convolution with various filter sizes: 16 filters were used in the first convolution
layer of CNN 1, and 32 filters were applied in the first convolutional layer of CNN 2.
When the number of filters was larger, the performance of the system increased. In other
words, the depth of the network and the number of filters have a great influence on
creating an efficient convolutional network. For age estimation, we used model CNN 3 to
classify age into three categories: young (20-39 years), middle (40-59 years), and old
(more than 60 years); this kind of classification will eventually be useful for marketing,
to identify customers. After training model CNN 3, we noticed that this CNN model obtains
a perfectly acceptable result, as shown in Figures 6 and 7. Furthermore, the
classification rate grows with the number of epochs, which reflects that with each epoch
the model learns more information.
VII. CONCLUSION
In this article, we analyzed the implementation of deep convolutional neural networks for
human age and gender prediction. During this study, various designs were developed for
this task; age and gender classification is one of the key segments of research in
biometric and social applications, with the goal that future forecasting and knowledge
discovery about a particular individual can be done adequately. The main conclusion that
can be drawn is that age and gender recognition from faces is a popular route to
implementing an intelligent system that achieves good and robust recognition accuracy. We
employed a deep learning algorithm, the convolutional neural network, to propose a simple
study containing various CNN models for gender classification, trained on the well-known
IMDB-WIKI datasets, and then applied an efficient model for age estimation. The different
results obtained in terms of precision, compared with those cited in the state of the art,
show that the depth of the convolutional networks used in this work is an important factor
in achieving better precision. The interpretation of Figures 4-7 and the outcomes of
Tables 3 and 4 were based on the parameter settings of our experiment described in Table
2. The proposed network provides significant precision improvements in age and gender
classification, but takes considerable time to train in order to reach correct
predictions.
Finally, as a perspective, an extension of this work can be envisaged by creating a face
detection and recognition system based on CNNs as a feature extractor and a support vector
machine as a classifier; another perspective would be to test our approach on other facial
databases showing strong variations in lighting and pose.
Abstract
This paper focuses on the problem of gender and age classification for an image. I build
off of previous work [12] that has developed efficient, accurate architectures for these
tasks and aim to extend their approaches in order to improve results. The first main area
of experimentation in this project is modifying some previously published, effective
architectures used for gender and age classification [12]. My attempts include reducing
the number of parameters (in the style of [19]), increasing the depth of the network, and
modifying the level of dropout used. These modifications actually ended up causing system
performance to decrease (or at best, stay the same) as compared with the simpler
architecture I began with. This verified suspicions I had that the tasks of age and gender
classification are more prone to over-fitting than other types of classification. The next
facet of my project focuses on coupling the architectures for age and gender recognition
to take advantage of the gender-specific age characteristics and age-specific gender
characteristics inherent to images. This stemmed from the observation that gender
classification is an inherently easier task than age classification, due to both the fewer
number of potential classes and the more prominent intra-gender facial variations. By
training different age classifiers for each gender, I found that I could improve the
performance of age classification, although gender classification did not see any
significant gains.
1. Introduction
Over the last decade, the rate of image uploads to the Internet has grown at a nearly
exponential rate. This newfound wealth of data has empowered computer scientists
to tackle problems in computer vision that were previously either irrelevant or
intractable. Consequently, we have witnessed the dawn of highly accurate and efficient
facial detection frameworks that leverage convolutional neural networks under the hood.
Applications for these systems include everything from suggesting who to “tag” in
Facebook photos to pedestrian detection in self-driving cars. However,
the next major step to take building off of this work is to ask not only how many faces
are in a picture and where they are, but also what characteristics those faces have. The
goal of this project is to do exactly that by attempting to classify the age and gender of
the faces in an image. Applications for this technology have a broad scope and
the potential to make a large impact. For example, many languages have distinct words to
be used when addressing a male versus a female or an elder versus a youth. Therefore
automated translation services and other forms of speech generation can factor in the
gender and age classification of subjects to improve their performance. Also, having an
idea about the age and gender of a subject makes the task of recognizing that subject
significantly easier. This could be used to aid assisted-vision devices for those with
deteriorating, or lost, eyesight. Social media websites like Facebook could use
information about the age and gender of people to better infer the context of an image.
For example, if a picture contains many people studying together, Facebook might be able
to caption the scene with “study session.” However, if it can also detect that the people
are all men in their early 20s and that some are wearing shirts with the same letters, it
may predict “College students in a fraternity studying.” Age and gender classification is
an inherently challenging problem, though, more so than many other tasks in computer
vision. The main reason for this discrepancy in difficulty lies in the nature of the data
that is needed to train these types of systems. While general object classification tasks
can often have access to hundreds of thousands, or even millions, of images for training,
datasets with age and/or gender labels are considerably smaller, typically numbering in
the thousands or, at best, tens of thousands. The reason for this is that in order to have
labels for such images we need access to the personal information of the subjects in the
images. Namely, we would need their date of birth and gender, and particularly the date of
birth is a rarely released piece of information. Therefore, we must make do with the
nature of the problem we are approaching and tailor network architectures and algorithmic
approaches to cope with these limitations. These reasons are the primary motivation behind
[12] choosing to implement a relatively shallow architecture for age and gender
classification using convolutional neural networks, and we have followed this pattern. The
input to my algorithm is an image of a human face of size 256x256 that is then cropped to
227x227 and fed into either the age classifier, the gender classifier, or both. The age
classifier returns an integer representing the age range of the individual. There are 8
possible age ranges (see Section 4), so the age classifier returns an integer between 0
and 7. The gender classifier returns a binary result where 0 indicates male and 1
represents female.
Methods
3.1. Network Architecture
The network architecture used throughout my project is based off of the work in [12]. As
mentioned toward the end of Section 2, this network design is intended to be relatively
shallow so as to prevent over-fitting the data. Figure 1 visualizes the network, which is
explained below. An RGB image input to the network is first scaled to 3x256x256 and then
cropped to 3x227x227. The types of cropping are described further in Section 3.2. There
are 3 convolution layers, followed by 3 fully connected layers. The convolution layers
are:
1. Conv1 - 96 filters of size 3x7x7 are convolved with stride 4 and padding 0, resulting
in an output volume size of 96x56x56. This is followed by a ReLU, max pooling which
reduces the size to 96x28x28, and a local-response normalization (LRN).
2. Conv2 - 256 filters of size 96x5x5 are convolved with stride 1 and padding 2, resulting
in an output volume size of 256x28x28. This is also followed by a ReLU, max-pool, and LRN,
reducing the output size to 256x14x14.
3. Conv3 - 256 filters of size 256x3x3 are convolved with stride 1 and padding 1, followed
by a ReLU and max-pool, resulting in an output volume of 256x7x7.
The fully connected layers are:
1. FC6 - 512 neurons fully connected to the 256x7x7 output of Conv3, followed by a ReLU
layer and a dropout layer.
2. FC7 - 512 neurons fully connected to the 1x512 output of FC6, followed by a ReLU layer
and a dropout layer.
3. FC8 - 2 or 8 neurons fully connected to the 1x512 output of FC7, yielding the
un-normalized class scores for gender or age, respectively.
And finally there is a softmax layer that sits on top of FC8, which gives the loss and
final class probabilities.
Finally, [12] proposes 2 types of sampling of an input image when it is being classified.
One is to simply take a center crop of 227x227 out of the 256x256 image and classify that.
The other is to take 5 such crops, one from each of the corners and one from the center,
classify them all, and take the majority classification between them. While they found
that the latter technique can improve accuracy slightly, for the sake of reducing testing
time I used the first approach for this project.
3.3. Goals
My first objective in this project was to determine if the proposed network architecture
(see Section 3.1) was indeed optimal. Although the authors of [12] claimed that any deeper
network would suffer from over-fitting, I wanted to verify this for myself. To this end I
experimented with adding additional convolution layers, removing fully connected layers
(in the style of [19]), and modifying the parameters used for dropout as well as LRN. The
primary goal, however, was to experiment with a new higher-level approach for composing
these classifiers to improve performance. The observation I made early on was that gender
classification is an inherently easier task than age classification, both due to the fewer
number of classes to distinguish between and the more marked differences that exist
between genders than between many age groups. This then led me to the conclusion that
while it is reasonable to assume one should be able to ascertain someone's gender apart
from knowing their age, or vice versa, there is also some plausibility of using one of
these attributes to better inform the prediction of the other. For example, the amount of
hair on a man's head can often be a useful indicator of age, but the same is not true for
women. Furthermore, separating the tasks of classifying men's ages and women's ages
should, in theory, give the networks more expressive power by freeing them from having to
learn a gender-neutral concept of age. Therefore, I proposed that training separate age
classifiers for men and women could simulate the added power of deeper networks while
avoiding the danger of over-fitting.
Stochastic Gradient Descent. Now that we know how to calculate the loss, we need to
know how to minimize it in order to train an accurate classifier. The type of optimization
used in this experiment is stochastic gradient descent. In order to explain this, first I will
elaborate on the more generalized form of gradient descent. The gradient of a function is
really just its derivative, and therefore by definition it is the direction of steepest ascent
(or descent if you move backwards along it). Therefore if we compute the gradient of the
loss function with respect to all of the system variables/weights (in CNNs there can be up
to millions of these), we will have the direction along which we can move toward our
minimum loss most quickly by following the negative of the gradient. Each time we
compute the gradient we take a small step (governed by a hyperparameter) in the opposite
direction, and we re-evaluate the loss, re-compute the gradient, and repeat. The hope (and
in fact the reality) is that by repeating this process we will iteratively decrease our loss
function, which is reflective of the model becoming iteratively better at its classification
task.
Mathematically, we can write this as
w ← w − η∇wL
where η is the learning rate, also sometimes called the step size, and ∇wL is the gradient
of the loss with respect to the weight vector w. While this is theoretically great, the
truth is that computing the gradient across the entire training set in order to make an
incremental update to the weights is prohibitively computationally expensive. Therefore
alternate approaches have been invented that evaluate the gradient of the loss function
over a sample of the training data, and use that approximate gradient to make the update.
The reason this gradient is approximate is that although it is the optimal direction to
travel down given the sample of images it was computed over, there is no telling what
kinds of images it did not look at when computing the gradient. Therefore this form of
mini-batch gradient descent, as it is called, will still usually reach the minimum loss
over time (or at least a local minimum), but it will require more iterations on average.
However, the time it takes to evaluate the gradient drops so dramatically when we operate
on a mini-batch that it is actually significantly faster to perform many more mini-batch
gradient updates than a few full gradient updates. Finally, stochastic gradient descent is
a special form of gradient descent in which the mini-batch size is 1. This is extremely
fast to compute since it only requires passing 1 image forward (to calculate the loss) and
backward (to calculate the gradient) through the network, but the gradients are even less
globally optimal than in mini-batch gradient descent; therefore a smaller step size is
required at each iteration and many more iterations are needed.
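To make the update rule concrete, here is a toy numpy sketch; it assumes a simple
mean-squared-error loss on synthetic data rather than the network's soft-max loss. Setting
batch_size to 1 gives the stochastic variant described above, and larger values give
mini-batch gradient descent.

import numpy as np

def grad(w, X, y):
    # gradient of the mean squared error 0.5 * ||Xw - y||^2 / n with respect to w
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X, true_w = rng.normal(size=(500, 5)), np.arange(5.0)
y = X @ true_w                          # noiseless synthetic targets
w, eta, batch_size = np.zeros(5), 0.1, 1
for step in range(2000):
    idx = rng.integers(0, len(y), size=batch_size)  # sample a mini-batch
    w -= eta * grad(w, X[idx], y[idx])              # w <- w - eta * grad_w(L)
print(np.round(w, 2))  # approaches [0. 1. 2. 3. 4.]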
4. Dataset
The dataset used for training and testing for this project is the Adience face dataset,
which comes from the Face Image Project [12] at the Open University of Israel (OUI). This
dataset contains a total of 26,580 photos of 2,284 unique subjects collected from Flickr.
Each image is annotated with the person's gender and age range (out of 8 possible ranges).
The images are subject to various levels of occlusion, lighting, and blur, which reflects
real-world circumstances. I used those images that were mostly front-facing, which limited
the total number of images to around 20,000. Table 1 includes details regarding the
distribution of images in each gender and age range. Figure 2 shows some examples of
images of both males and females in the dataset at various ages. The images were
originally of size 768x768, so they were preprocessed by being resized down to 256x256.
Figure 2. Adience image dataset examples. Top row: 6 males of various ages. Bottom
row: 6 females of various ages.
5. Experiments
The training and testing for this project were done exclusively using Caffe [6], running
on Amazon EC2 using between 1 and 3 instances at a time, each with 1,536 CUDA cores and
4GB of video memory.

Age range   0-2    4-6    8-13   15-20  25-32  38-43  48-53  60+    Total
Male        745    928    934    734    2308   1294   392    442    8192
Female      682    1234   1360   919    2589   1056   433    427    9411
Both        1427   2162   2294   1653   4897   2350   825    869    19487

Table 1. Adience image dataset distribution: number of images for each gender and age
range.

As described in Section 3.2, a 6-fold subject-exclusive cross-validation
protocol was used to provide added robustness to the training, while
preserving the reliability of results. Although much of my network architecture was built
off of the work in [12], my networks were trained from scratch without using any
pretrained weights provided by that, or any other, project. The reason for this had to do
with the way that I divided up the dataset for the purposes of my project (again, see
Section 3.2 for more information). The first step I took was to attempt to reproduce the
results of [12] as a baseline, since I had their network architecture and training data. I
attempted to replicate their experiment as closely as possible, so I also used SGD with a
learning rate of 1e-3 that decays by a factor of 10 every 10,000 iterations and a batch
size of 50 images. This proved quickly successful, and within a few hours of training I
reached accuracies that were very close to their results for both age and gender
classification. These results are recorded in Table 2. Note that their slightly higher
accuracies are likely due to the oversampling they do of the input images, followed by
taking the majority classification of the various samples. For the sake of faster
iteration in my model, I avoided this technique. I next tested the hypothesis that the
fully connected layers do not contribute much to the overall performance and that depth in
the convolutional layers is actually preferable. To this end I removed 1 and then 2 of the
fully connected layers (out of 3) and added 1 and then 2 additional convolution layers (on
top of the existing 3). I attempted 5 different combinations of these modified
architectures, but as with the attempt at using Adam, after multiple days there was no
clear benefit, and if anything the added complexity was making the system perform the same
or, sometimes, worse.
At this point I chose to focus my efforts on the main insight that motivated this project,
which was that coupling the architectures for gender and age classification could give
more expressive power to the classifiers, particularly for age. As a sort of “proof of
concept”, I attempted to train classifiers on each gender separately to see what would
happen. The results pleasantly surprised me and are summarized in Figure 3. I saw that
when training a classifier from the ground up only on male images, the accuracy when
predicting the age of men increased. Conversely, I saw the accuracy of classifying
women's ages decrease relative to the average (which may or may not be taken as a social
commentary on how women are more effective at hiding their age). I also trained separate
gender classifiers for each age group, and those results are summarized in Table 3. These
results are less striking than those of Figure 3, but there is some reassurance in how
intuition lines up with the results. Namely, it can be seen that the age range in which it
is most difficult to predict gender, with just a 27% accuracy, is 0-2 years old. Of
course, that makes perfect sense, as gender-specific features are not usually present at
such a young age, or at least not as much as later in life. Also, the age range in which
gender prediction is best is 15-20, which also seems reasonable since that is the time
when there is the most development of gender-specific features.
Given these results, it seemed most promising to use the remaining time I had to develop
and train a chained gender-age network that would first classify gender (as before) and
then pass the image to an age classifier trained only on that gender.
Conclusion
Although many previous methods have tackled the problem of age and gender classification
of images, in this paper I establish a benchmark for the task based on state-of-the-art
network architectures and show that chaining the prediction of age with that of gender can
improve overall accuracy. If there had been more time, I would have dedicated more effort
towards fine-tuning the parameters and the modified architectures I experimented with.
Specifically, I would have liked to get the Adam learning algorithm in place with equal or
improved performance relative to SGD, and I would have liked to replace the multiple fully
connected layers at the end of the architecture with only one, shifting those parameters
over to additional convolutional layers. By far the most difficult portion of this project
was setting up the training infrastructure to properly divide the data into folds, train
each classifier, cross-validate, and combine the resulting classifiers into a test-ready
classifier. I foresee future directions building off of this work to include using gender
and age classification to aid face recognition, improve experiences with photos on social
media, and much more. Finally, I hope that additional training data will become available
with time for the task of age and gender classification, which will allow successful
techniques from other types of classification with huge datasets to be applied.
ABSTRACT
In this paper, the authors present a technique for age and gender classification using a
Python algorithm. Human identification and classification have been utilized in various
fields for a very long time, such as government ID cards and verification procedures. We
have already developed techniques like retina scans, iris scans, fingerprints, and other
sophisticated systems such as DNA fingerprinting to identify individuals. Although these
established methods work efficiently, their hardware, software, and human proficiency
requirements are far too demanding for several simpler tasks that may not require
professional-level accuracy. The technique reported in this paper is a simple and easy
method for human classification that can be performed using only a webcam and a decent
computer system.
INTRODUCTION
Human classification is an age-old procedure, carried out in various fields and
technologies such as biometrics, forensic science, image processing, and identification
systems. With the development of artificial intelligence and techniques such as neural
networks and deep learning, it has become increasingly easy to classify humans. These new
technologies enable the identification and classification of individuals without the need
for another professional or individual records. Also, being immensely fast, these
technologies can classify millions of individuals far faster than a professional could.
Human facial image processing provides many clues and cues applicable to industries such
as security and entertainment [1]. The human face can provide an immense amount of
information, such as emotional state, the slightest agreement or disagreement, irony or
anger, etc. This is the reason why faces have long been a research topic in psychology
[2]. This data (or in our case digital data) is very valuable, as it helps with the
recognition, selection, or identification of individuals according to the requirement. Age
and gender detection alone can provide a lot of information to, for example, the
recruitment teams of organizations, or for the verification of ID cards, such as the voter
ID cards that millions of individuals use to cast their vote at election time. Human
facial image processing eases the task of finding ineligible or counterfeit individuals.
PROCEDURE
Since the technique is implemented, we can start testing it for its accuracy. The general
procedure to be followed is:
• Input the data.
• Create a frame.
• Detect the face.
• Classify the gender.
• Classify the age group.
• Attach the result to the image.
• Output the image to the specified location.
4. TEST RUN
To verify the efficiency of the technique, we collected some human face images along with
the subjects' stated ages at the time the photo was captured and fed them to the program.
The performance can be judged using the chart. When a human-face-like non-human object was
provided as input, since no data for non-human objects was stored in the training
datasets, it gave inaccurate results.
5. KEY FEATURES
The main aim of this technique is to provide a faster and more cost-effective method of
age and gender classification of humans. Key features of this model are:
• There is no need for high-precision hardware or software. It can process the image
directly through a camera device such as a webcam, although a better device will provide
more accurate results.
• The technique is easy to use; it does not require professional-level knowledge. Normal
computer literacy is enough.
• It can process and store hundreds of faces along with the corresponding results without
any lag or delay.
6. USE CASES
Several use cases for this project include the following:
• Identification of the target audience in marketing organisations.
• In recruitment procedures, to verify the legitimacy of applicants.
• Verification of the authentic person applying for government IDs.
• Classification of human resources in bulk.
7. CONCLUSION
Human age and gender classification are two of the many important pieces of information
that can be gathered from an individual. Human faces provide enough data that may be used
for many purposes. In order to reach the correct audience, human age and gender
classification is essential. Here we have tried to do the same process but with
general-purpose equipment. The efficiency of the algorithm depends on several factors, but
the main motive of this project is to be easy and fast while also being as accurate as
possible. Work is being done to improve the efficiency of the algorithm. Some future
improvements include discarding face-like non-human objects, more datasets for people
belonging to different ethnic groups, and more granular control over the workflow of the
algorithm.
CHAPTER 3
SYSTEM DESIGN
OBJECTIVE
● The processing of images from camera sources, satellites, aeroplanes, and images
captured in everyday life is called picture processing.
● Image processing has two main steps, followed by simpler ones. The improvement of an
image, with the goal of a better-quality picture that can be used by other programs, is
called picture enhancement.
● The other procedure is the most pursued strategy, used for the extraction of data from a
picture. The division of an image into certain parts is called segmentation.
● The evolution of these ideas helps in estimating certain parameters. Age assessment is a
multi-class problem in which the years are categorized into classes. Individuals of
various ages have various facial features, so it is hard to group the pictures.
● To identify the age and gender of several faces, several methods are followed. Features
are extracted by the convolutional network. Based on the trained models, the image is
classified into one of the age classes. The features are processed further and sent to the
training systems.
EXISTING SYSTEM
● The existing system uses the cascaded Adaboost learning algorithm for face detection and
achieves age estimation using Gabor wavelets and OLPP.
● The corresponding paper is organized in the following sections. First, the presented
face detection system includes histogram lighting normalization, feature selection, the
cascaded Adaboost classifier, and a region-based clustering algorithm.
● The age estimation process, including feature extraction using Gabor wavelets, feature
reduction and selection, and age classification, is then introduced.
● Finally, the experimental results and conclusions are provided and summarized.
DISADVANTAGE
The downside to Haar cascades is that they tend to be prone to false-positive detections,
require parameter tuning when applied for inference/detection, and, in general, are not as
accurate as more “modern” algorithms.
PROPOSED SYSTEM
● For age and gender detection, Deep EXpectation (DEX) is used for age estimation,
following the success of deep learning in image classification [5, 32, 47] and object
detection. From the deep learning literature we take four key ideas that we apply to our
solution:
● the deeper the network (by sheer increase of parameters / model complexity), the better
its capacity to model highly non-linear transformations, up to some optimal depth on
current architectures;
● the larger and more diverse the datasets used for training, the better the network
learns to generalize and the more robust it becomes to over-fitting;
● the alignment of the object in the input image impacts the overall performance;
● when the training data is small, we must fine-tune a network pre-trained for comparable
inputs and goals, so that we benefit from the transferred knowledge.
ADVANTAGE
OpenCV-Python is a Python wrapper for the original OpenCV C++ implementation. This gives
us two advantages: first, the code is as fast as the original C/C++ code (since it is the
actual C++ code working in the background), and second, it is easier to code in Python
than in C/C++.
Block diagram
FLOW DIAGRAM
Software Prerequisites
Hardware
Deep learning is a machine learning technique that teaches computers to do what comes
naturally to humans: learn by example. Deep learning is a key technology behind
driverless cars, enabling them to recognize a stop sign, or to distinguish a pedestrian from
a lamppost. It is the key to voice control in consumer devices like phones, tablets, TVs,
and hands-free speakers. Deep learning is getting lots of attention lately and for good
reason. It’s achieving results that were not possible before.
In deep learning, a computer model learns to perform classification tasks directly from
images, text, or sound. Deep learning models can achieve state-of-the-art accuracy,
sometimes exceeding human-level performance. Models are trained by using a large set
of labeled data and neural network architectures that contain many layers.
How does deep learning attain such impressive results?
In a word, accuracy. Deep learning achieves recognition accuracy at higher levels than ever
before. This helps consumer electronics meet user expectations, and it is crucial for
safety-critical applications like driverless cars. Recent advances in deep learning have
improved to the point where deep learning outperforms humans in some tasks like classifying
objects in images.
While deep learning was first theorized in the 1980s, there are two main reasons it has only
recently become useful:
1. Deep learning requires large amounts of labeled data. For example, driverless car
development requires millions of images and thousands of hours of video.
2. Deep learning requires substantial computing power. High-performance GPUs have a
parallel architecture that is efficient for deep learning. When combined with clusters or
cloud computing, this enables development teams to reduce training time for a deep
learning network from weeks to hours or less.
Examples of Deep Learning at Work
Deep learning applications are used in industries from automated driving to medical devices.
Automated Driving: Automotive researchers are using deep learning to automatically detect
objects such as stop signs and traffic lights. In addition, deep learning is used to detect
pedestrians, which helps decrease accidents.
Aerospace and Defense: Deep learning is used to identify objects from satellites that locate
areas of interest, and identify safe or unsafe zones for troops.
Medical Research: Cancer researchers are using deep learning to automatically detect cancer
cells. Teams at UCLA built an advanced microscope that yields a high-dimensional data set
used to train a deep learning application to accurately identify cancer cells.
Industrial Automation: Deep learning is helping to improve worker safety around heavy
machinery by automatically detecting when people or objects are within an unsafe distance of
machines.
Electronics: Deep learning is being used in automated hearing and speech translation. For
example, home assistance devices that respond to your voice and know your preferences are
powered by deep learning applications.
Most deep learning methods use neural network architectures, which is why deep
learning models are often referred to as deep neural networks.
The term “deep” usually refers to the number of hidden layers in the neural
network. Traditional neural networks only contain 2-3 hidden layers, while deep
networks can have as many as 150.
Deep learning models are trained by using large sets of labeled data and neural network
architectures that learn features directly from the data without the need for manual feature
extraction.
One of the most popular types of deep neural networks is known as convolutional neural
networks (CNN or ConvNet). A CNN convolves learned features with input data, and
uses 2D convolutional layers, making this architecture well suited to processing 2D data,
such as images.
CNNs eliminate the need for manual feature extraction, so you do not need to identify
features used to classify images. The CNN works by extracting features directly from
images. The relevant features are not pretrained; they are learned while the network trains
on a collection of images. This automated feature extraction makes deep learning models
highly accurate for computer vision tasks such as object classification.
CNNs learn to detect different features of an image using tens or hundreds of hidden
layers. Every hidden layer increases the complexity of the learned image features. For
example, the first hidden layer could learn how to detect edges, and the last learns how to
detect more complex shapes specifically catered to the shape of the object we are trying
to recognize.
Another key difference is deep learning algorithms scale with data, whereas shallow
learning converges. Shallow learning refers to machine learning methods that plateau at a
certain level of performance when you add more examples and training data to the
network.
A key advantage of deep learning networks is that they often continue to improve as the
size of your data increases.
Figure 3. Comparing a machine learning approach to categorizing vehicles (left) with
deep learning (right).
When choosing between machine learning and deep learning, consider whether you have
a high-performance GPU and lots of labeled data. If you don’t have either of those things,
it may make more sense to use machine learning instead of deep learning. Deep learning
is generally more complex, so you’ll need at least a few thousand images to get reliable
results. Having a high-performance GPU means the model will take less time to analyze
all those images.
To train a deep network from scratch, you gather a very large labeled data set and design
a network architecture that will learn the features and model. This is good for new
applications, or applications that will have a large number of output categories. This is a
less common approach because with the large amount of data and rate of learning, these
networks typically take days or weeks to train.
Transfer Learning
Most deep learning applications use the transfer learning approach, a process that
involves fine-tuning a pretrained model. You start with an existing network, such as
AlexNet or GoogLeNet, and feed in new data containing previously unknown classes.
After making some tweaks to the network, you can now perform a new task, such as
categorizing only dogs or cats instead of 1000 different objects. This also has the
advantage of needing much less data (processing thousands of images, rather than
millions), so computation time drops to minutes or hours.
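A minimal PyTorch sketch of this recipe, assuming a recent torchvision API, might look as
follows; AlexNet and the dogs-versus-cats head match the examples in the text above.

import torch
import torchvision

model = torchvision.models.alexnet(weights='IMAGENET1K_V1')  # pretrained on 1000 classes
for param in model.parameters():
    param.requires_grad = False                  # freeze the pretrained feature layers
model.classifier[6] = torch.nn.Linear(4096, 2)   # new head: e.g. dogs vs. cats
optimizer = torch.optim.SGD(model.classifier[6].parameters(), lr=1e-3)
# ...then train only the new head on the small, new dataset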
In machine learning, you manually choose features and a classifier to sort images. With
deep learning, feature extraction and modeling steps are automatic.
Feature Extraction
A slightly less common, more specialized approach to deep learning is to use the network
as a feature extractor. Since all the layers are tasked with learning certain features from
images, we can pull these features out of the network at any time during the training
process. These features can then be used as input to a machine learning model such
as support vector machines (SVM).
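A short sketch of this approach, with stand-in data and an assumed AlexNet backbone: the
convolutional layers produce the feature vectors, and a scikit-learn SVM is trained on
them.

import torch
import torchvision
from sklearn import svm

backbone = torchvision.models.alexnet(weights='IMAGENET1K_V1').features

def extract(batch):  # batch: (N, 3, 224, 224)
    # run only the convolutional layers and flatten their output into feature vectors
    with torch.no_grad():
        return backbone(batch).flatten(1).numpy()

X_train = extract(torch.randn(8, 3, 224, 224))   # stand-in images
y_train = [0, 1, 0, 1, 0, 1, 0, 1]               # stand-in labels
clf = svm.SVC().fit(X_train, y_train)            # SVM trained on CNN features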
Modules
● image data
● pre-processing
● image segmentation
● feature extraction
● data training and testing
● deep learning algorithm
● detection
Dataset collection
Data Cleaning
● Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly
formatted, duplicate, or incomplete data within a dataset.
● When combining multiple data sources, there are many opportunities for data to be
duplicated or mislabeled.
● Data cleaning, or data scrubbing, is the process of fixing incorrect, incomplete,
duplicate, or otherwise erroneous data in a data set.
● It involves identifying data errors and then changing, updating, or removing data to
correct them, as in the sketch after this list.
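The sketch below illustrates these steps with pandas on a toy table; the column names are
hypothetical.

import pandas as pd

df = pd.DataFrame({
    'image': ['a.jpg', 'a.jpg', 'b.jpg', 'c.jpg'],
    'gender': ['Male', 'Male', 'female', None],
    'age_group': ['(25-32)', '(25-32)', '(0-2)', '(38-43)'],
})
df = df.drop_duplicates()                      # remove duplicated records
df['gender'] = df['gender'].str.capitalize()   # normalize inconsistent label spelling
df = df.dropna()                               # drop incomplete rows
print(df)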
Feature Extraction:
Model training
● Plan and simplify. In the beginning we must think about how the computer sees the
images. ...
● Collect. For all the tasks, try to get the most variable and diverse training dataset. ...
● Sort and upload. You have your images ready and it's time to sort them. ...
● Train and refine.
The standard training steps (sketched in code after this list) are:
● Load and normalize the CIFAR10 training and test datasets using torchvision.
● Define a convolutional neural network.
● Define a loss function.
● Train the network on the training data.
● Test the network on the test data.
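The following condensed sketch, based on the standard PyTorch/torchvision tutorial these
steps come from, shows one training pass; the choice of ResNet-18 as the convolutional
neural network is illustrative, not prescribed by the text.

import torch
import torchvision
import torchvision.transforms as transforms

# load and normalize the CIFAR10 training data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                        transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=50, shuffle=True)

net = torchvision.models.resnet18(num_classes=10)        # define a CNN
criterion = torch.nn.CrossEntropyLoss()                  # define a loss function
optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

for inputs, labels in trainloader:                       # one pass over the training data
    optimizer.zero_grad()
    loss = criterion(net(inputs), labels)
    loss.backward()
    optimizer.step()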
Testing model:
● In this module we test the trained deep learning model using the test dataset.
● An imaging test is a type of test that makes detailed pictures of areas inside the body.
Imaging tests use different forms of energy, such as x-rays (high-energy radiation),
ultrasound (high-energy sound waves), radio waves, and radioactive substances. They may be
used to help diagnose disease, plan treatment, or find out how well treatment is working.
● Examples of imaging tests are computed tomography (CT), mammography, ultrasonography,
magnetic resonance imaging (MRI), and nuclear medicine tests. These are also called
imaging procedures.
Performance Evaluation
● In this module, we evaluate the performance of the trained deep learning model using
evaluation criteria such as the F1 score, accuracy, and classification error (see the
sketch after this list).
● To evaluate object detection models like R-CNN and YOLO, the mean average precision
(mAP) is used. The mAP compares the ground-truth bounding box to the detected box and
returns a score. The higher the score, the more accurate the model is in its detections.
● Model evaluation is the process of using different evaluation metrics to understand a
machine learning model's performance, as well as its strengths and weaknesses.
● Model evaluation is important for assessing the efficacy of a model during the initial
research phases, and it also plays a role in model monitoring.
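A small scikit-learn sketch of these criteria on stand-in labels:

from sklearn.metrics import accuracy_score, f1_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # ground-truth gender labels (0 = male, 1 = female)
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]   # model predictions on the test set
acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
print(f'accuracy={acc:.2f}  F1={f1:.2f}  classification error={1 - acc:.2f}')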
Detection
OpenCV:
● OpenCV (Open Source Computer Vision Library) is an open-source computer vision
and machine learning software library of programming functions aimed mainly at
real-time computer vision. It was built to provide a common infrastructure for
computer vision applications and to accelerate the use of machine perception in
commercial products.
● Originally developed by Intel, it was later supported by Willow Garage and then
Itseez (which was subsequently acquired by Intel).
● OpenCV is a great tool for image processing and computer vision tasks. It can be
used for face detection, object tracking, landmark detection, and much more; some
of its functions appear in almost every computer vision task. A minimal
face-detection sketch follows this list.
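The sketch below shows OpenCV face detection with the Haar cascade bundled in
opencv-python; 'photo.jpg' is a placeholder path (the project itself uses the
DNN-based detector shown in the appendix).

import cv2

img = cv2.imread('photo.jpg')                 # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(cv2.data.haarcascades +
                                'haarcascade_frontalface_default.xml')
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:                    # draw a box around each face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite('photo_faces.jpg', img)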
Output:
In the output phase, we apply the same feature extraction process to new images and
pass the resulting features to the trained machine learning algorithm to predict the label.
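A tiny sketch of this phase, continuing the hypothetical extractor/SVM objects
(extractor, clf) from the earlier feature-extraction sketch:

# Continues the earlier hypothetical sketch: 'extractor' is the pre-trained
# CNN and 'clf' the SVM trained on its features.
import numpy as np

new_image = np.random.rand(1, 224, 224, 3).astype('float32')  # stand-in new input
features = extractor.predict(new_image)   # same feature extraction as in training
print(clf.predict(features))              # predicted label for the new image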
WORK FLOW
● The final product is a website that accepts a picture as input and then reports the
predicted age and gender.
● The website performs the conversion instantly but does not store or reproduce the
submitted information; it acts purely as an end-to-end, volatile conversion interface.
● Alongside the conversion, the user can also access other functions on the website.
CHAPTER 6
APPENDIX:
Source Code
import cv2
import numpy as np
from flask import Flask, render_template, Response, request
UPLOAD_FOLDER = './UPLOAD_FOLDER'
app = Flask(__name__)
app.config['UPLOAD_FOLDER'] = UPLOAD_FOLDER
'''
Each pre-trained model comes as a pair of files:
the .prototxt file defines the model architecture (i.e., the layers themselves),
and the .caffemodel file contains the trained weights for those layers.
Both files are required when using models trained with Caffe for deep learning.
The face detector uses the analogous TensorFlow pair (.pbtxt / .pb).
'''
faceProto = "opencv_face_detector.pbtxt"
faceModel = "opencv_face_detector_uint8.pb"
ageProto = "age_deploy.prototxt"
ageModel = "age_net.caffemodel"
genderProto = "gender_deploy.prototxt"
genderModel = "gender_net.caffemodel"

# LOAD NETWORKS (once, at start-up)
faceNet = cv2.dnn.readNet(faceModel, faceProto)
ageNet = cv2.dnn.readNet(ageModel, ageProto)
genderNet = cv2.dnn.readNet(genderModel, genderProto)

# Age buckets and gender labels the Caffe models were trained on, and the
# mean pixel values to subtract when building the input blob
ageList = ['(0-2)', '(4-6)', '(8-12)', '(15-20)', '(25-32)', '(38-43)', '(48-53)', '(60-100)']
genderList = ['Male', 'Female']
MODEL_MEAN_VALUES = (78.4263377603, 87.7689143744, 114.895847746)

def highlightFace(net, frame, conf_threshold=0.7):
    """Detect faces in a frame; return the annotated frame and the face boxes."""
    frameHeight, frameWidth = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300), [104, 117, 123], True, False)
    net.setInput(blob)
    detections = net.forward()
    faceBoxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            x1 = int(detections[0, 0, i, 3] * frameWidth)
            y1 = int(detections[0, 0, i, 4] * frameHeight)
            x2 = int(detections[0, 0, i, 5] * frameWidth)
            y2 = int(detections[0, 0, i, 6] * frameHeight)
            faceBoxes.append([x1, y1, x2, y2])
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    return frame, faceBoxes

def annotate(frame):
    """Predict age and gender for every detected face and draw the labels."""
    resultImg, faceBoxes = highlightFace(faceNet, frame)
    for faceBox in faceBoxes:
        face = frame[max(0, faceBox[1]):faceBox[3], max(0, faceBox[0]):faceBox[2]]
        blob = cv2.dnn.blobFromImage(face, 1.0, (227, 227), MODEL_MEAN_VALUES, swapRB=False)
        genderNet.setInput(blob)
        gender = genderList[genderNet.forward()[0].argmax()]
        ageNet.setInput(blob)
        # ageNet.forward() returns one probability per age bucket
        agePreds = ageNet.forward()
        age = ageList[agePreds[0].argmax()]
        print(f'Age: {age[1:-1]} years')  # print the age in the console
        cv2.putText(resultImg, f'{gender}, {age}', (faceBox[0], faceBox[1] - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 255), 2, cv2.LINE_AA)
    return resultImg
def gen_frames():
    """Stream annotated webcam frames as a multipart MJPEG feed."""
    camera = cv2.VideoCapture(0)
    while True:
        hasFrame, frame = camera.read()
        if not hasFrame:
            break
        resultImg = annotate(frame)
        if resultImg is None:
            continue
        # Encode the annotated frame as JPEG and yield it as one MJPEG part
        ret, buffer = cv2.imencode('.jpg', resultImg)
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + buffer.tobytes() + b'\r\n')
def gen_frames_photo(img_file):
    """Annotate a single uploaded image and return it as JPEG bytes."""
    # Decode the uploaded file object into an OpenCV BGR image
    data = np.frombuffer(img_file.read(), np.uint8)
    frame = cv2.imdecode(data, cv2.IMREAD_COLOR)
    resultImg = annotate(frame)
    ret, buffer = cv2.imencode('.jpg', resultImg)
    return buffer.tobytes()
@app.route('/')
def index():
    """Video streaming home page."""
    return render_template('index.html')

@app.route('/video_feed')
def video_feed():
    # Video streaming route. Put this in the src attribute of an img tag.
    return Response(gen_frames(), mimetype='multipart/x-mixed-replace; boundary=frame')

@app.route('/webcam')
def webcam():
    return render_template('webcam.html')
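# NOTE: assumed handler, not present in the original listing. The upload form
# in the HTML below posts to '/upload', so a route like this is needed for the
# photo path to work end to end; the original implementation is unknown.
@app.route('/upload', methods=['POST'])
def upload_file():
    file = request.files['fileToUpload']  # field name used by the upload form
    return Response(gen_frames_photo(file), mimetype='image/jpeg')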
if __name__ == '__main__':
    app.run(debug=True)
HTML CODE
<!DOCTYPE html>
<html>
<head>
<title>Detect Age & Gender</title>
<link rel = "icon" href =
"https://cdn.pixabay.com/photo/2019/06/23/05/32/deer-head-4292868_1280.png" type =
"image/x-icon">
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="../static/styles/style.css">
<link rel="stylesheet" href="../static/styles/aj.css">
<link rel="stylesheet" href="https://fonts.googleapis.com/css?family=Lato">
<link rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
</head>
<body>
&nbsp;&nbsp;
<img src="https://cdn.pixabay.com/photo/2021/03/23/09/01/webcam-6116845_1280.png"
height="120" width="120">
<br><br>
<a href='/webcam'><input type="submit" class="button" value="Go live to detect"></a>
</div>
</div>
<div class="photo-htm">
<form action="/upload" method="POST" enctype="multipart/form-data" style="color: #aaa">
<br><br>Select image to upload:<br><br><br>
<input type="file" name="fileToUpload" id="fileToUpload">
<br><br><br><br>
<div class="group">
<input type="submit" class="button" value="Upload and detect" name="submit">
</div>
</form>
</div>
</div>
</div>
</div>
<script>
// Modal Image Gallery
function onClick(element) {
document.getElementById("img01").src = element.src;
document.getElementById("modal01").style.display = "block";
var captionText = document.getElementById("caption");
captionText.innerHTML = element.alt;
}
</script>
</body>
</html>
<!doctype html>
<html lang="en">
<head>
<link rel = "icon" href =
"https://cdn.pixabay.com/photo/2021/03/23/09/01/webcam-6116845_1280.png" type = "image/x-icon">
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link rel="stylesheet" href="static/css/style.css">
<title>Photo Detection</title>
<style>
html {
background: #100a1c;
background-image:
radial-gradient(50% 30% ellipse at center top, #201e40 0%, rgba(0,0,0,0) 100%),
radial-gradient(60% 50% ellipse at center bottom, #261226 0%, #100a1c 100%);
background-attachment: fixed;
color: #6cacc5;
}
img {
margin-left: 110px;
padding: 0;
width: 80vw;
height: 80vh;
border: 2px solid #6cacc5;
border-radius: 4px;
}
button {
float: left;
}
h2 {
text-align: center;
font-family: 'EB Garamond', serif;
/*color: #ff1a1a;*/
text-shadow: 2px 2px 4px #000000;
}
/* --- STYLING THE BUTTONS --- */
button {
border: 0;
background: rgba(42,50,113, .28);
color: #6cacc5;
cursor: pointer;
font: inherit;
margin: 0.25em;
transition: all 0.5s;
border-radius: 4px;
}
/* --- WHEN THE CURSOR HOVERS OVER THE BUTTONS THE COLOR CHANGES --- */
button:hover {
background: #201e40;
}
</style>
</head>
<body>
<div>
<div class="header">
<a href="/"><button>Home</button></a>
<h2>Detecting Age and Gender from Photo</h2>
</div>
<div class="container">
<img src="{{ url_for('upload_file') }}" width="100%">
</div>
</div>
</body>
</html>
<!doctype html>
<html lang="en">
<head>
<link rel = "icon" href =
"https://cdn.pixabay.com/photo/2021/03/23/09/01/webcam-6116845_1280.png" type = "image/x-icon">
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link rel="stylesheet" href="static/css/style.css">
<title>Live Detection</title>
<style>
html {
background: #100a1c;
background-image:
radial-gradient(50% 30% ellipse at center top, #201e40 0%, rgba(0,0,0,0) 100%),
radial-gradient(60% 50% ellipse at center bottom, #261226 0%, #100a1c 100%);
background-attachment: fixed;
color: #6cacc5;
}
img {
margin-left: 110px;
padding: 0;
width: 80vw;
height: 80vh;
border: 2px solid #6cacc5;
border-radius: 4px;
}
button {
float: left;
}
h2 {
text-align: center;
font-family: 'EB Garamond', serif;
/*color: #ff1a1a;*/
text-shadow: 2px 2px 4px #000000;
}
/* --- STYLING THE BUTTONS --- */
button {
border: 0;
background: rgba(42,50,113, .28);
color: #6cacc5;
cursor: pointer;
font: inherit;
margin: 0.25em;
transition: all 0.5s;
border-radius: 4px;
}
/* --- WHEN THE CURSOR HOVERS OVER THE BUTTONS THE COLOR CHANGES --- */
button:hover {
background: #201e40;
}
</style>
</head>
<body>
<div>
<div class="header">
<a href="/"><button>Home</button></a>
<h2>Detecting Age and Gender</h2>
</div>
<div class="container">
<img src="{{ url_for('video_feed') }}" width="100%">
</div>
</div>
</body>
</html>
OUTPUT:
CONCLUSION:
REFERENCES
● Aurélien Géron (2019). Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow, Second Edition. O'Reilly Media.
● Hisham, A.; Harin, S. (2017). Deep Learning – the New Kid in Artificial Intelligence.
● Robin Nixon (2014). Learning PHP, MySQL, JavaScript, CSS & HTML5: A
Step-by-Step Guide to Creating Dynamic Websites. O'Reilly Media.
● Choi, S.E.; Lee, Y.J.; Lee, S.J.; Park, K.R.; Kim, J. (2011). Age Estimation Using a
Hierarchical Classifier Based on Global and Local Facial Features. Pattern
Recognition.
● Ricanek, K.; Tesafaye, T. (2006). MORPH: A Longitudinal Image Database of Normal
Adult Age-Progression. In Proceedings of the Seventh International Conference on
Automatic Face and Gesture Recognition.