Image Processing
Coursework Assignment
1 Assignment Overview
This assignment will involve you designing, building, testing and critiquing a system for per-
forming face alignment, i.e. locating facial landmarks in images. There is also a secondary
extension task detailed below.
This assignment is worth 80% of the grade for this module. It is designed to ensure you can
demonstrate achieving the learning outcomes for this module, which are:
• Write and document a computer program to extract useful information from image data.
• A summary and justification of all the steps in your face alignment system, including
preprocessing, the choice of image features and the prediction model. Diagrammatic
explanations are very welcome.
• Results of your experiments: this should include some discussion of qualitative (example-
based) and quantitative (number-based) comparisons between the different approaches
that you have experimented with.
• Qualitative examples of your face alignment approach running on the small set of
provided example images, found in the compressed numpy file (examples.npz) here.
• Examples of failure cases in the face alignment system and a critical analysis of these,
identifying potential causes, biases and solutions.
• A summary of how your lip/eye colour modification system works with several exam-
ple results.
2. A .csv file that contains the face landmark positions on the test set of images, found in the
compressed numpy file (test images.npz) here. You must use the provided “save as csv”
function in the colab worksheet to convert an array of shape (number of test images,
number of points, 2) into a .csv file. Please make sure you run this on the right data and
submit in the correct format to avoid losing marks (see the shape-check sketch after this list).
3. Either .ipynb files or .py files containing annotated code for all data preprocessing, model
training and testing.
4. You may optionally include your trained model parameters, but please do not hand in any
other additional files, datasets or supplementary results, as this complicates the marking
process.
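As a sanity check before submitting, it is worth verifying that the prediction array has the required shape before passing it to the provided function (see item 2 above). A minimal sketch, assuming pred_points holds your model output and that save_as_csv is the function from the colab worksheet (check its exact signature there):

import numpy as np

def check_prediction_shape(pred_points, n_test_images, n_points):
    # Fail early if the array is not (number of test images, number of points, 2).
    assert pred_points.ndim == 3, f"expected a 3-D array, got {pred_points.ndim}-D"
    expected = (n_test_images, n_points, 2)
    assert pred_points.shape == expected, f"got {pred_points.shape}, expected {expected}"

# Example usage (placeholder names):
# check_prediction_shape(pred_points, test_images.shape[0], train_points.shape[1])
# save_as_csv(pred_points)   # the function provided in the worksheet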
• what methods you have used, with what parameters and why.
• what image features you have used, briefly describe how they were calculated, and
why you chose them.
• any image pre-processing steps you have used, and why.
For top marks, you should clearly demonstrate a creative and methodical approach for
designing your system, drawing ideas from different sources and critically evaluating your
choices. Explaining using diagrams and/or flowcharts is very welcome.
Quantitative measures include the cumulative error distribution (see lecture slides), as well
as boxplots or other plots that compare methods. Please note that we are interested in your
final prediction results, rather than how the cost function changes during training.
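For reference, a cumulative error distribution can be computed along the lines of the sketch below. It assumes predictions and ground truth are arrays of shape (number of images, number of points, 2) in the same coordinate system, and uses the mean Euclidean point-to-point error per image, which is one common choice of metric rather than a requirement.

import numpy as np
import matplotlib.pyplot as plt

def cumulative_error_distribution(pred, gt):
    # Per-point Euclidean distances, averaged over the points of each image.
    per_point_err = np.linalg.norm(pred - gt, axis=-1)   # (n_images, n_points)
    per_image_err = per_point_err.mean(axis=1)           # (n_images,)
    errs = np.sort(per_image_err)
    frac = np.arange(1, len(errs) + 1) / len(errs)       # fraction of images below each error
    return errs, frac

# Comparing two methods on a held-out validation split (placeholder names):
# errs_a, frac_a = cumulative_error_distribution(pred_baseline, val_points)
# errs_b, frac_b = cumulative_error_distribution(pred_sift, val_points)
# plt.plot(errs_a, frac_a, label="baseline")
# plt.plot(errs_b, frac_b, label="SIFT + regression")
# plt.xlabel("mean landmark error (pixels)")
# plt.ylabel("fraction of images")
# plt.legend()
# plt.show()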
A detailed qualitative analysis would investigate and identify systematic failure cases and
biases, providing visual examples, and proposing potential solutions.
5 Marks (Code annotation): this is for annotating sections of the training/testing code with what
they do. To get maximum marks, explain each algorithmic step (not necessarily each line) in
your notebook/.py files.
• Read things! Provide references to anything you find useful. You can take figures from
other works as long as you reference them appropriately.
• Diagrams, flowcharts and pictures are very welcome, make sure you label them properly
and refer to them from the text.
• Saving the results to a .csv file using the provided function, which contains some checks to
make sure you’re predicting on the correct dataset.
A set of test images, without landmarks, is provided in the compressed numpy array (test images.npz)
here. This data is loaded the same way as before, but there are no points stored in the file.
I also include 6 images to use for qualitative comparisons, found in the compressed numpy
array (examples.npz) here. These images should be included in your report to demonstrate face
alignment performance across different genders, ethnicities and poses.
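As a rough guide, loading these archives might look like the sketch below. The key names ('images', 'points') and the training file name are assumptions; check the provided colab worksheet, or the .files attribute of the loaded archive, for the actual names.

import numpy as np

train = np.load('training_images.npz', allow_pickle=True)   # placeholder file name
print(train.files)                       # lists the arrays stored in the archive
images = train['images']                 # e.g. (n_images, height, width, 3)
points = train['points']                 # e.g. (n_images, n_points, 2)

test = np.load('test_images.npz', allow_pickle=True)
test_images = test['images']             # no points array in this file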
A well-justified and high-performing CNN approach will receive marks equivalent to a system
built in any other way.
In terms of sourcing additional labelled data, this is not allowed for this assignment. This
is because in real-world commercial projects you will typically have a finite dataset, and even if
there are possibly useful public datasets available, their license normally prohibits commercial
use. On the other hand, data augmentation, which effectively synthesises additional training
examples from the labelled data that you have, is highly encouraged. If you use this, please
add some text or a flow chart describing the process to your report.
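For illustration, a very simple augmentation step might look like the following: a random translation, with the landmark coordinates shifted by the same offset, plus a brightness jitter. Horizontal flips are also popular but require remapping the left/right landmark indices, which depends on the labelling scheme in Figure 1. The function and parameter values below are placeholders, not recommendations.

import numpy as np

def augment(image, points, rng, max_shift=10):
    # Random translation; np.roll wraps pixels at the border, which is crude but simple.
    dx, dy = rng.integers(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    new_points = points + np.array([dx, dy])      # assumes points are stored as (x, y)
    # Random brightness scaling.
    brightness = rng.uniform(0.8, 1.2)
    jittered = np.clip(shifted.astype(np.float32) * brightness, 0, 255)
    return jittered.astype(image.dtype), new_points

# rng = np.random.default_rng(0)
# aug_img, aug_pts = augment(images[0], points[0], rng)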
5 Where do I start
5.1 Face Alignment
Face alignment is covered in lecture 14, so that’s a good place to look for information. I briefly
discussed the assignment at the end of the lecture, which you can listen to on Canvas. I’ve also
included some references below.
I have included a very basic colab worksheet illustrating how to load the data and visualise
the points on the face.
The simplest approach would be to treat this as either a regular or a cascaded regression
problem, where given an image you want to predict the set of continuous landmark coordinate
locations. To follow this approach you will need to consider what image features are helpful
to predict the landmarks and what pre-processing is required on the data. Although you could
directly use the flattened image as input, this will not be the optimal data representation for
this task.
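That said, the flattened image can still serve as a quick first baseline to compare against, e.g. something along these lines (the downscaling factor, the ridge penalty and the variable names are placeholders):

import numpy as np
from sklearn.linear_model import Ridge

def to_features(images, scale=0.25):
    # Downscale by simple striding, convert to grayscale and flatten.
    step = int(round(1 / scale))
    small = images[:, ::step, ::step, :].mean(axis=-1)    # assumes (n, H, W, 3) input
    return small.reshape(len(images), -1) / 255.0

# X_train = to_features(train_images)
# y_train = train_points.reshape(len(train_points), -1)   # flatten the (x, y) pairs
# model = Ridge(alpha=1.0).fit(X_train, y_train)
# pred = model.predict(to_features(val_images)).reshape(-1, train_points.shape[1], 2)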
A better representation would be to describe a set of locations, either evenly spaced across
the image or in some more useful pattern (think about where in the image you might want to
gather more information), using a feature descriptor such as SIFT. These descriptors can
then be concatenated together and used as input to a linear regression model. Note that you
do not need to use the keypoint detection process for this task; rather, the descriptors should be
computed at defined locations (hint: look at sift.compute() or similar) to create a representation
of the image that is comparable across the dataset.
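A sketch of this idea is given below: SIFT descriptors are computed on a fixed grid of keypoints (no detection step), concatenated into one feature vector per image, and fed into a ridge regression. The grid spacing, keypoint size and regularisation strength are arbitrary placeholder values.

import cv2
import numpy as np
from sklearn.linear_model import Ridge

def dense_sift_features(gray_images, grid_step=16, kp_size=16):
    # Descriptors at fixed grid locations, so every image gets a comparable representation.
    sift = cv2.SIFT_create()
    h, w = gray_images.shape[1:3]
    keypoints = [cv2.KeyPoint(float(x), float(y), kp_size)
                 for y in range(grid_step // 2, h, grid_step)
                 for x in range(grid_step // 2, w, grid_step)]
    feats = []
    for img in gray_images:
        _, desc = sift.compute(img, keypoints)   # descriptors at the provided locations
        feats.append(desc.reshape(-1))
    return np.array(feats)

# gray = train_images.mean(axis=-1).astype(np.uint8)   # SIFT expects 8-bit grayscale
# X = dense_sift_features(gray)
# y = train_points.reshape(len(train_points), -1)
# model = Ridge(alpha=1.0).fit(X, y)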
You’re not restricted to taking this approach, and for higher marks creativity is very much
encouraged. Face alignment has seen a lot of interesting and varied ideas, and if you find some
good ideas while reading around the topic that would be great.
Figure 1: Illustration of the 0-indexed (counting from 0, as you would in Python) locations of
the points on the face. For example, if we wanted to find the tip of the nose, that’s index 14,
so we would look up points[14,:], which gives you the x and y coordinates of the tip of the
nose.
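Assuming images and points have been loaded as above, the indexing in Figure 1 can be checked with a few lines of matplotlib:

import matplotlib.pyplot as plt

img, pts = images[0], points[0]
plt.imshow(img.astype('uint8'))
plt.plot(pts[:, 0], pts[:, 1], '.', color='lime')      # all landmarks
plt.plot(pts[14, 0], pts[14, 1], 'o', color='red')     # index 14: tip of the nose
plt.axis('off')
plt.show()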
• Start with a simple achievable goal and use that as a baseline to test against. Keep track
of early models/results to use as points of comparison.
• Remember that even if it doesn’t work well, having a go at the extension tasks is worth a
few marks. We’re only looking for simple solutions.
• You don’t need to work at very high resolution to get accurate results. Particularly when
doing initial tests, resize your images to a lower resolution. Make sure you also transform
your training points so they are in the same geometry as the resized image, and make sure
your predicted points are mapped back to the resolution of the original images (see the
sketch after this list).
• Think about things that you’ve learned about in FML as well as Computer Vision. Di-
mensionality reduction could be helpful. Overfitting and outliers may be an issue, and you
should consider using methods to minimise this.
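As an example of the resizing point above, one way to keep the images and points in the same geometry is a small helper such as the following (the 96-pixel output size is an arbitrary placeholder):

import cv2
import numpy as np

def resize_with_points(image, points, out_size=96):
    # Resize the image and scale the landmark coordinates by the same factors.
    h, w = image.shape[:2]
    resized = cv2.resize(image, (out_size, out_size))
    scale = np.array([out_size / w, out_size / h])   # (x scale, y scale)
    return resized, points * scale

# At prediction time, multiply by the inverse factors to map predicted points
# back to the resolution of the original images before saving them:
# original_points = pred_points * np.array([orig_w / out_size, orig_h / out_size])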
7 Further reading
Face alignment is a reasonably well researched field, and a wide variety of methods have been
proposed. Some relatively recent approaches are documented below. [1] is probably a good one
to look at; [2] contains a survey of methods, which might give you some ideas; and [3] describes
the results of a competition. The other references are very much optional reading on other
popular recent methods.
References
[1] Xiong X, De la Torre F. Supervised descent method and its applications to face alignment.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013
(pp. 532-539). Paper link.
[2] Learned-Miller E, Huang GB, RoyChowdhury A, Li H, Hua G. Labeled faces in the wild: A
survey. In: Advances in Face Detection and Facial Image Analysis, 2016 (pp. 189-248). Springer,
Cham. Paper link.
[4] Cao X, Wei Y, Wen F, Sun J. Face alignment by explicit shape regression. International
Journal of Computer Vision. 2014 Apr 1;107(2):177-90. Paper link.
[5] Burgos-Artizzu XP, Perona P, Dollár P. Robust face landmark estimation under occlusion.
In: Proceedings of the IEEE International Conference on Computer Vision, 2013 (pp. 1513-
1520). Paper link.
[6] Zhu S, Li C, Change Loy C, Tang X. Face alignment by coarse-to-fine shape searching.
In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015
(pp. 4998-5006). Paper link.