A Practical Introduction to Computer Vision with OpenCV 1st Edition Kenneth Dawson-Howe pdf download
A Practical Introduction to
Computer Vision
with OpenCV
Kenneth Dawson-Howe
Trinity College Dublin, Ireland
This edition first published 2014
© 2014 John Wiley & Sons Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for
permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright,
Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any
form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK
Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be
available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and
product names used in this book are trade names, service marks, trademarks or registered trademarks of their
respective owners. The publisher is not associated with any product or vendor mentioned in this book.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing
this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of
this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is
sold on the understanding that the publisher is not engaged in rendering professional services and neither the
publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert
assistance is required, the services of a competent professional should be sought.
I am grateful to many people for their help and support during the writing
of this book. The biggest thanks must go to my wife Jane, my children,
William and Susie, and my parents, all of whose encouragement
has been unstinting.
I must express my thanks to my students for their interest and enthusiasm
in this subject. It is always refreshing to hear students discussing how to
solve vision problems in tutorials and great to hear their solutions to
problems which are often different (and sometimes better) than my own.
I thank my colleagues (in particular Arthur Hughes, Jeremy Jones and
Hilary McDonald) for their encouragement and support.
Contents
Preface
1 Introduction
1.1 A Difficult Problem
1.2 The Human Vision System
1.3 Practical Applications of Computer Vision
1.4 The Future of Computer Vision
1.5 Material in This Textbook
1.6 Going Further with Computer Vision
2 Images
2.1 Cameras
2.1.1 The Simple Pinhole Camera Model
2.2 Images
2.2.1 Sampling
2.2.2 Quantisation
2.3 Colour Images
2.3.1 Red–Green–Blue (RGB) Images
2.3.2 Cyan–Magenta–Yellow (CMY) Images
2.3.3 YUV Images
2.3.4 Hue Luminance Saturation (HLS) Images
2.3.5 Other Colour Spaces
2.3.6 Some Colour Applications
2.4 Noise
2.4.1 Types of Noise
2.4.2 Noise Models
2.4.3 Noise Generation
2.4.4 Noise Evaluation
2.5 Smoothing
2.5.1 Image Averaging
2.5.2 Local Averaging and Gaussian Smoothing
2.5.3 Rotating Mask
2.5.4 Median Filter
3 Histograms
3.1 1D Histograms
3.1.1 Histogram Smoothing
3.1.2 Colour Histograms
3.2 3D Histograms
3.3 Histogram/Image Equalisation
3.4 Histogram Comparison
3.5 Back-projection
3.6 k-means Clustering
4 Binary Vision
4.1 Thresholding
4.1.1 Thresholding Problems
4.2 Threshold Detection Methods
4.2.1 Bimodal Histogram Analysis
4.2.2 Optimal Thresholding
4.2.3 Otsu Thresholding
4.3 Variations on Thresholding
4.3.1 Adaptive Thresholding
4.3.2 Band Thresholding
4.3.3 Semi-thresholding
4.3.4 Multispectral Thresholding
4.4 Mathematical Morphology
4.4.1 Dilation
4.4.2 Erosion
4.4.3 Opening and Closing
4.4.4 Grey-scale and Colour Morphology
4.5 Connectivity
4.5.1 Connectedness: Paradoxes and Solutions
4.5.2 Connected Components Analysis
5 Geometric Transformations
5.1 Problem Specification and Algorithm
5.2 Affine Transformations
5.2.1 Known Affine Transformations
5.2.2 Unknown Affine Transformations
5.3 Perspective Transformations
5.4 Specification of More Complex Transformations
5.5 Interpolation
5.5.1 Nearest Neighbour Interpolation
5.5.2 Bilinear Interpolation
5.5.3 Bi-Cubic Interpolation
5.6 Modelling and Removing Distortion from Cameras
5.6.1 Camera Distortions
5.6.2 Camera Calibration and Removing Distortion
6 Edges
6.1 Edge Detection
6.1.1 First Derivative Edge Detectors
6.1.2 Second Derivative Edge Detectors
6.1.3 Multispectral Edge Detection
6.1.4 Image Sharpening
6.2 Contour Segmentation
6.2.1 Basic Representations of Edge Data
6.2.2 Border Detection
6.2.3 Extracting Line Segment Representations of Edge Contours
6.3 Hough Transform
6.3.1 Hough for Lines
6.3.2 Hough for Circles
6.3.3 Generalised Hough
7 Features
7.1 Moravec Corner Detection
7.2 Harris Corner Detection
7.3 FAST Corner Detection
7.4 SIFT
7.4.1 Scale Space Extrema Detection
7.4.2 Accurate Keypoint Location
7.4.3 Keypoint Orientation Assignment
7.4.4 Keypoint Descriptor
7.4.5 Matching Keypoints
7.4.6 Recognition
7.5 Other Detectors
7.5.1 Minimum Eigenvalues
7.5.2 SURF
8 Recognition
8.1 Template Matching
8.1.1 Applications
8.1.2 Template Matching Algorithm
8.1.3 Matching Metrics
8.1.4 Finding Local Maxima or Minima
8.1.5 Control Strategies for Matching
8.2 Chamfer Matching
8.2.1 Chamfering Algorithm
8.2.2 Chamfer Matching Algorithm
8.3 Statistical Pattern Recognition
8.3.1 Probability Review
8.3.2 Sample Features
8.3.3 Statistical Pattern Recognition Technique
8.4 Cascade of Haar Classifiers
8.4.1 Features
8.4.2 Training
9 Video
9.1 Moving Object Detection
9.1.1 Object of Interest
9.1.2 Common Problems
9.1.3 Difference Images
9.1.4 Background Models
9.1.5 Shadow Detection
9.2 Tracking
9.2.1 Exhaustive Search
9.2.2 Mean Shift
9.2.3 Dense Optical Flow
9.2.4 Feature Based Optical Flow
9.3 Performance
9.3.1 Video Datasets (and Formats)
9.3.2 Metrics for Assessing Video Tracking Performance
References
Index
Preface
Perception is essential in order for any entity to interact in a meaningful way with its
environment. Humans draw on many senses (such as sight, sound, touch and smell) to perceive the
world. Most machines can only receive input through simple input devices, such as keyboards
and mice, or through wired and wireless communication channels. However, in recent years,
cameras and microphones have been added as standard parts of computers and mobile devices
(such as phones and tablets). At the same time, the speed of these devices has increased
significantly, making it possible to start to process this data in a meaningful manner. Computer
Vision is about how we can automate image or video understanding on machines. It covers
the techniques used to automate tasks ranging from industrial inspection (where the image
understanding problem is constrained to one which we could easily address 20 years ago)
to video understanding in order to guide autonomous robots so that they can interact in a
meaningful and safe manner in a world designed for humans.
This book provides a brief introduction to this exciting field, covering the basics of image
processing and providing the reader with enough information to solve many practical problems.
Computer vision systems are becoming ubiquitous. They are in our homes (in the interfaces of
the games consoles which our children use), in our cameras and phones (providing automatic
face detection and red eye removal), on our streets (determining the licence plates of vehicles
passing through toll gates), in our offices (providing biometric verification of identity), and
even more so in our factories, helping to guide robots to manufacture goods (such as cars)
and automatically inspecting goods to ensure they look right. Yet it seems that we are only at
the beginning of how computer vision can be employed, and we can expect significantly more
vision systems to emerge.
For those interested in this field as developers (and that hopefully includes you as you are
reading this book) there is very good news as there are a number of high quality systems
in which computer vision solutions can be developed, of which two stand out in particular:
MATLAB® and OpenCV. MATLAB® provides an environment that allows relatively rapid
prototyping of vision solutions. OpenCV is a high quality library for C and C++, with wrappers
for Python and Java (on Windows, Linux, MacOS, FreeBSD, OpenBSD, Android, Maemo and
iOS), which provides implementations of many state-of-the-art vision techniques. OpenCV is
the platform of choice for many vision developers, is developed collaboratively by the vision
community and is available free of charge for educational and commercial use. OpenCV code
snippets are provided throughout this book so that readers can take the theory and readily
create working solutions to vision problems.
Electronic Resources
The electronic resources which accompany this text include:
• the code examples from the text along with images generated from the code to give an idea
of the processing done by each section of the code.
• PowerPoint slides for each of the chapters.
• the media (images and videos) for each of the application problems in Chapter 10 of the
book.
• links to information on OpenCV.
For tutorials, it is suggested that the class be broken into groups of three or four students
(all in a single large venue) and that the groups should be asked to come up with solutions
to some of the vision problems in Chapter 10 (using the vision techniques they have learnt).
The intention is that the students discuss how to solve the problems, coming up with ways
of combining the techniques that they have learnt in order to solve them. There is more than
one solution to all of the problems, so some of the groups should present their solutions to the
class, and the class and lecturer should discuss how appropriate the solutions are. For labs and
assignments, the same problems can be used, as OpenCV provides the functionality to allow
students to prototype solutions to these problems.
1
Introduction
Computer vision is the automatic analysis of images and videos by computers in order to
gain some understanding of the world. Computer vision is inspired by the capabilities of the
human vision system and, when initially addressed in the 1960s and 1970s, it was thought to
be a relatively straightforward problem to solve. However, the reason we think/thought that
vision is easy is that we have our own visual system which makes the task seem intuitive to
our conscious minds. In fact, the human visual system is very complex and even the estimates
of how much of the brain is involved with visual processing vary from 25% up to more
than 50%.
As I look out of my window, I see grass and trees, gently swaying in the wind, with a lake
beyond . . . An asphalt path leads down through the trees to the lake and two squirrels
are chasing each other to and fro across it, ignoring the woman coming up the path . . .
67 67 66 68 66 67 64 65 65 63 63 69 61 64 63 66 61 60
69 68 63 68 65 62 65 61 50 26 32 65 61 67 64 65 66 63
72 71 70 87 67 60 28 21 17 18 13 15 20 59 61 65 66 64
75 73 76 78 67 26 20 19 16 18 16 13 18 21 50 61 69 70
74 75 78 74 39 31 31 30 46 37 69 66 64 43 18 63 69 60
73 75 77 64 41 20 18 22 63 92 99 88 78 73 39 40 59 65
74 75 71 42 19 12 14 28 79 102 107 96 87 79 57 29 68 66
75 75 66 43 12 11 16 62 87 84 84 108 83 84 59 39 70 66
76 74 49 42 37 10 34 78 90 99 68 94 97 51 40 69 72 65
76 63 40 57 123 88 60 83 95 88 80 71 67 69 32 67 73 73
78 50 32 33 90 121 66 86 100 116 87 85 80 74 71 56 58 48
80 40 33 16 63 107 57 86 103 113 113 104 94 86 77 48 47 45
88 41 35 10 15 94 67 96 98 91 86 105 81 77 71 35 45 47
87 51 35 15 15 17 51 92 104 101 72 74 87 100 27 31 44 46
86 42 47 11 13 16 71 76 89 95 116 91 67 87 12 25 43 51
96 67 20 12 17 17 86 89 90 101 96 89 62 13 11 19 40 51
99 88 19 15 15 18 32 107 99 86 95 92 26 13 13 16 49 52
99 77 16 14 14 16 35 115 111 109 91 79 17 16 13 46 48 51
Figure 1.1 Different versions of an image. An array of numbers (left) which are the values of the
grey scales in the low resolution image of a face (top right). The task of computer vision is most like
understanding the array of numbers
This is the scene I experience, a world of objects with background, acted upon and
sometimes acting and interacting in events. I have no problem seeing and hearing and
smelling and feeling all these things because they affect my senses directly and they
make up the real world.
Or do they? I can look again and notice things I missed before, or see the scene in new
ways. There is a white wall framing the window I am looking through and the window
in fact fills less of my field of view than the wall, but I did not even notice the wall at first,
and my impression was that the scene through the window was a panorama right across
in front of me. There are metal bars dividing the window into squares and the glass is
obscured with dust and spots but for me the view seems complete and un-obscured. The
‘grass’ is patches of colour ranging from nearly white in the bright sun to nearly black
in the shade but I ‘saw’ green grass in light and shade. Other changing greenish shapes
were for me permanent leafy branches moved by a wind I neither saw nor felt, and two
constantly varying grey shapes were squirrels moving with a purpose. Another shape
increasing in size and changing in position was an approaching woman. (Wilding, 1983)
If you consider your eyes, it is probably not clear to you that your colour vision (provided
by the 6–7 million cones in the eye) is concentrated in the centre of the visual field of the eye
(known as the macula). The rest of your retina is made up of around 120 million rods (cells
that are sensitive to visible light of any wavelength/colour). In addition, each eye has a rather
large blind spot where the optic nerve attaches to the retina. Somehow, we think we see a
continuous image (i.e. no blind spot) with colour everywhere, but even at this lowest level of
processing it is unclear as to how this impression occurs within the brain.
The visual cortex (at the back of the brain) has been studied and found to contain cells that
perform a type of edge detection (see Chapter 6), but mostly we know what sections of the
brain do based on localised brain damage to individuals. For example, a number of people with
damage to a particular section of the brain can no longer recognise faces (a condition known
as prosopagnosia). Other people have lost the ability to sense moving objects (a condition
known as akinetopsia). These conditions inspire us to develop separate modules to recognise
faces (e.g. see Section 8.4) and to detect object motion (e.g. see Chapter 9).
We can also look at the brain using functional MRI, which allows us to see the concentration
of electrical activity in different parts of the brain as subjects perform various activities. Again,
this may tell us what large parts of the brain are doing, but it cannot provide us with algorithms
to solve the problem of interpreting the massive arrays of numbers that video cameras provide.
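To make these 'arrays of numbers' concrete, the following minimal sketch (plain Python, not taken from the book) copies the first four rows of the array in Figure 1.1 and prints each grey level as a character. The function name `render` and the threshold of 45 are illustrative assumptions; the point is only that the dark region of the image becomes visible once the numbers are interpreted, which is the kind of work a vision algorithm has to do.

```python
# Four rows copied from the array of numbers in Figure 1.1
# (grey levels; low values are dark, high values are bright).
ROWS = [
    [67, 67, 66, 68, 66, 67, 64, 65, 65, 63, 63, 69, 61, 64, 63, 66, 61, 60],
    [69, 68, 63, 68, 65, 62, 65, 61, 50, 26, 32, 65, 61, 67, 64, 65, 66, 63],
    [72, 71, 70, 87, 67, 60, 28, 21, 17, 18, 13, 15, 20, 59, 61, 65, 66, 64],
    [75, 73, 76, 78, 67, 26, 20, 19, 16, 18, 16, 13, 18, 21, 50, 61, 69, 70],
]


def render(rows, threshold=45):
    """Return one string per row: '#' for dark pixels, '.' for bright ones.

    `threshold` is an arbitrary illustrative value, not one given in the text.
    """
    return ["".join("#" if value < threshold else "." for value in row)
            for row in rows]


if __name__ == "__main__":
    for line in render(ROWS):
        print(line)
```

Run on these rows, the dark patch widens from row to row, which corresponds to the dark region at the top of the face image.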
• Inspect printed circuit boards to ensure that tracks and components are placed correctly.
See Figure 1.2.
• Inspect print quality of labels. See Figure 1.3.
• Inspect bottles to ensure they are properly filled. See Figure 1.3.
Figure 1.2 PCB inspection of pads (left) and images of some detected flaws in the surface mounting
of components (right). Reproduced by permission of James Mahon
Figure 1.3 Checking print quality of best-before dates (left), and monitoring level to which bottles
are filled (right). Reproduced by permission of Omron Electronics LLC
On the factory floor, the problem is a little simpler than in the real world as the lighting
can be constrained and the possible variations of what we can see are quite limited. Computer
vision is now also solving problems outside the factory, in applications that include:
• The automatic reading of license plates as they pass through tollgates on major roads.
• Augmenting sports broadcasts by determining distances for penalties, along with a range of
other statistics (such as how far each player has travelled during the game).
• Biometric security checks in airports using images of faces and images of fingerprints. See
Figure 1.4.
• Augmenting movies by the insertion of virtual objects into video sequences, so that they
appear as though they belong (e.g. the candles in the Great Hall in the Harry Potter movies).
Figure 1.4 Buried landmines in an infrared image (left; temperature scale from 22.5 °C to 30.8 °C).
Reproduced by permission of Zouheir Fawaz,
Handprint recognition system (right). Reproduced by permission of Siemens AG
• Assisting drivers by warning them when they are drifting out of lane.
• Creating 3D models of a destroyed building from multiple old photographs.
• Advanced interfaces for computer games allowing the real time detection of players or their
hand-held controllers.
• Classification of plant types and anticipated yields based on multispectral satellite images.
• Detecting buried landmines in infrared images. See Figure 1.4.
Some examples of existing computer vision systems in the outside world are shown in
Figure 1.4.
Figure 1.5 The ASIMO humanoid robot which has two cameras in its ‘head’ which allow ASIMO to
determine how far away things are, recognise familiar faces, etc. Reproduced by permission of Honda
Motor Co. Inc
Ultimately, computer vision is aiming to emulate the capabilities of human vision, and to
provide these abilities to humanoid (and other) robotic devices, such as ASIMO (see Figure
1.5). This is part of what makes this field exciting, and surprising, as we all have our own
(human) vision systems which work remarkably well, yet when we try to automate any
computer vision task it proves very difficult to do reliably.
Chapter 6 describes the extraction and use of edges (locations at which the brightness
or colour changes significantly) in images. These cartoon-like features allow us to abstract
information from images. Edge detection does not perform well at corners, and in Chapter 7
we look at corner/feature points that can act as a complement to edges or can be used on their
own to provide less ambiguous features with which to match different images or objects.
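As a rough illustration of what 'the brightness changes significantly' means, the sketch below (plain Python, not the book's code) takes the third row of the array in Figure 1.1 and reports the positions where neighbouring grey levels differ by at least a chosen amount. The function name `edge_positions` and the threshold of 25 are assumptions made for this example; it is the simplest possible one-dimensional difference and not one of the edge detectors described in Chapter 6.

```python
# Third row of the array of grey levels shown in Figure 1.1.
ROW = [72, 71, 70, 87, 67, 60, 28, 21, 17, 18, 13, 15, 20, 59, 61, 65, 66, 64]


def edge_positions(row, threshold=25):
    """Return indices i where |row[i + 1] - row[i]| >= threshold.

    This is a plain first difference between horizontal neighbours;
    `threshold` is an illustrative value, not one taken from the text.
    """
    return [i for i in range(len(row) - 1)
            if abs(row[i + 1] - row[i]) >= threshold]


if __name__ == "__main__":
    # Prints [5, 12]: the drop from 60 to 28 and the rise from 20 to 59,
    # i.e. the two sides of the dark region in this row.
    print(edge_positions(ROW))
```

Lowering the threshold would also flag the smaller jump around the value 87, which shows why the choice of threshold matters even in this toy case.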
In Chapter 8, we look at a number of common approaches to recognition in images, as
in many applications we need to determine the location and identity of objects (e.g. license
plates, faces, etc.).
Chapter 9 looks at the basics of processing videos, concentrating particularly on how we
detect moving objects in video feeds from static cameras (a problem that occurs frequently
in video surveillance), how we track objects from frame to frame and how we can assess
performance in video processing.
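The idea of detecting moving objects from a static camera can be sketched with a simple difference image: subtract one frame from the next and mark the pixels that changed by more than some amount. The code below is a minimal plain-Python sketch under that assumption; the function name `moving_mask`, the tiny hand-made frames and the threshold of 50 are all hypothetical, and the background models covered in Chapter 9 are considerably more robust than this.

```python
def moving_mask(previous, current, threshold=50):
    """Return a binary mask: 1 where the grey level changed by >= threshold.

    `previous` and `current` are equally sized lists of rows of grey levels.
    `threshold` is an illustrative value chosen for this example.
    """
    return [[1 if abs(c - p) >= threshold else 0
             for p, c in zip(prev_row, curr_row)]
            for prev_row, curr_row in zip(previous, current)]


# Two hand-made 3x4 frames: a bright pixel moves one step to the right.
FRAME_A = [[10, 10, 10, 10],
           [10, 200, 10, 10],
           [10, 10, 10, 10]]
FRAME_B = [[10, 10, 10, 10],
           [10, 10, 200, 10],
           [10, 10, 10, 10]]

if __name__ == "__main__":
    for row in moving_mask(FRAME_A, FRAME_B):
        print(row)
```

Note that the mask marks both the pixel the object left and the pixel it moved to, so a difference image on its own indicates where change occurred rather than where the object now is.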
Finally, in Chapter 10, we present a large number of vision application problems to provide
students with the opportunity to solve real problems (which is the only way to really appreciate
how difficult computer vision is). Images or videos for these problems are provided in the
resources associated with this book.
We end this introduction with a quote from the 19th century: ‘… apprehension by the senses
supplies after all, directly or indirectly, the material of all human knowledge, or at least the
stimulus necessary to develop every inborn faculty of the mind. It supplies the basis for the
whole action of man upon the outer world … For there is little hope that he who does not begin
at the beginning of knowledge will ever arrive at its end’ (von Helmholtz, 1868).
Language: English
By
William Blades
Sweet Tully.
2 Henry VI, iv, 1.
The fact to be noted with reference to these classical quotations is
this: Shakspere quotes those Latin authors, and those only, of which
Vautrollier had a ‘license’; and makes no reference to other and
popular writers, such as Virgil, Pliny, Aurelius, and Terence, editions
of whose works Vautrollier was not allowed to issue, but all of which,
and especially the last, were great favorites in the sixteenth century,
as is shown by the numerous editions which issued from the presses
of Vautrollier’s fellow-craftsmen.
Among other publications of Vautrollier was an English translation of
Ludovico Guicciardini’s Description of the Low Countries, originally
printed in 1567. In this work is one of the earliest accounts of the
invention of printing at Haarlem, which is thus described in the
Batavia of Adrianus Junius, 1575. ‘This person [Coster] during his
afternoon walk, in the vicinity of Haarlem, amused himself with
cutting letters out of the bark of the beech tree, and with these, the
characters being inverted as in seals, he printed small sentences.’
The idea is cleverly adapted by Orlando:
these trees shall be my books,
And in their barks my thoughts I’ll character.
As You Like It, iii, 2.
Lastly, it would be an interesting task to compare the Mad Folk of
Shakspere, most of whom have the melancholy fit, with
A Treatise of Melancholie: containing the Causes thereof and
Reasons of the Strange Effects it worketh in our Minds and Bodies.
London, 8vo., 1586.
This was printed by Vautrollier, and probably read carefully for press
by the youthful Poet.
The disinclination of Shakspere to see his plays in print has often
been noticed by his biographers, and is generally accounted for by
the theory that reading the plays in print would diminish the desire
to hear them at the theatre. This is a very unsatisfactory reason, and
not so plausible as the supposition that, sickened with reading other
people’s proofs for a livelihood, he shrunk from the same task on his
own behalf. His contemporaries do not appear to have shared in the
same typographical aversion. The plays of Ben Jonson and
Beaumont and Fletcher were all printed in the life-time of their
authors. Francis Quarles had the satisfaction and pride of seeing all
his works in printed form, and showed his appreciation and
knowledge of Typography by the following quaint lines, which we
quote from the first edition, literatim:
On a Printing-house.
The world’s a Printing-house: our words, our thoughts,
Our deeds, are Characters of sev’rall sizes:
Each Soule is a Compos’ter; of whose faults
The Levits are Correctors: Heav’n revises;
Death is the common Press; fro whence, being driven,
W’ are gathered Sheet by Sheet, & bound for Heaven.
From Divine Fancies, 1632,
lib. iv, p. 164.
II. THE TECHNICALITIES OF
PRINTING, AS USED BY SHAKSPERE
The impressure.
Twelfth Night, ii, 5.