Developments in Medical Image Processing and Computational Vision
Lecture Notes in Computational Vision and Biomechanics
Volume 19
Series Editors
João Manuel R.S. Tavares
Departamento de Engenharia Mecânica
Universidade do Porto, Faculdade de Engenharia, Porto, Portugal
R. M. Natal Jorge
Departamento de Engenharia Mecânica
Universidade do Porto, Faculdade de Engenharia, Porto, Portugal
Research related to the analysis of living structures (Biomechanics) has been carried out extensively
in several distinct areas of science, such as mathematics, mechanical engineering, physics, informatics,
medicine and sports. For its successful development, however, numerous research topics should be
considered, such as image processing and analysis, geometric and numerical modelling, biomechanics,
experimental analysis, mechanobiology and enhanced visualization, and their application to real cases
must be developed; further investigation is needed, along with enhanced hardware solutions and less
invasive devices. On the other hand, Image Analysis (Computational Vision) aims to extract high-level
information from static images or dynamic image sequences. Examples of applications involving Image
Analysis can be found in the study of the motion of structures from image sequences, shape reconstruction
from images, and medical diagnosis. As a multidisciplinary area, Computational Vision draws on techniques
and methods from other disciplines, such as Artificial Intelligence, Signal Processing, mathematics,
physics and informatics. Despite the work that has been done in this area, more robust and efficient
methods of Computational Imaging are still needed in many application domains, such as medicine, and
their validation in real scenarios needs to be examined urgently. Recently, these two branches of science
have been increasingly seen as strongly connected and related, but no book series or journal has
contemplated this increasingly strong association. Hence, the main goal of this book series on
Computational Vision and Biomechanics (LNCV&B) is to provide a comprehensive forum for discussion of
the current state of the art in these fields, emphasizing their connection. The book series covers
(but is not limited to):
• Applications of Computational Vision and Biomechanics
• Biometrics and Biomedical Pattern Analysis
• Cellular Imaging and Cellular Mechanics
• Clinical Biomechanics
• Computational Bioimaging and Visualization
• Computational Biology in Biomedical Imaging
• Development of Biomechanical Devices
• Device and Technique Development for Biomedical Imaging
• Experimental Biomechanics
• Gait & Posture Mechanics
• Grid and High Performance Computing on Computational Vision and Biomechanics
• Image Processing and Analysis
• Image processing and visualization in Biofluids
• Image Understanding
• Material Models
• Mechanobiology
• Medical Image Analysis
• Molecular Mechanics
• Multi-modal Image Systems
• Multiscale Biosensors in Biomedical Imaging
• Multiscale Devices and BioMEMS for Biomedical Imaging
• Musculoskeletal Biomechanics
• Multiscale Analysis in Biomechanics
• Neuromuscular Biomechanics
• Numerical Methods for Living Tissues
• Numerical Simulation
• Software Development on Computational Vision and Biomechanics
• Sport Biomechanics
• Virtual Reality in Biomechanics
• Vision Systems
• Image-based Geometric Modeling and Mesh Generation
• Digital Geometry Algorithms for Computational Vision and Visualization
In order to match the scope of the Book Series, each book has to include content relating to, or
combining, both Image Analysis and mechanics. The series is indexed by SCOPUS and SpringerLink.
Developments in Medical
Image Processing and
Computational Vision
Editors

João Manuel R. S. Tavares
Departamento de Engenharia Mecânica
Universidade do Porto, Faculdade de Engenharia
Porto, Portugal

Renato Natal Jorge
Departamento de Engenharia Mecânica
Universidade do Porto, Faculdade de Engenharia
Porto, Portugal
This book presents novel and advanced topics in Medical Image Processing and
Computational Vision in order to solidify knowledge in the related fields and define
their key stakeholders.
The twenty-two chapters included in this book were written by invited experts of
international recognition and address important issues in Medical Image Processing
and Computational Vision, including: 3D Vision, 3D Visualization, Colour Quanti-
sation, Continuum Mechanics, Data Fusion, Data Mining, Face Recognition, GPU
Parallelisation, Image Acquisition and Reconstruction, Image and Video Analysis,
Image Clustering, Image Registration, Image Restoring, Image Segmentation, Ma-
chine Learning, Modelling and Simulation, Object Detection, Object Recognition,
Object Tracking, Optical Flow, Pattern Recognition, Pose Estimation, and Texture
Analysis.
Different applications are addressed and described throughout the book, com-
prising: Biomechanical Studies, Bio-structure Modelling and Simulation, Bone
Characterization, Cell Tracking, Computer-Aided Diagnosis, Dental Imaging, Face
Recognition, Hand Gestures Detection and Recognition, Human Motion Analysis,
Human-Computer Interaction, Image and Video Understanding, Image Processing,
Image Segmentation, Object and Scene Reconstruction, Object Recognition and
Tracking, Remote Robot Control, and Surgery Planning.
Therefore, this book is of crucial value for researchers, students, end-users and manufacturers
from several multidisciplinary fields, such as those related to Artificial Intelligence, Bioengineering,
Biology, Biomechanics, Computational Mechanics, Computational Vision, Computer Graphics, Computer
Sciences, Computer Vision, Human Motion, Imagiology, Machine Learning, Machine Vision, Mathematics,
Medical Imaging, Medicine, Pattern Recognition, and Physics.
The Editors would like to take this opportunity to thank all the invited authors for sharing their
work, experience and knowledge, making its dissemination possible through this book.
Contents

A Fast and Accurate Algorithm for Detecting and Tracking Moving Hand Gestures
Walter C. S. S. Simões, Ricardo da S. Barboza, Vicente F. de Lucena Jr. and Rafael D. Lins
Center for Medical Image Science and Visualization (CMIV), Linköping University,
Linköping, Sweden
Linköping University, Linköping, Sweden
Edson A. Capello de Sousa Faculdade de Engenharia de Bauru, Universidade
Estadual Paulista-Unesp, Bauru, São Paulo, Brazil
M. M. G. Souza Federal University of Rio de Janeiro, Ilha do Fundão, Rio de
Janeiro, Brazil
B. Taboada ESTiG, IPB, C. Sta. Apolonia, Bragança, Portugal
CEFT, FEUP, R. Dr. Roberto Frias, Porto, Portugal
L. Teresi Dipartimento di Matematica e Fisica, LaMS-Modeling & Simulation Lab,
Università Roma Tre, Rome, Italy
R. P. Tornow Department of Ophthalmology, University of Erlangen, Erlangen-
Nuremberg, Erlangen, Germany
Pattern Recognition Lab and Erlangen Graduate School of Advanced Optical
Technologies, University of Erlangen, Erlangen-Nuremberg, Erlangen, Germany
C. Torromeo Dipartimento di Scienze Cardiovascolari, Respiratorie, Nefrologiche,
Anestesiologiche, Sapienza Università di Roma, Rome, Italy
Paulo Trigueiros Instituto Politécnico do Porto, IPP, Porto, Portugal
DEI/EEUM-Departamento de Electrónica Industrial, Escola de Engenharia,
Universidade do Minho, Guimarães, Portugal
Centro Algoritmi, Universidade do Minho, Guimarães, Portugal
V. Varano Dipartimento di Architettura, LaMS-Modeling & Simulation Lab,
Università Roma Tre, Rome, Italy
R. Vigário Department of Information and Computer Science, Aalto University
School of Science, Aalto, Finland
M. Vodakova Department of Biomedical Engineering, Faculty of Electrical En-
gineering and Communication, Brno University of Technology, Brno, Czech
Republic
G. Vranou Department of Informatics, Technological Education Institute, Sindos,
Thessaloniki, Greece
G. Wachs-Lopes Inaciana Educational Foundation, Sao Paulo, Brazil
On the Evaluation of Automated MRI Brain Segmentations

Abstract The present work deals with the segmentation of glial tumors in MRI images, focusing on
critical aspects of manual labeling and reference estimation for segmentation validation purposes.
A reproducibility analysis was conducted, confirming the presence of different sources of uncertainty
involved in the process of manual segmentation and responsible for high intra-operator and
inter-operator variability. Technical and conceptual solutions aimed at reducing operator variability
and supporting the reference estimation process are integrated in GliMAn (Glial Tumor Manual
Annotator), an application that allows users to view and manipulate MRI volumes and implements a
label fusion strategy based on fuzzy connectedness. A set of experiments was conceived and conducted
to evaluate the contribution of the proposed solutions to the process of manual segmentation and
reference data estimation.
E. Binaghi (✉)
Dipartimento di Scienze Teoriche e Applicate—Sezione Informatica,
Università degli Studi dell'Insubria, Varese, Italy
e-mail: [email protected]

V. Pedoia
Musculoskeletal Quantitative Imaging Research Group,
Department of Radiology and Biomedical Imaging, University of California,
San Francisco, USA

D. Lattanzi · E. Monti · S. Balbi
Dipartimento di Biotecnologie e Scienze della Vita,
Università degli Studi dell'Insubria, Varese, Italy

R. Minotto
Unità Operativa di Neuroradiologia, Ospedale di Circolo e Fondazione Macchi, Varese, Italy

1 Introduction

Magnetic Resonance (MR) imaging plays a fundamental role in scientific and clinical studies of brain
pathologies. By visual inspection of MRI imagery, physicians can accurately examine and identify
tissues thanks to the high spatial resolution and contrast and the enhanced tissue differentiation
they provide. Segmentation, intended as a precise delineation of the pathological and healthy tissues
composing the MR image, is important to develop quantitative analyses, understand pathologies,
evaluate their evolutionary trend, plan the best surgical approach, or evaluate alternative
strategies [1–3].
In some areas, such as Glial Tumor studies, it is particularly difficult to objectively
establish the limits between the tumor and the normal brain tissue. However glial tu-
mor segmentation is of great importance to plan resection, quantify the postoperative
residual, identify radiotherapy margins and evaluate the therapy response based on
the tumor volume evaluation. Segmentation accomplished through complete manual tracing is a
difficult, time-consuming task, usually affected by intra- and inter-operator variation that limits
the stability and reproducibility of the results. The difficulties encountered in manual labeling
make computer support, offering segmentation procedures with varying degrees of automation, highly
desirable in some cases. However, the
use of automated segmentation procedures poses in turn the problem of a reference
standard representative of the true segmentation which is required for the assessment
of accuracy of the automated results. Recent works focus the attention on methods
which do not require ground truth, but rely on behavioral comparison [2, 4–6]. With
this approach, the evaluation involves the design of a reliable common agreement
strategy able to define a suitable reference standard through combining manually
traced segmentations. Proceeding from these considerations, the contribution of the
present work is twofold. Firstly, a reproducibility study is proposed, aimed at quantitatively
assessing the extent of the operator variability in the critical context of glial tumor segmentation
studies. The motivation for this experimental investigation lies in the fact that few studies have
recently been developed to investigate the extent of the operator variability in specific MRI
clinical applications. The second contribution is the design of GliMAn (Glial Tumor Manual
Annotator), an integrated system that offers visualization tools and facilities in support of manual
labeling and reference data estimation for validating automated segmentation results. The facilities
offered by GliMAn for truth label collection in fully manual segmentation were the subject of a
previous work [7]. An extended version of GliMAn is presented here, implementing fuzzy connectedness
algorithms [8] used to merge individual labels and generate a segmentation representative of a
common agreement.
A precise volumetric computation of the pathological MRI signal has several funda-
mental implications in clinical practice. In fact, the accurate definition of both the
topographical features and the growing pattern of the tumor is crucial in order to
select the most appropriate treatment, to plan the best surgical approach, and to correctly evaluate,
postoperatively, the extent of resection and to monitor the evolution over time of any residue [9].
However, it is worth noting that gliomas are characterized by constant local growth (4 mm/year)
within the brain parenchyma and by migration along white matter pathways, both ipsilateral and even
contralateral
The aim of the present analysis is twofold: to assess the agreement of segmentations
as performed by different experts (inter-variability) and to assess the reproducibility
of the manual segmentations as performed by the same expert (intra-variability). The
dataset used is composed of four FLAIR MRI gray scale volumes with the following
acquisition parameters:
• gray scale
• 12 bit depth
• Volume Size [432 × 432 × 300]
• Slice Thickness 0.6 mm
• Spacing Between Slices 0.6 mm
• Pixel Spacing (0.57, 0.57) mm
• Repetition Time 8000
• Echo Time 282.89
All dataset volumes are altered by the presence of glial tumors, which are heteroge-
neous in terms of position, dimension, intensity and shape. A team of five medical
experts was asked to segment axial, sagittal and coronal slices of these volume data by employing an
image annotator normally in use in clinical practice and offering standard image viewing facilities.
Figure 1 shows an example of slice-by-slice
manual segmentation of glial tumor areas provided by 5 experts along the axial plane
and superimposed on the original MRI slice.
MRI segmentation was performed with the purpose of determining the size of
pathological tissues and their spatial distribution in two or three dimensions according
to the nature of the data. Metrics adopted in the present analysis for size estimation
error and spatial distribution error are described below.
Fig. 1 Slice-by-slice manual segmentations of a low-grade glioma brain tumor performed by 5 medical experts

Size Estimation Error Let $S_1^i, S_2^i, S_3^i$ be the size estimations of the region (surface or
volume) extracted from the axial, sagittal and coronal plane segmentations, respectively, performed
by the $i$-th expert. The intra- and inter-size estimation errors along the plane $p$, with
$p \in \{1, 2, 3\}$, for the $i$-th expert are computed as follows:

$$\mathrm{intraSizeErr}_p^i = \frac{S_p^i - \frac{1}{N_{seg}}\sum_{j=1}^{N_{seg}} S_j^i}{\frac{1}{N_{seg}}\sum_{j=1}^{N_{seg}} S_j^i}; \qquad \mathrm{interSizeErr}_p^i = \frac{S_p^i - \frac{1}{N_{exp}}\sum_{j=1}^{N_{exp}} S_p^j}{\frac{1}{N_{exp}}\sum_{j=1}^{N_{exp}} S_p^j} \qquad (1)$$
where Nseg is the number of segmentations performed by the same expert on the
same volume and Nexp is the total number of experts.
Spatial Distribution Error Let $M_1^i, M_2^i, M_3^i$ be the 2D or 3D masks obtained from the
segmentations along the axial, sagittal and coronal planes, respectively, performed by the $i$-th
expert. The intra- and inter-spatial distribution errors, evaluated in terms of the Jaccard
Distance [13], are computed as follows:

$$J_{p,t}^{i} = 1 - \frac{|M_p^i \cap M_t^i|}{|M_p^i \cup M_t^i|}, \qquad J_p^{i,j} = 1 - \frac{|M_p^i \cap M_p^j|}{|M_p^i \cup M_p^j|},$$

where $p, t \in \{1, 2, 3\}$ index the segmentation planes and $i, j$ the experts.
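As a concrete illustration of the two metrics above, the following Python sketch (illustrative only;
the array layouts and function names are our own, not the authors' code) computes the intra-/inter-size
estimation errors of Eq. 1 and the Jaccard distance between binary masks with NumPy:

import numpy as np

def intra_size_err(sizes_expert):
    """Eq. (1), intra-operator term: `sizes_expert` holds the N_seg size
    estimates (surface or volume) produced by one expert on the same volume."""
    sizes_expert = np.asarray(sizes_expert, dtype=float)
    mean_size = sizes_expert.mean()
    return (sizes_expert - mean_size) / mean_size      # one error per plane p

def inter_size_err(sizes_plane):
    """Eq. (1), inter-operator term: `sizes_plane` holds the N_exp size
    estimates produced by the different experts on the same plane p."""
    sizes_plane = np.asarray(sizes_plane, dtype=float)
    mean_size = sizes_plane.mean()
    return (sizes_plane - mean_size) / mean_size       # one error per expert i

def jaccard_distance(mask_a, mask_b):
    """Jaccard distance between two binary masks (2D or 3D)."""
    mask_a, mask_b = mask_a.astype(bool), mask_b.astype(bool)
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 0.0
    return 1.0 - np.logical_and(mask_a, mask_b).sum() / union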
Fig. 2 2D intra-variability analysis conducted for each expert on one MRI volume: (a) surface
estimation error; (b) 2D spatial distribution error
Figure 2a shows the mean of the intra-size estimation error $\mathrm{intraSizeErr}_p^i$, computed
varying the segmentation plane $p$, for each slice presenting a tumor of one MRI volume in the data
set and for each expert. Figure 2b shows the mean of the spatial distribution error $J_{p,t}^i$,
computed varying all the possible pairs of planes $p, t$, for each slice presenting a tumor of one
MRI volume and for each expert.
Fig. 3 (a) Surface estimation error and (b) 2D spatial distribution error (2D Jaccard distance)
versus tumor slices, for Case 1 to Case 4

The intra-variability measures consistently confirm an acceptable level of reproducibility for
slices including the central part of the tumor area, with values lower than 15 and 20 % for the
surface estimation error and for the Jaccard distance, respectively. The intra-variability increases
considerably in the slices which include
the marginal part of the tumor with peaks of 103 % in surface estimation error and
92 % in spatial distribution error. This result can be interpreted mainly in light of two facts:
first, the boundary masks are smaller, so an error computed on a few pixels results in a large
percentage error; second, these slices are difficult to segment, given the high level of tumor
infiltration into the healthy tissue.
Figure 3a shows the mean of the inter-size estimation error $\mathrm{interSizeErr}_p^i$, computed
varying the expert $i$, for each volume in the data set and for each segmentation along the axial
plane. Figure 3b shows the mean of the spatial distribution error $J_p^{i,j}$, computed for each
varying pair of experts $i, j$, for each volume in the data set and for each segmentation along the
axial plane.
Both inter-variability measures confirm a high level of variability when segmenting both central and
boundary slices, with peaks exceeding 50 % and with clearly unacceptable results in the boundary
slices.
Table 1 reports the results of intra-variability analysis both in terms of volume esti-
mation error and of 3D spatial distribution for 2 cases of the dataset. The analysis of
the volume estimation shows an acceptable level of variability. The Jaccard Distances
indicate instead a high level of variability in spatial distribution. The inconsistency
of the two metrics comes from the compensation of errors in volume estimation.
Table 2 reports the results of the inter-variability analysis for all the 4 cases of
the dataset. The results obtained lead to the same conclusion drawn in the previous
case. The low variance values, equal to 0.14 and 0.10 % for the volume estimation and the 3D spatial
distribution errors respectively, indicate that dissimilarities are equally distributed among the
experts. No anomalous behavior of individuals or sub-groups of experts (i.e. neuroradiologists and
neurosurgeons) was detected.
Results obtained were discussed and interpreted through a close dialogue between
physicians and computer scientists during joint meetings. Our analysis confirms
the well-known result that a validation procedure based on interactive drawing of the desired
segmentation by domain experts, which is often considered the only acceptable approach, suffers from
intra-expert and inter-expert variability. In addition, the analysis allows us to conclude that the
extent of the problem in the specific context of MRI brain tumor segmentation strongly affects the
reliability of manual labeling as a source of reference standard in segmentation studies.
Dissimilarities among experts can be traced back to two main sources. A first
source of uncertainty is identified in the lack of information during the visual inspec-
tion phase. Considering the trend of the areas of tumor sections manually annotated
by each expert and reported in Fig. 4 we notice large transitions between consecutive
slices indicating non-compliance with the constraint of continuity. We concluded
that physicians should explore a resonance volume through subsequent axial coronal
and sagittal slices and the inspection on a given slice must be contextually related to
the inspection of previous and subsequent slices.
The second source of uncertainty originates in the process of assigning a region to a given category
on the basis of complex and vague clinical signs. The assignment of crisp labels is accomplished by
arbitrarily reducing uncertainty and forcing a Boolean decision.
We assume that such an intrinsic uncertainty can be properly managed within the
fuzzy set framework. Images are fuzzy by nature, and object intensities come from
different factors such as the material heterogeneity of the object and the degrada-
tion introduced by the imaging device. Under these critical conditions, the labeling
process has to be properly modeled as a matter of degrees in order to completely
represent the expert decisional attitudes in connecting heterogeneous image elements
forming objects.
Fig. 4 Trends of the areas of the tumor sections manually segmented along the axial direction by the
experts (tumor surface versus tumor slices)
[GliMAn architecture diagram: MRI Data; Navigator; Manual Annotator; Volume Manager; Report;
Read/Write; Execution Mode Switch; Viewer; Experts; Fuzzy Connectedness; Masks & Data; Output Data;
Seeds & ROI Acquisition; Collective Truth Production]
sagittal and coronal) and the synchronized visualization of the input labels. Human-
computer interaction principles and usability guidelines have been strictly observed
in the GliMAn physical design, in order to limit eyestrain and ambiguities that can
undermine the effectiveness of the conceptual solutions in the GUI interaction. The GUI is composed
of 3 principal areas (Fig. 4): upper, central and lateral. The upper zone includes standard I/O
features of an image viewer and management tools, the central zone shows the orthogonal planes, and
the lateral zone allows the user to change the execution mode. The layout has been designed in
accordance with solutions adopted in standard image processing and viewing environments for medical
applications.
Moreover, the method of orthogonal projections is universally used to represent volumetric objects
objectively and with dimensional accuracy. The essential feature of this visualization method is that
it preserves the correct proportions between the elements of the volume. The visualization in all
three planes is synchronized: when choosing a point of coordinates (x0, y0, z0), the three images
represented are the intersections of the MRI volume with the sagittal, coronal and axial planes,
respectively, passing through the point.
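A minimal sketch of this synchronized orthogonal-plane extraction is given below (the volume indexing
convention volume[x, y, z] and the example size are assumptions, not GliMAn internals):

import numpy as np

def orthogonal_slices(volume, x0, y0, z0):
    """Return the sagittal, coronal and axial slices of `volume` passing
    through the voxel (x0, y0, z0). The volume is assumed to be indexed
    as volume[x, y, z] (sagittal, coronal, axial axes respectively)."""
    sagittal = volume[x0, :, :]   # plane of constant x
    coronal = volume[:, y0, :]    # plane of constant y
    axial = volume[:, :, z0]      # plane of constant z
    return sagittal, coronal, axial

# Example: a synthetic 432 x 432 x 300 volume, as in the dataset description
vol = np.random.rand(432, 432, 300)
sag, cor, axi = orthogonal_slices(vol, 216, 216, 150)
print(sag.shape, cor.shape, axi.shape)  # (432, 300) (432, 300) (432, 432)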
Fig. 7 Crop of brain MRI axial, sagittal and coronal sections with presence of a Low Grade Glial
Tumor; label assignment considering the axial section alone is made under a high level of uncertainty
that can be reduced by considering the label position in the other two planes
Fig. 8 GliMAn manual segmentation interactive procedure: (a) broken line joining the selected points;
(b) segmentation mask superimposed on the original MRI image
the red circle to the edge of the tumor. The visualization of the orthogonal control planes in the
GliMAn interface reduces the uncertainty in the assignment of the point to the boundary. The selected
points are then joined by a broken line (Fig. 8a). By clicking on the first point, the broken line
becomes a polygon that encloses the area of interest. Figure 8b shows a segmentation superimposed on
the original MRI image. During the segmentation of the N-th slice, the segmentation performed on the
(N − 1)-th slice is visualized.
GliMAn implements a reference data estimation method which uses Fuzzy Connectedness principles to
merge individual labels and generate a segmentation representative of a common agreement [14].
Labels are provided by the experts, who are asked to manually identify a few highly reliable points
belonging to the objects of interest. The points collected by each expert are conceived as multiple
seeds and, starting from them, the Fuzzy Connectedness algorithm computes the segmentation. The
proposed strategy, rooted in fuzzy set theory, is able to deal with uncertain information and thus to
manage dissimilarity among manually identified labels. The operator intervention is drastically
limited with respect to a complete manual tracing, and the formal fuzzy framework supports the
overall estimation process.
The overall session is organized in two phases:
• collection of information by each expert,
• fusion of the information.
In the first phase, GliMAn provides a specific execution mode, Fuzzy Obj, which imposes a change of
visualization based on Maximum Intensity Projection images [15]. The Maximum Intensity Projection
(MIP) images computed in the axial, sagittal and coronal directions are shown in the three views
(Fig. 9).
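Computing the three MIP images from an MRI volume amounts to taking the maximum along each axis; a
minimal NumPy sketch follows (the axis-to-anatomical-plane mapping is an assumption for illustration):

import numpy as np

def mip_projections(volume):
    """Maximum Intensity Projections of a 3D volume along its three axes.
    Assumed mapping: axis 0 sagittal, axis 1 coronal, axis 2 axial."""
    mip_sagittal = volume.max(axis=0)
    mip_coronal = volume.max(axis=1)
    mip_axial = volume.max(axis=2)
    return mip_sagittal, mip_coronal, mip_axial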
The experts surround the region of interest containing the tumor on each plane. From the intersection
of the three projections a Volume of Interest (VoI) is identified, and the display of the orthogonal
planes moves to the VoI extracted from the original MRI. In a second step, users identify a set of
object and background seeds, together with two regions characterizing the object and the background.
As in a manual labeling session, the selection of seed points and regions made by the experts is
supported by the synchronized visualization in all three planes. The set of parameters provided by
each expert is then stored under its own identifier. In a subsequent phase, the total set of
parameters is loaded and the Fuzzy Connectedness algorithm is initialized with it and executed.
Segmentation results can be represented in two different ways, corresponding to two different display
modes. In the absolute mode, the fuzzy grades of membership associated with an absolute fuzzy object
are hardened according to a threshold value provided by the experts, and the resulting crisp object
is displayed. In the relative mode, the object elements whose grades of membership are higher than
those of the background are computed and displayed.
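To make the label fusion strategy more concrete, the following sketch implements a basic fuzzy
connectedness propagation from multiple seeds on a 2D image (the classic max-min path-strength
scheme). The affinity function and its parameter are simplifying assumptions for illustration; they
are not the affinity used by GliMAn:

import numpy as np
import heapq

def fuzzy_connectedness(image, seeds, sigma=0.1):
    """Fuzzy connectedness map from a set of seed pixels.

    Minimal 2D sketch: the affinity between two 4-adjacent pixels is
    exp(-(I(c)-I(d))^2 / (2*sigma^2)); the connectedness of a pixel is the
    strength of its best path to any seed (path strength = minimum affinity
    along the path). `image` is a float array in [0, 1]; `seeds` is a list
    of (row, col) tuples."""
    h, w = image.shape
    conn = np.zeros((h, w), dtype=float)
    heap = []
    for r, c in seeds:
        conn[r, c] = 1.0
        heapq.heappush(heap, (-1.0, r, c))   # max-heap via negated strength
    while heap:
        neg_k, r, c = heapq.heappop(heap)
        k = -neg_k
        if k < conn[r, c]:                   # stale heap entry
            continue
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                diff = image[r, c] - image[nr, nc]
                affinity = np.exp(-diff * diff / (2.0 * sigma * sigma))
                strength = min(k, affinity)  # path strength = weakest link
                if strength > conn[nr, nc]:
                    conn[nr, nc] = strength
                    heapq.heappush(heap, (-strength, nr, nc))
    return conn

# Absolute mode: harden `conn` with a threshold. Relative mode: compute the
# map for object seeds and for background seeds and keep pixels where the
# object connectedness is the higher of the two.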
Fig. 9 Axial, sagittal and coronal Maximum Intensity Projection (MIP) images shown by GliMAn for the
Volume of Interest (VoI) identification
4 Experiments
Fig. 10 Mean of the surface errors (a) and 2D Jaccard distances (b) computed for each expert, varying
the 4 slices segmented using the conventional and GliMAn tools

The same group of experts who worked on the operator variability analysis was involved again to
segment 4 slices for each of the 4 MRI volumes (cases 1–4) of our dataset, with the support of
GliMAn. We measured the 2D inter-variability using the
metrics described in section 2 and we compared results with those obtained using
the conventional annotator (see Fig. 10a and b). Results are expressed in terms of the mean of the
surface estimation error and the mean of the Jaccard distance, respectively, as the 4 slices and the
experts vary. The use of GliMAn led to a significant reduction of the surface estimation error, with
a maximum value equal to 16.95 % for case 1, expert 5, and a minimum value equal to −0.30 % for
case 3, expert 2. The average reduction of the Jaccard distance is equal to 5.14 %, with a maximum
value equal to 26.79 % for case 1, between experts 4 and 5, and a minimum value equal to −2.63 % for
case 3, between experts 2 and 3.
Fig. 11 Reference segmentations of source slice 1, case 2 (a, b) obtained by Fuzzy Connectedness (i)
and by majority voting (h) applied on the individual fully manual labels (c–g)
5 Conclusion
This paper analyzes and measures the inter- and intra-operator variability in glial tumor
segmentation. Based on the results obtained, a strategy for label collection and reference data
estimation was designed and implemented in the GliMAn system. As seen in our experimental context,
fully manual labeling benefits from the use of the GliMAn facilities, which preserve the volumetric
nature of the image data. The reference data estimation based on fuzzy connectedness allows consensus
segmentations to be estimated with improved reproducibility and low requirements on operator time.
Analysis of the Retinal Nerve Fiber Layer Texture Related to the Thickness Measured by Optical
Coherence Tomography
Abstract The retinal nerve fiber layer (RNFL) is one of the retinal structures most affected by
glaucoma. Progression of the disease results in RNFL atrophy, which can be detected as a decrease of
the layer's thickness. Usually, the RNFL thickness is assessed by optical coherence tomography (OCT).
However, an examination using OCT is rather expensive and still not widely available. On the other
hand, the fundus camera is a common and fundamental diagnostic device used at many ophthalmic
facilities worldwide. This contribution presents a novel approach to texture analysis enabling
assessment of the RNFL thickness in widely used colour fundus photographs. The aim is to propose a
regression model, based on different texture features, effective for the description of changes in
the RNFL textural appearance related to variations of the RNFL thickness. The performance evaluation
uses OCT as a gold-standard modality for validation of the proposed approach. The results show a high
correlation between the model's predicted output and the RNFL thickness directly measured by OCT.
1 Introduction
2 Image Database
The database contains 19 fundus image sets of healthy subjects and 8 image sets of glaucomatous
subjects with distinctive focal wedge-shaped RNFL loss. Only one eye of each subject was imaged. Each
image set contains images acquired by a common non-mydriatic digital fundus camera, CANON CR-1
(EOS 40D), with a 60-degree field of view (FOV). The images have a size of 3504 × 2336 pixels. The
standard CANON raw data format (CR2) was used for storage of the images (Fig. 1).
The database also contains three-dimensional volume data and circular scans,
acquired by spectral domain OCT system (Spectralis HRA—OCT, Heidelberg En-
gineering) for each of the 27 subjects. Infrared reflection images (scanning laser
ophthalmoscope—SLO) and OCT B–scan (cross-sectional) images were acquired
simultaneously. Acquisition of the OCT image volume (Fig. 2a) was performed
within the peripapillary area. A circular scan pattern (Fig. 2b) is usually used for glaucoma
diagnosis via OCT: a circle with a diameter of 3.4 mm is centered on the optic disc (OD) and one
B-scan is measured along this circle [18].
Fig. 1 An example of an original RGB fundus image of a healthy eye and the individual colour channels
of the image. In a standard fundus image, the red (R) channel appears oversaturated, while the green
(G) and blue (B) channels show the blood vessels and the retinal nerve fiber layer with high contrast
3 Methods
The fundus images are preprocessed in several steps. First, standard uncompressed
TIFF format is reconstructed from the raw data, whereby a linear gamma transfer function is applied
in the reconstruction process.

Fig. 2 An example of OCT volume and circular scans. a SLO image (left) with the volume scan pattern
marked by the green lines and one B-scan (right) measured at the position depicted by the blue line
in the SLO image; b SLO image (left) with the circular scan pattern defined by the blue circle and
the B-scan (right) measured along this circle in the direction given by the arrow. The curves in the
individual B-scans define the segmentation of the RNFL

Secondly, the non-uniform illumination of the fundus images is corrected and the image contrast is
increased using the CLAHE (Contrast Limited Adaptive Histogram Equalization) technique [20].
The RNFL texture is most contrasted in the green (G) and blue (B) channels of the input RGB image
(Fig. 1). Therefore, an average of the G and B channels (called the GB image) is computed for each
fundus image after CLAHE. From this point on, only the GB images are used for processing.
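A minimal sketch of this preprocessing step with OpenCV is shown below; the CLAHE parameters are
illustrative assumptions, not the values used by the authors:

import cv2
import numpy as np

def preprocess_fundus(rgb_image):
    """Build the contrast-enhanced GB image from an RGB fundus photograph.

    `rgb_image` is assumed to be an 8-bit array in R, G, B channel order
    (convert from BGR first if it was loaded with cv2.imread)."""
    r, g, b = cv2.split(rgb_image)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(32, 32))
    g_eq = clahe.apply(g)   # contrast-limited adaptive histogram equalization
    b_eq = clahe.apply(b)
    # Average the enhanced G and B channels to obtain the GB image.
    gb = ((g_eq.astype(np.float32) + b_eq.astype(np.float32)) / 2.0).astype(np.uint8)
    return gb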
In the first step, we manually selected small square-shaped image regions of interest (ROIs) with a
size of 61 × 61 pixels from all fundus images included in the group of normal subjects. Extraction of
ROIs was performed uniformly in the peripapillary area, up to a maximum distance not exceeding 1.5 ×
the diameter of the OD, and only locations without blood vessels were considered (Fig. 4). In this
way, a total of 354 ROIs was collected. The ROIs thus represent the typical RNFL pattern, depending
on the position in the peripapillary area, for normal subjects without any signs of glaucoma.
The OCT volume data were preprocessed in order to obtain the RNFL thickness in
the peripapillary area of each subject in the database. Hence, the RNFL was
segmented and the corresponding RNFL thickness map was created using freely
available research software [21]. Segmentation of the RNFL was done automatically with very good
precision, so that only subtle manual corrections had to be
performed in some B-scans using this software package (see segmentation of the
RNFL in Fig. 2), especially in the area of large blood vessels (shadow artifacts in the
B-scans). The final RNFL thickness map can be seen in Fig. 5.
Fig. 4 At the top: a section of the GB image after CLAHE processing with depiction of the ROI
positions; at the bottom: a few examples of magnified ROIs with the RNFL texture taken at different
positions in the peripapillary area (around the OD)
The advanced texture analysis methods Gaussian Markov random fields (GMRF) [24] and local binary
patterns (LBP) [25] were used for the description of the RNFL texture. These approaches were selected
due to their rotation- and illumination-invariant properties as well as their noise robustness.
Fig. 6 A fifth-order symmetric rotation-invariant neighborhood structure

$$\sigma = \frac{1}{M^2}\sum_{\Omega}\bigl(y(s) - \phi^{T} q(s)\bigr)^{2}, \qquad (4)$$

where

$$q(s) = \mathrm{col}\Bigl[\,\sum_{r \in N_i} y(s+r);\; i = 1, \ldots, I\,\Bigr], \qquad (5)$$
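The paragraph that originally introduced Eqs. 4 and 5 is missing here, so the following sketch rests
on assumptions: y(s) is the pixel intensity at site s, the N_i are the I symmetric neighbor groups of
the fifth-order neighborhood of Fig. 6, the GMRF parameter vector phi is estimated by least squares,
and sigma (Eq. 4) is the resulting mean squared residual; phi and sigma together would serve as
texture features. This is an illustrative sketch, not the estimator of [24]:

import numpy as np

def gmrf_features(patch, neighbor_groups):
    """Least-squares estimation of GMRF parameters (phi, sigma) on an image patch.

    `neighbor_groups` is a list of I groups of symmetric offsets (dr, dc);
    q(s) stacks, for each group, the sum of the neighbor intensities (Eq. 5);
    phi minimizes the residual of Eq. 4 and sigma is the mean squared error."""
    h, w = patch.shape
    m = max(abs(d) for grp in neighbor_groups for off in grp for d in off)
    ys, Q = [], []
    for r in range(m, h - m):
        for c in range(m, w - m):
            q = [sum(patch[r + dr, c + dc] for dr, dc in grp) for grp in neighbor_groups]
            Q.append(q)
            ys.append(patch[r, c])
    Q, ys = np.asarray(Q, dtype=float), np.asarray(ys, dtype=float)
    phi, *_ = np.linalg.lstsq(Q, ys, rcond=None)   # GMRF parameters
    sigma = np.mean((ys - Q @ phi) ** 2)           # Eq. (4)
    return np.concatenate([phi, [sigma]])

# Example: a first-order symmetric neighborhood (two groups of opposite offsets)
groups = [[(-1, 0), (1, 0)], [(0, -1), (0, 1)]]
features = gmrf_features(np.random.rand(61, 61), groups)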
The second applied method, LBP, is based on the conversion of the local image texture into a binary
code using the rotation-invariant and uniform LBP operator [25]. The local image texture around the
central pixel $(x_c, y_c)$ can be characterized by the LBP code, which is derived via the following
equation [25]:

$$LBP_{P,R}^{riu2}(x_c, y_c) = \begin{cases} \displaystyle\sum_{p=0}^{P-1} s(g_p - g_c) & \text{if } U(G_P) \le 2, \\ P+1 & \text{otherwise,} \end{cases} \qquad (6)$$

$$U(G_P) = \bigl|s(g_{P-1} - g_c) - s(g_0 - g_c)\bigr| + \sum_{p=1}^{P-1} \bigl|s(g_p - g_c) - s(g_{p-1} - g_c)\bigr|. \qquad (7)$$
In Eqs. 6 and 7, gc corresponds to the grey value of the central pixel (xc , yc ) of
a local neighborhood and gp (p = 0, . . . , P–1) corresponds to the grey values of P
equally spaced pixels on a circle of radius R (R > 0) that form a circularly symmetric
neighborhood structure. The LBP operator expressed by Eq. 6 assumes uniform patterns. The
“uniformity” of a pattern is ensured by the term U(G_P). Patterns with a U value of less than or
equal to two are considered “uniform” [25]. This means these patterns have at most two 0–1 or 1–0
transitions in the circular binary code.
Two variants of LBP were utilized in the proposed approach. Both variants are
based on the rotation-invariant and uniform LBP16,2 operator (i.e. P = 16, R = 2).
One variant uses only the LBP distribution computed from an input GB image. Then, the grey-level
histogram of such a parametric image is computed and the extraction of 6 statistical features
follows [25]: mean value, standard deviation, skewness, kurtosis, total energy and entropy. In the
second variant, the standard LBP distribution is
supplemented with the computation of the local contrast $C_{P,R}$:

$$C_{P,R} = \frac{1}{P}\sum_{p=0}^{P-1} (g_p - \mu)^2, \quad \text{where} \quad \mu = \frac{1}{P}\sum_{p=0}^{P-1} g_p. \qquad (8)$$
separated form as $w = \bigl(\tfrac{1}{4} - \tfrac{a}{2},\ \tfrac{1}{4},\ a,\ \tfrac{1}{4},\ \tfrac{1}{4} - \tfrac{a}{2}\bigr)$,
where a = 0.4, is utilized. Finally, a 78-dimensional feature vector is obtained via extraction of
the features from G0(i,j), G1(i,j), and G2(i,j). The composition of the final feature vector is
depicted schematically in Fig. 7.
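A minimal sketch of the LBP-based feature extraction with scikit-image follows; the histogram binning
and the exact definitions of “total energy” and “entropy” are assumptions, not necessarily those of
[25] or of the authors' implementation:

import numpy as np
from scipy.stats import skew, kurtosis
from skimage.feature import local_binary_pattern

def lbp_features(gb_roi, P=16, R=2):
    """Six statistical features from the rotation-invariant uniform LBP map
    of a GB ROI (P = 16, R = 2, as in the text)."""
    lbp = local_binary_pattern(gb_roi, P, R, method='uniform')   # riu2 codes 0..P+1
    hist, _ = np.histogram(lbp, bins=np.arange(P + 3), density=True)
    vals = lbp.ravel()
    return np.array([
        vals.mean(),                                             # mean value
        vals.std(),                                              # standard deviation
        skew(vals),                                              # skewness
        kurtosis(vals),                                          # kurtosis
        np.sum(hist ** 2),                                       # total energy
        -np.sum(hist[hist > 0] * np.log2(hist[hist > 0])),       # entropy
    ])

# The second LBP variant additionally uses the local contrast of Eq. 8, which
# scikit-image exposes as method='var':
# contrast_map = local_binary_pattern(gb_roi, P, R, method='var')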
The aim of this work is to propose the utilization of texture analysis in fundus
images for description of changes in the RNFL pattern related to variations in the
RNFL thickness. The ability of the proposed texture analysis methods, in connection
with several regression models, to predict the RNFL thickness has been investigated.
Different regression models—linear regression (LinReg) [29], two types of support
vector regression (ν-SVR, ε-SVR) [30], and multilayer neural network (NN) [31]
have been tested to predict values of the RNFL thickness using the proposed texture
features. In addition, different feature selection approaches [32] have been tested to identify the
most relevant subset of the original feature set. Finally, we chose a popular wrapper-based feature
selection strategy with a sequential forward search (SFS) method, which provided the most accurate
prediction of the RNFL thickness using the various regression models. In the SFS strategy, a standard
forward hill-climbing procedure is utilized. The procedure starts with an empty feature set and
sequentially adds the feature that yields the best improvement of the subset. This proceeds until
there is no further improvement in the performance of the feature subset. In each iteration of the
wrapper approach (i.e. for each feature subset), a cross-validation procedure is used to evaluate the
model output via a chosen evaluation criterion (e.g. mean squared error).
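The following sketch shows a wrapper-based forward selection of this kind using scikit-learn's
SequentialFeatureSelector with a ν-SVR regressor. Note that scikit-learn selects a fixed number of
features rather than stopping when no improvement occurs, and the data and parameters below are
hypothetical, for illustration only:

import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import NuSVR

# X: (354, 78) matrix of texture features, y: RNFL thicknesses from the OCT map.
# Synthetic stand-in data.
rng = np.random.default_rng(0)
X = rng.normal(size=(354, 78))
y = rng.uniform(20, 150, size=354)

# Wrapper-based forward selection: each candidate subset is scored by the
# cross-validated prediction error of the chosen regressor (here nu-SVR).
sfs = SequentialFeatureSelector(
    NuSVR(C=1.0, nu=0.5),
    n_features_to_select=9,          # fixed here; the paper stops when no improvement
    direction='forward',
    scoring='neg_mean_squared_error',
    cv=5,
)
sfs.fit(X, y)
selected = np.flatnonzero(sfs.get_support())
print("Selected feature indices:", selected)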
Spearman's rank correlation coefficient (ρ) and the root mean squared error of prediction (RMSEP)
were used as evaluation criteria of the model output. ρ is computed between the model's predicted
output y and the target variable c as follows [33]:

$$\rho = 1 - \frac{6\sum_{i=1}^{n}(c_i - y_i)^2}{n(n^2 - 1)}, \qquad (10)$$

where n is the number of samples. The values of y and c are separately ranked from 1 to n in
increasing order. $y_i$ and $c_i$ in Eq. 10 represent the ranks of the particular observations
i = 1, . . . , n of the respective variables. Spearman's rank correlation coefficient was
chosen because of two main properties: (i) it can measure a general monotonic
relationship between two variables, even when the relationship is not necessarily
linear, and (ii) it is robust to outliers due to ranking of values.
Even when the correlation between y and c is strong, the predicted values can still differ from the
target values by some error. In order to evaluate the model accuracy in the error sense, a frequently
used criterion is utilized:

$$RMSEP = \sqrt{\frac{\sum_{i=1}^{n}(c_i - y_i)^2}{n}}. \qquad (11)$$
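Both criteria are straightforward to compute; a minimal helper is sketched below (note that scipy's
spearmanr handles tied ranks, which the simplified Eq. 10 does not):

import numpy as np
from scipy.stats import spearmanr

def evaluate_prediction(y_pred, c_true):
    """Spearman's rho (Eq. 10) and RMSEP (Eq. 11) between predicted and
    target RNFL thicknesses. Illustrative helper, not the authors' code."""
    rho, p_value = spearmanr(y_pred, c_true)
    rmsep = np.sqrt(np.mean((np.asarray(c_true) - np.asarray(y_pred)) ** 2))
    return rho, p_value, rmsep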
Evaluation of the proposed approach was carried out in two stages. In the first stage,
ability of the proposed features to predict the RNFL thickness with particular regres-
sion models was evaluated. A feature vector was computed for each of the 354 ROIs.
The target variable, i.e. the vector of the RNFL thicknesses at particular locations on
the retina, was derived from the interpolated RNFL thickness map provided by the
OCT volume data. The repeated random sub-sampling cross-validation technique
was used for performance evaluation. This random sub-sampling procedure was re-
peated 100 times. In each cross-validation run, 70 and 30 % of randomly selected ROIs were utilized
for training and testing the model, respectively. The parameters ρ and RMSEP, computed between the
predicted output and the vector of RNFL thicknesses, were used to evaluate the models' performance.
In the second stage, the proposed method was evaluated utilizing entire fundus
images. Usually, the OCT device acquires a circular scan (with diameter 3.4 mm)
around the ONH and the RNFL thickness is then evaluated in this single scan [18].
Hence, the evaluation of the RNFL in fundus images was performed similarly as in OCT, in a predefined
peripapillary area. First, the blood vessels in the fundus images were extracted via our matched
filtering approach [19], so that the analysis could be conducted in the non-vessel areas only. A
circular scan pattern was manually centered on the ONH for each fundus image. This scan pattern
consists of five circles (to make the scan reasonably thick). Scanning was performed for the
individual circles and the final profile was interpolated. The same interpolation technique was used
to interpolate the final profile in the non-vessel areas as well.
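Sampling such a circular profile from an image can be done by bilinear interpolation along the
circle; a minimal sketch (one circle only, vessel masking omitted, all parameters illustrative)
follows:

import numpy as np
from scipy.ndimage import map_coordinates

def circular_profile(image, center, radius_px, n_samples=360):
    """Sample image values along a circle around the ONH center.

    Bilinear interpolation along one circle; the thick scan described above
    would average several circles of nearby radii and skip vessel pixels."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    rows = center[0] + radius_px * np.sin(angles)
    cols = center[1] + radius_px * np.cos(angles)
    return map_coordinates(image, np.vstack([rows, cols]), order=1)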
Table 1 Averaged cross-validation results of particular regression models using the wrapper-based
SFS search strategy
Model ρ[–] RMSEP [μm]
LinReg 0.7430 ± 0.0370 20.0054 ± 1.4542
(5,6,9,11,37,39,48,49,54,64,69,71,78)
ν-SVR 0.7450 ± 0.0379 19.9746 ± 1.3609
(5,6,10,32,37,39,44,58,78)
ε-SVR 0.7437 ± 0.0375 20.0587 ± 1.3689
(5,6,12,32,37,39,44,49,78)
NN 0.6497 ± 0.0469 24.5163 ± 1.7310
(1,6,19,40,44,46,78)
All values of ρ are statistically significant with p-values < 0.05
In the first step, the regression models were evaluated using above-mentioned set
of 354 ROIs. An optimal feature subset was identified for individual models by an
iterative wrapper algorithm (as mentioned in a previous section) minimizing the error
between the model output and the RNFL thickness. This way, different subsets were
selected for particular models (Table 1). As the best subsets were identified, both
ρ and RMSEP were evaluated for particular models. Cross-validation results are
presented graphically in Figs. 8 and 9, along with their averaged values in Table 1.
The selected features are numerically listed below the name of particular models in
this table.
Fig. 8 Cross-validation results of particular models with the wrapper-based SFS search strategy: ρ
computed between the models' predicted output and the RNFL thickness. The results are depicted
graphically in terms of (a) particular cross-validation runs and (b) statistical boxplot diagrams

Fig. 9 Cross-validation results for particular models using the wrapper-based SFS search strategy:
RMSEP computed between the models' predicted output and the RNFL thickness. The results are depicted
graphically in terms of (a) particular cross-validation runs and (b) statistical boxplot diagrams
Fig. 10 Relation between the ν-SVR predicted output and the RNFL thickness for a feature subset
identified via the wrapper-based SFS approach. The model output was computed for each of the
354 ROIs
by OCT.¹ Examples of the results are shown in Figs. 11 and 12 to demonstrate the major outcomes and
drawbacks of the proposed approach. In particular, Fig. 11 shows the results for the image that
achieved one of the highest performances in terms of ρ along with one of the lowest errors of
prediction (image no. 1 in Table 2). Inspecting the result in detail, one can see that the model's
predicted output correctly follows the RNFL thickness profile, with only subtle differences. Possible
deviations are probably caused by variations in image quality (blurring and the presence of noise in
a couple of images). In addition, one drawback concerns the blood vessels, which cover a rather large
area of the retina, especially in the OD surroundings. At the locations of blood vessels and in their
near neighborhood, the texture representing the RNFL is missing in fundus images. Hence, the texture
analysis needs to be carried out only at locations without blood vessels. Due to this issue, the
predicted values are reduced particularly at the locations of major blood vessel branches. However,
even in the worst case, the evaluation revealed that the results are still relevant, capturing
variations in the RNFL thickness significantly.

¹ Significance of the results was statistically validated by a t-test at the 5 % significance level.

Fig. 11 Images of circular scans of a normal subject and corresponding profiles: (a) ν-SVR model
output with the corresponding profile, (b) SLO image with the circular scan pattern and the RNFL
thickness profile. Red curves represent the polynomial approximation of each profile. The red arrow
indicates the direction of scanning

Figure 12 then shows an example of a glaucomatous subject. The performance of the method evaluated
using images of glaucomatous subjects is lower than for normal subjects. This is probably due to the
worse image quality of the glaucomatous subjects that were tested (possibly caused by cataracts and
unclear ocular media). In addition, the limited number of patients also influences the evaluation.
Mean values of RMSEP (Tables 2 and 3)
could signal a limited precision of the proposed methodology. However, these error values are
probably acceptable (especially for screening purposes), since they are comparable with the general
difference between normal and glaucomatous RNFL thickness (∼20–25 μm) [34]. Despite the drawbacks
mentioned, the evaluation revealed that the proposed methodology could satisfactorily contribute to
RNFL assessment based only on a fundus camera. The proposed texture analysis approach is able to
capture continuous variations in the RNFL thickness and thus can be used for the detection of RNFL
thinning caused by pathological changes in the retina. An additional advantage of this texture
approach is that the proposed features are invariant to changes of illumination and light reflection.
Fig. 12 Images of circular scans of a glaucomatous subject and corresponding profiles: (a) ν-SVR
model output with the corresponding profile, (b) SLO image with the circular scan pattern and the
RNFL thickness profile. Red curves represent the polynomial approximation of each profile. The red
arrow indicates the direction of scanning. The RNFL loss can be seen at the angular position of
approximately 270 degrees
5 Conclusions
A comprehensive approach to texture analysis of the RNFL in colour fundus images has been presented.
The results revealed that the proposed texture features can be satisfactorily applied for the
quantitative description of continuous variations in the RNFL thickness. The obtained values of ρ and
RMSEP confirmed the usability of the proposed approach for prediction of the RNFL thickness using
only colour fundus images. This suggests the applicability of the approach for the detection of RNFL
thinning caused by pathological changes in the retina.
One limitation of the proposed texture approach may be the requirement for high-quality fundus images
with sufficient resolution and sharpness. However, many recent fundus cameras are able to take images
with sufficient resolution, and this will no longer be a problem given the progressive development of
fundus imaging in the future.
Table 2 Evaluation of the method on the images of normal subjects. The values in brackets refer to
the approximated profiles (the red curves in Fig. 11). The values are computed for the non-vessel
locations only. Minimum and maximum values are boldfaced in each column
Image no. ρ [−] RMSEP [μm]
1 0.90 (0.98) 15.85 (10.50)
2 0.81 (0.82) 16.51 (24.89)
3 0.69 (0.88) 18.23 (14.59)
4 0.83 (0.98) 16.34 (16.25)
5 0.85 (0.96) 22.37 (22.48)
6 0.60 (0.92) 22.11 (12.75)
7 0.67 (0.91) 17.50 (11.34)
8 0.82 (0.97) 23.08 (23.50)
9 0.79 (0.90) 18.65 (12.02)
10 0.72 (0.92) 25.65 (17.81)
11 0.90 (0.99) 24.36 (24.29)
12 0.80 (0.98) 24.73 (25.49)
13 0.80 (0.92) 22.10 (22.13)
14 0.79 (0.80) 21.66 (20.43)
15 0.64 (0.90) 23.69 (15.18)
16 0.70 (0.90) 22.67 (20.71)
17 0.71 (0.95) 21.12 (17.86)
18 0.83 (0.94) 14.56 (10.58)
19 0.65 (0.99) 16.15 (13.57)
mean 0.76 (0.93) 20.39 (17.70)
std 0.09 (0.05) 3.48 (5.18)
All values of ρ are statistically significant with p-values < 0.05
In addition, some preprocessing approaches could be considered for the enhancement of the RNFL in
fundus images (e.g. as in [35]) or for improving image quality using image restoration techniques
(e.g. as in [36]). Moreover, using a contrast-enhancing optical filter may also help to enhance the
appropriate colour channels and improve the visibility of the RNFL pattern [37].
The proposed methodology is not limited to the presented texture analysis methods, namely GMRF and
LBP. Other approaches with similar noise robustness and rotation- and illumination-invariant
properties can probably be used as well. Different feature sets could then be used as input for the
regression models. Hence, in further development, the addition of other texture features could be
considered.
The performance evaluation has so far been performed with a limited sample size (especially of
glaucomatous subjects). Nevertheless, evaluation on normal subjects is
Table 3 Evaluation of the method on the images of glaucomatous subjects. The values in brackets refer
to the approximated profiles (the red curves in Fig. 12). The values are computed for the non-vessel
locations only. Minimum and maximum values are boldfaced in each column
Image no. ρ [−] RMSEP [μm]
1 0.66 (0.71) 20.08 (19.49)
2 0.59 (0.68) 20.44 (18.05)
3 0.57 (0.82) 12.63 (10.48)
4 0.36 (0.36) 24.23 (21.30)
5 0.37 (0.41) 32.05 (28.91)
6 0.69 (0.75) 27.45 (25.24)
7 0.53 (0.82) 18.97 (12.91)
8 0.45 (0.43) 23.71 (18.12)
mean 0.53 (0.62) 22.44 (19.31)
std 0.13 (0.19) 5.86 (6.02)
All values of ρ are statistically significant with p-values < 0.05
Acknowledgments This work has been supported by the European Regional Development Fund (Project
FNUSA-ICRC, No. CZ.1.05/1.1.00/02.0123). In addition, the authors gratefully acknowledge funding of
the Erlangen Graduate School in Advanced Optical Technologies (SAOT) by the German Research
Foundation (DFG).
References
1. Bock R, Meier J, Nyul L et al (2010) Glaucoma risk index: automated glaucoma detection
from color fundus images. Med Image Anal 14:471–481
2. Hoyt WF, Frisen L, Newman NM (1973) Fundoscopy of nerve fiber layer defects in glaucoma.
Invest Ophthalmol Vis Sci 12:814–829
3. Airaksinen JP, Drance MS, Douglas RG et al (1984) Diffuse and localized nerve fiber loss in
glaucoma. Am J Opthalmol 98(5):566–571
4. Peli E, Hedges TR, Schwartz B (1989) Computer measurement of the retina nerve fiber layer
striations. Appl Optics 28:1128–1134
5. Yogesan K, Eikelboom RH, Barry CJ (1998) Texture analysis of retinal images to determine
nerve fibre loss. Proceedings of the 14th International Conference on Pattern Recognition, vol
2, Aug. 16–20, Brisbane, Australia, pp 1665–1667
6. Dardjat MT, Ernastuti E (2004) Application of image processing technique for early diagnosis
and monitoring of glaucoma. Proceedings of KOMMIT, Aug. 24–25, Jakarta, pp 238–245
7. Lee SY, Kim KK, Seo JM et al (2004) Automated quantification of retinal nerve fiber layer
atrophy in fundus photograph, Proceedings of 26th IEEE IEMBS, pp 1241–1243
8. Hayashi Y, Nakagawa T, Hatanaka Y et al (2007) Detection of retinal nerve fiber layer defects
in retinal fundus images using Gabor filtering. Proceedings of SPIE, vol 6514, pp 65142Z
9. Muramatsu Ch, Hayashi Y, Sawada A et al (2010) Detection of retinal nerve fiber layer defects
on retinal fundus images for early diagnosis of glaucoma. J Biomed Opt 15(1):1–7
10. Oliva AM, Richards D, Saxon W (2007) Search for color–dependent nerve–fiber–layer thinning
in glaucoma: a pilot study using digital imaging techniques. Proc Invest Ophthalmol Vis Sci
2007 (ARVO), May 6–10, 2007, Fort Lauderdale, USA, E-Abstract 3309
11. Prageeth P, Sukesh K (2011) Early detection of retinal nerve fiber layer defects using fun-
dus image processing. Proc. of IEEE Recent Advances in Intelligent Computational Systems
(RAICS), Sept. 22–24, Trivandrum, India, pp 930–936
12. Kolar R, Jan J (2008) Detection of glaucomatous eye via color fundus images using fractal
dimensions. Radioengineering 17(3):109–114
13. Novotny A, Odstrcilik J, Kolar R et al (2010) Texture analysis of nerve fibre layer in retinal
images via local binary patterns and Gaussian Markov random fields, Proceedings of 20th
International EURASIP Conference (BIOSIGNAL 2010), Brno, Czech Republic, pp 308–315
14. Acharya UR, Dua S, Du X, Sree SV et al (2011) Automated diagnosis of glaucoma using
texture and higher order spectra features. IEEE Trans Inf Technol Biomed 15:449–455
15. Odstrcilik J, Kolar R, Jan J et al (2012) Analysis of retinal nerve fiber layer via Markov random
fields in color fundus images, Proceedings of 19th International Conference on Systems, Signals
and Image Processing (IWSSIP 2012), Vienna, Austria, pp 518–521
16. Odstrcilik J, Kolar R, Tornow RP et al (2013) Analysis of the retinal nerve fiber layer texture
related to the thickness measured by optical coherence tomography, Proceedings of VIPimage
2014 conference, Funchal-Madeira, Portugal, pp 105–110
17. Jan J, Odstrcilik J, Gazarek J et al (2012) Retinal image analysis aimed at blood vessel tree
segmentation and early detection of neural–layer deterioration. Comput Med Imag Graph
36:431–441
18. Bendschneider D, Tornow RP, Horn F et al (2010) Retinal nerve fiber layer thickness in normal
measured by spectral domain OCT. J Glaucoma 19(7):475–482
19. Odstrcilik J, Kolar R, Budai A et al (2013) Retinal vessel segmentation by improved matched
filtering: evaluation on a new high-resolution fundus image database. IET Image Process
7(4):373–383
20. Pizer SM, Amburn EP, Austin JD (1987) Adaptive histogram equalization and its variations.
Comput Vis Graph Image Proc 39:355–368
21. Mayer M, Hornegger J, Mardin CY, Tornow RP (2010) Retinal nerve fiber layer segmentation on
FD–OCT scans of normal subjects and glaucoma patients. Biomed Opt Express 1:1358–1383
22. Kolar R, Harabis V, Odstrcilik J (2013) Hybrid retinal image registration using phase
correlation. Imaging Sci J 61(4):269–384
23. Ghassabi Z, Shanbehzadeh J, Sedeghat A, Fatemizadeh E (2013) An efficient approach for
robust multimodal retinal image registration based on UR-SIFT features and PIIFD descriptors.
EURASIP J Image Video Proc 25:1–16
24. Porter R, Canagarajah N (1997) Robust rotation–invariant texture classification: wavelet, Gabor
filter and GMRF based schemes. IEEE Proc Vis–Image Signal Proc 144(3):180–188
25. Ojala T, Pietikäinen M, Mäenpää T (2002) Multiresolution gray-scale and rotation invari-
ant texture classification with Local Binary Patterns. IEEE Trans Pattern Anal Mach Intell
24(7):971–987
26. Haralick RM, Shanmugan K, Dinstein I (1973) Textural Features for Image Classification.
IEEE Trans Syst, Man, Cybern 3(6):610–621
27. Othmen MB, Sayadi M, Fnaiech F (2008) A multiresolution approach for noised texture clas-
sification based on co–occurrence matrix and first–order statistics. World Acad Sci, Eng Tech
39:415–421
28. Burt P (1983) The Laplacian pyramid as a compact image code. IEEE Trans Commun
31(4):532–540
29. Murphy KP (2012) Machine learning: a probabilistic perspective. The MIT Press, Cambridge,
p 1067
30. Chang Ch, Lin Ch (2011) LIBSVM: a library for support vector machines. ACM Trans Intell
Syst Technol 27(2):1–27
31. Mandic DP, Chambers JA (2001) Recurrent neural networks for prediction. Wiley, New York,
p 285
32. Liu H, Motoda H (2007) Computational methods of feature selection. Chapman Hall/CRC
Data Mining and Knowledge Discovery Series, Boca Raton, p 440
33. Indrayan A (2008) Medical biostatistics, 2nd ed. Chapman and Hall/CRC, Boca Raton, p 771
34. Madeiros FA, Zangwill LM, Bowd C, Vessain RM, Susanna R et al (2005) Evaluation of
retinal nerve fiber layer, optic nerve head, and macular thickness measurements for glaucoma
detection using optical coherence tomography. Am J Ophthalmol 139:44–55
35. Frisén L (2007)Anisotropic enhancement of the retinal nerve fiber layer. Neuro-Ophthalmology
31(4):99–103.
36. Marrugo AG, Šorel M, Šroubek F et al (2011) Retinal image restoration by means of blind
deconvolution. J Biomed Optics 16(11):1–11
37. Tornow RP, Laemmer R, Mardin C et al (2007) Quantitative imaging using a fundus camera.
Proceeding of Invest Ophthalmol Vis Sci (ARVO), Fort Lauderdale, USA, vol 48, E-Abstract
1206
Continuum Mechanics Meets Echocardiographic
Imaging: Investigation on the Principal Strain
Lines in Human Left Ventricle
Abstract We present recent investigations on the state of strain in the human left ventricle based on
the synergy between continuum mechanics and echocardiographic imaging. When data from
three-dimensional Speckle Tracking Echocardiography are available, special strain directions can be
detected on the epicardial and endocardial surfaces, which are well known in continuum mechanics as
principal strain lines (PSLs), further classified into primary and secondary strain lines. An
appropriate investigation of PSLs can help to identify the lines where strains are largest (primary)
and smallest (secondary). As PSLs change when cardiac diseases appear, the challenge is that such an
analysis may allow for the identification of new indicators of cardiac function.
1 Introduction
The heart is a specialised muscle that contracts regularly and pumps blood to the
body and the lungs. The core of the pumping function lies in the ventricles; due to the higher pressures involved, the left ventricle (LV) is especially studied. On a simplistic
level, LV is a closed chamber, whose thick walls are composed of muscle fibres. It is
the contraction originated in the muscles that translates into pressure and/or volume
changes of the chamber. Moreover, the helicoidal fibres make relevant the torsion of
the chamber with respect to the longitudinal axis due to both pressure changes and
muscle contraction. The LV cycle may be schematized as a sequence of four steps: filling (the diastolic phase); isovolumetric contraction; ejection (the systolic phase); isovolumetric relaxation [1].
During the cycle, both pressure and volume vary in time, and a quite useful
determinant of the cardiac performance is the plot representing the pressure-volume
relationship in the LV during the entire cycle, that is, the PV loop; some of the many
clues contained in the plot (see Fig. 1) are briefly summarized in the following. Point
1 defines the end of the diastolic phase and is characterized by the end-diastolic
volume (EDV) and pressure (EDP); at this point the mitral valve closes and cardiac
muscle starts to contract in order to increase the blood pressure. At point 2 the
systolic phase begins: the aortic valve opens and blood is ejected outside the LV;
muscle keeps on contracting in order to further the ejection, while volume decreases
to a minimum. Point 3 defines the end of the systolic phase, and is characterized by
the end-systolic volume (ESV) and pressure (ESP); starting from here, LV undergoes
an isovolumic relaxation until point 4, where mitral valve opens and filling begins.
During the filling phase, the muscle keeps on relaxing in order to accommodate a large
increase in blood volume, while maintaining the pressure at a quite low level. Filling
is completed at point 1. The difference between maximum and minimum volume is
called stroke volume (SV): SV := EDV − ESV . From a mechanical point of view,
the most intense work is performed along the pattern from point 2 to point 3, that
is, along the systolic phase, when both pressure and muscle contraction are high.
Typically, critical behaviors of the ventricular function are evidenced in this phase,
and mechanics can suggest good indicators of cardiac function. A relevant requirement on these indicators is that they can be captured through noninvasive analyses.
A well-known example is ventricular torsion [23]. The role played by the LV
torsional rotation with respect to LV ejection and filling was only recently recognized
by application of speckle tracking echocardiography, whose output includes, among other things, the pattern of ventricular torsion along the cardiac cycle [4, 5, 10–12, 16]. As ventricular torsion is altered when certain pathologies are present
(see [3, 13, 18, 22, 24, 28]), it can be used as an indicator of cardiac function
which can be noninvasively investigated through 3-dimensional speckle tracking
echocardiography (3DSTE).
Detection of principal strain lines in LV may emerge as another possible non-
invasive tool to discriminate among different LVs as well as a tool that can help
clinicians to identify cardiac diseases at the early stages. On the other hand, and
differently from ventricular torsion, PSLs are not delivered as output by 3DSTE
devices, and a post-processing analysis of 3DSTE data is needed to identify them,
based on concepts borrowed from continuum mechanics. In [17, 20], it was initially
proposed to look at PSLs to identify muscle fiber architecture on the endocardial sur-
face. Therein, the echocardiographic analysis was limited to the endocardial surface,
and it was noted that, due to the high contraction undergone by muscle fibres along the systolic phase, PSLs may well identify muscle fiber directions. Subsequently,
in [8] an accurate protocol of measurement of PSLs was proposed, tested, and successfully verified through a computational model. The conclusions of this last work
were partially in contrast with the ones in [20]. It was demonstrated, firstly, that on
the endocardial surface of healthy LVs, primary strain lines identify circumferential
material directions; secondly, that on the epicardial surface primary strain lines are
similar to muscle fiber directions. In [6, 8, 9], a comparison between a real human LV
and a corresponding model was implemented by the same Authors; the conclusions
of [8] were confirmed, and made precise through a statistical analysis involving real
and computational data.
What is emerging, even if further investigations are needed, is that endocardial
PSLs coincide with circumferential material lines, due to the relevant stiffening effect of the circumferential material lines when high pressures are involved, as occurs along the systolic phase, and to the capacity of the same material fibers to counteract the LV dilation. It would mean that these visible functional strain lines are related to
the capacity of elastic response of the cardiac tissue to the high systolic pressure, and
that it might be important to follow this pattern when, due to pathological conditions,
this capacity is missing.
2 Continuum Cardio-Mechanics
strain lines and, wherever needed, fiber architecture is conceived in order to make the
fiber lines coincide with the PSLs. Fibers make a tissue highly anisotropic; hence,
principal strain and stress lines may be distinct. Whereas principal strains can be
measured starting with the analysis of tissue motion, being only dependent on the
three-dimensional strain state of the tissue, principal stresses can only be inferred.
Thus, PSLs have a predominant role when the analysis of the mechanics of a body is concerned, and can reveal the lines where the largest strains are expected,
and how they change when diseases occur.
The key point is the evaluation, at any place within the body, of the nonlinear strain tensor C, whose eigenpairs (eigenvalues and eigenvectors) deliver the principal strains and the PSLs, respectively. Given a body, identified with the region B of the three-dimensional Euclidean space E it occupies at a time t₀, denoted as the reference configuration of the body, we are interested in following the motion of the body at any time t ∈ I ⊂ R, with the time interval I identifying the duration of a human cardiac cycle (hence, different from subject to subject, as discussed later). The displacement field u, a map from B × I into V = T E, delivers at any time and for any point y ∈ B the position p(y, t) of that point at that time: p(y, t) = y + u(y, t). Strains are related to displacement gradients within the body; precisely, it can be shown that, once the deformation gradient F = ∇p = I + ∇u is introduced, the nonlinear Cauchy-Green strain tensor is

$$C = F^T F = I + \nabla u + \nabla u^T + \nabla u^T\, \nabla u\,, \qquad (1)$$
where I is the identity tensor on V. In general, C is a three-dimensional tensor, describing
the strain state at any point y and time t of the body. If there is within the body a
distinguished surface S, whose unit normal field is described by the unit vector field
n, the corresponding surface strain tensor Ĉ can be obtained through a preliminary
projection of C onto that surface. The projector P = I − n ⊗ n leads to the following
definition:
Ĉ = PCP . (2)
It is expected that Ĉ will represent a plane strain state, hence, that it will have a
zero eigenvalue corresponding to the eigenvector n. The primary strain lines on the
surface will be the streamlines of the eigenvector c2, which lies on the surface and corresponds to the smallest non-zero eigenvalue; the secondary strain lines are the
streamlines corresponding to the eigenvector c3 . Of course, when the strain tensor
C is ab initio evaluated from surface deformation gradients F̌ = PFP, it naturally
arises as a plane tensor.
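To make the projection step concrete, the following NumPy sketch (our illustration, not the authors' code) forms Ĉ = PCP for a given C and unit normal n and extracts the in-plane eigenpairs; the deformation gradient used in the example is hypothetical.

```python
import numpy as np

def surface_strain_eigenpairs(C, n):
    """Project the 3x3 Cauchy-Green tensor C onto the surface with unit normal n
    (Eq. 2) and return the in-plane principal strain eigenpairs."""
    n = n / np.linalg.norm(n)
    P = np.eye(3) - np.outer(n, n)          # projector P = I - n (x) n
    C_hat = P @ C @ P                        # surface strain tensor, Eq. (2)
    w, v = np.linalg.eigh(C_hat)             # eigenvalues in ascending order
    align = np.abs(v.T @ n)                  # alignment of eigenvectors with n
    keep = np.argsort(align)[:2]             # drop the (zero) out-of-plane mode
    keep = keep[np.argsort(w[keep])]
    primary = (w[keep[0]], v[:, keep[0]])    # smallest non-zero eigenvalue
    secondary = (w[keep[1]], v[:, keep[1]])
    return primary, secondary

# Hypothetical example: circumferential shortening with longitudinal stretch.
F = np.diag([0.8, 1.1, 1.0])
primary, secondary = surface_strain_eigenpairs(F.T @ F, np.array([0.0, 0.0, 1.0]))
```

In this hypothetical example the primary strain line is the direction that shortens the most, consistent with the convention used in the text.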
Fig. 2 Speckles moving with tissue as viewed through STE (left); the apical four chamber view
(A); the second apical view orthogonal to plane A (B); three short-axis planes (C), in the apical
region (C1), in the mid-ventricle (C2), and at the basal portion of the LV (C3) (right) (unmodified
from the original ARTIDA image)
in the body has a unique speckle pattern that moves with tissue (Fig. 2, left panel).
A square or cubic template image is created using a local myocardial region in the
starting frame of the image data. The size of the template image is around 1 cm² in 2D or 1 cm³ in 3D. In the subsequent frame, the algorithm identifies the local
speckle pattern that most closely matches the template (see [29] for further details).
A displacement vector is created using the location of the template and the matching
image in the subsequent frame. Multiple templates can be used to observe displace-
ments of the entire myocardium. By using hundreds of these samples in a single
image, it is possible to provide regional information on the displacement of the LV
walls, and thus, other parameters such as strain, rotation, twist and torsion can be
derived.
Echocardiographic examinations were performed with an Aplio-Artida ultrasound
system (Toshiba Medical Systems Co, Tochigi, Japan). Full-volume ECG-gated 3D
data sets were acquired from apical positions using a 1–4 MHz 3D matrix array
transducer to visualize the entire LV in a volumetric image. To obtain these 3D data
sets, four or six sectors were scanned from consecutive cardiac cycles and combined
to provide a larger pyramidal volume covering the entire LV. The final LV geometry
was reconstructed by starting from a set of 6 homologous landmarks (see Fig. 2),
manually detected by the operator for all subjects under study. The manual detection
for a given set of landmarks is crucial because it allows recording spatial coordinates
in perfectly comparable anatomical structures of different subjects (following a ho-
mology principle). The output of our 3DSTE system is a time-sequence of shapes, each constituted by 1297 landmarks, assumed to be homologous, for both the epicardial and endocardial surfaces, positioned along 36 horizontal circles of 36 landmarks each, plus the apex (see Fig. 3).
Typically, the results of the 3D-wall motion analysis are presented to the user as
averaged values for each segment identified by the device according to the American Heart Association standardized segmentation [2].
Fig. 3 The markers automatically set by the software supporting 3DSTE are shown as small yellow points, both on three planes taken perpendicular to the LV axis (left panel) and on two vertical
sections (right panel). In particular, in the figure the color code corresponds to the torsional rotation
of the LV at the beginning of the cardiac cycle (as evidenced by the small bar at the right bottom
corner of the figure)
In our case, it was possible to obtain the landmark clouds (upon which the standard
rotational, torsional and strain parameters are computed and outputted by each Artida
machine) by an unlocked version of the software equipping our PST25SX Artida
device, thanks to a special opportunity provided in the context of an official research
and development agreement between the Dipartimento di Scienze Cardiovascolari,
Respiratorie, Nefrologiche Anestesiologiche e Geriatriche (Sapienza Università di
Roma) and Toshiba Medical Systems Europe (Zoetermeer, The Netherlands).
Our 3DSTE data were based on the acquisition made on a group of volunteers,
who were randomly selected from the local list of employees at a single University
Hospital Department. Individuals were subjectively healthy without a history of
hypertension or cardiac disease and were not taking medications. They all had normal
ECG and blood pressure below 140/90 mmHg [25]. Since the aim of the present
Fig. 4 Circumferential strains versus time: mean values of the circumferential strains on the six
segments of the mid-myocardium identified by their acronyms (MA for middle-anterior, MAS
for middle antero-septum, MS for middle infero-septum, MI for middle inferior, MP for middle
posterior, ML for middle lateral); (dashed lines); mean value at mid-myocardium (solid, magenta)
work is the analysis of the primary and secondary strain-line patterns in the LV walls, raw data from 3DSTE are processed in MATLAB, as prescribed by the protocol of measurement proposed and tested in [8], and briefly summarized in the next section.
Starting from 3DSTE data on wall motion and using the protocol proposed and
verified in [8], the surface nonlinear strain tensor C on the LV epicardium and
endocardium can be evaluated. Precisely, C is evaluated in correspondence of the
landmarks (see Fig. 3), at each time along the cardiac cycle.
As already written, the real LV is identified by a cloud of 36 × 36 × 2 + 1 points
(called markers pi ) whose motion is followed along the cardiac cycle: the position of
each of the (36 × 36) × 2 points pi (i = 1, …, 36 × 36 × 2) is registered by the device at
each time frame j of the cardiac cycle, and represented through the set of its Cartesian
coordinates. These coordinates refer to a system represented by the i3 axis defined
by the longitudinal LV axis and the (i1 , i2 ) axes on the orthogonal planes. The clouds
of markers are intrinsically ordered. Figure 5 shows the endocardial (left panel)
and epicardial (right panel) clouds Sendo and Sepi of points corresponding to our
representative individual within the sample survey. To each point P ∈ Sendo (Sepi ),
identified within the intrinsic reference system by the pairs of 3DSTE coordinates z
and φ, corresponds a set of n positions within the Cartesian coordinate system, where
Fig. 5 Cloud of 1296 points automatically identified by the software on the endocardial (left panel,
green empty dots) and epicardial (right panel, violet empty dots) surface, so as rendered by MatLab
for a human subject within our group
n is the number of equally spaced frames registered by the device along the cardiac
cycle. Moreover, let Pz ∈ Sendo and Pφ ∈ Sendo be the points close to the point P
in the 3DSTE topology, i.e. identified within the intrinsic reference system by the
pairs (z + hz, φ) and (z, φ + hφ) of 3DSTE coordinates, where hz = H(LV)/36,
hφ = 2π/10, and H (LV ) the height of the LV model. The vectors Pz − P and
Pφ − P span a non-orthonormal covariant basis (a1 , a2 ) which corresponds to the
3DSTE coordinate system. The corresponding contravariant basis (a¹, a²) can be
easily evaluated. Let p, pz , and pφ denote the positions occupied by the points P ,
Pz , and Pφ respectively at the frame j ; they define the covariant basis ã1 = pz − p
and ã2 = (pφ − p).
Both aα and ãα are known in terms of their Cartesian coordinates. Thus, the
following holds:
$$\tilde a_1 = \lambda^{z}_{i}(j)\, i_i \qquad \text{and} \qquad \tilde a_2 = \lambda^{\phi}_{i}(j)\, i_i\,, \qquad (3)$$

where j refers to the frame along the cardiac cycle;

$$a_1 = \lambda^{z}_{i}\, i_i \qquad \text{and} \qquad a_2 = \lambda^{\phi}_{i}\, i_i\,, \qquad (4)$$

where $\lambda^{\phi}_{i} = \lambda^{\phi}_{i}(0)$ and $\lambda^{z}_{i} = \lambda^{z}_{i}(0)$. At each point, the nonlinear strain tensor C can be evaluated through its components

$$C_{\beta\delta} = F^{\alpha}_{\beta}\, F^{\gamma}_{\delta}\, (a_{\alpha} \cdot a_{\gamma})\,, \qquad \alpha, \beta = 1, 2, \qquad (5)$$

with

$$F^{\alpha}_{\beta} = F a_{\beta} \cdot a^{\alpha} = \tilde a_{\beta} \cdot a^{\alpha}\,. \qquad (6)$$
Fig. 6 Representation, in one subject, of the endocardial primary and secondary strain lines (from
left to right: panels 1 and 2); and of the epicardial primary and secondary strain lines (from left to right: panels 3 and 4). Note the behavior of the primary strain lines in the middle (green) part of the LV
The eigenvalue analysis on C reveals a plane strain state, thus delivering the ex-
pected results concerning the primary and secondary strain lines. The corresponding
eigenvalue-eigenvector pairs are denoted as (γ̄α , c̄α ), where α = 2, 3.
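For concreteness, a minimal NumPy/SciPy sketch of the per-landmark evaluation of Eqs. (3)-(6) could read as follows; the marker positions are hypothetical stand-ins for 3DSTE landmarks, and the generalized eigenvalue solve is our way of accounting for the non-orthonormal covariant basis, not necessarily the authors' implementation.

```python
import numpy as np
from scipy.linalg import eigh

def strain_at_landmark(P0, Pz0, Pphi0, Pj, Pzj, Pphij):
    """Surface Cauchy-Green tensor at one landmark from the positions of the
    landmark and of its z- and phi-neighbours in the reference frame (suffix 0)
    and in the current frame j (all positions are hypothetical 3D points)."""
    a = np.stack([Pz0 - P0, Pphi0 - P0])     # covariant basis a_1, a_2 (Eq. 4)
    at = np.stack([Pzj - Pj, Pphij - Pj])    # current basis a~_1, a~_2 (Eq. 3)
    g = a @ a.T                               # metric a_alpha . a_gamma
    a_contra = np.linalg.inv(g) @ a           # contravariant basis a^1, a^2
    F = a_contra @ at.T                       # F[alpha, beta] = a~_beta . a^alpha (Eq. 6)
    C = F.T @ g @ F                           # components C[beta, delta] (Eq. 5)
    # Principal strains/directions from the generalized problem C c = gamma g c,
    # which accounts for the non-orthonormal covariant basis.
    gammas, cvecs = eigh(C, g)
    return C, gammas, cvecs

# Hypothetical marker positions at the reference frame and at frame j.
P0, Pz0, Pphi0 = (np.array([1.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.3]),
                  np.array([0.98, 0.2, 0.0]))
Pj, Pzj, Pphij = (np.array([0.9, 0.0, 0.0]), np.array([0.9, 0.0, 0.33]),
                  np.array([0.89, 0.16, 0.0]))
C, gammas, cvecs = strain_at_landmark(P0, Pz0, Pphi0, Pj, Pzj, Pphij)
```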
Through the protocol shortly summed up in the previous section, we can evaluate
PSLs corresponding to different subjects. We started with the evaluation of primary
and secondary strains and strain lines at the systolic peak. As the cardiac cycle's duration is different from subject to subject, we first needed to fix a few points along the cardiac cycle identifying homologous times, that is, times corresponding to the occurrence of special mechanical and electrical events which can be identified along any cardiac cycle. In this way, we associate with the real-time scale, based on the finite number of frames captured by the 3DSTE device along the cardiac cycle, a new time scale based on 6 homologous times which are the same along any cardiac cycle.
The systolic time was identified as the one corresponding to the end systolic volume.
Other homologous times before the systolic one are those corresponding to the peak
of R wave and to the end of T wave; homologous times after the systolic one are those
corresponding to the mitral-valve opening, to the end of rapid filling (beginning of
diastasis), and to the onset of Q wave.
Figures 6 and 7 show endocardial and epicardial primary and secondary strain
lines corresponding to the homologous systolic times and to two different human LVs,
chosen among our data as representative of the group. The colors identify different
parts of the LVs: grey for the apical part, green for the middle part, and orange for the
basal part. Each line identifies the direction of the primary and secondary strain line
at a point of the endocardial and epicardial cloud defined by 3DSTE. The endocardial
and epicardial surfaces which in the figure represent the support for the strain lines
correspond to the images of those surfaces at the systolic time.
Fig. 7 Representation, in another subject, of the endocardial primary and secondary strain lines
(from left to right: panels 1 and 2); and of the epicardial primary and secondary strain lines (from left to right: panels 3 and 4). Note the behavior of the primary strain lines in the middle (green) part of the LV
As Figs. 6 and 7 show, the LV endocardial primary strain lines (first panel from
left to right in both figures) have the same circumferential pattern evidenced in the
model we studied in [8], even if in the basal part of the LV, the influence of the stiffer
structure of the mitral annulus alters the circumferential pattern. Even if volumes and
shapes are different one from another, endocardial primary strain lines are almost
the same and circumferential. The first and second panels (from left to right) in both Figs. 6 and 7 refer to the endocardial primary and secondary strain lines; the secondary lines appear prevalently longitudinal in the middle part of the ventricle, consistent with the prevalent circumferential orientation of the primary strain lines there. The third and
fourth panels show instead epicardial primary and secondary strain lines. The pat-
tern of primary strain lines is less regular, even over the middle part of the ventricle,
which should not be much influenced by the stiffer structure of the mitral annulus.
However, in both cases, different zones are evident where epicardial primary strain
lines follow lines which resemble muscle fiber directions. The results we obtained from our investigations, even if they need to be supported by further data, allowed us to make a conjecture based on the pattern of endocardial primary strain lines. We conjecture that the inflation-induced dilation due to blood pressure is more effective for subendocardial layers, which, by dilating, reduce the circumferential shortening induced by muscle contraction. The better the elastic response of the endocardial surface, the smaller the dilation induced by blood pressure.
This means that the capacity to counteract blood pressure is reduced in patients with volume overload, hence the primary strain values which correspond to the circumferential principal strain lines are smaller. It follows that studying the behavior of primary strain lines when remodeling effects take place in the left ventricle, due to the onset of cardiac pathologies, is one of our future objectives [21].
Importantly, the noninvasive analysis of this kind of data may be easily supported
within a 3DSTE device, through the post-processing method we proposed in [8] and
briefly summarized here. As an example, and with reference to the same human subject whose 3DSTE circumferential strains were shown in Fig. 4, we show in Fig. 8 the endocardial and epicardial mean values, taken over the middle part of the
Fig. 8 Representation of the pattern of the mean values, over the middle part of the left ventricle,
of the primary (left panel) and secondary (right panel) epicardial and endocardial strains along the
cardiac cycle
left ventricle, of the primary and secondary strains. From a large-scale investigation, it might be possible to infer appropriate confidence intervals for these values in healthy subjects.
Acknowledgements The work is supported by Sapienza Università di Roma through the grants N.
C26A11STT5 and N. C26A13NTJY. The authors wish to express their gratitude to Willem Gorissen,
Clinical Market Manager Cardiac Ultrasound at Toshiba Medical Systems Europe, Zoetermeer, The Netherlands, for his continuous support and help.
References
1. Burkhoff D, Mirsky I, Suga H (2005) Assessment of systolic and diastolic ventricular properties
via pressure-volume analysis: a guide for clinical, translational, and basic researchers. AJP-
Heart 289:501–512
2. Cerqueira MD, Weissman NJ, Dilsizian V, Jacobs AK, Kaul S, Laskey WK, Pennel DJ, Rum-
berger JA, Ryan T, Verani MS (2002) Standardized myocardial segmentation and nomenclature
for tomographic imaging of the heart: a statement for healthcare professionals from the Cardiac
Imaging Committee of the Council on Clinical Cardiology of the American Heart Association.
Circulation 105:539–542
3. DeAnda A, Komeda M, Nikolic SD, Daughters GT, Ingels NB, Miller DC (1995) Left
ventricular function, twist, and recoil after mitral valve replacement. Circulation 92:458–466
4. Evangelista A, Nesser J, De Castro S, Faletra F, Kuvin J, Patel A, Alsheikh-Ali AA, Pandian
N (2009) Systolic wringing of the left ventricular myocardium: characterization of myocardial
rotation and twist in endocardial and midmyocardial layers in normal humans employing
three-dimensional speckle tracking study. (Abstract) J Am Coll Cardiol 53(A239):1018–268
5. Evangelista A, Nardinocchi P, Puddu PE, Teresi L, Torromeo C, Varano V (2011) Torsion of
the human left ventricle: experimental analysis and computational modelling. Prog Biophys
Mol Biol 107(1):112–121
6. Evangelista A, Gabriele S, Nardinocchi P, Piras P, Puddu PE, Teresi L, Torromeo C,
Varano V (2015) On the strain-line pattern in the real human left ventricle. J Biomech.
doi:10.1016/j.jbiomech.2014.12.028, published online 15 Dec 2015
1 Introduction
Wireless capsule endoscopy (WCE), also called capsule endoscopy (CE), is a non-invasive endoscopic procedure which allows visualization of the small intestine, a region difficult to reach by conventional endoscopies, without sedation or anesthesia. As the name implies, capsule endoscopy makes use of a swallowable capsule
that contains a miniature video camera, a light source, batteries, and a radio trans-
mitter (see Fig. 1). The capsule takes images continually during its passage down the small intestine. The images are transmitted to a recorder that is worn on a belt around
the patient’s waist. The whole procedure lasts 8 h, after which the data recorder
is removed and the images are stored on a computer so that physicians can review
them and analyze the potential source of diseases. Capsule endoscopy is useful for
detecting small intestine bleeding, polyps, inflammatory bowel disease (Crohn’s dis-
ease), ulcers, and tumors. It was invented by Given Imaging in 2000 [12]. Since
its approval by the FDA (U.S. Food and Drug Administration) in 2001, it has been
widely used in hospitals.
Although capsule endoscopy demonstrates a great advantage over conventional
examination procedures, some improvements remain to be made. One major issue
with this new technology is that it generates approximately 56,000 images per exam-
ination for one patient, whose analysis is very time consuming. Furthermore, some
abnormalities may be missed because of their size or distribution, due to visual fa-
tigue. So, it is of great importance to design a real-time computerized method for the
inspection of capsule endoscopic images. Given Imaging Ltd. has also developed the
so called RAPID software for detecting abnormalities in CE images. But its sensitiv-
ity and specificity, respectively, were reported to be only 21.5 and 41.8 % [10], see
also [19]. Recent years have witnessed some development on automatic inspection
of CE images, see [1, 4–6, 7, 9, 14, 15, 18, 20].
The main indication for capsule endoscopy is obscure digestive bleeding [5, 9,
14, 18, 20]. In fact, in most of these cases, the source of the bleeding is located in the
small bowel. However, often, these bleeding regions are not imaged by the capsule
endoscopy. This is why the blood detection is so important when we are dealing
with capsule endoscopy. The current work is an extension of the paper [8], where
an automatic blood detection algorithm for CE images was proposed. Utilizing Ohta
color channel (R+G+B)/3 (where R, G and B denote the red, green and blue channel,
respectively, of the input image), we employed an analysis of the eigenvalues of the image Hessian matrix and a multiscale image analysis approach for designing a function
to discriminate between blood and normal frames. The experiments show that the
algorithm is very promising in distinguishing between blood and normal frames.
However, the sequential algorithm is not able to process the huge number of images produced by the WCE examination of a patient within a reasonably short amount of time. The computations of the algorithm can, however, be parallelized, and thus the huge number of images can be processed within a much shorter time. In the algorithm
we identified two crucial steps, segmentation (for discarding non-informative regions
in the image that can interfere with the blood detection) and the construction of an
appropriate blood detector function, as being responsible for taking most of the global
processing time. We propose a suitable GPU-based framework for speeding up the
segmentation and blood detection execution times, and hence the global processing
time. Experiments show that the accelerated procedure is on average 50 times faster
than the original one, and is able to process 72 frames per second.
This chapter is structured as follows. A choice of the suitable color channel is
made in Sect. 2.1 and segmentation of informative regions is done in Sect. 2.2. A
blood detector function is introduced in Sect. 2.3. The outline of the algorithm is
given in Sect. 2.4. Validation of the algorithm on our current data set is provided in
Sect. 3. The GPU procedure for speeding up the segmentation and blood detection
is described in Sect. 4. Finally, the chapter ends with some conclusions in Sect. 5.
Notation. Let Ω be an open subset of $\mathbb{R}^2$, representing the image (or pixel) domain. For any scalar, smooth enough, function u defined on Ω, $\|u\|_{L^1(\Omega)}$ and $\|u\|_{L^{\infty}(\Omega)}$, respectively, denote the $L^1$ and $L^{\infty}$ norms of u.
Color of an image carries much more information than the gray levels. In many
computer vision applications, the additional information provided by color can aid
image analysis. The Ohta color space [17] is a linear transformation of the RGB color
space. Its color channels are defined by A1 = (R + G + B)/3, A2 = R − B, and
2.2 Segmentation
Many WCE images contain uninformative regions such as bubbles, trash, dark re-
gions and so on, which can interfere with the detection of blood. More information
on uninformative regions can be found in [1]. We observe that the second component
(which we call henceforth a-channel) of the CIE Lab color space has the tendency
of separating these regions from the informative ones. More precisely, for better
removal of the uninformative regions, we first decompose the a-channel into geo-
metric and texture parts using the model described in [2, Sect. 2.3], and perform the
two-phase segmentation. The latter relies on a reformulation of the Chan and Vese variational model [2, 3], over the geometric part of the a-channel.
The segmentation method is described as follows: We first compute the constants
c1 and c2 (representing the averages of I in a two-region image partition). We then
solve the following minimization problem
$$\min_{u,v}\; TV_g(u) + \frac{1}{2\theta}\,\|u - v\|^2_{L^2(\Omega)} + \int_{\Omega} \Big( \lambda\, r(I, c_1, c_2)\, v + \alpha\, \nu(v) \Big)\, dx\, dy \qquad (1)$$
where $TV_g(u) := \int_{\Omega} g(x, y)\,|\nabla u|\, dx\, dy$ is the total variation norm of the function u, weighted by a positive function g; $r(I, c_1, c_2)(x, y) := (c_1 - I(x, y))^2 - (c_2 - I(x, y))^2$ is the fitting term, θ > 0 is a fixed small parameter, λ > 0 is a constant parameter weighting the fitting term, and α ν(v) is a term resulting from a reformulation of the model as a convex unconstrained minimization problem (see [2, Theorem 3]).
Here, u represents the two-phase segmentation and v is an auxiliary unknown. The
segmentation curve, which divides the image into two disjoint parts, is a level set of
u, {(x, y) ∈ Ω : u(x, y) = μ}, where in general μ = 0.5 (but μ can be any number
between 0 and 1, without changing the segmentation result, because u is very close
to a binary function).
The above minimization problem is solved by minimizing u and v separately, and
iterated until convergence. In short we consider the following two steps:
1. v being fixed, we look for u that solves
$$\min_{u}\; TV_g(u) + \frac{1}{2\theta}\,\|u - v\|^2_{L^2(\Omega)}\,. \qquad (2)$$
2. u being fixed, we look for v that solves
$$\min_{v}\; \frac{1}{2\theta}\,\|u - v\|^2_{L^2(\Omega)} + \int_{\Omega} \Big( \lambda\, r(I, c_1, c_2)\, v + \alpha\, \nu(v) \Big)\, dx\, dy\,. \qquad (3)$$
The solution of problem (2) is given by $u = v - \theta\, \operatorname{div} p$, where the dual variable p can be computed using the following fixed point method:

$$p^0 = 0, \qquad p^{n+1} = \frac{p^n + \delta t\, \nabla\!\left(\operatorname{div} p^n - v/\theta\right)}{1 + \dfrac{\delta t}{g}\,\left|\nabla\!\left(\operatorname{div} p^n - v/\theta\right)\right|}\,.$$
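As an illustration only, a NumPy sketch of this fixed-point update for the u-step might look as follows; the discrete gradient/divergence operators, step size, iteration count and parameter values are assumptions of ours, not the implementation used in [2].

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with Neumann boundary conditions."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    """Discrete divergence, the (negative) adjoint of the gradient above."""
    dx = np.zeros_like(px)
    dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def weighted_tv_step(v, g, theta=0.25, dt=1.0 / 8.0, n_iter=50):
    """Solve the u-step (2) by the fixed-point iteration on the dual variable p;
    g is the edge-indicator image, theta/dt/n_iter are illustrative values."""
    px = np.zeros_like(v)
    py = np.zeros_like(v)
    for _ in range(n_iter):
        qx, qy = grad(div(px, py) - v / theta)
        denom = 1.0 + (dt / g) * np.hypot(qx, qy)
        px = (px + dt * qx) / denom
        py = (py + dt * qy) / denom
    return v - theta * div(px, py)

# Two-phase mask: threshold u at mu = 0.5 (any mu in (0, 1) works, see text).
# u = weighted_tv_step(v, g); mask = u > 0.5
```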
The segmentation results for some of the WCE images are shown in Fig. 2. The
first row corresponds to the original images, the second row shows the segmentation
masks, and the third row displays the segmentation curves superimposed on the
original images.
In these experiments (and also in the tests performed in Sect. 3) the values chosen for the parameters involved in the definition of (1) are those used in [2], with g the following edge indicator function: $g(\nabla u) = \frac{1}{1 + \beta\, |\nabla u|^2}$, with $\beta = 10^{-3}$.
We now introduce the detector function that is designed to discriminate between blood
and non-blood frames. We resort to the analysis of the eigenvalues of the image Hessian matrix and a multiscale image analysis approach. Based on the eigenvalues, both blob-
like and tubular-like structures can be detected. For a scalar image I : Ω ⊆ R2 → R,
we define the Hessian matrix at a point (x, y), and at a scale s, by

$$H_s(x, y) = \begin{pmatrix} I^{s}_{xx} & I^{s}_{xy} \\ I^{s}_{xy} & I^{s}_{yy} \end{pmatrix},$$

where $I^{s}_{xx}$, $I^{s}_{xy}$ and $I^{s}_{yy}$ are the second-order partial derivatives of I, and the scale s is involved in the calculation of these derivatives. The Hessian matrix describes the second-order local image intensity variations around the selected point. Suppose $\lambda_{s,1}$ and $\lambda_{s,2}$ are the two eigenvalues of the Hessian matrix $H_s$, with $|\lambda_{s,1}| \le |\lambda_{s,2}|$. Setting $F_s = \lambda^{2}_{s,1} + \lambda^{2}_{s,2}$, we define
Fig. 2 First row: Original image. Second row: Segmentation mask. Third row: Original image with
segmentation curve superimposed
$$F = \max_{s_{\min}\, \le\, s\, \le\, s_{\max}} F_s\,, \qquad (4)$$

where $s_{\min}$ and $s_{\max}$ are the minimum and maximum scales at which the blood regions
are expected to be found. We remark that they can be chosen so that they cover the
whole range of blood regions.
Setting now

$$f_1 = \exp\!\left(-\beta F_s^{2}\right) \qquad \text{and} \qquad f_2 = 1 - \exp\!\left(-\alpha \left(\frac{\lambda_{s,1}}{\lambda_{s,2}}\right)^{2}\right),$$

and motivated by [11], we define the blob ($B_s$) and ridge ($R_s$) detectors (at each point of the domain)

$$B_s = \begin{cases} 0, & \text{if } \lambda_{s,1}\lambda_{s,2} < 0 \ \text{ or } \ |\lambda_{s,2} - \lambda_{s,1}| > \delta, \\ (1 - f_1)\, f_2, & \text{otherwise,} \end{cases} \qquad (5)$$
and

$$R_s = \begin{cases} 0, & \text{if } \lambda_{s,2} > 0, \\ (1 - f_1)(1 - f_2), & \text{otherwise.} \end{cases} \qquad (6)$$
Here α and β are the parameters which control the sensitivity of the functions and δ is a user-chosen threshold. We then compute, at each point, the maximum over the scales: $B = \max_s B_s$ and $R = \max_s R_s$.
In the computations, we take s = 8, 10, 12, 14. The results of the functions F and
the sum B + R, for blood and non-blood images are displayed in Figs. 3 and 4,
respectively.
We denote by $\widehat\Omega$ the segmented region of I in the image domain, that is, $\widehat\Omega = \Omega \cap \Omega_{seg}$, where $\Omega_{seg}$ is the segmented sub-domain of I containing the blood. We use the intensity and gradient information of the above functions for designing our detector function, DF, which is defined by

$$DF = \frac{\|F\|_{L^{\infty}(\widehat\Omega)}\; \|B + R\|_{L^{\infty}(\widehat\Omega)}}{\|B + R\|_{L^{1}(\widehat\Omega)}}\,.$$
For each WCE image the algorithm consists of the following four steps:
1. Firstly, we remove additional details (such as patient name, date and time) from the
original image. For this purpose, we clip around the circular view of the original
image. Next, we apply an automatic illumination correction scheme [22], for
reducing the effect of illumination.
2. We then consider the Ohta color channel (R + G + B)/3 for the illumination
corrected image.
3. We next apply the two-phase segmentation method [2] for removing uninforma-
tive regions (such as bubbles, trash, liquid, and so on) over the geometric part of
the second component of the CIE Lab color space.
4. Finally, we compute the functions F , B + R and the blood detector function DF.
We test the performance of the algorithm on a data set prepared by the medical
experts. Given Imaging’s Pillcam SB capsule was used to collect the videos in the
University Hospital of Coimbra. To make the data set representative, the images
Fig. 3 First row: Original image with blood region. Second row: A1 color channel. Third row:
Function F. Fourth row: Function B + R
were collected from video segments of 4 patients. The data set consists of 27 blood
images and 663 normal images. We use standard performance measures: sensitivity,
specificity and accuracy. These are defined as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN}\,, \qquad \text{Specificity} = \frac{TN}{TN + FP}\,,$$
Fig. 4 First row: Original image without blood region. Second row: A1 color channel. Third row:
Function F. Fourth row: Function B + R
$$\text{Accuracy} = \frac{TN + TP}{TN + FP + TP + FN}\,,$$
where TP, FN, FP and TN represent the number of true positives, false negatives,
false positives and true negatives, respectively. For a particular decision threshold T ,
if for an image frame J , DF > T , it is a positive frame; if DF ≤ T , it is a negative
frame. If J belongs to the class of blood image frames and it is classified as negative,
it is counted as a false negative; if it is classified as positive, it is counted as a true
positive. If J belongs to the class of non-blood image frames and it is classified as
positive, it is counted as a false positive; if it is classified as negative, it is counted
as a true negative.
Sensitivity represents the ability of the algorithm to correctly classify an image as
a frame containing blood, while specificity represents the ability of the algorithm to
correctly classify an image as a non-blood frame. The third measure, accuracy, is used
to assess the overall performance of the algorithm. There is also another performance
measure commonly used in the literature, false alarm rate (FAR). However, it can be
computed from the specificity: FAR = 1 − Specificity.
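For illustration, the following Python sketch computes these measures, and the (FAR, sensitivity) pairs of an ROC curve, from hypothetical detector values and ground-truth labels.

```python
import numpy as np

def detection_metrics(df_scores, labels, threshold):
    """Sensitivity, specificity, accuracy and FAR at a given decision threshold;
    labels are 1 for blood frames and 0 for normal frames."""
    scores = np.asarray(df_scores)
    labels = np.asarray(labels)
    pred = scores > threshold
    tp = np.sum(pred & (labels == 1))
    fn = np.sum(~pred & (labels == 1))
    fp = np.sum(pred & (labels == 0))
    tn = np.sum(~pred & (labels == 0))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / (tp + fn + fp + tn)
    return sens, spec, acc, 1.0 - spec

# ROC curve: sweep the threshold over the observed detector values and
# collect (FAR, sensitivity) pairs.
# roc = []
# for t in np.sort(df_scores):
#     sens, spec, acc, far = detection_metrics(df_scores, labels, t)
#     roc.append((far, sens))
```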
Receiver operating characteristic (ROC) curve is a fundamental tool for detection
evaluation. In a ROC curve, sensitivity is plotted as a function of FAR. Each point
on the ROC curve represents a sensitivity/FAR pair corresponding to a particular
decision threshold. It shows the tradeoff between sensitivity and specificity. Figure 5
represents the ROC curve with respect to the function DF. For FAR ≤ 10 %, the
best sensitivity achieved is 70.37 %. In particular, the sensitivity, FAR and accuracy
obtained are 70.37, 9.6 and 89.56 %, respectively, for the threshold 2.8928 × 10⁷.
In summary, these results show that the presented algorithm is very promising for
the detection of blood regions.
In this section we describe general facts about the apparatus specifications. In par-
ticular, we detail the GPUs adopted and the underlying architectures. Finally, we
address the parallelization of the algorithms proposed, namely by detailing the seg-
mentation and blood detector parallelization procedures on the GPU, and reporting
the results obtained for the current medical dataset.
The pipeline of the algorithm, described in Sect. 2, was first implemented on an Intel Core i7 950 CPU @ 3.07 GHz, with 12 GB of RAM, running a GNU/Linux
kernel 3.8.0-31-generic. The C/C++ code was compiled using GCC-4.6.3.
In order to process more frames per second, the segmentation and blood detector steps have been parallelized for execution on NVidia C2050 and NVidia GTX 680 GPUs, with the code compiled using the NVIDIA Compute Unified Device Architecture (CUDA) driver 5.5 [21].
The host system usually consists of a CPU that orchestrates the entire processing
by sending data and launching parallel kernels on the GPU device. At the end of
processing, it collects computed data from the device and terminates execution. The
parallelization of segmentation and blood detection procedures is carried out using
the CUDA parallel programming model, by exploiting the massive use of thread- and
data-parallelism on the GPU. CUDA allows the programmer to write scalable parallel C code on GPUs in a transparent way [21].
As shown in Fig. 6, each thread processes one pixel and thus multiple elements can
be processed at the same time. This introduces a significant reduction in the global
processing time of the proposed algorithm. When the host launches a parallel kernel,
the GPU device executes a grid of thread blocks, where each block has a predefined
number of threads executing the same code segment. Organized in groups of 32
threads (a warp), they execute synchronously and are time-sliced among the stream
processors of each multiprocessor.
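The chapter's implementation is written in CUDA C; purely to illustrate the one-thread-per-pixel mapping and the grid/block configuration, the sketch below uses Numba's CUDA interface (our choice, not the authors' toolchain), with a trivial per-pixel thresholding kernel as a stand-in for the real processing.

```python
import numpy as np
from numba import cuda

@cuda.jit
def threshold_kernel(img, thr, out):
    # Each thread handles exactly one pixel (x, y) of the image.
    x, y = cuda.grid(2)
    if x < img.shape[0] and y < img.shape[1]:
        out[x, y] = 1.0 if img[x, y] > thr else 0.0

img = np.random.rand(576, 576).astype(np.float32)   # one 576 x 576 WCE frame
out = np.zeros_like(img)

# 16 x 16 threads per block; enough blocks to cover the whole image.
threads = (16, 16)
blocks = ((img.shape[0] + threads[0] - 1) // threads[0],
          (img.shape[1] + threads[1] - 1) // threads[1])

d_img = cuda.to_device(img)                          # host -> device transfer
d_out = cuda.to_device(out)
threshold_kernel[blocks, threads](d_img, np.float32(0.5), d_out)
out = d_out.copy_to_host()                           # device -> host transfer
```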
Figure 7 depicts a simplified overview of the GPU architecture. It shows that
several multiprocessors contain a large number of stream processors (the number of
stream processors and multiprocessors depends on the model and architecture of the
GPU). In the present case, the NVidia GTX 680 GPU, which contains eight multiprocessors, each with 192 stream processors, for a total of 1536 CUDA cores, executes the algorithm faster.
Before processing starts on the GPU, data is uploaded to device memory. This
process is typically slow and consists in transferring the information from the host
CPU memory to the GPU global memory (device). At the end of the processing,
results are transferred from the GPU device global memory to the host CPU RAM
memory.
Fig. 6 Demonstration of the structure of a grid and thread blocks and how the same segment of
code is executed by multiple threads. Each thread computes the result for one pixel
In the GPU, there are several memory types and they have different impacts on
the throughput performance. We highlight two of them:
• Global memory accesses are time consuming operations with high latency and
may represent a bottleneck in the desired system’s performance. Instead, co-
alesced accesses should be performed whenever possible. They require data in global memory to be contiguously aligned, so that all 32 threads within a warp
can access the respective 32 data elements concurrently on the same clock cycle,
with thread T(x,y) accessing pixel P(x,y), as depicted in Fig. 8.
• Also, modern GPUs have a small and fast block of memory tightly coupled to the cores of each multiprocessor, which is shared by all threads within the same thread block. We can have several
threads processing the same local data to optimize memory bandwidth (typically
shared memory is faster than global memory when we need to share information
among several threads), but shared memory is small in size. To maximize its use
and performance, it is important to consider such size limitations. When large
amounts of data have to be processed, data has to be partitioned in smaller blocks
in order not to exceed the limits of shared memory. This action also represents
penalties, since it increases the amount of data exchanges with global memory.
Therefore, in the current work we use shared memory for calculating some pro-
cedures and global memory to perform the remaining functionalities, globally
achieving an efficient memory usage as reported in later subsections.
Fig. 7 Simplified GPU architecture. An example of how thread blocks are processed on GPU multiprocessors. A multiprocessor can execute more than one thread block concurrently
Fig. 8 Coalesced memory accesses illustrating a warp of 32 threads reading/writing the respective
32 data elements on a single clock cycle
Some functions in the segmentation procedure, mentioned in Sect. 2.2, need to share
image data between threads (e.g. neighboring pixels on the convolution procedure).
68 S. Kumar et al.
Table 1 Computation times in milliseconds (ms) for the segmentation procedure and throughput
measured in frames per second (fps). The tests were performed on WCE images with 576 × 576
pixels
Processing Platform Segmentation execution time (ms) Segmentation (fps)
CPU Intel i7 240.0 4.2
GPU NVidia C2050 6.0 166.7
GPU NVidia GTX 680 4.8 208.3
Therefore, the use of shared memory is the best option to achieve a higher speedup
(see [16] for a related work). These functions are: finding maximum and mean
values, and 2D separable convolution [13]. All other functions perform slower if
shared memory is used, because the total number of transactions to global memory
will be greater.
The results of maximum and mean values are processed in two steps: the first step
uses GPU grids with 256 × 256 block size; the second step uses 1 × 256 ; and in the
2D convolution, block sizes of dimension 16 × 16 are used.
The remaining functions in the segmentation step always use global memory and
1296 × 256 block sizes.
The computation times regarding the segmentation procedure are represented in
Table 1, which shows the real speedups obtained using parallel computation on the
GPU; as displayed, this procedure runs 40 times faster on GPU NVidia C2050 and
50 times faster on GPU NVidia GTX 680, when compared to an Intel i7 CPU.
For speeding up the blood detector procedure, described in Sect. 2.3, we only use
one function that shares image data between threads: 2D separable convolution [13].
The remaining functions perform slower if we use shared memory because the total number of transactions to global memory would be higher. The results
of 2D separable convolution are computed using block sizes of dimension 16 × 16
and 8×8 for the scale values s = [8 10] and s = [12 14] (see Sect. 2.3), respectively.
All other functions always use global memory blocks with size 8 × 8.
The computation times of the blood detector procedure are presented in Table 2.
We clearly see the speedup obtained using parallel computation on GPU. This algo-
rithm runs 58.9 times faster on GPU NVidia C2050 and 59.5 times faster on GPU
NVidia GTX 680, when compared to an Intel i7 CPU.
Table 2 Computation times in milliseconds (ms) for the blood detector procedure and throughput
measured in frames per second (fps). The tests were performed on WCE images with 576 × 576
pixels
Processing platform Blood detector execution time (ms) Blood detector (fps)
CPU Intel i7 529.9 1.9
GPU NVidia C2050 9.0 111.1
GPU NVidia GTX 680 8.9 112.4
Table 3 Throughput measured in fps and speedup achieved for the complete algorithm (Segmentation
and Blood Detector). Tests performed on WCE images with 576 × 576 pixels
Processing platform Segmentation and blood detector (fps) Speedup
CPU Intel i7 1.3 ——–
GPU NVidia C2050 66.7 51.3 times faster
GPU NVidia GTX 680 72.9 56.1 times faster
4.4 Speedup
Table 3 shows the throughput measured in frames per second (fps) and the speedup achieved for the full algorithm. It can be seen that the GPU NVidia GTX 680 is faster than
NVidia C2050.
With the obtained speedup, the GPU NVidia GTX 680 is able to process 72 fps, which means that the approximately 56,000 frames generated by a complete WCE exam can be processed in less than 13 min.
5 Conclusions
with an appropriate use of memory (shared and global). This novel approach allows
processing multiple pixels of an image at the same time, thus sustaining the obtained
throughput levels.
References
17. Ohta YI, Kanade T, Sakai T (1980) Color information for region segmentation. Comput
Graphics Image Process 13:222–241
18. Pan G, Xu F, Chen J (2011) A novel algorithm for color similarity measurement and the
application for bleeding detection in WCE. Int J Image Graphics Signal Process 5:1–7
19. Park SC, Chun HJ, Kim ES, Keum B, Seo YS, Kim YS, Jeen YT, Lee HS, Um SH, Kim CD,
Ryu HS (2012) Sensitivity of the suspected blood indicator: an experimental study. World J
Gastroenterol 18(31):4169–4174
20. Penna B, Tillo T, Grangetto M, Magli E, Olmo G (2009) A technique for blood detec-
tion in wireless capsule endoscopy images. In: 17th European signal processing conference
(EUSIPCO 2009), pp. 1864–1868
21. Podlozhnyuk V, Harris M, Young E (2012) NVIDIA CUDA C programming guide. NVIDIA
Corporation
22. Zheng Y, Yu J, Kang SB, Lin S, Kambhamettu C (2008) Single-image vignetting correction
using radial gradient symmetry. In: Proceedings of the 26th IEEE conference on Computer
Vision and Pattern Recognition (CVPR ’08), pp. 1–8. Los Alamitos, California, USA, June
2008
Automated Image Mining in fMRI Reports: a
Meta-research Study
Abstract This chapter describes a method for meta-research, based on image mining
from neuroscientific publications. It extends an earlier investigation to the study of a large-scale data set. Using a framework for extraction and characterisation of
reported fMRI images, based on their coordinates and colour profiles, we propose
that significant information can be harvested automatically. The coordinates of the
brain activity regions, in relation to a standard reference template, are estimated. We focus on the analysis of scientific reports of the default mode network. Both the commonalities and the differences of brain activity between control subjects, Alzheimer patients and schizophrenic patients are identified.
1 Introduction
An open field of research with increasing interest in neuroscience is that of the resting state and default mode networks (RSN and DMN, respectively). These networks comprise
areas such as the occipital, temporal and frontal areas of the brain. They are active
when the individual is not performing any goal-oriented task, and suppressed during
activity [6, 17]. In spite of the great attention to those networks, scientific research
of brain’s “resting state” still poses various conceptual and methodological difficul-
ties [19]. A common topic of study consists in investigating the differences and
commonalities in the activity of healthy brains when compared to, e.g., Alzheimer
or schizophrenic brains. Specifically, how different is the composition of RSN and
DMN, in healthy and pathological brains, and how do these differences influence
cognitive and functional performances.
2 Methodology
2.1 Data
The first step of our research consisted in the construction of a database of rele-
vant publications. With this in mind, we searched for neuroscientific publications
published online, in which the topic of discussion was related to the default mode
network. This search was carried out using keywords such as DMN, Alzheimer, fMRI, cognitive impairment, Schizophrenia and resting state.
We gathered 183 articles in pdf format, from journals such as NeuroImage, Hu-
man Brain Mapping, Brain, Magnetic Resonance Imaging, PNAS and PLOS ONE.
The time-frame for these articles ranged from early 2000 to June 2013. All papers
were then separated according to the specificity of the analysis carried out therein (see
Table 1), distinguishing between studies on healthy brains (132), Alzheimer (29) and
Schizophrenia (18) research.
Fig. 1 Examples of images presented in fMRI reports. On the leftmost image (adapted from [13]),
activity is present in the occipital, left temporal and frontal areas of the brain, and the activity is
reported using the hot colour scale. The activity on the second image (adapted from [9]) is shown
in three different uniform colours, while the third image (adapted from [22]) shows a combination
of hot and cold colour scales, for increase and decrease of activity when compared to the reference
Fig. 2 Flowchart describing the blob mining procedure. First, figures are retrieved from articles
(images adapted from Johnson et al.(2007)). This is then followed by the detection of possible
objects containing fMRI activity reports. After processing and retrieval of these images, they are
cleaned of artifacts, such as lines and text, allowing for a final stage of blob identification
The first stage is the object identification. Many figures have a simple background
colour, like black or white, but others have different colours, e.g., gray. Hence, the
background colour needs to be detected, which is done through histogram and border
analysis. The possible background colours are detected from the borders of the image,
and the one with the highest number of pixels is selected.
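A minimal NumPy sketch of this border-based background detection could be the following; the figure is assumed to be given as an RGB array.

```python
import numpy as np

def background_colour(fig_rgb):
    """Most frequent colour among the border pixels is taken as the background."""
    border = np.concatenate([fig_rgb[0, :], fig_rgb[-1, :],
                             fig_rgb[:, 0], fig_rgb[:, -1]])
    colours, counts = np.unique(border.reshape(-1, fig_rgb.shape[-1]),
                                axis=0, return_counts=True)
    return colours[np.argmax(counts)]

# Binary object image: everything that differs from the background colour.
# objects = np.any(fig_rgb != background_colour(fig_rgb), axis=-1)
```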
To detect different objects in a figure, and after background detection, figures are
converted to black (background) and white (objects) colour. In those binary images,
the white areas correspond to the smallest rectangle enclosing an object. Objects in
the border of the respective figure, as well as those composed of only a few pixels are
discarded. The next step is to analyse the images that are left inside the remaining
objects. After extracting said images, we need to identify and extract the ones that
correspond to fMRI reports. This is done using various properties, such as:
• a minimum perimeter of the image, which we have set to 80 pixels, to allow a
sufficient processing resolution;
• a minimum and maximum image/background pixel ratio, between 0.1 and 97.5, to avoid non-brain images;
• a percentage of coloured pixels in the image between 0 and 40 %, filtering out non-fMRI images or images with activity all over the brain;
• an image aspect ratio between 0.66 and 1.6, typical of a brain image;
• one image should occupy more than 50 % of the frame, to eliminate multiple
images in the same object frame.
Regarding the last property, we repeated the object identification procedure when
objects included several images, until no more images could be found.
In the example shown in Fig. 2, the object frame containing the figure colour map
is discarded, due to the aspect ratio. Two of the brain images are also discarded since
they don’t have colour present, therefore not being considered as originating from
an fMRI study.
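A rough Python sketch of the filtering criteria listed above might be the following; the colour-pixel test (channel spread above 20 grey levels) and the assumption of 8-bit RGB input are our simplifications.

```python
import numpy as np

def looks_like_fmri_report(img_rgb, frame_area):
    """Apply the filtering criteria listed above to one extracted image;
    the 'coloured pixel' test (channel spread > 20) is our simplification."""
    h, w = img_rgb.shape[:2]
    perimeter = 2 * (h + w)                       # minimum 80 pixels
    aspect = w / h                                # typical brain image: 0.66-1.6
    spread = img_rgb.max(axis=-1).astype(float) - img_rgb.min(axis=-1)
    colour_pct = 100.0 * np.mean(spread > 20)     # percentage of coloured pixels
    occupancy_pct = 100.0 * (h * w) / frame_area  # image area within its frame
    return (perimeter >= 80 and 0.66 <= aspect <= 1.6 and
            0 < colour_pct <= 40 and occupancy_pct > 50)
```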
The following step removes undesired annotations. In Fig. 2, these correspond to coordinate axes as well as the letters ‘L’ and ‘R’. This stage is done by removing
all images inside the frame, except for the biggest one. Also any lines in 0 or 90
degree angles are removed, using the Hough transform [8, 20] on each frame. Pixels
belonging to vertical/horizontal lines that are present in more than two thirds of the
height/width of the object are replaced with an average intensity of the surrounding
pixels.
Once the activity images have been retrieved and cleaned, the type of template used
in the images, i.e. volume type, and sections are identified, to estimate the three-
dimensional coordinates of the activated regions. To represent the three dimensional
changes in brain activity, views from three different planes are used to represent
them in two dimensions. Thus we have axial sections, along the transversal plane
that travels from the top of the brain to bottom; the sagittal section that travels along
the median plane, from left to right; and the coronal section, along the frontal plane,
that travels from front to back. To do a proper characterisation of the images, instead
of focusing on the internal features of each section, the symmetry characteristics of
the section shapes are used, as shown in Fig. 3.
Fig. 3 Section identification: the top row contains example fMRI activity images (after conversion
to grey scale) and below them their corresponding binary masks. From left to right, we have axial,
coronal and sagittal sections
The images are again converted to binary images, thereby outlining the respective
shape of the section. Simple symmetry allows for a suitable distinction between
sections. An axial section is mostly symmetric about both the horizontal and vertical axes (Fig. 3a, d). The coronal section displays some symmetry only with respect to
its vertical axis (Fig. 3b, e) while the sagittal section is asymmetric (Fig. 3c, f).
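A small Python sketch of this symmetry-based classification (with illustrative symmetry thresholds) could be:

```python
import numpy as np

def classify_section(mask):
    """Classify a binary section mask as axial, coronal or sagittal from its
    left-right and top-bottom symmetry (the 0.9 thresholds are illustrative)."""
    def overlap(a, b):
        return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)
    lr = overlap(mask, mask[:, ::-1])   # symmetry about the vertical axis
    tb = overlap(mask, mask[::-1, :])   # symmetry about the horizontal axis
    if lr > 0.9 and tb > 0.9:
        return "axial"
    if lr > 0.9:
        return "coronal"
    return "sagittal"
```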
Most researchers map the activity changes found onto either SPM [10] or Colin [3]
volume templates. Colin volumes contain higher resolution sections, when compared
to SPM. Regarding the spatial separation between adjacent sections, SPM volume templates use 2 mm, whereas that distance is 1 mm for Colin templates.
To detect the volume type, one can use a complexity measure of the images. We
used a Canny filter [4] to detect the voxels corresponding to contrast edges. This is
done for both template volumes, i.e. Colin and SPM, and for all the image slices from
the section identified before. The volume template we select corresponds to the one
with the minimal difference between the analysed image and the volume template
images. This difference is calculated for the whole image and for a centred square
with half the image size. We then average both values and use this as the difference
measure.
Once the geometrical considerations of the image have been dealt with, we can now
characterise in more detail the regions reported therein.
Activity regions are generated in response to stimulation. The properties of these
regions largely define the fMRI activity and hence it is crucial that an analysis of the
coloured blobs is carried out. Since we assume that only activations are color coded,
these regions are easily segmented based on hue information (cf. ‘blob identification’
box in Fig. 2).
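A minimal sketch of such hue/saturation-based blob segmentation, with illustrative thresholds and 8-bit RGB input assumed, might be:

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def activation_blobs(img_rgb, sat_min=0.3, val_min=0.2):
    """Keep pixels that are clearly saturated (colour-coded activity) rather than
    greyscale anatomy; the saturation/value thresholds are illustrative."""
    hsv = rgb_to_hsv(np.asarray(img_rgb, dtype=float) / 255.0)
    return (hsv[..., 1] > sat_min) & (hsv[..., 2] > val_min)
```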
As mentioned before, the reporting style of different researchers can vary. This
variety of reporting methods restricts the analysis that can be performed, since the
same article can contain images with different colour scales. We tried to obtain
intensity information from each fMRI image by using a colour map detection pro-
cedure, through histogram analysis. Since some images showed both increased and
decreased activity, this step required mild human intervention, aimed at fixing
some wrongly detected colour maps. This was only applied to the rare cases where
the automatic histogram analysis couldn’t detect the correct colour scale, and was
performed rather easily.
Using the Colin brain template as a reference for our own reporting, we mapped
all blob intensity information to their respective coordinates. We sum all intensities
found in the data, for each voxel. Then those intensities are normalised to a scale
from 0 to 1, where one corresponds to the highest possible common activity.
This produces a three-dimensional intensity map, where each voxel displays the
intensity corresponding to the average activity in the data, for the respective voxels.
Since this intensity map was built using two-dimensional images, we also performed
a 3D smoothing, using a Gaussian ellipsoid with dimensions corresponding to 5 %
of the template size.
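The accumulation and smoothing steps could be sketched in Python as follows; the (coordinates, intensity) input format and the helper name are our assumptions, and only the 5 % Gaussian size comes from the text:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_intensity_map(blobs, template_shape):
    # `blobs`: iterable of (voxel_coords, intensity) pairs, where voxel_coords is
    # an (N, 3) integer array of template coordinates (assumed input format)
    volume = np.zeros(template_shape, dtype=float)
    for coords, intensity in blobs:
        np.add.at(volume, (coords[:, 0], coords[:, 1], coords[:, 2]), intensity)
    if volume.max() > 0:
        volume /= volume.max()                    # 1 = highest common activity
    sigma = [0.05 * s for s in template_shape]    # ellipsoid of ~5 % of the template size
    return gaussian_filter(volume, sigma=sigma)
```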
In our reporting, we decided to use the jet colour scale for the summarising
intensity map. There, colours go from dark blue to dark red, covering also green
and yellow. One big difference between our scale and typical fMRI reports is that
we don’t distinguish between increase or decrease of activity, when compared to a
reference, but consider any coloured report as “interesting”. Therefore the dark blue
corresponds to locations with very low reporting of activity (positive or negative)
while dark red is used for locations with many reports. To avoid showing the whole brain
in dark blue, we show only intensities for locations where the number of reported
blobs is more than 10 % of the total.
By superimposing this intensity map over the template volume, we can obtain a
visual summary of all the results found in the articles.1
3 Results
3.1 Extracted Information
Table 1 shows how many articles, figures, images and blobs were found according
to the publications analysed. Note that the number of samples for the unhealthy
cases is quite small when compared to the healthy brains. This bias might affect the
quality of the results, but the same problem would affect any other meta-researcher
investigating brain activity in Alzheimer/Schizophrenia, due to the smaller sample
of research dealing with these cases, when compared to the healthy controls.
Regarding the accuracy of the method, a simple visual inspection shows that
once the volume and type of section are identified, the section coordinates were
typically accurate to within 1 voxel. Also, the cleaning procedure of images
mentioned in Sect. 2.3 doesn't remove all artifacts from images, e.g., when
letters lie inside the brain area. Nonetheless we found that leftover artifacts rarely affected
activity detection and subsequent mapping.
3.2 Meta-Analysis
After the compilation of all the results and the creation of the three-dimensional
activity maps, one can perform analyses on the different types of brains studied.
Figure 4 shows the brain activity reported for healthy subjects, displayed on axial
sections of the standard Colin reference. The highest areas of activity are the typical
subsystems that compose the DMN: the posterior cingulate/precuneus, the medial
pre-frontal cortex and the inferior parietal lobes. Note that, in the majority of the
reports, including Alzheimer and Schizophrenia, most subjects presented the bulk of
the activity in these major areas.
We can now focus on the comparison between healthy, Alzheimer and Schizophrenia
DMN activity, for example at axial height 114 of Colin’s standard brain (see Fig. 5).
1 The 2D projections of said summarising volumes were produced using the ITK-SNAP tool [23].
Fig. 4 Average brain activity reported in publications dealing with healthy brains, superimposed
on a Colin-based brain template, shown at various axial heights. Most of the activity is reported on
the occipital, temporal and frontal areas of the brain, which correspond to the typical default mode
network areas
According to [2], one would expect that older brains have larger areas of activity than
younger ones. We can see this in the posterior cingulate and in the inferior parietal
lobes for Alzheimer when compared to the healthy brain image. On the other hand,
the aged brain image shows somewhat less widespread activity on the frontal lobe, when
compared to the other areas of the DMN. This seems counter-intuitive in light of the
cited work. One may say that the lack of samples could cause this phenomenon,
but our results seem rather consistent for the other areas. To find out a possible reason
for this discrepancy, we can search for corroborating evidence in one of the articles
analysed. In Fig. 1 of [5], there is a similar decrease in activity for aged brains,
compared to healthier ones, confirming our own results.
Fig. 5 Brain activity reported for healthy (a), Alzheimer (b) and schizophrenic (c) brains, at height
114 of the Colin standard brain. The reports on brains affected by Alzheimer show a smaller intensity
of activity in the pre-frontal cortex, when compared to the other DMN areas, unlike the reports for
healthy and schizophrenic brains
Fig. 6 Brain activity reported for healthy (a), Alzheimer (b) and schizophrenic (c) brains, at height
130 of the Colin standard brain. Image (a) shows wider activation in the posterior cingulate area,
suggesting that both Schizophrenia and Alzheimer might play a big role in this area of the brain
Another analysis that can be performed with our method relates to finding areas of the
brain with different activities between unhealthy brains and healthy ones. In Fig. 6,
one can see images for axial height 130, where publications dealing with healthy
brains report a bigger area of activity in the posterior cingulate cortex (PCC), when
compared with brains suffering from Alzheimer, and even more so when compared with
schizophrenic brains.
Fig. 7 Three dimensional images of brain activity reported for healthy (top row), Alzheimer (middle
row) and schizophrenic (bottom row) brains. All reported images show the expected main DMN
areas, although the reports on Alzheimer show a decreased intensity and the brains suffering with
Schizophrenia report a more distributed activity pattern
from normal controls. Regarding the brains with Schizophrenia, we can see an area
increase in the frontal region of the brain, while several smaller foci of activation
appear, e.g., near the cerebellum.
4 Discussion
We gathered more than 180 articles studying the default mode network, and analysed
the images contained therein, in order to get a summarising overview of their results.
Our main goal was to automatically map the results of studies reported by several
researchers, onto a standard brain, and use this mapping to analyse the differences
between healthy and unhealthy brains. This task would involve a tremendous amount
of work and time if done by a human curator, whereas our method retrieves most
information in a uniform and almost automatic manner.
The complete procedure is done in approximately 1 min per article (including
human intervention if needed), while it takes 30–60 min when done by a curator, as
in [15]. In that publication, the researchers went through 13 publications to obtain the
information they desired. Using our method would not only save a considerable
amount of manual work, it would also enable them to find other fMRI studies related to
the areas they are interested in.
Looking at the results, it seems clear that our method performs remarkably well,
suggesting that it could be used to help create a comprehensive functional brain
atlas. Since we only performed a rough analysis of a particular research topic, we
didn’t aim at a complete report of all brain activities that might be studied.
There are some problems with our approach that also occur in other automatic
data-mining approaches. First, by using only image information we are giving the
same weight to all publications, irrespective of the number of subjects studied.
Furthermore, statistical thresholds and analysis methods vary in every publication,
hence we cannot claim to make a thorough statistical analysis. Also, the number
of articles dealing with the unhealthy cases is quite small when compared to the
healthy brains. All these problems quantitatively affect our analysis, although
we may still draw valuable information from the data. We also expect their influence
to decrease with an increasing number of analysed publications.
We showed that with a clear topic in mind, it is possible to obtain results of high
relevance. As an example, we have seen that most reports on DMN, regardless of the
health condition of the subjects, show activity on the posterior cingulate/precuneus,
the medial pre-frontal cortex and the inferior parietal lobes. On the other hand,
the pre-frontal activity of Alzheimer subjects is shown to be spatially restricted.
Corroborating evidence for this finding can be traced back to the original published
reports. Due to the reduced sample statistics for the unhealthy brains, we cannot
determine whether there is a 'real' lack of activity or just an absence of reports, but it
suggests a possible area of investigation.
As stated before, there is a considerable variability in how each researcher displays
their results. In the future, and to mitigate the lack of availability of original data,
our method could be included in online submission systems for publication, after
authors have uploaded their document. With minimal manual effort, the authors
could validate the proposed summarising data, and hence improve the quality of the
information gathered.
Lately there have been more and more efforts to increase data availability, either
through common databases or by submitting the data at the same time as the article.
Naturally, when available, this would allow for a much better analysis of the data,
avoiding the problems of detecting fMRI images or determining their colour scales.
Nevertheless, these databases are still rather rare.
Despite the specificity of the method regarding fMRI images, we believe the
principles behind it could be easily ported to other areas of investigation, such as
weather reports or earthquake maps.
We hope to further refine our method by combining it with a text-mining approach,
and test it in situations where there is either a clear agreement between different
research reports, or a conflict between competing theories. The former is a key aspect of the
construction of functional neuro-atlases, whereas the latter may lead to true findings
in neuroscience.
5 Appendix–Articles Database
Healthy 35. E. Erhardt, E. Allen, E. Damaraju, V. Cal- 71. Littow, Front. Syst. Neurosci. (2010).
houn, Brain Connect 1, 1 (2011). 72. D. Liu, Front. Syst. Neurosci. (2010).
36. F. Esposito, et al., Magnetic Resonance 73. D. Lloyd, Consciousness and Cognition
Imaging 26, 905913 (2008). 21, 695703 (2012).
1. A. Abou-Elseoud, et al., Human Brain 37. F. Esposito, et al., Brain Research Bulletin 74. X.-Y. Long, et al., Journal of Neuro-
Mapping 31, 1207 (2010). 70, 263269 (2006). science Methods 171, 349355 (2008).
2. E. A. Allen, et al., Front. Syst. Neurosci. 5 38. L. Ferrarini, et al., NeuroImage 56, 75. C. Madjar, et al. .
(2011). 14531462 (2011). 76. C. Malherbe, et al., IEEE International
3. J. S. Anderson, M. A. Ferguson, 39. A. R. Franco, A. Pritchard, V. D. Cal- Symposium on Biomedical Imaging: From
M. Lopez-Larson, D. Yurgelun-Todd, houn, A. R. Mayer, Hum. Brain Mapp. 30, Nano to Macro (2010).
Brain Connectivity 1, 147157 (2011). 22932303 (2009).
4. C. Aydin, O. Oktay, A. U. Gunebakan, 77. S. H. Maramraju, et al., IEEE Nuclear
40. W. FREEMAN, International Journal of
R. K. Ciftci, A. Ademoglu, 35th Interna- Science Symposium Conference Record
Psychophysiology 73, 4352 (2009).
tional Conference on Telecommunications (2008).
41. W. Freeman, IEEE Transactions on Cir-
and Signal Processing (TSP) (2012). 78. T. Meindl, et al., Hum. Brain Mapp. p.
cuits and Systems 35, 781783 (1988).
5. E. B. Beall, M. J. Lowe, Journal of Neuro- 42. T. Gili, Time-frequency analysis of rest- NANA (2009).
science Methods 191, 263276 (2010). ing state networks recovery as a function 79. M. Meinzer, et al., Neurobiology of Aging
6. L. Beason-Held, M. Kraut, S. Resnick, of cognitive load., Master’s thesis, Univer- 33, 656669 (2012).
Brain Imaging Behav 3, 123 (2009). sity of Rome, La Sapienza, Department of 80. F. Musso, J. Brinkmeyer, A. Mobascher,
7. P. Bellec, Intl. Workshop on Pattern Physics (2011). T. Warbrick, G. Winterer, NeuroImage 52,
Recognition in Neuroimaging (2013). 43. M. Goldberg, et al., IEEE Conf. on Tech- 11491161 (2010).
8. C. Benjamin, et al., Frontiers in Human nologies for Homeland Security (2008). 81. G. Northoff, et al., Nat Neurosci 10,
Neuroscience 4 (2010). 44. M. D. Greicius, V. Menon, Journal of Cog- 15151517 (2007).
9. H. M. de Bie, et al., Hum. Brain Mapp. 33, nitive Neuroscience 16, 14841492 (2004). 82. E. van Oort, A. van Cappellen van Wal-
11891201 (2012). 45. O. Grigg, C. L. Grady, PLoS ONE 5, sum, D. Norris, NeuroImage 90, 381389
10. R. M. Birn, K. Murphy, P. A. Bandettini, e13311 (2010). (2014).
Hum. Brain Mapp. 29, 740750 (2008). 46. B. Hahn, T. J. Ross, E. A. Stein, Cerebral 83. H.-J. Park, B. Park, D.-J. Kim, Annual Intl.
11. A. Botzung, Frontiers in Human Neuro- Cortex 17, 16641671 (2007). Conf. of the IEEE Eng. in Medicine and
science (2010). 47. T. Hedden, et al., Journal of Neuroscience Biology Society (2009).
12. S. L. Bressler, V. Menon, Trends in Cogni- 29, 1268612694 (2009). 84. C. Parsons, K. Young, L. Murray, A. Stein,
tive Sciences 14, 277290 (2010). 48. M. van den Heuvel, R. Mandl, H. Hul- M. Kringelbach, Progress in Neurobiology
13. J. A. Brewer, et al., Proceedings of shoff Pol, PLoS ONE 3, e2001 (2008). 91, 220241 (2010).
the National Academy of Sciences 108, 49. M. van den Heuvel, R. Mandl, J. Luigjes, 85. G. V. Pendse, D. Borsook, L. Becerra,
2025420259 (2011). H. Hulshoff Pol, Journal of Neuroscience PLoS ONE 6, e27594 (2011).
14. R. L. Buckner, NeuroImage 62, 11371145 28, 1084410851 (2008). 86. V. Perlbarg, et al., 5th IEEE International
(2012). 50. M. P. van den Heuvel, R. C. Mandl, R. S. Symposium on Biomedical Imaging: From
15. R. L. Buckner, J. L. Vincent, NeuroImage Kahn, H. E. Hulshoff Pol, Hum. Brain Nano to Macro (2008).
37, 10911096 (2007). Mapp. 30, 31273141 (2009). 87. P. L. Purdon, H. Millan, P. L. Fuller,
16. M. van Buuren, T. E. Gladwin, B. B. Zand- 51. S. G. Horovitz, et al., Proceedings of G. Bonmassar, Journal of Neuroscience
belt, R. S. Kahn, M. Vink, Hum. Brain the National Academy of Sciences 106, Methods 175, 165186 (2008).
Mapp. 31, 11171127 (2010). 1137611381 (2009).
88. M. Pyka, et al., PLoS ONE 4, e7198
17. Z. Cai, J. Zhai, International Conference 52. G.-A. Hossein-Zadeh, B. Ardekani,
(2009).
on Multimedia Technology (2011). H. Soltanian-Zadeh, IEEE Trans. Med.
89. P. Qin, G. Northoff, NeuroImage 57,
18. V. Calhoun, IEEE International Sympo- Imaging 22, 795805 (2003).
12211233 (2011).
sium on Biomedical Imaging: From Nano 53. J. H. Jang, et al., Neuroscience Letters
487, 358362 (2011). 90. W. Qiu, et al., The 2011 IEEE/ICME Inter-
to Macro (2009). national Conference on Complex Medical
19. V. Calhoun, T. Adali, Proceedings of the 54. S.-Y. Jeng, S.-C. Chen, P.-C. Lee, P.-S.
Ho, R. Tsai, 9th International Conference Engineering (2011).
2004 14th IEEE Signal Processing Soci- 91. J. Rees, Clinics in Dermatology 31,
ety Workshop Machine Learning for Sig- on e-Health Networking, Application and
Services (2007). 806810 (2013).
nal Processing (2004). 92. J. J. Remes, et al., NeuroImage 56, 554569
55. H. Jin, et al., International Journal of Psy-
20. X. J. Chai, A. N. Castan, D. ngr, (2011).
chophysiology 71, 142148 (2009).
S. Whitfield-Gabrieli, NeuroImage 59, 93. R. Sala-Llonch, et al., Cortex 48,
56. W. Jin-Jia, J. Ke-Mei, M. Chong-Xiao,
14201428 (2012). 11871196 (2012).
First International Conference on Perva-
21. C. Chang, J. P. Cunningham, G. H. Glover,
sive Computing, Signal Processing and 94. P. G. Samann, et al., Cerebral Cortex 21,
NeuroImage 44, 857869 (2009).
Applications (2010). 20822093 (2011).
22. C. Chang, G. H. Glover, NeuroImage 50,
57. H. J. Jo, Z. S. Saad, W. K. Simmons, 95. F. Sambataro, et al., Neurobiology of Ag-
8198 (2010).
L. A. Milbury, R. W. Cox, NeuroImage 52, ing 31, 839852 (2010).
23. C. Chang, G. H. Glover, NeuroImage 47,
571582 (2010). 96. S. Sargolzaei, A. S. Eddin, M. Cabrerizo,
14481459 (2009).
58. R. E. Kelly, et al., Journal of Neuroscience M. Adjouadi, 6th International
24. Z. Chen, V. Calhoun, Medical Imaging
Methods 189, 233245 (2010). IEEE/EMBS Conference on Neural
2011: Biomedical Applications in Molec-
59. D.-Y. Kim, J.-H. Lee, Neuroscience Let- Engineering (NER) (2013).
ular, Structural, and Functional Imaging
ters 498, 5762 (2011). 97. A. Sarje, N. Thakor, The 26th Annual Intl.
(2011). 60. V. Kiviniemi, et al., Hum. Brain Mapp. 30,
25. E. Congdon, et al., NeuroImage 53, Conference of the IEEE Engineering in
38653886 (2009). Medicine and Biology Society .
653663 (2010). 61. V. Kiviniemi, et al., Brain Connectivity 1,
26. R. T. Constable, et al., NeuroImage 64, 98. R. Scheeringa, et al., International Jour-
339347 (2011). nal of Psychophysiology 67, 242251
371378 (2013). 62. W. Koch, et al., NeuroImage 51, 280287
27. S. M. Daselaar, Frontiers in Human Neu- (2008).
(2010). 99. V. Schpf, et al., Journal of Neuroscience
roscience 3 (2009). 63. N. A. Kochan, et al., PLoS ONE 6, e23960
28. J. A. De Havas, S. Parimal, C. S. Soon, Methods 192, 207213 (2010).
(2011).
M. W. Chee, NeuroImage 59, 17451751 100. M. L. Seghier, E. Fagan, C. J. Price,
64. S. Kumar, A. Noor, B. K. Kaushik, B. Ku-
(2012). Journal of Neuroscience 30, 1680916817
mar, International Conference on Devices
29. M. De Luca, C. Beckmann, N. De Ste- (2010).
and Communications (ICDeCom) (2011).
fano, P. Matthews, S. Smith, NeuroImage 65. A. R. Laird, et al., Journal of Neuro- 101. K. Singh, I. Fawcett, NeuroImage 41,
29, 13591367 (2006). science 29, 1449614505 (2009). 100112 (2008).
30. F. De Martino, et al., NeuroImage 57, 66. R. Leech, R. Braga, D. J. Sharp, Journal 102. X. Song, X. Tang, The 12th Annual Meet-
10311044 (2011). of Neuroscience 32, 215222 (2012). ing of the Association for the Scien-
31. G. Derado, F. Bowman, T. Ely, C. Kilts, 67. X. Lei, et al., PLoS ONE 6, e24642 (2011). tific Study of Consciousness (ASSC2008)
Stat Interface 3, 45 (2010). 68. C.-S. R. Li, P. Yan, K. L. Bergquist, (2008).
32. G. Deshpande, S. LaConte, S. Peltier, R. Sinha, NeuroImage 38, 640648 (2007). 103. X. Song, et al., Medical Imaging 2013:
X. Hu, Hum. Brain Mapp. 30, 1323 69. R. Li, et al., NeuroImage 56, 10351042 Biomedical Applications in Molecular,
(2009). (2011). Structural, and Functional Imaging
33. G. Deshpande, K. Sathian, X. Hu, IEEE 70. R. Li, et al., Medical Imaging 2009: (2013).
Trans. Biomed. Eng. 57, 14461456 (2009). Biomedical Applications in Molecular, 104. D. Sridharan, D. J. Levitin, V. Menon, Pro-
34. L. Ekstrand, N. Karpinsky, Y. Wang, Structural, and Functional Imaging ceedings of the National Academy of Sci-
S. Zhang, JoVE (2013). (2009). ences 105, 1256912574 (2008).
105. T. Starck, J. Remes, J. Nikkinen, O. Ter- Alzheimer 26. X. Wu, et al., Hum. Brain Mapp. 32,
vonen, V. Kiviniemi, J Neurosci Methods 18681881 (2011).
186, 179 (2010). 27. H.-Y. Zhang, et al., Radiology 256,
106. D. Stawarczyk, S. Majerus, P. Maquet, 598606 (2010).
A. DArgembeau, PLoS ONE 6, e16997 1. F. Bai, et al., Brain Research 1302, 167174
(2009). 28. J. Zhou, et al., Brain 133, 13521367
(2011).
2. V. Bonavita, C. Caltagirone, C. M. andA- (2010).
107. K. Supekar, et al., NeuroImage 52, 290301
(2010). lessandro Padovani, E. Scarpini, S. Sorbi, 29. Y. Zhou, et al., Alzheimers & Dementia 4,
108. S. J. Teipel, et al., NeuroImage 49, Journal of Alzheimer’s Disease 29, 109 265270 (2008).
20212032 (2010). (2012).
109. S. Teng, et al., 35th Annual International 3. J. S. Damoiseaux, K. E. Prater, B. L.
Conference of the IEEE Engineering in Miller, M. D. Greicius, Neurobiology of
Medicine and Biology Society (EMBC) Aging 33, 828.e19828.e30 (2012).
(2013). 4. N. Filippini, et al., Proceedings of
110. M. Thomason, Frontiers in Human Neuro- the National Academy of Sciences 106,
111.
science 3 (2009).
M. E. Thomason, et al., NeuroImage 41, 5.
72097214 (2009).
T. Gili, et al., Journal of Neurology, Neu-
Schizophrenia
14931503 (2008). rosurgery & Psychiatry 82, 5866 (2011).
112. M. E. Thomason, et al., NeuroImage 55, 6. M. D. Greicius, G. Srivastava, A. L. Reiss,
165175 (2011). V. Menon, Proceedings of the National 1. C. Abbott, Decreased functional connec-
113. P. Valsasina, et al., Proc. Intl. Soc. Mag. Academy of Sciences 101, 46374642 tivity with aging and disease duration
Reson. Med (2009), vol. 17. (2004). in schizophrenia, Master’s thesis (2010).
114. R. Veselis, Best Pract Res Clin Anaesthe- 7. A. Hafkemeijer, J. van der Grond, S. A. Master Thesis.
siol 21, 297 (2007). Rombouts, Biochimica et Biophysica Acta
(BBA) - Molecular Basis of Disease 1822, 2. J.-C. Dreher, et al., Biological Psychiatry
115. H. Wang, Z. Lu, Seventh International
431441 (2012). 71, 890897 (2012).
Conference on Natural Computation
(2011). 8. Y. Han, et al., NeuroImage 55, 287295 3. M. J. Escart, et al., Schizophrenia Re-
116. L. Wang, X. Guo, J. Sun, Z. Jin, S. Tong, (2011). search 117, 3141 (2010).
Annual International Conference of the 9. S. C. Johnson, et al., Archives of General 4. J. H. Jang, et al., Schizophrenia Research
IEEE Engineering in Medicine and Biol- Psychiatry 64, 1163 (2007). 127, 5865 (2011).
ogy Society (2012). 10. W. Koch, et al., Neurobiology of Aging 33,
5. B. Jeong, M. Kubicki, Psychiatry
117. Z. Wang, J. Liu, N. Zhong, H. Zhou, 466478 (2012).
Research: Neuroimaging 181, 114120
Y. Qin, The 2010 International Joint 11. N. A. Kochan, et al., Biological Psychiatry
(2010).
Conference on Neural Networks (IJCNN) 70, 123130 (2011).
(2010). 12. J. Lee, J. C. Ye, IEEE International Con- 6. B. Nelson, et al., Neuroscience & Biobe-
118. I. Weissman-Fogel, M. Moayedi, K. S. ference on Systems, Man, and Cybernetics havioral Reviews 33, 807817 (2009).
Taylor, G. Pope, K. D. Davis, Hum. Brain (SMC) (2012). 7. M. Nielsen, et al., IEEE SMC99 Con-
Mapp. p. n/an/a (2010). 13. K. Lee, J. C. Ye, IEEE International ference Proceedings. 1999 IEEE Interna-
119. Y. D. van der Werf, E. J. Sanz-Arigita, Symposium on Biomedical Imaging: From tional Conference on Systems, Man, and
S. Menning, O. A. van den Heuvel, BMC Nano to Macro (2010). Cybernetics (Cat. No.99CH37028) .
Neuroscience 11, 145 (2010). 14. K. Li, et al., NeuroImage 61, 8297 (2012). 8. A. Rotarska-Jagiela, et al., Schizophrenia
120. S. Whitfield-Gabrieli, et al., NeuroImage 15. P. Liang, Z. Wang, Y. Yang, X. Jia, K. Li, Research 117, 2130 (2010).
55, 225232 (2011). PLoS ONE 6, e22153 (2011).
9. R. Salvador, et al., Hum. Brain Mapp. 31,
121. L. B. Wilson, J. R. Tregellas, E. Slason, 16. A.-L. Lin, A. R. Laird, P. T. Fox, J.-
20032014 (2010).
B. E. Pasko, D. C. Rojas, NeuroImage 55, H. Gao, Neurology Research International
724731 (2011). 2012, 117 (2012). 10. F. C. Schneider, et al., Schizophrenia Re-
122. M. Wirth, et al., NeuroImage 54, 17. Z. Liu, et al., NMR Biomed. 25, 13111320 search 125, 110117 (2011).
30573066 (2011). (2012). 11. S. Teng, et al., 2010 International Con-
123. C. Wu, et al., Neuroimage 45, 694 (2009). 18. M. M. Lorenzi, et al., Drugs & aging 28, ference on Bioinformatics and Biomedical
124. C. W. Wu, et al., NeuroImage 59, 205 (2011). Technology .
30753084 (2012). 19. K. Mevel, G. Chtelat, F. Eustache, 12. J. R. Tregellas, et al., Biological Psychia-
125. J.-T. Wu, et al., Neuroscience Letters 504, B. Desgranges, International Journal of try 69, 711 (2011).
6267 (2011). Alzheimers Disease 2011, 19 (2011).
126. L. Wu, T. Eichele, V. D. Calhoun, Neu- 20. J. Persson, et al., Neuropsychologia 46, 13. D. Vargas-Vázquez, Journal of Electronic
roImage 52, 12521260 (2010). 16791687 (2008). Imaging 14, 013006 (2005).
127. J. Yang, X. Weng, Y. Zang, M. Xu, X. Xu, 21. J. R. Petrella, F. C. Sheldon, S. E. Prince, 14. L. Wang, P. D. Metzak, T. S. Woodward,
Cortex 46, 354366 (2010). V. D. Calhoun, P. M. Doraiswamy, Neurol- Schizophrenia Research 125, 136142
128. W. Zeng, A. Qiu, B. Chodkowski, J. J. ogy 76, 511517 (2011). (2011).
Pekar, NeuroImage 46, 10411054 (2009). 22. Y.-w. Sun, et al., Behavioural Brain Re- 15. S. Whitfield-Gabrieli, et al., Proceedings
129. D. Zhang, A. Z. Snyder, J. S. Shimony, search 223, 388394 (2011). of the National Academy of Sciences 106,
M. D. Fox, M. E. Raichle, Cerebral Cortex 23. P. Toussaint, et al., IEEE International 12791284 (2009).
20, 11871194 (2010). Symposium on Biomedical Imaging: From
130. H. Zhang, et al., NeuroImage 51, 16. N. D. Woodward, B. Rogers, S. Heckers,
Nano to Macro (2011).
14141424 (2010). Schizophrenia Research 130, 8693 (2011).
24. P.-J. Toussaint, et al., NeuroImage 63,
131. S. Zhang, C.-s. R. Li, Hum. Brain Mapp. 936946 (2012). 17. Q. Yu, et al., Front. Syst. Neurosci. 5
33, 89104 (2012). 25. F. Vogelaere, P. Santens, E. Achten, (2012).
132. Z. Zhou, et al., Magnetic Resonance Imag- P. Boon, G. Vingerhoets, Neuroradiology 18. D. ngr, et al., Psychiatry Research: Neu-
ing 29, 418433 (2011). 54, 11951206 (2012). roimaging 183, 5968 (2010).
References
6. Deco G, Jirsa VK, McIntosh AR (2011) Emerging concepts for the dynamical organization of
resting-state activity in the brain. Nat Rev Neurosci 12(1):43–56
7. Derrfuss J, Mar R (2009) Lost in localization: The need for a universal coordinate database.
Neuroimage 48(1):1–7
8. Duda RO, Hart PE (1972) Use of the Hough transformation to detect lines and curves in
pictures. Commun ACM 15(1):11–15
9. Esposito F, Pignataro G, Di Renzo G, Spinali A, Paccone A, Tedeschi G, Annunziato L (2010)
Alcohol increases spontaneous BOLD signal fluctuations in the visual network. Neuroimage
53(2):534–43
10. FIL Methods Group: Statistical Parametric Mapping. http://www.fil.ion.ucl.ac.uk/spm/
11. Hand D, Mannila H, Smyth P (2001) Principles of data mining. MIT Press, Cambridge
12. Huettel SA, Song AW, McCarthy G (2008) Functional magnetic resonance imaging, 2nd ed.
Sinauer, Sunderland
13. Johnson SC, Ries ML, Hess TM, Carlsson CM, Gleason CE, Alexander AL, Rowley HA,
Asthana S, Sager MA (2007) Effect of Alzheimer disease risk on brain function during self-
appraisal in healthy middle-aged adults. Arch Gen Psychiat 64(10):1163–1171
14. Laird AR, Lancaster JL, Fox PT (2009) Lost in localization? the focus is meta-analysis.
Neuroimage 48(1):18–20
15. Levy DJ, Glimcher PW (2012) The root of all value: a neural common currency for choice.
Curr Opin Neurobiol 22(6):1027–1038
16. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision
60:91–110
17. Raichle ME, MacLeod AM, Snyder AZ, Powers WJ, Gusnard DA, Shulman GL (2001) A
default mode of brain function. Proc Natl Acad Sci USA 98(2):676–682
18. Rajasekharan J, Scharfenberger U, Gonçalves N, Vigário R (2010) Image approach towards
document mining in neuroscientific publications. In: IDA, pp 147–158
19. Snyder AZ, Raichle ME (2012) A brief history of the resting state: the Washington
University perspective. NeuroImage 62(2):902–910. http://www.sciencedirect.com/science/article/pii/S1053811912000614
20. Szeliski R (2010) Computer vision: algorithms and applications, 1st edn. Springer, New York
21. Yarkoni T, Poldrack RA, Nichols TE, Van Essen DC, Wager TD (2011) Large-scale automated
synthesis of human functional neuroimaging data. Nature Methods 8(8):665–670
22. Ylipaavalniemi J, Vigário R (2008) Analyzing consistency of independent components: an
fMRI illustration. NeuroImage 39(1):169–180. http://dx.doi.org/10.1016/j.neuroimage.2007.08.027. http://www.sciencedirect.com/science/article/pii/S1053811907007288
23. Yushkevich PA, Piven J, Cody Hazlett H, Gimpel Smith R, Ho S, Gee JC, Gerig G (2006)
User-guided 3D active contour segmentation of anatomical structures: significantly improved
efficiency and reliability. Neuroimage 31(3):1116–1128
Visual Pattern Recognition Framework Based
on the Best Rank Tensor Decomposition
B. Cyganek
Abstract In this paper a framework for visual patterns recognition of higher dimen-
sionality is discussed. In the training stage, the input prototype patterns are used to
construct a multidimensional array—a tensor—whose each dimension corresponds
to a different dimension of the input data. This tensor is then decomposed into
a lower-dimensional subspace based on the best rank tensor decomposition. Such
decomposition allows extraction of the lower-dimensional features which well repre-
sent a given training class and exhibit high discriminative properties among different
pattern classes. In the testing stage, a pattern is projected onto the computed tensor
subspaces and a best fitted class is provided. The method presented in this paper, as
well as the software platform, is an extension of our previous work. The conducted
experiments on groups of visual patterns show high accuracy and fast response time.
1 Introduction
B. Cyganek ()
AGH University of Science and Technology, Al. Mickiewicza 30, 30-059 Krakow, Poland
e-mail: [email protected]
Recently, multidimensional arrays of data, called tensors, were proposed for pattern
recognition. These are especially well suited to the problem of pattern recognition in
visual signals, due to a direct representation of each of the dimensions of the input
signal.
Even more important are the methods of analyzing tensor content. In this respect
a number of tensor decomposition methods have been proposed [5, 10–12, 16]. The
three main decomposition methods are as follows.
1. The Higher-Order Singular Value Decomposition (HOSVD) [11].
2. The best rank-1 approximation [12].
3. The best rank-(R1 , R2 , . . ., RK ) approximations [12, 16].
The first of the above, the HOSVD, can be used to build the orthogonal space for pattern
recognition [14]. Its variant operating on tensors obtained from the geometrically de-
formed prototype patterns is discussed in [5]. However, HOSVD is not well suited
to data reduction. Although there is a truncated version of HOSVD, its results lead
to excessive errors. Thus, usually a truncated HOSVD is treated only as a coarse
approximation or it can serve as an initialization method for other decompositions.
In terms of dimensionality reduction, better results can be obtained with the best
rank-1 decomposition [12]. However, the best rank-(R1 , R2 , . . ., RK ) approximation
offers much better behavior in terms of pattern representation in lower-dimensional
subspaces, as shown by de Lathauwer [12], as well as other researchers, such as
Wang and Ahuja [15, 16]. In this paper we follow this approach, discussing its prop-
erties and a method of pattern recognition, as well as providing an experimental and
software framework for pattern recognition with the best rank tensor decomposition.
In some computations, it is more efficient to represent the tensor and matrix product
given in (2) in an equivalent representation based on the p-mode tensor flattening
and the Kronecker product. That is,
T = Z×1 S1 ×2 S2 . . . ×K SK , (5)
The best rank-(R1 , R2 , . . ., RK ) approximation seeks, under the requested rank constraints,
the tensor T̃ that minimises the reconstruction error E = ||T − T̃ ||F (cf. Eq. (7)), where ||·||F
denotes the Frobenius norm. It can be shown that the approximated tensor T̃ conveys as much
of the "energy", in the sense of the squared entries of a tensor, as the original tensor T ,
under the requested rank constraints. A value of E is called the
reconstruction error. Figure 1 depicts the best rank-(R1 , R2 , R3 ) decomposition of a
3D tensor T ∈ ℝN1 ×N2 ×N3 . However, contrary to the rank definition for matrices,
there are different rank definitions for tensors. For more discussion see [5, 11].
It can be also easily observed that the assumed rank conditions mean that the
approximation tensor T̃ can be decomposed as follows
T̃ = Z×1 S1 ×2 S2 . . . ×K SK , (8)
Each of the matrices S1 ∈ ℝN1 ×R1 , S2 ∈ ℝN2 ×R2 , . . ., and SK ∈ ℝNK ×RK in (8) has
orthonormal columns. The number of columns for Si is given by Ri .
The core tensor Z ∈ ℝR1 ×R2 ×···×RK is of dimensions R1 , R2 , . . ., RK . It can be
computed from the original tensor T as follows
Z = T ×1 ST1 ×2 ST2 . . . ×K STK .    (9)
Fig. 1 The best rank-(R1 , R2 , R3 ) decomposition of a 3D tensor T of dimensions N1 × N2 × N3
into a core tensor Z of dimensions R1 × R2 × R3 and the matrices S1 , S2 and S3
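To make the p-mode products and Eq. (9) concrete, a minimal NumPy sketch is given below; it is an illustration only (the function names are ours), not the DeRecLib C++ implementation discussed later:

```python
import numpy as np

def mode_product(tensor, matrix, mode):
    # p-mode product T x_p M: multiply the mode-p fibres of the tensor by the matrix
    t = np.moveaxis(tensor, mode, 0)
    flat = t.reshape(t.shape[0], -1)                     # mode-p flattening
    out = matrix @ flat                                  # contract along that mode
    out = out.reshape((matrix.shape[0],) + t.shape[1:])
    return np.moveaxis(out, 0, mode)

def core_tensor(T, S):
    # Eq. (9): Z = T x_1 S1^T x_2 S2^T ... x_K SK^T for column-orthonormal S_k
    Z = T
    for k, Sk in enumerate(S):
        Z = mode_product(Z, Sk.T, k)
    return Z
```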
As alluded to previously, the only control parameters of the method are the ranks R1 ,
R2 , and R3 . A trade-off can be achieved between the compression ratio C in (10)
and the approximation error expressed in Eq. (7). This also influences
pattern recognition accuracy, as will be discussed.
As already described, the subspace obtained after the best rank decomposition can be
used to generate specific features of an image X, which can then be used for pattern
recognition [16]. The features are obtained by projecting the image X of dimensions
N1 × N2 into the space spanned by the two matrices S1 and S2 in accordance with (9).
However, at first the pattern X needs to be represented in an equivalent tensor form
X which is of dimensions N1 × N2 × 1. Then, the feature tensor F of dimensions
R1 × R2 × 1 is obtained by projecting X onto the space spanned by S1 and S2 , as
follows: F = X ×1 ST1 ×2 ST2 .
Tensor T is constructed out of the available training patterns. However, the
method can work with any number of available training patterns, starting from
only one exemplar, as will be discussed. Hence, in our framework the following two
scenarios were evaluated, depending on the available number of training patterns:
Fig. 2 The process of the 3D pattern tensor generation by geometrical warping of the prototype
pattern
1. A set of prototype patterns Pi of the same object is available. These are used to
form the input tensor T .
2. If only one prototype P is available, its different appearances Pi are generated by
geometrical warping of the available pattern. This process is visualized in Fig. 2.
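For the single-prototype scenario, the warping step can be sketched in Python as follows (rotations of ±12°, as used later in the experiments; the 3° step and the use of rotation as the only deformation are our assumptions):

```python
import numpy as np
from scipy.ndimage import rotate

def build_prototype_tensor(prototype, max_angle=12.0, step=3.0):
    # stack geometrically warped (here: rotated) versions of one prototype
    # into an N1 x N2 x N3 pattern tensor; the third mode indexes the patterns
    angles = np.arange(-max_angle, max_angle + step, step)
    warped = [rotate(prototype, a, reshape=False, mode="nearest") for a in angles]
    return np.stack(warped, axis=2)
```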
As a result, the patterns form a 3D tensor which, after the best-rank decomposition, spans the
space representing that class. In the case of multiple classes, a 3D tensor is built for
each of the classes separately.
The next step after the best rank-(R1 , R2 , . . . , RK ) decomposition consists of
building features from each of the prototype patterns Pi from the tensor T . These
are computed as follows
Fi = Pi ×1 ST1 ×2 ST2 , (12)
where Pi denotes an N1 × N2 × 1 tensor representation of the pattern Pi . In the
same way features are computed for the tensor PX created from the test pattern
PX . It is interesting to notice that the dimensions of the features computed in this way
are much smaller than the dimensions of the original patterns, due to the data compression
expressed by (10). However, they represent the dominating two-dimensional subspaces
in each dimension independently. Thus, their discriminative properties are usually
high despite the low-dimensional representation.
Finally, a quantitative measure of the fitness of the test pattern PX to the prototypes
of a class c is computed based on the following formula
ρc = (1/N3 ) Σi=1..N3 ‖FX − Fi(c) ‖F .    (13)
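A compact Python sketch of Eqs. (12)–(13); for a 2D pattern the two mode products reduce to S1ᵀ P S2, and the data layout of `class_subspaces` is our assumption:

```python
import numpy as np

def features(pattern, S1, S2):
    # Eq. (12): F = P x_1 S1^T x_2 S2^T for an N1 x N2 pattern P
    return S1.T @ pattern @ S2

def classify(test_pattern, class_subspaces):
    # Eq. (13): pick the class with the smallest average Frobenius distance between
    # the test features and that class's prototype features;
    # class_subspaces maps label -> (S1, S2, list of prototype feature matrices)
    best_label, best_rho = None, np.inf
    for label, (S1, S2, proto_feats) in class_subspaces.items():
        FX = features(test_pattern, S1, S2)
        rho = np.mean([np.linalg.norm(FX - Fi, "fro") for Fi in proto_feats])
        if rho < best_rho:
            best_label, best_rho = label, rho
    return best_label
```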
Figure 3 depicts the described process of multi-class pattern recognition from the
best-rank decomposition of the prototype pattern tensor.
As alluded to previously, the training parameters are the chosen rank values of R1 ,
R2 , and R3 in (8). In our experiments these are usually determined experimentally, al-
though they can be also chosen after analyzing signal energy level in the decomposed
tensor. However, especially interesting is the case of R3 = 1, which means that the
third dimension of the pattern tensor, which reflects the number of training patterns,
will be compressed to the single most prominent example. Such a strategy frequently
leads to superior results, as will be presented in the experimental part.
Fig. 3 Pattern recognition scheme with the best rank-(R1 , R2 , . . ., RK ) decomposition of a tensor
composed from the prototype patterns P1 , P2 , . . ., PN3 of a single class. Decomposition of the
pattern tensor provides the lower-dimensional subspaces given by the column orthogonal matrices
S1 , S2 , and S3 . Prototype features are obtained by projecting each prototype patterns onto the space
spanned by the matrices S1 and S2 . Features of the test pattern X are finally compared with the
prototype features. The procedure is repeated for each class and the class with the best match of
features is returned by the classifier
The above HOOI procedure has been implemented in our software framework,
as described in [5]. The implementation utilizes C++ classes with basic data types
defined as template parameters, as shown in Fig. 5. This allows time and memory
savings by using the fixed point representation of data instead of the floating point. In
the presented experiments the 12.12 fixed point representation proved to be sufficient
(each value is stored in 3 bytes instead of the 8 needed in the case of the floating point
representation).
The Best_Rank_R_DecompFor, shown in Fig. 5, is the main class for the best-rank
tensor decomposition. It is derived from the TensorAlgebraFor class which imple-
ments all basic operations on tensors, such as the p-mode multiplications, discussed in
the previous section. Tensors are represented by objects of the class TFlatTensorFor,
which represents tensors in the flattened form. The Best_Rank_R_DecompFor class
is accompanied by the S_Matrix_Initializer hierarchy. Its main role is to define the
initial setup of the values of the Si matrices for the HOOI process. In our case
these were initialized with randomly generated values of uniform distribution [5, 7].
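For readers who prefer a high-level language, a minimal NumPy sketch of the HOOI iteration with random uniform initialisation is given below; it mirrors the general procedure only and is not the DeRecLib code (the tolerance and iteration cap are illustrative):

```python
import numpy as np

def unfold(T, mode):
    # mode-p flattening of a tensor into a matrix
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hooi(T, ranks, max_iter=1000, eps=1e-6, seed=0):
    # best rank-(R1, ..., RK) approximation by Higher-Order Orthogonal Iteration;
    # returns the column-orthonormal matrices S_k and the core tensor Z
    rng = np.random.default_rng(seed)
    S = [np.linalg.qr(rng.uniform(size=(T.shape[k], r)))[0] for k, r in enumerate(ranks)]
    prev_fit = 0.0
    for _ in range(max_iter):
        for k in range(T.ndim):
            # project T onto all subspaces except the k-th, then keep the
            # dominant R_k left singular vectors of the mode-k flattening
            Y = T
            for j in range(T.ndim):
                if j != k:
                    Y = np.moveaxis(np.tensordot(S[j].T, Y, axes=(1, j)), 0, j)
            U, _, _ = np.linalg.svd(unfold(Y, k), full_matrices=False)
            S[k] = U[:, :ranks[k]]
        Z = T
        for j in range(T.ndim):
            Z = np.moveaxis(np.tensordot(S[j].T, Z, axes=(1, j)), 0, j)
        fit = np.linalg.norm(Z) / np.linalg.norm(T)   # captured "energy" fraction
        if abs(fit - prev_fit) < eps:
            break
        prev_fit = fit
    return S, Z
```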
Fig. 5 Class hierarchy from the DeRecLib library implementing the best-rank tensor decomposition
for tensors of any dimensions and any type of elements
4 Experimental Results
This paper is based on our previous work, presented in [4]. In this section we recall
those results, augmented with the results of tests on face recognition. Figure 6 depicts
a maxillary radiograph (left), as well as the implant pattern (right).
In the first task, the implants in the maxillary radiograph images are recognized
with the proposed technique. At first, the places of implants are detected by exploiting
their high contrast in the radiograph images [6]. These are detected as high-contrast
areas, which after registration are fed to the tensor classifier described in the previous
sections. Since only one example of the prototype image is usually available, its
different appearances are generated by image warping, as described in the previous
section. In the experiments an implant pattern is rotated in the range of ±12◦ .
Fig. 6 An example of a maxillary radiograph image (a) and a dental implant to be recognized (b).
(Based on [4])
Fig. 7 Examples of the geometrically deformed versions of the prototype image of an implant. These
are formed into a 3D tensor which after the best-rank approximation is used in object recognition.
(From [4])
Fig. 8 Reconstruction error E with respect to the compression ratio C of the input patterns (a). Accuracy
A of pattern recognition with respect to the compression ratio C of the input patterns (b). (From [4])
There are ten different images of each of 40 distinct persons. For some subjects,
the images were taken at different times, under varying lighting conditions, as well as
with different facial expressions (open/closed eyes, smiling/not smiling) and facial
Fig. 9 Examples of the images from the Olivetti Research Lab (ORL)—now ATT Labs. There are
40 subjects, for each there are ten images from which a number were randomly selected for training
and the remaining for testing
details (glasses/no glasses). All the images were taken against a dark homogeneous
background with the subjects in an upright, frontal position (with tolerance for some
side movement).
Figure 10 presents two accuracy plots obtained on the ATT face database with
the presented method. In Fig. 10a accuracy is shown with respect to different rank
assignments, which directly influence the compression ratio, in accordance with formula
(10). In this experiment nine images were used for training and the remaining one
for testing. The procedure was repeated 10 times. The rank values in Fig. 10a are as
follows: (20, 20, 1), (20, 20, 3), (40, 40, 1), (10, 10, 1), (20, 20, 9). We notice that
different rank settings lead to different accuracies and there is no simple formula joining the
compression ratio C with the accuracy A. Nevertheless, a high C leads to a lower A.
In Fig. 10b the same ranks (20, 20, 1) are used and the accuracy is drawn with respect
to different partitions of the database patterns into training and testing groups.
These are as follows: 9 vs. 1, 7 vs. 3, 5 vs. 5, and 3 vs. 7. Although a lower number
of training patterns with a higher number of test patterns leads to lower accuracy,
the drop is only about 0.1 (that is, 10 %). For future research we plan further
investigation, and we will also try to develop methods for automatic rank assignment
based on signal properties.
The database used is demanding due to the high diversity of face appearances for
the majority of subjects. Despite this difficulty, the proposed method achieves high
accuracy and performs in real time. Hence, the method can be used in many medical,
biometric, and other pattern recognition tasks.
5 Conclusions
Fig. 10 Accuracy of face recognition with respect to different compression ratios C (a). Accuracy of
face recognition for the same rank setting (20, 20, 1) and different assignments T of training
vs. testing images (b)
The presented pattern recognition framework, based on the best rank tensor decomposition,
showed high accuracy and fast response time. In the presented experiments with
implant recognition in maxillary radiograph images, the reached accuracy is 97 %.
The method was also tested on the problem of face recognition. In the task of face
recognition from the ORL face database the method achieves 90 % accuracy on average.
Additionally, the object-oriented software platform was presented which, apart from
training computations, allows a real-time response. It was also indicated that the
training process can be easily parallelized, since each class can be processed inde-
pendently. The software for tensor decomposition is available from the webpage [7].
Our future research on this subject will concentrate on further analysis, measurement
of different signal transformations, as well as on development of methods for best
rank assignments.
Acknowledgements The financial support from the Polish National Science Centre NCN in the
year 2014, contract no. DEC-2011/01/B/ST6/01994, is gratefully acknowledged.
References
1. Chen J, Saad Y (2009) On the tensor svd and the optimal low rank orthogonal approximation
of tensors. SIAM J Matrix Anal Appl 30(4):1709–1734
2. Cichocki A, Zdunek R, Amari S (2008) Nonnegative matrix and tensor factorization. IEEE
Signal Process Mag 25(1):142–145
3. Cichocki A, Zdunek R, Phan AH, Amari S-I (2009) Nonnegative matrix and tensor factoriza-
tions. Applications to exploratory multi-way data analysis and blind source separation. Wiley,
Chichester
4. Cyganek B (2013) Pattern recognition framework based on the best rank-( R1 , R2 ,. . ., RK ) tensor
approximation. In: Computational vision and medical image processing IV: proceedings of
VipIMAGE 2013—IV ECCOMAS thematic conference on Computational vision and medical
image processing, pp 301–306
5. Cyganek B (2013) Object detection and recognition in digital images: theory and practice.
Wiley
6. Cyganek B, Malisz P (2010) Dental implant examination based on the log-polar matching of
the maxillary radiograph images in the anisotropic scale space. IEEE Engineering in Medicine
and Biology Conference, EMBC 2010, Buenos Aires, Argentina, pp 3093–3096
7. DeRecLib (2013) http://www.wiley.com/go/cyganekobject
8. Duda RO, Hart PE, Stork DG (2001) Pattern classification. Wiley, New York
9. https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
10. Tamara GK, Brett WB (2009) Tensor decompositions and applications. SIAM Rev 51(3):
455–500
11. Lathauwer de L (1997) Signal processing based on multilinear algebra. PhD dissertation,
Katholieke Universiteit Leuven
12. Lathauwer de L, Moor de B, Vandewalle J (2000) On the best rank-1 and rank-( R1 , R2 , . . .,
RN ) approximation of higher-order tensors. SIAM J Matrix Anal Appl 21(4):1324–1342
13. Muti D, Bourennane S (2007) Survey on tensor signal algebraic filtering. Signal Process
87:237–249
14. Savas B, Eldén L (2007) Handwritten digit classification using higher order singular value
decomposition. Pattern Recognit 40(3):993–1003
15. Wang H, Ahuja N (2004) Compact representation of multidimensional data using tensor rank-
one decomposition. In: Proceedings of the 17th international conference on pattern recognition,
vol 1, pp 44–47
16. Wang H, Ahuja N (2008) A tensor approximation approach to dimensionality reduction. Int J
Comput Vision 76(3):217–229
Tracking Red Blood Cells Flowing through a
Microchannel with a Hyperbolic Contraction:
An Automatic Method
Abstract The present chapter aims to assess the motion and deformation index
of red blood cells (RBCs) flowing through a microchannel with a hyperbolic con-
traction using an image analysis based method. For this purpose, a microchannel
containing a hyperbolic contraction was fabricated in polydimethylsiloxane by using
a soft-lithography technique and the images were captured by a standard high-speed
microscopy system. An automatic image processing and analyzing method has been
developed in a MATLAB environment, not only to track both healthy and exposed
RBCs motion but also to measure the deformation index along the microchannel.
The keyhole model has proved to be a promising technique to track automatically
healthy and exposed RBCs flowing in this kind of microchannels.
1 Introduction
Using manual methods, several studies were able to measure motion [1, 7, 10–14]
and dynamical deformation [6, 19, 22, 28] of RBCs flowing through microchannels.
However, the manual data collection is extremely time consuming and may in-
troduce users’ errors into the data. Hence, it is crucial to develop sophisticated
computerized methods able to track automatically multiple cell trajectories and
reduce the errors introduced by the users' evaluation. Several researchers have been devel-
oping different kinds of automatic particle tracking tools for Image J [18, 23, 24],
Matlab [19, 24], LabVIEW [4, 18] and IDL [5]. A promising plugin for Image J
is the “Particletracker” [23]. However, this plugin is still under development as the
automatic tracking trajectories tend to overlap, especially at high concentration of
particles and/or cells. Recently, Pinho et al. [20] have developed a Matlab module to
track automatically individual RBCs flowing through a microchannel. However, this
method did not measure the RBCs deformability. Hence, it is essential to develop an
automatic method able to perform both tracking and deformability measurements of
individual RBCs.
In this study, we propose an automatic image analysis technique based on the
keyhole model tracking algorithm, which describes the probable movement of RBCs
[21]. First, a sequence of binary images containing segmented foreground objects
was obtained by pre-processing the videos, and then tracks were formed by linking
the objects with common optical flow in contiguous frames. Finally, we measure the
deformation of individual RBCs flowing through a microchannel having a hyperbolic
contraction. In this geometry the RBCs mechanical properties are under the effect
of a strong extensional flow.
Optical flow segmentation is usually defined as grouping of pixels of similar
intensity that are associated with smooth and uniform motion information. However,
this is a problem that is loosely defined and ambiguous in certain ways. Though the
definition of motion segmentation says that regions with coherent motion are to be
grouped, the resulting segments may not correspond to meaningful RBC regions in
the image. To alleviate this issue the motion segmentation problem is placed at two
levels, namely low level and high level. Low level motion segmentation tries to group
pixels with homogeneous motion vectors without taking any other information apart
from intensity or image gradient. High level motion segmentation divides the image
into regions that exhibit coherent motion and it also uses other image cues to produce
image segments that correspond to projections of real RBCs.
It has been acknowledged by many authors that it is very difficult to determine the
motion of pixels in areas of smooth intensity and that the motion in these areas must
invariably be found by extrapolating from nearby features. These smooth areas of
the image can be determined prior to any motion analysis by performing an initial
segmentation based purely on intensity (or other spatial cues) to combine these
smooth areas into individual atomic regions. The motion of these regions, rather
than pixels, is then determined and these regions clustered together according to
their motion.
Our method takes the spatial atomic regions produced by the watershed algorithm
and a variational motion estimation method [2] and combines them into a complete
algorithm producing a reliable motion segmentation framework which is used in the
tracking step.
The working fluid used in this study was Dextran 40 (Dx40) containing ∼2 % of
human RBCs (i.e., hematocrit, Hct ∼2 %). The blood was collected from a healthy
adult volunteer, and EDTA (ethylenediaminetetraacetic acid) was added to the sam-
ples to prevent coagulation. The blood samples were washed by centrifugation and
then stored hermetically at 4 °C until the experiments were performed at room tem-
perature. For the RBCs exposed to chemicals, the cells were incubated for 10 min at
room temperature with 0.02 % diamide (Sigma-Aldrich). After the incubation time,
RBCs exposed to chemicals were washed in physiological saline and re-suspended
in Dextran 40 at 2 % Hct and then used immediately in our experiments.
The microchannels containing a hyperbolic contraction were produced in poly-
dimethylsiloxane (PDMS) using a standard soft-lithography technique from a SU-8
photoresist mold. The molds were prepared in a clean room facility by photo-
lithography using a high-resolution chrome mask. The geometry of the fabricated
microchannel is shown in Fig. 1. The channel has a constant depth of 14 μm through-
out the PDMS device and the width of the upstream and downstream channels is
400 μm. The minimum width in the hyperbolic contraction region is 20 μm.
For the microfluidic experiments, the device containing the microchannel was
placed on the stage of an inverted microscope (IX71, Olympus). The flow rate of
0.5 μL/min was controlled using a syringe pump (PHD ULTRA). The images of the
flowing RBCs were captured using a high speed camera (FASTCAM SA3, Photron)
and transferred to the computer to be analyzed. An illustration of the experimental
setup is shown in Fig. 2.
Fig. 2 Experimental setup: inverted microscope, high speed camera and syringe pump
The proposed methodology has five major stages. First, we remove background,
noise and some artifacts of the original movie, as a pre-processing stage, obtaining
an image only with the RBCs. Next, we create an over-segmented image, based on
the initial gradient magnitude image, using the watershed transform. The optical flow
information of these regions is obtained by using the variational method proposed
by Brox et al. [2]. After that, the cell tracking links the atomic regions in contiguous
frames, according to their motion, to form the tracks by means of a keyhole model
proposed by Reyes-Aldasoro et al. [21]. Finally, we measure the deformation index
of each RBC.
Optical flow is defined as the 2D vector field that matches a pixel in one image
to the warped pixel in the other image. In other words, optical flow estimation
tries to assign to each pixel of the current frame a two-component velocity vector
indicating the position of the same pixel in the reference frame. The segmentation
of an image sequence based on motion is a problem that is loosely defined and
ambiguous in certain ways. Optical flow estimation algorithms often generate an
inaccurate motion field mainly at the boundaries of moving objects, due to reasons
such as noise, aperture problem, or occlusion. Therefore, segmentation based on
motion alone results in segments with inaccurate boundaries.
At this stage, the image background is removed by subtracting the average of all
movie images from each image. To improve the identification of the RBCs the image
contrast is adjusted by histogram expansion.
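A short Python sketch of this pre-processing stage (the frame-stack layout is an assumption, and the original module was written in MATLAB):

```python
import numpy as np

def remove_background(frames):
    # frames: (num_frames, height, width) grey-scale stack (assumed layout);
    # subtract the time-averaged image and stretch each frame's contrast to [0, 1]
    background = frames.mean(axis=0)
    cleaned = frames.astype(float) - background
    lo = cleaned.min(axis=(1, 2), keepdims=True)
    hi = cleaned.max(axis=(1, 2), keepdims=True)
    return (cleaned - lo) / np.maximum(hi - lo, 1e-12)
```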
Images taken with digital cameras will pick up noise from a variety of sources.
As the watershed algorithm is very sensitive to noise it is desirable to apply a noise
reduction filter in the pre-processing step. Several filters have been proposed in the
literature to reduce the spurious boundaries created due to noise. However, most of
these filters tend to blur image edges while they suppress noise. To prevent this effect
we use the non-linear bilateral filter [25].
The basic idea underlying the bilateral filter is to replace the intensity of a pixel by
taking a weighted average of the pixels within a neighbourhood (in a circle) with the
weights depending on both the spatial and intensity difference between the central
pixel and its neighbours. In smooth regions, pixel values in a small neighbourhood
are similar to each other and the bilateral filter acts essentially as a standard domain
filter, averaging away the small, weakly correlated, differences between pixel values
caused by noise. Bilateral filter preserves image structure by only smoothing over
those neighbours which form part of the “same region” as the central pixel.
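In Python, this edge-preserving step could be sketched with scikit-image's bilateral filter; the parameter values below are illustrative, and the original processing was done in MATLAB:

```python
from skimage.restoration import denoise_bilateral

def denoise_frame(frame, sigma_spatial=3, sigma_color=0.05):
    # frame: 2D float image in [0, 1]; the weights combine spatial closeness
    # (sigma_spatial, in pixels) and intensity similarity (sigma_color, in grey levels)
    return denoise_bilateral(frame, sigma_color=sigma_color, sigma_spatial=sigma_spatial)
```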
An ideal over-segmentation should be easy and fast to obtain, should not contain
too many segmented regions, and should have its region boundaries form a superset
of the true image region boundaries. In this section we present an algorithm step that
groups pixels into “atomic regions”. The motivations of this preliminary grouping
stage resemble the perceptual grouping task: (1) abandoning pixels as the basic
image elements, we instead use small image regions of coherent structure to define
the optical flow patches. In fact, since the real world does not consist of pixels, it can
be argued that this is even a more natural image representation than pixels as those
are merely a consequence of the digital image discretization.
Watershed transform is a classical and effective method for image segmentation
in grey scale mathematical morphology. For images the idea of the watershed con-
struction is quite simple. An image is considered as a topographic relief where for
every pixel in position (x, y), its brightness level plays the role of the z-coordinate
in the landscape. Local maxima of the activity image can be thought of as mountain
tops, and minima can be considered as valleys.
In the flooding or immersion approach [26], single-pixel holes are pierced at each regional minimum of the activity image, which is regarded as a topographic landscape. When the whole surface is slowly sunk into a lake, water leaks through the holes, rises uniformly and globally across the image, and proceeds to fill each catchment basin. Then, in order to prevent water coming from different holes from mixing, virtual dams are built at places where the water coming from two different minima would merge.
Figure 3 illustrates the immersion simulation approach. Fig. 3a shows a 1D func-
tion with five minima. Water rises in and fills the corresponding catchment basins,
as in Figs. 3b–c. When water in basins b3 and b4 begins to merge, a dam is built to prevent this overflow of water. Similarly, the other watershed lines are constructed.
When the image surface is completely flooded the virtual dams or watershed lines
separate the catchment basins from one another and correspond to the boundaries of
the regions as shown in Fig. 3d.
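A hedged sketch of how such an over-segmentation into atomic regions could be obtained with off-the-shelf routines; scikit-image is an assumed library choice, since the original work does not specify an implementation:

```python
from skimage.filters import sobel
from skimage.segmentation import watershed

def atomic_regions(frame):
    """Over-segment a (filtered) frame into atomic regions by flooding the
    gradient magnitude image from its regional minima (illustrative sketch)."""
    # The magnitude of the intensity gradient plays the role of the topographic relief.
    gradient = sobel(frame)
    # With no explicit markers, flooding starts from every regional minimum,
    # so each catchment basin becomes one atomic region.
    labels = watershed(gradient)
    return labels  # integer label image, one label per atomic region
```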
In many differential methods, the estimation of optical flow relies on the assumption
that objects in an image sequence may change position but their appearance re-
mains the same or nearly the same (brightness constancy assumption) [17] from
time t to time t + 1. Brox et al. [2] proposed a variational method that com-
bines a brightness constancy assumption, a gradient constancy assumption and a
discontinuity-preserving spatio-temporal smoothness constraint.
Estimating optical flow involves the solution of a correspondence problem. That
is, what pixel in one frame corresponds to what pixel in the other frame. In order
to find these correspondences one needs to define some assumptions that are not
affected by the displacement. The combined variational approach [2] differs from
usual variational approaches by the use of a gradient constancy assumption. This
assumption provides the method with the capability to yield good estimation results
even in the presence of small local or global variations of illumination.
Constancy Assumptions on Data Given two successive images of a sequence, $I(x, y, t)$ and $I(x + u, y + v, t + 1)$, we seek at each pixel $\mathbf{x} := (x, y, t)^T$ the optical flow vector $\mathbf{v}(\mathbf{x}) := (u, v, 1)^T$ that describes the motion of the pixel at $\mathbf{x}$ to its new location $(x + u, y + v, t + 1)$ in the next frame.
• Brightness constancy assumption
The common assumption is that the grey value of the pixel does not change as it
undergoes motion:
I (x, y, t) = I (x + u, y + v, t + 1) (1)
However, this constancy assumption cannot deal with image sequences with either local or global changes in illumination. In such cases other assumptions that
are invariant against brightness changes must be applied. Invariance can be en-
sured by considering spatial derivatives. Horn and Schunck [9] add a smoothness
assumption to regularize the flow, and Lucas and Kanade [17] assume constant
motion in small windows.
• Gradient constancy assumption
A global change in illumination shifts and/or scales the grey values of an image sequence. Shifting the grey values does not affect the gradient. Although scaling the grey values changes the length of the gradient vector, it does not affect
its direction. Thus, we assume that the spatial gradients of an image sequence can
be considered as constant during motion:
∇I (x, y, t) = ∇I (x + u, y + v, t + 1) (2)
where $\nabla = (\partial_x, \partial_y)^T$ denotes the spatial gradient. Although the gradient can also change slightly with the grey values, it is much less dependent on the illumination than the brightness constancy assumption is.
Finding the flow field by minimizing the data term alone is an ill-posed problem since
the optimum solution, especially in homogeneous areas, might be attained by many
112 B. Taboada et al.
dissimilar displacement fields. This is the aperture problem: the motion of a homoge-
neous contour is locally ambiguous. In order to solve this problem some regularisa-
tion is required. The most suitable regularisation assumption is piecewise smoothness
[2], that arises in the common case of a scene that consists of semi-rigid objects.
The data term ED (u, v) incorporates the brightness constancy assumption, as
well as the gradient constancy assumption. While the first data term models the
assumption that the grey-level of objects is constant and does not change over time,
the second one accommodates for slight changes in the illumination. This is achieved
by assuming constancy of the spatial image gradient:
$$E_D(u, v) = \int_\Omega \psi\!\left(|I(\mathbf{x} + \mathbf{v}) - I(\mathbf{x})|^2 + \gamma\,|\nabla I(\mathbf{x} + \mathbf{v}) - \nabla I(\mathbf{x})|^2\right) d\mathbf{x} \qquad (3)$$
where Ω is the region of interest (the image) over which the minimization is done.
The parameter γ relates the weight of the two constancy assumptions, and $\psi(s^2) = \sqrt{s^2 + \varepsilon^2}$ is a non-quadratic (convex) penaliser applied to both the data and the smoothness term; it represents a smooth approximation of the L1 norm, $L_1(s) = |s|$. Using the L1 norm rather than the common L2 norm reduces the influence of outliers and makes the estimation robust. Due to the small positive constant ε, $\psi(s^2)$ is still convex, which offers advantages in the minimization process. The incorporation of the constant ε makes the approximation differentiable at $s = 0$; the value of ε sets the level of approximation, which we choose to be 0.001.
Applying a non-quadratic function to the data term addresses problems at the
boundaries of the image sequence, where occlusions occur and therefore outliers in
the data compromise the correct estimation of the flow field.
Smoothness Assumption The smoothness assumption [2, 9, 27] is motivated by the
observation that it is reasonable to introduce a certain dependency between neigh-
bouring pixels in order to deal with outliers caused by noise, occlusions or other local
violations of the constancy assumption. This assumption states that disparity varies
smoothly almost everywhere (except at depth boundaries). That means we can expect
that the optical flow map is piecewise smooth and it follows some spatial coherency.
This is achieved by penalising the total variation of the flow field. Smoothness is
assumed by almost every correspondence algorithm. This assumption fails if there
are thin fine-structured shapes (e.g. branches of a tree, hairs) in the scene.
Horn and Schunck proposed in their model the following smoothness (homoge-
neous) term [9]:
$$E_{S_{HS}}(u, v) = \int_\Omega \left(|\nabla u|^2 + |\nabla v|^2\right) d\mathbf{x} \qquad (4)$$
However, such a smoothness assumption does not respect discontinuities in the flow
field. In order to be able to capture also locally non-smooth motion it is necessary
to allow outliers in the smoothness assumption. This can be achieved by the non-
quadratic penaliser ψ also used in the data term. Thus, the smoothness term ES (u, v)
becomes:
$$E_S(u, v) = \int_\Omega \psi\!\left(|\nabla u|^2 + |\nabla v|^2\right) d\mathbf{x} \qquad (5)$$
The smoothness term gives a penalty to adjacent segments which have different
motion parameters.
Energy Functional Applying non-quadratic penaliser functions to both the data and the smoothness term and also integrating the gradient constancy assumption results in the optical flow model described by the following energy functional:

$$E(u, v) = E_D(u, v) + \alpha\, E_S(u, v) \qquad (6)$$

where α is a positive regularisation parameter which balances the data term $E_D$ against the smoothness term $E_S$: larger values of α result in a stronger penalisation of large flow gradients and lead to smoother flow fields.
The minimization of E (u, v) is an iterative process, with external and internal
iterations. The reader is referred to Brox et al. [2] for a solution to minimize this
functional.
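As an illustration only, the following sketch evaluates a discrete version of this energy for a candidate flow field; it is not the minimization scheme of Brox et al., and the parameter values, the first-order warping and the use of the gradient of the warped frame are simplifying assumptions:

```python
import numpy as np
from scipy.ndimage import map_coordinates

EPS = 1e-3  # the small constant epsilon of the penaliser

def psi(s2):
    """Robust convex penaliser psi(s^2) = sqrt(s^2 + eps^2)."""
    return np.sqrt(s2 + EPS**2)

def grad(img):
    gy, gx = np.gradient(img)
    return gx, gy

def energy(I1, I2, u, v, gamma=100.0, alpha=30.0):
    """Discrete evaluation of E = E_D + alpha * E_S for a candidate flow (u, v);
    gamma and alpha values are illustrative only."""
    h, w = I1.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Warp the second frame towards the first one with the candidate flow.
    I2w = map_coordinates(I2, [yy + v, xx + u], order=1, mode='nearest')

    # Data term: brightness constancy plus gamma-weighted gradient constancy.
    gx1, gy1 = grad(I1)
    gx2, gy2 = grad(I2w)
    data = psi((I2w - I1)**2 + gamma * ((gx2 - gx1)**2 + (gy2 - gy1)**2))

    # Smoothness term: total variation of the flow, robustly penalised.
    ux, uy = grad(u)
    vx, vy = grad(v)
    smooth = psi(ux**2 + uy**2 + vx**2 + vy**2)

    return data.sum() + alpha * smooth.sum()
```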
2.3.4 Tracking
The cell tracking is performed following the keyhole model proposed by Reyes-Aldasoro et al. [21], which predicts the most probable position of a RBC at time t + 1 from its positions at times t − 1 and t. Assuming that the child RBC (cell at frame t) moves with the same direction and velocity as its parent (cell at frame t − 1), it is possible to predict the position of the cell in the next frame t + 1. Of course, this would not cover major changes in speed or turns. Two probability regions where the RBC is most likely to be found were therefore defined: a narrow wedge (60◦ wide) oriented towards the predicted position, and a truncated circle (300◦ ) that complements the
wedge; together they resemble a keyhole. This model was designed in a mask of
141 × 141 pixels, as shown in Fig. 4, where the keyhole has a wedge length of
60 pixels and the circle has a radius of 15 pixels. This design allows the keyhole
model to rotate 180◦ within the mask.
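A simplified sketch of the keyhole test under these geometric parameters; the interface (positions as 2D coordinates, a list of candidate positions) is assumed for illustration and does not reproduce the original implementation:

```python
import numpy as np

def keyhole_candidates(parent, child, candidates,
                       wedge_len=60.0, wedge_angle=60.0, circle_radius=15.0):
    """Select the candidate positions at frame t+1 that fall inside the keyhole
    defined by the cell positions at t-1 (`parent`) and t (`child`).
    The geometry follows the mask described in the text; names are illustrative."""
    parent, child = np.asarray(parent, float), np.asarray(child, float)
    # Predicted direction: the child is assumed to keep its parent's velocity.
    direction = child - parent
    norm = np.linalg.norm(direction)
    selected = []
    for cand in candidates:
        offset = np.asarray(cand, float) - child
        dist = np.linalg.norm(offset)
        if dist <= circle_radius:
            # Inside the truncated circle that complements the wedge.
            selected.append(cand)
        elif norm > 0 and dist <= wedge_len:
            # Inside the narrow wedge oriented towards the predicted position?
            cos_angle = np.dot(direction, offset) / (norm * dist)
            if np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))) <= wedge_angle / 2:
                selected.append(cand)
    return selected
```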
Deformation Index (DI) is a widely used dimensionless value for expressing the degree of RBC deformation and is defined as:
$$DI = \frac{L_{major} - L_{minor}}{L_{major} + L_{minor}} \qquad (7)$$
where $L_{major}$ and $L_{minor}$ are the major and minor axis lengths of a RBC. The DI value lies between 0 and 1: a value of 0 corresponds to a RBC with a shape close to a circle, whereas higher values correspond to more deformed shapes such as an elongated ellipse.
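For instance, DI can be obtained from the ellipse fitted to a segmented cell region; the sketch below assumes a binary RBC mask and uses scikit-image region properties, which is an illustrative choice rather than the method used in the chapter:

```python
from skimage.measure import label, regionprops

def deformation_index(binary_mask):
    """Compute DI = (L_major - L_minor) / (L_major + L_minor) for the largest
    connected component in a binary RBC mask (illustrative sketch)."""
    regions = regionprops(label(binary_mask))
    cell = max(regions, key=lambda r: r.area)   # keep the largest blob
    l_major = cell.major_axis_length            # major axis of the fitted ellipse
    l_minor = cell.minor_axis_length            # minor axis of the fitted ellipse
    return (l_major - l_minor) / (l_major + l_minor)
```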
For the selected RBC (see Fig. 5), the proposed method was able to automatically track the cell through the hyperbolic microchannel. Figure 6 shows the RBC trajectory obtained by the proposed method. The trajectory is approximately linear, mainly because the cell is located in the middle of the hyperbolic microchannel.
By using the proposed image analysis method we have also calculated automati-
cally the deformation index (DI) of the selected RBC flowing along the microchannel.
Detailed information about the DI calculation can be found elsewhere [28].
From Fig. 7 it is possible to observe that the proposed method is able to calculate
automatically the DI of the selected RBC. Although the DI results are extremely
oscillatory, overall the results show that the DI tends to decrease as the RBC leaves
the hyperbolic contraction. This result corroborates recent studies performed by
Yaginuma et al. [28] and Faustino et al. [6] where they have used a manual method to
calculate the DI. Additionally, the proposed method was tested to track a RBC treated
with diamide (0.02 %) throughout a microchannel with a hyperbolic contraction (see
Fig. 8).
For this particular case the selected RBC (see Fig. 9) is located near the wall of the hyperbolic contraction and consequently its trajectory tends to follow the wall of the contraction region. After the contraction this RBC has a tendency to
flow towards the wall of the sudden expansion region of the microchannel. This is
an expected behavior under a laminar regime.
Fig. 6 Trajectory of a selected RBC tracked by the proposed image analysis method. The vertical
red line represents the exit of the hyperbolic contraction
Fig. 7 Deformation index (DI) of a selected RBC by using the keyhole model. The vertical red line
represents the exit of the hyperbolic contraction
Figure 10 shows the DI for a RBC exposed to 0.02 % diamide flowing through
a hyperbolic microchannel. For this particular case the RBC DI tends to increase
until the exit of the hyperbolic contraction. As soon as the RBC enters the sudden
expansion region, the RBC DI decreases.
Figure 11 shows clearly that for both RBCs the DI tends to reduce when the RBCs
enter the expansion region, which is consistent with other past results [6, 22, 28].
The results from Fig. 11 also show that the DI of a RBC exposed to 0.02 % diamide is
higher than the DI of the selected healthy RBC. This latter result needs to be analysed
with some caution as the exposed RBC is flowing close to the wall where the shear
rate is extremely high and may play a key role on the increase of the RBC DI. Further
studies are needed to clarify this phenomenon.
Fig. 8 RBCs exposed to 0.02 % diamide flowing through a microchannel having a hyperbolic
contraction
Fig. 9 Trajectory of a selected RBC exposed to 0.02 % diamide flowing through a hyperbolic
microchannel. The vertical red line represents the exit of the hyperbolic contraction
The present study has tested an image analysis technique to track RBCs flowing
through a microchannel with a hyperbolic contraction. The proposed automatic
method is based on a keyhole model and its main purpose is to provide a rapid
and accurate way to obtain automatically multiple RBC trajectories and deformabil-
ity data. The results have shown that the proposed automatic method was able not only to track the motion of both healthy and exposed RBCs but also to measure the RBC DI along the microchannel. The DI data have shown clearly that for both RBCs the DI
tends to reduce when the RBCs enter the microchannel expansion region. Hence, the
results have shown that the proposed method can be successfully integrated with a
Fig. 10 DI of a selected RBC exposed to 0.02 % diamide flowing through a hyperbolic microchannel
tracked by the keyhole model. The vertical red line represents the exit of the hyperbolic contraction
high-speed microscopy system and used as a fast way to obtain RBC measurements.
Additionally, by reducing time-consuming tasks and user errors, this method will provide a powerful way to obtain multiple RBC trajectories and DIs automatically, especially when compared with the manual tracking methods often used in blood microflow studies.
The algorithm takes advantage of spatial information to overcome inherent prob-
lems of conventional optical flow algorithms, which are the handling of untextured
regions and the estimation of correct flow vectors near motion discontinuities. The as-
signment of motion to regions allows the elimination of optical flow errors originated
by noise. Detailed studies with different optical conditions need to be performed in
the near future as the optics and illumination source strongly affects the quality of the
images. Moreover, the application of the proposed method to other more complex
flows are also worth studying in the near future.
References
19. Pinho D, Yaginuma T, Lima R (2013) A microfluidic device for partial cell separation and
deformability assessment. BioChip J 7:367–374
20. Pinho D, Gayubo F, Pereira AI, Lima R (2013) A comparison between a manual and automatic
method to characterize red blood cell trajectories. Int J Numer Meth Biomed Eng 29(9):977–987
21. Reyes-Aldasoro CC, Akerman S, Tozer G (2008) Measuring the velocity of fluorescently
labelled red blood cells with a keyhole tracking algorithm. J Microsc 229(1):162–173
22. Rodrigues R, Faustino V, Pinto E, Pinho D, Lima R (2014) Red blood cells deformabil-
ity index assessment in a hyperbolic microchannel: the diamide and glutaraldehyde effect.
WebmedCentralplus Biomedical Engineering. 1: WMCPLS00253
23. Sbalzarini IF, Koumoutsakos P (2005) Feature point tracking and trajectory analysis for video
imaging in cell biology. J Struct Bio 151(2):182–195
24. Smith MB, Karatekin E, Gohlke A, Mizuno H, Watanabe N, Vavylonis D (2011) Interactive,
computer-assisted tracking of speckle trajectories in fluorescence microscopy: application to
actin polymerization and membrane fusion. Biophys J 101:1794–1804
25. Tomasi C, Manduchi R (1998) Bilateral filtering for gray and color images. International
Conference on Computer Vision, pp 839–846
26. Vincent L, Soille P (1991) Watersheds in digital spaces: an efficient algorithm based on
immersion simulations. IEEE PAMI 13(6):583–598
27. Weiss Y (1997) Smoothness in layers: motion segmentation using nonparametric mixture estimation. Int Conf on Computer Vision and Pattern Recognition, pp 520–527
28. Yaginuma T, Oliveira MS, Lima R, Ishikawa T, Yamaguchi T (2013) Human red blood
cell behavior under homogeneous extensional flow in a hyperbolic-shaped microchannel.
Biomicrofluidics 7:54110
A 3D Computed Tomography Based Tool
for Orthopedic Surgery Planning
Abstract The preparation of a plan is essential for a surgery to take place in the
best way possible and also for shortening patient’s recovery times. In the orthopedic
case, planning has an accentuated significance due to the close relation between
the degree of success of the surgery and the patient recovering time. It is important
that surgeons are provided with tools that help them in the planning task, in order
to make it more reliable and less time consuming. In this paper, we present a 3D
Computed Tomography based solution and its implementation as an OsiriX plugin
for orthopedic surgery planning. With the developed plugin, the surgeon is able to
manipulate a three-dimensional isosurface rendered from the selected imaging study
(a CT scan). It is possible to add digital representations of physical implants (surgical
templates), in order to evaluate the feasibility of a plan. These templates are STL files
generated from CAD models. There is also the feature to extract new isosurfaces of
different voxel values and slice the final 3D model according to a predefined plane,
enabling a 2D analysis of the planned solution. Finally, we discuss how the proposed
application assists the surgeon in the planning process in an alternative way, where
it is possible to three-dimensionally analyze the impact of a surgical intervention on
the patient.
1 Introduction
The success of a surgery is intimately related to its planning. Pre-operative planning consists in an evaluation, supported by the clinical information and the patient's imaging studies, to establish a surgical procedure suitable for the case. During the planning process, a group of steps is defined that increases the chances of a successful surgery and improves the communication between the surgeon and the other members of the surgical team, e.g., nurses and the anesthetist [21].
One of the difficulties that a surgeon faces is the need to visually perceive the impact that the planned surgery will have on the patient. In the case of orthopedic surgery, this is even more important, since the analysis of the implant's placement (when necessary) is otherwise only possible in real time (during surgery). Moreover, it is important to record the intended location and actual position of an implant before surgery, thereby reducing the risk inherent to any surgical intervention. Allowing the patient's situation to be analyzed beforehand, in detail and without time pressure, facilitates the task of the orthopedic surgeon, because the pressure of decision making during surgery is reduced. A surgeon tends to plan less formally when relying only on previous professional experience; however, there is a need to become more effective and rigorous in the planning of more complex surgeries [10, 20, 21]. Surgeons are continuously seeking to improve their performance and increase their accuracy.
One of the current trends is three-dimensional reconstruction, which is used in several sectors of activity, namely in healthcare, and particularly in diagnostics [5,
19]. Currently, there are several companies that offer computer-assisted orthopedic
surgery (CAOS) solutions to help surgeons plan the surgical intervention. However,
some of them are still based on orthogonal X-ray views, which remove the three
dimensionality of the tissues (e.g. organs, bones). Although others use Computed
Tomography (CT) scans, it is not possible for the surgeon to add templates that
represent the implants that will be used in surgery, not allowing a global view of
the planned solution. Some of these tools are Orthoview, TraumaCad, SurgiCase,
HipOp [16].
The Orthoview solution can be integrated with DICOM Picture Archiving and
Communication Systems (PACS) and uses two orthogonal X-ray images. After im-
porting these images, the user is able to add vector lines (always in 2D) representing
real physical implants [4]. Nonetheless, the surgeon is unable to visualize the resul-
tant model in a 3D space. Due to the impossibility to convert a 3D structure into 2D
without losing details, it is impossible to analyze the fracture (if there is one) with
detail [3]. After these steps the surgeon can generate a report to use at the surgery.
Another application called TraumaCad is available from VoyantHealth. This ap-
plication is fairly similar to Orthoview, although it uses CT scans instead of X-rays.
However, both applications use 2D implants models (templates). In the surgeon’s
perspective, these models are represented through vector lines which decrease the
perception of the trauma. In TraumaCad, the 3D visualization only allows the anal-
ysis of the study in other angles. The surgeon can add surgical templates to the CT
scan and position them by using a multi view approach for guidance. For this to be
accomplished, the software reslices the CT scan by using a multi-planar algorithm,
therefore enabling the user to analyze the same image in four different angles (Axial,
Coronal, Sagittal and Oblique). Although not being a real 3D model representation,
it is a major step when compared with the previously mentioned application [18].
SurgiCase is available from the Belgian company Materialise. It allows the pre-
operative planning with the help of an engineer, where the surgeon can create a 3D
model of the planned solution, but this is achieved with a remote assistant's help.
This assistant is an engineer from Materialise that will work with him in a cooperative
way, in order to develop a plan for the surgery. So, the surgeon has little autonomy
since he cannot add templates or test other procedures. Consequently, if he wants to
change anything in the plan, he must contact the assistant [1].
There is a free software application for pre-operative planning developed by Istituti
Ortopedici Rizzoli and CINECA, named HipOp. This application was conceived
for the total hip replacement. The system imports a CT scan which defines a 3D
anatomical space. The anatomical objects are represented through multiple views.
The implants can be loaded by the surgeon and they are represented by their 3D
model in the same space. However, it is not possible to move both components at
the same time independently [8].
Steen and Widegren [17] in association with Sectra Medical Systems AB in Swe-
den presented a prototype to analyze the fit of implants. The application shades the
implants depending on the distance between them and the bone. Aiming for the total
hip replacement, it offers the possibility to measure its critical distances. There are
some limitations in its 3D environment. The 3D implant model and the 3D volume reconstruction of the CT study are rendered independently, which causes failures in the transparency. Beyond that, it is not possible to intersect both components; the implant model is always entirely in front of or behind the 3D CT volume.
Thus, it is difficult to predict surgery outcome for the patient.
Some other planning solutions use professional image editing tools, such as Adobe Photoshop. By using tools typically designed for digital image processing, surgeons create an image of the final result [15]. Some publications point out that the success
of the surgery increases using virtual reality techniques where surgeons can practice
their surgeries. This kind of technology is normally associated with the training
carried out by pilots and astronauts [10].
The main reason for the lack of software that merges a CT scan and templates representing the implants to be used in surgery is the difficulty of dealing with two structurally different graphic representations: 3D bitmaps (voxels) and vector images. On the one hand, the CT scan is a series of volume-based images with the same thickness and equal spacing (i.e. a matrix of voxels). On the other hand, we have a template provided by an orthopedic implant manufacturer, which is of vector type. This template is a virtual representation of a physical implant, and its structure
is a set of arranged triangles. Due to these structurally different image types, the
development of a solution that aggregates these two types together on a same plane
is somewhat challenging. In order to visualize and analyze all the angles of the
fracture, a surgeon needs to freely manipulate the templates on the patient’s imaging
studies. This can only be satisfactorily achieved with a 3D model. Since a CT scan is a series of images with the same thickness and equal spacing, it allows us to create the corresponding 3D model. This model allows a better visualization and understanding
of the fracture extent (in the case of bone tissue) [5, 19].
The three main rendering techniques that enable the creation of a CT scan 3D
model are Multi-planar Rendering (MPR), Volume Rendering (VR) and Surface
Rendering (SR). The MPR is usually used when only weak computational resources
are available because the processing required is lower. It is widely used whenever the
goal is to visualize the imaging study through different planes simultaneously (e.g.
Axial, Coronal, Sagittal and Oblique). The VR technique is used when the purpose
is to visualize the entire volume. Images are created by projecting rays in the volume
from a viewpoint (Ray Casting method) [14]. For each ray that intersects the volume
(one or more voxels), color and opacity values are calculated and then represented as
a pixel. This technique requires a huge amount of runtime calculations, which implies
more powerful machines. SR is the technique that was used in this work. It is, by
definition, the visualization of a 3D object from a set of isosurfaces. These are made only of points with the same intensity, which in this case refers to the radiation attenuation value on the Hounsfield scale. It is widely used whenever the goal
is to visualize structures close to each other (e.g. visualize the skull on a brain CT
scan) [22]. These isosurfaces can be constructed by contours that are extracted from
each slice in order to create a surface based on the volume’s contour or by voxels
where the isosurfaces are generated directly from voxels with a predefined value
from the Hounsfield’s scale. One of the algorithms used in this reconstruction is the
Marching Cubes (MC) [2, 9].
The surgeon using any of these techniques can extract more information about
the study because he is given the capability to analyze it in all possible angles. Yet,
he is unable to add templates to elaborate a plan for the surgery. In this study, we
present a solution for the problem of structural differences between images. With the
proposed solution, the surgeon can import a CT scan, generate a 3D surface from it
and add 3D surgical templates on top of it. This means that 3D vector graphics can
be merged with a 3D matrix (generated) surface. The present article is structured as follows: first, 3D modeling principles are introduced, then the features and operation of the proposed solution are presented, finishing with conclusions and the future steps to improve the application.
Each CT scan produces a volume of data that can be manipulated. In order to extract
the isosurfaces from the CT scan, the MC algorithm was chosen [9]. These isosurfaces
are constituted by a polygonal mesh which was computed from a scalar field, i.e., set
of voxels. The process starts with a given predefined Hounsfield Unit (HU) value from
the original imaging study. The voxels that meet this threshold requirement are then
used by the MC algorithm to construct the isosurface by marching iteratively with
an imaginary cube through the 3D grid where the voxels are projected. Constructing
the 3D model entails traversing the scalar field and evaluating each vertex of the cube in order to select the best polygons to represent the original surface. These
vertices are then aggregated to form the final isosurface of a polygon mesh. A lookup
table (Fig. 1) is used by the MC algorithm in order to decide how to fuse the vertices
and resolve ambiguity when choosing the points which belong to the polygon mesh
surface. Since the first algorithm implementation, this table has been refined over
the years to provide better results.
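A minimal sketch of such a voxel-based isosurface extraction; scikit-image's Marching Cubes routine and the threshold and spacing values are illustrative assumptions (the plugin itself relies on VTK, as described later):

```python
from skimage.measure import marching_cubes

def extract_isosurface(volume_hu, threshold_hu=300, spacing=(1.0, 0.5, 0.5)):
    """Extract a bone isosurface from a CT volume expressed in Hounsfield Units.

    `volume_hu` is a (slices, rows, cols) array; the threshold and the voxel
    spacing (slice thickness, pixel spacing) are illustrative values that would
    normally come from the DICOM metadata.
    """
    # Marching Cubes returns the polygon mesh approximating the isosurface
    # defined by all voxels at the chosen HU level.
    verts, faces, normals, values = marching_cubes(volume_hu,
                                                   level=threshold_hu,
                                                   spacing=spacing)
    return verts, faces
```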
Each image (CT scan’s slice) has associated metadata arranged by tags provided by
the DICOM standard. This metadata contains information about the file, the patient
it belongs to as well as the study. Among this information, some tags characterize
the whole volume, like space between slices (tag 0018,0088), slice thickness (tag
0018,0050), slice location (tag 0020,1041) and number of slices (tag 0054,0081).
Voxels parameters, such as position and thickness are set using the information
provided by these tags. The MC algorithm then uses the voxels information to create
the 3D models.
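A small sketch of how these tags could be read with pydicom (an assumed library choice; the plugin itself uses the OsiriX DCM framework, ITK and GDCM). The pixel spacing tag (0028,0030) is added here only to complete the voxel geometry:

```python
import pydicom

def read_volume_metadata(dicom_path):
    """Read the DICOM tags mentioned above from one CT slice; tag availability
    depends on the scanner and the study."""
    ds = pydicom.dcmread(dicom_path)
    meta = {
        "spacing_between_slices": ds.get((0x0018, 0x0088)),
        "slice_thickness":        ds.get((0x0018, 0x0050)),
        "slice_location":         ds.get((0x0020, 0x1041)),
        "number_of_slices":       ds.get((0x0054, 0x0081)),
        "pixel_spacing":          ds.get((0x0028, 0x0030)),
    }
    # Each entry is a DataElement (or None when the tag is absent);
    # its numeric content is available through the .value attribute.
    return meta
```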
Figure 2 shows an example of a CT scan’s slice, where its volumetric matrix
structure is illustrated. The 3D representations of the surgical template’s mod-
els are also polygon meshes, modeled after the original. Merging these models in the same graphics scene along with the generated isosurfaces will enable the
visual intersection of both and give the ability to rotate and position each model
independently.
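A hedged sketch of such a merged scene using VTK's Python bindings; the function, its inputs and the opacity value are illustrative assumptions, but the actor/mapper structure mirrors what a VTK-based viewer would do:

```python
import vtk

def build_scene(isosurface_polydata, template_stl_path):
    """Place the generated isosurface and an STL surgical template in the same
    VTK scene so that each can be rotated and positioned independently
    (file path and input polydata are placeholders)."""
    renderer = vtk.vtkRenderer()

    # Actor for the isosurface generated from the CT scan.
    iso_mapper = vtk.vtkPolyDataMapper()
    iso_mapper.SetInputData(isosurface_polydata)
    iso_actor = vtk.vtkActor()
    iso_actor.SetMapper(iso_mapper)
    iso_actor.GetProperty().SetOpacity(0.6)   # let internal intersections show
    renderer.AddActor(iso_actor)

    # Actor for the surgical template, read from its STL file.
    reader = vtk.vtkSTLReader()
    reader.SetFileName(template_stl_path)
    tmpl_mapper = vtk.vtkPolyDataMapper()
    tmpl_mapper.SetInputConnection(reader.GetOutputPort())
    tmpl_actor = vtk.vtkActor()
    tmpl_actor.SetMapper(tmpl_mapper)
    renderer.AddActor(tmpl_actor)

    return renderer  # both actors can now be transformed independently
```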
Since the surgeon is more familiar with the 2D representation of each slice, the
application provides both ways of presenting the data. A MPR technique is used to
help the surgeon better visualize the axial, coronal and sagittal planes of each CT
scan’s slice (Fig. 3). When rendering in conjunction with the 3D generated model
in the same scene, the surgeon is able to pan each 2D plane and visualize each slice
representation independently in another viewer.
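Conceptually, the three MPR planes are just orthogonal index slices through the reconstructed volume; the sketch below assumes a (slices, rows, cols) array ordering:

```python
def mpr_planes(volume, i_axial, i_coronal, i_sagittal):
    """Extract one axial, one coronal and one sagittal slice from a CT volume
    stored as a (slices, rows, cols) array (index order is an assumption)."""
    axial    = volume[i_axial, :, :]     # plane perpendicular to the body axis
    coronal  = volume[:, i_coronal, :]   # front-to-back plane
    sagittal = volume[:, :, i_sagittal]  # left-to-right plane
    return axial, coronal, sagittal
```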
The proposed solution, which we named OrthoMED, was developed in C++ and
Objective-C as a plugin for OsiriX [12, 13] using a set of open source libraries
written in the same programming language. They are OsiriX DCM Framework, used
to read the DICOM files and their metadata, ITK and GDCM, to parse and process
each DICOM file and VTK, to implement the MC algorithm and 3D visualization
[6, 7, 11]. OsiriX was chosen since it is a widely used viewer for medical purposes,
allowing a minimized learning curve associated with the usage of this new tool.
Figure 4 presents the application’s internal workflow to create an isosurface from
a CT scan. For OsiriX to detect OrthoMED as a plugin it is necessary to create
During Step C the surgeon is able to visualize the patient’s CT scan with MPR,
allowing its analysis in three different planes. By moving the related plane, the
surgeon is now able to visualize each slice from the chosen plane. This is very
helpful because it allows the precise location of the fracture.
Figure 6 presents a screenshot of OrthoMED’s main window with its five sections:
a) Opens the window with the surgical templates database, where the surgeon can
choose which template he wants to add. These templates are the 3D digital
representation of a real template. They have the same shape and size;
b) Table with the surgical templates added, with their corresponding positions and
angles;
c) Section where the final 3D model slices are displayed. Scrolling up and down
shows the whole array;
d) Main section, where the surgeon handles the 3D isosurface as well as the
templates, always in a 3D space;
e) Exporting section of the report with the surgery plan.
The upper slider can be used to change the isosurface's opacity, helping the visualization of the internal intersections (Fig. 7).
The two radio buttons on the side are used to select which structure the user
wants to move. If ‘Actor’ is selected, then the surgeon can select, move or rotate any
independent 3D object (i.e., any template added). If ‘Camera’ is selected, then the
point of view from all scene is moved or rotated. Fig. 8 presents the window with
all available templates. Here the user can select a template and check its data on the
Fig. 9 OrthoMED’s main window with some templates on the generated isosurface
new isosurface added. In that way, the surgeon is now able to evaluate his plan and
how it will affect the surrounding tissues. This is quite useful for his analysis. This
isosurface and its value could be entered in the options window (Fig. 10).
After the planning process, the surgeon is able to export a report with all the
information needed, e.g., a table with the templates information and their position in
the 3D space, some screenshots taken from the final 3D model as well as the patient’s
specific information (Fig. 13).
4 Conclusion/Discussion
The main goal of the presented work is to create a solution which allows the interoperability between images in two different format types. The currently available solutions have limitations: some of them do not include 3D modeling or, when it is available, the creation of the model is based on standard models or requires the intervention of specialized technicians. The OrthoMED plugin enables the
interoperability between a CT scan 3D model and surgical templates representing
orthopedic physical implants. Since we are dealing with different types of image (i.e.
the CT study is a matrix of voxels and the surgical templates are vector graphics) it
was necessary to develop a method to join them in the same planes. These surgical
templates are STL files, created from CAD models of physical implants provided by
medical implant suppliers. The advantage of using this kind of file lies in its wide use in the implant industry. Comparing the proposed solution with the described
commercial ones, OrthoMED presents an alternative approach, delegating the task
entirely to the surgeon. With this plugin, the surgeon is able to add templates and
handle their position always in a 3D space, which allows a constant evaluation of
the best positioning. He can also slice the final 3D model with templates on it. After
choosing the slicing plane (e.g. Axial, Sagittal, Coronal, Oblique), the surgeon can
evaluate his surgical solution in a 2D view that gives him more detailed informa-
tion. Finally, he can export the surgery plan report. The developed solution is advantageous for surgeons: they can manipulate the generated 3D model, composed of one or more isosurfaces, and they can add the implant templates from
the database. Comparing this solution with others, OrthoMED brings a different,
enhanced and complementary solution to orthopedic surgical planning.
5 Future Work
The OrthoMED plugin can already be considered an advantageous tool for surgeons. However, the application could be improved if some features were added. It would be relevant to provide some image processing algorithms, such as image segmentation: when there are fragments of free bone tissue in the patient's muscle due to an injury, the surgeon needs to select and manipulate them in order to use the implants for reconstruction of the affected area. Another important feature would be to highlight the intersection area, determining the points where the intersections occur. Furthermore, by implementing more 3D modeling algorithms, surgeons would be able to decide and choose the most suitable tool for planning each case. Finally, it is important to improve the user interface and user experience of the application, making it more intuitive and simpler to operate.
References
14. Roth SD (1982) Ray casting for modeling solids. Comput Gr Image Process 18(2):109–144.
http://dx.doi.org/10.1016/0146-664X(82)90169-1
15. Shiha A, Krettek C, Hankemeier S, Liodakis E, Kenawey M (2010) The use of a professional
graphics editing program for the preoperative planning in deformity correction surgery: a
technical note. Injury 41(6):660–664. doi:10.1016/j.injury.2009.10.051
16. Sikorski JM, Chauhan S (2003) Aspects of current management. J Bone Joint Surg 85(3):319–
323
17. Steen A, Widegren M (2013) 3D Visualization of Pre-operative Planning for Orthopedic
Surgery. In: Ropinski T, Unger J (eds) Proceedings of SIGRAD 2013, visual computing,
June 13–14. Linköping University Electronic Press, Sweden, pp 1–8
18. Steinberg EL, Shasha N, Menahem A, Dekel S (2010) Preoperative planning of total hip
replacement using the TraumaCad system. Arch Orthop Trauma Surg 130(12):1429–1432.
doi:10.1007/s00402-010-1046-y
19. Suero EM, Hüfner T, Stübig T, Krettek C, Citak M (2010) Use of a virtual 3D soft-
ware for planning of tibial plateau fracture reconstruction. Injury 41(6):589–591. doi:
10.1016/j.injury.2009.10.053
20. The B, Verdonschot N, van Horn JR, van Ooijen PMA, Diercks RL (2007) Digital versus
analogue preoperative planning of total hip arthroplasties: a randomized clinical trial of 210
total hip arthroplasties. J Arthroplast 22(6):866–870. doi: http://dx.doi.org/10.1016/j.arth.
2006.07.013
21. Wade RH, Kevu J, Doyle J (1998) Pre-operative planning in orthopaedics: a study of surgeons’
opinions. Injury 29(10):785–786
22. Wang H (2009) Three-dimensional medical CT image reconstruction. In: 2009 interna-
tional conference on measuring technology and mechatronics automation, IEEE, pp 548–551.
doi:10.1109/ICMTMA.2009.10
Preoperative Planning of Surgical Treatment
with the Use of 3D Visualization and Finite
Element Method
1 Introduction
Fig. 1 Left: The marked points on the hip joint border and the approximated plane are shown.
Right: Resulting position of the artificial hip joint in correspondence to the mirrored, healthy hip
part [16]
Fig. 2 Virtual planning of pelvis stabilization with the use of SQ PELVIS system. Left: Virtual
reduction and fixation of the fractured bone. Right: The direction and length of the screws [5]
Another approach aims to support the surgeon by providing them with templates
which facilitate technical aspects of carrying out the operation, for instance, a nav-
igation system (Fig. 3) which was used in the work of Gras et al. [13] to plan the
position of the stabilizing screws in pelvic ring injuries.
Another example can be provided by operative planning in orthognathic surgery
[10]. The standard planning is done on the basis of CT scanning (Fig. 4). However,
there are also special programmes, such as Mimics and 3-matic software (Materialise)
[23] for planning the corrections of the facial skeleton. Similar procedures supporting
treatment in orthognathic surgery were developed, among others, by: Cutting [8, 9],
Yasuda [30] and Altobelli [1].
Fig. 3 Sterile touch screen of the navigation system (Vector Vision, Brainlab) displaying standard
images (lateral view, inlet, outlet) and an auto-pilot view. Red bar: virtually planned SI-screw;
yellow line: prospective path of the navigated guide wire (trajectory), green bull’s-eye: reflecting
the exact positioning of navigated instruments to achieve the planned screw position [13]
Fig. 4 Example of
computer-aided surgery
(CAS) of a patient with
Crouzon syndrome.
Simulation and result of Le
Fort II distraction before
surgery and after CT planning
[10]
In more advanced research new devices are being developed with the purpose
of supporting the doctor during the surgical procedure, for example: a neck jig
device presented in the work of Raaijmaakers et al. [25]. The Surface Replacement Arthroplasty jig was designed as a slightly more-than-hemispherical cage to fit the anterior part of the femoral head. The cage is connected to an anterior neck support. Four knives are attached to the central arch of the cage. A drill guide cylinder is
attached to the cage, thus allowing guide wire positioning as pre-operatively planned
(Fig. 5).
Apart from planning a procedure for an individual patient, new methods of en-
gineering support make it possible to choose optimal parameters for the operation.
An example is the application of the finite element method in the biomechanical analysis of the system after simulated virtual treatment. In the research of Jiang et al. [19], corrective incisions (for scaphocephaly) were planned and a biomechanical analysis of the obtained models was performed (Fig. 6).
Thanks to that, it is possible to choose the most favourable variant of the operation.
In addition to that, the research of Szarek et al. [27] analysed the level of stress in the
hip joint endoprosthesis resulting from variable loads during human motor activity.
Analysing the influence of preoperative planning and 3D virtual visualization of
the examined cases on the quality of treatment, it can be stated that the engineering
support provides assistance for the vast majority of doctors in the scope of complex
assessment of the phenomenon and preparation for a real-life procedure. The con-
ducted research has shown [18] that both the planning time and labour intensity are reduced by around 30 % if 3D models are available. In addition to that, the precision
(accuracy) of predicting the size of the resection area (e.g. in the case of tumours)
increases by about 20 % (Fig. 7). Moreover, according to subjective feelings of the
examined doctors their confidence in the established diagnosis has risen by around
20 % in the case of 3D planning.
Surgical treatment within the skeletal system is always the last resort, used when other preventive methods have failed, for instance, when the application of orthopaedic equipment has not brought the desired effects. On the basis of several years of tests carried out in co-operation with surgeons, a general scheme of the engineering support procedure has been developed for pre-operative planning of surgical operations (Fig. 8).
In the first phase the attending physician gives a diagnosis of the disease. Usually,
within the framework of a regular diagnosis a CT or MRI examination is done,
thanks to which 2D images of individual cross sections are obtained. On the grounds
of the Hounsfield scale, in the programme Mimics® Materialise [31] it is possible to
segment the tissues of interest (e.g. bones, cartilages) and then generate a 3D model.
Fig. 6 Distribution of stress (top) and displacements (bottom) in the skull vault before the surgery and in five variants of corrective incisions [19]
Fig. 7 Comparison between viewing 2D CT images and 3D displays of thoracic cavities in determin-
ing the resectability of lung cancer. Left: Planning time. Right: Accuracy of predicted resectability
[18]
In the next stage, on the basis of the constructed geometrical model it is possible to
carry out detailed morphological measurements in order to determine the type of the
defect and degree of the disease progression.
On these grounds the patient is qualified for the surgical procedure by the doctor.
Additionally, the programme 3-matic® Materialise [32] makes it possible to do the
analysis of bone thickness, which is very helpful in the selection of surgical tools for
the operation. Mimics programme enables all sorts of modifications of the obtained
model as well as simulations of the planned operation. In consultation with the doctor,
bone incisions and displacements are simulated with the purpose of obtaining the
desirable treatment effects. After the correction has been planned, it is advisable to
conduct the morphometric analysis once again in order to check the values of indexes
which were used in the preoperative evaluation.
Next, the model is prepared to be introduced into computing environment. Dis-
cretization of the model, i.e. the creation of the volumetric mesh and its optimization
is done in the 3-matic programme. Then, the model is exported to Ansys Workbench®
environment [33] in order to carry out biomechanical analyses. The primary objec-
tive of FEM analysis is to check whether during bone modelling or implanting no
fracture or damage to the structure occurs. It is particularly important while planning
endoscopic surgical procedures due to the fact that any unforeseen fracture of the
bones makes it necessary to stop the microinvasive surgical treatment and complete
the operation with the use of classic methods. A numerical simulation provides thus
an individual risk assessment of the surgical procedure and may be a decisive factor
while selecting a variant of the operation.
Finally, by comparing the results of the performed analyses it is possible to make
the most advantageous and the safest choice of the operative variant. It must be
emphasized that preoperative planning is an absorbing and time-consuming process,
and therefore, not suitable for all kinds of operations. Its application is justified
and brings many notable benefits in the case of particularly complicated surgical
procedures.
The further part of this paper presents examples of engineering support procedures for preoperative planning in the cases of surgical correction of head shape in infants, correction of chest deformities, as well as spine stabilization.
Fig. 10 Three-dimensional
model of skull with
trigonocephaly with marked
anatomic points
The values of the indexes characterizing the incorrect skull shape in trigonocephaly were determined. Those values were then compared with the standard values for children with a regular skull shape aged 0–2 months in order to determine how the correction should be performed. The results are presented in Table 1.
The measurements showed that the frontal angle was too acute but other indexes
were within regular limits. No hypertelorism was detected, therefore the correction
was going to be made only on the frontal bone without any interference in the orbital
cavities. In this way it was determined that it was possible to carry out a microinvasive
procedure. The main decisive factors at that stage of planning were as follows: the
patient’s age, bone thickness (within 5 mm in the sites of potential incisions) as well
as a correct distance between orbital cavities and a lack of deformities within facial
skeleton. The doctor made a decision that the correction of the skull shape was going
to consist in the cutting of the fused metopic suture and parting of the bones in order
to obtain an optimum shape of the head.
The virtual correction was performed in two stages. In the first one the frontal bone
was separated from the rest of the skull along the frontoparietal sutures. The lower
limit was provided by the nasal bone and frontozygomatic sutures. The incisions of
the frontal bone were planned in Mimics environment in the way guaranteeing an
optimum forehead shape (Fig. 11). Dislocations, in fact rotations, of the fragments
of the bones were done manually taking into consideration the doctor’s suggestions
and actual conditions of the operation. Point nasion (n) on the nasal bone was defined
to be a fixed point according to which the bones were parted to make the head shape
round.
Having obtained an optimal visual effect of the correction, the displacements of several bone points were measured and, in the further planning phase, introduced into the Ansys environment as boundary conditions. In the end, the average displacement of
the bones from their initial position equalled 11 mm. The value of the frontal bone
angle was checked once again in order to evaluate if it was now close to standard. Af-
ter the procedure the angle was increased up to 132.7◦ , which produced a satisfactory
effect of the correction (Fig. 12).
Fig. 12 Measurements of the skull after virtual correction. Left: Displacement necessary to correct the forehead shape. Right: Forehead angle before (121.8◦) and after (132.7◦) the surgery
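For illustration, an angle such as the frontal angle can be computed from three marked anatomic points; the helper below is a generic sketch, and the choice of landmarks is an assumption:

```python
import numpy as np

def angle_at_vertex(p_left, p_vertex, p_right):
    """Angle (in degrees) formed at `p_vertex` by the two rays towards
    `p_left` and `p_right`; the points would be marked anatomic landmarks
    (e.g. on the frontal bone), given here as illustrative 3D coordinates."""
    a = np.asarray(p_left, float) - np.asarray(p_vertex, float)
    b = np.asarray(p_right, float) - np.asarray(p_vertex, float)
    cos_angle = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
```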
Fig. 13 Applied variants of frontal bone incisions being 30, 50 and 70 mm long
The analysis determined the total deformation, reduced strain and stresses with
the use of von Mises hypothesis. An abridged list of simplification assumptions has
been presented in Table 2.
Three variants of incisions, 30, 50 and 70 mm long respectively, were prepared for the analysis (Fig. 13). The simulation was performed in order to ensure that no damage to the bones occurs during the medical procedure due to the prescribed deformation.
After the model geometry had been introduced, a rigid fixation was placed in
the site of the frontal bone (in the vicinity of nasion point)—(Fig. 14). Moreover,
the displacements in axes x and z were completely restricted in the sites of the
Fig. 14 Boundary conditions of trigonocephaly correction. Left: Fixation of model. Right: Points
of application of displacement equal to 11 mm in y axis
frontoparietal suture. Also, the dislocation of these sites in y axis was partially
limited (a maximum value of 6 mm was determined on the basis of the simulation
results in Mimics software). It was calculated that displacement of upper parts of the
incised halves of the frontal bone necessary to obtain an optimum shape of the skull
equals 11 mm in the direction of y axis.
The results of numerical analysis have been presented in Table 3. The distribution
of displacements has been taken into consideration as well as the maps of stresses
occurring in the frontal bone in different incision variants. The case subject to ex-
amination is an example of one of the simplest methods of treatment in the context
of incision technique. It results mainly from the fact that this is a microinvasive
procedure, therefore the possibilities to use different incisions are small due to the
limitations connected with the operative field and the applied surgical tools. Further
part of this work presents an example of a classic surgical procedure of trigonocephaly
correction.
Analysing the results of the simulation it was stated that the distribution of bones
displacement is very similar in all three variants. Considerable differences may be
noticed in the maximum values of stresses occurring during bone modelling. In the first variant they are the smallest, reaching at most 43 MPa; in the second variant, 54 MPa. In both cases the stresses do not exceed permissible values, therefore it can be assumed that no bone damage will occur during the surgical procedure. In the third variant, the maximum stresses equal 62.5 MPa. This variant was rejected because a deeper incision always carries a greater risk of bone fracture in the vicinity of the nasion point. At the same time, the visual effect does not differ much from
the result obtained in variant 2. Variant 1 was also rejected as in this case the incision
could prove too small to enable further correct growth of the skull. Finally, on the
grounds of the quantitative and qualitative assessment variant 2 of the correction was
adopted as optimal.
Fig. 16 Three-dimensional
geometrical models of human
pigeon chest elements
In the next stage, the development of a geometrical model of the chest was started. On the basis of the patient's CT images, a three-dimensional model of the individual structures of the chest was built using Mimics software. The process of building the geometrical model consisted in generating and editing masks of the individual elements. The creation of a mask in the Mimics environment consists in segmentation by partitioning areas that are homogeneous in grey level within a previously defined search area. In the process of segmentation of the pigeon chest model the following items
were distinguished (Fig. 16):
• 22 bone ribs,
• 11 thoracic vertebrae,
• 10 intervertebral discs,
• 14 cartilage ribs,
• the sternum.
The algorithm of creating the above-mentioned elements was very similar in each
case except for intervertebral discs and cartilage ribs which required more correction
with the use of masks editing tools due to a heterogeneous grey shade.
Next, the planning of the corrective treatment of the defect by means of Ravitch's method began. It consisted in the resection of elements of the cartilage ribs
and sternum as well as their adequate rotation and repositioning. Displacement and
rotation of the fragments of the bones were set manually taking into consideration the
doctor’s suggestions and actual circumstances of the surgical procedure. A correct
position of the sternum was obtained by repositioning it in the direction towards the
spine by about 30 mm (Fig. 17). At the same time, by removing the fragments of the
cartilage ribs and by moving them, the inclination angle of the sternum to the median plane was decreased from 24.26◦ to 13.05◦.
After obtaining an optimum visual effect of the correction, the dislocations of
several bone points were measured. In the further phase of planning they were intro-
duced into the Ansys environment as boundary conditions. All elements of the chest were discretized with tetrahedral Solid72 elements in the 3-matic programme. The volumetric
mesh was created and optimized (Fig. 18).
The numerical analysis of the pigeon chest model was performed with the use of
Ansys Workbench environment. The boundary conditions of the analysis have been
presented in Table 4.
In order to simplify the numerical model, the impact of the internal organs as well as the pressure inside the thorax was omitted. The total number
of finite elements amounted to 299 974, which were connected in 550 482 nodes.
Contacts between individual elements were defined automatically in the first stage, and then their surfaces were corrected manually. The surfaces were linked by means of a ‘Bonded’ type connection, which does not allow the elements to move in relation to each other. The total number of all connections amounted to 70. The model was fixed by removing the degrees of freedom of the nodes on the upper surface of the first thoracic vertebra and on the lower surface of the 11th vertebra.
Fig. 19 Results of numerical simulation of correction. Left: Map of deformation. Right: Map of
equivalent stresses
During the numerical analysis, reduced strain and stresses were determined with
the use of von Mises hypothesis. The results of the numerical analysis have been
shown in Fig. 19.
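The equivalent (von Mises / Huber-Mises) stress reported in such analyses is computed from the components of the stress tensor; the following helper is a generic sketch, not code extracted from the Ansys workflow:

```python
import numpy as np

def von_mises(sx, sy, sz, txy, tyz, tzx):
    """Equivalent (von Mises / Huber-Mises) stress from the six components of
    the Cauchy stress tensor; units follow the inputs (e.g. MPa)."""
    return np.sqrt(0.5 * ((sx - sy)**2 + (sy - sz)**2 + (sz - sx)**2)
                   + 3.0 * (txy**2 + tyz**2 + tzx**2))
```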
On the basis of the performed numerical calculations, the stiffness index of the model was determined. It is defined by formula (1) below and equalled 2.86 for the analyzed case.

$$k = \frac{F}{d} \left[\frac{\mathrm{N}}{\mathrm{mm}}\right] \qquad (1)$$
While analyzing the findings of the simulation it was stated that the distribution
of maximum stresses in the sternum (6.92 MPa), cartilage ribs (8.39 MPa) and bone
ribs (36.73 MPa) did not indicate any possibility of damage occurring during the
procedure of the pigeon chest correction by Ravitch’s method as they were all lower
than 87.0 MPa, which was adopted as the permissible value. The obtained maximum values of the principal strains (0.0012) in the bone elements of the chest were also below the values suggesting bone destruction [24].
Fig. 21 Model of spine segment before and after stabilization with the Coflex implant
Precise positioning of the implant in the spine as well as the fact that the surgical
procedure is microinvasive play an essential role during the implant insertion. The
performed simulation makes it possible to verify the construction by checking com-
patibility of main stabilization measurements with individual anatomical features of
the patient. In this case it was used an implant by Coflex company which is used in
clinical practice for lumbar spine stabilization with posterior intervertebral systems
(Fig. 21).
In the case of application of ready-made implants available on the market, preoperative planning makes it possible to choose, from the catalogued series of types, the kind and size of stabilization best matching an individual patient. The geometrical models of the L4 and L5 vertebrae were modified at the site of the implant positioning, exactly as during a surgical procedure. Material properties were attributed to the stabilization: titanium alloy Ti-6Al-4V, with a Young's modulus of 115 GPa and a Poisson's ratio of 0.3 [23].
For the sake of the subsequent strength analysis, the model was discretized using the finite element method (FEM). Each element of the prepared model was meshed with tetrahedral elements with an average edge length of 3 mm.
Then, material properties were determined for individual elements. The programme
Mimics by the firm Materialise, which was used in this research, makes it possible to
define proportions and distribution of the material within the object. The programme
enables it to attribute to each spatial element as many properties as defined by the
designer. Cortical bone tissue is different from compact bone not only in its structure
but also in mechanical properties. That is why cortical bone and spongy bone were
distinguished within each vertebra. In order to segment the spongy bone tissue the
functions available in Mimics software were applied. To achieve that it was necessary
to make the mask areas of spongy bone on the vertebra contours in all cross sections
The masks served the purpose of providing the areas covered by defined masks
with spongy bone material properties. This tool of the programme was used while
defining material properties of other anatomical structures on the basis of the already
158 W. Wolański et al.
created masks. The values of the determined properties have been placed in Table 5,
whereas the graphic distribution of tissues has been presented in Fig. 22.
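As an illustration only, and not the authors' actual Mimics workflow, the following Python sketch shows how mask labels could be mapped to per-element material properties; the numeric values and names are hypothetical placeholders, not the values reported in Table 5.

```python
import numpy as np

# Hypothetical material table (E in MPa, Poisson's ratio); placeholder values,
# not the properties listed in Table 5.
MATERIALS = {
    1: {"name": "cortical bone", "E": 12000.0, "nu": 0.30},
    2: {"name": "spongy bone", "E": 100.0, "nu": 0.20},
    3: {"name": "Ti-6Al-4V implant", "E": 115000.0, "nu": 0.30},
}

def assign_materials(element_labels):
    """Map per-element mask labels to arrays of Young modulus and Poisson ratio."""
    E = np.array([MATERIALS[label]["E"] for label in element_labels])
    nu = np.array([MATERIALS[label]["nu"] for label in element_labels])
    return E, nu

# Example: labels produced by the segmentation masks, one per finite element.
labels = np.array([1, 1, 2, 3, 2])
E, nu = assign_materials(labels)
print(E, nu)
```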
A key factor taken into consideration in the selection of a stabilization type for the lumbar spine is the impact it will exert on the stabilized section. Numerical simulations of the model of the physiological section of the human lumbar spine, as well as of the posterior interspinous stabilization, make it possible to analyse the degree of load on the spine and the influence of implantation on spinal properties. The analysis of the spinal load and of the impact of the conducted implantation on the lumbar spine properties was made in the ANSYS programme. To that end, both the physiological and the stabilized models were imported into that programme. Calculations were carried out with boundary conditions corresponding to the loads occurring in a natural standing position. A load of 1000 N was applied to the upper surface of vertebra L3, while fixation was applied to the lower surface of vertebra L5/S1 (Fig. 23).
While analyzing the obtained results of compression it was noticed that the resultant values of displacements are higher for the physiological spine model than for the implanted model. Their maximum values equal 0.45 mm for the model without the implant and 0.22 mm for the model with the implant, respectively. The highest equivalent stresses, determined according to the Huber-Mises hypothesis, occurred in the bone tissue at the vertebral pedicle; the values did not exceed 36 MPa for the physiological model. However, for the spine-and-implant system the highest stress intensity occurred in the implant itself, amounting to 37 MPa. In the spine-and-implant system lower values of strain were observed than in the physiological model; the maximum values of strain equalled 0.016 and 0.008, respectively (Table 6).
The conducted analyses make it possible to state that the implanted stabilization improved the spine stability. After the implant positioning, the values of the resultant displacements obtained during the strength simulations decreased. This resulted from the fact that the degenerated movable segment was stabilized with the use of the implant, as well as from the material properties of the implant. It is also significant that after stabilization the cross-section areas of the spinal nerves increase (Fig. 24); therefore it can be concluded that the patient's pain will decrease.
The performed research shows how medical and biomechanical interpretation of numerical simulations can be used to plan a neurosurgical procedure of spine stabilization. Biomechanical analyses of strength and forces can ascertain the durability and stability of the implant connection with the stabilized section of the spine and also determine places that require reconstruction of bone. With the finite element method, surgical predictions can be made that guide surgeons in deciding how to improve the surgical treatment. Virtual planning of the treatment is helpful for neurosurgeons, because it increases the quality of treatment and the safety during the operation.
3 Conclusions
Fig. 24 Cross-section areas of the spinal nerves before and after stabilization
tation of new innovative ideas and cutting edge technology into operative technique
aiming at the application of microinvasive procedures. There are several CAD pro-
grammes which could become a perfect biomechanical tool complementing medical
knowledge. Such software makes it possible to do, among other things, mechanical
analyses as well as to plan surgical procedures. Preoperative planning may be sup-
plemented by an additional procedure of reconstructing anatomical structures and
performing a virtual medical operation (simulation) in the computer system. The
models obtained in such a way may also serve the purpose of engineering analysis
which aims at characterizing the interaction of tissues in time as well as assessing the
risk of bone damage or fracture during a surgical procedure. The developed method
of engineering support makes it easier for the doctors to make right decisions at each
stage of treatment. This kind of support may have a significant importance for young
inexperienced surgeons or medical students. However, even experienced doctors may
practise each phase of the surgical procedure virtually, which considerably shortens
the duration of the operation. The application of a complex planning procedure is
simply indispensable in the case of complicated, multi-phase surgical procedures. Its
major advantage is an individual approach to each patient. The examples of planning
surgical procedures show that engineering support increases patients’ safety during
the operation and improves the quality of treatment. Interdisciplinary collaboration
between doctors and engineers brings desirable benefits and results in well-performed
operations.
References
1. Altobelli DE, Kikinis R, Mulliken JB, Cline H, Lorensen W, Jolesz F (1993) Computer-assisted three-dimensional planning in craniofacial surgery. Plast Reconstr Surg 92:576–585
2. Barone CM, Jimenez DF (2004) Endoscopic approach to coronal craniosynostosis. Clin Plast
Surg 31:415–422
3. Baumer TG, Powell BJ, Fenton TW, Haut RC (2009) Age dependent mechanical properties of
the infant porcine parietal bone and a correlation to the human. J Biomech Eng 131(11):111–116
4. Bruchin R, Stock UA, Drucker JP, Zhari T, Wippermann J, Albes JM, Himtze D, Eckardt S,
Konke C, Wahlers T (2005) Numerical simulation techniques to study the structural response
of the human chest following median sternotomy. Ann Thorac Surg 80:623–630
5. Cimerman M, Kristan A (2007) Preoperative planning in pelvic and acetabular surgery: the
value of advanced computerized planning modules. Injury 38(4):442–449
6. Coats B, Margulies SS (2006) Material properties of human infant skull and suture at high rates. J Neurotrauma 23:1222–1232
7. Couper ZS, Albermani FG (2005) Biomechanics of shaken baby syndrome: physical testing
and numerical modeling. In: Deeks Hao (eds) Developments in mechanics of structures and
materials. Taylor Francis Group, London, pp 213–218
8. Cutting C, Bookstein Fl, Grayson B, Fellingham L, Mccarthy JG (1986) Three dimensional computer-assisted design of craniofacial surgical procedures: optimization and interaction with cephalometric and CT-based models. Plast Reconstr Surg 77:877–885
9. Cutting C, Grayson B, Bookstein F, Fellingham L, Mccarthy JG (1986) Computer-aided
planning and evaluation of facial and orthognathic surgery. Clin Plast Surg 13:449–462
10. Ehmer U, Joos U, Flieger S, Wiechmann D (2012) The University Münster model surgery
system for Orthognathic surgery. Part I—the idea behind. Head Face Med 8:14
11. Furusu K, Watanabe I, Kato Ch, Miki K, Hasegawa J (2001) Fundamental study of side impact analysis using the finite element model of the human thorax. JSAE 22:195–199
12. Gzik M, Wolański W, Kawlewska E, Larysz D, Kawlewski K (2011) Modeling and simulation
of trigonocephaly correction with use of finite elements method. Proceedings of the III ECCO-
MAS thematic conference on computational vision and medical image processing: VipIMAGE,
Portugal, pp 47–50
13. Gras F, Marintschev I, Wilharm A, Klos K, Mückley T, Hofmann G (2010) O: 2D-
fluoroscopic navigated percutaneous screw fixation of pelvic ring injuries—a case series. BMC
Musculoskelet Disord 11:153
14. Gzik M, Wolański W, Tejszerska D, Gzik-Zroska B, Koźlak M, Larysz D (2009) Interdisci-
plinary researches supporting neurosurgical correction of children head deformation. Model
Optim Phys Syst 8:49–54
15. Gzik-Zroska B, Wolański W, Gzik M (2013) Engineering-aided treatment of chest deformities
to improve the process of breathing. Int J Numer Method Biomed Eng 29:926–937
16. Handels H, Ehrhardt J, Plötz W, Pöppl SJ (2001) Three-dimensional planning and simulation
of hip operations and computer-assisted construction of endoprostheses in bone tumor surgery.
Comput Aided Surg 6(2):65–76 (Wiley Online Library)
17. Handels H, Ehrhardt J, Plötz W, Pöppl SJ (2000) Virtual planning of hip operations and
individual adaption of endoprostheses in orthopaedic surgery. Int J Med Inform 58–59:21–28
18. Hu Y, Malthaner RA (2007) The feasibility of three-dimensional displays of the thorax for
preoperative planning in the surgical treatment of lung cancer. Eur J Cardiothorac Surg 31:506–
511
19. Jiang X, You J, Wang N, Shen Z, Li J (2010) Skull mechanics study of PI procedure plan for
craniosynostosis correction based on finite element method, Proceedings of 4th International
Conference on Bioinformatics and Biomedical Engineering (iCBBE)
20. Jimenez DF, Barone CM, Cartwright CC et al (2002) Early management of craniosynostosis
using endoscopic-assisted strip craniectomies and cranial orthotic molding therapy. Pediatrics
110:97–104
21. Larysz D, Wolański W, Gzik M, Kawlewska E (2011) Virtual planning of the surgical treatment
of baby skull shape correction. Model Optim Phys Syst 10:49–52
22. Larysz D, Wolański W, Kawlewska E, Mandera M, Gzik M (2012) Biomechanical aspects
of preoperative planning of skull correction in children with craniosynostosis. Acta Bioeng
Biomech 14:19–26
23. Marchetti C, Bianchi A, Muyldermans L, Di Martino M, Lancellotti L, Sarti A (2011) Validation
of new soft tissue software in orthognathic surgery planning. Int J Oral Maxillofac Surg 40:26–
32
24. Nackenhorst U (1997) Numerical simulation of stress stimulated bone remodeling. Technische
Mech 17(1):31–40
25. Raaijmaakers M, Gelaude F, de Smedt K, Clijmans T, Dille J, Mulier M (2010) A custom-made
guide-wire positioning device for hip surface replacement arthroplasty: description and first
results. BMC Musculoskelet Disord 11:161
26. Sacha E, Tejszerska D, Larysz D, Gzik M, Wolański W (2010) Computer method in cran-
iosynostosis. Proceedings of 12th International Scientific Conference “Applied Mechanics”,
Technical University of Liberec, pp 111–115
27. Szarek A, Stradomski G, Włodarski J (2012) The analysis of hip joint prosthesis head mi-
crostructure changes during variable stress state as a result of human motor activity. Mater Sci
Forum 706–709:600–605
28. Tejszerska D, Wolański W, Larysz D, Gzik M, Sacha E (2011) Morphological analysis of the
skull shape in craniosynostosis. Acta Bioeng Biomech 13(1):35–40
29. Wolański W, Larysz D, Gzik M, Kawlewska E (2013) Modeling and biomechanical analysis of
craniosynostosis correction with the use of finite element method. Int J Numer Method Biomed
Eng 29:916–925
30. Yasuda T, Hashimoto Y, Yokoi S, Toriwaki JI (1990) Computer system for craniofacial surgical planning based on CT images. IEEE Trans Med Imaging 9:270–280
31. Materialise software & services for biomedical engineering: mimics software. http://
biomedical.materialise.com/mimics. Accessed 13 March 2014
32. Materialise software & services for biomedical engineering: 3-matic software. http://biomedical.materialise.com/3-matic. Accessed 13 March 2014
33. ANSYS software. http://www.ansys.com/. Accessed 13 March 2014
Pretreatment and Reconstruction of
Three-dimensional Images Applied in a Locking
Reconstruction Plate for a Structural Analysis
with FEA
Abstract The concept of fracture stabilization by compression and the use of locking plates have been the subject of many studies. An understanding of the bone-plate construct stability is important for clinical use. Differences in plate geometries and materials have influenced the results obtained. Thus, the present study evaluated the acquisition of images and the geometric reconstruction of a plate, seeking a more detailed study of its structure through the application of numerical methods such as finite elements. A seven-hole locking reconstruction plate manufactured from stainless steel was used as the material model. The acquisition of geometric information was obtained from the profile projection method for simplified shapes such as curves and external radii. Micro CT (computed tomography) provided additional information on details of the structure, such as volume, and served to validate the data obtained from the profile projection.
1 Introduction
There are many different sizes and shapes of bone plates available for fracture im-
mobilization [15]. The dynamic compression plate (DCP) has oval holes to allow
axial compression of the fracture site during screw tightening [6] and the construct
stability requires plate-to-bone compression [15, 3]. Despite being widely used, the
DCP may present disadvantages such as cortical loss under the plate, delayed union,
and refracture after plate removal [15, 3, 6].
The biological concept of the internal fixation of the fracture stimulated the devel-
opment of a new approach to the plate fixation [9, 15]. Different from the conventional
plate, in a locked plate the screw is locked into the plate and the forces are transferred
from the bone to the plate through the threaded connection [15, 7, 6]. In addition, the
plate compression on the bone is not required with this system and bone blood supply
is preserved, but the stiffness of the construct determines the fracture stability [15].
The locked plate was initially developed to stabilize fractures with poor bone quality, such as osteoporosis, osteomalacia or comminution [11], but its use has become widespread [7]. Several modifications have been made to locked plate designs [7, 3]. However, concerns about the adequate use of these plates have been raised [11]. Pre-operative planning and attention to biomechanical principles are important for the locked compression plate to be successful [12]. Factors such as the number, orientation angle, and monocortical or bicortical placement of the locked screws may influence fixation strength [13, 10, 4, 3, 8]. Furthermore, inadequate use of locked screws can produce an overly stiff construct that may compromise fracture healing [11].
Thus, an understanding of the bone-plate construct stability is important for clini-
cal use. Biomechanical studies performed by static and dynamical tests are necessary
to determine the construct stiffness, strength and failure mode of the plating con-
figurations [13, 10, 4, 2, 8]. Mechanical properties may also be evaluated by using
numerical models such as Finite Element Analysis [13, 14]. A first step in a study is
to analyze the geometry of plates already manufactured.
1.2 Geometry
Computer Aided Design (CAD) is the ideal type of software for this task. With CAD software it is possible to model the geometry from the measurement data already obtained, and the final geometry is the basis for Finite Element Analysis.
When designing a structure, knowing all the details of the actual problem is really important. A first analysis is made to create a model able to represent the real structure. This model provides the balance equations obtained from mathematical relationships known from mechanical studies. These equations translate the physical behavior of the structure. Their mathematical manipulation provides enough data to study the internal strength, showing all displacements, deformations and stresses. These data need to be analysed, comparing the results with what was expected from the proposed model. This procedure is valid for the beginning of a project as well as for its development, but when geometries are more complex (compared with simple problems from classical mechanics) the solution is no longer accurate, and it is in this context that the finite element method provides an approximate solution through the discretization of a continuous system.
The parameters that describe the behavior of the system are the nodal displace-
ments [1]. From them it is possible to analyze the internal forces, stresses, and
evaluate the strength of the analyzed structure.
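As a minimal sketch of this idea, assuming nothing about the plate model itself, the following Python example assembles and solves a two-element 1D bar problem: the continuous structure is reduced to a linear system K·u = f in the nodal displacements, from which element stresses can then be recovered.

```python
import numpy as np

# Minimal 1D bar FEM: two linear elements, fixed at node 0, axial force at node 2.
E, A, L = 200e9, 1e-4, 0.05       # Young modulus [Pa], cross-section [m^2], element length [m]
k = E * A / L                      # stiffness of one element

K = np.zeros((3, 3))               # global stiffness matrix (3 nodes)
for e in (0, 1):                   # assemble both element matrices
    K[e:e + 2, e:e + 2] += k * np.array([[1.0, -1.0], [-1.0, 1.0]])

f = np.array([0.0, 0.0, 1000.0])   # 1 kN applied at the free end

u = np.zeros(3)                    # nodal displacements; u[0] = 0 (fixed support)
u[1:] = np.linalg.solve(K[1:, 1:], f[1:])

stresses = E * np.diff(u) / L      # element stresses from the strain in each element
print("displacements [m]:", u, "stresses [Pa]:", stresses)
```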
Finite Element Method (FEM) calculation has been shown to be a valid method in biomechanical systems, and its results can be used for fixing problems in prostheses [5]. Applying FEM with computer support, the solver (the part of FEA responsible for the calculation) can solve many equations in a short time. Problems that were impossible to answer with manual calculation in the past can now be solved, yielding good results even for complex cases.
2.1 Plate
The seven-hole locking reconstruction plate may be used to treat certain types of fractures using bicortical or monocortical locked screws (Fig. 2).
The information about the geometry was obtained with two methods. The first
method was the profile projection and the second one was the micro CT. The profile
projector used was a Mitutoyo PJ311.
With this information it was possible to evaluate the larger dimensions such as length, thickness, diameters, and others. These values were compared with the ones obtained with a Western DC-60 digital caliper. The micro CT SkyScan 1176 was used to obtain details about curves and radii. It is important to remember that, with the files generated by micro CT, it is possible to reconstruct the 3D model using medical software and export the results as Stereolithography (stl) files. However, in the present study the main objective was to develop the geometry using only direct measurements, a method that is more efficient for simple geometries and results in fewer problems with surfaces when meshing for FEA.
In order to show an example of SkyScan results in micro CT, Fig. 3 presents some projected images from the process. These projections need to be transformed into slices for reconstruction.
During the work with images obtained using CT or micro CT, it is common to find noise or poorly defined contours. If the intention is to reconstruct the three-dimensional image with accurate measurements and details, image treatment is crucial. For this, there are many ways to convert the problematic files into quality files within the tolerance. The method suggested in the present study was a simple algorithm written in MATLAB code that converts a CT image file into a binary image, formed by zeros and ones that represent the black and white colors. The control of the boundary in the image is made with values obtained from the manual and visual measurements already made before. Thus, once the contour problem is resolved, the details of the geometry can be observed with more precision. The simple routine used is described in Fig. 4, and an example (one slice image of micro CT) before and after the treatment with the algorithm can be visualized in Fig. 5.
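The routine itself was written in MATLAB and is not reproduced here; a roughly equivalent Python sketch of the described step, converting a grayscale slice into a 0/1 image with a threshold taken from the previous manual measurements, could look as follows (the synthetic slice and the threshold value are assumptions for illustration).

```python
import numpy as np

def binarize(slice_gray, threshold):
    """Convert one grayscale CT slice into a binary (0/1) image:
    1 where the intensity reaches the threshold (object), 0 elsewhere."""
    return (slice_gray >= threshold).astype(np.uint8)

# Synthetic example standing in for one micro CT slice; in practice the slice
# would be read from file and the threshold chosen from manual measurements.
rng = np.random.default_rng(0)
slice_gray = rng.random((128, 128))
binary = binarize(slice_gray, threshold=0.5)
print(binary.sum(), "foreground pixels")
```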
After obtaining all the information about the geometry, the next step was to load it into CAD software. There are many types of software that could be applied, but in the present study Solidworks 2012 was chosen, because its tools are very simple and its toolbox of surfaces and molds is very useful for reconstructed geometries. In this software it is possible to define regions of interest for the analysis. These regions are mapped for the FEM meshing. The mapping is important because it defines the quality of the elements, and hence the quality of the results. Figure 6 shows a comparison between the real plate and the model made with Solidworks.
The final geometry is exported in the Parasolid (*.x_t) format and imported into the FEM software; in this case Ansys APDL 11 was used. The mesh generation was controlled by the finite element size on each line of the geometry. The first mesh created can be visualized in Fig. 7.
3 Results
The image treatment proved to be an efficient method to obtain measurements and, although this was not tested here, the binary files should also be useful for three-dimensional reconstruction with the support of medical software. Figure 5 shows the results for one slice. The model in CAD had good results; Fig. 6 shows a qualitative result. Figure 7 shows how important mesh control is for a good meshing result. With all the models developed, such as bones and screws, it is possible to apply the boundary conditions to this model and run the solution to obtain the behavior of the plate under various conditions.
4 Conclusions
The present study evaluated the acquisition of images and geometric reconstruction
seeking a more detailed study of its structure through the application of numerical
methods such as finite elements.
A seven-hole locking reconstruction plate manufactured from stainless steel was used as the material model. The acquisition of geometric information was obtained from the profile projection method for simplified shapes such as curves and external radii. The micro CT provided additional information on details of the structure, such as volume, and validated the data obtained from the profile projection. Using this method it was possible to generate a 3D model with good quality.
References
9. Perren SM (2002) Evolution of the internal fixation of long bone fractures. J Bone Joint Surg
Br 84:1093–1110
10. Roberts JW, Grindel SI, Rebholz B et al (2007) Biomechanical evaluation of locking plate radial shaft fixation: unicortical locking fixation versus mixed bicortical and unicortical fixation in a sawbone model. J Hand Surg Am 32:971–975
11. Scolaro J, Ahn J (2011) Locked plating in practice: indications and current concepts. Univ
Pennsylvania Orthop J 21:18–22
12. Sommer C, Babst R, Muller M et al (2004) Locking compression plate loosening and plate
breakage: a report of four cases. J Orthop Trauma 18:571–577
13. Stoffel K, Dieter U, Stachowiak G et al (2003) Biomechanical testing of the LCP—how can
stability in locked internal fixators be controlled? Injury 34(Suppl. 2):11–19
14. Taheri E, Sepehri B, Ganji R et al (2012) Effect of screws placement on locking compression
plate for fixating medial transverse fracture of tibia. Biomed Eng Res 1:13–18
15. Wagner M (2003) General principles for the clinical use of the LCP. Injury 34(Suppl 2):31–42
Tortuosity Influence on the Trabecular Bone
Elasticity and Mechanical Competence
W. L. Roque ()
Department of Scientific Computation, Federal University of Paraíba, João Pessoa, Brazil
e-mail: [email protected]
A. Alberich-Bayarri
Biomedical Imaging Research Group, La Fe Health Research Institute, Valencia, Spain
e-mail: [email protected]
1 Introduction
Fig. 1 Baitogogo, a masterpiece of Henrique Oliveira, in Palais de Tokyo, Paris
the TB quality. On the other hand, the connectivity of the trabecular bone network, which can be estimated, for instance, through the Euler-Poincaré characteristic, EPC, and the Young modulus of elasticity, E, have been shown to be of major importance to describe the mechanical behavior of the structure. Moreover, the trabecular bone forms a network that is not a regular lattice of straight lines like a truss; on the contrary, nature has chosen a sinuous structural design presenting a highly connected network of bone with rod and plate aspects. Figure 1 is a picture of a masterpiece of Henrique Oliveira1, a Brazilian artist, that nicely illustrates the contrast between a straight grid and a tortuous trabecular structure.
Recently the tortuosity [38], τ, which reflects the network sinuosity degree of a
connected path, has been investigated as a geometrical parameter that also affects the
mechanical behavior of the trabecular bone structure. In fact, there are several ways
to define tortuosity, τ, according to the specific field of application [10]. Nevertheless,
the simplest mathematical definition is the ratio of the geodesic length between two
points in a connected region to the Euclidean distance connecting these two points.
This definition implies that the tortuosity is such that τ ≥ 1. In a porous medium the
tortuosity of the pore space is quite relevant for the fluid flow and permeability. On
the other hand, when modeling the trabecular bone as a two phase porous medium,
one question that may arise is how the tortuosity of the trabecular network influences
the mechanical competence of the structure.
In [5] a study was conducted, based on the Biot-Allard model, showing the an-
gle dependence of tortuosity and elasticity influence on the anisotropic cancellous
bone structure using audiofrequencies in air-filled bovine bone replicas produced
by stereolithography 3D printing. In [31] it has been shown that, based on Fourier
transform and finite element methods, the normalized stress-strain behavior of a sin-
gle collagen fiber is influenced by fiber tortuosity. This effect of tortuosity on the
1 http://palaisdetokyo.com/fr/exposition/exposition-monographique/Henrique-Oliveira.
stress-strain behavior can be accounted for by the relationship between fiber tortu-
osity and the source of fiber stress during straining. The resulting stress in a fiber
during a uniaxial pull is the result of two components. The first source component
is the stress generated from increasing the bond lengths between the backbones of
the polymer chains. The second source component is the stress generated from de-
creasing the overall tortuosity of the fiber. Nevertheless, the influence of tortuosity
on the elasticity of the trabecular bone itself is not yet fully understood.
Currently there is a debate about the influence of aging on the distribution of vertical and horizontal trabeculae; some studies have shown that trabeculae aligned in the direction of most frequent stress play an important role in the bone structural strength [12, 15]. In particular, it has been observed that with aging the human vertebral bone loses mass and trabecular elements, i.e., loses connectivity, resulting in a weaker bone structure and leading to a higher fracture risk. Bone density is the main determinant of bone strength, but the microstructure of the trabecular bone is also important to the mechanical behavior of the structure [13, 30]. The reduction and slenderness of osteoporotic horizontal trabeculae make the vertical ones more susceptible to buckling under compression forces, since they are no longer reinforced by the horizontal struts. However, how the trabeculae characteristics may influence the bone strength is still a matter of current interest [17].
The first image-based studies concerning the estimation of trabecular bone network tortuosity were presented in [38–40]; they reveal a high linear correlation between the trabecular network tortuosity in the main stress direction, which can be assumed to be vertical, and the trabecular volume fraction (BV/TV), connectivity (EPC) and Young modulus of elasticity (E). This indicates that tortuosity is an important feature of bone quality and plays a role in its resistance to load. However, due to the connectivity of the TB network, the tortuosity along the other, horizontal directions may also influence E in the main stress direction, as load-bearing paths are relevant to spread out the applied stress, and this is one of the investigation concerns addressed in this paper. Due to the high coefficients obtained in the linear correlation analysis among these four fundamental parameters, a mechanical competence parameter (MCP) was defined in [41] by means of principal component analysis (PCA), merging the four previous ones with the intent of grading the trabecular bone structural fragility. That study was initially done using 15 ex vivo distal radius samples obtained by μCT. Here, to further investigate the consistency of the MCP and its potential as a parameter to grade the TB fragility, we compute the MCP for two additional cohorts: one also from distal radius, obtained in vivo by magnetic resonance imaging (MRI), and a second one from L3 lumbar vertebrae obtained by μCT. The elasticity study was performed in two different ways: simulation by the finite element method (FEM) for the first two sample sets and an actual mechanical test for the third one. These analyses are important because they verify the consistency of the tortuosity and the MCP, as they are applied to different image acquisition methods and resolutions, and to two different Young modulus estimation techniques.
The paper is organized as follows: Section 2 presents the materials and methods involved and includes a brief explanation of the parameters of interest, namely BV/TV, EPC, τ and E.
This section presents the three cohorts that comprise the set of image samples used in our study and briefly explains the concepts and principal aspects concerning the four representative parameters explored in this work, namely BV/TV, EPC, τ and E. To further investigate the potential of the MCP, the present work considers three different sets of trabecular bone 3D image samples: two sets from distal radius, one of them containing 15 ex vivo μCT samples and the other containing 103 in vivo MRI samples, and a third one containing 29 ex vivo μCT L3 vertebral samples. The final isotropic resolutions are 34 μm for the μCT and 90 μm for the MRI images, and the main analyzed direction was the axial one (craniocaudal for the vertebrae and distal-proximal for the radius).
The μCT distal radius samples, with a lateral size of 12 mm, were harvested at a mean distance of 9.75 mm from the distal extremity, and volumes of interest (VOI) were selected with sizes that vary according to the material's clinical analysis. They were imaged with the microCT-20 scanner (Scanco Medical, Brüttisellen, Switzerland) and, for noise removal, the μCT 3D images were filtered with a 3D Gaussian filter. In each case, the grayscale histogram of the filtered images has two peaks, corresponding to marrow and bone; so, they were binarized using a global threshold equal to the minimum between the two peaks. The 15 image sets have 239 slices each, with 2D ROIs of 212 × 212, 237 × 237, 242 × 242, 252 × 252 and 257 × 257 pixels; the 10 other samples have 268 × 268 pixels. Additional details concerning the samples' preparation and acquisition protocols are described in [27].
A set of 29 μCT vertebral samples was supplied by the Department of Forensic Medicine, Jagiellonian University Medical College. The specimens were taken from female individuals without metabolic bone disease or vertebral fractures. The mean and standard deviation of the individuals' age were 57 ± 17 years. Immediately after dissection, all soft tissue was cleaned out and the samples were placed in containers filled with ethanol. An X-tek Benchtop CT160Xi high-resolution CT scanner (Nikon Metrology, Tring, UK) was used to scan the vertebral bodies. The images were segmented into bone and marrow cavity phases with a global thresholding method. The segmentation threshold was selected automatically based on the MaxEntropy algorithm [26], such that the information entropy consistent with a two-phase model is maximal. The final 3D binarized images have sizes that vary from 770 to 1088 pixels in x, from 605 to 876 pixels in y and from 413 to 713 slices (z direction), the average size being 950 × 750 × 600.
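A compact Python sketch of Kapur's maximum-entropy thresholding, the idea behind the MaxEntropy algorithm cited as [26], is shown below; it is a generic re-implementation of the usual formulation, not the exact code applied to the vertebral data.

```python
import numpy as np

def kapur_threshold(values, bins=256):
    """Return the grey-level threshold that maximizes the summed Shannon
    entropy of the background and foreground distributions (Kapur et al.)."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    best_t, best_h = 1, -np.inf
    for t in range(1, bins - 1):
        pb, pf = p[:t].sum(), p[t:].sum()
        if pb == 0 or pf == 0:
            continue
        b, f = p[:t] / pb, p[t:] / pf
        h = -(b[b > 0] * np.log(b[b > 0])).sum() - (f[f > 0] * np.log(f[f > 0])).sum()
        if h > best_h:
            best_t, best_h = t, h
    return 0.5 * (edges[best_t] + edges[best_t + 1])

# Synthetic two-phase data standing in for the vertebral grey levels.
rng = np.random.default_rng(2)
grey = np.concatenate([rng.normal(0.3, 0.05, 5000), rng.normal(0.7, 0.05, 5000)])
print("max-entropy threshold:", kapur_threshold(grey))
```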
The elasticity study with these vertebral samples was performed by mechanical testing. An MTS Mini Bionix 858.02 loading system with a combined force/torque transducer with a range of 25 kN/100 N·m was used to perform the compression tests. The specimens were located between two stiff steel plates which were firmly mounted to the force/torque transducer and to the upper jaw of the loading system. Prior to mechanical testing, each probed specimen was glued with a self-curing denture base acrylic resin between two polycarbonate sheets at its endplate surfaces. This procedure was chosen to create two surfaces, as parallel as possible, above each endplate to transmit the compressive load from the loading system to each specimen in a uniform way. The polycarbonate sheets were removed from the vertebra endplates before the testing. Each vertebra was loaded in compression at a loading rate of 5 mm/min to a certain level of engineering deformation (at most 30 % of the original height of the specimen). The compressive force was monitored during the test at a sample rate of 20 Hz. All data measured during the compression tests were transformed into plots of applied force and displacement for each specimen. The compliance of the loading system was measured as well, so, during the post-processing, it was possible to obtain the true relation between the applied force and the deformation of a vertebral body. The stiffness in the linear part of the loading path was evaluated for each specimen, and the Young modulus, E, was defined as the ratio of the product of the stiffness and the vertebral height to the mean cross-sectional area of the vertebral body.
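In symbols, with k the measured stiffness, h the vertebral height and A the mean cross-sectional area of the vertebral body, this reading of the sentence above corresponds to

E = k · h / A,

so that a stiffer or taller specimen with a smaller cross section yields a larger apparent Young modulus.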
A set of 103 MRI radius samples was considered, from the distal metaphysis and from a group including healthy subjects and a mix of disease stages. The MRI acquisitions were performed in a 3 Tesla system and scanned in 3D using a T1-weighted gradient echo sequence (TE/TR/α = 5 ms/16 ms/25°). The MRI images were acquired with a nominal isotropic resolution of 180 μm. MR image processing and analysis were performed with MATLAB R2012a (The MathWorks, Inc., Natick, MA). The image preparation steps consisted of an initial segmentation using a rectangular region of interest, image intensity homogeneity correction, interpolation and binarization. All the steps were applied as in [2], with the exception of the interpolation, which was performed by applying a 3D non-local upsampling algorithm, achieving a final resolution of 90 μm [29]. The set has 65 samples with 80 slices, 10 with 120 and the remaining ones vary from 30 to 200 slices, predominantly between 50 and 100. Each 2D image has lateral dimensions varying from 38 up to 206 pixels, predominantly around 70 × 100 pixels.
Finite element method simulations were conducted to estimate the Young modulus in all the 103 MRI distal radius samples as well as in the 15 μCT distal radius samples. For that, a mesh was created based on the 3D trabecular bone images using an optimized algorithm [1] implemented in Matlab R2011a, which converts each voxel into a hexahedron (brick) element. Compression stress-strain tests were numerically simulated by a finite element linear-elastic-isotropic analysis performed in Ansys v11.0 (Ansys Inc., Southpointe, PA). The bulk material properties were set to Ebulk = 10 GPa, a common value assumed for compact bone, and a Poisson's coefficient ν = 0.3. A deformation of 1 % of the edge length was imposed in all the distal radius compression simulations. The computational cost of the simulations was approximately 5 h per sample on a computer workstation (Quad Core at 2.83 GHz and 8 GB of RAM). After applying the homogenization theory [23], apparent Young modulus results were obtained.
In general, most of the papers published in scientific journals are based on the authors' own sets of image samples of subjects, and upon them the studies are carried out. Nevertheless, as a rule, the sets of samples are not made available to the research community and most of the time are not even available under request. Although all the methods and equipment used to acquire the samples are very well described in the materials and methods sections, there is a lack of freedom for other researchers to access the image databases and work with them. The availability of image sample data would let other researchers actually see the samples, reproduce the computations presented in the papers, validate the algorithms themselves and check the published results and, above all, it would allow the use of the sets of samples for further research, carried out either as complementary to the original paper or to promote new developments. In this regard, the image samples that are the basis of our study are free data samples made available upon request. The computations of the BV/TV, EPC and τ values were done using OsteoImage, a computer program developed by one of the authors especially for TB image analyses. The statistical analyses were performed with the free software RGui [34] and the 3D image reconstructions were done with ImageJ (http://rsbweb.nih.gov/ij/).
EPC = I − C + H, (2)
where I denotes the number of isolated parts, C the number of connections and H the number of closed cavities.
As the trabeculae have no closed cavities [18] and the number of isolated parts is
approximately 1 in a well structured sample, the EPC value should be negative and
the lower the value the higher the connectivity [8]; in this case, the connectivity is
estimated by its modulus. A positive EPC value indicates that the sample has more
isolated parts than connections, and, therefore, the EPC indicates that its structure
has lost much of its connectedness.
As EPC is a zero-dimensional measure, it needs to be estimated by a three-
dimensional test; for practical purposes, a couple of parallel 2D images can be used,
forming a disector [21, 35, 43, 47], and the EPC can be estimated for each one of
them inside the volume of interest. In general, the EPC is given normalized by its
volume size, EPC V . The algorithm to compute the EPC can be seen in [36].
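As a rough illustration (not the disector-based procedure of [36]), the Euler-Poincaré characteristic of a binarized volume can also be obtained with scikit-image, assuming a recent version that provides measure.euler_number; the synthetic volume below is a placeholder.

```python
import numpy as np
from skimage.measure import euler_number

rng = np.random.default_rng(3)
vol = rng.random((64, 64, 64)) > 0.6       # synthetic binary stand-in for a TB volume

epc = euler_number(vol, connectivity=3)     # 3D Euler-Poincare characteristic (26-connectivity)
epc_v = epc / vol.size                      # volume-normalized value, analogous to EPC_V
print(epc, epc_v)
```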
2.4 Tortuosity
The tortuosity, τ, characterizes how much an object departs from being straight, and this concept has been extended to the trabecular bone network. Geometrically, it is defined as
τ = LG /LE , (3)
where LG is the geodesic distance between two connected points, say a and b, of the
trabecular network without passing across other phases (marrow cavity); and LE is
the Euclidean distance between these points, which will be considered here as the
distance between two parallel reference planes (see Fig. 2) [50]. This approach allows one to classify as tortuous, τ ≥ 1, any filamentous structure that is not perpendicular to the reference planes.
Gommes et al. [19] proposed a geodesic reconstruction (GR) algorithm that can be applied to binary images to estimate the geodesic length. This algorithm was implemented in a previous work [38] and was applied to the solid phase of the bone samples, sweeping the image along the reference plane direction and reconstructing the trabecular bone network voxel by voxel. The number of GRs necessary to recover all the trabeculae of an image depends on their sinuosities, exceeding the number of analyzed slices considered as the Euclidean distance; the equality occurs only in the case of a structure completely perpendicular to the sweeping direction.
During the GR process, the algorithm computes and stores the Euclidean, LE ,
and the geodesic, LG , lengths. A distribution of Euclidean and geodesic lengths is
generated. Taking the average of the geodesic distances, LG , at each Euclidean distance, the tortuosity can be estimated as the slope of the best-fit line through the points (LE , LG ).
This algorithm can be applied directly to 3D binarized μCT or MRI images. More
details of the algorithm implementation can be found in [38, 40].
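A simplified Python sketch of the sweeping idea, and not the OsteoImage implementation of [38, 40], is given below: geodesic lengths are grown inside the bone phase from the first slice by repeated one-voxel dilations restricted to the mask, and τ is taken as the slope of the best-fit line through the (LE, mean LG) points; the random test volume is an assumption for illustration.

```python
import numpy as np
from scipy import ndimage

def tortuosity_z(mask):
    """Estimate tortuosity along z for a 3D binary mask (True = bone phase)."""
    geo = np.full(mask.shape, np.inf)
    geo[0][mask[0]] = 0.0                         # seeds: bone voxels of the first slice
    reached = np.isfinite(geo)
    step = 0
    while True:                                   # grow geodesic distance one voxel at a time
        step += 1
        grown = ndimage.binary_dilation(reached) & mask & ~reached
        if not grown.any():
            break
        geo[grown] = step
        reached |= grown
    le, lg = [], []                               # Euclidean distance vs. mean geodesic length
    for z in range(1, mask.shape[0]):
        hit = np.isfinite(geo[z])
        if hit.any():
            le.append(z)
            lg.append(geo[z][hit].mean())
    return np.polyfit(le, lg, 1)[0]               # slope of the best-fit line = tau

rng = np.random.default_rng(4)
mask = rng.random((40, 32, 32)) > 0.4             # synthetic connected solid phase
print("tau_z ~", tortuosity_z(mask))
```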
2.5 Elasticity
In the linear elastic regime, stress and strain are related through Hooke's law,
σ = Eε, (4)
where E is the Young modulus of elasticity. Usually, σ is obtained from the sample
reaction force divided by the area on which it is applied. Rigorously, the
trabecular structure is not isotropic [22, 44, 45], hence E is not a scalar, but a
symmetric tensor; nevertheless, considering the complexity of modeling a porous
structure, an isotropic model can be reasonably assumed [1, 14].
The 3D trabecular bone images were meshed for the elastic simulation using an optimized algorithm [1] implemented in Matlab R2011a (The MathWorks Inc., Natick, MA), which converts each voxel into a hexahedron element (brick element). A compression stress-strain test in each space direction was numerically simulated by a finite element linear-elastic-isotropic analysis performed in Ansys v11.0 (Ansys Inc., Southpointe, PA). The bulk material properties were set to Ebulk = 10 GPa, a common value assumed for compact bone, and a Poisson's coefficient ν = 0.3. A deformation of 1 % of the edge length was imposed in all the compression simulations. The computational cost of the simulations was approximately 5 h per sample on a computer workstation (Quad Core at 2.83 GHz and 8 GB of RAM). After applying the homogenization theory [23], apparent Young modulus results were obtained in each spatial direction (Ex , Ey , Ez ).
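A hedged sketch of how an apparent modulus follows from such a simulated compression is shown below; the reaction force and edge length are hypothetical numbers, and the simple stress/strain ratio stands in for the full homogenization step of [23].

```python
def apparent_young_modulus(reaction_force_n, edge_length_mm, strain=0.01):
    """Apparent E of a cubic VOI compressed by a 1 % edge deformation:
    sigma = F / A on the loaded face, divided by the imposed strain."""
    area_mm2 = edge_length_mm ** 2
    sigma_mpa = reaction_force_n / area_mm2     # N/mm^2 = MPa
    return sigma_mpa / strain                   # apparent Young modulus in MPa

# Hypothetical values, for illustration only.
print(apparent_young_modulus(reaction_force_n=120.0, edge_length_mm=5.0))
```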
3 Results
Table 1 presents the mean and standard deviation (SD) obtained for the distal radius μCT and MRI trabecular bone cohorts. Firstly, by simple inspection of the data in Table 1, it is observed that in the z direction τ has the lowest mean and SD and E has the highest values, in both groups. This corresponds to the distal-proximal direction, which is normally the direction most frequently submitted to tensile and compressive forces, compared to the x and y directions, which correspond to the horizontal sweeping directions. This is an indication that the trabeculae become aligned to make the structure stronger, which is in agreement with the very well known fact that trabecular bone aligns in the direction in which it is most frequently mechanically demanded [20, 45, 49].
To further investigate the tortuosity influence on the trabecular bone strength, a linear correlation study was performed including the whole data set, and the results are provided in Tables 2 and 3. The linear correlation coefficients between τ and E in the horizontal x, y and vertical z directions reveal that an increase in tortuosity strongly relates to a decrease in bone stiffness. Bone mass loss occurs mainly due to an imbalance between bone formation and bone resorption, and marrow cavity sizes and quantities in certain parts of the trabecular bone are closely related to bone remodeling, being directly
Fig. 3 Linear relationship between E, in MPa, and τ, in the a x and y directions and b z direction
Fig. 4 Linear relationship between E, in MPa, and τ, in the x, y and z directions, for the MRI
samples. The inversely proportional relationship between E and τ is remarkable in the z and x
directions
use of the tortuosity and connectivity simply in z-direction to estimate the mechanical
competence parameter given in [41].
Since the linear correlation coefficients are notably high, a principal component analysis could be performed among the samples belonging to each one of the three cohorts. In fact, the variance explained by the first principal component was far higher than that of the remaining ones, varying from 3.1 to 3.3 in the three cases. Therefore, this guarantees that these four parameters can be merged into a single new parameter.
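As an illustration of the merging step only, and not the published MCP coefficients, the first principal component of the standardized BV/TV, EPC_V, τ and E values can be computed as follows; the sample matrix is random placeholder data.

```python
import numpy as np

# Placeholder data: rows = samples, columns = (BV/TV, EPC_V, tau, E).
rng = np.random.default_rng(5)
X = rng.normal(size=(15, 4))

Xs = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize each parameter

# First principal component via SVD of the standardized data matrix.
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
pc1 = Vt[0]                                     # loadings of the four parameters
mcp_scores = Xs @ pc1                           # MCP-like score for each sample
print("loadings:", pc1)
```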
Following the definition of the mechanical competence parameter (MCP) [41] as
the first principal component, for each cohort we have
In the literature the Young modulus has many times been used as the main reference to explain the bone mechanical competence [11, 18, 24, 28, 32]. The high correlation coefficient obtained with BV/TV is the reason for that; nevertheless, the inclusion of other parameters increases this correlation. In fact, adding EPC V and τ to the analysis has shown an increase of r2 of up to 5 %. In other words, the stepwise analysis considering the three parameters BV/TV, EPC V and τ explains from 75 % up to 84 % of E for the cohorts. One can see that the variability is high, meaning that the Young modulus carries around 20 % of additional mechanical competence information, which justifies its consideration in the MCP construction.
4 Discussion
The tortuosity measures the network sinuosity degree compared to a straight path and was recently proposed and investigated as a trabecular bone parameter that correlates very well with trabecular connectivity, volume fraction and Young modulus of elasticity in the z direction, with impact on the trabecular bone mechanical competence [41]. In this paper the influence of trabecular bone tortuosity on the structural stiffness was shown through the Young modulus of elasticity. The studies were done in the three principal space directions, x, y and z, and used two cohorts: one with 15 μCT ex vivo images and the other with 103 MRI in vivo images of distal radius trabecular bone.
Fig. 6 MCPN color spectrum (see eBook version) of the μCT distal radius samples where blue
means better and red means worse
a set of interesting results of age-related changes that occur in vertical trabecular volume fraction (vBV/TV), thickness (Tb.Th), number (Tb.N), connectivity density (Conn.D), structural model index (SMI) and degree of anisotropy (DA), based on ex vivo lumbar vertebrae of 40 women and 39 men with an even distribution ranging from 20 to 90 years old. An outstanding conclusion of their study is that vertical and horizontal bone is lost with age by both women and men, faster in women, and that the horizontal/vertical trabecular thickness ratio decreases significantly with age, indicating a more pronounced thinning of horizontal trabeculae. Vertical and horizontal trabeculae are structurally important and their thinning or disruption compromises the trabecular bone strength [15, 16].
Trabecular thinning, increasing porosity and diminishing connectivity are factors that cause an increase in the network tortuosity, weakening the structure. It has been shown here that the lowest tortuosity and the highest E values occur in the z direction for all cohorts, corresponding to the distal-proximal radius or craniocaudal directions. This result is a good indication that the trabecular alignment influences the bone mechanical competence, increasing its resistance to load. Additionally, the moderate linear correlation between E in the z direction (vertical) and the tortuosities in the x and y directions (horizontal ones) does provide support for the influence of the horizontal tortuosity on the trabecular strength in the vertical direction. This is a somewhat expected result, as in mechanical engineering load-bearing structures are built with redundant load paths to provide a safer distribution of forces. It has to be pointed out that the tortuosity technique used in this paper estimates the bulk trabecular tortuosity in each direction according to the sweeping plane direction, and does not specifically consider only the vertical or horizontal trabeculae as defined in [46]. In fact, as the trabeculae form a complex network, the influence of loading in one direction is spread to the other ones. Thus, the results presented here have shown that the influence of the horizontal tortuosities (τx , τy ) on the vertical E is not as strong as that of the vertical tortuosity (τz ) for the distal radius; this is in agreement with the findings given in [15, 16, 46] that the vertical vertebral trabeculae are mainly responsible for the compressive bone strength.
5 Conclusion
Osteoporosis has become a health problem for the aging population and an economic burden worldwide for the public and private health care systems. Bone mass loss causes irreversible damage to the bone microarchitecture, which likely leads to fragility fractures. In this paper we have shown that the mechanical competence parameter MCP is suitable to grade the trabecular bone fragility, summarizing four important parameters that characterize the TB quality, namely: volume fraction, connectivity, tortuosity and Young modulus. The MCP was investigated for three cohorts from different trabecular bone sites and image resolutions, from in vivo and ex vivo subjects, showing full agreement between them. On the other hand, it has been shown that the tortuosity
Acknowledgements We would like to thank Dr. K. Arcaro for several preliminary discussions and especially Dr. Z. Tabor for kindly letting us make use of his μCT image samples and data. W. L. Roque thanks the University for its competence in dealing with the redistribution process.
References
1. Alberich-Bayarri A, Marti-Bonmati L, Perez MA, Lerma JJ, Moratal D (2010) Finite element
modeling for a morphometric and mechanical characterization of trabecular bone from high res-
olution magnetic resonance imaging. In: Moratal D (ed) Finite element analysis. InTechOpen,
pp 195–208
2. Alberich-Bayarri A, Marti-Bonmati L, Pérez MA, Sanz-Requena R, Lerma-Garrido JJ, García-
Martí G, Moratal D (2010) Assessment of 2D and 3D fractal dimension measurements of
trabecular bone from high-spatial resolution magnetic resonance images at 3 tesla. Med Phys
37:4930–4937
3. Arcaro K (2013) Caracterização Geométrica e Topológica da Competência Mecânica no Es-
tudo da Estrutura Trabecular. DSc. Thesis (in Portuguese). Graduate Program in Applied
Mathematics, Federal University of Rio Grande do Sul, Porto Alegre, Brazil, July 2013
4. Argenta MA, Gebert AP, Filho ES, Felizari BA, Hecke MB (2011) Methodology for numerical
simulation of trabecular bone structures mechanical behavior. CMES 79(3):159–182
5. Aygün H, Attenborough K, Postema M, Lauriks W, Langton CM (2009) Predictions of angle
dependent tortuosity and elasticity effects on sound propagation in cancellous bone. J Acoust
Soc Am 126:3286–3290
6. Boutroy S, Van Rietbergen B, Sornay-Rendu E, Munoz F, Bouxsein ML, Delmas PD (2008)
Finite element analysis based on in vivo HR-pQCT images of the distal radius is associated
with wrist fracture in postmenopausal women. J Bone Miner Res 23(3):392–399
7. Carbonare D, Giannini S (2004) Bone microarchitecture as an important determinant of bone
strength. J Endocrinol Invest 27:99–105
8. Chappard D, Basle MF, Legrand E, Audran M (2008) Trabecular bone microarchitecture: a
review. Morphologie 92:162–170
9. Chen H, Zhou X, Fujita H, Onozuka M, Kubo K-Y (2013) Age-related changes in trabecular
and cortical bone microstructure. Int J Endocrinol 2013:213234
10. Clennell MB (1997) Tortuosity: a guide through the maze. In: Lovell MA, Harvey PK (eds)
Developments in Petrophysics, vol 122. Geological Society, London, pp 299–344
11. Cohen A, Dempster DW, Müller R, Guo XE, Nickolas TL, Liu XS, Zhang XH, Wirth AJ, van
Lenthe GH, Kohler T, McMahon DJ, Zhou H, Rubin MR, Bilezikian JP, Lappe JM, Recker RR,
Shane E (2010) Assessment of trabecular and cortical architecture and mechanical competence
of bone by high-resolution peripheral computed tomography: comparison with transiliac bone
biopsy. Osteoporos Int 21:263–273
12. Dempster DW (2003) Bone microarchitecture and strength. Osteoporos Int 14(Suppl 5):S54–
S56
13. Ebbesen EN, Thomsen JS, Beck-Nielsen H, Nepper-Rasmussen HJ, Mosekilde L (1999)
Lumbar vertebral body compressive strength evaluated by dual-energy x-ray absorptiometry,
quantitative computed tomography, and ashing. Bone 25:713–724
14. Edwards WB, Troy KL (2012) Finite element prediction of surface strain and fracture strength
at the distal radius. Med Eng Phys 34:290–298
15. Fields AJ, Lee GL, Liu XS, Jekir MG, Guo XE, Keaveny TM (2011) Influence of vertical
trabeculae on the compressive strength of the human vertebra. J Bone Miner Res 26:263–269
16. Fields AJ, Nawathe S, Eswaran SK, Jekir MG, Adams MF, Papadopoulos P, Keaveny TM
(2012) Vertebral fragility and structural redundancy. J Bone Miner Res 27:2152–2158
17. Gefen A (2009) Finite element modeling of the microarchitecture of cancellous bone: tech-
niques and applications. In Leondes CT (ed) Biomechanics system technology: muscular
skeletal systems, vol 4, pp 73–112. World Scientific, Singapore (chapter 3)
18. Gomberg BR, Saha PK, Song HK, Hwang SN, Wehrli FW (2000) Topological analysis of
trabecular bone MR images. IEEE T Med Imaging 19(3):166–174
19. Gommes CJ, Bons A-J, Blacher S, Dunsmuir JH, Tsou AH (2009) Practical methods for mea-
suring the tortuosity of porous materials from binary or gray-tone tomographic reconstructions.
AIChE J 55(8):2000–2012
20. Gong H, Zhu D, Gao J, Lv L, Zhang X (2010) An adaptation model for trabecular bone at
different mechanical levels. Biomed Eng Online 9:32
21. Gundersen HJG, Boyce RW, Nyengaard JR, Odgaard A (1993) The Conneuler: unbiased
estimation of the connectivity using physical disectors under projection. Bone 14:217–222
22. Hambli R, Bettamer A, Allaoui S (2012) Finite element prediction of proximal femur fracture
pattern based on orthotropic behaviour law coupled to quasi-brittle damage. Med Eng Phys
34:202–210
23. Hollister SJ, Fyhrie DP, Jepsen KJ, Goldstein SA (1991) Application of homogenization theory
to the study of trabecular bone mechanics. J Biomech 24:825–839
24. Homminga J, Mccreadie BR, Weinans H, Huiskes R (2002) The dependence of the elastic
properties of osteoporotic cancellous bone on volume fraction and fabric. J Biomech 36:1461–
1467
25. Jolliffe IT (2002) Principal component analysis, 2nd edn. Springer, Berlin
26. Kapur JN, Sahoo PK, Wong ACK (1985) A new method for gray-level picture thresholding
using the entropy of the histogram. Graph Mod Im Proc 29:273–285
27. Laib A, Beuf O, Issever A, Newitt DC, Majumdar S (2001) Direct measures of trabecular bone
architecture from MR images. Adv Exp Med Biol 496:37–46 (Springer US, chapter 5)
28. Liu XS, Sajda P, Saha PK, Wehrli FW, Bevill G, Keaveny TM, Guo XE (2008) Complete volu-
metric decomposition of individual trabecular plates and rods and its morphological correlations
with anisotropic elastic moduli in human trabecular bone. J Bone Miner Res 23(2):223–235
29. Manjón JV, Coupé P, Buades A, Fonov V, Louis Collins D, Robles M (2010) Non-local MRI
upsampling. Med Image Anal 14:784–792
30. Mosekilde L (1993) Vertebral structure and strength in vivo and in vitro. Calcif Tissue Int
53(Suppl 1):S121–S126
31. Ohmura J (2011) Effects of elastic modulus on single fiber uniaxial deformation. Undergraduate
Honors Thesis, The Ohio State University, 41pp
32. Parkinson IH, Badiei A, Stauber M, Codrington J, Müller R, Fazzalari NL (2012) Vertebral
body bone strength: the contribution of individual trabecular element morphology. Osteoporos
Int 23:1957–1965
33. Portero-Muzy NR, Chavassieux PM, Milton D, Duboeuf F, Delmas PD, Meunier PJ (2007)
Euler strut-cavity, a new histomorphometric parameter of connectivity reflects bone strength
and speed of sound in trabecular bone from human os calcis. Calcified Tissue Int 81:92–98
34. R Development Core Team (2010) R: a language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria, 2010. ISBN 3-900051-07-0
35. Roberts N, Reed M, Nesbitt G (1997) Estimation of the connectivity of a synthetic porous
medium. J Microsc 187:110–118
36. Roque WL, de Souza ACA, Barbieri DX (2009) The Euler-Poincaré characteristic applied to identify low bone density from vertebral tomographic images. Rev Bras Reumatol 49:140–152
37. Roque WL, Arcaro K, Tabor Z (2010) An investigation of the mechanical competence of
the trabecular bone. In: Dvorkin E, Goldschmit M, Storti M (eds) Mecánica computacional,
vol XXIX, pp 2001–2009. AMCA, Buenos Aires
38. Roque WL, Arcaro K, Freytag I (2011) Tortuosidade da rede do osso trabecular a partir da recon-
strução geodésica de imagens binárias tridimensionais. Anais do XI Workshop de Informática
Médica, pp 1708–1717
39. Roque WL, Arcaro K, Alberich-Bayarri A (2012) Tortuosity and elasticity study of distal radius
trabecular bone. In: Rocha A, Calvo-Manzano JA, Reis LP, Cota MP (eds) (2012) Actas de la
7a Conferencia Ibérica de Sistemas y Tecnologías de Información, vol 1. AISTI - UPM, 2012.
40. Roque WL, Arcaro K, Lanfredi RB (2012) Tortuosidade e conectividade da rede trabecular do
rádio distal a partir de imagens micro-tomográficas. Rev Bras Eng Bio 28:116–123
41. Roque WL, Arcaro K, Alberich-Bayarri A (2013) Mechanical competence of bone: a new
parameter to grade trabecular bone fragility from tortuosity and elasticity. IEEE T Bio-Med
Eng 60:1363–1370
42. Saha PK, Xu Y, Duan H, Heiner A, Liang G (2010) Volumetric topological analysis: a novel
approach for trabecular bone classification on the continuum between plates and rods. IEEE T
Med Imaging 29(11):1821–1838
43. Sterio DC (1984) The unbiased estimation of number and sizes of arbitrary particles using the
disector. J Microsc 134:127–136
44. Tabor Z (2007) Estimating structural properties of trabecular bone from gray-level low-
resolution images. Med Eng Phys 29:110–119
45. Tabor Z (2009) On the equivalence of two methods of determining fabric tensor. Med Eng
Phys 31:1313–1322
46. Thomsen JS, Niklassen AS, Ebbesen EN, Brüel A (2013) Age-related changes of vertical and
horizontal lumbar vertebral trabecular 3d bone microstructure is different in women and men.
Bone 57:47–55
47. Vogel HJ, Kretzschmar A (1996) Topological characterization of pore space in soil—sample
preparation and digital image-processing. Geoderma 73:23–38
48. Wesarg S, Erdt M, Kafchitsas Ks, Khan MF (2010) Direct visualization of regions with lowered
bone mineral density in dual-energy CT images of vertebrae. In: Summers RM, Bram van
Ginneken MD (eds) Medical Imaging 2011: Computer-Aided Diagnosis. SPIE Proceedings,
2010
49. Wolff J (1986) The law of bone remodeling. Springer-Verlag, Berlin (translation of the German 1892 edition)
50. Wu YS, van Vliet LJ, Frijlink HW, Maarschalk KV (2006) The determination of relative path length as a measure for tortuosity in compacts using image analysis. Eur J Pharm Sci 28:433–440
Influence of Beam Hardening Artifact in Bone Interface Contact Evaluation by 3D X-ray Microtomography
Abstract Trabecular bone screws are commonly used for the fixation of fractures in order to increase holding power in the fine spongy bone. The success of skeletal anchorage using mini screws is related to their stability in the bone tissue. The factors that influence the immediate stability of metal implants are related to the design of the device, to the quantity and quality of bone, and to the insertion technique. The present work studied the bone interface contact parameter by X-ray microtomography. The results identified the importance of evaluating the metallic artifact around the mini screws, which can be assessed by image processing with different pixel-size dilations. A correlation pattern can be noted between beam hardening artifact correction and bone interface contact measurements.
1 Introduction
Fig. 1 CT attenuation principle scheme
In Eq. (1) the range of integration over z covers the entire scanned object. This is
the key equation for X-ray imaging via projection radiography in which Id (x,y) is
the projection image of μ(x,y,z;E) and η(E) represents the quantum efficiency of the
detector at energy E.
In this sense, the more absorbent the object is, the fewer X-ray photons are detected. Because low-energy X-rays are more efficiently attenuated than high-energy X-rays, the distribution of energies is slanted toward higher energies, which leads to a beam hardening artifact. In other words, beam hardening results from the preferential absorption of low-energy photons from the beam.
A major effect of beam hardening is the enhancement of the image edges. This is one of the most troublesome image artifacts in CT because quantitative measurements are highly influenced by this problem due to its relation with the attenuation coefficient. In addition, the same material can result in different gray levels depending on the
Fig. 3 Illustration of a μCT radius image without (a) and with (b) patient movement. The arrows show the movement artifact
the radiation through the object, and dl is its distance increment along L. Also in (2), we have the term ln(I0/Id), which is called the ray sum and represents the contribution of all μ along the radiation path.

$$\ln\left(\frac{I_0}{I_d}\right) = -\ln\left(e^{-\int_L \mu(x,y)\,dl}\right) \;\Rightarrow\; \ln\left(\frac{I_0}{I_d}\right) = \int_L \mu(x,y)\,dl = P(x,y) \qquad (2)$$
A set of ray sums over a given angle, parallel to the beam radiation, forms the
projection term P. Each projection is acquired with the object (X-ray tube-detector
system) rotated by an angle ϕ relative to the original position. So, it is possible to
obtain a projection for each angle ϕ.
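As a minimal numerical illustration of Eq. (2), assuming a single discretized ray with made-up attenuation values, the ray sum ln(I0/Id) recovers the integral of μ along the path:

```python
import numpy as np

# One ray discretized into segments of length dl (mm); mu values are illustrative only.
mu = np.array([0.02, 0.05, 0.30, 0.30, 0.05, 0.02])  # e.g. soft tissue - bone - soft tissue (1/mm)
dl = 0.5                                              # segment length along the path L (mm)
I0 = 1.0e6                                            # incident intensity (arbitrary units)

Id = I0 * np.exp(-np.sum(mu * dl))                    # Beer-Lambert attenuation along the ray
ray_sum = np.log(I0 / Id)                             # ray sum P for this line
print(ray_sum, np.sum(mu * dl))                       # both evaluate to 0.37
```

Repeating this for every detector position at a given angle ϕ yields one projection P(·, ϕ).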
The information from transmitted X-rays is processed by a computer in order to
obtain the CT/μCT images. In order to achieve this goal the theory of image recon-
struction from projections is applied. In general words, the attenuation coefficient in
each point (x,y) of the scanned object can be found from the projections using the
inverse of the Radon transform. There are a number of alternatives to perform the
reconstruction, such as the direct Fourier method or iterative approaches. Currently, the most widely used reconstruction algorithms are based on the direct reconstruction method called filtered backprojection, which combines filtering and backprojection and has good numerical stability. Basically, the filter used acts as a low-pass filter that can be used to globally balance noise and spatial resolution in the reconstruction results. Filtered backprojection was first described in the 1960s, but the key theory on CT reconstruction with filtering was presented in the 1970s and implemented by Hounsfield, who is acknowledged as the inventor of the CT technique. A great advance in μCT reconstruction theory was achieved by reconstructing a series of X-ray cone beam projections directly into a 3D density distribution. Cone beam CT
is a 3D extension of 2D fan beam CT and has the advantage of the reduction of data
collection time, which is particularly important when moving structures are scanned.
The 3D data set of the scanned object is obtained by stacking contiguous 2D
images. Here, the source trajectory is a circle and each horizontal row of the detector is ramp-filtered as if it were a projection of a 2D object. Then, the filtered projection data are back-projected along the original rays and the middle slice is reconstructed exactly. The 2D algorithms reconstruct a slice of the scanned object; however, if volumetric data are required, the complete procedure must be performed slice by slice. Descriptions of μCT reconstruction algorithms can be widely found in the literature.
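As a rough sketch of the filtered backprojection principle on a single 2D slice (not the reconstruction software used in this study), scikit-image provides parallel-beam Radon and inverse Radon transforms; the `filter_name` argument assumes scikit-image 0.19 or later:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()                          # synthetic 400 x 400 test slice
theta = np.linspace(0.0, 180.0, 180, endpoint=False)   # one projection per angle phi (degrees)

sinogram = radon(image, theta=theta)                   # forward projection (ray sums)
reconstruction = iradon(sinogram, theta=theta,
                        filter_name='ramp')            # filtering followed by backprojection
print(np.abs(reconstruction - image).mean())           # small reconstruction error
```

Other apodization windows (e.g. 'shepp-logan', 'hann') trade noise against spatial resolution, which is the balancing role of the filter mentioned above.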
In order to record the transmitted X-ray beam, a detection system must be used.
The use of image intensifier (II) with a charge coupled device (CCD) can be found
in many old CT systems. The II are closed vacuum tubes amplifying image signals.
They are made of glass, aluminum or non-ferromagnetic metal, which allows the
flow of electrons from the photocathode to the anode. Input and output phosphor screens and electromagnetic lenses are also among its constituents. They are responsible for the conversion of X-ray photons into light signals, and their diameters are generally about 23–57 cm. The function of the input phosphor is to absorb the X-rays and emit light radiation. It is typically made of a cesium iodide screen activated with sodium, but can also be made of zinc-cadmium activated with copper. However, the first option is better because the crystals are vertically oriented, which helps to channel the light.
The electronic signal from the II is captured by the CCD and then sent to a TV
monitor, resulting in a representation of the radiographic image in real time. In fact, the digitization can be performed through a CCD or by direct capture of the X-rays with a flat panel detector.
The CCD cameras are in general composed of amorphous Si with a scintillation
layer, which is basically cesium iodide. Silicon has a low X-ray absorption coeffi-
cient, which leads to a small number of photons detected by the CCD. This results in
a significant quantum noise. In order to decrease this noise, it is possible to increase
the dose of radiation or the quantum detection efficiency. As increasing the dose is
undesirable, priority is given to increasing the quantum efficiency of radiation de-
tectors. The quantum efficiency of the detector system can be increased by adding
a scintillation layer above the CCD. X-rays are absorbed by this layer, which has
a high absorption coefficient, and then converted into light of visible or near-visible wavelengths.
The flat panel detectors are based on a flat-screen arrangement of amorphous silicon photodiodes and thin-film transistors in combination with CsI(Tl) scintillator devices. They replace the image intensifier and video camera, recording the image sequences in real time. The transition from II to flat panel is facilitated by the advantages they offer, such as images without distortion, excellent contrast, large dynamic range and high sensitivity to X-rays.
μCT emerged as a non-destructive method of analysis [11] to investigate the interface between bone and screws. Trabecular bone screws are commonly used for
fixation of fractures in order to increase holding power in the fine trabecular bone.
The holding strength of a screw is directly linked with bone quality, which is a very
important issue in clinical healthcare [1].
In the medical area, titanium screws are used for fixation of fractures in order
to increase holding power in the fine trabecular bone. In Orthodontics, in the last
two decades, fixation screws were modified to be used as anchorage devices. These
screws are called miniscrews or mini-implants. They are widespread in clinical prac-
tice because they allow tooth movement in three dimensions with minimal effect on
other teeth.
The success of miniscrews is related to primary stability, which is defined as the
absence of mobility in the bone bed after mini-implant placement [5] and depends
on the mechanical engagement of an implant with the bone socket [2]. If the initial
mechanical retention of the mini-implant is not observed, a larger miniscrew should be used or the insertion site should be modified [3]. On the other hand, excessive tension during insertion may result in heating and damage to the bone tissue, including ischemia and necrosis, or even fracture of the mini-implant [10].
After primary stability is achieved, the healing process starts and, due to osseoin-
tegration, the implant gains secondary stability [14]. Osseointegration is a direct
structural and functional connection between ordered, living bone and the surface of a
load-carrying implant. There is a direct bone-to-metal interface without interposition
of non-bone tissue [9].
The contact surface of the bone to mini-implant, called bone to implant contact
(BIC), has traditionally been assessed by histological techniques [4, 6, 10, 13]. The
histological technique presents some disadvantages: it requires the destruction of the
sample for making the histological slides; the analysis depends on the subjectivity of
the operator; and it is necessary to evaluate many cross sections to obtain a global view of the sample.
The great advantage of μCT in the dental area is the non-destructive nature of the technique, as well as the fact that it obtains information from the entire sample volume. However, one of the biggest challenges of the μCT BIC evaluation is to avoid the beam-hardening artifact present in the reconstructed images, caused by the metal of the screws. In this context, the objective of this study was to evaluate the BIC parameter of mini screws inserted into bone blocks by μCT.
Bovine samples (Fig. 4) (Bos taurus, Angus lineage) were removed from pelvic bones immediately after the animals were slaughtered, with the use of a trephine bur (8 mm ø × 20 mm long, Sin Implants, São Paulo, Brazil) adapted to a low-speed motor handpiece (Beltec LB100, Araraquara, Brazil), under irrigation.
The samples received implantation of a conical solid miniscrew, made of Ti-6Al-4
V alloy (INP®, São Paulo, Brazil), with 1.4 mm diameter and 6 mm long, and after
that they were immersed in sterile physiological solution and stored frozen (−20 °C). In order to perform the μCT, the samples were removed from the freezer, defrosted at room temperature and then scanned.
Fig. 4 Bovine samples: Macroscopic view of the right half of the pelvic bone. a Caudal view: the
arrow indicates the gluteus iliac wing bone. b Medial view: the arrow indicates the caudal portion
of the pubic bone
The images (Fig. 4c) were acquired in a high resolution system (Bruker/Skyscan
μCT, model 1173, Kontich, Belgium, software version 1.6) at a pixel resolution
of 9.3 μm, using a 1 mm thick aluminum filter, 80 kV, 90 μA, and exposure time
of 800 ms. A flat panel detector with a matrix of 2240 × 2240 pixels was used.
The samples were kept in 2 ml Eppendorf tubes containing saline solution during
acquisition to avoid dehydration. The μCT images were reconstructed (NRecon
software, InstaRecon, Inc. Champaign, IL, USA, version 1.6.4.1) and evaluated in
the CT-Analyzer software (version 1.10, Bruker/Skyscan μCT, Kontich, Belgium).
After the scanning, quantitative evaluations were performed directly in 3D. The
volume of interest (VOI) corresponded to a 3.4 mm diameter cylinder surrounding
the mini screw, which means 1 mm beyond the mini screw (Fig. 5). In this particular
study only the regional changes on trabecular microarchitecture were evaluated. In
total, 366 slices were analyzed, which is equivalent to a cylinder volume equal to
15 mm3 .
In this study, the intersection surface between the trabecular bone and the mini-screw (IS) and the bone surface (BS) were calculated to evaluate the BIC parameter. For that purpose, after the reconstruction procedure, the μCT data were segmented with a global threshold. However, the metal artifact surrounding the mini-implant must be identified and taken into account when the BIC evaluation is performed. In this step, different values of pixel-size dilation away from the mini-implant interface were studied. 2D and 3D morphological operation approaches involving dilation of pixels/voxels from the surface were used. All the steps were performed using a round kernel operation with several radius values (2, 4, 6, 8, 10 and 12). The BIC evaluation was also performed without any morphological operation in order to compare the impact of this approach.
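A hypothetical voxel-based sketch of this dilation step is shown below, assuming binary masks `implant` and `bone` obtained from the global threshold; the function names and the exact normalization (IS relative to BS) are illustrative rather than the software's actual implementation:

```python
import numpy as np
from scipy import ndimage

def round_kernel(r):
    # Spherical ("round kernel") structuring element of radius r voxels.
    z, y, x = np.ogrid[-r:r + 1, -r:r + 1, -r:r + 1]
    return (x * x + y * y + z * z) <= r * r

def bic_with_dilation(implant, bone, r):
    """Dilate the implant mask by r voxels so that the metal-artifact band around
    the mini-screw is skipped, then count bone voxels lying on the dilated shell
    (IS) and relate them to the total bone surface (BS)."""
    dilated = ndimage.binary_dilation(implant, structure=round_kernel(r))
    shell = dilated & ~ndimage.binary_erosion(dilated)                   # outer surface of the dilated region
    is_voxels = np.count_nonzero(shell & bone)                           # intersection surface (voxel count)
    bs_voxels = np.count_nonzero(bone & ~ndimage.binary_erosion(bone))   # bone surface (voxel count)
    return is_voxels, 100.0 * is_voxels / bs_voxels                      # IS and BIC (%)

# Example: evaluate the radii used in this study on hypothetical masks.
# for r in (2, 4, 6, 8, 10, 12):
#     print(r, bic_with_dilation(implant_mask, bone_mask, r))
```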
Fig. 5 μCT cross-sections of the bone implant contact: (a) sagittal view, (b) transaxial view, (c) detail of the beam hardening effect
3 Results
Cortical bone is dense and has a solid structure, whereas trabecular bone has a honeycomb organization and is believed to distribute and dissipate the energy from articular contact loads. Although about 80 % of the total skeletal mass is cortical bone, trabecular bone has a much greater surface area than cortical bone [15]. In this study, only the region surrounding the trabeculae was evaluated. The possibility of using μCT, a non-invasive and non-destructive technique, for the evaluation of the bone-implant interface was explored.
Although μCT provides good quality data for bone and implant interface investigation, beam hardening artifact corrections must be taken into account. This issue is caused by the non-linear relation between the attenuation values and the measured values of the projection. Like all medical and industrial X-ray beams, μCT uses a polyenergetic X-ray spectrum (X-ray attenuation coefficients are energy dependent). After passing through a given thickness of an object, lower energy X-rays are attenuated to a greater extent than higher energy X-rays are. As the X-ray beam propagates through a thickness of material, the shape of the spectrum becomes skewed toward higher energies. In this sense, the beam hardening phenomenon
Fig. 6 Beam hardening contribution for different metallic filter applications. Note the reconstructed slices of the mini-screws and their corresponding profiles along the arrow line
induces artifacts in μCT because rays from some projection angles are hardened to a different extent than rays from other angles, which confuses the reconstruction algorithms. This phenomenon leads to an image error, which reduces the image quality in CT/μCT measurements. In fact, these issues have a clear impact on quantitative μCT measurements. In this study, in order to avoid this issue, a combination of two approaches was used. The scans were acquired over 360° and an aluminum filter of 1.0 mm thickness, placed at the exit window of the X-ray tube, was used in this scanning step. Figure 6 shows μCT profiles obtained with different kinds of filter materials. It is possible to note that the metallic filter affects the effective energy and, in consequence, the attenuation coefficient. Note in the reconstructed slices the reduced artifacts compared to the μCT slice reconstructed without a filter.
Fig. 7 Typical μCT signal profile through the center of the bone implant sample: different beam
hardening correction depth values of 360◦ scanning
Another measure was taken during the reconstruction step. A few reconstruction parameters can be adjusted in the reconstruction software, and one of them is the beam hardening correction. This option compensates for the problem by a linear transformation in which several correction depths (0, . . ., 100) can be selected according to the object density. Bone and metal can be easily distinguished in Fig. 7. It is also possible to see the difference between the profiles along the arrow line, through the center of the sample, with and without beam hardening correction. A fine-tuning function was used in order to obtain the optimum depth correction value.
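The exact compensation implemented in the reconstruction software is proprietary; the sketch below only illustrates the general idea of a linearization-type correction, in which a quadratic term whose weight is set by the 0–100 depth value straightens the measured ray sums (the formula and the parameter mapping are assumptions, not the vendor's algorithm):

```python
import numpy as np

def linearize_ray_sums(p, depth):
    """Illustrative beam hardening correction applied to measured ray sums p = ln(I0/Id).
    `depth` in [0, 100] controls how strongly thick/dense paths are boosted."""
    s = depth / 100.0
    return p + s * p ** 2   # larger ray sums (denser objects) receive a larger correction

# Example: the same sinogram corrected with two different depth settings.
# corrected_20 = linearize_ray_sums(sinogram, 20)
# corrected_60 = linearize_ray_sums(sinogram, 60)
```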
In order to evaluate BIC, the ratio between the intersection surface (IS) of the mini-screw and the trabecular bone was calculated. Traditionally, the contact surface of the bone tissue with the screw, called BIC, was studied through histological techniques [4]. Alternatively, μCT emerged as a non-destructive method of analysis. However, some important differences are observed between the two techniques. In the first one, 2D images are evaluated, while in the second one there is the possibility of accessing all the 3D data. Furthermore, due to artifacts created by the metal in μCT, a small image strip adjacent to the bone tissue/mini-implant interface should be disregarded during the analysis. Due to these facts, a new index of analysis can be created when μCT is used: the osteointegration volume/total volume of the implant, which is effective for predicting the mechanical attachment of implants to the bone [8]. In this study, we believe that it would be more appropriate to call the index “bone implant
Fig. 8 μCT transaxial binary images used for the intersection between bone and implant surface (IS) with different pixel (a) / voxel (b) values of dilation. It is possible to see what happens when no image processing analysis is performed (a). r represents the values of the round kernel radius of the dilation morphological operation
Table 1 μCT BIC results: different pixel/voxel sizes of round kernel dilation in order to avoid the metal-induced artifact

Radius size dilation (r) | Pixel dilation IS (mm) | Pixel dilation BIC (%) | Voxel dilation IS (mm) | Voxel dilation BIC (%)
2 | 14.42 | 81.6 | 17.68 | 91.3
4 | 14.23 | 80.5 | 13.74 | 77.7
6 | 13.86 | 78.4 | 13.00 | 73.5
8 | 13.65 | 77.2 | 12.66 | 71.6
10 | 13.50 | 76.4 | 12.32 | 69.7
12 | 13.48 | 76.3 | 12.00 | 67.9
No morphological operation | 0.053 | 0.30 | |
The specific metallic content of an implant may affect the severity of artifacts
on CT images. Titanium alloy hardware causes the least obtrusive artifact in CT imaging, whereas stainless steel implants cause significant beam attenuation and artifact. Knowledge of the composition of the implanted material at the time of the CT examination may be helpful, as technical parameters may then be adjusted to
minimize artifacts and to spare the patient from excess radiation.
The composition of the dental implant and the mini-implant is very similar. Both are composed of titanium alloys, so the data of this study can be useful in dental implant
studies.
The present study focused on the evaluation of BIC by μCT and identified that it is important to investigate the metallic artifact around the mini screws, which can be assessed by different pixel-size dilations, showing a correlation pattern between beam hardening artifact correction and BIC measurements.
4 Conclusion
Acknowledgments The authors would like to thank CNPq and FAPERJ for financial support.
References
1 Introduction
Fabric tensors aim at modeling both the orientation and the anisotropy of trabecular bone through tensors. Many methods have been proposed for computing fabric tensors from segmented images, including boundary-, volume-, texture-based and alternative methods (cf. [21] for a complete review). However, due to the large bias generated by partial volume effects, these methods are usually not applicable to images acquired in vivo, where the resolution of the images is in the range of the trabecular thickness.
Recently, different methods have been proposed to deal with this problem. In general,
these methods directly compute the fabric tensor on the gray-scale image, avoiding
in that way the problematic segmentation step.
Different imaging modalities can be used to generate 3D images of trabecular
bone in vivo, including different magnetic resonance imaging (MRI) protocols and
computed tomography (CT) modalities. The main disadvantages of MRI are that it requires long acquisition times, which can easily lead to motion-related artifacts, and that the resolution obtained with this technique is worse than that obtained through CT in vivo [8]. Regarding CT modalities, cone beam CT (CBCT) [16, 22]
and high-resolution peripheral quantitative CT (HR-pQCT) [1, 5] are two promising
CT techniques for in vivo imaging. Although these techniques are not appropriate to
all skeletal sites, their use is appealing since they can attain higher resolutions and
lower doses than standard clinical CT scanners. CBCT has the extra advantages over HR-pQCT that it is available in most hospitals in the Western world, since it is used in clinical practice in dentistry wards, and that the scanning time is shorter (30 s vs. 3 min), so it is less prone to motion artifacts than HR-pQCT.
As already mentioned, there are many methods available for computing tensors
describing anisotropy in gray-scale images [21]. A strategy for choosing the most appropriate method is to assess how similar the tensors computed from a modality for in vivo imaging (e.g., CBCT) are with respect to the ones computed from the reference imaging modality (micro-CT) for the same specimens. This was actually the strategy that we followed in this chapter.
From the clinical point of view, it seems more relevant to track changes in
anisotropy than in the orientation of trabecular bone under treatment, since osteo-
porosis can have more effect on its anisotropy than on its orientation [13, 23]. Thus,
the aim of the present study was to compare anisotropy measurements from different
fabric tensors computed on images acquired through cone beam computed tomog-
raphy (CBCT) to the same tensors computed on images acquired through micro
computed tomography (micro-CT).
Due to its flexibility, we have chosen in this study our previously proposed gen-
eralized mean intercept length (MIL) tensor [18] (GMIL) with different kernels and,
due to its simplicity, the global gray-scale structure tensor (GST) [25]. This chapter
is an extended version of the work in [20].
The chapter is organized as follows. Section 2 presents the material and methods
used in this study. Section 3 shows comparisons between using GMIL and GST in
both CBCT and micro-CT data. Finally, Sect. 4 discusses the results and outlines
our current ongoing research.
The samples in this study consisted of 15 bone biopsies from the radius of human
cadavers donated to medical research. The biopsies were approximately cubic with a
side of 10 mm. Each cube included a portion of cortical bone on one side to facilitate
orientation. The bone samples were placed in a test tube filled with water and the
tube was placed in the centre of a paraffin cylinder, with a diameter of approximately
10 cm, representing soft tissue to simulate measurements in vivo. After imaging, a
cube, approximately 8 mm on a side, with only trabecular bone was digitally extracted
from each dataset for analysis.
The specimens were examined both with CBCT and with micro-CT. The CBCT
data were acquired with a 3D Accuitomo FPD 80 (J. Morita Mfg. Corp., Kyoto,
Japan) with a current of 8 mA and a tube voltage of 85 kV. The obtained resolution
was 80 micrometers isotropic. The micro-CT data were acquired with a μ CT 40
(SCANCO Medical AG, Bassersdorf, Switzerland) with a tube voltage of 70 kVp.
The voxels have an isotropic resolution of 20 microns. Figure 1 shows slices and
volume renderings of one of the imaged specimens.
2.3 Methods
The tensors were computed through the generalized MIL tensor (GMIL) and the
GST.
Basically, the GMIL tensor is computed in three steps. First, the mirrored extended Gaussian image (EGI) [12] is computed from a robust estimation of the gradient. Second,
the EGI is convolved with a kernel in order to obtain an orientation distribution func-
tion (ODF). Finally, a second-order fabric tensor is computed from the ODF. More
formally, the generalized MIL tensor is computed as:
$$\mathrm{MIL} = \int_{\Omega} \frac{v\,v^{T}}{C(v)^{2}}\, d\Omega, \qquad (1)$$
where v are vectors on the unitary sphere Ω, and C is given by:
C = H ∗ E, (2)
Fig. 1 Slices (left) and volume renderings (right) of one of the imaged specimens. Top: images
acquired through micro-CT. Bottom: images acquired through CBCT
that is, the angular convolution (∗) of a kernel H with the mirrored EGI E. Thanks to
the Funk-Hecke theorem [3, 9], this convolution can be performed efficiently in the
spherical harmonics domain when the kernel is positive and rotationally symmetric
with respect to the north pole.
One of the advantages of the GMIL tensor is that different kernels can be used in
order to improve the results. In this study, the half-cosine (HC) and von Mises-Fisher
(vMF) kernels have been applied to the images. The HC has been selected since it makes the generalized and the original MIL tensor equivalent. The HC is given by:
$$H(\phi) = \begin{cases} \cos(\phi), & \text{if } \phi \le \pi/2 \\ 0, & \text{otherwise,} \end{cases} \qquad (3)$$
with φ being the polar angle in spherical coordinates. Moreover, the vMF kernel,
which is given by [14]:
$$H(\phi) = \frac{\kappa}{4\pi \sinh(\kappa)}\, e^{\kappa \cos(\phi)}, \qquad (4)$$
has been selected since it has a parameter κ that can be used to control its smoothing
action. In particular, the smoothing effect is reduced as the values of κ are increased
[18].
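The two kernels of Eqs. (3) and (4) are straightforward to evaluate; a small sketch (with `phi` in radians) is:

```python
import numpy as np

def half_cosine(phi):
    # Half-cosine kernel, Eq. (3): cos(phi) on the upper hemisphere, zero elsewhere.
    return np.where(phi <= np.pi / 2, np.cos(phi), 0.0)

def von_mises_fisher(phi, kappa):
    # von Mises-Fisher kernel, Eq. (4); larger kappa gives a sharper kernel (less smoothing).
    return kappa / (4.0 * np.pi * np.sinh(kappa)) * np.exp(kappa * np.cos(phi))

phi = np.linspace(0.0, np.pi, 181)
hc = half_cosine(phi)
vmf_10 = von_mises_fisher(phi, kappa=10.0)   # narrower than the HC kernel
```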
Fig. 2 Graphical representation of some kernels from the broadest to the narrowest, where zero
and the largest values are depicted in blue and red respectively. Notice that the impulse kernel has
been depicted as a single red dot in the north pole of the sphere
Figure 2 shows different kernels that can be used with the GMIL tensor. As already
mentioned, these kernels must be positive and symmetric with respect to the north
pole. As shown in the figure, the HC kernel is too broad (it covers half of the sphere),
which can result in excessive smoothing. On the contrary, the impulse kernel is the
sharpest possible kernel. As shown in [18], the GST makes use of the impulse kernel.
In turn, the size of the smoothing effect of the vMF kernel can be controlled through
the parameter κ. As shown in the figure, vMF is broader than the HC for small values
of κ and it converges to the impulse kernel in the limit when κ → ∞.
On the other hand, the GST computes the fabric tensor by adding up the outer product
of the local gradients with themselves [25], that is:
$$\mathrm{GST} = \int_{p \in I} \nabla I_p\, \nabla I_p^{T}\, dI, \qquad (5)$$
Table 1 Mean (SD) of E1’ for fabric tensors computed on CBCT and micro-CT and the mean
difference (SD) between both values. HC and vMF refer to the generalized MIL tensor, with the HC,
and vMF kernels respectively. Parameter κ for vMF is shown in parenthesis. Positive and negative
values of the difference indicate over- and underestimation of CBCT with respect to micro-CT. All
values have been multiplied by 100
Tensor micro-CT CBCT Difference
HC 44.65 (1.54) 42.38 (0.90) 2.25 (0.84)
vMF(1) 34.12 (0.29) 34.70 (0.18) 0.42 (0.15)
vMF(5) 51.55 (3.56) 47.07 (2.17) 4.51 (1.82)
vMF(10) 58.98 (4.63) 53.90 (3.21) 5.11 (2.13)
GST 45.69 (1.58) 44.79 (1.58) 0.90 (2.09)
[15] or tensor voting [19]. However, the most used ST is given by:

$$\mathrm{ST}(p) = \int_{q \in I} G_\sigma(q - p)\, \nabla I_q\, \nabla I_q^{T}\, dI,$$

where Gσ is a Gaussian weighting function with zero mean and standard deviation σ. In fact, the ST becomes the GST when σ → ∞. The main advantage of this structure tensor is that it is easy to code.
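A minimal NumPy sketch of the GST of Eq. (5) on a gray-scale volume, summing the outer products of the local gradients, could look as follows; the eigenvalue post-processing is only indicative:

```python
import numpy as np

def gray_scale_structure_tensor(volume):
    """Global gray-scale structure tensor: sum over all voxels of the outer
    product of the intensity gradient with itself (a 3 x 3 fabric tensor)."""
    g0, g1, g2 = np.gradient(volume.astype(float))          # gradients along the three axes
    g = np.stack([g0.ravel(), g1.ravel(), g2.ravel()], axis=1)
    return g.T @ g

volume = np.random.rand(32, 32, 32)                          # placeholder gray-scale volume
gst = gray_scale_structure_tensor(volume)
e3, e2, e1 = np.linalg.eigvalsh(gst)                         # eigvalsh returns eigenvalues in ascending order
# e1 >= e2 >= e3 are the eigenvalues from which the anisotropy measures are derived.
```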
3 Results
The anisotropy measurements compared in this section are E1' = E1/(E1 + E2 + E3), E2' = E2/E1 and E3' = E3/E1, where E1, E2 and E3 are the largest, intermediate and smallest eigenvalues of the tensor. These three values have been selected since they are directly related to the shape of the tensor.
Tables 1–3 show the mean and standard deviation of E1’, E2’ and E3’ computed
on micro-CT and CBCT for the tested methods, and the mean difference and standard
deviation between micro-CT and CBCT. As a general trend, the tested methods tend
to overestimate E1’ and underestimate E2’ and E3’ in CBCT. As shown, the best
performance is obtained by vMF with κ =1 with small differences between tensors
computed in both modalities. However, the tensors computed with this broad kernel
are almost isotropic (cf. Tables 2 and 3), which makes it not suitable for detecting
Table 2 Mean (SD) of E2’ for fabric tensors computed on CBCT and micro-CT and the mean
difference (SD) between both values. HC and vMF refer to the generalized MIL tensor, with the HC,
and vMF kernels respectively. Parameter κ for vMF is shown in parenthesis. Positive and negative
values of the difference indicate over- and underestimation of CBCT with respect to micro-CT. All
values have been multiplied by 100
Tensor micro-CT CBCT Difference
HC 65.94 (5.85) 71.50 (3.53) −6.11 (2.19)
vMF(1) 93.70 (1.69) 95.27 (1.02) −1.84 (0.73)
vMF(5) 52.13 (9.54) 61.63 (6.53) −8.48 (3.09)
vMF(10) 39.41 (9.74) 48.31 (7.75) −8.96 (4.43)
GST 80.71 (10.66) 78.58 (7.66) 2.24 (7.41)
Table 3 Mean (SD) of E3’ for fabric tensors computed on CBCT and micro-CT and the mean
difference (SD) between both values. HC and vMF refer to the generalized MIL tensor, with the HC,
and vMF kernels respectively. Parameter κ for vMF is shown in parenthesis. Positive and negative
values of the difference indicate over- and underestimation of CBCT with respect to micro-CT. All
values have been multiplied by 100
Tensor micro-CT CBCT Difference
HC 58.29 (2.93) 65.18 (2.20) −5.58 (2.98)
vMF(1) 91.09 (1.01) 92.92 (0.67) −1.59 (0.89)
vMF(5) 42.72 (5.11) 51.20 (3.71) −9.55 (4.49)
vMF(10) 31.17 (4.94) 37.81 (3.91) −6.64 (2.70)
GST 38.71 (4.90) 44.96 (3.78) −6.31 (4.40)
Table 4 Correlations between CBCT and micro-CT of E1’, E2’ and E3’ of different fabric tensors.
HC and vMF refer to the generalized MIL tensor, with the HC, and vMF kernels respectively.
Parameter κ for vMF is shown in parenthesis. 95 % confidence intervals are shown in parentheses
Tensor E1’ E2’ E3’
HC 0.90 (0.73;0.97) 0.91 (0.76;0.97) 0.67 (0.23;0.88)
vMF(1) 0.90 (0.72;0.97) 0.90 (0.73;0.97) 0.70 (0.29;0.89)
vMF(5) 0.91 (0.75;0.97) 0.91 (0.75;0.97) 0.80 (0.48;0.93)
vMF(10) 0.92 (0.76;0.97) 0.90 (0.72;0.97) 0.84 (0.57;0.94)
GST 0.51 (0.00;0.81) 0.71 (0.33;0.90) 0.51 (0.00;0.81)
and E3’ for HC, vMF (with κ = 10) and GST. It can be seen that the best correlations
are yielded by vMF with different values of κ, and GST has a poor performance.
Figure 3 (right) shows correlation plots of the three eigenvalues normalized by
the sum of them for the same three methods. As shown in this figure, the tensors
yielded by the three methods have different shapes. First, vMF with κ = 10 generates
the most anisotropic tensors with larger differences between E1 and E2 than HC
and GST. Second, HC generates the most isotropic tensors with smaller differences
between values of E1, E2 and E3 than the other tensors. Finally, unlike GST, both
HC and vMF generate tensors that are close to being orthotropic, that is, E2 ≈ E3. This
is in line with the common assumption of orthotropy for trabecular bone [28].
Figures 4–6 show Bland-Altman plots for the generalized MIL tensor with the
HC and vMF (with κ = 10) kernels and the GST. As seen in these figures, GST yields
wider limits of agreement, i.e., larger discrepancies between CBCT and micro-CT,
than HC and vMF, in particular for E2’ and E3’. One of the advantages of using the
vMF kernel is that its parameter can be adjusted in order to improve the correlations
between CBCT and micro-CT. Figure 7 shows the evolution of the correlations
between CBCT and micro-CT with the parameter κ of the generalized MIL tensor
with the vMF kernel. From this figure, E1' and E2' attain their maxima at κ = 10 and κ = 5, respectively, while E3' asymptotically approaches a correlation of 0.875 when
κ → ∞. Since the three measurements determine the shape of the tensor, we suggest choosing the value of κ that maximizes the three correlations together, that is, that maximizes their mean (E1'+E2'+E3')/3. In our case, such a value is κ = 10, as is also shown in Fig. 7.
4 Discussion
We have compared in this chapter the anisotropy of different fabric tensors estimated
on images acquired through CBCT and micro-CT of 15 trabecular bone biopsies from
the radius. The results presented in the previous section show strong correlations
between micro-CT and CBCT for the generalized MIL tensor with HC and vMF
kernels, especially with κ = 10. In addition, good agreements between measurements
in CBCT and the reference micro-CT have been shown through Bland-Altman plots
for HC, vMF with κ = 10 and GST. An interesting result is that the GST yields clearly
lower correlation values than the generalized MIL tensor using either HC or vMF
kernels. We have shown that the GST can be seen as a variant of the generalized MIL
tensor where the impulse kernel is applied instead of the HC [18].
In this line, the results from the previous section suggest that the use of broader
smoothing kernels such as HC or vMF has a positive effect in increasing the correlation between the tensors computed on images acquired through scanners suitable for in vivo imaging and the ones that can be computed from images acquired in vitro. Although
the three tested methods yield tensors that share their eigenvectors, their eigenvalues
are different, as shown in Fig. 3, which is a natural consequence of using different
smoothing kernels. Moreover, the high correlations reported for HC and vMF enable
Fig. 3 Left: correlation plots for E1' (top), E2' (middle) and E3' (bottom) between CBCT and micro-CT for HC, vMF (κ = 10) and GST. Right: correlation plots for HC (top), vMF (κ = 10) (middle) and GST (bottom) between CBCT and micro-CT for the three eigenvalues normalized by the sum of them
the elimination of the systematic errors reported in Tables 1–3 and in the Bland-Altman
plots for these two types of fabric tensors.
Another interesting observation is that vMF yielded better results than the standard
HC. This means that κ can be used to tune the smoothing in such a way that the results
are correlated with in vitro measurements. For the imaged specimens, a value of κ =
10 yielded the best correlation results.
The results presented in this chapter suggest that advanced fabric tensors are
suitable for in vivo imaging, which opens the door to their use in clinical practice.
In particular, the results show that the generalized MIL tensor is the most promising
option for use in vivo. This method is advantageous since its performance can be improved by replacing the smoothing kernel with a more appropriate one, as was shown in this chapter for the vMF kernel.
A poor performance of the GST has also been reported in images acquired through
multi-slice computed tomography (MSCT) [26]. The authors of that study hypoth-
esized that such a bad performance could be due to voxel anisotropy obtained from
MSCT. However, the results from the current study suggest that the problems of the
GST are more structural, since they are also present in CBCT with isotropic voxels.
Thus, the problems of GST seem more related to the applied kernel (the impulse
kernel) than to the voxel anisotropy of the images.
Ongoing research includes performing comparisons at different skeletal sites and with different degrees of osteoporosis, and comparing the results with images acquired
through HR-pQCT and micro-MRI [6, 11]. Furthermore, relationships between fab-
ric and elasticity tensors will be explored. The MIL tensor has extensively been used
for predicting elasticity tensors in trabecular bone [2, 10, 27]. However, since the
GMIL with the vMF kernel has a better performance than the MIL tensor for re-
producing in vitro measurements, we want to investigate whether or not the GMIL
tensor can also be used to increase the accuracy of the MIL tensor for predicting the
elastic properties of trabecular bone.
In the same line, we have recently hypothesized that trabecular termini (i.e., free-ended trabeculae [24]) should not be considered for computing fabric tensors, since the contribution of termini to the mechanical competence of trabecular bone is rather
limited [17]. Thus, it is worthwhile to assess the power of fabric tensors that disregard
termini for predicting elasticity.
Acknowledgements We thank Andres Laib from SCANCO Medical AG for providing the micro-
CT data of the specimens. The authors declare no conflict of interest.
References
22. Mulder L, van Rietbergen B, Noordhoek NJ, Ito K (2012) Determination of vertebral and
femoral trabecular morphology and stiffness using a flat-panel C-arm-based CT approach.
Bone 50(1):200–208
23. Odgaard A, Kabel J, van Rietbergen B, Dalstra M, Huiskes R (1997) Fabric and elastic principal
directions of cancellous bone are closely related. J Biomech 30(5):487–495
24. Tabor Z (2005) Novel algorithm detecting trabecular termini in μCT and MRI images. Bone
37(3):395–403
25. Tabor Z, Rokita E (2007) Quantifying anisotropy of trabecular bone from gray-level images.
Bone 40(4):966–972
26. Tabor Z, Petryniak R, Latała Z, Konopka T (2013) The potential of multi-slice computed
tomography based quantification of the structural anisotropy of vertebral trabecular bone. Med
Eng Phys 35(1):7–15
27. Zysset PK (2003) A review of morphology-elasticity relationships in human trabecular bone:
theories and experiments. J Biomech 36(10):1469–1485
28. Zysset PK, Goulet RW, Hollister SJ (1998) A global relationship between trabecular bone
morphology and homogenized elastic properties. J Biomech Eng 120(5):640–646
Fractured Bone Identification from CT Images, Fragment Separation and Fracture Zone Detection
Abstract The automation of the detection of fractured bone tissue would save time in medicine. In many cases, specialists need to manually revise 2D and 3D CT images and detect bone fragments and fracture regions in order to assess a fracture. The identification of bone fragments from CT images makes it possible to remove image noise and undesirable parts and thus improves image visualization. In addition, the utilization of models reconstructed from CT images of patients makes it possible to customize the simulation, since the result of the identification can be used to perform a reconstruction that provides a 3D model of the patient anatomy. The detection of fracture zones increases the information provided to specialists and enables the simulation of some medical procedures, such as fracture reduction. In this paper, the main issues to be considered in order to identify bone tissue, and the additional problems that arise if the bone is fractured, are described. The identification of fractured bone includes not only bone tissue segmentation, but also bone fragment labelling and fracture region detection. Moreover, some fragments can appear joined after the segmentation process, hence additional processing can be required to separate them. After that, currently proposed approaches to identify fractured bone are analysed and classified. The most recently proposed methods to segment healthy bone are also reviewed in order to show that the techniques used for this type of bone are not always suitable for fractured bone. Finally, the aspects to be improved in the described methods are outlined and future work is identified.
1 Introduction
The automatic identification of bone tissue from computed tomography (CT) images is a helpful procedure in medical visualization and simulation. Nowadays, in many cases the specialist has to manually revise 2D and 3D CT images to detect bone fragments and fracture regions in order to assess a fracture. The segmentation of bone fragments removes image noise and undesirable parts and therefore improves image visualization. Advances in the visualization of medical images are rewarding because they prevent specialists from having to review 2D and 3D images manually and thus save time. In medical simulation, the result of the segmentation can be used to perform a reconstruction that provides a 3D model of the patient anatomy, which can be utilized to customize the simulation. These generated models are also useful for providing additional information during the intervention. On the other hand, the detection of fracture zones increases the information provided to specialists and enables the simulation of some medical procedures, such as bone fracture reduction.
In the literature, many methods have been proposed to segment healthy bone.
Most of these methods are focused on a specific bone or require previous learning.
These constraints prevent them from being applied to the segmentation of fractured bone, since the shape of the bone fragments is often unpredictable, especially in fractures caused by trauma. On the other hand, the identification of fractured bone adds some additional tasks. Specifically, it requires labelling fragments and, in some cases, separating wrongly joined fragments. Moreover, some applications also require detecting bone regions. Thus, specific methods are needed in order to identify fractured bones from CT images. In addition, each type of fracture has different features, hence different methods are necessary in order to identify bone fragments in all types of fractures. In this paper, the main aspects to be considered to identify healthy and fractured bone are described. This makes it possible to check which techniques applied in healthy bone segmentation may or may not be used to identify fractured bone. Moreover, the identification of fractured bone includes not only bone tissue segmentation, but also bone fragment labelling and fracture region detection, hence these processes are also analysed. After the segmentation process, several bone fragments can appear together as only one. Therefore, some additional processing can be required. Once all these issues are analysed, currently proposed approaches to segment healthy bone, identify fractured bone, separate bone fragments and detect fracture zones are revised and classified. This enables the aspects to be improved to be outlined and future work to be identified.
In the next section, the main issues for both healthy and fractured bone detection
are discussed. This includes the special aspects to be considered in each type of bone
fracture. Then, we describe and classify previous work related to the segmentation
of healthy and fractured bone. In the case of fractured bone, the approaches used to
label fragments, to separate wrongly joined fragments and to detect fracture regions
are also classified. Finally, this review makes it possible to know the strengths and weaknesses of each approach and thus the issues that remain unsolved.
Fractured bone tissue is more difficult to identify because it has some additional
features to be considered. Due to the fact that bone fragments may have arbitrary
shape and can belong to any bone in a nearby area, it is necessary to label all the
fragments during the segmentation process. In some cases, this labelling requires
expert knowledge. In addition, a priori knowledge cannot be easily used because it is uncommon to find two identical fractures and therefore it is difficult to predict the shape of the bone fragments, especially in comminuted fractures. On the other hand, bone fragments are not completely surrounded by cortical tissue, since they have areas on the edges without cortical tissue due to the fracture. Finally, the proximity between fragments and the resolution of the CT image may cause different fragments to appear together as one in the image. For this reason, smoothing filters
should be used with caution. This type of filter can deform the shape of bone fragments and fracture zones or even remove small bone fragments. In some cases, it
is necessary to detect the fracture zone of each fragment after its segmentation. The
fracture zone is the area of the bone where the fracture occurs and is composed of
trabecular tissue (Fig. 2). In situations in which bone fragments appear connected,
it is difficult to accurately identify the fractured zone of each fragment. Therefore,
post-processing can be necessary to delimit fracture zones in these situations.
The method applied in fractured bone identification depends on the fracture type.
Based on the fracture line, a fracture can be classified as (Fig. 3): greenstick, trans-
verse, oblique, spiral, avulsed, segmental and comminuted [7]. In a greenstick
fracture (Fig. 4a) there are no fragments because the bone is not completely bro-
ken. Thus, labelling is not necessary. Since the fracture barely changes the shape of
the bone, segmentation methods that are based on previous knowledge are available.
Nevertheless, the edges of the fracture zone, composed of trabecular tissue, may re-
quire special processing. The detection of the fracture zone is especially complicated since the bone is not completely broken and trabecular tissue is very heterogeneous.
Therefore, the fracture zone can be fuzzy in the CT image.
Transverse, oblique and spiral fractures (Fig. 4b, c, d, e, and f) can be similarly
treated during the segmentation. Despite having different fracture lines, these types of fracture generate two fragments with similar shape. Labelling is necessary, but
expert knowledge is not required. Segmentation methods that can be applied depend
on whether or not there is displacement. If there is no displacement (Fig. 4c, d, e,
and f), they can be processed as a greenstick fracture but considering that there are
two fragments. These two fragments can be completely joined, hence additional processing to separate them may be required. In order to detect fracture zones, the
same issues applicable to greenstick fractures should be considered. In the case
Fig. 5 CT images representing highly comminuted bone fractures
that there is displacement (Fig. 4b), the probability that both fragments are jointly
segmented decreases and methods based on prior knowledge are almost discarded. In return, the fracture zone is easier to identify. Avulsed fractures normally occur near a joint, thus the fracture zone is composed almost exclusively of trabecular tissue
and the boundaries of the fragments are weak. This complicates the identification of
the fracture zone because practically the entire fragment is surrounded by trabecular
tissue. Segmental fractures are simple fractures that generate three bone fragments.
Therefore, they can be treated as transverse or oblique fractures but considering
that there are two distinct fracture regions. Comminuted fractures (Fig. 5) add some
additional constraints, hence this is the type of fracture that is more complicated to be
segmented. Comminuted fractures usually generate small fragments and bone may
be deformed due to the fracture. This is because comminuted fractures are usually
associated with crush injuries. In most cases, some fragments overlap in the CT image
and require additional processing to be separated. Labelling is necessary and expert
knowledge is strongly required to identify fragments. The detection of fracture zones
is complicated in this case. Due to the complexity of the fracture, several fracture
zones are generated. Since the relationship between fragments in this type of fracture
is many-to-many, it can be necessary not only to identify fracture zones, but also to
delimit which part of the fracture zone corresponds to each fragment. As mentioned
before, some fragments can overlap due to the fracture and therefore post-processing
and expert knowledge can be required to accurately identify fracture zones.
In recent years, many approaches have been proposed in order to segment bone tissue
from CT images. Most of these methods are focused on the segmentation of a specific
area. In [25], the authors combine region growing, active contours and region competition
to segment carpal bones. An expectation maximization algorithm has been utilized to
segment phalanx bones [23]. The method requires a previously generated CT atlas.
In [18], 3D region growing is used to segment the inferior maxillary bone from CT
images. In order to fill holes in the segmented surface, a morphological closing operation is used. Then, 3D ray casting is applied to segment the internal region of the bone by determining which points are inside the outer shell. The segmented voxels are classified as cortical or trabecular bone using a fuzzy c-means algorithm. To improve the result, an adapted median filter is used to remove outliers. A 3D
region growing method has also been used to segment bone tissue in [32]. Both
the seeds and the threshold are calculated automatically. Since they use a unique threshold, some areas of bone are not segmented and they propose a method to fill
them. This segmentation approach has been tested to segment skull and spine bones.
A novel active contour model is utilized to segment bone tissue in [28]. The statistical
texture method has also been proposed to segment mandible bones from CT images
[19]. In [17], the authors use a 3D deformable balloon model to segment the vertebral
bodies semi-automatically. Graph cuts have also been used to segment vertebrae [2].
Previously, seeds are automatically placed using the matched filter and vertebrae
are identified with a statistical method based on an adaptive threshold. Cortical and
trabecular bone are then separated by using a local adaptive region growing method.
In [15], Willmore flow is integrated into the level set method to segment the spinal
vertebrae. Graph cuts have also been employed to segment the hip bone [16]. Most
of these approaches can not be applied to the segmentation of fractured bone tissue
because they take advantage of the prior knowledge of the shape of the bones.
Statistical methods are frequently used to segment bone tissue [3]. In this case,
they use a generative model to classify pixels into cortical bone or another tissue.
A learned model is constructed by modeling probability functions using Gaussian
mixture models. Then, the learned model allows a probability to be assigned to each pixel, and a maximum a-posteriori probability rule enables a crisp classification. In [12], a genetic algorithm is used to search for the best procedure to segment bone tissue and to separate cortical and trabecular tissue. For that, the genetic algorithm requires previous expert information. Despite the results obtained, learning-based methods
cannot be easily used to segment fractured bones because previous learning is not
available in most cases.
Several methods are based on the fact that the shape and the anatomy of the bone
are known [31]. In this work, an adaptive threshold method is utilized to segment bone
tissue. However, the method cannot be applied to segment bone fractures because it is based on the assumption that bone fragments are completely surrounded by cortical tissue, and this is not always true in the case of a fracture. All the revised
works for segmenting healthy bone from CT images are summarized in Table 1.
The methods applied to the segmentation of healthy bone may not be suitable for segmenting fractured bone. This is because, as seen in the previous section, fractured bone has different features. Moreover, the identification of fractured bone requires carrying out additional steps, such as labelling the fragments or splitting wrongly joined fragments. Currently proposed methods to perform these steps are described below.
There are several papers that are focused on the identification of fractured bone. With
this aim, threshold-based methods are used in most cases. The most basic threshold-
based method consists in defining an intensity interval that corresponds to bone tissue and finding the pixels in the image that belong to this interval [24]. The intensity interval can be defined manually or can be calculated from the information provided by the image. On the other hand, the interval can be used for the whole stack or can
be defined for each slice. The second option is usually the most successful because,
as seen in Sect. 2, intensity values differ between slices. Several works propose to
use thresholding to segment fractured bone. In [20], ulna, radius and carpus are
segmented to simulate a virtual corrective osteotomy. Therefore, the segmentation is
performed on non-fractured bones and then the segmented bones are virtually cut. In
order to separate bone from other tissues, a user-defined threshold is used. In [27],
the area where the bones are located is detected using a threshold-based method.
Then, they present manual and semi-automatic tools for interactively segmenting
bone fragments. This toolkit includes separation, merge and hole filling tools to
generate individually segmented fragments from the result of the threshold-based
segmentation. Thus, the method achieves accuracy at the expense of requiring a lot
of user intervention. A global fixed threshold method has been utilized in [26] to
detect the trabecular bone fracture zone. Due to the difference of intensity values
between slices, it is difficult to set a threshold that fits all the slices.
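A simple per-slice variant can be sketched as follows, here using Otsu's method from scikit-image to recompute the threshold on every slice; this is only an illustration of the idea, not one of the reviewed methods:

```python
import numpy as np
from skimage.filters import threshold_otsu

def bone_mask_per_slice(ct_volume):
    # Threshold each axial slice separately so the interval adapts to the
    # intensity differences between slices.
    mask = np.zeros(ct_volume.shape, dtype=bool)
    for z in range(ct_volume.shape[0]):
        slice_ = ct_volume[z]
        mask[z] = slice_ >= threshold_otsu(slice_)
    return mask
```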
Region growing is a threshold-based method that allows the segmentation to be limited to a specific area [8]. To that end, the algorithm requires seeds to be placed before starting
Table 1 Summary of the works for identifying healthy bone which are described in this paper

Authors | Requirements | Interaction | Methods | Evaluation set | Achievements
Sebastian et al. (2003) | – | Specify parameters | Region growing, active contours and region competition | Carpus | Combine the advantages of all the methods used
Mastmeyer et al. (2006) | – | Set seeds and markers | 3D deformable balloon model | Vertebrae | Vertebra separation
Battiato et al. (2007) | A learned model | Set the threshold | Gaussian mixture models | Knee | Cortical tissue pixels classification
Ramme et al. (2009) | CT atlas | Place landmarks | Expectation maximization | Phalanxes | Semi-automatic segmentation
Moreno et al. (2010) | – | Set the seed point | 3D region growing | Inferior maxilar | Bone tissue classification
Zhao et al. (2010) | – | – | 3D region growing | Skull | Threshold and seeds automatically selected
Aslan et al. (2010) | – | – | Graph cuts and region growing | Vertebrae | Automatic cortical and trabecular tissue classification
Zhang et al. (2010) | – | – | Adaptive thresholding | Calcaneus and vertebra | Automatic segmentation
Truc et al. (2011) | – | – | Active contours | Knee and heart | Bone contours extraction from CT and MRI images
Nassef et al. (2011) | – | – | Statistical texture | Mandible | Identification of different bone tissues
Janc et al. (2011) | Expert bone identification | – | Genetic algorithm | Mandible, skull and knee | Cortical and trabecular tissues separation
Lim et al. (2013) | – | Set initial contours | Level set | Vertebrae | Deal with missing information
Malan et al. (2013) | Previous manual segmentation | – | Graph cuts | Hip | Detailed tissue classification

All the works require CT images as input
The selection of the seed points can be performed manually or automatically. The
manual placement of the seeds enables the labelling of the different bone fragments.
Moreover, the algorithm also needs an intensity interval to be defined. As in the
previous case, the interval can be defined globally or for each slice. Once the seeds
have been placed and the interval has been defined, the algorithm checks all their
neighbouring pixels. If the intensity of a neighbouring pixel is outside the defined
interval, it is discarded. Otherwise, the pixel is included in the segmented area and
its adjacent pixels are studied. The algorithm stops when there are no pixels left to
study. The result of the algorithm can differ depending on the criteria used to accept
or discard pixels. The basic algorithm accepts a pixel if its intensity is inside the
interval. This approach makes it possible to detect small bone features, but image
noise can also be segmented. However, noise can be mostly reduced using smoothing
filters. Therefore, this approach can be suitable for segmenting fractured bone. Other
approaches decide to accept or discard a pixel based on the intensity values of its
neighbours. The simplest option is to accept a pixel if all its neighbours have intensity
values inside the interval. Another option is to use a criterion based on statistical
values calculated from the neighbouring pixels. In this case, small features could be
discarded. Thus, this variation may not be suitable for segmenting fractured bone.
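A minimal sketch of this basic seeded region growing is given below, assuming 2D slices, 4-connectivity and hypothetical seed positions and interval bounds (none of these values come from the cited works). Placing one seed per fragment labels the fragments directly.

```python
import numpy as np
from collections import deque

def region_growing(image, seeds, low, high):
    """Label the connected regions grown from each manually placed seed.

    A candidate pixel joins the region of its neighbour if its intensity
    lies in [low, high]; growing stops when no candidate pixels remain.
    """
    labels = np.zeros(image.shape, dtype=int)
    for label, seed in enumerate(seeds, start=1):
        queue = deque([seed])
        labels[seed] = label
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if (0 <= nr < image.shape[0] and 0 <= nc < image.shape[1]
                        and labels[nr, nc] == 0
                        and low <= image[nr, nc] <= high):
                    labels[nr, nc] = label
                    queue.append((nr, nc))
    return labels

# Hypothetical slice with two bright fragments and manually placed seeds
slice_ = np.zeros((64, 64))
slice_[10:20, 10:20] = 900
slice_[40:50, 40:50] = 950
fragments = region_growing(slice_, seeds=[(15, 15), (45, 45)], low=300, high=2000)
```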
Region growing based methods are the most widely used for segmenting fractured
bone. A semi-automatic threshold-based method and region growing have been utilized
to extract bone contours from CT scans in [10]. First, thresholding is applied to
obtain the area where bone tissue is located. Then, redundant contours are removed
using an absolute and a relative spatial criterion. To improve the result, smoothing
algorithms are applied and close contours are joined. In [11], the authors use an
interactive method to segment complex humeral bone fractures. In a first step, the
method calculates a sheetness measure in order to extract the cortical layer of the
fragments. Then, a semi-automatic region growing is performed on the obtained 3D
sheetness data. Voxels with a sheetness measure below a threshold are labelled as
belonging to cortical bone fragments. Region growing is performed using a wave
propagation strategy in order to reduce memory consumption and increase computation
speed. Seed points and the sheetness threshold are interactively selected by
the user. The placement of the seeds is used to label the bone fragments, hence this
process is repeated until all the fragments have been labelled. In [9], the authors also
use a sheetness-based method to segment fractured pelvic bones. In order to identify
cortical tissue, a local adaptive thresholding method, based on the sheetness measure
and a weight factor, is utilized. In order to segment trabecular tissue, a region growing
method, based on the previous cortical bone segmentation, is applied using an
adaptive threshold. In [14], the authors present a multi-region segmentation approach
to identify pelvic fractures. The seed points are automatically established by searching
the image for pixels that have an intensity value higher than a threshold. Once a seed
is found, its region is propagated to avoid finding another seed inside it. After that,
a region growing algorithm propagates all regions in turns. In each cycle of propagation,
the grey values of the fronts are set to be equal and are iteratively reduced by the
threshold. To that end, the threshold value is determined in an iterative process.
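The seed-search idea can be emulated compactly with thresholding followed by connected-component labelling. The sketch below is a simplification, not the iterative threshold-reduction scheme of [14]; the image data are hypothetical and the `scipy.ndimage` helpers stand in for a dedicated propagation loop.

```python
import numpy as np
from scipy import ndimage

def auto_seeded_regions(image, threshold):
    """Find one seed per bright region and grow all regions at once.

    Thresholding plus connected-component labelling mimics the
    'find a seed, propagate its region so no second seed falls inside
    it' loop: every component gets exactly one label, and the brightest
    pixel of each component is reported as its representative seed.
    """
    mask = image > threshold
    labels, n_regions = ndimage.label(mask)          # one label per fragment
    seeds = ndimage.maximum_position(image, labels,
                                     index=np.arange(1, n_regions + 1))
    return labels, list(seeds)

# Hypothetical slice: two separated bright fragments
img = np.zeros((64, 64))
img[5:15, 5:15] = 800
img[30:45, 30:45] = 1000
labels, seeds = auto_seeded_regions(img, threshold=400)
```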
Table 2 Summary of the works to identify fractured bone which are described in this paper. The
bone fragments are labelled in all cases
Authors | Requirements | Interaction | Methods | Evaluation set | Achievements
Neubauer et al. (2005) | – | Define the threshold | Thresholding | Ulna, radius and carpus | Semi-automatic bone fragment separation
Pettersson et al. (2006) | Prototypes | Generate the prototype | Morphon non-rigid registration | Hip | Automatic segmentation
Gelaude et al. (2006) | – | Customization | Thresholding and region growing | Pelvis and humerus | Contour adaptation
Harders et al. (2007) | – | Set seed points | Region growing | Humerus | Labelling is performed during segmentation
Fornaro et al. (2010) | – | Set seed points | Adaptive thresholding and region growing | Acetabulum | Automatic detection of incorrect bone fragment separation
Tomazevic et al. (2010) | – | Interactive tools | Thresholding | Articulations | Accurate segmentation
Tassani et al. (2012) | Prototypes | Prototype generation | Global thresholding | Femur and tibia | Fracture zone detection
Lee et al. (2012) | – | Region combination and separation | Region growing | Pelvis | Automatic definition of thresholds and seeds
All the works also require CT scans as input.
The proximity between fragments and the resolution of the medical images can
cause several bone fragments to appear joined after the segmentation procedure.
In that case, these bone fragments must be separated. Current works usually propose
methods not only to identify bone fragments, but also to separate wrongly joined
fragments.
Some proposed methods allow bone fragments to be separated manually. These methods
achieve accuracy at the expense of requiring a lot of user intervention. In [11],
the authors use a manual procedure to separate erroneously connected fragments. To
that end, the user can draw a cut line onto the surface of the bone fragments to define
a set of separation voxels. Then, this set is grown parallel to the screen and extruded
along the viewing vector. After that, the segmentation process is repeated to determine
whether the connection still exists. This manual procedure takes about five minutes.
In [27], the authors present a tool to separate bone fragments in a 3D model. For this
purpose, the user must position seed points on different fracture locations and the
tool calculates the fracture line in between. If there is no fracture line visible, a cut
tool can be used.
Manual tasks take a long time, hence other methods try to split bone fragments
as automatically as possible. A semi-automatic watershed-based method has been
used to separate erroneously joined bone fragments resulting from a threshold-based
segmentation [20]. The proposed method requires the user to select a voxel located
on the boundary between the two fragments. Then, a watershed-based segmentation
algorithm performs the separation. This method achieves good results, but manual
corrections need to be performed in case of inaccuracies. In [9], the authors propose
to apply a 3D connected component algorithm to separate bone fragments in simple
cases. Moreover, the algorithm also makes it possible to reject small fragments and
remove false positive labelled structures. In order to deal with fractures in which the
boundary of the bone is weak, they propose to use graph cuts. For that, seeds have
to be added by the user to each bone fragment. They also introduce an optimized
RANSAC algorithm to detect fracture gap planes and thus to identify incorrect bone
fragment separation. With the aim of refining the segmentation in zones with low
bone density, they use another graph cut based approach. Another proposed solution
consists of performing a re-segmentation [14]. If the proposed multi-region segmentation
fails, the authors provide a manual region combination algorithm that allows the
wrongly-segmented regions to be blended, and a region re-segmentation that enables
the separation of the incompletely-segmented objects. Region combination allows
several fragments to be combined into one interactively. The user needs to select the
fragments one by one and the algorithm combines them into one. The region
re-segmentation consists of applying the multi-region segmentation algorithm to a
specific region defined by the user. The initial threshold is set higher than usual in
order to ensure that the two regions are detected. The target threshold does not change
during the growing process. These two algorithms, region combination and region
re-segmentation, can be executed repeatedly until all the bone fragments are accurately
separated. All the reviewed works to separate wrongly joined bone fragments are
summarized in Table 3.
Table 3 Summary of the works to separate erroneously joined fragments which are described in
this paper
Authors | Requirements | Interaction | Methods | Evaluation set | Achievements
Neubauer et al. (2005) | – | Select a voxel located between the fragments | Watershed | Ulna, radius and carpus | Some cases are resolved by selecting a voxel on the border
Harders et al. (2007) | – | Draw a cut line | Interactive method drawing a line | Humerus | All cases are separated
Fornaro et al. (2010) | – | Set seeds | 3D connected components labelling and graph cuts | Acetabulum | Detect incorrect bone fragment separation automatically
Tomazevic et al. (2010) | – | Set seed points and a cut tool | Interactive method | Articulations | Accurate separation of bone fragments
Lee et al. (2012) | – | Interactive region combination and separation | Region re-segmentation | Pelvis | User only has to specify the region of interest
Sometimes it is useful to identify the fractured area. For instance, the simulation of
a fracture reduction and the virtual analysis of the fracture can require this area to be
calculated beforehand. Therefore, some approaches have been proposed to calculate
the fractured area after the segmentation of bone fragments.
Statistics-based approaches have been proposed to identify fractured zones [29].
In this work, the authors semi-automatically reconstruct highly fragmented bone
fractures. Before performing the fracture reduction, they need to separate intact and
fractured zones of each bone fragment. For that purpose, they propose to use a mixture
model consisting of two Gaussian probability distributions to perform a binary
classification. They choose a threshold that enables the classification of intact-surface
intensities and minimizes the type I classification errors. Thus, this threshold allows
fractured and intact surfaces to be separated. After classifying all points, the fractured
surface is the largest continuous region of fractured surface points. In [33], an extension
of the previous method that improves fragment alignment in highly fragmented
bone fractures has been presented. In order to separate fractured and intact surfaces,
they use a two-class Bayesian classifier based on the intensity values previously
mapped on the surface vertices.
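A minimal sketch of the binary classification step is shown below, fitting a two-component Gaussian mixture to hypothetical vertex intensities; the specific threshold selection minimizing type I errors used in [29] is omitted, and the assumption that the intact (cortical) surface is the brighter component is purely illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical intensities previously mapped onto the surface vertices of a
# fragment: the intact cortical surface is assumed brighter than the
# fractured, mostly trabecular, surface.
rng = np.random.default_rng(0)
intensities = np.concatenate([rng.normal(1100, 80, 500),    # intact surface
                              rng.normal(500, 120, 300)])    # fractured surface

# Fit a mixture of two Gaussians and classify every vertex intensity.
X = intensities.reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)

# Take the component with the larger mean as the intact surface.
intact_component = int(np.argmax(gmm.means_.ravel()))
is_intact = labels == intact_component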
Other proposals take advantage of the specific shape of a particular type of bone.
In [30], the authors present an approach to semi-automatically perform the reduction
of cylindrical bones. In order to identify vertices of the fractured area, they check the
normal orientation of each vertex and compare it with the bone axis. This method
does not work when fracture lines are almost parallel to the bone axis.
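A hedged sketch of this normal-versus-axis test follows; the angular tolerance, the synthetic per-vertex normals and the bone axis are all hypothetical values chosen for illustration, not parameters reported in the cited work.

```python
import numpy as np

def fracture_vertices(normals, bone_axis, angle_deg=30.0):
    """Flag vertices whose normal is nearly parallel to the bone axis.

    On a roughly cylindrical shaft, the normals of the intact surface are
    almost perpendicular to the axis, so normals within `angle_deg` of the
    axis are taken as belonging to the fractured area (illustrative rule).
    """
    axis = bone_axis / np.linalg.norm(bone_axis)
    unit_normals = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos_angle = np.abs(unit_normals @ axis)        # |cos| of angle to the axis
    return cos_angle >= np.cos(np.radians(angle_deg))

# Hypothetical data: per-vertex normals and an axis estimated elsewhere
normals = np.random.default_rng(1).normal(size=(1000, 3))
mask = fracture_vertices(normals, bone_axis=np.array([0.0, 0.0, 1.0]))
```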
Curvature analysis has also been used to identify fractured surfaces [21]. In this
work, the authors present a procedure to virtually reduce proximal femoral fractures.
In order to obtain fracture lines in each slice, they use curvature analysis. For that
purpose, a 3D curvature image is generated. To begin with, 0 or 1 values are assigned
to each voxel depending on the voxel position: 1 is assigned if the voxel is inside
the fragment region and 0 is assigned if it is outside. After that, the surface voxels
are defined as 1-value voxels adjacent to 0-value voxels. The 3D curvature image is
generated by assigning Kabs = |k1| + |k2| to each voxel belonging to the fracture
surface and 0 to the rest of the voxels, where k1 and k2 are the maximum and the
minimum principal curvatures, respectively. They are obtained from the Gaussian
curvature K and the mean curvature H, which are in turn computed from a quadratic
function h(x, y) fitted to 3D points generated from the surface voxels. Once the 3D
curvature image is generated, an interactive line-tracking software tool allows the
fracture zone to be extracted from it.
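Assuming K and H denote the Gaussian curvature k1·k2 and the mean curvature (k1 + k2)/2, the principal curvatures follow from k1,2 = H ± sqrt(H² − K), and the 3D curvature image can be assembled as in the sketch below; all input volumes are hypothetical.

```python
import numpy as np

def kabs_from_curvatures(K, H):
    """Per-voxel K_abs = |k1| + |k2| from Gaussian (K) and mean (H) curvature.

    The principal curvatures are the roots of k^2 - 2Hk + K = 0, i.e.
    k_{1,2} = H +/- sqrt(H^2 - K); the square-root argument is clipped at
    zero to absorb numerical noise in the fitted curvatures.
    """
    disc = np.sqrt(np.clip(H**2 - K, 0.0, None))
    k1, k2 = H + disc, H - disc
    return np.abs(k1) + np.abs(k2)

def curvature_image(surface_mask, K, H):
    """3D curvature image: K_abs on surface voxels, 0 elsewhere."""
    return np.where(surface_mask, kabs_from_curvatures(K, H), 0.0)

# Hypothetical inputs: a boolean surface mask and fitted curvature volumes
shape = (32, 32, 32)
surface_mask = np.zeros(shape, dtype=bool)
surface_mask[16] = True
K = np.random.default_rng(2).normal(0.0, 0.01, shape)
H = np.random.default_rng(3).normal(0.0, 0.1, shape)
curv_img = curvature_image(surface_mask, K, H)
```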
In [26], the authors perform a comparison with healthy models in order to identify
trabecular tissue in fractured zones. To that end, they compare the fractured region
of interest in both pre-failure and post-failure slices. These regions are identified as
disconnected trabecular tissue in the slice. If the regions of interest of both slices
overlap less than a predefined threshold, the region is classified as broken. The threshold
is determined by minimizing the root mean square error (RMSE) between the calculated
values and the values obtained manually,

RMSE = √( Σ_i (a_i(x) − v_i)² / n ),    (3)

where a_i(x) and v_i are the calculated and the visually obtained values, respectively,
and n is the number of analysed cases. Finally, they apply a median filter to remove
the generated noise.
Interactive methods have also been proposed to identify fracture surfaces to be used
in virtual craniofacial reconstruction [4–6]. In these works, fracture contours are
extracted interactively from segmented bone fragments. With that aim, the user has to
select points belonging to the fractured area and then a contour tracing algorithm
generates the rest of the points. Once the fracture contours are calculated, the 3D
surface is generated by collating the contours extracted from each slice.
Table 4 summarizes all the analysed works to detect fracture zones.
Table 4 Summary of the works to identify fracture zones which are described in this paper
Authors | Requirements | Interaction | Methods | Evaluation set | Achievements
Winkelbach et al. (2003) | Cylindrical bones | – | Comparison of normal vectors | Femur | Automatic identification in cylindrical bones
Willis et al. (2007), Zhou et al. (2009) | – | Set threshold and subdivide fractured zones | Gaussian mixture models and Bayesian classifiers | Tibia | Identification of fracture zones in comminuted fractures
Bhandarkar et al. (2007), Chowdhury et al. (2009) | – | Select points belonging to the fractured zone | Contour tracing algorithms | Mandible | User only has to select the end points of the fracture contour in each slice
Okada et al. (2009) | – | Extract fracture lines | Curvature analysis | Femur | The 3D curvature image eases the interaction
Tassani et al. (2012) | A healthy model | Visually check values to set the threshold | Comparison with healthy models | Femur and tibia | Interaction is only required to define the threshold
4 Discussion
The previous review allows us to make a classification of the methods used to identify
both healthy and fractured bone (Fig. 6). In order to identify fractured bones, it is
necessary not only to segment, but also to label the bone fragments. Considering the
previous review, threshold-based methods have been used in most cases. Currently
proposed threshold-based methods obtain good results, but they can be improved in
some aspects. The selection of threshold intensity values is one of the most challenging
procedures. Threshold values are difficult to determine even manually, and each slice
may require a different threshold value. In addition, it is particularly difficult to set
the threshold to segment bone tissue near the joints. Ideally, the threshold values would
be selected automatically from the information available in the set of slices in all
cases. Because of the complexity of the fractures, it is difficult to label bone fragments
automatically. This procedure may require expert knowledge, but it must be reduced
as much as possible. Thresholding-based approaches do not label bone fragments,
hence fragments have to be labelled after the segmentation process. Other approaches
try to solve this by using seed-based methods: by placing the seeds, the user identifies
the bone fragments. Thus, seeds should be placed by an expert in some cases. Ideally,
all the bone fragments should be segmented automatically and simple bone fragments
should be identified without user intervention. Then, the expert could decide the bone
to which each fragment belongs in the most complex cases.
Fig. 6 Schema representing the different approaches currently proposed to identify both healthy
and fractured bone
Due to the fracture, two different fragments can be completely joined. This is especially
common in fractures caused by crashes. In addition, the image resolution can
cause very close fragments to appear joined. These joined fragments are difficult
to separate during the segmentation process, hence current fractured bone identification
approaches propose to separate them after the segmentation. New methods
that solve this problem in a more automatic way are required. One solution would be
to improve the segmentation method so that no joined fragments are generated. This
would be the fastest solution, because no additional methods are required. However,
the usual resolution of CT scans makes it very difficult. The alternative is to implement
a method that automatically separates wrongly joined fragments resulting from the
segmentation. Manual and semi-automatic fragment separation takes a lot of time,
hence these new methods would enable important time savings. On the other hand,
the use of higher resolution images, such as μCT, could prevent fragments from
appearing joined in most cases. Nevertheless, this type of image is not always
available.
Once all the bone fragments have been identified, some applications, such as
fracture reduction or fracture analysis, require the detection of fracture zones. Different
interactive methods have been proposed to delimit the fracture area. Some of these
methods propose to calculate fracture lines in each slice and then join them to generate
the fracture area. Following this approach, it is easier to detect and fix anomalies
in each slice. In contrast, this type of method usually requires more time since
fracture line detection is performed in each slice and user interaction is needed. Other
methods use 3D interactive techniques to identify the fracture zone. These methods
are usually faster but the interaction is usually much more complex. Methods based
on prior knowledge have also been proposed to identify the fracture zone. These
methods are usually faster but are restricted to specific bones and fracture types. In
summary, currently proposed methods to detect fracture zones are based on prior
knowledge or need user interaction (Fig. 6). Therefore, new methods that calculate
fracture zones using the information available in the slices would be useful. In addition,
these new methods should be as automatic as possible.
All these shortcomings are summarized in the following points:
• Separate wrongly joined bone fragments after or during the segmentation process
without user intervention.
• Select the threshold for each slice automatically from the information available
in the CT stack.
• Label the bone fragments with minimal user interaction.
• Detect fracture zones using information from the CT stack as automatically as
possible.
5 Conclusion
In this paper, the main issues to be considered when identifying both healthy and
fractured bone tissues have been described. Moreover, currently proposed methods
for healthy and fractured bone identification have been discussed and classified. This
review has shown that most of the methods applied to the segmentation of healthy
bone cannot be utilized to identify fractured bone. Moreover, it has made it possible to
determine which algorithms have been applied in order to identify each type of bone
and fracture, as well as the results obtained. In the case of the identification of fractured
bones, emphasis has also been placed on the proposed methods to label bone fragments,
separate fragments that have been segmented together incorrectly and detect fracture
zones. Finally, the shortcomings of the currently available methods have been reviewed
and identified.
Acknowledgements This work has been partially supported by the Ministerio de Economía y
Competitividad and the European Union (via ERDF funds) through the research project TIN2011-
25259.
References
1. Allili MS, Ziou D (2007) Automatic colour–texture image segmentation using active contours.
Int J Comput Math 84(9):1325–1338
2. Aslan MS, Ali A, Rara H, Farag AA (2010) An automated vertebra identification and seg-
mentation in CT images. In: 2010 IEEE International conference on image processing, IEEE,
233–236
3. Battiato S, Farinella GM, Impoco G, Garretto O, Privitera C (2007) Cortical bone classifica-
tion by local context analysis. In: Gagalowicz A, Philips W (eds) Computer vision/Computer
graphics collaboration techniques, vol. 4418. Springer, Berlin pp 567–578
4. Bhandarkar SM, Chowdhury AS, Tang Y, Yu JC, Tollner EW (2007) Computer vision guided
virtual craniofacial reconstruction. Comput Med Imaging Graph J Comput Med Imaging Soc
31(6):418–427
5. Chowdhury AS, Bhandarkar SM, Robinson RW, Yu JC (2009) Virtual craniofacial reconstruc-
tion using computer vision, graph theory and geometric constraints. Pattern Recognit Lett
30(10):931–938
6. Chowdhury AS, Bhandarkar SM, Robinson RW, Yu JC (2009) Virtual multi-fracture craniofa-
cial reconstruction using computer vision and graph matching. Comput Med Imaging Graph J
Comput Med Imaging Soc 33(5):333–342
7. Egol K, Koval KJ, Zuckerman JD (2010) Handbook of fractures. Lippincott Williams & Wilkins
(LWW), Philadelphia
8. Fan J, Zeng G, Body M, Hacid MS (2005) Seeded region growing: an extensive and comparative
study. Pattern Recognit Lett 26(8):1139–1156
9. Fornaro J, Székely G, Harders M (2010) Semi-automatic segmentation of fractured pelvic bones
for surgical planning. In: Bello F, Cotin S (eds) Biomedical simulation, vol. 5958. Springer,
Berlin pp 82–89
10. Gelaude F, Vander Sloten J, Lauwers B (2006) Semi-automated segmentation and visualisation
of outer bone cortex from medical images. Comput Meth Biomech Biomed Eng 9(1):65–77
11. Harders M, Barlit A, Gerber C, Hodler J, Székely G (2007) An optimized surgical planning
environment for complex proximal humerus fractures. In: MICCAI Workshop on interaction
in medical image analysis and visualization
12. Janc K, Tarasiuk J, Bonnet AS, Lipinski P (2011) Semi-automated algorithm for cortical and
trabecular bone separation from CT scans. Comput Meth Biomech Biomed Eng 14(1):217–218
13. Knutsson H, Andersson M (2005) Morphons: segmentation using elastic canvas and paint on
priors. In: IEEE International conference on image processing 2005, IEEE, II–1226
14. Lee PY, Lai JY, Hu YS, Huang CY, Tsai YC, Ueng WD (2012) Virtual 3D planning of pelvic
fracture reduction and implant placement. Biomed Eng Appl Basis Commun 24(3):245–262
15. Lim PH, Bagci U, Bai L (2013) Introducing Willmore flow into level set segmentation of spinal
vertebrae. IEEE Transac Biomed Eng 60(1):115–122
16. Malan DF, Botha CP, Valstar ER (2013) Voxel classification and graph cuts for automated
segmentation of pathological periprosthetic hip anatomy. Int J Comput Assist Radiol Surg
8(1):63–74
17. Mastmeyer A, Engelke K, Fuchs C, Kalender WA (2006) A hierarchical 3D segmentation
method and the definition of vertebral body coordinate systems for QCT of the lumbar spine.
Med Image Anal 10(4):560–577
18. Moreno S, Caicedo SL, Strulovic T, Briceño JC, Briceño F, Gómez S, Hernández M (2010)
Inferior maxillary bone tissue classification in 3D CT images. In: Bolc L, Tadeusiewicz R,
Chmielewski LJ, Wojciechowski K (eds) Computer vision and graphics, vol. 6375. Springer,
Berlin, pp 142–149
19. Nassef TM, Solouma NH, Alkhodary M, Marei MK, Kadah YM (2011) Extraction of hu-
man mandible bones from multi-slice computed tomographic data. In: 2011 1st Middle East
conference on biomedical engineering, IEEE, 260–263
20. Neubauer A, Bühler K, Wegenkittl R, Rauchberger A, Rieger M (2005) Advanced virtual
corrective osteotomy. Int Congr Ser 1281:684–689
21. Okada T, Iwasaki Y, Koyama T, Sugano N, Chen Y, Yonenobu K, Sato Y (2009) Computer-
assisted preoperative planning for reduction of proximal femoral fracture using 3-D-CT data.
IEEE Transac Biomed Eng 56(3):749–759
22. Pettersson J, Knutsson H, Borga M (2006) Non-rigid registration for automatic fracture
segmentation. In: IEEE International conference on image processing, 1185–1188
23. Ramme AJ, DeVries N, Kallemyn NA, Magnotta VA, Grosland NM (2009) Semi-automated
phalanx bone segmentation using the expectation maximization algorithm. J Digital Imaging
22(5):483–491
24. Sahoo P, Soltani S, Wong A (1988) A survey of thresholding techniques. Comput Vision Graph
Image Process 41(2):233–260
25. Sebastian TB, Tek H, Crisco JJ, Kimia BB (2003) Segmentation of carpal bones from CT
images using skeletally coupled deformable models. Med Image Anal 7(1):21–45
26. Tassani S, Matsopoulos GK, Baruffaldi F (2012) 3D identification of trabecular bone frac-
ture zone using an automatic image registration scheme: A validation study. J Biomech
45(11):2035–2040
27. Tomazevic M, Kreuh D, Kristan A, Puketa V, Cimerman M (2010) Preoperative planning
program tool in treatment of articular fractures: process of segmentation procedure. In: XII
Mediterranean conference on medical and biological engineering and computing 2010, 29,
430–433
28. Truc PTH, Kim TS, Lee S, Lee YK (2011) Homogeneity and density distance-driven
activecontours for medical image segmentation. Comput Biol Med 41(5):292–301
29. Willis A, Anderson D, Thomas T, Brown T, Marsh JL (2007) 3D reconstruction of highly
fragmented bone fractures. Medical Imaging 2007: Image processing. Proceedings of the
SPIE, 6512
30. Winkelbach S, Westphal R, Goesling T (2003) Pose estimation of cylindrical fragments for
semi-automatic bone fracture reduction. In: Pattern recognition, Springer, Berlin, pp 566–573
31. Zhang J, Yan CH, Chui CK, Ong SH (2010) Fast segmentation of bone in CT images using 3D
adaptive thresholding. Comput Biol Med 40(2):231–236
32. Zhao K, Kang B, Kang Y, Zhao H (2010) Auto-threshold bone segmentation based on CT
image and its application on CTA bone-subtraction. In: 2010 Symposium on photonics and
optoelectronics, 1–5
33. Zhou B, Willis A, Sui Y, Anderson D, Thomas T, Brown T (2009) Improving inter-fragmentary
alignment for virtual 3D reconstruction of highly fragmented bone fractures. SPIE Medical
Imaging, 7259
On Evolutionary Integral Models for Image
Restoration
Abstract This paper analyzes evolutionary integral based methods for image restora-
tion. They are multiscale linear models where the restored image evolves according
to a Volterra equation, and the diffusion is handled by a convolution kernel. Well-
posedness, scale-space properties, and long term behaviour are investigated for the
continuous and semi-discrete models. Some numerical experiments are included.
They provide different rules to select the kernel, and illustrate the performance of
the evolutionary integral model in image denoising and contour detection.
1 Introduction
The restored image u(t, x) evolves from the original image according to a partial
differential equation of the form

u_t(t, x) = F(x, u, ∇u, ∇²u), (t, x) ∈ (0, T] × Ω, u(0, x) = u_0(x), x ∈ Ω, (1)
∂u/∂n (t, x) = 0, (t, x) ∈ (0, T] × ∂Ω, (2)

where ∇u and ∇²u are, respectively, the gradient and the Hessian matrix of u with
respect to the space variable x = (x, y); Ω ⊂ R² is a bounded domain (typically a
square) with boundary ∂Ω; (2) is a homogeneous Neumann boundary condition;
u_0 stands for the image to be restored; and F is a second-order differential operator
[1, 2, 27].
least, three aspects of the model: the well-posedness of the continuous problem and
discretizations (a way of controlling the stability of the process); the satisfaction of
scale-space properties (as a way to have architectural properties of the multiscale
analysis, to ensure that the evolved image is a regularized version of the original one
or the preservation of important features of the image); finally, the control of this
smoothing and also the edge-enhancing in the multiscale process.
In this sense, although the classical Gaussian filtering, with F (x, u, ∇u, ∇ 2 u) =
div (∇u) in (1), is well-posed, provides stable discretizations and satisfies several
scale-space properties, sometimes it is not efficient in the control of the diffusion,
mainly because of the oversmoothing effect. In order to overcome this and other
drawbacks, the literature stresses two main ideas: a nonlinear control of the diffusion,
and the inclusion of anisotropy to make this control local and capable of discriminating
discontinuities and edges. Several proposals in this sense can be seen in, e.g. [1–3,
14, 22, 32, 33], and references therein.
More recent is the use of evolutionary integral equations of the form [25]

u(t, x) = u_0(x) + ∫_0^t k(t − s) L u(s, x) ds, (t, x) ∈ [0, T] × Ω, (3)
∂u/∂n (t, x) = 0, (t, x) ∈ ∂Ω × [0, T],
as models for the multiscale analysis. In (3), L = Δ stands for the Laplace operator,
and k(t) is a convolution kernel. The case k(t) = 1 leads to the heat equation, and
k(t) = t to the wave equation with zero initial velocity. (A more general context
can be seen e.g. in [9, 18, 23]). If k(t) is differentiable, and k(0) = 0, then (3) is
equivalent to the integro-differential problem

u_t(t, x) = ∫_0^t k'(t − s) L u(s, x) ds, (t, x) ∈ [0, T] × Ω, (4)
u(0, x) = u_0(x), x ∈ Ω,
∂u/∂n (t, x) = 0, (t, x) ∈ [0, T] × ∂Ω.
In [8, 11] a control of the diffusion based on (4) with

k(t) = t^(α−1) / Γ(α), t ≥ 0, (5)

has been proposed, where α ∈ (1, 2) and Γ is the Gamma function. The model (4)–(5)
interpolates between the linear heat equation (α = 1) and the linear wave equation
(α = 2), so that α takes the role of a viscosity parameter, a term to control the diffusion of the
image through the scales [23]. It is also natural to try to handle the diffusion through
a selection of α as a function of the image at each scale. In [11] a numerical technique
is introduced, consisting of discretizing (4)–(5) with a possibly different value of α for
each pixel of the image. This procedure is modified in [10] to allow a nonconstant
viscosity parameter to be considered. This approach forms part of the growing interest
in the use of fractional calculus for signal processing problems; see [21] for a review
of fractional linear systems and also [12, 31], along with references therein.
The purpose of this paper is to go more deeply into the evolutionary integral
modelling for image restoration, generalizing [8, 11] in several ways, with the
following novelties:
• Under several non-restrictive hypotheses on the kernel k, the continuous model
(3) is proved to satisfy scale-space properties (grey-level shift invariance, reverse-
contrast invariance, translational invariance, and conservation of average value).
Furthermore, the solution is shown to behave as the constant average value for
long times. (Although the application of the evolutionary model (1) to image
restoration does not usually require long times of computation, a good behaviour
in this sense should always be taken into account).
• The semi-discrete (in space) version of (3) is also studied. Under some hy-
potheses on the discrete spatial operator, it is proved that the corresponding
semi-discrete model also possesses several scale-space properties (grey-level shift
invariance, reverse-contrast invariance, and conservation of a semi-discrete av-
erage value) as well as the constant behaviour as limit for long times. When the
semi-discrete model is considered as an approximation to the continuous one (3),
these properties enforce the relation between them.
• From the computational point of view, the freedom to choose the kernel k is
strongly emphasized, since it can be used to control several features of the image:
restoration, noise removal, or edge detection. Such properties are illustrated here
by means of some examples with medical images.
According to these new results, the structure of the paper is as follows. Section 2 is
devoted to the analysis of the above mentioned properties of the continuous model
(3). These properties are proved for the Laplace operator, although the way to
extend them to more general spatial operators [13] is also described. The study of the
semi-discrete (in space) version is carried out in Sect. 3. Finally, Sect. 4 illustrates
the performance of the model with numerical examples. Some details about the
implementation are explained and the corresponding codes are applied to several
images by using different choices of the kernel. Sect. 5 contains some conclusions
and future lines of research.
With the purpose of investigating the degree of adaptation of the evolutionary integral
approach to the image processing rules, some properties of the continuous model (3)
are derived here. Throughout, a set of non-restrictive hypotheses (H1)–(H4) on the
kernel function k is assumed.
2.1 Well-Posedness
A first point of analysis concerns the well-posedness of the problem, which is usually
a nontrivial question for some nonlinear models in image processing [1, 22, 33].
In this case, under the hypotheses (H1)–(H4), results about existence, uniqueness,
and regularity of solutions are obtained directly from the general theory of Volterra
equations. Let S(t) be the resolvent of (3), that is, the transitional operator such that

u(t, x) = S(t)u_0(x) (6)

is the solution of (3) at x and time t, with initial condition u_0. It can be proved, see
e.g. [25], Theorem 3.1, that u in (6) is C¹, and that there is M ≥ 1 such that
‖S(t)‖ ≤ M, for t ≥ 0.
Proof Hereafter, for the sake of simplicity of notation, u_0(x) will be denoted
by u_0. Properties (P1)–(P3) are a consequence of the uniqueness of the solution. It is
clear that S(t)u_0 = 0 if u_0 = 0. On the other hand, if C is a constant, the functions
u_1(t) = S(t)(u_0 + C) and u_2(t) = S(t)(u_0) + C are both solutions of (3) with initial
condition u_0 + C; thus, uniqueness proves (P1). The same argument proves (P2). As
far as (P3) is concerned, note that

τ_h u_0(x) + ∫_0^t k(t − s) Δ τ_h (S(s)u_0) ds = τ_h u_0(x) + ∫_0^t k(t − s) τ_h Δ (S(s)u_0) ds = τ_h (S(t)u_0).
Thus, τh (S(t)u0 ) satisfies (3) with initial condition τh u0 , and therefore coincides
with S(t)(τ_h u_0). Finally, observe that if

I(t) = ∫_Ω u(t, x) dx,
then the regularity of the solution implies that I (t) is continuous, for t ≥ 0,
differentiable, for t > 0, and
d/dt I(t) = ∫_Ω u_t(t, x) dx = ∫_Ω ( ∫_0^t k'(t − s) Δu(s, x) ds ) dx = ∫_0^t k'(t − s) ( ∫_Ω Δu(s, x) dx ) ds.

Now, the divergence theorem and the boundary conditions imply that d/dt I(t) = 0,
and therefore I is constant, for all t ≥ 0. □
The long-term behaviour of the solution can be studied through the Neumann
eigenvalue problem

ΔV(x) = λV(x), x ∈ Ω,
∂V/∂n (x) = 0, x ∈ ∂Ω,

that has the eigenvalues λ_{l,m} = −(lπ)² − (mπ)², with a complete, orthogonal system
of eigenfunctions V_{l,m}(x) = cos(lπx) cos(mπy), for l, m ∈ N ∪ {0}. In particular,
V_{0,0}(x) = 1, λ_{0,0} = 0. Using the orthogonality, the expansion of the initial
condition is of the form

u_0(x) = Σ_{l,m} γ_{l,m} V_{l,m}(x),   γ_{l,m} = ∫_Ω u_0(x) V_{l,m}(x) dx / ∫_Ω V_{l,m}(x)² dx,
where in particular γ_{0,0} = (1/A(Ω)) ∫_Ω u_0(x) dx is the average grey value. Writing
the solution as

u(t, x) = Σ_{l,m} T_{l,m}(t) V_{l,m}(x), (7)

the time-dependent components of the expansion (7) satisfy the integro-differential
problems

T'_{l,m}(t) = λ_{l,m} ∫_0^t k'(t − s) T_{l,m}(s) ds, t > 0, l, m ∈ N ∪ {0}, (8)
T_{l,m}(0) = γ_{l,m}, T'_{l,m}(0) = 0, l, m ∈ N ∪ {0}.
This leads to

L(T_{l,m})(z) = γ_{l,m} / ( z (1 − λ_{l,m} L(k)(z)) ).
Therefore, for λ_{l,m} < 0, and under the hypotheses (H1)–(H4), T_{l,m}(t) → 0 as
t → +∞, and the solution u behaves like

T_{0,0}(t)V_{0,0}(x) = γ_{0,0} = (1/A(Ω)) ∫_Ω u_0(x) dx.
Remark 1 Extension to more general spatial operators. The previous analysis has
been performed by using the Laplacian as spatial operator. The model (3) can be
generalized by considering an unbounded, closed, densely defined linear operator L
with domain D(L) on some Hilbert space H (Ω) of functions defined on Ω. Some
of the previous properties also hold in this general case. More explicitly, we assume
the following:
(h1) L is negative, in the sense that ⟨Lu, u⟩ ≤ 0, where ⟨·, ·⟩ stands for the inner
product in H(Ω), and u ∈ D(L).
(h2) L is self-adjoint under Neumann boundary conditions.
(Pλ u)(0) = Pλ u_0.

As before, the use of the Laplace transform implies that (Pλ u)(t) → 0 as t → +∞
for any λ < 0, and therefore

u(t, x) → P_0 u_0 = (1/A(Ω)) ∫_Ω u_0(x) dx,  t → +∞.
This generalization can be applied, for example, to operators consisting of some
fractional powers of the Laplacian, [13].
A semi-discrete (in space) version of (3) will now be introduced and analyzed.
Consider a uniform M × M pixel mesh of Ω with mesh length h > 0, and let
L_h : R^N → R^N be a discrete operator on R^N, N = M², satisfying the following
requirements:
(R1) L_h is symmetric and negative semi-definite.
(R2) λ = 0 is a simple eigenvalue, with the corresponding eigenspace generated by
e = (1, 1, . . ., 1)^T ∈ R^N.
Then the semi-discrete evolutionary integral model has the form

u_h(t) = u_{0,h} + ∫_0^t k(t − s) L_h u_h(s) ds, 0 ≤ t ≤ T, (9)
where k(t) satisfies (H1)–(H4), uh (t) stands for the N × 1 vector-image function at
time t at the pixel mesh, u0,h is the initial data at the grid points. For convenience,
uh will be sometimes considered as a matrix, in such a way that (uh (t))i,j will
stand for the component associated to the pixel at the position (i, j ) of the mesh,
i, j = 1, . . ., M. Neumann boundary conditions are somehow included in Lh , see
Remark 4 below.
The same theory as in the continuous case guarantees well-posedness of (9), under
(H1)–(H4), [25]. On the other hand, some scale-space properties and the long-term
behaviour are proved in the following result.
Theorem 2 Let Sh (t) be the solution operator (10) associated to (9), where Lh
satisfies (R1), (R2) above. Then the following properties hold:
(Q1) Grey level shift invariance: Sh (t)(0) = 0, Sh (t)(u0 + C) = Sh (t)u0 + C, for
any constant C.
(Q2) Reverse contrast invariance: for t ≥ 0, Sh (t)(− u0 ) = −Sh (t)u0 .
(Q3) Conservation of average value: if t > 0 and (u_{0,h})_{i,j} is the value of u_{0,h}
at the (i, j)th pixel,

(1/N) Σ_{i,j=1}^{M} (u_{0,h})_{i,j} = (1/N) Σ_{i,j=1}^{M} (S_h(t)u_{0,h})_{i,j}.

For the proof of (Q3), consider

I(t) = (1/N) Σ_{i,j=1}^{M} (u_h(t))_{i,j},
where ⟨·, ·⟩ denotes the Euclidean inner product in R^N. Therefore, I(t) is constant. Finally,
for the proof of (Q4), note that, according to (R1) and (R2), Lh is diagonalizable,
and the eigenvalues λ1 , . . ., λN are non-positive, with λ1 = 0 simple. We can also
write
L_h = P D_h P^{−1},
where Dh = diag(λ1 , . . . , λN ) stands for the diagonal matrix whose diagonal entries
are {λ1 , . . . , λN }, and P is orthogonal with the first column given by (1/M)e. (In
the representation of the spectrum, the λj are repeated according to the geometric
multiplicity of the eigenvalues.) By using (H1)–(H4), (9) is equivalent to
u_h'(t) = ∫_0^t k'(t − s) L_h u_h(s) ds,  t ∈ [0, T],  u_h(0) = u_{0,h},

which, after the change of variables v_h(t) = P^T u_h(t), decouples into the scalar
problems

v_{h,j}'(t) = λ_j ∫_0^t k'(t − s) v_{h,j}(s) ds,  t ∈ [0, T],  v_{h,j}(0) = (P^T u_{0,h})_j,
where vh = (vh,1 , vh,2 , . . . , vh,N )T . Now as in the proof of Theorem 1 Property (P4),
we have that vh,j (t) → 0 as t → +∞, for j = 2, . . . , N , and
v_{h,1}(t) = v_{h,1}(0) = (P^T u_{0,h})_1 = (1/M) Σ_{i,j=1}^{M} (u_{0,h})_{i,j}.

Therefore, since the first column of P is (1/M)e,

lim_{t→+∞} u_h(t) = lim_{t→+∞} P v_h(t) = ( (1/N) Σ_{i,j=1}^{M} (u_{0,h})_{i,j} ) e. □
Remark 2 Since Eq. (9) can be written as
u_h(t) = u_{0,h} + L_h ∫_0^t k(t − s) u_h(s) ds,  0 ≤ t ≤ T,
4 Numerical Experiments
Before illustrating the behaviour of the model (3) with numerical examples, some
previous comments on the implementation are required. From the semi-discrete
version (13) (where a certain degree of approximation of Lh to L is assumed), the
practical implementation is carried out throughout a suitable time discretization of
(13) and in this sense a large variety of numerical methods is available. (A detailed
analysis is out of the scope of the paper, and we refer the reader to a future work.) In
order to obtain stable numerical approximations, one of the key points is the choice
of a suitable treatment of the convolution integral. The general formulation for the
advance in time will then be of the form

u^n = u^0 + Σ_{j=0}^{n} Q_{n−j} u^j,  n ≥ 1. (15)
In (15), for a given time step τ > 0, u^n stands for the approximation to the solution
of (13) at time level t_n = nτ, n ≥ 0. Thus, for a given initial image u^0, the system of
difference equations is implemented up to the final time T. The weights Q_n, n ≥ 1,
depend on the matrix of kernels D(t) and the discrete operator L_h in (13), in a
way given by the chosen formula to treat the convolution integral. For the numerical
experiments below, the operator (14) is considered, and the time discretization makes
use of convolution quadratures, which are based on the backward Euler method (see
[6, 19, 20] for details).
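For a constant fractional kernel (5) and a simple 1D Neumann Laplacian standing in for the discrete spatial operator, a backward-Euler convolution quadrature version of the scheme (15) can be sketched as follows. This is only an illustration of the time stepping under those stated assumptions, not the pixel-wise 2D implementation used in the experiments; all sizes and parameter values are hypothetical.

```python
import numpy as np

def cq_weights(alpha, tau, n_steps):
    """Backward-Euler convolution quadrature weights for k(t)=t^(a-1)/Gamma(a).

    They are the Taylor coefficients of tau^a * (1 - zeta)^(-a), generated by
    the recursion w_0 = tau^a, w_j = w_{j-1} * (j - 1 + alpha) / j.
    """
    w = np.empty(n_steps + 1)
    w[0] = tau ** alpha
    for j in range(1, n_steps + 1):
        w[j] = w[j - 1] * (j - 1 + alpha) / j
    return w

def neumann_laplacian(m):
    """1D finite-difference Laplacian with homogeneous Neumann conditions."""
    L = (np.diag(-2.0 * np.ones(m)) + np.diag(np.ones(m - 1), 1)
         + np.diag(np.ones(m - 1), -1))
    L[0, 0] = L[-1, -1] = -1.0        # reflecting (Neumann) ends
    return L

def evolve(u0, alpha, tau, n_steps):
    """u^n = u^0 + sum_j w_{n-j} L u^j, solved implicitly for u^n at each step."""
    L = neumann_laplacian(u0.size)
    w = cq_weights(alpha, tau, n_steps)
    U = [u0.copy()]
    A = np.eye(u0.size) - w[0] * L
    for n in range(1, n_steps + 1):
        rhs = u0 + sum(w[n - j] * (L @ U[j]) for j in range(n))
        U.append(np.linalg.solve(A, rhs))
    return np.array(U)

# Noisy 1D "signal" smoothed with alpha = 1.5 (between heat and wave behaviour)
u0 = np.sin(np.linspace(0, np.pi, 64)) + 0.1 * np.random.default_rng(4).normal(size=64)
history = evolve(u0, alpha=1.5, tau=0.05, n_steps=20)
```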
A second point of relevance for the implementation is the treatment of the operator
Lh D(t) in (13). In this sense, we observe that the time discretization (15) only requires
the values of L_h D(t) at times t_n = nτ, n = 0, . . ., N. Then, each diagonal component
k_n(t) can be computed via the stair function

k̃_n(t) = k_n^0 + Σ_{j≥1} (k_n^j − k_n^{j−1}) U(t − t_j),  1 ≤ n ≤ N, (16)

where U : [0, T] → R is the Heaviside function, and k_n^j := k_n(t_j), j ≥ 0. The
introduction of (16) is justified by the requirement, present in some discretizations
(in particular, the one considered here), of making use of the Laplace transform L(k̃_n),
which is not guaranteed to be available for an arbitrary choice of functions k_n. In fact,

L(k̃_n)(z) = k_n^0 / z + Σ_{j=1}^{N} (k_n^j − k_n^{j−1}) e^{−z t_j} / z.
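The transform above is straightforward to evaluate from the sampled kernel values; a small sketch is given below, with a hypothetical sample grid and a decaying kernel chosen only for illustration.

```python
import numpy as np

def stair_kernel_laplace(k_values, t_values, z):
    """Laplace transform of the stair function built from samples k(t_j).

    Implements L(k~)(z) = k^0/z + sum_j (k^j - k^{j-1}) * exp(-z t_j) / z,
    where k_values[j] = k(t_j).
    """
    k = np.asarray(k_values, dtype=float)
    t = np.asarray(t_values, dtype=float)
    jumps = np.diff(k)                       # k^j - k^{j-1}, j >= 1
    return k[0] / z + np.sum(jumps * np.exp(-z * t[1:])) / z

# Samples of a kernel that decays towards zero (pixel to be preserved)
t_j = np.linspace(0.0, 2.0, 11)
k_j = np.exp(-3.0 * t_j)
value = stair_kernel_laplace(k_j, t_j, z=1.5)
```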
Considered here are two types of experiments. The first one has an illustrative pur-
pose: different choices of the kernel generate restored images un from simple, small,
and synthetic initial images u0 by using (15), and several features, mainly related
to denoising and edge detection, are observed. The second group of experiments
consider zoomed parts of a human leg and an aneurysm CT scan (Fig. 1), with the aim
of showing the previous effects, and calibrating the computational effort on more
sophisticated and of larger size images.
Two kinds of convolution kernels (or, rather, their discrete versions) are used. The
corresponding numerical experiments are explained below.
1. The first choice is an improved version of fractional-type kernels (5), analyzed
and implemented in [8, 11]. Note that the kernel function in (5) can be selected to
control the smoothing effect on the image via the parameter α, which is chosen,
at each pixel, for this task. This idea can now be generalized by considering
more adaptable kernels. Two natural choices can be the following, for each pixel
n = 1, . . ., N (that is, each diagonal entry of D in (13)):
(a) a time-dependent generalization of (5),

k_n(t) := t^(α_n(t)−1) / Γ(α_n(t)),  t ≥ 0,  1 ≤ n ≤ N, (17)

(b) a kernel defined through its Laplace transform,

K_n(α, z) = 1 / z^(z L(α_n)(z)),  1 ≤ n ≤ N, (18)

where L(α_n)(z) stands for the Laplace transform of some α_n(t) [10, 26].
Note that these kernels would correspond to fractional-type models, but where the
order may vary with time; in this sense, they can be considered as an extension of
those treated in [8, 11], since in the case where α_n(t) = α_n is constant, the two kernels
(17) and (18) coincide with (5). In computational terms, (a) is less suitable since
its Laplace transform cannot in general be computed explicitly.
The role of the kernels is illustrated by taking (18) with different choices of the
viscosity parameters α_n(t). For instance, assume that the restoration requires some
preservation of edges and vertices and, at the same time, the removal of isolated spots
(noisy pixels), and let α_n(m) be the value of the viscosity parameter at the n-th pixel
(in the vector representation) and at time step t_m. Hence, grosso modo, the automatic
selection of the values α_n(m) must satisfy the following criteria: pixels where the
gradient is large (vertices, edges, etc.) should be associated with values of the
corresponding α_n close to two (low diffusion), and pixels with lower gradients (flat
areas) should be associated with values of α_n close to one (high diffusion). In this
context, for example, spots (isolated noisy pixels, i.e. pixels with very high gradient
variation) should be associated with values of α_n close to one, while very flat areas
should correspond to values of α_n close to two.
The influence of the parameters α_n(m) is shown by the following simple example.
We try to remove the white spot in Fig. 2 by processing an initial 6 × 6 image
with three kernels of the form (18): the first two are computed by using profiles 1
and 2 for the viscosity parameter, given in Fig. 3. The third method
considers an initial selection which remains constant during the whole evolution.
The evolution of the image with these three kernels is illustrated in Figs. 4, 5, and
6, respectively.
Their comparison suggests that Profile 2 gives better results: the edge has not been
damaged, and the noisy pixel is well removed, that is, spurious perturbations in
the neighboring pixels are not present, contrary to what is happening in the case
of the two other kernels.
The nonlocal method (15) associated to Profile 2 has been applied to zoomed
parts of Fig. 1a, b. The corresponding evolution is illustrated, at different times,
in Figs. 7 and 8, respectively. Observe that the noise (meaningless information
in this case) has been removed and, additionally, the main structures (including
contours and edges) are well preserved. This is useful, for example, if a posterior
contour detection, or a meaningful localization of structures, is required.
The effect of longer time integration on the image is illustrated in Fig. 9, where a
zoomed subimage of Fig. 1b is taken as initial condition for (15) and evolved up
to two different final times T. The results suggest a good behaviour for moderately
long times, since the smoothing effect (which removes the meaningless information)
degrades the preservation of the edges less than might be expected (bearing in mind
that the model (3) is linear).
Fig. 3 Profiles 1 (left) and 2 (right) for the viscosity parameter: values of α versus the variation of the gradient (normalized)
Fig. 4 Evolved spot at times tn = 0, 0.05, 0.5, 1, 1.5, 2, with the profile of Fig. 3a
2. A second choice of the kernel makes use of (16), where the advantage given by the
freedom to select the discrete values k^j, j ≥ 0, has been taken into account. Thus,
the following strategy has been adopted (this generalizes the one considered in the
previous item): for those pixels that will remain unchanged, the corresponding
kernel in the matrix D(t) will approach zero from some time level value tj (making
the evolution almost stationary); for those pixels to be removed, the corresponding
kernel will approach one from some time (in this case, the evolution behaves like
that of the heat equation).
This strategy has been first applied to two synthetic, simple images, with inner
structure. The evolution of the first one is represented in Fig. 10. In this case, no
isolated pixels are present and the choice of the discrete values of the kernel helps
to preserve the structure (edges, corners and the inner square). In a second image
(Fig. 11), the inner square is replaced by an inner spot (representing an isolated
pixel) and the same strategy for the choice of k in the model removes the spot
without a relevant worsening of the rest of the structure.
The conclusions derived from these simple examples are arguments in favour of
the application of the model (15) with this adaptive choice of the kernel to more
complex images. A priori, the nonlocal character of (15) suggests the drawback
of the computational cost of the implementation. In this sense, the selection of
the kernel can also be managed to overcome this problem. An example of this
is given in Fig. 12, where a subimage of Fig. 1b is evolved with (15), and the
same kernel as in Figs. 10 and 11. Note that the previously described effects are
Fig. 5 Evolved spot at times tn = 0, 0.05, 0.5, 1, 1.5, 2, with the profile of Fig. 3b
obtained in a short final time (T = 1), and with a small number of steps (N = 10
in this case; that is, τ = 10^{−1}).
In order to emphasize the freedom provided by the choice of the kernel via (16),
a final example is displayed. In this case, the discrete values of the kernels are
distributed by setting diffusion within a suitable range of values. More specifically,
we modify the strategy as follows: two thresholds 0 < ε1 < ε2 < 1 for the values
of the gradient of the image (normalized to one) at the pixels are first fixed. Then,
if the gradient at a pixel is below ε1 or above ε2 then the pixel will be removed, and
the corresponding kernel will approach one (high diffusion is applied). Otherwise,
the kernel will go to zero as in the previous strategy. The resulting system (15)
is able to improve the detection of the contours. This is first observed in Fig. 13
where, from the same original image as in Fig. 11a, the method locates the border
of a square (and still removes the isolated inner spot). This improved effect is
also generated in a subimage of the aneurysm in Fig. 1b (displayed in Fig. 14) in
a short time.
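A sketch of this two-threshold rule is shown below, mapping a hypothetical normalized gradient magnitude to per-pixel kernel targets; the values of eps1 and eps2 and the input image are illustrative assumptions.

```python
import numpy as np

def kernel_targets(gradient, eps1=0.1, eps2=0.8):
    """Per-pixel target value of the discrete kernel from the image gradient.

    Pixels whose normalized gradient lies below eps1 or above eps2 are to be
    removed, so their kernel tends to one (heat-equation-like diffusion);
    the remaining pixels keep a kernel tending to zero (almost stationary
    evolution), which is what highlights the contours.
    """
    g = gradient / gradient.max()                 # normalize to [0, 1]
    remove = (g < eps1) | (g > eps2)
    return np.where(remove, 1.0, 0.0)

# Hypothetical gradient magnitude of a small image
gy, gx = np.gradient(np.random.default_rng(5).random((10, 10)))
targets = kernel_targets(np.hypot(gx, gy))
```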
Fig. 6 Evolved spot at times tn = 0, 0.05, 0.5, 1, 1.5, 2, with a constant profile α
Fig. 7 Subimage 400 − 450 × 320 − 370 of the Fig. 1a (left), and processed with T = 3, τ = T /50,
at time levels t25 (middle), and t50 (right)
5 Conclusions
In this paper, evolutionary integral models for image restoration, where the image
evolves according to a linear, and nonlocal equation of Volterra type, are studied.
It is shown that under non-restrictive hypotheses on the convolution kernel, the
Fig. 8 Subimage 500 − 550 × 450 − 500 of the Fig. 1a (left), and processed with T = 3, τ = T /30,
at time levels t15 (middle), and t30 (right)
Fig. 9 Subimage 470 − 550 × 200 − 300 of the Fig. 1b (left), processed with T = 3, τ = T /40,
and at time level t40 (middle), and processed with T = 5, τ = T /30, and at time level t30 (right)
Fig. 10 Original synthetic image 10 × 10 (left), processed with T = 7, τ = T /70, at time level t40
(middle), and t70 (right)
continuous and semi-discrete (in space) models are well-posed, satisfy some scale-
space properties and behave like a constant (the average value) for long times. One of
the advantages of the models is the freedom when selecting the discrete values of the
kernel for the implementation. This provides the method with a relevant adaptability
to the image to be restored. Several examples of this property are shown in the
Fig. 11 Original synthetic image 10 × 10 (left), processed with T = 7, τ = T /70, at time level t40
(middle), and t70 (right)
Fig. 12 Subimage 470 − 490 × 200 − 220 of the Fig. 1b (left), processed with T = 1, τ = T /10,
at time levels t5 (middle) and t10 (right)
Fig. 13 Original synthetic image 10 × 10 (left), processed with T = 7, τ = T /70, at time level t20
(middle), and t40 (right)
numerical experiments presented in the paper. They are mainly focused on the ability
of the method (according to the selection of the kernel) for denoising and contour
detection for short time integration. The promising results encourage us to analyze
the fully discrete model in more detail and to incorporate nonlinearities as subjects
of a future work.
Fig. 14 Subimage 110 − 170 × 240 − 280 of the Fig. 1b (left), processed with T = 3, τ = T /30,
at time levels t15 (middle) and t30 (right)
References
1. Álvarez L, Guichard F, Lions P-L, Morel J-M (1993) Axioms and fundamental equations for
image processing. Arch Ration Mech Anal 123:199–257
2. Aubert J, Kornprobst P (2001) Mathematical problems in image processing. Springer-Verlag,
Berlin
3. Bartels S, Prohl A (2007) Stable discretization of scalar and constrained vectorial Perona–Malik
equation. Interfaces Free Bound 4:431–453
4. Boyd, JP (2001) Chebyshev and Fourier spectral methods, 2nd edn. Dover, New York
5. Brezis H (2011) Functional analysis, Sobolev spaces and partial differential equations.
Springer, New York
6. Calvo MP, Cuesta E, Palencia C (2007) Runge–Kutta convolution quadrature methods for
well-posed equations with memory. Numer Math 107:589–614
7. Collatz L (1960) The numerical treatment of differential equations. Springer-Verlag, New York
8. Cuesta E, Finat J (2003) Image processing by means of a linear integro–differential equation.
IASTED 1:438–442
9. Cuesta E, Palencia C (2003) A numerical method for an integro–differential equation with
memory in Banach spaces. SIAM J Numer Anal 41:1232–1241
10. Cuesta E, Durán A, Kirane M, Malik SA (2012) Image filtering with generalized fractional inte-
grals. In: Proceedings of the 12th international conference on computational and mathematical
methods in science and engineering. CMMSE 2012:553–563
11. Cuesta E, Kirane M, Malik SA (2012) Image structure preserve denoising using generalized
fractional time integrals. Signal Process 92:553–563
12. Deng T-B, Qin W (2013) Coefficient relation-based minimax design and low-complexity
structure of variable fractional-delay digital filters. Signal Process 93:923–932
13. Didas S, Burgeth B, Imiya A, Weickert J (2005) Regularity and scale-space properties of
fractional high order linear filtering. In: Kimmel R, Sochen N, Weickert J (eds) Scale-space
and PDE methods in computer vision, LNCS, vol 3459. Springer-Verlag, Berlin, pp 13–25
14. Guidotti P, Lambers JV (2009) Two new nonlinear diffusion for noise reduction. J Math
Imaging Vision 33:25–37
15. Kuo T-Y, Chen H-Ch, Horng T-L (2013) A fast Poisson solver by Chebyshev pseudospectral
method using reflexive decomposition. Taiwan J Math 17:1167–1181
16. Lee JS (1983) Digital image smoothing and the sigma filter. Comput Vision Graph Image
Process 24:253–269
17. López–Fernández M, Palencia C (2004) On the numerical inversion of the Laplace transform
of certain holomorphic mappings. Appl Numer Math 51:289–303
18. López–Fernández M, Lubich Ch, Schadle A (2008) Adaptive, fast and oblivious convolution
in evolution equations with memory. SIAM J Sci Comput 30:1015–1037
19. Lubich Ch (1988) Convolution quadrature and discretized operational calculus I. Numer Math
52:129–145
20. Lubich Ch (1988) Convolution quadrature and discretized operational calculus II. Numer Math
52:413–425
21. Magin R, Ortigueira MD, Podlubny I, Trujillo J (2011) On the fractional signals and systems.
Signal Process 91:350–371
22. Perona P, Malik J (1990) Scale-space and edge detection using anisotropic diffusion. IEEE
Trans Pattern Anal Mach Intell 12:629–639
23. Podlubny I (1999) Fractional differential equations. Academic, London
24. Proskurowsky W, Widlund O (1980) A finite element-capacitance matrix method for the
Neumann problem for Laplace’s equation. SIAM J Sci Stat Comput 1:410–425
25. Pruss J (1993) Evolutionary integral equations and applications. Birkhäuser, Basel
26. Ross B, Samko S (1995) Fractional integration operator of variable order in the Hölder spaces
H^{λ(x)}. Int J Math Math Sci 18:777–788
27. Rudin L, Osher S, Fatemi E (1992) Nonlinear total variation based noise removal algorithm.
Phys D 60:259–268
28. Scarpi G (1972) Sulla possibilita di un modello reologico di tipo evolutivo. Rend Sc Nat Fis
Mat II 52:570–575
29. Strickwerda JC (1989) Difference schemes and partial differential equations. Wadsworth and
Brooks, Pacific Grove
30. Trefethen LN (2000) Spectral methods in MATLAB. SIAM, Philadelphia
31. Tseng Ch-Ch, Lee S-L (2013) Designs of two-dimensional linear phase FIR filters using
fractional derivative constraints. Signal Process 93:1141–1151
32. Weickert J (1997) A review of nonlinear diffusion filtering, Lecture notes in computer science—
scale space theory in computer science. Springer, Berlin
33. Weickert J (1998) Anisotropic diffusion in image processing. B. G. Teubner, Stuttgart
34. Yaroslavsky LP (1985) Digital picture processing. An introduction. Springer-Verlag, New York
35. Zhuang Y, Sun XH (2001) A high-order fast direct solver for singular Poisson equations. J
Comput Phys 171:79–94
Colour Image Quantisation using KM and KHM
Clustering Techniques with Outlier-Based
Initialisation
Abstract This chapter deals with some problems of using the clustering techniques K-means (KM) and K-harmonic means (KHM) in colour image quantisation. A lot of attention has been paid to initialisation procedures, because they strongly affect the results of the quantisation. Classical versions of KM and KHM start with randomly selected centres. The authors are more interested in using deterministic initialisations based on the distribution of image pixels in the colour space. In addition to two previously proposed initialisations (DC and SD), a new outlier-based initialisation is considered here. It is based on the modified Mirkin's algorithm (MM) and places the cluster centres in peripheral (outlier) colours of the pixel cloud. The new approach takes into account small clusters, which sometimes represent colours important for proper perception of the quantised image. Pixel clustering was performed in the RGB, YCbCr and CIELAB colour spaces. Finally, the resulting quantised images were evaluated by means of average colour differences in the RGB (PSNR) and CIELAB (ΔE) colour spaces and additionally by the loss of colourfulness (ΔM).
1 Introduction
True colour images acquired by a camera contain only a small subset of all possi-
ble 16.7 million colours. Therefore, it makes sense to further reduce the number of
colours in the image. Nowadays, colour image quantisation (CIQ) is an important auxiliary operation in the field of colour image processing and is very useful in image compression, image pre-segmentation, image watermarking and content-based image retrieval (CBIR). These algorithms are also still used to present true colour images on devices with a limited number of colours. CIQ significantly reduces the number of colours in the image to a specially selected set of representative colours (colour palette). Colour palette generation is the most important step in any CIQ method. Proper choice of the colour palette helps minimize the colour difference between the original image and the quantised image.
Fig. 1 Simple colour image and its clusters in RGB colour space
There exist two main classes of CIQ techniques: splitting techniques and clustering techniques [1]. The splitting techniques divide the colour space into smaller disjoint subspaces and then a colour palette is built by choosing representative colours from these subspaces. Good examples of such techniques are the Median Cut [8], Octree [5] and Wu's [16] algorithms. For example, the Median Cut method first locates the tightest box in the RGB colour space that encloses all image colours. Then, the box is cut on its longest side and two subboxes are formed. As a result of such a cut, both subboxes should contain the same number of colours, and from here comes the name of the method. Next, the subbox with the longest side is cut. This process continues until the number of subboxes reaches the number of colours in the palette chosen for the quantised image. All colours in one subbox are represented by their mean value.
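A minimal sketch of this splitting procedure is given below; the function and variable names are illustrative assumptions, not an implementation taken from [8].

```python
import numpy as np

def median_cut_palette(pixels, k):
    """Build a k-colour palette from an (N, 3) float array of RGB pixels."""
    boxes = [pixels]
    while len(boxes) < k:
        # choose the splittable box whose bounding box has the longest side
        sizes = [(b.max(0) - b.min(0)).max() if len(b) > 1 else -1 for b in boxes]
        i = int(np.argmax(sizes))
        if sizes[i] < 0:                                   # nothing left to split
            break
        box = boxes.pop(i)
        ch = int(np.argmax(box.max(0) - box.min(0)))       # longest side
        box = box[box[:, ch].argsort()]                    # sort along that channel
        half = len(box) // 2                               # cut at the median
        boxes += [box[:half], box[half:]]
    # every subbox is represented by its mean colour
    return np.array([b.mean(axis=0) for b in boxes])

# e.g. palette = median_cut_palette(image.reshape(-1, 3).astype(float), 8)
```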
On the other hand, the clustering techniques are optimization tasks that minimize the quantisation error by minimizing the sums of distances between the cluster centres and cluster points. One of the most popular clustering techniques is the K-means (KM) technique [10] and its modifications, e.g. the K-harmonic means (KHM) technique [19]. Clustering has a long tradition of use for quantizing colour images [18]. It is easy to see that each of the dominant colours in a natural image corresponds to a separate fragment of the pixel cloud in the colour space, which can be called a cluster (Fig. 1). As a general statement, the splitting techniques are faster than the clustering techniques but they have larger quantisation errors.
The results of many clustering techniques depend on the method of determining the initial cluster centres, the colour space used, the applied colour metric, etc. Such sensitivity to initialisation is an important disadvantage of these clustering techniques. A random selection of the initial centres, used in the classical KM version, is not able to achieve
repeatable results in colour image quantisation. Therefore, in our previous paper [3] we attempted to use two new heuristic methods of initialisation. The first method, which is an arbitrary one, is based on uniform partitioning of the diagonal of the RGB cube (DC) into k segments. Gray levels in the middle of the segments are used as initial centres. If an image is clustered into k clusters, k initial cluster centres are located on the gray level axis. The second method, which is an adaptive one, uses the size of the pixel cloud of a colour image and has been marked as SD. First, the mean value and standard deviation (SD) of each RGB component over all image pixels are calculated. Then, around the point of mean colour (the pixel cloud centre) a rectangular cuboid with sides equal to 2σR, 2σG and 2σB is constructed. We assume that it lies within the RGB cube. Next, the main diagonal of the cuboid is divided into k equal segments. The centres of these segments are used as initial cluster centres.
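Both initialisations can be summarised in a few lines; the sketch below assumes that pixels is an (N, 3) array of RGB values in [0, 255]:

```python
import numpy as np

def dc_init(k):
    """DC: k centres placed in the middles of k equal segments of the RGB-cube diagonal."""
    t = (np.arange(k) + 0.5) / k
    return np.outer(t, [255.0, 255.0, 255.0])

def sd_init(pixels, k):
    """SD: k centres on the main diagonal of a cuboid of size 2*sigma around the mean colour."""
    mu = pixels.mean(axis=0)
    sigma = pixels.std(axis=0)
    lo, hi = mu - sigma, mu + sigma          # cuboid corners (assumed inside the RGB cube)
    t = (np.arange(k) + 0.5) / k
    return lo + np.outer(t, hi - lo)
```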
Initial cluster centres in KM can also come directly from splitting algorithms, e.g. from the MC or Wu's algorithms, and such a combined approach (MC+KM, Wu+KM) was proposed a few years ago [14]. Experiments have shown that the Wu+KM technique offers slightly better performance than MC+KM and than KM initialised by the SD approach.
Appropriate initialisation provides high quality clustering achieved with a small number of iterations and avoids the formation of empty clusters, which sometimes occur in the case of DC initialisation. The result of empty clusters is a reduction in the number of colours in the quantised image. Removing empty clusters requires changing the cluster centres or splitting a newly created cluster. A good initialisation for the KM technique used in colour quantisation is still being sought by many researchers [2].
The KHM is based on harmonic means instead of arithmetic means and additionally uses fuzzy membership of pixels to clusters and dynamic weight functions, which means that an individual pixel has a different influence on calculating new values of the centres in each iteration. KHM is robust to initialisation and creates non-empty clusters. A disadvantage of KHM in relation to KM is its greater computational complexity, resulting in a longer computation time.
The clustering process can be conducted not only in the RGB colour space, but also in other colour spaces. Here a special role is played by the CIELAB colour space, recommended in 1976 [17]. It is a perceptually uniform colour space which approximately expresses the way humans perceive colour. The Euclidean distance in this space is approximately equal to the perceptual colour difference. This should be of great importance in the process of clustering. Unfortunately, the transform from RGB to CIELAB is complicated and nonlinear.
The YCbCr colour space is also applied in the CIQ task among other colour spaces. Its advantage, in comparison to the CIELAB colour space, is the linearity of the transformation from RGB space, which results in faster calculation of the YCbCr components. Although the colour difference in the YCbCr space corresponds to human colour perception less closely than the colour difference calculated in CIELAB, it is still better than the Euclidean distance calculated in RGB space. The YCbCr components can be obtained from the following transformation [9]:
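A sketch of this conversion is shown below; it assumes the common ITU-R BT.601 coefficients, so the exact matrix used in [9] may differ slightly.

```python
import numpy as np

def rgb_to_ycbcr(rgb):
    """rgb: (..., 3) array with components in [0, 255]; returns YCbCr in the same range."""
    rgb = np.asarray(rgb, dtype=float)
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.169, -0.331,  0.500],
                  [ 0.500, -0.419, -0.081]])
    ycbcr = rgb @ m.T
    ycbcr[..., 1:] += 128.0   # offset the chroma channels
    return ycbcr
```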
The colour quantisation error depends on the number of colours in the palette (e.g. 256, 64, 16, 8, 4 colours): the smaller the number of colours in the palette, the larger the quantisation error. Objective CIQ quality measures (Fig. 2) are very important in the evaluation process of different colour quantizers.
The most commonly used measure is the Mean Squared Error (MSE), defined by:

MSE = \frac{1}{3MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left[ (R_{ij} - R_{ij}^{*})^2 + (G_{ij} - G_{ij}^{*})^2 + (B_{ij} - B_{ij}^{*})^2 \right]   (4)
where M and N are the image dimensions in pixels, R_ij, G_ij, B_ij are the colour components of the pixel at location (i, j) in the original image and R*_ij, G*_ij, B*_ij are the colour components of the pixel in the quantised image. The smaller the MSE value, the better the quantised image. Another error measure applied to the evaluation of quantisation is the Peak Signal-to-Noise Ratio (PSNR), well correlated with the MSE value
and expressed in decibel scale:
PSNR = 20 \log_{10} \frac{255}{\sqrt{MSE}}   (5)
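A short sketch of both measures for 8-bit RGB images (function names are illustrative):

```python
import numpy as np

def mse(original, quantised):
    """Mean squared error over the three RGB channels, Eq. (4)."""
    diff = original.astype(float) - quantised.astype(float)
    return (diff ** 2).mean()            # averages over all M*N*3 values

def psnr(original, quantised):
    """Peak signal-to-noise ratio in dB, Eq. (5)."""
    return 20.0 * np.log10(255.0 / np.sqrt(mse(original, quantised)))
```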
Unfortunately, both of these measures, which come from the signal processing field, are poorly correlated with the subjective visual quality of an image. The quantisation error can be treated as a colour error that should be determined in a perceptually uniform colour space. Therefore, an average colour difference in the CIELAB colour space (ΔE) is sometimes applied as a quantisation error:
ΔE = \frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \sqrt{ (L_{ij} - L_{ij}^{*})^2 + (a_{ij} - a_{ij}^{*})^2 + (b_{ij} - b_{ij}^{*})^2 }   (6)
where L_ij, a_ij, b_ij are the CIELAB colour components of the pixel at location (i, j) in the original image and L*_ij, a*_ij, b*_ij are the CIELAB colour components of the pixel in the quantised image. The loss of image colourfulness due to colour quantisation can also be used as an additional tool for the evaluation of the quantisation error [13]:
ΔM = M_orig − M_quant   (7)

where M denotes the colourfulness of an image, computed from σ_rg, σ_yb, the standard deviations, and μ_rg, μ_yb, the mean values of the opponent colour components of the image pixels. The opponent components are approximated by the following simplified equations:

rg = R − G   (9)
yb = 0.5(R + G) − B   (10)
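A sketch of these quantities follows; it assumes that M is the Hasler–Süsstrunk colourfulness measure [6] computed from Eqs. (9) and (10), which is an assumption since the exact definition of M is not reproduced above.

```python
import numpy as np

def colourfulness(img):
    """Colourfulness M of an RGB image (Hasler/Suesstrunk measure assumed)."""
    r, g, b = img[..., 0].astype(float), img[..., 1].astype(float), img[..., 2].astype(float)
    rg = r - g                      # Eq. (9)
    yb = 0.5 * (r + g) - b          # Eq. (10)
    sigma = np.hypot(rg.std(), yb.std())
    mu = np.hypot(rg.mean(), yb.mean())
    return sigma + 0.3 * mu

def delta_m(original, quantised):
    """Loss of colourfulness, Eq. (7)."""
    return colourfulness(original) - colourfulness(quantised)
```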
A set of five natural images has been randomly chosen from Berkeley’s image
database [11] and presented in Fig. 3 in order of their number of unique colours.
All these images were acquired at the same spatial resolution, i.e. 481×321 pixels. The first tests were conducted to show that the larger the number of unique colours in the original image, the larger the quantisation errors for a given size of the palette (here eight colours). The number of iterations in the clustering techniques used was equal to 15 and the quantisation was realised by the KM and KHM techniques in the RGB colour space. The data in Table 1 show the error values for the KM technique with two different initialisations: DC and SD. Similarly, Table 2 contains the error values calculated for the more efficient KHM technique. It should be noted that in both cases, with decreasing numbers of unique colours in the images of Fig. 3, the values of the
quantisation errors generally decrease, i.e. PSNR increases and ΔE and ΔM decrease. A similar effect also occurs for the two tested splitting algorithms: MC and Wu's (see Table 3).
In this way we confirmed a quite obvious hypothesis about the impact of the
number of unique colours in the image on the quantisation error.
Both DC and SD initialisations generate starting cluster centres located close to the gray line. In the case of KM these locations of the centres largely determine the final colours of the quantised image. There exist colour images for which the KM
technique with the previously presented initialisations (DC, SD) does not give good results, particularly when the size of the colour palette is small (e.g. four or eight colours). A good example of such an image is shown in Fig. 4a. This image is not very colourful, but it contains 138 877 unique colours! Colour pixels, as in other images, are generally grouped along the diagonal of the RGB cube. A small red part of the pixel cloud represents a red letter lying in the middle of the image (see Fig. 4b). The formation of a separate red cluster can be very important for CIQ applications in image segmentation.
Unfortunately, colour quantisation into 4 colours by the KM and KHM techniques with the DC and SD initialisations does not preserve the red letter in the quantised image (see Fig. 4c, d). Therefore we looked for a better method of initialisation for our clustering techniques and found an intelligent initialisation of KM proposed by Mirkin [12]. In this method the initialisation of KM is based on so-called Anomalous Pattern (AP) clusters, which are the most distant from the centre of the cloud of points. Such outliers (peripheral points of the cloud) are the most important in this initialisation. This algorithm is general in nature and can be used in many different pattern recognition tasks.
1. Find the centre of the cloud of points in the RGB colour space and mark it as C.
2. Find the point furthest away from centre C and mark it as Cout.
3. Perform KM clustering into two clusters based on the previously appointed centres C and Cout; only the centre Cout is repositioned after each iteration.
4. Add the RGB components of Cout to the list of stored centres.
5. Remove all points belonging to the cluster with centre Cout.
6. Check whether there are still points in the cloud. If so, go back to step 2.
7. Sort the obtained clusters by size (the number of elements) and select the k largest clusters. Their centres are the final starting centres for KM clustering.
Fig. 4 Results of colour quantisation: a original image, b colour gamut, c KM with DC (4 colours),
d KM with SD (4 colours), e KM with MM (4 colours)
The modified version of this algorithm (MM), used here for initialisation, proceeds as follows:
1. Find the centre of the cloud of points in the RGB colour space and mark it as C.
2. Find the point furthest away from centre C and mark it as Cout.
3. Add the RGB components of Cout to the list of stored centres.
4. Perform KM clustering into two clusters based on the previously appointed centres C and Cout; only the centre Cout is repositioned after each iteration.
5. Remove all points belonging to the cluster with centre Cout.
6. Check whether there are still points in the cloud. If so, go back to step 2.
7. Select the first k clusters determined by this algorithm. Their centres are the final starting centres for KM clustering (a code sketch of this procedure follows the list).
The MM initialisation preserves the red letter in the image during quantisation into 4 colours (see Fig. 4e). For all the considered initialisations we calculated the colour error for the red letter in the quantised image: ΔE(DC) = 77, ΔE(SD) = 60 and ΔE(MM) = 11. These results demonstrate the superiority of the MM initialisation over the other tested initialisations.
Figure 5 illustrates the subsequent eliminations of outlier clusters from the cloud of points, presented as a 3D scatter plot, and helps to understand the algorithm. Here, the
Fig. 6 Results of colour quantisation a original image, b colour gamut for original image, c KM
with DC initialisation, d KM with SD initialisation, e KM with MM initialisation
third step of MM (see Fig. 5c) has particular importance, because the centre of the red cluster is detected.
Another example of the usefulness of MM initialisation is the quantisation of the image shown in Fig. 6. Particular attention should be paid to the blue beads, which are a perceptually important region in the original image. The image is quantised into 8 colours. Colour quantisation by the KM technique with the DC and SD initialisations generates images without blue pixels; the beads are gray (see the RGB values in Fig. 6c, d). This problem is solved by using the MM initialisation, as shown in Fig. 6e. We calculated the corresponding colour errors for the blue beads: ΔE(DC) = 32, ΔE(SD) = 33 and ΔE(MM) = 4. Again, the MM initialisation achieved by far the smallest error.
Similar experiments were also carried out with other images. Their visual evaluation confirmed the advantages of the MM initialisation. Despite the limited palette, each quantised image contained the perceptually significant colours. On the other hand, the generally accepted image quality measures for quantised images do not give clear results (see Table 4). Only ΔM, the loss of image colourfulness, which is strongly related to colour perception, shows the advantage of the MM initialisation.
In the first group of tests, the positions of the starting cluster centres were determined for the three compared initialisations: DC, SD and MM. The colour pixels in the pixel cloud of a natural image are generally grouped along the diagonal of the RGB cube. Black dots plotted on the pixel cloud show the locations of these centres. All clusterings presented in this section were achieved after 30 iterations of the KM technique.
The first test image (Fig. 7a) contains a perceptually important red area lying in the middle of the image and showing a paraglider, which can be seen in Fig. 7b, c, d as a small part of the pixel cloud directed towards the red colour. In the case of the DC and SD
initialisations all eight initial centres are located on the diagonal of the RGB cube; only the MM initialisation (Fig. 7d) generates two peripherally located centres, one of which is contained in the cluster of red pixels. This gives a chance to get a good CIQ result using the KM technique with the MM initialisation.
The second test image (Fig. 8a), quantised into 6 colours, creates a specific pixel cloud (Fig. 8b, c, d) in the RGB space with three branches for the three colours R, G and B. Only the MM initialisation puts initial centres in these sectors, which are important for further clustering (Fig. 8d).
The third considered test image (Fig. 9a) presents a book cover, contains six characters with distinct chromatic colours and is characterized by a more complex pixel cloud (Fig. 9b, c, d). Again, only the MM initialisation (Fig. 9d) places some of the centres outside the main pixel cloud, which gives an opportunity to obtain a good CIQ result.
The second part of the tests serves to compare the quality of images quantised with different initialisations. These tests were performed in the RGB colour space and two
additional colour spaces: YCbCr (a linear transformation of RGB space) and the perceptually uniform CIELAB colour space (a non-linear transformation of RGB space). In addition to the subjective visual assessment, the loss of image colourfulness ΔM was used, and the other typical quality measures described in Sect. 2 were rejected. Their nature means that the colours of perceptually significant regions with small areas do not play a noticeable role in them.
Figure 10 shows quantised versions of the original image presented in Fig. 7a. The visual assessment indicates a dominance of the MM initialisation since, regardless of the type of colour space, a reddish paraglider remains in the quantised image. Particular attention should be paid to the quantisation in the CIELAB colour space, where the loss of image colourfulness ΔM is smallest regardless of the initialisation.
Figure 11 shows quantised versions of the original image presented in Fig. 8a. The original image contains only the three chromatic colours, so it is easy to visually assess the results of the quantisation. These three chromatic colours remained in the quantised images in only four of nine cases: the three images quantised after MM initialisation and one image quantised in the CIELAB space after SD initialisation. The original image is not a natural image; perhaps that is why the relation between the results is not so clear here.
Fig. 10 KM results for the image from Fig. 7a, a, b, c, results with DC initialisation, d, e, f results
with SD initialisation, g, h, i results with MM initialisation (k = 8)
Figure 12 shows quantised versions of the original image in Fig. 9a. The original image contains six characters with distinct chromatic colours, making a visual assessment of the quantised images easy. The caption below Fig. 12 includes the number of chromatic colours recognized by the observer. One can notice that the maximal number of chromatic colours obtained after CIQ is 4, and it has been achieved only in the case of MM initialisation in the YCbCr and CIELAB spaces. These results occur simultaneously with the smallest loss of image colourfulness ΔM.
6 Conclusions
In this chapter we showed, for two different CIQ techniques, that the number of unique colours in a natural image significantly influences the value of the quantisation error. The main contribution of the work is a new alternative way of initialising KM, which provides better CIQ results. This approach, based on the detection and elimination of outlier clusters and named here MM, does not lose the perceptually important colour regions of the original image. Additionally, the usefulness of the quality measure called the loss of colourfulness for CIQ assessment has been confirmed.
Acknowledgements This work was supported by Polish Ministry for Science and Higher Edu-
cation under internal grant BK-/RAu1/2014 for Institute of Automatic Control, Silesian University
of Technology, Gliwice, Poland.
Fig. 12 KM results for the image from Fig. 9a, a, b, c results with DC initialisation, d, e, f results
with SD initialisation, g, h, i results with MM initialisation (k = 8)
References
1. Brun L, Tremeau A (2003) Color quantization. In: Sharma G (ed) Digital color imaging
handbook. CRC, Boca Raton, pp 589–637
2. Celebi ME (2011) Improving the performance of k-means for color quantization. Image Vision
Comput 29(4):260–271
3. Frackiewicz M, Palus H (2011) KM and KHM clustering techniques for colour image quanti-
sation. In: Tavares JMR, Jorge RN (eds) Computational vision and medical image processing,
vol. 19. Springer, Netherlands, pp 161–174
4. Frackiewicz M, Palus H (2013) Outlier-based initialisation of k-means in colour image quantisa-
tion. In: Informatics and Applications (ICIA), Lodz, Poland, Second International Conference
on Informatics and Applications, pp 36–41
5. Gervautz M, Purgathofer W (1990) A simple method for color quantization: octree quantization.
In: Glassner AS (ed) Graphics gems. Academic, San Diego, pp 287–293
6. Hasler D, Suesstrunk S (2003) Measuring colourfulness for natural images. In: Electronic
imaging 2003: human vision and electronic imaging VIII, Proceedings of SPIE, vol. 5007,
pp 87–95
7. Hassan M, Bhagvati C (2012) Color image quantization quality assessment. In: Venugopal K,
Patnaik L (eds) Wireless networks and computational intelligence, vol. 292. Springer, Berlin,
pp 139–148
8. Heckbert P (1982) Color image quantization for frame buffer display. ACM SIGGRAPH
Comput Graph 16(3):297–307
9. Koschan A, Abidi M (2008) Digital color image processing. Wiley, New York
10. MacQueen J (1967) Some methods for classification and analysis of multivariate observations.
In: Proceedings of the 5th Berkeley symposium on mathematical statistics and probability,
vol I, pp 281–297. Berkeley and Los Angeles, CA, USA
11. Martin D, Fowlkes C, Tal D, Malik J (2001) A database of human segmented natural images and
its application to evaluating segmentation algorithms and measuring ecological statistics. In:
Proceedings of the 8th international conference on computer vision, pp 416–423. Vancouver,
BC, Canada
12. Mirkin B (2005) Clustering for data mining: a data recovery approach. Chapman & Hall,
London
13. Palus H (2004) On color image quantization by the k-means algorithm. In: Droege D, Paulus
D (eds) Proceedings of 10. Workshop Farbbildverarbeitung, pp 58–65
14. Palus H, Frackiewicz M (2010) New approach for initialization of k-means technique applied
to color quantization. In: Information Technology (ICIT), Gdansk, Poland, 2nd international
conference on information technology, pp 205–209
15. Palus H, Frackiewicz M (2013) Colour quantisation as a preprocessing step for image seg-
mentation. In: Tavares JMR, Natal Jorge RM (eds) Topics in medical image processing and
computational vision, Lecture notes in computational vision and biomechanics, vol. 8. Springer,
Netherlands, pp 119–138
16. Wu X (1991) Efficient statistical computations for optimal color quantization. In: Arvo J (ed)
Graphic gems II. Academic Press, New York, pp 126–133
17. Wyszecki G, Stiles W (1982) Color science: concepts and methods, quantitative data and
formulae. Wiley, New York
18. Xiang Z, Joy G (1994) Color image quantization by agglomerative clustering. IEEE Comput
Graph Appl 14(3):44–48
19. Zhang B, Hsu M, Dayal U (1999) K-harmonic means-data clustering algorithm. Tech. Rep.
TR HPL-1999-124, Hewlett Packard Labs, Palo Alto, CA, USA
A Study of a Firefly Meta-Heuristics
for Multithreshold Image Segmentation
1 Introduction
maximizes the additivity property for its entropy. This property states that the total entropy of a whole physical system (represented by its probability distribution) can be calculated from the sum of the entropies of its constituent subsystems (represented by their individual probability distributions).
Kapur et al. [2] maximized the upper threshold of the maximum entropy to obtain the optimal threshold, and Abutaleb [3] improved the method using bidimensional entropies. Furthermore, Li and Lee [4] and Pal [5] used the direct Kullback-Leibler divergence to define the optimal threshold. Some years before, Sahoo et al. [6] used the Rényi entropy seeking the same objective. More details about these approaches can be found in [7], which presents a review of entropy-based methods for image segmentation.
Considering the restrictions of Shannon entropy, Albuquerque et al. [8] proposed an image segmentation method based on the Tsallis non-extensive entropy [9], a new kind of entropy that is considered a generalization of Shannon entropy through the inclusion of a real parameter q, called the "non-extensive parameter". The work of Albuquerque [8] showed promising results, and a vast literature demonstrates the performance of this method on the optimal threshold problem. Although it is
a new contribution to the field, this paper will not address the Tsallis entropy.
A logical extension of binarization is called multi-thresholding [10, 11], which considers multiple thresholds in the search space, leading to a larger number of regions in the segmentation process.
However, since the optimal threshold calculation is a direct function of the number of thresholds, the time required to search for the best combination of thresholds tends to grow exponentially. Furthermore, the optimum number of thresholds is still a topic for discussion. Thus, the literature has proposed the use of meta-heuristics that may be efficient for the calculation of thresholds, one of them being the Firefly.
Recently, M. Horng [11] proposed an approach based on Minimum Cross-Entropy
thresholding (MCET) for multilevel thresholding with the same objective function
criterion as proposed by P. Yin [10]. The main conclusion of the work was that the
Cross-Entropy based method, a linear time algorithm, obtained threshold values
very close to those found by equivalent exhaustive search (brute-force) algorithms.
However, the results were inconclusive since their methodology to evaluate the
experiment was subjective.
This article proposes an analysis of the Firefly meta-heuristics for multi-threshold-
based image segmentation. We also present the use of a Golden Standard Image
Base that allows us to compare the segmentation results of different algorithms in an
objective manner.
The strategy used in this study is the comparison of the obtained results with the results of exhaustive methods, both manual and automatic. Although these methods have polynomial complexity of order O(n^{d+1}), where d and n are the number of thresholds and histogram bins respectively, it is computationally expensive to calculate the results for d ≥ 3.
3 Firefly Meta-Heuristics
The Firefly (FF) algorithm was proposed by Xin-She Yang [13] and is a meta-heuristic inspired by the behaviour of fireflies, which are attracted to each other according to their natural luminescence.
that is being manipulated. Then, the fireflies' luminescences are updated iteratively under pre-established rules until the algorithm converges to a global minimum.
The papers of Lukasik and Zak [15] and Yang [13] suggest that the FF outperforms other meta-heuristics, such as Ant Colony [16], Tabu search [17], PSO [18] and Genetic Algorithms [19]. Thus, the FF was presented as a computing-time-efficient method for the Multilevel Thresholding Problem (MLTP). Recently, the work of [20] showed a computational time comparison of the FF against another method, demonstrating that the FF is more efficient when the evaluation function is modeled with the maximum inter-cluster variance. Other works, such as [11] and [10], also showed similar results when applied to the MLTP.
Specifically for the MLTP modeling, each firefly is a d-dimensional vector, where each dimension is a single threshold that partitions the histogram space. In the specific work of M. H. Horng and R. J. Liou [11], the goal was to minimize an objective
function based on the cross-entropy of the intensity histogram associated with each segmented image. Algorithm 1 describes the FF, where a solution set of n initial fireflies is given on line 3. Each firefly f_i is a d-dimensional vector and x_k^i is the k-th threshold of the i-th solution. More details about the FF can be found in [11] and [13].
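The following is a generic sketch of the firefly update for the MLTP in Yang's formulation [13], maximising an arbitrary objective supplied as a function; the parameter names and values (alpha, beta0, gamma, population size) are illustrative assumptions and not a reproduction of Algorithm 1.

```python
import numpy as np

def firefly_thresholds(objective, d, n_fireflies=25, n_iter=100,
                       alpha=0.2, beta0=1.0, gamma=0.01, levels=256, seed=0):
    """Maximise objective(thresholds) over d-dimensional threshold vectors (FF sketch)."""
    rng = np.random.default_rng(seed)
    f = lambda v: objective(np.sort(v).astype(int))         # brightness of a firefly
    x = rng.uniform(1, levels - 2, size=(n_fireflies, d))   # initial population
    light = np.array([f(xi) for xi in x])
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:                      # i is attracted by brighter j
                    r2 = ((x[i] - x[j]) ** 2).sum()
                    beta = beta0 * np.exp(-gamma * r2)       # attractiveness decays with distance
                    x[i] = np.clip(x[i] + beta * (x[j] - x[i])
                                   + alpha * (rng.random(d) - 0.5), 1, levels - 2)
                    light[i] = f(x[i])
        alpha *= 0.97                                        # slowly damp the random step
    best = int(np.argmax(light))
    return np.sort(x[best]).astype(int), light[best]
```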
In this paper we show the results obtained using a novel approach for the firefly
algorithm. Our contribution is the use of Tsallis non-extensive entropy as a kernel
evaluation function for the firefly algorithm. This type of entropy is described in the
following sections.
The celebrated Shannon entropy has found several applications since C. Shannon proposed it for information theory [21]. Considering a probability distribution P(H) = {h(1), h(2), . . . , h(n)}, the Shannon entropy, denoted by S(H), is defined as:
S(H) = - \sum_{i=1}^{L} h_i \log(h_i)   (1)
As stated before, T. Pun [1] applied this concept to the 1LTP through the following idea. Consider two probability distributions obtained from P(H), one for the foreground, P(H1), and another for the background, P(H2), given by:
P(H_1): \frac{h_1}{p_A}, \frac{h_2}{p_A}, \ldots, \frac{h_t}{p_A}   (2)

P(H_2): \frac{h_{t+1}}{p_B}, \frac{h_{t+2}}{p_B}, \ldots, \frac{h_L}{p_B}   (3)

where p_A = \sum_{i=1}^{t} p_i and p_B = \sum_{i=t+1}^{L} p_i.
If we assume that H1 and H2 are independent random variables, then the entropy of the composed distribution¹ verifies the so-called additivity rule:

¹ We define the composed distribution, also called the direct product of P = (p_1, . . . , p_n) and Q = (q_1, . . . , q_m), as P ∗ Q = {p_i q_j}, with 1 ≤ i ≤ n and 1 ≤ j ≤ m.
In the case of 1LTP, the optimal threshold t* is the one which maximizes Eq. (4), which can be computed in O(L²) time.
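Assuming the usual reading of the additivity rule in Eq. (4), S(H1 ∗ H2) = S(H1) + S(H2), a brute-force sketch of this 1LTP criterion is given below (the histogram is assumed to be a length-L array of counts; names are illustrative).

```python
import numpy as np

def shannon_threshold(hist):
    """Return the threshold t that maximises S(H1) + S(H2) for a histogram."""
    h = hist / hist.sum()
    best_t, best_s = 1, -np.inf
    for t in range(1, len(h) - 1):
        pa, pb = h[:t].sum(), h[t:].sum()
        if pa == 0 or pb == 0:
            continue
        h1, h2 = h[:t] / pa, h[t:] / pb
        s = -(h1[h1 > 0] * np.log(h1[h1 > 0])).sum() \
            - (h2[h2 > 0] * np.log(h2[h2 > 0])).sum()
        if s > best_s:
            best_t, best_s = t, s
    return best_t
```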
As before, by assuming independent distributions and under the same normalization restrictions, it is easy to extend Eq. (4) to the case of d > 1 partitions, obtaining a generalization of the additivity rule (Expression (5)) which, as in the case of cross-entropy, requires O(L^{d+1}) operations in order to find the set of d optimal thresholds that maximizes the entropy in Expression (5).
As mentioned before, the Tsallis entropy is a generalization of the Shannon one (see
[22] and references therein). The non-extensive Tsallis entropy of the distribution
P (H ), denoted by Sq (H ), is given by:
S_q(H) = \frac{1 - \sum_{i=1}^{L} h_i^{q}}{q - 1}   (6)
The main feature observed in Eq. (6) is the introduction of a real parameter q, called the non-extensive parameter. In [9] it is shown that, in the limit q → 1, Eq. (6) recovers Eq. (1).
For the Tsallis entropy we can find an analogue of the additivity property (Expression (4)), called pseudo-additivity due to the appearance of an extra term. For 1LTP (d = 1), given two independent probability distributions P(H1) and P(H2) from P(H), the pseudo-additivity formalism of the Tsallis entropy is given by the following expression:
where Sq (H1 ) and Sq (H2 ) are calculated by applying Eq. (6) for the probability
distributions P (H1 ) and P (H2 ).
For this 1LTP, the optimal threshold t* is the one that maximizes the pseudo-additivity property (7), and is computed in O(L²) time. As in the case of Shannon entropy, we can easily derive a generalized version of Eq. (7) (Expression (8)), which is useful for the MLTP. However, for the same reasons as with cross-entropy and Shannon entropy, the computational time for solving the corresponding MLTP (without a recursive technique) is O(L^{d+1}).
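A sketch of the corresponding 1LTP search follows, assuming the pseudo-additivity of Eq. (7) takes the commonly used form S_q(H1 ∗ H2) = S_q(H1) + S_q(H2) + (1 − q) S_q(H1) S_q(H2); this form is an assumption, since Eq. (7) is not reproduced above.

```python
import numpy as np

def tsallis_entropy(p, q):
    """S_q of a normalised distribution p, Eq. (6)."""
    p = p[p > 0]
    return (1.0 - (p ** q).sum()) / (q - 1.0)

def tsallis_threshold(hist, q):
    """1LTP threshold maximising the pseudo-additive combination of S_q(H1) and S_q(H2)."""
    h = hist / hist.sum()
    best_t, best_val = 1, -np.inf
    for t in range(1, len(h) - 1):
        pa, pb = h[:t].sum(), h[t:].sum()
        if pa == 0 or pb == 0:
            continue
        s1, s2 = tsallis_entropy(h[:t] / pa, q), tsallis_entropy(h[t:] / pb, q)
        val = s1 + s2 + (1.0 - q) * s1 * s2        # assumed pseudo-additivity, Eq. (7)
        if val > best_val:
            best_t, best_val = t, val
    return best_t
```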
Fig. 2 Adapted from [24]. Automatic q value calculation. In this figure it is possible to see that the
optimum value of q is 0.5
Like the Shannon Entropy (SE), the Tsallis Entropy (TE) also tries to balance mutual information between partitions of a distribution, since it depends on the individual probabilities instead of their positions. Note that the parameter q raises the probability values to a power, giving a fine tuning in the pseudo-additivity maximization.
The main downside of the Tsallis entropy as used by researchers such as Albuquerque et al. [8] and Rodrigues et al. [23] is the definition of the q parameter, which is usually done manually. Thus, Rodrigues and Giraldi proposed a novel method for the automatic calculation of the q value [24]. The maximal entropy of a probabilistic distribution X occurs when all states of X, (x1, x2, . . . , xn), have the same probability. So the maximum entropy of the X distribution, SMAX, is given by Eq. (9):
S_{MAX} = \frac{1}{q - 1} \left( 1 - n\, p^{q}(x) \right)   (9)
where q is the entropic parameter and n is the number of elements of the X distribution.
From the point of view of information theory, the smaller the ratio between the entropy Sq produced by a q value and the maximal entropy SMAX of a system, the greater the information contained in that system. This is a well-known concept of information theory and gives us the idea that an optimum q value can be calculated by minimizing the Sq/SMAX function [24].
Thus, for each distribution, we calculate the ratio between the entropy Sq and the maximal entropy SMAX for each value of q varying in the range [0.01, 0.02, . . . , 2.0] in order to find the q value that minimizes this ratio. In Fig. 4, one can observe the behavior of the relation between Sq and SMAX throughout the q variation.
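A sketch of this selection, assuming the image histogram as input and the uniform distribution p(x) = 1/n in Eq. (9); the grid of q values follows the range given above.

```python
import numpy as np

def optimal_q(hist, q_grid=np.arange(0.01, 2.001, 0.01)):
    """Pick the q that minimises S_q / S_MAX for a histogram, following [24]."""
    p = hist / hist.sum()
    p = p[p > 0]
    n = len(p)
    best_q, best_ratio = q_grid[0], np.inf
    for q in q_grid:
        if abs(q - 1.0) < 1e-9:          # skip the Shannon limit to avoid division by zero
            continue
        s_q = (1.0 - (p ** q).sum()) / (q - 1.0)
        s_max = (1.0 - n * (1.0 / n) ** q) / (q - 1.0)   # Eq. (9) with p(x) = 1/n
        ratio = s_q / s_max
        if ratio < best_ratio:
            best_q, best_ratio = q, ratio
    return best_q
```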
In this work, we made use of 300 images from the Berkeley University database [12]. Such images are composed of various natural scenes, wherein each one was manually segmented. The task of segmenting an image into different cognitive regions is still an open problem. It is possible to highlight two main reasons for it to be considered a difficult task: (i) a good segmentation depends on the context of the scene, as well as on the point of view of the person who is analyzing it; and (ii) it is rare to find a database for formal comparison of the results. Generally, researchers present their results comparing just a few images, pointing out what they believe is correct. In these cases, probably the same technique will work only with other images that belong to the same class. Still, the question that remains unresolved is: “What is a correct segmentation?”.
In the absence of an answer to the question, a reference is necessary that allows
the comparison of several techniques under the same database or parametrization.
Regarding this, the image database used here can be considered as an attempt to establish such a reference.
Figure 3 shows several examples of the pictures that belong to the database and the overlapping of 5 edge-maps derived from the manual segmentations, which denotes the high level of consistency between segmentations done by different persons. Additional details about this image database can be found in [12].
When overlapping the five edge-maps of the same image as in Fig. 3, some edges do not match; thus the final intensity of each edge of the overlapped image is higher if it overlaps more edges and lower otherwise. In this article, we made use of the 300 images as the comparison base (golden standard) for our experiments. Furthermore, the absolute value of the divergence of information between the automatically-obtained segmentations and the golden standard (manually-obtained segmentations) was not considered as a segmentation-quality measure. So, the image database is used as a tool for comparison between the results of the two evaluated methods.
6 Similarity Measure
We defined a function to measure the similarity between the manual and the automatic
segmentation. However, this is a difficult task and the problem is still unsolved.
Sezgin and Sankur [25] proposed 5 quantitative criteria for measuring the luminance region and evaluated 20 classical methods to measure the similarity between them.
Fig. 3 Sample images from the Berkeley University database [12] composed of 300 manually segmented images used in the experiments as ground truth
But the criterion they proposed was not based on a golden-standard set of images; thus the method of comparison proposed in [25] can be used only as an intrinsic quality evaluation of the segmented areas: i.e., an output image segmented into uniformly moulded regions cannot be considered as close as expected to the manual segmentation.
On the other hand, golden-standard-based measuring techniques are also difficult to propose when the system needs to detect several regions of the image at the same time, a common task in computer vision. Besides that, comparing corresponding edges makes it difficult to detect entire regions, as well as their location in space. Also, in the area of computer vision, there is an important demand to be able to deduce regions that are interrelated.
Although it is possible to design an algorithm which tolerates localization errors, it is likely that detecting only the matching pixels and assuming all others are flaws or false positives may provide poor performance.
One can speculate from Fig. 3 that the comparison between the edge-maps derived
from the automatic and manual segmentations must tolerate localization errors as
long as there are also divergences on the edges of the golden standard. Thus, the
consideration of some differences can be useful in the final result as shown in [12].
On the other hand, from 2D edge-maps such as the ones we used, one can obtain two types of information: geometric dispersion and intensity dispersion. The geometric dispersion measures the size and the location of the edges; the intensity dispersion measures how common that edge is among all manual segmentations that were overlapped. Thus, the geometric dispersion between two edge-maps is measured quantitatively in the x and y dimensions, while the luminance dispersion can be represented by the z dimension.
The divergence of information between the two edge-maps of an M × N image in the x dimension is calculated by the Euclidean distance between the two maps, where Hx is the vertical projection of the edge map for the automatic segmentation and Mx is the corresponding vertical projection for the manual one.
So, in this article, we propose a similarity function between the two vertical edge projections Mx and Hx of the x dimension, presented in Eq. (10), to measure how different the automatically-obtained segmentation (ASx) is from the manual one (golden standard, GSx) in this specific direction:
Sim_x(GS_x \mid AS_x) = \sum_{i=1}^{M} (M_x(i) - H_x(i))^2,   (10)
where M is the size of the x distribution and Mx and Hx are the projections of the image edges in the x direction, manual and automatic respectively. Mx and Hx are obtained by summing the values greater than 0 in each column.
Similarly, the corresponding function for the y direction is given by
Sim_y(GS_y \mid AS_y) = \sum_{i=1}^{N} (M_y(i) - H_y(i))^2,   (11)
where N is the size of the y distribution and My and Hy are obtained by summing the values greater than 0 in each line. The corresponding function for the z direction is given by
Sim_z(GS_z \mid AS_z) = \sum_{i=0}^{L} (M_z(i) - H_z(i))^2,   (12)
where the sum runs over the L image gray levels [0, 1, . . . , 255]; Mz and Hz are the grayscale histograms of the golden standard (GSz) and the automatic segmentation (ASz), respectively.
Thus, in this study, we propose the following evaluation function to measure the
similarity between two edge-maps:
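A sketch of this evaluation follows; it assumes that Eq. (13) simply adds the three components of Eqs. (10)-(12), and it reads "sum of values greater than 0" as the sum of positive edge intensities per column or row. Both readings, and the function name, are assumptions.

```python
import numpy as np

def edge_similarity(manual_edges, auto_edges):
    """Distance between a golden-standard edge map and an automatic one (smaller = more similar)."""
    m = np.asarray(manual_edges, dtype=float)
    h = np.asarray(auto_edges, dtype=float)
    def proj(e, axis):                       # sum of the positive edge values along one axis
        return (e * (e > 0)).sum(axis=axis)
    sim_x = ((proj(m, 0) - proj(h, 0)) ** 2).sum()        # Eq. (10), per-column projections
    sim_y = ((proj(m, 1) - proj(h, 1)) ** 2).sum()        # Eq. (11), per-row projections
    mz, _ = np.histogram(m, bins=256, range=(0, 256))     # Eq. (12), grayscale histograms
    hz, _ = np.histogram(h, bins=256, range=(0, 256))
    sim_z = ((mz - hz).astype(float) ** 2).sum()
    return sim_x + sim_y + sim_z                          # assumed combination for Eq. (13)
```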
The methodology shown in Fig. 1 describes both scenarios used in this paper: (i)
the segmentation with 1, 2 and 3 thresholds found by an exhaustive search; and
(ii) the segmentation with 1, 2 and 3 thresholds obtained with the use of the FF
meta-heuristic.
The main reason for using the exhaustive search was to guarantee that the whole
solution space is explored in order to find the thresholds that provide the closest
results to the golden standard for each image.
The authors of [13] and [11] presented multi-thresholding approaches based on the FF algorithm and made a comparison with the exhaustive strategy, where the FF's kernel was chosen to be the cross-entropy approach. This type of comparison is limited, since it is only a relative matching between the FF result and the one
As in Fig. 1, we applied a threshold (1 level) to each image. Then, for each possible threshold, the image was segmented and a gradient-based edge detector was applied, returning the boundaries of the regions that were found. Next, the comparison between the newly obtained edge-map and the golden standard is given by Eq. (13). If T = {t1, t2, . . . , tL}, where L = 256, then the optimal threshold topt ∈ T is the one that minimizes Eq. (13). This procedure was then repeated for 2 and 3 levels, remembering that the solution space grows exponentially, since we need |T|² and |T|³ tests for segmenting with 2 and 3 levels respectively.
Being an exhaustive strategy, the algorithm surely returns the optimal results. This means that no other thresholding-based segmentation algorithm can outmatch this algorithm's results, because it searches through all possible threshold combinations in the solution space. Thus, the distance between the exhaustive search and the golden standard is the lowest possible and can be used as a lower bound when minimizing Eq. (13). This strategy is more appropriate than the noise minimization that was proposed in [13] and [11].
If I = {i1, i2, . . . , i300} is the set of 300 images, then for each ij ∈ I we can associate an array Si = [si1, si2, si3], where si1 is the value given by Eq. (13) for the binarization of ij with the optimal topt; si2 is the corresponding value for the multi-thresholding of ij with the optimal thresholds {topt1, topt2} ∈ T; and finally, si3 is the corresponding value for 3 thresholds {topt1, topt2, topt3} ∈ T.
For better visualization of the results, we created a matrix M of size 300 × 3, where each element Mij (1 ≤ i ≤ 300 and 1 ≤ j ≤ 3) is the value of sij ∈ Si associated with the i-th image. Each line i of M was normalized into 3 intensity values L ∈ {0, 128, 255}, so that Mij = 0 if sij = max Si; Mij = 255 if sij = min Si; and Mij = 128 if sij is the median of Si. Figure 5 shows M as a single image with dimensions 300 × 3, resized to 300 × 300 for better visualization.
Thus, for cell (i, j) of M in Fig. 5, the brighter the pixel, the more the image segmented with j thresholds resembles the manually segmented image. The darker the pixel, the greater the difference between them.
Fig. 5 Exhaustive segmentation results (left) and FF Meta-Heuristics segmentation results (right).
Each row represents one of the 300 images from the database. The columns are the results of the
segmentation with 1, 2 and 3 thresholds. For each row, the brighter the column, the more the image
segmented with the corresponding threshold resembles the manually segmented image
The experiments were repeated using the FF segmentation, except that the threshold calculation is done with Algorithm 1. Just like the experiments with the exhaustive search method, we also created a 300 × 3 matrix M with the same properties as the previous one. Comparing the two parts of Fig. 5, it is possible to notice a similarity between them, indicating that the FF results are close to those of the exhaustive method.
7.3 Discussion
Looking closer at Fig. 5, one can perceive a gradient from dark to bright for both methods used, so that, for most rows (images), the columns that correspond to the segmentation with 3 thresholds are brighter (more similar) than the others. This means that, in our experiments, segmenting an image into 4 levels (3 thresholds) generally gives better results than using fewer thresholds. The opposite applies to the first column, which is darker than the other two, meaning that although
Table 1 Comparison between the exhaustive search (BF) and the FF results
Avg. Dist. with GS   Exhaustive search   Firefly algorithm   Difference (%)
FF = BF              21.41               24.15               11.33
FF ≠ BF              21.36               23.69               9.85
Total                21.39               23.93               10.61
Table 2 Quantitative comparison between the exhaustive search (BF) and the FF results
Threshold results Exhaustive search Firefly algorithm
1 Threshold 5 53
2 Thresholds 114 73
3 Thresholds 181 174
1 Threshold (when FF=BF) 1 1
2 Thresholds (when FF=BF) 36 36
3 Thresholds (when FF=BF) 116 116
it is the fastest and easiest way to segment an image, binarization generally produces the worst segmentation results when compared with the results obtained with more thresholds.
In a more detailed analysis, we list in Table 1 a general comparison between the results of the FF meta-heuristics and the exhaustive search (or brute-force, BF). Columns 2 and 3 represent the average distance between the golden standard (GS) and the BF and FF methods respectively. The last column represents the percentage difference between the FF and BF. The first line describes the average distances when the BF and the FF thresholds are equal. The second line lists the distances when the results are not equal. The final line summarizes all the results.
The main observation that can be made from the results listed in Table 1 is that the FF algorithm's results are very close to those of the exhaustive search. The average difference between them is 10.61 %. This shows that even when the FF does not find the optimum thresholds, the segmentation obtained from its result is only 10.61 % away from the desired one. As the number of thresholds grows, the combination of levels tends to a combinatorial explosion that makes the exhaustive search method impossible to calculate. The FF method responds well in these cases, finding a result that is only 10.61 % different from the optimum but with linear processing time.
Table 2 describes a quantitative comparison between the exhaustive search and the FF meta-heuristics. The first line shows how many images were best segmented with 1 threshold by the BF and FF methods (columns 2 and 3 respectively). The second and third lines describe how many images were best segmented with 2 and 3 thresholds respectively. The other three lines describe how many images were best segmented with 1, 2 and 3 thresholds respectively when the FF and BF methods resulted in the same number of thresholds.
From Table 2, it is possible to reaffirm the observations made from Fig. 5: generally, 3 thresholds produce a better segmentation than 2, which in turn is still better than 1 threshold (binarization). This can be explained by the matrix normalization. The brighter the cell Mi,j, the closer the segmentation with j thresholds is to the manual segmentation. So, it is possible to conclude that this approximation improves as the value of j increases. That is, if the goal of the threshold segmentation is to find the threshold set that results in a segmentation that is close to the manual one, then using 3 thresholds is more efficient than 2, which in turn is better than 1. However, one can speculate that beyond 3 thresholds the results tend to get worse, since this leads to over-segmentation of the image. But this is a further investigation outside the current scope.
8 Conclusions
References
1. Pun T (1981) Entropic thresholding: a new approach. Comput Graphics Image Process 16:210–
239
2. Kapur JN, Sahoo PK, Wong AKC (1985) A new method for gray-level picture thresholding
using the entropy of the histogram. Comput. Graphics Image Process 29:273–285
3. Abutaleb AS (1989) Automatic thresholding of gray-level pictures using two-dimensional
entropy. Comput Vision Graph Image Process 47:22–32
4. Li CH, Lee CK (1993) Minimum cross entropy thresholding. Pattern Recognit 26:617–625
5. Pal NR (1996) On minimum cross entropy thresholding. Pattern Recognit 26:575–580
6. Sahoo P, Soltani S, Wong A, Chen Y (1988) A survey of thresholding techniques. Comput Vis
Gr Image Process 41(1):233–260
7. Chang C-I, Du Y, Wang J, Guo S-M, Thouin P (2006, Dec.) Survey and comparative analysis
of entropy and relative entropy thresholding techniques. IEEE Proc, Vis, Image Signal Process
153(6):837–850
8. Albuquerque M, Esquef I, Mello A (2004) Image thresholding using tsallis entropy. J Stat Phys
25:1059–1065
9. Tsallis C (1999, March) Nonextensive statistics: theoretical, experimental and computational
evidences and connections. Braz J Phys 29(1):1–35
10. Yin PY (2007) Multilevel minimum cross entropy threshold selection based on particle swarm
optimization. Appl Math Comput 184:503–513
11. Horng MH, Liou RJ (2011) Multilevel minimum cross entropy threshold selection based on
firefly algorithm. Expert Syst Appl 38:14805–14811
12. Martin D, Fowlkes C, Tal D, Malik J (2001, July) A database of human segmented natural
images and its application to evaluating segmentation algorithms and measuring ecological
statistics. In: Proc. 8th Int’l Conf. Computer Vision, vol 2, pp 416–423
13. Yang XS (2009) Firefly algorithms for multimodal optimization. Stochastic algorithms:
foundations and applications, SAGA 2009. Lecture Notes Computer Science 5792:169–178
14. Erdmann H, Lopes LA, Wachs-Lopes G, Ribeiro MP, Rodrigues PS (2013) A study of firefly
meta-heuristic for multithresholding image segmentation. In: VIpImage: Thematic Conference
on Computational Vision and Medical Image Processing, Ilha da Madeira, Portugal, October,
14 to 16 2013, pp 211–217
15. Lukasik S, Zak S (2009) Firefly algorithm for continuous constrained optimization tasks. In:
1st International Conference on Computational Collective Intelligence, Semantic Web, 5-7
October 2009.
16. Dorigo M (1992) Optimization, learning, and natural algorithms. Ph. D. Thesis, Dipartimento
di Elettronica e Informazione, Politecnico di Milano, Italy
17. Glover F (1989) Tabu search. PART I, ORSA J Comput 1:190–206
18. Kennedy J, Eberhart RC (1995) Particle swarm optimization. In: Proceedings of IEEE
International Conference on Neural Networks, vol IV, pp 1942–1948
19. Goldberg DE (1997) Genetic algorithms in search, optimization, and machine learning.
Addison Wesley, Reading
20. Hassanzadeh T, Vojodi H, Eftekhari AM (2011) An image segmentation approach based on max-
imum variance intra-cluster method and firefly algorithm. In: Seventh International Conference
on Natural Computation, IEEE, Ed., Shanghai, China, pp 1844–1848
21. Shannon C, Weaver W (1948) The mathematical theory of communication. University of Illinois
Press, Urbana
22. Tavares AHMP (2003) Aspectos matemáticos da entropia. Master Thesis, Universidade de
Aveiro
23. Giraldi G, Rodrigues P (2009) Improving the non-extensive medical image segmentation
based on Tsallis entropy. Pattern Anal Appl (submitted)
24. Rodrigues P, Giraldi G (2009) Computing the q-index for Tsallis non-extensive image segmen-
tation. In: XXII Brazilian Symposium on Computer Graphics and Image Processing (SIBGRAPI
2009), SBC (to appear)
25. Sezgin M, Sankur B (2004, Jan) Survey over image thresholding techniques and quantitative
performance evaluation. J Electron Imaging 13(1):146–165
Visual-Inertial 2D Feature Tracking based on an
Affine Photometric Model
1 Introduction
Many different applications in the field of computer vision (CV) require the robust
identification and tracking of distinctive feature points in monocular image sequences
acquired by a moving camera. Prominent examples of such applications are 3D scene
modelling following the structure-from-motion (SfM) principle or the simultane-
ous localisation and mapping (SLAM) for mobile robot applications. The general
procedure of feature point tracking can be subdivided into two distinctive phases:
Fig. 1 Re-identification of a single feature point in two subsequent frames of an image sequence
The first stage for realising this was the development of an inertial smart sensor
system (S 3 ) based on a bank of inertial measurement units in MEMS1 technology.
The S 3 is able to compute the actual absolute camera pose (position and orientation)
for each frame. The hardware employed and the corresponding navigation algorithm
are described in Sect. 2. As a second step a visual feature tracking algorithm, as
described in Sect. 3, needs to be implemented. This algorithm considers prior motion
estimates from the inertial S 3 in order to guarantee a greater convergence region of
the optimisation problem and deliver an improved overall tracking performance. The
results are briefly discussed in Sect. 4. Finally Sect. 5 concludes the whole work and
describes potential future work.
For the implementation of an Inertial Fusion Cell (IFC) a smart sensor system (S 3 ) is
suggested here, which is composed as a bank of different micro-electromechanical
systems (MEMS). The proposed system contains accelerometers, gyroscopes and
magnetometers. All of them are sensory units with three degrees of freedom (DoF).
The S 3 contains the sensors themselves, signal conditioning (filtering) and a multi-sensor data fusion (MSDF) scheme for pose (position and orientation) estimation.
The general architecture of the S 3 is shown in the following Fig. 2, where the overall
architecture contains the main ‘organ’ consisting of the sensory units as described in
Sect. 2.2. A single micro controller is used for analogue-digital-conversion (ADC),
signal conditioning (SC) and the transfer of sensor data to a PC. The actual sensor
fusion scheme is realised on the PC.
2.2 Hardware
1 MEMS—micro-electromechanical systems.
the earth’s magnetic field. All IMU sensors are connected to a micro controller
(ATMega328) which is responsible for initialisation, signal conditioning and com-
munication. The interface between the sensors and the micro controller is based on the I²C bus for the accelerometer and magnetometer, while the gyroscope is directly connected to ADC channels of the AVR. So the sensor setup used consists of three orthogonally arranged accelerometers measuring a three-dimensional acceleration a_b = (a_x, a_y, a_z)^T, normalised with the gravitational acceleration constant g. Here b indicates the actual body coordinate system in which the entities are measured. The triple-axis gyroscope measures the corresponding angular velocities ω_b = (ω_x, ω_y, ω_z)^T around the sensitivity axes of the accelerometers. The magnetometer is used to sense the earth's magnetic field m_b = (m_x, m_y, m_z)^T. Figure 3 shows the general configuration of all sensory units and the corresponding measured entities.
Measurements from MEMS devices in general, and inertial MEMS sensors in particular, suffer from different error sources. Due to this it is necessary to implement both an adequate calibration framework and a signal conditioning routine. The calibration of the sensory units is only possible if a reasonable sensor model is available in advance. The sensor model should address all possible error sources. Here the proposed model from [14] was utilised and adapted for the given context. It contains:
• Misalignment of sensitivity axes—Ideally the three independent sensitivity axes of each inertial sensor should be orthogonal. Due to the imprecise construction of MEMS-based IMUs this is not the case for the vast majority of sensory packages. The misalignment can be compensated by finding a matrix M which transforms the non-orthogonal axes to an orthogonal setup.
• Biases—The output of the gyroscopes and accelerometers should be exactly zero if the S³ is not moved at all. However, there is typically a time-varying offset for real sensors. It is possible to differentiate between g-independent biases (e.g. for gyroscopes) and g-dependent biases. For the latter there is a relation between the applied acceleration and the bias. The bias is modelled by the incorporation of a bias vector b.
• Measurement noise—The general measurement noise has to be taken into account.
The standard sensor model contains a white noise term n.
• Scaling factors—In most cases there is an unknown scaling factor between the measured physical quantity and the real signal. The scaling can be compensated for by introducing a scale matrix S = diag(s_x, s_y, s_z).
A block-diagram of the general sensor model is shown in the following figure (Fig. 4).
Based on this it is possible to define three separate sensor models for all three sensor types², as shown in the following equations:

ω̃_b = M_g · S_g · ω_b + b_g + n_g   (1)
ã_b = M_a · S_a · a_b + b_a + n_a   (2)
m̃_b = M_m · S_m · m_b + b_m + n_m   (3)

² The different sensor types are indicated by the subscript indices of the entities in the different equations.
It was shown that M and S can be determined by a sensor calibration procedure in which the sensor array is moved to different known locations to determine the calibration parameters. Due to their time-varying character, the noise and bias terms cannot be determined a priori. The signal conditioning step on the μC takes care of the
measurement noise by integrating an FIR digital filter structure. The implementation
realises a low-pass FIR filter based on the assumption that the frequencies of the
measurement noise are much higher than the frequencies of the signal itself. The
complete filter was realised in software on the μC, where the cut-off-frequencies for
the different sensory units were determined by an experimental evaluation.
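To make the sensor model concrete, the following minimal Python sketch applies the inverse of Eqs. (1)–(3) to a raw sample and a simple software FIR low-pass for the signal conditioning step. All numerical values (calibration matrices, bias estimate, filter taps) are illustrative assumptions, not the values used on the actual S³.

```python
import numpy as np

def correct_measurement(raw, M, S, b):
    """Invert the sensor model raw = M·S·x + b + n for one sample
    (the noise term n is handled separately by low-pass filtering)."""
    return np.linalg.solve(M @ S, raw - b)

def fir_lowpass(signal, taps):
    """Simple FIR low-pass filter, as realised in software on the uC."""
    taps = np.asarray(taps, dtype=float)
    taps /= taps.sum()
    return np.convolve(signal, taps, mode="same")

# Illustrative calibration values for the gyroscope channel.
M_g = np.eye(3)                       # misalignment matrix (ideal case: identity)
S_g = np.diag([1.02, 0.98, 1.01])     # scale factors
b_g = np.array([0.01, -0.02, 0.005])  # (slowly varying) bias estimate
omega_raw = np.array([0.12, -0.03, 0.30])
omega = correct_measurement(omega_raw, M_g, S_g, b_g)
```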
Classical approaches for inertial navigation are stable-platform systems which are
isolated from any external rotational motion by specialised mechanical platforms.
In comparison to those classical stable platform systems, the MEMS sensors are
mounted rigidly to the device (here: the camera). In such a strapdown system,
it is necessary to transform the measured quantities of the accelerometers into a global coordinate system by using known orientations computed from gyroscope measurements. In general the system-level mechanisation of a strapdown inertial navigation system (INS) can be described by the computational elements indicated in Fig. 5. The main problem with this classical framework is that location
is determined by integrating measurements from gyros (orientation) and accelerom-
eters (position). Due to superimposed sensor drift and noise, which is especially
significant for MEMS devices, the errors for the egomotion estimation tend to grow
unbounded.
The necessary computation of the orientation ξ of the S³ based on the gyroscope measurements ω_b and a start orientation ξ(t₀) can be described as follows:

ξ = ξ(t₀) + ∫ ω_b dt   (4)
Possible errors in the orientation estimation stage would also lead to a wrong position, due to the necessity to transform the accelerations in the body coordinate frame a_b to the inertial reference frame (here indicated by the subscript i).
The following figure (Fig. 7) demonstrates the typical drifting error for the absolute
position (one axis) computed by using the classical strapdown methodology.
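The unbounded error growth can be reproduced with a few lines of Python: integrating a stationary accelerometer signal that carries only a small constant bias and white noise already produces a position error of tens of metres within a minute. The bias and noise levels below are illustrative, not measured values of the S³.

```python
import numpy as np

dt = 0.01                               # 100 Hz sampling
t = np.arange(0.0, 60.0, dt)            # one minute of data
bias = 0.02                             # m/s^2, uncompensated accelerometer bias
noise = np.random.normal(0.0, 0.05, t.size)

a_measured = bias + noise               # true acceleration is zero (sensor at rest)
v = np.cumsum(a_measured) * dt          # first integration: velocity error
p = np.cumsum(v) * dt                   # second integration: position error

print(f"position error after 60 s: {p[-1]:.1f} m")   # grows roughly with t^2
```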
By using only gyroscopes, there is actually no way to control the drifting error for
the orientation in a reasonable way. It is necessary to use other information channels.
So the final framework for pose estimation considers two steps: an orientation esti-
mation and a position estimation as shown in Fig. 8. In comparison to the classical
strapdown method, the suggested approach here incorporates also the accelerometers
for orientation estimation. The suggested fusion network is given in the following
figure, and the different sub-fusion processes are described in Sects. 2.5 and 2.6.
The general idea for compensating the drift error of the gyroscopes is based on
using the accelerometers as an additional attitude sensor. Due to the fact that the
3-DoF accelerometer measures not only (external) translational motion, but also
the influence of the gravity, it is possible to calculate the attitude based on the
single components of the measured acceleration. At this point it should be noted that
measurements from the accelerometers can only provide the roll and pitch angles. Thus,
the heading angle of the unit has to be derived by using the magnetometer instead.
Fig. 6 Drifting error for orientation estimates based on gyroscope measurements, for two separate experiments
Fig. 7 Drifting error for absolute position estimates based on classical strapdown mechanisation of an inertial navigation system (left: acceleration measurements;
φ = arctan2(a_y, √(a_x² + a_z²))   (7)
The missing heading angle can be obtained by using the readings from the magnetometer and the already determined roll and pitch angles. Here it is important to be aware that the measured elements of the earth's magnetic field have to be transformed to the local horizontal plane (tilt compensation is illustrated in Fig. 10), as indicated in the corresponding relations:

X_h = m_x · cφ + m_y · sθ · sφ − m_z · sθ · sφ
Y_h = m_y · cθ + m_z · sθ   (8)
ψ = arctan2(Y_h, X_h)
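A compact sketch of the attitude and heading computation is given below. The roll formula follows Eq. (7); the pitch relation and the exact tilt-compensation coefficients depend on the axis conventions (cf. Eq. (8) and Fig. 10), so the widely used form below should be read as an assumption rather than the authors' exact implementation.

```python
import numpy as np

def roll_pitch_from_accelerometer(a):
    """Roll (Eq. 7) and pitch from a gravity-dominated reading a = [ax, ay, az]."""
    ax, ay, az = a
    roll = np.arctan2(ay, np.sqrt(ax**2 + az**2))
    pitch = np.arctan2(-ax, np.sqrt(ay**2 + az**2))   # assumed convention
    return roll, pitch

def tilt_compensated_heading(m, roll, pitch):
    """Project the magnetometer reading m = [mx, my, mz] onto the horizontal
    plane (tilt compensation) and return the heading psi = arctan2(Yh, Xh)."""
    mx, my, mz = m
    xh = (mx * np.cos(pitch) + my * np.sin(roll) * np.sin(pitch)
          - mz * np.cos(roll) * np.sin(pitch))
    yh = my * np.cos(roll) + mz * np.sin(roll)
    return np.arctan2(yh, xh)
```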
Fig. 11 a Discrete Kalman filter (DKF) for estimation of roll and pitch angles based on gyroscope
and accelerometer measurements. b DKF for estimation of yaw (heading) angle from gyroscope
and magnetometer measurements
ω̄_{k+1} = ω_{k+1} − b_gyro,k
ξ_{k+1} = ξ_k + ω̄_{k+1} · dt   (9)
b_gyro,k+1 = b_gyro,k
Here the actual measurements from the gyroscopes ω_{k+1} are corrected by the currently estimated bias b_gyro,k from the former iteration, before the actual angle ξ_{k+1} is computed.
Computation of the a priori error covariance matrix P⁻_{k+1}
The a priori covariance matrix is calculated by incorporating the Jacobian matrix A of the states and the process noise covariance matrix Q_k as follows:

P⁻_{k+1} = A · P_k · Aᵀ + Q_k   (10)
The two steps (1) and (2) are the elements of the prediction step as indicated in
Fig. 11.
Computation of the Kalman gain K_{k+1}
As a prerequisite for computing the a posteriori state estimate, the Kalman gain K_{k+1} has to be determined following Eq. 11.

K_{k+1} = P⁻_{k+1} · Hᵀ_{k+1} · (H_{k+1} · P⁻_{k+1} · Hᵀ_{k+1} + R_{k+1})⁻¹   (11)
Computation of the a posteriori state estimate x⁺_{k+1}
The state estimate can now be corrected by using the calculated Kalman gain K_{k+1}. Instead of incorporating the actual measurements as in the classical Kalman structure, the suggested approach is based on the computation of an angle difference Δξ. This difference compares the angle calculated from the gyroscope measurements with the corresponding attitude derived from the accelerometers (or, for the heading angle, from the magnetometer), as introduced earlier in this chapter. So the relation for x⁺_{k+1} can be formulated as:

x⁺_{k+1} = x⁻_{k+1} − K_{k+1} · Δξ   (12)
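The per-axis filter of Fig. 11 can be sketched as follows, with the state x = [angle, gyro bias]. The covariance update line is the standard Kalman completion, which the chapter does not spell out explicitly, and dt, Q and R are illustrative tuning values.

```python
import numpy as np

dt = 0.01
A = np.array([[1.0, -dt],    # angle_{k+1} = angle_k + (omega_{k+1} - bias_k)·dt  (Eq. 9)
              [0.0, 1.0]])   # bias_{k+1}  = bias_k
H = np.array([[1.0, 0.0]])   # the correction observes the angle difference
Q = np.diag([1e-5, 1e-7])
R = np.array([[1e-2]])

def dkf_step(x, P, omega, angle_reference):
    """One prediction/correction cycle following Eqs. (9)-(12).
    angle_reference is the attitude from the accelerometers (or the heading
    from the magnetometer); x = [angle, bias]."""
    # Prediction (Eqs. 9 and 10)
    x_pred = np.array([x[0] + (omega - x[1]) * dt, x[1]])
    P_pred = A @ P @ A.T + Q
    # Kalman gain (Eq. 11)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)
    # Correction with the angle difference (Eq. 12)
    delta_xi = x_pred[0] - angle_reference
    x_new = x_pred - (K * delta_xi).ravel()
    P_new = (np.eye(2) - K @ H) @ P_pred   # standard covariance update (not spelled out above)
    return x_new, P_new
```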
At this point it is important to consider the fact that the attitude measurements from the accelerometers are only reliable if there is no external translational motion. Thus an external acceleration detection also needs to be part of the fusion procedure. For this reason the following condition (see Rehbinder et al. [12]) is evaluated continuously:
‖a‖ = √(a_x² + a_y² + a_z²) = 1   (13)
If the relation is fulfilled, there is no external acceleration and the estimation of the attitude from the accelerometers is more reliable than the one computed from the rotational velocities provided by the gyroscopes. For real sensors, a threshold ε_g is introduced to define an allowed variation from this ideal case. If the camera is not at rest, the observation variance for the gyroscope data σ_g² is set to zero. By representing the magnitude of the acceleration measurements as a and the earth's gravitational field as g = [0, 0, −g]ᵀ, the observation variance can be defined by the following Eq. 14.
σ_g² = σ_g²,  if ‖a − g‖ < ε_g;  0, otherwise   (14)
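In code, the gating of Eqs. (13) and (14) amounts to a norm check on the (g-normalised) acceleration; the threshold ε_g, the variance σ_g² and the gravity axis convention below are assumptions for illustration.

```python
import numpy as np

G_N = np.array([0.0, 0.0, -1.0])   # gravity direction, normalised by g (assumed axis convention)

def accel_attitude_variance(a, sigma_sq, eps_g):
    """Eq. (14): keep the accelerometer observation variance sigma_sq only while
    the measured (g-normalised) acceleration stays close to gravity, i.e. no
    external translational acceleration is present; otherwise set it to zero."""
    return sigma_sq if np.linalg.norm(np.asarray(a) - G_N) < eps_g else 0.0
```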
At this point the orientation of the camera is known by following the classical strapdown approach. Hence, the position p can only be obtained by double integration of the body accelerations a, when a known orientation Ξ = [φ θ ψ]ᵀ is available that allows a rotation from the body frame B to the reference (or navigation) frame N by using the direction cosine matrix (DCM) C_b^n, defined as follows⁴:
C_b^n = ⎡ cθcψ   sφsθcψ − cφsψ   cφsθcψ + sφsψ ⎤
        ⎢ cθsψ   sφsθsψ + cφcψ   cφsθsψ − sφcψ ⎥   (18)
        ⎣ −sθ    sφcθ            cφcθ          ⎦
³ m_des describes the magnitude of the earth's magnetic field (e.g. 48 μT in Western Europe).
⁴ For simplification: sα = sin(α) and cβ = cos(β).
C_b^n(q) = 1/(q₄² + ‖e‖²) · ⎡ q₁² − q₂² − q₃² + q₄²   2(q₁q₂ + q₃q₄)          2(q₁q₃ − q₂q₄)         ⎤
                            ⎢ 2(q₁q₂ − q₃q₄)          −q₁² + q₂² − q₃² + q₄²  2(q₂q₃ + q₁q₄)         ⎥   (19)
                            ⎣ 2(q₁q₃ + q₂q₄)          2(q₂q₃ − q₁q₄)          −q₁² − q₂² + q₃² + q₄² ⎦
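The following sketch builds C_b^n from Eq. (18) and performs one strapdown position step by rotating the body acceleration into the navigation frame, compensating gravity and integrating twice. The gravity vector, units and sign convention are assumptions for illustration, not the chapter's exact implementation.

```python
import numpy as np

def dcm_body_to_nav(phi, theta, psi):
    """Direction cosine matrix C_b^n of Eq. (18) (s = sin, c = cos)."""
    sph, cph = np.sin(phi), np.cos(phi)
    sth, cth = np.sin(theta), np.cos(theta)
    sps, cps = np.sin(psi), np.cos(psi)
    return np.array([
        [cth * cps, sph * sth * cps - cph * sps, cph * sth * cps + sph * sps],
        [cth * sps, sph * sth * sps + cph * cps, cph * sth * sps - sph * cps],
        [-sth,      sph * cth,                   cph * cth],
    ])

def position_step(p, v, a_body, euler, dt, g=np.array([0.0, 0.0, -9.81])):
    """One strapdown update (cf. Fig. 5): rotate the measured body acceleration
    into the navigation frame, compensate gravity and integrate twice."""
    a_nav = dcm_body_to_nav(*euler) @ a_body + g   # gravity compensation (assumed sign)
    v_new = v + a_nav * dt
    p_new = p + v_new * dt
    return p_new, v_new
```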
The photometric model is illustrated by Fig. 13, where a light source Λ illuminates
a scene and the emitted light is reflected by the main surface S to the image plane
Π , which is modelled by parameter σ .
Fig. 13 Illustration of the photometric model with light rays reflected by the surface of the main
object and reflectance from other objects
Fig. 14 Prototype of a visual-inertial sensor for VIFtrack!
Due to reflectance from other objects (ambient light sources) there are additional
rays, which also change the intensity of an image pixel (parameter o). Due to the fact
that the photometric motion cannot be estimated by using the inertial measurements,
the corresponding values from the former frame are used as initial parameters for the
optimisation. After the warping of the descriptors the optimisation process for each
feature in X starts. For this optimisation, the following term needs to be minimized:
e = min_{p_{k+1}} Σ_{x∈ν} [ Ω_θ( I_k(x_i) ) − I_{k+1}(x) ]²   (21)
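As a rough illustration of Eq. (21), the sketch below evaluates one common form of an affine-photometric residual: the patch around a feature in frame k is warped with an affine map (whose initial guess would come from the inertial motion estimate) and adjusted with a gain/offset pair carried over from the previous frame, then compared against frame k+1. The exact parameterisation Ω_θ used by the authors is defined earlier in the chapter, so this is an assumption-laden sketch rather than their implementation.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinear interpolation of the image intensity at a float position (x, y)."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    dx, dy = x - x0, y - y0
    w = np.array([[(1 - dx) * (1 - dy), dx * (1 - dy)],
                  [(1 - dx) * dy,       dx * dy]])
    return float(np.sum(img[y0:y0 + 2, x0:x0 + 2] * w))

def affine_photometric_residual(I_k, I_k1, center, offsets, A, d, gain, offset):
    """Sum of squared differences between the affinely warped, photometrically
    adjusted patch of frame k and the patch of frame k+1 (one common form of Eq. 21)."""
    cx, cy = center
    err = 0.0
    for ox, oy in offsets:                       # offsets define the patch window nu
        wx, wy = A @ np.array([ox, oy]) + d      # geometric (affine) warp
        warped = bilinear(I_k, cx + wx, cy + wy)
        ref = float(I_k1[cy + oy, cx + ox])
        err += ((1.0 + gain) * warped + offset - ref) ** 2   # photometric adjustment
    return err
```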
4 Results
The approach was evaluated by using a visual-inertial prototype (as shown in Fig. 14)
which combines a standard industrial camera and the inertial smart sensor system. A
microcontroller located on the S³ is responsible for synchronising camera and IMU
data.
An industrial robot was used in order to generate measurements with known mo-
tion, which can be used as ground truth sequences. Due to the fact that the background
of the project is the area of 3D modelling, the used sequences contain only single objects and a uniform background. The following figure illustrates exemplary frames of a typical sequence (Fig. 15).
⁵ For this a simple first-order Taylor expansion of the minimisation term is used.
We tested different motion patterns and optimised the corresponding parameters of the algorithm in order to produce the best results. It was found that, especially for high
rotational velocities of the camera the VIFtrack! approach is able to outperform other
feature tracking methods. Due to the fact that classical methods, such as the KLT-
tracker from [11], utilise a purely translational model it is quite clear that especially a
rolling camera leads to non-converging behaviour for many feature points. Figure 16
shows a typical motion pattern (slow camera speed) which we used for the evaluation.
The suggested scheme can increase the number of successfully tracked features6 up
to 60 % in comparison to classical KLT for sequences with a rolling camera.
Figure 17 shows a comparison of the tracking performance for the VIFtrack!-
method and the same principle (affine-photometric warping) only based on visual
information for a given sequence. The mean number of successfully tracked features
increases from 74 for visual-alone feature tracking up to 91 for the VIFtrack! scheme
respectively. Especially for applications where a specific number of corresponding
features is necessary (e.g. visual odometry) the VIFtrack!-method is useful, because
while the visual-alone feature tracker loses up to 54 % of its feature points, VIFtrack!
loses only up to 21 %.
The algorithm was also tested for a hand-held camera which was moved through
an indoor environment. Figure 18 shows two typical examples for the tracking of
features between two subsequent frames of the sequence. This sequence is more
complex because the camera is freely moving within an indoor environment and no
⁶ Here a successfully tracked feature is a feature which is not rejected based on the error threshold e_limit.
Fig. 16 Typical motion pattern for the evaluation describing rotation around the three Euler angles. Black: ground-truth motion from the industrial robot (IRB), red: measured angles from inertial measurements (IMU), green: estimated angles from fusing inertial and visual motion estimates (EKF)
Fig. 17 Performance comparison between VIFtrack! and affine-photometric warping only based
on visual information for the “object” sequence
Fig. 18 Two examples for subsequent feature tracking results for the sequence gathered from a
hand-held camera moved within an indoor environment
feature detected initially, within the first frame, remains visible for the entire sequence. For evaluating the VIFtrack! procedure a simple routine was introduced, which generates a set of feature candidates ¹X from the first frame. During the motion of the camera the number of successfully tracked features n decreases over time. Once n reaches a certain threshold, the algorithm generates a new set of feature candidates ᵏX from the actual frame k of the sequence. This simple procedure should prevent the tracking algorithm from losing track completely. The following table (Table 1) shows how often the algorithm generates a new set of feature candidates for the visual-inertial approach r_VI and classical KLT r_KLT.
Table 1 Comparison of the number of reinitialisations of feature candidates for VIFtrack! and classical KLT

n     r_VI   r_KLT   (r_KLT − r_VI)/r_VI (%)
100   13     18      38
80    16     23      44
60    21     31      48
40    35     53      51
20    44     75      70
It can be seen from Table 1 that the VIFtrack! scheme is able to reduce the number of necessary re-initialisations of feature candidates due to the more robust feature tracking. Especially for a small number of initial feature candidates the visual-inertial feature tracking outperforms classical KLT.
5 Conclusion
The general problem of tracking a point feature throughout an image sequence ac-
quired by a moving camera requires the implementation of an algorithm which is
able to model the change of the visual appearance of each feature over time. The state-of-the-art motion model used for feature tracking is an affine-photometric warping
model, which models both changes in geometry and photometric conditions. For
camera movements which involve high rotational velocities the 2D displacement of
a point feature between two successive frames will increase dramatically. This leads
to a non-converging behaviour of the minimisation problem, which adjusts a set of
parameters in order to find the optimal match of the corresponding feature.
The usage of motion estimates, generated by an inertial smart sensor system as
initial estimates for the motion model, leads to an increasing number of feature
points, which can be successfully tracked throughout the whole sequence.
Future work will look into the possibility of fusing different motion estimates from
visual and inertial cues, which would hopefully lead to a higher robustness against
incorrect inertial measurements. For this, vision-based relative pose estimators need to be evaluated to assess their accuracy (see Aufderheide et al. [4]).
References
1. Aufderheide D, Krybus W (2010) Towards real-time camera egomotion estimation and three-
dimensional scene acquisition from monocular image streams. In: Proceedings of the 2010
international conference on Indoor Positioning and Indoor Navigation (IPIN 2010). Zurich,
Switzerland, September, 15–17 2010, pp 1–10. IEEE – ISBN 978-1-4244-5862-2
2. Aufderheide D, Steffens M, Kieneke S, Krybus W, Kohring C, Morton D (2009) Detection
of salient regions for stereo matching by a probabilistic scene analysis. In: Proceedings of
the 9th conference on optical 3-D measurement techniques. Vienna, Austria, July, 1–3 2009,
pp 328–331. ISBN 978-3-9501492-5-8
3. Aufderheide D, Krybus W, Dodds D (2011) A MEMS-based smart sensor system for estimation
of camera pose for computer vision applications. In: Proceedings of the University of Bolton
Research and Innovation Conference 2011, Bolton, U.K., June, 28–29 2011, The University
of Bolton Institutional Repository
4. Aufderheide D, Krybus W, Witkowski U, Edwards G (2012) Solving the PnP problem for visual
odometry—an evaluation of methodologies for mobile robots. In: Advances in autonomous
robotics—joint proceedings of the 13th annual TAROS conference and the 15th annual FIRA
RoboWorld Congress Bristol, UK, August 20–23, pp 461–462
5. Harris C, Stephens M (1988) A combined corner and edge detector. In: Proceedings of the 4th
Alvey vision conference, pp 147–151
6. Hwangbo M, Kim JS, Kanade T (2009) Inertial-aided KLT feature tracking for a moving
camera. In: 2009 IEEE/RSJ international conference on intelligent robots and systems. St.
Louis, USA, pp 1909–1916
7. Hwangbo M, Kim JS, Kanade T (2011) Gyro-aided feature tracking for a moving camera:
fusion, auto-calibration and GPU implementation. Int J Robot Res 30(14):1755–1774
8. Jin H, Favaro P, Soatto S (2001) Real-time feature tracking and outlier rejection with changes
in illumination. In: Proceedings of the International Conference on Computer Vision (ICCV),
July 2001
9. Juan L, Gwun O (2009) A comparison of SIFT, PCA-SIFT and SURF. Int J Image Process
(IJIP) 3(4):143–152. CSC Journals
10. Kim J, Hwangbo M, Kanade T (2009) Realtime affine-photometric KLT feature tracker on
GPU in CUDA framework. The fifth IEEE workshop on embedded computer vision in ICCV
2009, Sept 2009, pp 1306–1311
11. Lucas B, Kanade T (1981) An iterative image registration technique with an application to stereo vision. In: International joint conference on artificial intelligence, pp 674–679
12. Rehbinder H, Hu X (2004) Drift-free attitude estimation for accelerated rigid bodies.
Automatica 40(4):653–659
13. Sabatini A (2006) Quaternion-based extended Kalman filter for determining orientations by
inertial and magnetic sensing. IEEE Trans Biomed Eng 53(7):1346–1356
14. Skog I, Haendel P (2006) Calibration of a MEMS inertial measurement unit. In: Proceedings of the XVIII IMEKO world congress on metrology for a sustainable development
15. Tomasi C, Shi J (1994) Good features to track. In: IEEE computer vision and pattern recognition
1994
16. Welch G, Bishop G (2006) An introduction to the Kalman filter, Technical Report TR 95-041.
Department of Computer Science, University of North Carolina at Chapel Hill
Inferring Heading Direction from Silhouettes
Abstract Due to the absence of features that can be extracted from the face, heading direction estimation for low-resolution images is a difficult task. For such images, estimating heading direction requires taking into account all the information that can be inferred from the human body in the image, particularly its silhouette. We propose in this paper a set of geometric features extracted from the head-shoulders, feet and knee shapes which jointly allow the estimation of body direction. Other features extracted from the head-shoulders shape are proposed for the estimation of heading direction based on body direction. The constraint on camera position related to the proposed features is discussed and the results of the experiments conducted are presented.
1 Introduction
Heading direction estimation is one of the challenging tasks for computer vision researchers, especially in the case of low-resolution images. For high- and medium-resolution images, many approaches have been proposed to solve this problem. A survey may be found in [11]. All of these approaches try to find the most discriminative set of facial features which permit estimating the pose. The objective for any proposed technique is to satisfy a set of criteria such as: Accuracy, Monocular, Autonomous, Multi-person, Identity and Lighting invariant, Resolution independent, Full range of head motion and Real time [11].
Face extraction in low-resolution images is an important task in the process of heading direction estimation. Few works have been devoted to this purpose, and all have difficulties detecting faces when the resolution of the images decreases [18]: labelled training examples of head images are used to train various types of classifiers such as support vector machines, neural networks, nearest neighbour and tree-based classifiers [3, 4, 13]. The disadvantage of these methods is the requirement of all combinations of lighting conditions and skin/hair colour variations in order to achieve an accurate classification.
Contextual features have been used in addition to visual ones in order to improve the quality of heading direction estimation [1, 8, 9]. Using multiple camera views, Voit et al. [17] estimate head pose for low-resolution images by an appearance-based method. The head size varies around 20 × 25 pixels and the obtained results are satisfactory due to the use of multiple cameras. Additional contextual information (multiple calibrated cameras and a specific scene) allows estimating absolute coarse head pose for wide-angle overhead cameras by integrating the 3D head position [16].
The head-shoulders shape has been studied and many methods have been proposed for the purpose of human detection in images, using a wavelet decomposition technique and support vector machine [14] or a background subtraction algorithm [12]. On the other hand, the head-shoulders shape has been used for human tracking and head pose estimation. In [12], the direction of head movements is detected and tracked throughout video frames. Templates are captured for a specific position of the camera (mounted sufficiently high above to provide a top view of the scene) and do not cover all positions of the head pose. Shape context is used, but this descriptor is sensitive to the locations of the pixels of the shape outline.
Another important feature that may contribute to heading direction estimation is the leg shape. The use of detectors on the lower parts of the body has been introduced in many works for human body pose calculation and human action recognition [15]. Leg shapes have also been used for human segmentation. Lin et al. [10] modelled the parts of the body, particularly the legs, in order to detect and segment humans. The proposed approach is based on hierarchically matching part-template tree images, initially proposed and used in [6, 7].
The problem of heading direction estimation for low-resolution images without adding contextual information still requires more contributions in order to deal with complex scenes where humans are relatively far from the camera. The performance of the proposed methods is principally limited because they are based on features extracted from the head, which are very dependent on camera placement, and because the chosen texture and skin colour models depend on the resolution of the head in the image and therefore do not work for lower resolutions.
In this paper, we investigate what can be done with the head-shoulders and leg shapes for heading direction estimation in the case of low-resolution images. First, a set of features is extracted from the head-shoulders and leg shapes and used for inferring body direction. Next, heading direction is estimated using body direction and features extracted from the head-shoulders shape. Section 2 covers the theoretical aspects of body and heading direction estimation based on features extracted from the head-shoulders and leg shapes. Experiments were conducted to validate our approach and the obtained results are presented in Sect. 3.
Assuming that silhouettes of humans are extracted from low-resolution images, our aim is to estimate body and heading directions. Geometric features are extracted from the silhouette due to the absence of other features that could be extracted from the face in such images. We focus in this paper on the head, shoulder, knee and foot shapes, which may be considered good features to achieve this task. Body direction is first estimated using features extracted from the head-shoulders, knee and foot shapes. Second, heading direction is inferred from the estimated body direction and features of the head-shoulders shape.
The leg shape is a part of the human silhouette which plays a dominant role in the process of inferring body direction from an image. Indeed, our visual system is able to infer body direction seeing only the outline of the leg shapes (see Fig. 1). We propose three determinant cues of the leg and head-shoulders shapes that allow inferring body direction when they are extracted from the outline shape. These features cannot be computed for a fixed top-down camera because the head-shoulders are merged with the body silhouette.

The first one is the inflection of the knees. When a leg is well separated from the other and the knee is inflected, a coarse body direction can be inferred without ambiguity. Figure 2a illustrates an example of leg shapes where the feet are cut off. Our visual system can easily give an estimate of body direction because the feet have limited possibilities of poses due to the geometry of one leg (high inflexion). Figure 2b illustrates the correct poses and the directions that can be inferred using the foot shapes, whereas Fig. 2c shows impossible situations. The directions of the lines joining the inflexion points of the same leg are used to infer the body direction.
The second one is the direction of the foot shape. Indeed, our visual system encounters difficulties when looking at leg shapes without feet and cannot estimate body direction for many configurations, even if the body is moving and the legs are well separated but without inflexion of the knees. For example, looking at the outlines of Fig. 3a, without
feet we cannot recognise in which direction the body is moving. This ambiguity becomes clear when looking at the original shapes (see Fig. 3b) and at the new shapes obtained by drawing the feet (see Fig. 3c). The base lines of the feet are good features because they indicate the body direction. Their use is explained in Sect. 2.2.

The third feature concerns the variation of the silhouette's width along the head-shoulders shape and the length of each shoulder. The ratio of the width of the upper part (head) to that of the lower part (shoulders), together with the varying shoulder lengths, is related to the angle of rotation. We noticed that there is an inverse relationship between the ratio and the orientation angle.
Body Direction Estimation Using Feet's Features: This task consists in splitting the lower human shape into separated legs, separated lower legs or grouped legs (the first two cases include the case where the knee of one leg is inflected). We associate to each foot a base line defined by two extremities of the foot located between the heel and the toes. The outline of the lower part is processed in order to determine the base line of each foot located between the heel and the toes. Firstly, the high-convexity points Cv1 and Cv2 characterising the foot outline are located (see Fig. 4). Secondly,
the last point of interest Cc, representing a high concavity on this outline, is located such that the distance CcCv2 is minimal. The convex point that represents the toes will be the closest point to the concave point of the foot outline; the other convex point will obviously correspond to the heel. Thus the base line joins the two convexities of the foot and the orientation of the foot corresponds to the vector carried by the foot base line.
Applying the 2D quasi-invariant, the angle between two vectors measured in 3D space varies slowly in the image as the viewpoint varies [2]. As the disposition of the foot vectors in the scene is restricted by human physical constraints, the same holds in the image plane; the body direction is inferred as the average of the foot directions. Once the base lines of the feet are extracted, the body orientation is computed as the resultant vector of the two orientations (see Fig. 5a). When one foot is not placed on the ground, which corresponds to a high inflection of the knee, the resultant vector will have the direction of the base line of the other foot (see Fig. 5b).
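A minimal sketch of this rule, assuming each foot base line has already been extracted as a 2D vector in image coordinates:

```python
import numpy as np

def body_direction_from_feet(foot_vectors):
    """Body direction (degrees, image coordinates) as the resultant of the
    normalised foot base-line vectors; with a single grounded foot the
    resultant is simply the direction of that foot's base line."""
    unit = [np.asarray(v, float) / np.linalg.norm(v) for v in foot_vectors]
    rx, ry = np.sum(unit, axis=0)
    return np.degrees(np.arctan2(ry, rx))

# Two feet pointing roughly towards the right of the image:
print(body_direction_from_feet([(1.0, 0.1), (0.9, -0.05)]))
```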
Body Direction Estimation Using Knee's Features: Extraction of inflection points consists in finding the best concave or convex pixels of the lower part of the silhouette using Chetverikov's algorithm [5]. Among the selected inflection points p, the point p* which is the farthest from the line joining p− and p+ is chosen. The position of p− and p+ relative to p* is a parameter (see Fig. 6).
Many types of knee inflexion may be located (see Fig. 7). The direction of the body follows the direction of the inflected knee, considered as the direction of the
Fig. 7 Some cases of knee inflexion and the inferred direction of them
line joining the concave point to the convex one. Only the left-to-right direction and its inverse are considered.
Body Direction Estimation Using Head-Shoulders Features: Applying the algorithm of D. Chetverikov [5], the two concave points (left and right) delineating the head and the two convex points (left and right) at the extremities of the shoulders are located. The head is separated by locating the pixel having the minimum angle among the selected point candidates. The two convex pixels are located based on high curvature. Each pixel is characterised by the fact that it is the farthest from the line (L) connecting the beginning of the shoulder and the end pixel of the head-shoulders outline (see Fig. 8).

When the human is in the centre of the field of view of the camera, the averages of the computed ratios Rw (ratio of the widths of head and shoulders) are given by Table 1, and Fig. 9 illustrates an example corresponding to the rotation of a person towards the left using the head-shoulders ratio Rw.
We assume now that body direction has been estimated based on the three features proposed above (head-shoulders, knee inflexion and feet). In order to estimate the heading direction, we base our approach on two features extracted from the head-shoulders outline.
Features Extraction The first feature concerns the lengths of the shoulders SL and SR on the head-shoulders shape. In some cases, the end of the neck is not visible on one side due to head occlusion. In this case, it is replaced by the point of high curvature on the head-shoulders outline.
The lengths of the shoulders are important cues for both head and body direction estimation, and the difference between the lengths of SL and SR arises from one of the following configurations:
• Depending on the camera and body positions, the head can occlude a part of one shoulder and thus decrease the shoulder length. For example, when the camera is on top at the right or at the left of the person (see Fig. 10).
• When the human body is rotating, one of the shoulders becomes less visible. This occurs for example when the camera is on top, even if the person is in front of the camera. In this case, the length of one shoulder decreases until the two sides of the head-shoulders shape no longer correspond to the shoulders.
Consequently, when the direction of the body and head is frontal to the camera, the lengths L(SL), L(SR) of the shoulders are identical. Otherwise, when the head is rotating or when the body is at the lateral side of the camera, this equality is not verified because
Fig. 11 Intersection of shoulders in the cases where a body and head are in front, b body and head rotating
Fig. 12 Different poses of the head where dR, dL are illustrated in blue and red in the case where the human is in the centre of the field of view
in both cases the head occludes a part of one shoulder (see Fig. 10). We proved geometrically that, without occlusion by the head, the length of one shoulder decreases when the body is rotating.
The second feature, which complements the first one, concerns the occluded parts of the shoulders, which permit estimating the head rotation. Let I be the intersection point of the lines joining the extremities of the shoulders SL and SR (see Fig. 11). When the body and head are frontal to the camera, the distances dL and dR from I to the shoulders are identical in the scene and in the image plane. However, when the head or body is rotating, these distances differ in the image because a part of one shoulder is occluded by the head, and thus in the image the distance dL or dR includes the occluded segment of the shoulder and a part of the neck. The distances dL, dR will be used to infer the heading direction.
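The two cues ΔL and Δd can be computed directly from the four extremities of the shoulder segments extracted from the head-shoulders outline. The sketch below assumes that dL and dR are taken from the intersection point I to the outer shoulder extremities; the decision rules of Table 2 are not reproduced here.

```python
import numpy as np

def line_intersection(p1, p2, p3, p4):
    """Intersection of line p1-p2 with line p3-p4 (homogeneous 2D formulation)."""
    l1 = np.cross([*p1, 1.0], [*p2, 1.0])
    l2 = np.cross([*p3, 1.0], [*p4, 1.0])
    x, y, w = np.cross(l1, l2)
    return np.array([x / w, y / w])            # parallel lines (w == 0) not handled

def shoulder_cues(neck_l, tip_l, neck_r, tip_r):
    """Return (delta_L, delta_d): the shoulder length difference and the difference
    of the distances from the intersection point I to the shoulder extremities."""
    delta_L = (np.linalg.norm(np.subtract(tip_l, neck_l))
               - np.linalg.norm(np.subtract(tip_r, neck_r)))
    I = line_intersection(neck_l, tip_l, neck_r, tip_r)
    delta_d = (np.linalg.norm(I - np.asarray(tip_l, float))
               - np.linalg.norm(I - np.asarray(tip_r, float)))
    return delta_L, delta_d
```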
Coarse Estimation of Head Direction Heading direction is estimated assuming that, in previous steps, the body orientation, the difference ΔL between the lengths of the shoulders (SL) and (SR) and the difference Δd between the distances dL and dR have been computed. We distinguish three cases: the body is in the centre, at the left, or at the right of the field of view. For the first two cases, we give in Table 2 the heading directions obtained by applying a geometric reasoning depending on the values of ΔL and Δd and the body direction. The third case is symmetrical to the second one. Figure 12 illustrates the variation of ΔL and Δd in the case where the human is in the centre of the field of view of the camera.
As we are interested in this work in low-resolution images, which implies a far field of view, the camera may be:
• Fixed at the top and far from the scene. In this case, none of the features (head, shoulders, legs and feet) can be located using the blob representing the human.
• Fixed so that its optical axis is oblique or horizontal towards the scene. In this case, whatever the position of the camera relative to the human in the scene (in front or at a lateral position), the head-shoulders, legs and feet are viewed. Consequently, the availability of the proposed features depends only on the pose, which means that the knee inflexions or foot base lines may be missed; what is required is the presence of the head-shoulders outline.
3 Results
We applied our method on the PETS data set. First, silhouettes are extracted and the body direction is computed; then the heading direction is estimated. We used all the features extracted from the head-shoulders, foot and knee outlines.
Figure 13 illustrates some poses, the extracted silhouettes and the computed body directions. The body direction is computed using the ratio Rw, having respectively the values
Fig. 13 Some poses and extracted silhouettes and the computed body directions based on Rw values
2.6, 2.89, 2.25, 1.33, 1.36, 2.27, 2.09, giving the directions: [0°, 15°], [0°, 15°], [15°, 30°], [75°, 90°], [75°, 90°], [15°, 30°], [0°, 15°]. The computed body directions for the last two poses (f), (g) use only the first feature, which cannot differentiate whether the body is facing towards or away from the camera.
The orientation of the feet, when they can be located in the image, eliminates this ambiguity (facing towards or away). Figure 14 illustrates some body poses which combine only features of the head-shoulders and feet (knee inflexions are not visible).
The combination of features used for body direction depends on what can be extracted from the image. The features extracted from the feet and knees are stronger than those extracted from the head-shoulders, which only allow us to calculate the direction. Figure 15 illustrates the results obtained when knee inflexions are used in addition to the ratio Rw.
Heading direction estimation is based on the estimated body direction and the values of dL, dR computed using the head-shoulders outline. We can see in Fig. 16 the use of all presented features for estimating heading direction. Figure 17 summarises this combination of features and shows that a good estimation is made even if the images are of low resolution.
4 Conclusion
We proposed in this paper a method for heading direction estimation based on geometric features which can be extracted from the silhouette even if the images are of low resolution. Body direction is inferred from features extracted from the outlines of the knees, feet and head-shoulders. This direction is used, in addition to features extracted from the head-shoulders outline, for estimating heading direction. The proposed method has been applied on real images and achieves a good estimation of heading direction. Also, the extracted features are independent of camera pose, except for the top view, where the head-shoulders, knees and feet cannot be located on the human shape.
Fig. 15 Body orientation using the features: knee inflexion and Rw ratio
References
1. Ba SO, Odobez JM (2011) Multiperson visual focus of attention from head pose and meeting
contextual cues. IEEE Trans Pattern Anal Mach Intell 33(1):101–116
2. Binford TO, Levitt TS (1993) Quasi-invariants: theory and exploitation. In: Proceedings of
DARPA Image Understanding Workshop, pp 819–829
3. Benfold B, Reid I (2008) Colour invariant head pose classification in low resolution video. In:
Proceedings of the 19th British Machine Vision Conference
4. Benfold B, Reid I (2011) Unsupervised learning of a scene-specific coarse gaze estimator. In:
Proceedings of the International Conference on Computer Vision (ICCV), pp 2344–2351
5. Chetverikov D (2003) A simple and efficient algorithm for detection of high curvature points
in planar curves. In: Computer analysis of images and patterns, 10th international conference,
CAIP 2003, Groningen, the Netherlands, August 2003, pp 25–27
6. Gavrila DM (1999, Jan) The visual analysis of human movement: a survey. Comput Vis Image
Underst 73(1):82–98
7. Gavrila DM (2007) A bayesian, exemplar-based approach to hierarchical shape matching. IEEE
Trans Pattern Anal Mach Intell 29(8):1408–1421
8. Lanz O, Brunelli R (2008) Joint Bayesian tracking of head location and pose from
low-resolution video. In: Multimodal technologies for perception of humans, pp 287–296
9. Launila A, Sullivan J (2010) Contextual features for head pose estimation in football games.
In: International conference on pattern recognition (ICPR 2010), Turkey, pp 340–343
10. Lin Z, Davis LS (2010) Shape-based human detection and segmentation via hierarchical part-
template matching. IEEE Trans Pattern Anal Mach Intell 32(4):604–618
11. Murphy-Chutorian E, Trivedi MM (2009, April) Head pose estimation in computer vision: a survey. IEEE Trans Pattern Anal Mach Intell 31(4):607–626
12. Ozturk O, Yamasaki T, Aizawa K (2009) Tracking of humans and estimation of body/head
orientation from top-view single camera for visual focus of attention analysis. In: IEEE 12th
international conference on computer vision workshops (ICCV workshops), pp 1020–1027
13. Robertson NM, Reid ID (2006) A general method for human activity recognition in video.
Comput Vis Image Underst 104(2–3):232–248
14. Sun Y, Wang Y, He Y, Hua Y (2005) Head-and-shoulder detection in varying pose. In: Advances
in natural computation, first international conference, ICNC, Changsha, China, pp 12–20
15. Singh VK, Nevatia R, Huang C (2010) Efficient inference with multiple heterogeneous part
detectors for human pose estimation. In: Computer vision ECCV 2010, pp 314–327
16. Tian YL, Brown L, Connell C, Sharat P, Arun H, Senior A, Bolle R (2003) Absolute head pose
estimation from overhead wide-angle cameras. In: IEEE international workshop on analysis
and modeling of faces and gestures, AMFG 2003, pp 92–99
17. Voit M, Nickel K, Stiefelhagen R (2006) A Bayesian approach for multi-view head pose esti-
mation. In: IEEE international conference on multisensor fusion and integration for intelligent
systems, pp 31–34
18. Zheng J, Ramirez GA, Fuentes O (2010) Face detection in low-resolution color images. In:
Proceedings of the 7th international conference on image analysis and recognition, ICIAR’10,
Portugal, pp 454–463
A Fast and Accurate Algorithm for Detecting
and Tracking Moving Hand Gestures
Abstract Human vision plays a very important role in the perception of the environment and in communication and interaction between individuals. Machine vision is increasingly being embedded in electronic devices, with cameras used to perceive the environment and identify the elements inserted in a scene. Real-time image processing and pattern recognition are processing-intensive tasks, even with today's technology. This chapter proposes a vision system that recognizes hand gestures by combining motion detection techniques, detection of skin tones, and classification using a model based on the Haar cascade and CamShift algorithms. The new algorithm presented is 29 % faster than its competitors.
1 Introduction
The evolution of computing devices has made possible new types of man-machine interaction. Touch screens, voice recognition, and motion detection are amongst the main representatives of such new interfaces. Motion detection systems are becoming more popular every day and use either controls with markers or cameras for modern gesture recognition. Motion recognition systems provide a very flexible way to allow users to control equipment and software without using traditional devices such as keyboards, mice and remote controls.
2 Related Works
This chapter presents a gesture recognition algorithm that combines techniques that seek to reduce the consumption of hardware resources and increase the efficiency of gesture tracking. Thus, the works related to this article address the steps necessary for the construction of a faster and more efficient algorithm for human gesture tracking used to "navigate" on a computer screen.
The difficulty in recognizing a color pattern, complicated by the "noise" introduced by uneven illumination of the environment and the throughput limitations of the embedded system, was addressed in reference [18].
The image processing strategy of reducing the image resolution and yielding only the silhouette of the original image, containing only the parts that moved, was described in reference [17].
The work [19] exploits the gesture classification model based on a tree of features proposed in reference [9], working with images of size 640 × 480 and the
classifier generated from a set of 300 images containing a gesture and its variations. The method proposed in [9] was initially used to detect faces, but can be trained to detect any object that has features that can be distinguished from the background.
To allow users to interact with computing devices over larger distances, for example for handling an iDTV (interactive Digital TV) set with a distance between device and spectator of over 3 m, the images need to have enough quality and definition to meet the requirements of the gesture recognition algorithm. The work [20] deals with the construction of a strong cascade-type classifier, formed from a set of 2000 gesture images. The approach applied in that work combines motion detection to reduce the observation area in the image, skin tone detection to restrict the gesture search to elements that have moved and have a skin color pattern, and classification of gestures using the model described in reference [9].
This chapter builds upon the work developed in [20], replacing the Haar classifier in the tracking step by CamShift—Continuously Adaptive Mean Shift [1]. The AdaBoost Haar classifier needs to search for the information in each frame to identify the object of interest at each stage, and such information is used in each of the cascaded features.
Detecting a hand moving over a relatively constant background seems to be a simple task at first glance, but in reality it is a complex process. The major problem faced is the large amount of input information available. Another problem addressed in computer vision is the poor reliability and instability of object tracking, due to, among other things, changes in lighting, occlusion, motion and noise in the capture equipment. The human vision system integrates several features that are analyzed in parallel, such as motion, color, contour, etc. Thus, with the acquired "knowledge of the surrounding world", one is able to deal easily with the identification problem most of the time. Accomplishing those tasks with a computer is not easy, however [12].
When developing a computer vision application, one must first define how to capture gestures. In this chapter the optical model [16] was adopted. Such a model uses cameras that receive images and deliver them to the algorithm, without physical markers to assist in the process of searching for patterns in the images. This step is important because the extracted features are used to train the gesture recognition tool.
Two tasks are of paramount importance in gesture recognition: the construction of the classifier, which serves as the knowledge base of the system, and the image processing application. The group of acquired images must undergo a noise reduction process and the elimination of unnecessary data before "feeding" the classifier. Such was the strategy used in reference [20] and maintained in the present chapter, in order to have a common comparison basis for the results obtained when the Haar classifier is replaced by the CamShift one in the process of tracking gestures.
Fig. 1 Gestures suggested by the group of people after a questionnaire, using techniques of usability
engineering [11, 16]
Classifiers are responsible for the clustering of the input space. This clustering process is carried out to determine the class of each object from its features. Such a clustering process can be of two types: supervised and unsupervised. In supervised feature clustering, during training, the test samples are accompanied by annotations indicating the actual class of the sample. The unsupervised classifier must infer N divisions in the data group from relations between the characteristics of the samples, with the number of divisions normally specified by the developer. Among the methods using supervised classification learning there are decision trees [9] and Boosting [5].
Many algorithms use only the decision tree to obtain the features of the objects in the images, because each node is associated with a measure that represents the ratio between the amounts of each class of object in the tree node. This measure can be modified through breaks (splits), which are performed on the dataset from restrictions on the values of certain features in order to reduce the mixture of different classes in the same node. This clustering mode has the disadvantage that the characteristics may be sensitive to overfitting, and if incorrectly trained the classifier may be incorrectly set up. Overfitting occurs when the statistical model used describes a random error or noise instead of the desired object or gesture.
The Boosting technique allows building a strong classifier on top of a number of weak classifiers, through the combination of their results. In particular AdaBoost—Adaptive Boosting [2, 5, 11], which extended the original Boosting method to make it adaptive, deserves special attention. This method represents each weak classifier by a small decision tree, which normally comprises only one break (split). As the algorithm progresses, the weak classifiers focus on the points where the previous step had the worst results, incrementally improving the quality of the final response. For this reason, this chapter uses AdaBoost-based classifiers.
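To make the Boosting idea concrete, the following self-contained sketch implements discrete AdaBoost with one-split decision stumps on numeric feature vectors (labels in {−1, +1}). It illustrates the weight re-focusing described above; it is not the OpenCV cascade trainer used in this chapter.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost with decision stumps (one split per weak classifier)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # sample weights
    ensemble = []
    for _ in range(n_rounds):
        best = None                              # (error, feature, threshold, sign)
        for j in range(d):
            for thr in np.unique(X[:, j]):
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w = w * np.exp(-alpha * y * pred)        # re-focus on misclassified samples
        w /= w.sum()
        ensemble.append((alpha, j, thr, sign))
    return ensemble

def predict_adaboost(ensemble, X):
    score = sum(a * s * np.where(X[:, j] >= t, 1, -1) for a, j, t, s in ensemble)
    return np.sign(score)
```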
The gestures (Fig. 1) that were mapped to the construction of classifiers were
defined from a study in [20].
To build an AdaBoost classifier it is necessary to choose two sets of images: the positive set, which contains the object one wants to map, and the negative set, which contains other objects. After defining those two groups of images, three algorithms provided
by the set of OpenCV libraries [2] are used. They are: Objectmarker, CreateSamples and Traincascade.
Objectmarker is responsible for marking the positive images of the objects of interest, creating a file containing the image name and the coordinates of the marked area. Such a text file is converted into a vector through the tool CreateSamples, which standardizes brightness and lighting and suitably scales the window to the size of the images to be cropped from the group of positive images. The default size chosen for the images of this chapter is 20 by 20 pixels. The greater the number of images and variations regarding illumination, reflection, backgrounds, scaling, rotation, etc. in this step, the more accurate the resulting classifier.
According to reference [9], each stage of the cascade should be independent of the others, allowing the creation of a simple tree. When it is necessary to increase the accuracy of the classifier, more images or more stages must be added to the tree. Many references, such as [13, 14, 22, 23], suggest that about 10,000 images are necessary in order to reach an accurate classifier.
This project made use of 2000 images acquired through image capture software written in Java. Such a number of images was empirically defined by tuning the number of images for the tree construction features. The process started with 500 images, and it was found that by increasing the number of images, each stage became stronger, eventually improving the classifier.
Another relevant aspect that must be observed is the resolution of the images used. While the literature indicates the use of images with dimensions of 640 × 480 pixels, this study used images with a resolution of 320 × 240 pixels, obtained from a camera with a native resolution of 12 megapixels, which greatly increased the number of perceived characteristics at each stage and yielded a performance far superior to that obtained with images of 640 × 480 pixels.
Finally, after these two steps, the vector of positive images and the folder containing the negative images are submitted to the algorithm Traincascade, which performs the training and the creation of the cascade of classifiers. This algorithm compares the positive and negative images, used as background, attempting to find edges and other features [17]. This is the most time-intensive step to execute; thus it was important to monitor the estimates displayed on the screen and to see whether the classifier would be effective or not, based on the hit and false alarm rates at each stage. Reference [9] indicates that at least 14 stages are needed to start the process of recognizing an object.
The Traincascade algorithm trains the classifier with the submitted samples and generates a cascade using Haar-type features. Despite the importance of determining the texture, the detection of the shape of an object is a recurring problem in machine vision. References [9, 11] proposed the use of rectangular features, known as Haar-like features, rather than the color intensities, to improve the inference of the shape of an object and increase the accuracy of the classifier by means of a concept called the integral image. From the integral image it is possible to calculate the sum of values in a rectangular region in constant time, simplifying and speeding up feature extraction in image processing.
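A minimal numpy sketch of the integral image and the constant-time rectangular sum it enables (a two-rectangle Haar-like feature is then just the difference of two such box sums):

```python
import numpy as np

def integral_image(img):
    """Integral image with a zero top row/left column: ii[y, x] = sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, top, left, height, width):
    """Sum of any rectangular region using only four look-ups (constant time)."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])
```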
A software module was developed to enable the camera to capture images, process them, and submit them to the classifier. Multiple gesture recognition was achieved through the use of threads.
To increase the possibility of using the algorithm in different environments, various image processing methods were used to minimize the noise level and also to remove elements that do not correspond to gestures mapped to the classifiers. Overall, the technical literature divides a recognition system for objects and gestures into four parts [7]: pre-processing, segmentation, feature extraction and statistical classification. The following sub-sections describe the main features of each of them.
3.2.1 Pre-Processing
System calibration tasks, geometric distortion correction, and noise removal take place in the pre-processing stage. One of the concerns in pre-processing is the removal of noise caused by many factors, such as the resolution of the equipment used, lighting, the distance from the object or gesture to the camera, etc. Salt-and-pepper noise often appears in the images. The white pixels scattered in the image, called salt noise, are pixels of a particular image region that have a high value and are surrounded by low-value pixels. Pepper noise is the opposite situation to that of salt noise. There are two ways to process such noise: using morphological transformations or
applying Gaussian smoothing methods to approximate the values of the pixels, decreasing the perception of such noise.
In a digital image represented on a grid, a pixel has a common border with four pixels and shares a common corner with four additional pixels. It is said that two pixels are 4-neighbors if they share a common border [8, 10]. Similarly, two pixels are 8-neighbors if they share at least one corner. For example, the 4-neighbors of a pixel at location [i, j] are [i+1, j], [i−1, j], [i, j+1] and [i, j−1]. The 8-neighbors of the pixel include, in addition to the four nearest neighbors, [i+1, j+1], [i+1, j−1], [i−1, j+1] and [i−1, j−1]. Figure 2 shows how the 4-neighbors and 8-neighbors of a pixel are arranged.
The morphological operations used in this study were erosion, which removes the pixels that do not meet the minimum requirements of the neighborhood, and dilation, which inserts pixels into the image previously processed by erosion, also according to a pre-determined neighborhood. After applying the morphological transformation, a smoothing operation takes place. Such a transformation performs an approximation of the values of the pixels, attempting to blur or to filter out the noise or other fine-scale or dispersed structures. The model used in this project was the 3 × 3 Gaussian Blur, also known as Gaussian smoothing. The visual effect of this technique is a soft blur, similar to viewing the image through a translucent screen.
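With OpenCV, the pre-processing chain described above reduces to a few calls; the kernel size and iteration counts below are illustrative choices rather than the exact values used in this project.

```python
import cv2
import numpy as np

def preprocess(gray):
    """Erosion followed by dilation (morphological opening) to suppress
    salt-and-pepper noise, then the 3x3 Gaussian blur used in this chapter."""
    kernel = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(gray, kernel, iterations=1)
    opened = cv2.dilate(eroded, kernel, iterations=1)
    return cv2.GaussianBlur(opened, (3, 3), 0)
```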
3.2.2 Segmentation
A feature extractor is used to reduce the space of significant image elements; that is, it is a facilitator of the classification process and is often applied not only for the recognition of objects, but also to group together similar characteristics in the image segmentation process [24]. Therefore feature extraction is a way to achieve dimensionality reduction. This task is especially important in real-time applications because they receive a stream of input data that must be processed immediately. Usually, there is a high degree of redundancy in such a data stream (much data with repeated information), which needs to be reduced to a set of representative features. If the extracted features are carefully chosen, this set is expected to carry the relevant information to perform the task. The steps taken here to sieve the significant pixels in the data stream were motion detection and skin detection, which are detailed next.
The technique chosen to perform the motion detection consisted of background subtraction, removing the pixels that have not changed from the previous frame, thereby decreasing the number of pixels to be subjected to the subsequent process of gesture recognition. The algorithm performed the following steps (a minimal sketch is given after the list):
• Capture two frames;
• Compare the colors of the pixels in each frame;
• If the color is the same, replace the original color by a white pixel. Otherwise
leave it unchanged.
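The sketch below follows the three bullets literally (pixels whose color did not change are painted white); the small tolerance used instead of a strict equality test is our own assumption.

```cpp
#include <cstdlib>
#include <opencv2/core.hpp>

// Frame differencing as described above: pixels whose color did not change
// between two consecutive frames are replaced by white; moving pixels are kept.
// The tolerance 'tol' is an assumption; a strict equality test would be tol = 0.
cv::Mat removeStaticPixels(const cv::Mat& prev, const cv::Mat& curr, int tol = 10) {
    cv::Mat out = curr.clone();
    for (int y = 0; y < curr.rows; ++y) {
        for (int x = 0; x < curr.cols; ++x) {
            cv::Vec3b a = prev.at<cv::Vec3b>(y, x);
            cv::Vec3b b = curr.at<cv::Vec3b>(y, x);
            int diff = std::abs(a[0] - b[0]) + std::abs(a[1] - b[1]) + std::abs(a[2] - b[2]);
            if (diff <= tol)
                out.at<cv::Vec3b>(y, x) = cv::Vec3b(255, 255, 255);  // static pixel -> white
        }
    }
    return out;
}
```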
This algorithm, while reducing the number of pixels in the image that will be presented to the gesture recognition process, can still leave elements that do not relate to the gesture itself, such as the clothing of the user or another object that may be moving in the captured images, and that only increase the processing load without the end result being of any relevance.
A second way to reduce the number of pixels is to apply a color filter. As the goal is to track gestures, the choice was Skin Detection.
There may be many objects in the environment that have the same color as human skin, whose appearance varies with the hue, intensity and position of the illumination source, the environment the person is in, etc. In such cases, even a human observer cannot determine whether a particular color was obtained from a region of skin or from an object in the image without taking contextual information into account. An effective model of skin color should resolve this ambiguity between skin colors and other objects.
It is not a simple task to build a model of skin color that works in all possible
lighting conditions. However, a good model of skin color must have some kind
Fig. 3 Diagram of the process of gesture recognition using motion detection, skin detection and Haar cascade [20]
The detection of gestures using the Haar classifier is done by sliding a search window across the image and checking whether a region of the image at a certain location can be classified as a gesture. In uncontrolled environments, gestures can be presented at distances different from those used to build the classifier. For that reason, the method proposed here uses Haar scaling to modify the size of the detector rather than scaling the image.
The initial size of the detector is 20 × 20 pixels and, after each scan of the sliding window over the entire frame containing the image, the scale of the detector is increased by α. The search process defined by the values in the image classifier can be affected both in efficiency and in performance because, for a scale s, the detector window is configured to [sΔ], where [ ] represents the rounding operation. The choice of the factor α therefore affects both the speed and the accuracy of the detection process. Such a value has to be carefully chosen in order to obtain a good trade-off between accuracy and processing time. The factor α applied in this project is 10 %.
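For illustration only: OpenCV's cascade detector exposes this same scale-step parameter, so a 10 % growth per pass corresponds to scaleFactor = 1.1 in the sketch below. The cascade file name and the helper function are hypothetical, not part of the chapter's implementation.

```cpp
#include <opencv2/objdetect.hpp>
#include <vector>

// Multi-scale detection sketch: the detector window grows by 10 % after each
// full scan of the frame (scaleFactor = 1.1), instead of rescaling the image.
std::vector<cv::Rect> detectGestures(const cv::Mat& frame, cv::CascadeClassifier& cascade) {
    std::vector<cv::Rect> hits;
    cascade.detectMultiScale(frame, hits,
                             1.1,                // alpha: scale step of 10 %
                             3,                  // minNeighbors: merge overlapping hits
                             0,                  // flags (unused in newer OpenCV)
                             cv::Size(20, 20));  // initial detector size
    return hits;
}

// Usage sketch (the XML file name is hypothetical):
//   cv::CascadeClassifier cascade("hand_gesture_cascade.xml");
//   auto detections = detectGestures(grayFrame, cascade);
```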
The developed system received 800 × 600 images containing hand gestures. The assessment of the classifier showed that it could process 20 frames per second and correctly detected 89 % of the input frames, executing on a machine with an Intel Core i5 M430 processor running at 2.27 GHz.
The diagram in Fig. 3 shows the complete flow of the image processing technique presented in [20], from image capture to gesture recognition, in which the key steps were already outlined. Figure 4 shows the open right hand and the left hand being detected.
Fig. 4 Detection of the open right hand and the left hand after the steps of skin detection and motion detection, passing the resulting vector to the classifier
3.4 Camshift
Fig. 5 Diagram of the gesture recognition process using motion, skin, Haar cascade and CamShift
In this project, the CamShift procedure receives as input a region defined by the Haar classifier. At this stage CamShift no longer receives the ROIs of the motion and skin detection, so the images that are passed to it do not contain pixels whose H component has a value lower than 60; thus only the most relevant pixels are processed.
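For illustration, the OpenCV sketch below shows how CamShift can track the region handed over by the Haar classifier using a hue back-projection, discarding pixels with H below 60 as described above. The hue histogram (assumed to be pre-computed from the detected hand region), the termination criteria and the function name are assumptions, not the chapter's exact code.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>

// Track the region found by the Haar classifier with CamShift, using a hue
// back-projection and keeping only pixels whose H component is at least 60.
cv::RotatedRect trackWithCamShift(const cv::Mat& frameBGR, cv::Rect& window,
                                  const cv::Mat& hueHist) {
    cv::Mat hsv, mask, hue, backproj;
    cv::cvtColor(frameBGR, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(60, 0, 0), cv::Scalar(180, 255, 255), mask); // drop H < 60

    hue.create(hsv.size(), CV_8U);
    int ch[] = {0, 0};
    cv::mixChannels(&hsv, 1, &hue, 1, ch, 1);          // isolate the H channel

    float range[] = {0, 180};
    const float* ranges = range;
    cv::calcBackProject(&hue, 1, 0, hueHist, backproj, &ranges);
    backproj &= mask;                                   // keep only the relevant pixels

    return cv::CamShift(backproj, window,
                        cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
}
```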
The addition of the CamShift procedure and the removal of the Haar transform after a positive gesture identification yielded a throughput of 28 frames per second with a correct detection rate of 94 %, thus a performance gain of 29 % and an efficiency gain of 5.6 % over the algorithm presented in [20].
The diagram in Fig. 5 shows the complete flow of the image processing technique
developed in this chapter.
To benchmark the efficiency of the classifier built here, recognition tests were performed using an image database with 1000 files for each gesture. Such files were generated together with the 2000 files used to build the classifier, encompassing people of different skin colors against non-uniform, varied backgrounds under several illumination scenarios. Such test files were not used for training the classifiers, being kept only for benchmarking the gesture recognition accuracy.
Table 1 presents the results of the classifiers for each of the gestures. The average accuracy of the proposed classifier for gesture recognition reached 87 %.
To assess the efficiency of each of the image processing phases used in this work, specific tests were made to analyze the preprocessing, segmentation, feature extraction and classification performance.
In the preprocessing phase, the values of the components of the morphological transformation were varied and it was observed whether the recognition algorithm still succeeded in correctly detecting a gesture. Figure 6 shows the variation of the values of the erosion and dilation components. The best configuration observed was the one that used an erosion factor equal to three and a dilation factor also equal to three. Other values for such factors would eliminate parts that were relevant to gesture detection.
The Gaussian filter reached its best value when the smoothness factor was equal to five, because it did not eliminate the pixels that correspond to the gestures, while making the other pixels more uniform, as may be seen in Fig. 7.
The parameters of the Canny and Sobel edge detection filters were analyzed in the segmentation phase. Figure 8 shows the effect of such filters applied to the images, yielding a better definition of the edges in the resulting images. The best setting for the Sobel filter was a factor equal to two, while for the Canny filter it was 120, both with the minimum number of edges equal to three. Such a setting yielded a strengthening of the edges of the gesture parts, eliminating the elements that did not fit such a pattern.
Feature extraction made use of two techniques: Motion Detection and Skin Detection. The association of those two techniques attempted to eliminate the static pixels and those that did not fit the minimum and maximum thresholds of the ranges defined as skin tones.
Two techniques were used in Motion Detection: border detection and internal pixel detection, the latter also known as Gaussian mixture. Some of the results of the tests performed over the parameters of those two techniques are shown in the images of Fig. 9.
The best configuration for the Motion Detection algorithm used a frame distance factor equal to three and a Gaussian mixture with a morphological transformation factor of five and a smoothness factor of three, because those were the parameters that kept the variation of the quantity of moving pixels to a minimum.
The Skin Detection algorithm used as lower thresholds for its components r = 25, g = 55 and b = 5, and as upper thresholds r = 160, g = 255 and b = 190 (Fig. 10).
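Translated into code, the skin filter becomes a single per-pixel range test. The sketch below applies the thresholds reported above independently to each RGB component, which is our reading of those values; the function name is hypothetical.

```cpp
#include <opencv2/core.hpp>

// Skin-tone filter sketch using the thresholds reported above
// (lower r=25, g=55, b=5; upper r=160, g=255, b=190): pixels outside the
// range are blanked out, keeping only candidate skin pixels.
cv::Mat skinFilter(const cv::Mat& frameBGR) {
    cv::Mat mask, skin;
    cv::inRange(frameBGR,
                cv::Scalar(5, 55, 25),      // lower thresholds (B, G, R)
                cv::Scalar(190, 255, 160),  // upper thresholds (B, G, R)
                mask);
    frameBGR.copyTo(skin, mask);            // keep only pixels inside the range
    return skin;
}
```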
After the tests that used only the Haar-like classifiers, other resources were tested to assess the result of feature extraction with the techniques of Motion Detection, Skin Detection and CamShift. The results obtained are shown in Fig. 11.
One must remark that, using only a Haar-like classifier, there was a high processing effort involved in the task of tracking the gesture. That fact was observed by counting the number of frames per second that the classifier was able to process using each of the methods described. Such processing effort was lowered by the association of the Motion and Skin Detection techniques, which reduced the quantity of data submitted to the comparison process with the classifier, but it was still far behind the throughput of the camera. Besides the performance factor, the classifier had its accuracy degraded by variations in the illumination and by changes in angle and rotation of the gestures presented. As the real performance bound in real-time image processing is the throughput of the capture device, CamShift was used, since it is a gesture or object tracking scheme with constant-time performance once the Haar-like classifier has performed the mapping of the gestures onto the classifiers. The addition of CamShift, removing the tracking task from the Haar-like scheme, yielded a processing performance of 26 frames per second.
Fig. 11 Diagram comparing the several methods to the one proposed in this project (rightmost bar)
5 Conclusions
References
1. Allen JG, Xu RYD, Jin JS (2004) Object tracking using CamShift algorithm and multiple quantized feature spaces. In: VIP '05: Proceedings of the Pan-Sydney area workshop on visual information processing, pp 3–7. Australian Computer Society, Darlinghurst, Australia
2. Bradski G, Kaehler A (2008) Learning OpenCV: computer vision with the OpenCV library.
O’Reilly Media Inc., pp 415–453
3. Canny J (1986) A computational approach to edge detection. IEEE Trans Pattern Anal Mach Intell 8:679–698
4. DU Heng, TszHang TO (2011) Hand gesture recognition using Kinect. Department of
Electrical and Computer Engineering, Boston University, Boston, USA
5. Freund Y, Schapire RE (1996) Experiments with a new boosting algorithm. In: Proceedings of the Thirteenth International Conference on Machine Learning, pp 148–156. Citeseer
6. Garcia C, Tziritas G (1999) Face detection using quantized skin color regions merging and
wavelet packet analysis. IEEE Multimedia 1(3):264–277
7. Gonzalez RC, Woods, RE (2008) Digital image processing, 3rd edn. Prentice Hall, Upper
Saddle River
8. Hardenberg C von (2001) Finger tracking and hand posture recognition for real-time human-computer interaction. Master thesis, Fachbereich Elektrotechnik und Informatik, Technische Universität Berlin
9. Jones M, Viola P (2001) Rapid object detection using a boosted cascade of simple features.
IEEE CVPR
10. Kulesa T, Hoch M (1998) Efficient color segmentation under varying illumination conditions.
Academy of media arts, Peter-Welter—Platz 2. Venue tenth IEEE image and multidimensional
digital signal processing (IMDSP) workshop, Germany
11. Lienhart R, Kuranov A, Pisarevsky V (2002) Empirical analysis of detection cascades of boosted
classifiers for rapid object detection. MRL Technical Report, May 2002
12. Miranda LC, Hornung HH, Baranauskas MCC (2009) Prospecting a gesture based interaction
model for iDTV. In: IADIS international conference on interfaces and human computer interac-
tion (IHCI)/IADIS multi conference on computer science and information systems (MCCSIS),
2009, Algarve, Portugal. Proceedings of the IADIS international conference on interfaces and
human computer interaction. Lisbon, Portugal: IADIS Press. pp 19–26
13. Monteiro G, Peixoto P, Nunes U (2006) Vision-based pedestrian detection using Haar-like
features. Institute of Systems and Robotic. Coimbra—Portugal
14. Phillip Ian W, Fernandez Dr J (2009) Facial feature detection using Haar classifiers. Texas A
& M University—Corpus
15. Phung L, Chai D, Bouzerdoum A (2001) A universal and robust human skin color model using
neural networks. In: Proceeding IJCNN’01, July 2001, pp 2844–2849
16. Silva FWSV da Motion capture: introdução à tecnologia, Rio de Janeiro.
http://www.visgraf.impa.br/Projects/mcapture/publ/mc-tech/. Accessed 30 March 2014
17. Simoes WCSS, Lucena Jr V (2011) Remoção do Fundo da cena para Detecção da Sil-
hueta da Mão Humana e Detecção de Movimentos. I SIGES—I Simpósio de Informática
e Geotecnologia de Santarém. Santarém (ISSN: 2237–3519)
18. Simoes WCSS, Lucena Jr V, Collins E, Albuquerque W, Padilla R, Valente R (2010) Avaliação
de ambientes de desenvolvimento para automação do problema do cubo mágico para o robô
Lego Mindstorms NXT. V CONNEPI—Congresso Norte-Nordeste de Pesquisa e Inovação,
Maceió (ISBN: 978-85-64320-00–0)
19. Simoes WCSS, Lucena Jr V, Leite J C, Silva CA de S (2012) Visíon por computador para
manos a base de reconocimiento de gestos para la interacción com los sistemas operativos de
escritório Windows y Linux. XXXIII UPADI—Convención Panamericana de Ingenierías. La
Habana
A Fast and Accurate Algorithm for Detecting and Tracking Moving Hand Gestures 353
20. Simoes WCSS, Barboza R da S, Lucena Jr V, Lins RD (2013) Use of hand gestures as interface
for interaction between multi-users and the IDTV. XI EuroITV—European Interactive TV
Conference. Como
21. Smith AR (1978) Color gamut transform pairs. In: proceedings of the 5th annual conference
on computer graphics and interactive techniques, p 19. ACM
22. Wilson PI, Fernandez J (2009) Facial feature detection using Haar classifiers. Texas A & M
University, Corpus Christi
23. Xiang S W G, Xuan Y (2009) Real-time follow-up head tracking in dynamic complex
environments. J Shanghai Jiaotong Univ (Sci) 14:593–599 DOI 10.1007/s12204-009-0593-2
24. Yang M-H, Ahuja N (1999) Gaussian mixture model for human skin color and its applications
in image and video databases. In: Proceedings SPIE Storage and Retrieval for Image and Video
Databases, Jan 1999, pp 458–466
Hand Gesture Recognition System Based
in Computer Vision and Machine Learning
P. Trigueiros ()
Instituto Politécnico do Porto, IPP, Porto, Portugal
e-mail: [email protected]
P. Trigueiros · F. Ribeiro
DEI/EEUM—Departamento de Electrónica Industrial, Escola de Engenharia,
Universidade do Minho, Guimarães, Portugal
e-mail: [email protected]
L. P. Reis
DSI/EEUM—Departamento de Sistemas de Informação, Escola de Engenharia,
Universidade do Minho, Guimarães, Portugal
e-mail: [email protected]
P. Trigueiros · F. Ribeiro · L. P. Reis
Centro Algoritmi, Universidade do Minho, Guimarães, Portugal
L. P. Reis
LIACC—Laboratório de Inteligência Artificial e Ciência de Computadores,
Porto, Portugal
recognition system with a formal language definition, the Referee CommLang, into
what is called the Referee Command Language Interface System (ReCLIS). The
second one is a real-time system able to interpret the Portuguese Sign Language.
Sign languages are not standard and universal and the grammars differ from country
to country. Although the implemented prototype was only trained to recognize the
vowels, it is easily extended to recognize the rest of the alphabet, being a solid
foundation for the development of any vision-based sign language recognition user
interface system.
1 Introduction
Hand gesture recognition for human computer interaction is an area of active research
in computer vision and machine learning [19]. One of the primary goals of gesture
recognition research is to create systems which can identify specific gestures and use them to convey information or to control a device. For that, gestures need to be modelled in the spatial and temporal domains, where a hand posture is the static structure of the hand and a gesture is the dynamic movement of the hand. Since hand pose is one of the most important communication tools in humans' daily life, and with the continuous advances of image and video processing techniques, research on human-machine interaction through gesture recognition has led to the use of such technology in a very broad range of possible applications [3, 22], of which some are highlighted here:
• Virtual reality: enable realistic manipulation of virtual objects using one's hands [5, 43], for 3D display interactions or 2D displays that simulate 3D interactions.
• Robotics and Tele-presence: gestures used to interact with and control robots [34] are similar to fully-immersive virtual reality interactions; however, the worlds are often real, presenting the operator with video feed from cameras located on the robot. Here, for example, gestures can control a robot's hand and arm movements to reach for and manipulate actual objects, as well as its movement through the world.
• Desktop and Tablet PC Applications: In desktop computing applications, ges-
tures can provide an alternative interaction to mouse and keyboard [16, 17, 37, 41].
Many gestures for desktop computing tasks involve manipulating graphics, or
annotating and editing documents using pen-based gestures.
• Games: track a player's hand or body position to control the movement and orientation of interactive game objects such as cars, or use gestures to control the movement of avatars in a virtual world. The PlayStation 2, for example, introduced the EyeToy [14], a camera that tracks hand movements for interactive games, and Microsoft introduced the Kinect [9], which is able to track the user's full body to control games.
• Sign Language: this is an important case of communicative gestures. Since sign
languages are highly structural, they are very suitable as test-beds for vision-based
algorithms [12, 26, 32, 44].
There are areas where this trend is an asset, as for example in the application of these
technologies on interfaces that can help people with physical disabilities, or areas
where it is a complement to the normal way of communicating. Sign language, for
example, is the most natural way of exchanging information among deaf people,
although it has been observed that they have difficulties in interacting with normal
people. Sign language consists of a vocabulary of signs in exactly the same way as
spoken language consists of a vocabulary of words. Sign languages are not standard
and universal and the grammars differ from country to country. The Portuguese Sign
Language (PSL), for example, involves hand movements, body movements and
facial expressions [39]. The purpose of Sign Language Recognition (SLR) systems
is to provide an efficient and accurate way to convert sign language into text or
voice has aids for the hearing impaired for example, or enabling very young children
to interact with computers (recognizing sign language), among others. Since SLR
implies conveying meaningful information through the use of hand gestures [38],
careful feature selection and extraction are very important aspects to consider
In terms of hand gesture recognition, there are basically two types of approaches:
vision-based approaches and data glove methods. This paper focuses on creating a
vision-based approach, to implement a system capable of performing posture and
gesture recognition for real-time applications. Vision-based hand gesture recognition
systems were the main focus of the work since they provide a simpler and more
intuitive way of communication between a human and a computer. Using visual
input in this context makes it possible to communicate remotely with computerized
equipment, without the need for physical contact or any extra devices [8, 35].
As Hasanuzzaman [11] argues, it is necessary to develop efficient and real-time gesture recognition systems in order to build more human-like interfaces between humans and robots. Although it is difficult to implement a vision-based interface for
generic usage, it is nevertheless possible to design this type of interface for a con-
trolled environment [13, 25]. Furthermore, computer vision based techniques have
the advantage of being non-invasive and based on the way human beings perceive
information from their surroundings [36]. However, to be able to implement such
systems, there are a number of requirements that the system must satisfy, in order to
be implemented in a successful way [25], which are:
• Robustness: the system should be user independent and robust enough to factors
like visual noise, incomplete information due for example to occlusions, variations
of illumination, etc.
• Computational efficiency: vision-based interaction requires real-time systems, so the algorithms and learning techniques should be as effective and as computationally efficient as possible.
• Error tolerance: mistakes on vision-based systems should be tolerated and ac-
cepted. If some mistake is made, the user should be able to repeat the command,
instead of letting the system make wrong decisions.
• Scalability: the system must be easily adapted and configured so that it can serve a
number of different applications. The core of vision based applications for human
computer interaction should be the same, regardless of the application.
Also, we need to have systems that allow training gestures and learn models capable
of being used in real-time interaction systems. These systems should be easily con-
figurable in terms of the number and type of gestures that they can train, to ensure
the necessary flexibility and scalability.
The rest of this paper is organized as follows. First, we present the Vision-based Hand Gesture Recognition System Architecture in Sect. 2, where the modules that constitute it are described. In this section, the problems of hand detection and tracking are addressed, as well as the problem of hand segmentation. Also, the hand posture classification and dynamic gesture classification implementations are described. In Sect. 3,
the Referee Command Language Interface System (ReCLIS), built to validate the
proposed framework and able to help a robotic soccer referee judge a game in real
time is described. This section also discusses the problem of modelling the command
semantics for command classification. Section 4 presents the Sign Language Recog-
nition prototype architecture and discusses its implementation. The prototype can
be used to supplement the normal form of communication for people with hearing
impairment. Conclusions and future work are drawn in Sect. 5.
The design of any gesture recognition system essentially involves the following three
aspects: (1) data acquisition and pre-processing; (2) data representation or feature
extraction and (3) classification or decision-making. Taking this into account, a
possible solution to be used in any human-computer interaction system is represented
in the diagram of Fig. 1. As it can be seen in the diagram, the system first detects
and tracks the user hand, segments the hand from the video image and extracts
the necessary hand features. The features thus obtained are used to identify the user
gesture. If a static gesture is being identified, the obtained features are first normalized
and the obtained instance vector is then used for classification. On the other hand,
if a dynamic gesture is being classified, the obtained hand path is first labelled
according to the predefined alphabet, giving a discrete vector of labels, which is
then translated to the origin and finally used for classification. Each detected gesture
is used as input into a module that builds the command sequence, i.e. accumulates
each received gesture until a predefined sequence defined in the Command Language
is found. The sequence thus obtained is classified into one of a set of predefined
number of commands that can be transmitted to a Generic System Interface (GSI)
for robot/system control.
In the following sections we will describe the problems of hand posture
classification and dynamic gesture classification.
For hand posture classification, hand segmentation and feature extraction are crucial steps in vision-based hand gesture recognition systems. The pre-processing stage prepares the input image and extracts features used later with the classification algorithms [36]. The proposed system uses feature vectors composed of centroid distance values for hand posture classification. The centroid distance signature is a type of shape signature [36] expressed by the distance of the hand contour boundary points from the hand centroid (xc, yc), calculated in the following manner:
$d(i) = \sqrt{(x_i - x_c)^2 + (y_i - y_c)^2}, \quad i = 0, \ldots, N - 1$  (1)
This way, a one-dimensional function representing the hand shape is obtained. The number of equally spaced points N used in the implementation was 16. Due to the subtraction of the centroid from the boundary coordinates, this operator is invariant to translation, as shown by Rayi Yanu Tara [32], and a rotation of the hand results in a circularly shifted version of the original signature. All the feature vectors are normalized with the z-normalization prior to training, by subtracting their mean and dividing by their standard deviation,

$a'_i = (a_i - \bar{a}) / \sigma$  (2)

where ā is the mean of the instance i, and σ is the respective standard deviation, achieving this way the desired scale invariance. The vectors thus obtained have zero mean and a standard deviation of 1. The resulting feature vectors are used to train
a multi-class Support Vector Machine (SVM) that is used to learn the set of hand
postures shown in Fig. 2, and used in the Referee Command Language Interface
System (ReCLIS) and the hand postures shown in Fig. 3 used with the Sign Lan-
guage Recognition System. The SVM is a pattern recognition technique in the area
of supervised machine learning, which works very well with high-dimensional data.
SVM’s select a small number of boundary feature vectors, support vectors, from
each class and builds a linear discriminant function that separates them as widely
as possible (Fig. 4)—maximum-margin hyperplane[40]. Maximum-margin hyper-
planes have the advantage of being relatively stable, i.e., they only move if training
instances that are support vectors are added or deleted. SVM’s are non-probabilistic
classifiers that predict for each given input the corresponding class. When more than
two classes are present, there are several approaches that evolve around the 2-class
case [33]. The one used in the system is the one-against-all, where c classifiers have
to be designed. Each one of them is designed to separate one class from the rest.
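For illustration, the centroid distance signature of Eq. 1 followed by the z-normalization of Eq. 2 can be computed in a few lines. The sketch below assumes the hand contour has already been resampled to N = 16 equally spaced boundary points; the function name is ours, not part of the described system.

```cpp
#include <cmath>
#include <numeric>
#include <opencv2/core.hpp>
#include <vector>

// Centroid distance signature (Eq. 1) followed by z-normalization (Eq. 2).
// 'contour' is assumed to hold N equally spaced boundary points (N = 16 here).
std::vector<double> centroidDistanceFeature(const std::vector<cv::Point>& contour) {
    // Hand centroid (xc, yc)
    double xc = 0, yc = 0;
    for (const auto& p : contour) { xc += p.x; yc += p.y; }
    xc /= contour.size();  yc /= contour.size();

    // d(i) = sqrt((xi - xc)^2 + (yi - yc)^2)
    std::vector<double> d;
    for (const auto& p : contour)
        d.push_back(std::sqrt((p.x - xc) * (p.x - xc) + (p.y - yc) * (p.y - yc)));

    // z-normalization: subtract the mean and divide by the standard deviation
    double mean = std::accumulate(d.begin(), d.end(), 0.0) / d.size();
    double var = 0;
    for (double v : d) var += (v - mean) * (v - mean);
    double sigma = std::sqrt(var / d.size());
    for (double& v : d) v = (v - mean) / sigma;
    return d;
}
```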
For feature extraction, model learning and testing, a C++ application was built with
openFrameworks [18], OpenCV [4], OpenNI [27] and the Dlib machine-learning
library [15]. OpenCV was used for some of the vision-based operations like hand
segmentation and contour extraction, and OpenNI was responsible for the RGB and
depth image acquisition. Figure 5 shows the main user interface for the application,
with a sample vector (feature vector) for the posture being learned displayed below
the RGB image.
Fig. 5 Static gesture feature extraction and model learning user interface
Two centroid distance datasets were built: the first one for the first seven hand
postures defined, with 7848 records and the second one for the Portuguese Sign
Language vowels with a total of 2170 records, obtained from four users. The features
thus obtained were analysed with the help of RapidMiner (Miner) in order to find
the best kernel in terms of SVM classification for the datasets under study. The best
kernel obtained with a parameter optimization process was the linear kernel with a
cost parameter C equal to one. With these values, the final achieved accuracy was
99.4 %.
For dynamic gesture model training, a C++ application for the acquisition of hand
motion sequences (dynamic gestures) for each of the defined gestures, feature ex-
traction and model training and testing was implemented. This application uses the
same libraries as the previous application and an openFrameworks [18] add-on im-
plementation of the HMM algorithm for classification and recognition of numeric
sequences. This add-on is a C++ porting implementation of a MATLAB code from
Kevin Murphy [24].
Figure 8 shows the main user interface for the application, with a hand path drawn
on top of the centroids with the corresponding path distance to centroids drawn as
white lines. For each gesture that required training, a dataset was built and the
system trained in order to learn the corresponding model parameters. The number of
observation symbols defined and implemented was 64 with 4 hidden states. Several
values for the number of observations in the set {16, 25, 36, 49, 64, 81}, and hidden
states, ranging from 2 to 12 were tried out during the experiments, without significant
improvements for values greater than the selected ones.
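The chapter does not detail how a hand path is labelled into the 64-symbol alphabet; the sketch below shows one plausible scheme, given purely as an illustrative assumption, in which each path point is quantized into a cell of an 8 × 8 grid laid over the path's bounding box (64 cells, matching the 64 observation symbols).

```cpp
#include <algorithm>
#include <opencv2/imgproc.hpp>
#include <vector>

// Illustrative labelling sketch (an assumption, not the chapter's exact scheme):
// each point of the hand path is quantized into one of 64 symbols by overlaying
// an 8 x 8 grid on the bounding box of the path.
std::vector<int> labelHandPath(const std::vector<cv::Point2f>& path) {
    std::vector<int> symbols;
    cv::Rect box = cv::boundingRect(path);
    if (box.width == 0 || box.height == 0) return symbols;  // degenerate path
    for (const auto& p : path) {
        int col = std::min(7, static_cast<int>(8.0f * (p.x - box.x) / box.width));
        int row = std::min(7, static_cast<int>(8.0f * (p.y - box.y) / box.height));
        symbols.push_back(row * 8 + col);   // observation symbol in {0, ..., 63}
    }
    return symbols;
}
```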
For model testing, a new set of datasets was built with data from four different users, with a total of 25 records per gesture and per user, totalling 1100 records for the predefined 11 gestures (Fig. 9).
These datasets were analysed with the previous obtained models and the final
accuracy results obtained with Eq. 3 are represented in Table 3.
$\text{accuracy} = \dfrac{\#\ \text{correctly predicted class}}{\#\ \text{total testing class}} \times 100\,\%$  (3)
So, for the dynamic gesture recognition, with the obtained HMM models, an average
accuracy of 93.72 % was achieved.
Fig. 8 Dynamic gestures feature extraction and model training user interface
This section presents the Referee CommLang keywords with a syntax summary and description. The Referee CommLang is a new and formal definition of all the commands that the system is able to identify. As in [30], the language must represent all the possible gesture combinations (static and dynamic) and at the same time be simple in its syntax. The language was defined in BNF (Backus Normal Form or Backus-Naur Form) [2]:
• Terminal symbols (keywords and operator symbols) are in a constant-width typeface.
• Choices are separated by vertical bars ‘|’ and enclosed in greater-than and less-than symbols (<choice>).
• Optional elements are in square brackets ([optional]).
• Sets of values are in curly braces ({set}).
• A syntax description is introduced with ::= .
The language has three types of commands: Team commands, Player commands
and Game commands. This way, a language is defined to be a set of commands that
can be a TEAM_COMMAND, a GAME_COMMAND or a PLAYER_COMMAND.
The Human-Computer Interface (HCI) for the prototype was implemented using the
C++ language, and the openFrameworks toolkit [18] with the OpenCV [4] and the
OpenNI [27] add-ons.
The proposed system involves three modules as can be seen in the diagram of
Fig. 10:
1. Data acquisition, pre-processing and feature extraction.
2. Gesture and posture classification with the models obtained in Sect. 2.1 and
Sect. 2.2.
3. Gesture sequence construction or command classification.
As explained in Sect. 3, a referee command is composed of a set of dynamic gestures (Fig. 9) and hand postures (Fig. 2). The hand postures are used to identify one of the following commands: team number, player number or game part.
The problems of data acquisition, pre-processing, feature extraction and gesture
classification were discussed in Sect. 2. The following section will describe the
problem of modelling the command semantics for command classification.
Since the system uses a combination of dynamic and static gestures, modelling the command semantics became necessary. A Finite State Machine (FSM) is a technique usually employed to handle this situation [6, 20]. In the implemented system, the FSM shown in the diagram of Fig. 11 and described in the state transition Table 4 was implemented to control the transition between three possible defined states: DYNAMIC, STATIC and PAUSE. A state transition table, as the name implies, is a table that describes all the conditions and the states those conditions lead to. The PAUSE state is used to control the transitions between user postures and gestures and to eliminate unintentional actions between DYNAMIC/STATIC and STATIC/STATIC gestures. This state is entered every time a gesture or hand posture is found, and exited after a predefined period of time or when a command sequence is identified, as can be seen in the state transition table.
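One compact way to encode such a state machine is sketched below. The three states match the ones described above, but the event names and the simplified transitions are assumptions and do not reproduce the exact state transition table (Table 4).

```cpp
#include <string>

// Sketch of the three-state machine controlling command construction.
// Event names and transitions are simplified assumptions, not Table 4.
enum class State { DYNAMIC, STATIC, PAUSE };

struct CommandFsm {
    State state = State::DYNAMIC;

    void onEvent(const std::string& event) {
        switch (state) {
            case State::DYNAMIC:
                if (event == "gesture_found") state = State::PAUSE;       // wait before the next element
                break;
            case State::STATIC:
                if (event == "posture_found") state = State::PAUSE;
                break;
            case State::PAUSE:
                if (event == "command_complete") state = State::DYNAMIC;  // start a new command
                else if (event == "timeout")     state = State::STATIC;   // expect the next posture
                break;
        }
    }
};
```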
The following sequence of images, Fig. 12, Fig. 13 and Fig. 14, shows the Referee
Command Language user interface with the “GOAL, TEAM1, PLAYER2” sequence
of commands being recognized.
interpreted by the system and their classification will be displayed and spoken by the
interface.
The diagram of Fig. 15 shows the proposed system architecture, which consists
of two modules, namely: the data acquisition, pre-processing and feature extraction
model and the sign language posture classification model.
In the first module, the hand is detected, tracked and segmented from the video
images. From the obtained segmented hand, features are extracted, as explained in
Sect. 2, for posture classification.
The Human-Computer Interface (HCI) for the prototype was developed using the
C++ language, and the openFrameworks toolkit [18] with the OpenCV [4] and the
Fig. 16 Sign Language prototype interface with two vowels correctly classified
OpenNI [27] add-ons, ofxOpenCv and ofxOpenNI respectively. In the following two
images it is possible to see the Sign Language Prototype with two vowels correctly
classified and displayed on the right side of the user interface (Fig. 16).
Hand gestures are a powerful way for human communication, with lots of potential
applications in the area of human computer interaction. Vision-based hand ges-
ture recognition techniques have many proven advantages compared with traditional
devices. However, hand gesture recognition is a difficult problem and the current
work is only a small contribution towards achieving the results needed in the field.
The main objective of this work was to study and implement solutions that could
be generic enough, with the help of machine learning algorithms, allowing its appli-
cation in a wide range of human-computer interfaces, for online gesture and posture
recognition. To achieve this, a set of implementations for processing and retrieving hand user information, learning statistical models and performing online classification was created. The final prototype is a generic solution for a vision-based hand gesture recognition system, which is able to integrate posture and gesture classification and can be integrated with any human-computer interface. The implemented
solutions, based on supervised learning algorithms, are easily configured to process
new hand features or to learn different hand postures and dynamic gestures, while
creating statistical models that can be used in any real-time user interface for online
gesture classification. For the problem of hand posture classification, hand features
that give good classification results were identified, being at the same time simple in
terms of computational complexity, for use in any real-time application. The selected
features were tested with the help of the RapidMiner tool for machine learning and
data mining. That way, it was possible to identify a learning algorithm that was able
to achieve very good results in terms of pattern classification, and that was the one
used in the final solution. For the case of dynamic gesture recognition, the choice
fell on Hidden Markov Models, due to the nature of the data, gestures, which are
time-varying processes. This type of models has proven to be very effective in other
areas of application, and had already been applied successfully to the problem of
gesture recognition. The evaluation of the trained gestures with the implemented prototypes proved that it was possible to successfully integrate static and dynamic gestures with the generic framework and use them for human/computer interaction. It was also possible to prove through this study, and with the various experiments that were carried out, that proper feature selection for image classification is vital
for the future performance of the recognition system. It was possible to learn and
select sensible features that could be effectively used with machine learning algo-
rithms in order to increase the performance and effectiveness of online static and
dynamic gesture classification.
To demonstrate the effectiveness of our vision based gesture recognition system,
the proposed methods were evaluated with two applications: the Referee CommLang
Prototype and the Sign Language Recognition Prototype. The first one is able to
interpret user commands defined in the new formal language, the Referee CommLang,
created with the aim of interpreting a set of commands made by a robotic soccer
referee. The second one is able to interpret Portuguese sign language hand postures.
An important aspect to report on the implemented solutions has to do with the
fact that new users were able to learn and adapt to the systems very quickly and were
able to start using them in a normal way after a short period of time, making them
solutions that can be easily adapted and applied to other areas of application.
As future work and major development prospects it is suggested:
• Explore other machine learning algorithms applied to the problem of hand gesture
classification and compare obtained results.
• Include not only the possibility of 3D gestures but also to work with several
cameras to thereby obtain a full 3D environment and achieve view-independent
recognition, thus eliminating some limitations of the current system.
• Explore the possibility of applying stereo vision instead of only depth range
cameras, applied to human/computer interaction and particularly to hand gesture
recognition.
• Introduce gesture recognition with both hands, enabling the creation of more
natural interaction environments.
• Investigate and try to find more reliable solutions for the identification of the
beginning and end of a gesture.
• Build systems that are able to recognize continuous gestures, i.e., without the
need to introduce pauses for gesture or command construction.
• Explore reinforcement learning as a way to start with a reduced number of hand
features per gesture, reducing the time to learn the models, and be able to learn
with user interaction, possibly using multimodal dialog strategies.
• Explore unsupervised learning applied to gesture recognition. Give the
robot/system the possibility to learn by interaction with the user, again with the
possibility of multimodal strategies.
As a final conclusion one can say that although there is still much to do in the area,
the implemented solutions are a solid foundation for the development of generic
gesture recognition systems that could be used with any interface for human computer
interaction. The interface language can be redefined and the system can be easily
configured to train different sets of postures and gestures that can be easily integrated
with any desired solution.
Acknowledgments The authors wish to thank all members of the Laboratório de Automação e
Robótica (LAR), at University of Minho, Guimarães. The authors would like to thank also, everyone
who contributed to the hand data features acquisition phase, without which it would have been
very difficult to carry out this study. Also special thanks to the Polytechnic Institute of Porto, the
ALGORITMI Research Centre and the LIACC Research Center, for the opportunity to develop this
research work.
References
32. Tara RY, Santosa PI, Adji TB (2012) Sign language recognition in robot teleoperation using
centroid distance Fourier descriptors. Int J Comput Appl 48(2):8–12
33. Theodoridis S, Koutroumbas K (2010) An introduction to pattern recognition: a Matlab
Approach. Academic, Burlington
34. Trigueiros P, Ribeiro F, Lopes G (2011) Vision-based hand segmentation techniques for
human-robot interaction for real-time applications. In: Tavares JM, Jorge RMN (eds) III EC-
COMAS thematic conference on computational vision and medical image processing, 12–14 October 2011, Olhão. Taylor and Francis, pp 31–35
35. Trigueiros P, Ribeiro F, Reis LP (2012) A comparison of machine learning algorithms applied
to hand gesture recognition. 7th Iberian Conference on Information Systems and Technologies,
20–23 July. Madrid, pp 41–46
36. Trigueiros P, Ribeiro F, Reis LP (2013) A comparative study of different image features for hand
gesture machine learning. 5th International Conference on Agents and Artificial Intelligence,
15–18 February. Barcelona
37. Vatavu R-D, Anthony L, Wobbrock JO (2012) Gestures as point clouds: a $P recognizer for user
interface prototypes. 14th ACM International Conference on Multimodal Interaction. ACM,
Santa Monica
38. Vijay PK, Suhas NN, Chandrashekhar CS, Dhananjay DK (2012) Recent developments in sign
language recognition: a review. Int J Adv Comput Eng Commun Technol 1:21–26
39. Wikipedia (2012) Língua gestual portuguesa [online]. http://pt.wikipedia.org/wiki/Lingua_
gestual_portuguesa. (2013)
40. Witten IH, Frank E, Hall MA (2011) Data mining—practical machine learning tools and
techniques. Elsevier
41. Wobbrock JO, Wilson AD, Li Y (2007) Gestures without libraries, toolkits or training: a $1
recognizer for user interface prototypes. Proceedings of the 20th Annual ACM Symposium on
User Interface Software and Technology. ACM, Newport
42. Wu Y, Huang TS (1999) Vision-based gesture recognition: a review. Proceedings of the Interna-
tional Gesture Workshop on Gesture-Based Communication in Human-Computer Interaction.
Springer-Verlag.
43. Yoon J-H, Park J-S, Sung MY (2006) Vision-Based bare-hand gesture interface for interactive
augmented reality applications. 5th International Conference on Entertainment Computing,
September 20–22. Cambridge. 2092520: Springer-Verlag, pp 386–389
44. Zafrulla Z, Brashear H, Starner T, Hamilton H, Presti P (2011) American sign language
recognition with the kinect. 13th International Conference on Multimodal Interfaces. ACM,
Alicante
3D Scanning Using RGBD Imaging Devices: A
Survey
Abstract The capture and digital reconstruction of tridimensional objects and scenarios are issues of great importance in computational vision and computer graphics, due to their numerous applications, from navigation and scenario mapping to augmented reality and medical prototyping. In the past years, with the appearance of portable and
low-cost devices such as the Kinect Sensor, which are capable of acquiring RGBD
video (depth and color data) in real-time, there has been major interest in using these technologies efficiently in 3D surface scanning. In this paper, we present a survey
of the most relevant methods from recent literature on scanning 3D surfaces using
these devices and give the reader a general overview of the current status of the field
in order to motivate and enable other works in this topic.
1 Introduction
Tridimensional scanning and reconstruction are processes related to capturing intrinsic characteristics of object surfaces or scenarios, such as shape and appearance. While scanning deals with the capture of data on a surface and the creation of a point cloud from the geometric samples collected, the process of reconstruction uses the point cloud data to extrapolate the surface shape. The use of these data is increasing in prototyping, navigation, augmented reality and quality control, among other areas, and is intense in the entertainment industry, motivating much research in computational vision and computer graphics.
The processes of scanning and reconstruction are often combined and seen as a single pipeline, consisting basically of acquiring the data map, translating it to a point cloud, allocating it in a single coherent reference system (also called alignment), and fusing the different captures into a single global solid model.
Although the 3D scanning technologies are not novel, they went through a revolution with the launch of the Kinect device, in 2010. This occurred because it presented an integrated depth camera at very low cost when compared to the existing high-density scanners, and also because it captured, with convincing quality, the geometry and colors of objects and scenarios in real time.
The Kinect Sensor (Fig. 1) was launched initially as an accessory of the Xbox 360 game console, serving as a touchless joystick. The device is composed basically of an RGB camera, an infrared-based depth camera (hence the D in the RGBD term), both with 640 × 480 resolution and a frame rate of 30 fps, a set of microphones and a motor controlling the tilt of the device.
In particular, the depth sensor comprises an IR camera and an IR dotted-pattern projector, and uses a computer vision technology developed by PrimeSense. It has an approximate range of 30 cm–6 m, and it allows building depth maps with 11-bit depth resolution. Open-source drivers (such as Libfreenect and Avin2/OpenNI) as well as the Microsoft Kinect SDK allow this product to be connected to a computer and to be used in many applications other than gaming, such as robotics, surveillance systems, intra-operative medical imaging systems, accessibility, among others.
Other similar devices were also released, such as the Asus Xtion, PrimeSense Carmine and Panasonic D-Imager, but the Kinect remained the most popular and the reference device.
This work presents a survey of the main recent works from the literature related
to 3D scanning using RGBD cameras, in particular the Kinect Sensor. The goal is to
provide a wide survey of the area, providing references and introducing the method-
ologies and applications, from the simple reconstruction of small static objects to
the constantly updated mapping of dense or large scenarios, in order to motivate and
enable other works in this topic.
2 Methods
In this Section, we present the survey of methods found on the literature. For each
method, we present: references, the responsible institution, release date, availability
of the software or source-code, a general overview and a brief description of the
method.
One must take into consideration the currently high rate of improvements and innovations in this topic; therefore, the methods in this section are limited to the progress of these technologies up to the time of conclusion of this work.
2.1 Kinectfusion
References: [1, 2]
Developed at: Microsoft Research Cambridge
Released in: October 2011
Availability: There is an open-source implementation in C++, called Kinfu, in the
PCL (Point Cloud Library) project [3]. An implementation within Microsoft’s Kinect
for Windows SDK [4] will be released.
General Description In this method, only the depth image is used to track the sensor position and reconstruct 3D models of the physical scenario in real-time, limited to a fixed-resolution volume (typically 512³), through a GPU implementation. The RGB camera information is only used in the case of texture mapping. Although the approach aims at speed efficiency to explore real-time rates, it is not GPU memory efficient, requiring 512 MB or more of capacity for an implementation using 32-bit voxels.
The authors show some potential applications of the KinectFusion, modifying or extending the GPU pipeline implementation for use in 3D scanning, augmented reality, object segmentation, physical simulations and interactions with the user directly in front of the sensor. Because of its speed and accuracy, the method has generated various other improved extensions for different applications.
Approach Basically, the system continually tracks the six degrees of freedom (DOF) pose of the camera and fuses, in real time, the camera depth data into a single global 3D model of a fixed-size 3D scene. The reconstruction is incremental, with the model refined as the camera moves, even by vibrating, resulting in new viewpoints of the real scenario being revealed and fused into the global model.
The main system of the GPU pipeline consists of four steps executed concurrently,
using the CUDA language:
1. Surface measurement: the depth map acquired directly from the Kinect is con-
verted in a vertex map and a normal map. Bilateral filtering is used to reduce
the inherent sensor noise. Each CUDA thread works in parallel in each pixel of
the depth map and projects as a vertex in the coordinate space of the camera, to
generate a vertex map. Also each thread computes the normal vector for each
vertex, resulting in a normal map.
2. Camera pose tracking: the ICP (Iterative Closest Point) algorithm, implemented
in GPU, is used in each measurement in the 640 × 480 depth map, to track the
camera pose at each depth frame, using the vertex and normal maps. There-
fore, a 6-DOF rigid transformation is estimated for approximate alignment of the
oriented points with the ones from the previous frame. Incrementally, the esti-
mated transformations are applied to the transformation that defines the Kinect
global position.
3. Volume integration: a 3D fixed-resolution volume is predefined, mapping the specific dimensions of a fixed 3D space. This volume is subdivided uniformly into a 3D grid of voxels. A volumetric representation is used to integrate into the voxels the global 3D vertices, obtained by converting the oriented points into global coordinates from the camera's global position, through a GPU implementation of the volumetric TSDF (Truncated Signed Distance Function). The complete 3D grid is allocated in the GPU as linear aligned memory.
4. Surface prediction: raycasting of the volumetric TSDF is performed at the esti-
mated frame to extract views from the implicit surface for depth map alignment
and rendering. In each GPU thread, there is a single ray and it renders a single
pixel at the output image.
The rendering pipeline allows conventional polygon-based graphics to be composed
in the raycasting view, enabling the fusion of real and virtual scenes, including
shadowing, all through a single algorithm. Moreover, there is data generation for
better camera tracking by ICP algorithm.
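The volumetric TSDF update at the heart of step 3 can be summarized as a per-voxel weighted running average. The sketch below is a simplified CPU version under assumed names and is not the CUDA kernel of KinectFusion.

```cpp
#include <algorithm>

// Simplified per-voxel TSDF update (weighted running average), CPU version.
// 'sdf' is the signed distance from the voxel to the measured surface along the
// camera ray and 'mu' the truncation band; names and structure are assumptions.
struct Voxel { float tsdf = 1.0f; float weight = 0.0f; };

void updateVoxel(Voxel& v, float sdf, float mu, float maxWeight = 128.0f) {
    if (sdf < -mu) return;                       // voxel is far behind the surface
    float tsdf = std::min(1.0f, sdf / mu);       // truncate to the [-1, 1] band
    float w = 1.0f;                              // per-measurement weight
    v.tsdf = (v.tsdf * v.weight + tsdf * w) / (v.weight + w);
    v.weight = std::min(v.weight + w, maxWeight);
}
```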
1 http://www.ccs.neu.edu/research/gpc/mvkinfu/index.html
2.3 Kintinuous
2 http://www.cs.nuim.ie/research/vision/data/rgbd2012
after that, where the zero-crossings are extracted as reconstructed surface vertices.
(iii) A voxel grid filter is applied to remove the possible duplicate points in the
orthogonal raycasting.
• Pose graph representation: The pose graph representation is used to represent the
external meshes, where each position stores a surface slice.
• Mesh generation: Uses the greedy mesh triangulation algorithm described by
Marton et al. [8].
• Visual odometry: The ICP odometry estimation is replaced by a GPU implementation of the dense RGB-D-based odometry algorithm presented by Steinbruecker et al. [9], integrated into the KinectFusion GPU pipeline.
3 https://www.cs.washington.edu/node/3544/
2.5 RGB-DSLAM
Reference: [17]
Developed at: Brno University of Technology
Released in: June 2011 (STUDENT EEICT)
Availability: No implementation available.
General Description This method aims to build a dense 3D map of interior environments from multiple Kinect depth images. It can be used to effectively produce dense 3D maps of small workspaces.
The algorithm accumulates errors, caused by small inaccuracies in the camera pose estimation between consecutive frames, since no loop closure algorithm is used.
4 http://openslam.org/rgbdslam.html
5 http://www.ros.org/wiki/rgbdslam
2.7 Omnikinect
Reference: [18]
Developed at: ICG, Graz University of Technology
Released in: December 2012 (submitted to VRST).
Availability: No implementation available
General Description The system is a KinectFusion modification that allows the use of multiple Kinect sensors. It proposes a hardware configuration and optimized software tools for this system. The tests were executed with an Nvidia GTX 680 and, for comparison, an Nvidia Quadro 6000, at 1000 × 1000 pixel resolution, using 7 Kinects.
Approach The method is composed of five steps, with one step (step 3) added relative to the KinectFusion implementation to correct superposition noise and data redundancy:
1. Measurement: vertex and normal map are computed.
2. Pose estimation: predicted and measured surface ICP.
3. TSDF histogram volume: generation of TSDF histogram volume, filtered from
TSDF outliers measures before temporal smoothing.
4. Updated reconstruction: surface measure integration in a global TSDF.
5. Surface prediction: TSDF raycast to compute surface prediction.
Reference: [19]
Developed at: DFKI, Augmented Vision, University of Kaiserslautern.
Released in: October 2011 (SIGGRAPH)
Availability: No implementation available.
General Description This method aims to scan 3D objects by aligning depth and color information. The system is implemented in C++ and was evaluated on an Intel Xeon 3520 (2.67 GHz) with 12 GB of RAM, running Windows 7.
Approach The method is based on:
1. Super-resolution: a new super-resolution algorithm is used, similar to the approach of the LidarBoost algorithm by Schuon et al. [20], applied to each set of 10 captured frames. First, all depth maps in the set are aligned to the set center using optical 3D flow. Then, an energy function is minimized to extract a noise-free depth and color map.
2. Global alignment: loop closure alignment based on rigid and non-rigid transformations, consisting of three steps: (i) use of pairwise ICP registration to calculate corresponding points of two frames; (ii) labelling of ’correct’ and ’incorrect’ correspondences using the absolute error; (iii) computation of the exponential transformation in conformal geometric algebra for each frame, using an energy function based on the ’correct’ correspondences.
3. Non-rigid registration: the global rigid and non-rigid processing is necessary to obtain a correctly closed 360◦ model.
4. Probabilistic simultaneous non-rigid alignment according to Cui et al. [21] is
applied.
5. Finally, a 3D mesh is generated using the Poisson reconstruction method.
Reference: [22]
Developed at: State Key Laboratory of Information Engineering in Surveying,
Mapping and Remote Sensing, Wuhan University.
Released in: August 2012 (ISPRS).
Availability: no implementation available.
General Description The method is voxel-based, similarly to KinectFusion, but performs automatic re-localization when tracking fails in cases of excessively slow or fast camera movements.
Approach The method is basically divided into the following steps:
1. Preprocessing: the depth image acquired by the Kinect is converted from image coordinates into 3D points and normals in the camera coordinate space.
The RBC (random ball cover) construction and the queries to the dataset are performed with brute-force (BF) primitives. A modification is introduced that simplifies the RBC construction to a single BF search, with an approximate nearest-neighbour search algorithm.
To eliminate redundancy and overlap, the degree of overlap between consecutive frames is measured by computing the distance between their depth histograms. Using this dissimilarity metric (a sketch is given below), the current RGB-D data is discarded for mapping when the distance is below a threshold.
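The redundancy check can be sketched as follows: build a depth histogram for each frame, compute a histogram distance between consecutive frames, and skip the new frame when the two are too similar. The specific distance (L1 here), bin count and threshold are assumptions for illustration; the cited work may use different choices.

import numpy as np

# Illustrative sketch (assumed L1 distance, bin count and threshold): decide whether
# a new RGB-D frame is redundant by comparing depth histograms of consecutive frames.

def depth_histogram(depth, num_bins=64, max_depth=5.0):
    """Normalized histogram of valid depth values (metres)."""
    valid = depth[(depth > 0) & (depth <= max_depth)]
    hist, _ = np.histogram(valid, bins=num_bins, range=(0.0, max_depth))
    return hist / max(hist.sum(), 1)

def is_redundant(prev_depth, curr_depth, threshold=0.1):
    """True when the histogram distance is below the threshold (frames too similar)."""
    d = np.abs(depth_histogram(prev_depth) - depth_histogram(curr_depth)).sum()
    return d < threshold

# usage: a nearly identical frame is flagged as redundant
prev_frame = np.random.uniform(0.5, 3.0, size=(480, 640))
curr_frame = prev_frame + np.random.normal(0, 0.005, size=prev_frame.shape)
print(is_redundant(prev_frame, curr_frame))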
Reference: [27].
Developed at: University of Bonn.
Released in: September 2012 (MFI).
Availability: no available implementation.
General Description The aim of this work is to acquire 3D maps of indoor environments. The approach integrates color and depth data in a multi-resolution representation.
For map representation, a multi-resolution surfel map is used, and octrees are employed to model textured surfaces at multiple resolutions in a probabilistic way.
To register these maps in real time, as required for SLAM, an iterative refinement process is performed in which multi-resolution surfels are associated between the maps at each iteration, given the current estimated pose. Using these associations, the new pose that maximizes the correspondence likelihood between the maps is determined. Because the images are taken from different viewpoints, the scene content is discretized differently; trilinear interpolation is used to compensate (a small sketch of this interpolation is given below).
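Trilinear interpolation itself is standard: a value stored on a 3D grid is interpolated from the eight surrounding voxels, weighted by the fractional position of the query point. The sketch below shows the basic operation on a generic grid; it is not tied to the surfel data structure of [27].

import numpy as np

# Illustrative sketch of trilinear interpolation on a regular 3D grid
# (generic helper, not the surfel-map code of [27]).

def trilinear(grid, x, y, z):
    """grid: (X, Y, Z) array of values; x, y, z: continuous grid coordinates."""
    x0, y0, z0 = int(np.floor(x)), int(np.floor(y)), int(np.floor(z))
    dx, dy, dz = x - x0, y - y0, z - z0
    value = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                weight = ((dx if i else 1 - dx) *
                          (dy if j else 1 - dy) *
                          (dz if k else 1 - dz))
                value += weight * grid[x0 + i, y0 + j, z0 + k]
    return value

# usage: interpolating halfway between grid nodes
grid = np.arange(8.0).reshape(2, 2, 2)
print(trilinear(grid, 0.5, 0.5, 0.5))   # average of all eight corner values = 3.5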
In order to add spatial constraints between similar views during real-time operation, a randomization method is proposed.
2.12 Du et al.
Reference: [28].
Developed at: University of Washington.
Released in: September 2011 (Ubicomp).
Availability: no available implementation.
General Description This method aims at scanning indoor environments, although it can also be used with near-centimetre precision for other applications. The system runs interactively, in real time, on a laptop.
Approach The system basically follows a well-established structure for 3D mapping: RGB-D frame registration is partitioned into local alignment (visual odometry) plus global alignment, which uses loop-closure information to optimize over the frames and produce globally consistent camera poses and maps. The 3D map is incrementally updated in real time.
In the real-time RGB-D registration, a 3-point matching algorithm is used to compute 6D transformations between pairs of frames. A new correspondence criterion combines RANSAC inlier counting with visibility conflicts.
Following the RGB-D Mapping alignment, visual features are detected in the color frame using a GPU implementation of standard SIFT; these are used to eliminate outliers and find the camera pose transformation between two frames (a sketch of the RANSAC estimation step is given below).
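The combination of 3-point sampling and RANSAC inlier counting can be sketched as follows: repeatedly draw three feature correspondences, estimate a rigid transform from them (here via the SVD-based Kabsch solution), and keep the transform that explains the most correspondences. This is a generic sketch; the sampling counts, threshold and the visibility-conflict term of [28] are not reproduced here.

import numpy as np

# Illustrative sketch (generic 3-point RANSAC for a rigid transform; thresholds and
# iteration counts are assumptions, and the visibility-conflict term of [28] is omitted).

def rigid_from_3_points(src, dst):
    """Least-squares rigid transform (R, t) mapping src onto dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def ransac_rigid(src, dst, iters=200, inlier_thresh=0.02, rng=np.random.default_rng(0)):
    """src, dst: (N, 3) matched 3D feature positions. Returns the best (R, t)."""
    best, best_inliers = None, -1
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = rigid_from_3_points(src[idx], dst[idx])
        errors = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = int((errors < inlier_thresh).sum())
        if inliers > best_inliers:
            best, best_inliers = (R, t), inliers
    return best

# usage with synthetic correspondences under a known rotation and translation
rng = np.random.default_rng(1)
src = rng.uniform(-1, 1, size=(50, 3))
true_R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
dst = src @ true_R.T + np.array([0.1, 0.0, 0.2])
R, t = ransac_rigid(src, dst)
print(np.allclose(R, true_R, atol=1e-6), t)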
2.13 KTHRGB
Reference: [29].
Developed at: CVAP, Royal Institute of Technology (KTH).
Released in: 2011
Availability: open-source implementation available6.
General Description Using a VSLAM (Visual SLAM) process, the aim is to map an environment as faithfully as possible. A mobile robot platform with a Kinect attached is used, and different techniques are compared at the different stages of the pipeline.
Approach The method basically follows these steps:
1. SIFT or SURF features are extracted from each frame; for the initial correspondence, a kd-tree is used and the depth information is integrated to compute the 3D positions of the features.
2. From this set of feature pairs, a transformation is computed using the RANSAC algorithm.
3. The initial pose is computed and translated into a node and an edge using the g2o framework [30].
4. Loop-closure detection and insertion of the corresponding edge in the graph.
5. Graph optimization in g2o with the Levenberg-Marquardt (LMA) algorithm and the Cholmod linear solver, followed by extraction of the updated camera poses (a small pose-graph sketch is given after this list).
6. Reconstruction of the global scene, generating a point-cloud data file.
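The pose-graph idea behind steps 3-5 can be illustrated in a heavily simplified form: nodes are camera poses, edges are relative measurements (odometry or loop closures), and the optimizer moves the poses so that all edge constraints are satisfied as well as possible. The toy example below uses translation-only 2D poses and SciPy's Levenberg-Marquardt solver instead of g2o, so it only conveys the structure of the problem, not the actual 6-DoF optimization performed by the method.

import numpy as np
from scipy.optimize import least_squares

# Toy pose graph (translation-only 2D poses) solved with Levenberg-Marquardt.
# Structural illustration only; the surveyed method optimizes full camera poses
# with g2o, which is not reproduced here.

# edges: (from_node, to_node, measured relative translation)
edges = [
    (0, 1, np.array([1.0, 0.0])),    # odometry
    (1, 2, np.array([1.0, 0.1])),    # odometry (slightly biased)
    (2, 3, np.array([0.0, 1.0])),    # odometry
    (3, 0, np.array([-2.0, -1.0])),  # loop closure back to the start
]
num_nodes = 4

def residuals(x):
    poses = x.reshape(num_nodes, 2)
    res = [poses[0]]                          # anchor the first pose at the origin
    for i, j, meas in edges:
        res.append((poses[j] - poses[i]) - meas)
    return np.concatenate(res)

x0 = np.zeros(num_nodes * 2)                  # start with all poses at the origin
sol = least_squares(residuals, x0, method="lm")
print(sol.x.reshape(num_nodes, 2))            # optimized, globally consistent poses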
Reference: [31]
Developed at: Calit2, University of California.
Released in: February 2012 (Proceedings of SPIE).
6 http://code.google.com/p/kth-rgbd/
The following methods offer free and/or commercial implementations, but they do not provide documentation or a technical article describing the method.
2.15.1 RGBDEMO
Website: http://labs.manctl.com/rgbdemo.
Availability: open-source implementation, with LGPL license (any modification
must be shared under the same license).
Platforms: Linux, Windows (32- and 64-bit) and Mac OS X (10.6 or higher).
Description: open-source software initially developed by Nicolas Burrus at the RoboticsLab of the Charles III University of Madrid, providing a simple kit for using Kinect data without compiling external libraries. It offers a static scanning system.
2.15.2 SKANECT
Website: http://manctl.com/products.html.
Availability: free.
Platforms: Windows (32- and 64-bit) and Mac OS X (10.6 or higher).
Description: Launched as a product by Manctl, it is based on the RGBDemo
implementation.
2.15.3 MATHERIX
Website: http://www.matherix.com.
Availability: copy is available by joining the beta version program.
Platform: Windows 7.
Description: designed to help artists and designers build 3D models of real objects.
2.15.4 RECONSTRUCTME
Website: http://reconstructme.net.
Availability: free for non-commercial purposes; 99 Euro for commercial use.
Platform: Windows 7.
Description: some authors, such as T. Whelan [6] and Sergey K. (KinectShape), claim that the method is based on KinectFusion. It is possible that the Master's thesis defended in 2012, "A low-cost real-time 3D Surface Reconstruction System", by Christoph Kopf, one of the method's developers, includes a description of the approach. The goal of the method is to reconstruct object surfaces.
2.15.5 KIRETU
Website: http://pille.iwr.uni-heidelberg.de/∼kinect01/doc/index.html.
Availability: open-source implementation.
Platform: Ubuntu 10.04–11.04 and Linux Mint 11, both 64-bit.
Description: the Kinect Reconstruction Tutor, Kiretu, was created in a course at Heidelberg University.
2.15.6 KINECTSHAPE
Website: http://k10v.com/2012/09/02/18.
Availability: open-source implementation.
Description: KinectFusion’s minimalist implementation.
2.15.7 KINECT-3D-SLAM
Website: http://www.mrpt.org/Application:kinect-3d-slam.
Availability: open-source implementation.
Platforms: Linux and Windows (32-bit).
Description: the software performs VSLAM using the MRPT libraries to scan small scenes.
2.15.8 KINECT_2_STL
Website: http://wiki.ultimaker.com/Kinect_2_STL.
Availability: open-source implementation.
Platforms: Linux and Mac OS X.
Description: the software creates STL files for 3D printing.
3 Conclusion
Acknowledgements This work was supported by the CNPq (Brazilian National Council for
Scientific and Technological Development) and the Center for Information Technology Renato
Archer.
References
22. Guo W, Du T, Zhu X, Hu T (2012) Kinect-based real-time RGB-D image fusion method. In International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, pp 275–279
23. Curless B, Levoy M (1996) A volumetric method for building complex models from range images. In ACM Transactions on Graphics, SIGGRAPH
24. Parker S, Shirley P, Livnat Y, Hansen C, Sloan P (1998) Interactive ray tracing for isosurface rendering. In Proceedings of Visualization
25. Neumann D, Lugauer F, Bauer S, Wasza J, Hornegger J (2011) Real-time RGB-D mapping
and 3-D modeling on the GPU using the random ball cover data structure. In Proceedings of
the 2011 IEEE International Conference on Computer Vision, pp 1161–1167
26. Bauer S, Wasza J, Lugauer F, Neumann D, Hornegger J (2013) Consumer depth cameras for
computer vision (Chapter 2). Springer, London, pp 27–48
27. Stückler J, Behnke S (2012) Integrating depth and color cues for dense multi-resolution scene
mapping using RGB-D cameras. In Proceedings of the IEEE International Conference on
Multisensor Fusion and Information Integration (MFI 2012), Hamburg, Germany
28. Du H, Henry P, Ren X, Chen M, Goldman DB, Seitz SM, Fox D (2011) Interactive 3D modelling
of indoor environments with a consumer depth camera. In Proceedings of the 13th international
conference on Ubiquitous computing, pp 75–84
29. Hogman V (2011) Building a 3D map from RGB-D sensors. Master’s Thesis, KTH Royal
Institute of Technology
30. g2o: A general framework for graph optimization. http://openslam.org/g2o.html. Accessed 28
June 2014
31. Tenedorio D, Fecho M, Schwartzhaupt J, Pardridge R, Lue J, Schulze JP (2012) Capturing geometry in real-time using a tracked Microsoft Kinect. In Proceedings of SPIE 8289, The Engineering Reality of Virtual Reality 2012
32. StarCAVE. http://www.andrewnoske.com/wiki/index.php?title=Calit2_-_StarCAVE.
Accessed 28 June 2014
33. Nicolas Burrus’s Kinect Calibration. http://nicolas.burrus.name/index.php/Research/Kinect-
Calibration. Accessed 28 June 2014
34. ART: Advanced realtime tracking. http://ar-tracking.eu. Accessed 28 June 2014
35. ARToolKit. http://www.hitl.washington.edu/artoolkit. Accessed 28 June 2014
36. Libfreenect. https://github.com/OpenKinect/libfreenect. Accessed 28 June 2014
37. Bernardini F, Mittleman J, Rushmeier H, Silva C, Taubin G (1999) The ball-pivoting algorithm for surface reconstruction. In IEEE Transactions on Visualization and Computer Graphics, vol 5, pp 349–359
38. VCG: Visualization and Computer Graphics Library. http://vcg.isti.cnr.it/vcglib. Accessed 28
June 2014
39. NVIDIA’s CUDA Implementation of marching cubes. http://developer.download.nvidia.com/
compute/cuda/11/Website/GraphicsInterop.html
40. MRPT: Mobile Robot Programming Toolkit. RANSAC C++ examples. http://www.mrpt.org/
tutorials/programming/maths-and-geometry/ransac-c-examples. Accessed 28 June 2014