

International Statistical Review (2017), 85, 1, 1–30 doi:10.1111/insr.12155

Statistical Scale Space Methods1


Lasse Holmström and Leena Pasanen
Department of Mathematical Sciences, University of Oulu, Finland
E-mail: [email protected]

Summary
The goal of statistical scale space analysis is to extract scale-dependent features from noisy data.
The data could be, for example, an observed time series or a digital image, in which case features in
different temporal or spatial scales would be sought. Since the 1990s, a number of statistical
approaches to scale space analysis have been developed, most of them using smoothing to capture
scales in the data, but other interpretations of scale have also been proposed. We review the various
statistical scale space methods proposed and mention some of their applications.

Key words: Smoothing; multi-scale; curve fitting; time series; image processing; applications.

1 Introduction
The goal of scale space analysis is to discover the salient features of an object of interest that
appear in different scales. The object considered can be, for example, a time series or a digital
image, in which case features are sought correspondingly in different temporal or spatial scales.
The scale space representation of a signal consists of a family of its smooths. Often smoothing
is performed by convolving the signal with a Gaussian kernel, that is, by computing a moving
average with Gaussian weights. The variance of the kernel indexes the individual scale space
family members. For an image, scale space representation is simply a family of blurs, and for
a curve, the representation is a family of smooth curves; see Figure 1 and the upper panel
of Figure 2 for examples. Each smooth is thought to provide information about the object of
interest at a particular scale or resolution.
The papers of Witkin (Witkin, 1983; 1984) are usually regarded as the origin of scale space
methodology, although it was only relatively recently noticed that such a concept was also
proposed in Japan already in the 1950s (Iijima, 1959; Weickert et al., 1999). In Witkin’s scale-
space filtering, a one dimensional signal is smoothed with a range of smoothing parameter
values and the smooths sweep out a surface he refers to as a ‘scale-space image’. The signal
is then described qualitatively by tracking the zeros of the second derivative of the smooths
across all scales. Witkin required that increasing smoothing level creates no new extrema, a
property often referred to as the causality condition. The fact that Gaussian convolution is the
only smoother that satisfies such a condition was first shown in Babaud et al. (1983) and Babaud
et al. (1986). Koenderink (1984) pointed out that Gaussian scale-space representation can be
obtained by solving a linear diffusion equation. This was the starting point of a whole field of
research on linear diffusion in image processing (Sporring, 1997). Perona & Malik (1990) were
the first to extend this approach to nonlinear diffusion processes. Since Witkin's work, various
axiomatic foundations have also been developed for linear scale space theory (e.g. Chapter 6 in
Sporring (1997)). Scale space methodology is nowadays well established in computer vision.

1 This paper is followed by discussions and a rejoinder.

© 2016 The Authors. International Statistical Review © 2016 International Statistical Institute. Published by John Wiley & Sons Ltd, 9600 Garsington
Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

Figure 1. Five images from the scale space representation of an image of a plant.

Figure 2. SiZer analysis of the Old Faithful Data. Upper panel: the family plot of kernel density estimates of the data shown
as jittered green dots. Lower panel: the SiZer map that summarises the statistical significance of the sign of the derivative of
the smooths of the density underlying the data, blue for positive slopes and red for negative slopes. Purple indicates a non-
significant slope and gray data too sparse for inference. The smoothing level indicated by the black line corresponds to the
smooth shown in black in the family plot. The level of significance used is 0.05. The analysis was produced by the software
provided by J.S. Marron at http://www.unc.edu/~marron/marron.html.

Comprehensive introductions to the field can be found, for example, in Lindeberg (1994) and
Sporring (1997).
The focus of this review is scale space as a tool in statistical analyses. Statistical scale space
methodology began to emerge in the 1990s, first in mode detection for univariate and bivariate
density estimation (Minnotte & Scott, 1993; Minnotte et al., 1998), gradually growing into a
rich set of techniques quite distinct in character from the rest of scale space literature. The
seminal work was the SiZer idea introduced in Chaudhuri & Marron (1999) and Chaudhuri
& Marron (2000). As in Witkin’s approach, a family of smooths was used to discover scale-
dependent features of a curve. However, Chaudhuri and Marron allowed for the fact that the
data at hand provide only an estimate of the underlying truth and their aim was to distinguish
the actual structural features of the underlying curve from artifacts caused by noise. Statistical
inference was therefore needed to assess which of the apparent features are structure and which
are just sampling artifacts.

As in SiZer, the central goal in all statistics-based scale space analyses is to establish the
significance or credibility of the scale-dependent features in the observed data (Holmström,
2010b). The inference results are summarised as easily interpretable ‘maps’ that facilitate the
application of these methods even by non-experts (e.g. Figure 2). Instead of the Gaussian
convolution postulated in formal scale space theory, the statistics-based approaches often use
other types of smoothers. The useful properties of a particular smoother are deemed to be
more important than strict adherence to the scale space axioms. Outside the statistical scale space
literature, such practice has also been motivated, for example, by computational efficiency
(Wang & Lee, 1998).
There are, of course, other approaches that one can use to analyze data in multiple scales,
such as the various wavelet-based methodologies (e.g. Vidakovic (1999) and Percival & Walden
(2006)). We will, however, limit our scope to techniques that use the level of smoothing or
some similar device to represent the scale in the object underlying the data. Even with our scope
thus greatly narrowed, much territory needs to be covered as statistical scale space analysis has
within a relatively short time blossomed into an impressively versatile ensemble of techniques
with many interesting applications.
The rest of the article is organised as follows. Section 2 covers the original SiZer, its variants
and some other related work. Univariate Bayesian variants of SiZer are described in Section 3
and various approaches to time series in Section 4. Extensions to higher dimensions and data
on manifolds are considered in Section 5. Section 6 discusses various interpretations of scale,
different uses of derivatives in feature detection and comparison of maps. Applications of
scale space methods are described in Section 7 and a short summary concludes the article in
Section 8.
2 SiZer and Its Descendants
2.1 The Original SiZer
We begin the more detailed discussion of statistical scale space methods by describing the
SiZer approach of Chaudhuri and Marron. It has the closest connection to the formal scale
space ideas when applied to density estimation. Thus, consider a sample $x_1, \ldots, x_n \sim p$ from
a univariate probability density function $p$ and its Gaussian kernel estimate,

$$\hat{p}_\sigma(x) = \frac{1}{n}\sum_{i=1}^{n} K_\sigma(x - x_i), \qquad (1)$$

where the kernel $K$ is the standard normal density function, $K_\sigma(x) = K(x/\sigma)/\sigma$, and $\sigma > 0$
is a smoothing parameter. Then the expectation

$$E\hat{p}_\sigma(x) = \int_{-\infty}^{\infty} K_\sigma(x - y)\,p(y)\,dy = K_\sigma * p(x) \qquad (2)$$

is a smooth of $p$ obtained by Gaussian convolution. We will in the following also use
the notations

$$p_\sigma(x) = S_\sigma p(x) = K_\sigma * p(x),$$

where $S_\sigma$ denotes the smoothing operator. Now $\{\hat{p}_\sigma \mid \sigma > 0\}$ can be viewed as the scale space
representation of the data $x_1, \ldots, x_n$. By (2), this is an unbiased estimate of the scale space
representation $\{p_\sigma \mid \sigma > 0\}$ of the true underlying density $p$. SiZer finds the statistically significant features in the smooths $p_\sigma$ and summarises the results in a map. Note here the difference

between ordinary smoothing-based inference and the scale space approach. While in conventional inference the object of interest would be the unknown function $p$, scale space inference
instead targets its smooths $p_\sigma$.
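As an illustration of the definitions above, the scale space family $\{\hat{p}_\sigma\}$ of equation (1) can be computed directly. The sketch below is not from the paper, only a minimal NumPy rendering of the Gaussian kernel density estimate evaluated on a grid for several smoothing levels.

```python
import numpy as np

def gaussian_kde_family(x, grid, sigmas):
    """Scale space family of Gaussian kernel density estimates.

    Each row of the result is the estimate hat{p}_sigma of equation (1),
    evaluated on `grid` for one smoothing level sigma.
    """
    x = np.asarray(x, float)
    grid = np.asarray(grid, float)
    out = np.empty((len(sigmas), len(grid)))
    for k, s in enumerate(sigmas):
        # K_sigma(u) = K(u / sigma) / sigma with K the standard normal density
        u = (grid[:, None] - x[None, :]) / s
        out[k] = np.exp(-0.5 * u**2).sum(axis=1) / (len(x) * s * np.sqrt(2 * np.pi))
    return out
```

Plotting the rows against the grid reproduces a family plot like the upper panel of Figure 2.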
Figure 2 shows an example of SiZer analysis. The $n = 272$ observations $x_i$ are the durations in minutes of the eruptions of the Old Faithful geyser in Yellowstone National Park,
Wyoming, USA (Härdle, 1991; Azzalini & Bowman, 1990). The upper panel, the so-called
family plot, displays part of the scale space of the data, that is, a family of kernel density
estimates $\hat{p}_\sigma$ for a number of smoothing levels $\sigma$. The lower panel, the SiZer map, summarises
inference about the statistically significant features of the smooths $p_\sigma$. The inference is based
on the sign of the derivative $p_\sigma'(x) = E\hat{p}_\sigma'(x)$ – SiZer is an acronym for Significant ZERo
crossings of derivatives. A pixel at $(x, \log_{10}(\sigma))$ is colored blue or red depending on whether
the confidence interval of $p_\sigma'(x)$ (cf. (3)) is above or below 0. Purple means that the confidence
interval of the derivative includes 0 and gray means that the data are too sparse for inference,
indicated by an effective sample size (ESS) smaller than 5 (cf. (4)). The black line marks the
smoothing level that produces the black curve in the upper panel. In the largest scales, the
underlying function $p$ is unimodal with maximum at around 3.5 min because the derivative is
positive to its left and negative to its right. For a large range of scales, there is clear evidence
for two strong modes, but none of the smaller modes turns out to be significant.
The confidence interval for $p_\sigma'(x)$ is of the form

$$\hat{p}_\sigma'(x) \pm q\,\widehat{SD}(\hat{p}_\sigma'(x)), \qquad (3)$$

with an appropriate quantile $q > 0$. The standard deviation in the confidence limits can be
obtained by using the fact that the kernel estimator consists of sums of independent terms.
Note how scale space analysis sidesteps the bias problem always present in nonparametric
smoothing because the confidence interval is centered precisely on the quantity of interest,
$E\hat{p}_\sigma'(x) = p_\sigma'(x)$.
In Chaudhuri & Marron (1999), four alternatives to determine $q$ were proposed: an independent Gaussian quantile for each $x$, an approximate simultaneous quantile over all $x$ based on
approximately independent blocks of data, and two methods based on the bootstrap, one which
is simultaneous over $x$ and one which is simultaneous over both $x$ and $\sigma$. For example, the
approximate simultaneous method estimates the ESS for each $(x, \sigma)$ as

$$ESS(x, \sigma) = \sum_{i=1}^{n} K_\sigma(x - x_i)/K_\sigma(0) \qquad (4)$$

and defines the number of independent blocks as $m = n / \mathrm{avg}_x\, ESS(x, \sigma)$. The associated
quantile is obtained as

$$q = \Phi^{-1}\!\left(\frac{1 + (1 - \alpha)^{1/m}}{2}\right),$$

where $1 - \alpha$ is the confidence level and $\Phi$ is the Gaussian distribution function. Nowadays the
default inference method of SiZer software uses extreme value theory for Gaussian processes
for improved multiple hypothesis testing required in simultaneous inference (Hannig & Marron, 2006). The map of Figure 2 in fact uses this method. Such inference based on advanced
distribution theory is also used in many variants of SiZer developed in recent years. In a SiZer
map, the range of smoothing values used should be selected so that all interesting features in the
data are shown. This is discussed in Marron & Chung (2001) and Chaudhuri & Marron (1999).

The original SiZer considered also regression. The model is

$$y_i = \mu(x_i) + \varepsilon_i, \qquad (5)$$

where $x_1 < \cdots < x_n$ are values of an explanatory variable, $\mu$ is the unknown regression
function and the errors $\varepsilon_i$ are independent but possibly heteroscedastic. Local linear estimates
of $\mu$ and $\mu'$ are employed, $\hat{\mu}_\sigma(x) = \hat{a}_\sigma$, $\hat{\mu}_\sigma'(x) = \hat{b}_\sigma$, where

$$(\hat{a}_\sigma, \hat{b}_\sigma) = \operatorname*{argmin}_{a,b} \sum_{i=1}^{n} [y_i - (a + b(x_i - x))]^2 K_\sigma(x - x_i), \qquad (6)$$

and SiZer explores the significant features in the scale space representation $\{\mu_\sigma \mid \sigma > 0\}$, where

$$\mu_\sigma(x) = S_\sigma \mu(x) \equiv E\hat{\mu}_\sigma(x).$$

In the regression setting, the confidence interval for the derivative $\mu_\sigma'(x)$ also needs an estimate
for the error variance in (5). Causality in scale space smoothing is now more complicated as,
for example, in the case of local linear regression monotonicity of the number of extrema with
respect to the scale does not hold anymore (Chaudhuri & Marron, 2000).
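The local linear minimisation (6) is a weighted least squares problem with a closed-form solution at each location. A minimal sketch, assuming a Gaussian kernel and not taken from the SiZer code:

```python
import numpy as np

def local_linear(x, y, grid, sigma):
    """Local linear estimates (mu_hat_sigma, mu_hat_sigma') of equation (6),
    solved as a 2x2 weighted least squares problem at each grid point."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mu = np.empty(len(grid))
    dmu = np.empty(len(grid))
    for k, x0 in enumerate(grid):
        d = x - x0
        w = np.exp(-0.5 * (d / sigma) ** 2)        # Gaussian kernel weights
        X = np.column_stack([np.ones_like(d), d])  # columns for a and b(x_i - x0)
        A = X.T @ (w[:, None] * X)
        rhs = X.T @ (w * y)
        mu[k], dmu[k] = np.linalg.solve(A, rhs)    # (a_hat, b_hat)
    return mu, dmu
```

A useful sanity check is that the smoother reproduces exactly linear data at every scale, with the slope estimate equal to the true slope.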

2.2 SiZer Variants


A smoothing spline version of SiZer for regression was proposed in Marron & Zhang (2005).
The authors argue that even though splines and local linear regression perform similarly, neither
can replace the other. For instance, with few data points and low noise level, spline smoothing
appears to be more effective than local linear regression whereas with more, noisier data, local
linear regression generally seems to work better.
Hannig & Lee (2006) proposed a robust SiZer for regression. The estimator (6) is replaced by

$$(\hat{a}_{\sigma,c}, \hat{b}_{\sigma,c}) = \operatorname*{argmin}_{a,b} \sum_{i=1}^{n} \rho_c\!\left(\frac{y_i - (a + b(x_i - x))}{\hat{\sigma}}\right) K_\sigma(x - x_i), \qquad (7)$$

where $\hat{\sigma}$ is a robust estimate of the error standard deviation and $\rho_c$ is the Huber loss with cut-off
$c > 0$,

$$\rho_c(x) = \begin{cases} x^2/2, & \text{if } |x| \le c, \\ |x|c - c^2/2, & \text{if } |x| > c. \end{cases}$$

Different values of $c$ produce SiZer maps with different levels of robustness: the smaller the
cut-off, the more robust the fit. Robust SiZer can also be used for outlier detection.
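The Huber loss itself is simple to write down; the sketch below (illustrative only) shows the quadratic-to-linear transition that gives the fit in (7) its robustness.

```python
import numpy as np

def huber(x, c):
    """Huber loss rho_c: quadratic for |x| <= c, linear (slope c) beyond,
    so large residuals are not squared and outliers lose influence."""
    ax = np.abs(x)
    return np.where(ax <= c, 0.5 * x**2, c * ax - 0.5 * c**2)
```

The two branches agree at $|x| = c$ (both equal $c^2/2$), so $\rho_c$ is continuously differentiable; a smaller $c$ pushes more residuals into the linear branch, that is, a more robust fit.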
As the name suggests, the jump SiZer of Kim & Marron (2006) is designed to detect sudden
jumps in a curve underlying noisy data. A jump produces a ‘funnel shape’ in the SiZer map
because the slopes are significant even for very small smoothing parameter values. The bound-
ary of a jump funnel grows approximately linearly with the smoothing parameter, and therefore,
visualisation of inference results is more effective if the usual logarithmic scale $\log_{10}\sigma$ in a
SiZer map is replaced by the linear scale $\sigma$. However, for the purposes of jump detection such
a linear scale focuses the analysis too much on the larger scales and the authors therefore rec-
ommend that only the lower half of the usual SiZer smoothing parameter range is used when
the results are displayed. Figure 3 shows an example of jump SiZer analysis. The Mutagram
of Müller & Wai (2004) finds change points in a time series associated with a sudden jump or

Figure 3. Jump SiZer analysis of the penny thickness data from Kim & Marron (2006). Left panel: the family plot. Right
panel: the SiZer map with black and white indicating positive and negative slope, respectively, and light and dark gray
indicating non-significance and data sparseness, respectively. Funnels highlight two jumps. The smoothing parameter (here
denoted by h) has a linear scale. The dotted lines show the effective size of the smoothing kernel.

change in the slope. As in SiZer-like methods, multiple scales are considered and the results
are summarised graphically.
Marron & de Uña-Álvarez (2004) extended SiZer for density and hazard rate estimation
when the data are length biased, that is, when the probability of an observation appearing
in the sample is proportional to its value. In addition to length biasedness, the data can also
be censored.
The quantile SiZer of Park et al. (2010b) targets the quantile structure of the data by replacing
local linear regression by a local linear quantile smoother. The smooth corresponding to the
quantile of order $0 < \tau < 1$ is defined by replacing (6) with

$$(\hat{a}_{\sigma,\tau}, \hat{b}_{\sigma,\tau}) = \operatorname*{argmin}_{a,b} \sum_{i=1}^{n} \rho_\tau[y_i - (a + b(x_i - x))]\, K_\sigma(x - x_i), \qquad (8)$$

where $\rho_\tau$ is the so-called check loss function

$$\rho_\tau(x) = \begin{cases} \tau x, & \text{if } x \ge 0, \\ x(\tau - 1), & \text{if } x < 0. \end{cases}$$
Corresponding to a set of different quantiles, a collection of quantile SiZer maps is used to
explore the structure underlying the data.
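The check loss has the defining property that a constant fit minimising $\sum_i \rho_\tau(y_i - a)$ is a sample $\tau$-quantile. A small sketch (illustrative, not from the paper) that makes this concrete for the median:

```python
import numpy as np

def check_loss(x, tau):
    """Check (pinball) loss rho_tau: tau * x for x >= 0, (tau - 1) * x otherwise."""
    return np.where(x >= 0, tau * x, (tau - 1.0) * x)

def constant_quantile_fit(y, tau):
    """Minimise sum_i rho_tau(y_i - a) over constants a. The minimum is
    attained at a data point, so a search over the sample suffices."""
    y = np.asarray(y, float)
    losses = [check_loss(y - a, tau).sum() for a in y]
    return y[int(np.argmin(losses))]
```

For $\tau = 0.5$ the loss is half the absolute error and the minimiser is a median; the smoother (8) applies the same loss with kernel weights and a linear rather than constant local fit.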
The causal SiZer (c-SiZer) of Skrøvseth et al. (2012c) is concerned with detecting change
points in a live monitoring system such as disease surveillance. The authors consider two prob-
lems, changes in the distribution of event occurrence times and changes in measurements made
at those times. The former involves density estimation and the latter regression. The analysis
should be strictly causal, that is, a detected event should strictly precede the observed effect in
time. However, due to infinite support, use of the Gaussian kernel in estimation does not lead
to this kind of causality because an estimate at some time x involves data that become avail-
able only after x. Causal SiZer is a variant of SiZer where the temporal causality of inference
is enforced by employing a symmetric kernel with finite support lagged by one half of the sup-
port. Instead of single points, the method retrospectively produces reliable interval estimates of
the change points.
In the case of discrete data, the least squares regression setting of the original SiZer may be
inefficient, and the use of likelihood or quasi-likelihood based estimation can work better. Such
a version of SiZer was proposed in Li & Marron (2005). In a local polynomial quasi-likelihood
approach, one uses a link function $g$ and a quasi-likelihood $Q$ and obtains a smooth of the
regression function by replacing the square loss in the formula (6) by $Q(g^{-1}(a + b(x - x_i)), y_i)$,

argmin by argmax, and then defining $\hat{\mu}_\sigma(x) = g^{-1}(\hat{a}_\sigma)$. Because the link function is usually
monotone, inference and the construction of the SiZer map can be carried out for the transform
$g \circ \mu$ instead of $\mu$ itself, which simplifies the analysis. However, numerical maximisation is
required in the estimation, and the authors develop an algorithm which manages to keep the
computational burden close to the level of the original least squares SiZer. Park & Huh (2013b)
improved on this work, basing the SiZer inference on an extension of the Gaussian process
approach of Hannig & Marron (2006) to discrete data. An asymptotic analysis of the method is
also provided.
In a varying coefficient linear regression model, the regression coefficients are allowed to be
nonparametric functions of a covariate,

$$y_i = \sum_{j=1}^{m} \beta_j(u_i)\,x_{ij} + \varepsilon_i,$$

where the $u_i$'s and $x_{ij}$'s are values of covariates, conditional on which the error is zero
mean with variance possibly depending on $u_i$. The coefficient functions $\beta_j$ are modelled
nonparametrically and Zhang & Mei (2012) described a version of SiZer to explore their
significant features. Using local linear approximation, one estimates $\hat{\beta}_{j,\sigma}(u) = \hat{a}_{j,\sigma}$, $\hat{\beta}_{j,\sigma}'(u) = \hat{b}_{j,\sigma}$, where
$$(\hat{\mathbf{a}}_\sigma, \hat{\mathbf{b}}_\sigma) = \operatorname*{argmin}_{\mathbf{a},\mathbf{b}} \sum_{i=1}^{n} \Big\{ y_i - \sum_{j=1}^{m} [a_j + b_j(u_i - u)]\,x_{ij} \Big\}^2 K_\sigma(u_i - u), \qquad (9)$$

where $\mathbf{a} = [a_1, \ldots, a_m]^T$, $\mathbf{b} = [b_1, \ldots, b_m]^T$, and the minimisers are $\hat{\mathbf{a}}_\sigma = [\hat{a}_{1,\sigma}, \ldots, \hat{a}_{m,\sigma}]^T$,
$\hat{\mathbf{b}}_\sigma = [\hat{b}_{1,\sigma}, \ldots, \hat{b}_{m,\sigma}]^T$. A separate SiZer map is constructed for each $\beta_j$. Zhang et al. (2013)
proposed a robust alternative to this approach, replacing the square in (9) by the absolute value.
Confidence intervals are obtained with a modified version of the residual-based wild bootstrap.
Park & Huh (2013a) applied SiZer to the log-variance function in nonparametric regression.
In a random design interpretation of (5), let $\varepsilon \mid x \sim N(0, v(x))$ and consider the squared residual
$r = (y - \mu(x))^2$. Then $r \mid x \sim v(x)\chi_1^2$ and the log-likelihood of $r$ given $s(x) = \log(v(x))$ is

$$\ell(s(x), r) \equiv \log p(r \mid s(x)) = -(1/2)[\log(2\pi) + \log(r) + s(x) + r\exp(-s(x))].$$

In a local likelihood procedure, one considers the data $r_i = (y_i - \mu(x_i))^2$ and replaces the
square loss in (6) by $\ell(a + b(x_i - x), r_i)$ and argmin by argmax. The log-variance and its
derivative are then estimated by $\hat{s}_\sigma(x) = \hat{a}_\sigma$, $\hat{s}_\sigma'(x) = \hat{b}_\sigma$. Here, the unknown function $\mu$ is
replaced by a pilot smooth, and the pilot smoothing parameter becomes another dimension in
scale space analysis in the same way as in Rondonotti et al. (2007) (cf. Section 4.1). Confidence
intervals are estimated by applying the Gaussian process technique of Hannig & Marron (2006).
Asymptotic analysis of the method is also provided.
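For a single squared residual $r$, the local log-likelihood $\ell(s, r)$ above is maximised at $s = \log r$, which is what ties the estimate to the log-variance. A direct transcription (illustrative only):

```python
import numpy as np

def loglik_logvar(s, r):
    """Log-likelihood l(s, r) of a squared residual r given log-variance s,
    i.e. the log density of r ~ exp(s) * chi-squared with 1 degree of freedom."""
    return -0.5 * (np.log(2 * np.pi) + np.log(r) + s + r * np.exp(-s))
```

Setting the derivative $-(1 - r e^{-s})/2$ to zero gives $s = \log r$, so without smoothing each residual would simply vote for its own magnitude; the kernel weights in the local likelihood pool these votes across $x$.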

2.3 Some Other Uses of the Scale Space


González-Manteiga et al. (2008) use SiZer for tests of the components of a nonparametric
additive regression model. Consider a $d$-dimensional covariate vector $\mathbf{x} = [x_1, \ldots, x_d]^T$ and a
model $y = \mu(\mathbf{x}) + \sigma(\mathbf{x})\varepsilon$, where $\varepsilon$ is independent of $\mathbf{x}$, has zero mean and unit variance. An
additive model assumes that the conditional mean is of the form

d
X
.x/ D c C j .xj /;
j D1

where c is a constant, and the component functions j satisfy smoothness and identifiability
conditions. Tests are proposed for additivity, component significance, the adequacy of parametric models for the components, and for interactions. The idea is to use SiZer and its second
derivative version, SiCon (cf. Section 6), to examine the residuals when a particular model is
fitted. Significant features in the residuals indicate a poor fit. For interactions, one applies scale
space analysis to differences of fits between models with and without interactions or, in the case of
logistic regression, differences between deviances. Additive models are considered also in the
report Martínez-Miranda (2005), where a bootstrap-based local (adaptive) smoothing parameter selection strategy for component estimation is evaluated using a separate SiZer map for
each $\mu_j$. Using visual cues offered by the SiZer map in local bandwidth selection was in fact
contemplated also in the original work of Chaudhuri and Marron.
Several papers have proposed to investigate differences between regression functions using
scale space analysis. Given two regression functions $\mu_1$ and $\mu_2$, Park & Kang (2008) make
inferences about their difference by performing SiZer analysis on $\mu = \mu_2 - \mu_1$. However,
instead of its derivative, the function $\mu$ itself is used, and therefore, the confidence interval (3)
is replaced by $\hat{\mu}_\sigma(x) \pm q\,\widehat{SD}(\hat{\mu}_\sigma(x))$. To make comparisons between more than two regression
functions, the authors propose to analyze the distributions of the fitted residuals. Thus, consider
$m$ models,

$$y_{ji} = \mu_j(x_{ji}) + \varepsilon_{ji}, \qquad j = 1, \ldots, m, \quad i = 1, \ldots, n_j.$$

A pilot smoothing parameter $\sigma_p$ is chosen and two sets of standardised residuals are computed,
one based on the differences $y_{ji} - \hat{\mu}_{\sigma_p}(x_{ji})$ that correspond to the null hypothesis that all $\mu_j$'s
are equal and another on the differences $y_{ji} - \hat{\mu}_{j,\sigma_p}(x_{ji})$ that corresponds to the alternative that
some of the $\mu_j$'s may be different. The density functions of these two sets of residuals are then
subjected to SiZer analysis in the same manner as in the case of the two regression functions $\mu_1$
and $\mu_2$ above, and the discovery of significant features indicates that the regression functions
are not all equal. As before, the pilot smoothing parameter adds a new dimension to the scale
space inference. This approach to the comparison of regression functions was extended to time
series in Park et al. (2009) by employing SiZer for dependent data (cf. Section 4.1). However,
for comparison of more than two time series, instead of considering residual densities, a pair of
time series is assumed to be observed at the same time points $x_i$ and SiZer analysis is applied
directly to the difference of the residual time series $y_{ji} - \hat{\mu}_\sigma(x_i)$ and $y_{ji} - \hat{\mu}_{j,\sigma}(x_i)$. Recently,
an ANOVA type approach for comparing multiple curves observed at arbitrary values of the
dependent variable was developed in Park et al. (2014). The advantage of this method over the
ones that rely on residuals is that, in the spirit of the original SiZer, one obtains information also
about the potential local scale dependent differences of the underlying curves themselves.
Differences between trends of two non-stationary time series were considered also in
Godtliebsen et al. (2012) but, instead of analyzing the difference of means $\mu = \mu_2 - \mu_1$ itself,
similarly to SiZer, inference was applied to the derivative $\mu_\sigma' = \mu_{2,\sigma}' - \mu_{1,\sigma}'$ for a range of scales.
In addition to the original SiZer, the Bayesian BSiZer was also employed (cf. Section 3.1). In the
context of image analysis, the Bayesian SiZer for images, iBSiZer, also considers differences
of functions (Section 5.2).
An interesting application of scale space thinking is the multi-scale approach to supervised
classification developed by A. Ghosh and his co-workers. In a two-class situation, Ghosh et al.

(2006b) propose to estimate the posterior probability in favor of class 1 by

1 pO1 .xj1/
pO1 ;2 .1jx/ D ;
1 pO1 .xj1/ C 2 pO2 .xj2/

where 1 and 2 are the prior probabilities of the two classes and pO1 .xj1/ and pO2 .xj2/ are
kernel estimates of the class densities. The posterior probability of an observation x or, alter-
natively, the ‘p-value’ P1 ;2 .x/ D P ¹ 1 pO1 .xj1/ > 2 pO2 .xj2/º is then visualised with a
two-dimensional plot using the smoothing parameters 1 and 2 as coordinates. The final clas-
sifier is based on a weighted average of class posterior probabilities corresponding to different
smoothing parameter combinations .1 ; 2 /, where the weights take into account the estimated
misclassification rate as well as the p-value of each observation. A related approach is applied
to k-nearest neighbor classification in Ghosh et al. (2005) & Ghosh et al. (2006a).
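The multi-scale posterior probability is a ratio of weighted kernel density estimates. A minimal sketch for one bandwidth pair $(\sigma_1, \sigma_2)$, with hypothetical helper names and Gaussian kernels assumed:

```python
import numpy as np

def kde(x, data, sigma):
    """Gaussian kernel density estimate evaluated at x (scalar or array)."""
    u = (np.asarray(x, float)[..., None] - np.asarray(data, float)) / sigma
    return np.exp(-0.5 * u**2).sum(axis=-1) / (len(data) * sigma * np.sqrt(2 * np.pi))

def posterior_class1(x, data1, data2, s1, s2, pi1=0.5, pi2=0.5):
    """Estimated posterior probability of class 1 for the bandwidth pair (s1, s2)."""
    a = pi1 * kde(x, data1, s1)
    b = pi2 * kde(x, data2, s2)
    return a / (a + b)
```

Evaluating this over a grid of $(\sigma_1, \sigma_2)$ pairs gives the two-dimensional plot described above; the final classifier then averages such posteriors over bandwidth combinations.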

3 Univariate Bayesian Methods


Two Bayesian approaches to univariate scale space regression have been proposed, BSiZer
and Posterior Smoothing. Multi-scale analysis of features of the function $\mu$ in (5) is based
on the posterior distribution $p(\mu_\lambda'(x) \mid \mathbf{y})$ of the slopes of the smooths $\mu_\lambda = S_\lambda \mu$, given the
data $\mathbf{y} = [y_1, \ldots, y_n]^T$. The SiZer map is replaced by a feature credibility map computed
from the posterior distribution of the slopes. One advantage of the Bayesian approach is that
inference about the features of the underlying function can be performed even in the smallest
scales because data sparsity is not a problem. Of course, reliance on posterior inference assumes
that the likelihood and the priors used are reasonable. The Bayesian methods are also often
computationally more demanding than the frequentist approaches.

3.1 BSiZer
The BSiZer, first introduced in Erästö & Holmström (2005), dealt only with a discrete parameter vector $\boldsymbol{\mu} = [\mu_1, \ldots, \mu_n]^T$, but it was subsequently extended in Erästö & Holmström
(2007) to the case of a smooth regression function by using a spline model; see also Holmström (2010a) for a comprehensive review. Thus, assume that $\mu$ is a natural cubic spline, let
$a \le x_1 < \cdots < x_n \le b$ be fixed values of the explanatory variable and let $\mu_i = \mu(x_i)$ be the
value of the spline at the knot $x_i$. The spline $\mu$ is uniquely determined by its values at the knots.
Denote by $S_\lambda$ an $n \times n$ smoothing matrix and let $\boldsymbol{\mu}_\lambda = S_\lambda \boldsymbol{\mu}$ be the smooth of $\boldsymbol{\mu}$. Instead of
the posterior distribution $p(\mu_\lambda' \mid \mathbf{y})$ of the function $\mu_\lambda'$, one then analyzes the finite dimensional
distribution $p(D\boldsymbol{\mu}_\lambda \mid \mathbf{y})$, where $D\boldsymbol{\mu}_\lambda = [\mu_\lambda'(\xi_1), \ldots, \mu_\lambda'(\xi_r)]^T$ is the vector of slopes of $\mu_\lambda$ at
some fixed set of points $a < \xi_1 < \cdots < \xi_r < b$ and $D$ is the matrix that computes the derivatives $\mu_\lambda'(\xi_j)$ from the vector $\boldsymbol{\mu}_\lambda$. Because the distribution $p(D\boldsymbol{\mu}_\lambda \mid \mathbf{y})$ is obtained by applying
the linear transformation $DS_\lambda$ to $p(\boldsymbol{\mu} \mid \mathbf{y})$, one only needs to deduce $p(\boldsymbol{\mu} \mid \mathbf{y})$.
Thus, let $\boldsymbol{\varepsilon} = [\varepsilon_1, \ldots, \varepsilon_n]^T$ be the vector of errors in (5) where, in the simplest case, one
assumes that $\boldsymbol{\varepsilon} \sim N(0, \sigma^2 I)$. Then the likelihood $p(\mathbf{y} \mid \boldsymbol{\mu}, \sigma^2)$ is Gaussian with mean $\boldsymbol{\mu}$ and
covariance $\sigma^2 I$. An inverse-$\chi^2$ prior $p(\sigma^2)$ is assumed for the error variance and for $\boldsymbol{\mu}$ a
smoothing prior is used,

$$p(\boldsymbol{\mu} \mid \lambda_0) \propto \lambda_0^{(n-2)/2} \exp\left(-(\lambda_0/2)\, \boldsymbol{\mu}^T K \boldsymbol{\mu}\right), \qquad (10)$$

where $K$ is a positive semidefinite matrix such that $\boldsymbol{\mu}^T K \boldsymbol{\mu} = \int_a^b [\mu''(x)]^2\, dx$. The prior
penalises $\mu$ for roughness, and the level of penalty is controlled by the parameter $\lambda_0 > 0$,

which can be assigned a separate hyperprior. The scale space smoothing matrix is defined as
$S_\lambda = (I + \lambda K)^{-1}$.
The joint posterior $p(\boldsymbol{\mu}, \sigma, \lambda_0 \mid \mathbf{y})$ is obtained from Bayes' theorem,

$$p(\boldsymbol{\mu}, \sigma, \lambda_0 \mid \mathbf{y}) \propto p(\sigma^2)\, p(\boldsymbol{\mu} \mid \lambda_0)\, p(\lambda_0)\, p(\mathbf{y} \mid \boldsymbol{\mu}, \sigma^2).$$

The posterior credibilities of the slopes $\mu_\lambda'(x)$ are inferred by first simulating a large sample from
the marginal distribution $p(\boldsymbol{\mu} \mid \mathbf{y})$ and then transforming the sample by $DS_\lambda$. A credibility map
corresponding to the independent quantile SiZer map could then be constructed by choosing
a threshold value $0.5 < \alpha < 1$ and coloring a map pixel $(\xi_j, \lambda)$ blue or red according to
whether $P(\mu_\lambda'(\xi_j) > 0 \mid \mathbf{y}) \ge \alpha$ or $P(\mu_\lambda'(\xi_j) < 0 \mid \mathbf{y}) \ge \alpha$ (and gray otherwise). However,
the maps are in fact drawn based on the joint posterior probabilities over the locations $\xi_j$. An
example of a BSiZer map is displayed in the lower panel of Figure 11. BSiZer analysis can also
handle correlated errors as well as situations where the $x_i$'s themselves contain errors (Erästö
& Holmström, 2007).
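A discrete stand-in for the smoothing matrix $S_\lambda = (I + \lambda K)^{-1}$ helps build intuition. In the sketch below the spline penalty matrix $K$ is replaced by a second-difference roughness penalty (an assumption made for this illustration; the exact spline $K$ differs): the resulting $S_\lambda$ is the identity at $\lambda = 0$ and passes linear trends through unchanged at any $\lambda$, because second differences annihilate them.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Discrete analogue of S_lambda = (I + lambda K)^{-1}, with K = D2' D2
    built from the second-difference matrix D2 (a stand-in for the spline
    penalty matrix of the BSiZer prior)."""
    D2 = np.diff(np.eye(n), n=2, axis=0)  # (n - 2) x n second differences
    K = D2.T @ D2                         # roughness penalty; K @ linear = 0
    return np.linalg.inv(np.eye(n) + lam * K)
```

Since $K$ annihilates constants, each row of $S_\lambda$ sums to one, that is, the smoother is a weighted average of the input values.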

3.2 Posterior Smoothing


A slightly different Bayesian interpretation of SiZer, Posterior Smoothing, was proposed in Godtliebsen & Øigård (2005). In the model (5), independent Gaussian errors are assumed, and the function $\mu$ is thought of as a time-varying signal observed on an equispaced grid of times $x_i$. The model also allows for possible blurring by replacing $\mu$ with $A\mu$, where $A$ is a known matrix. A homoscedastic situation is considered, and the error variances are assumed known or are estimated from the data before the Bayesian analysis is applied. A smoothing prior is again adopted,

$$p(\mu \mid \lambda_0) \propto \exp\left[-\lambda_0 \sum_{i=1}^{n-1} (\mu_i - \mu_{i+1})^2\right], \qquad (11)$$

but now $\lambda_0$ acts as the scale space smoothing parameter. The credibility map is computed from the approximate posterior distributions of the discrete derivatives, and simultaneous inference is approximated with the independent blocks idea of the original SiZer.
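Under the prior (11) with known error variance, the posterior of the signal is Gaussian with a tridiagonal precision matrix, so the family of reconstructions can be computed exactly. A minimal sketch (our own notation and function name; the actual method also tracks posteriors of the derivatives):

```python
import numpy as np

def posterior_smooth(y, sigma2, lam0):
    """Posterior mean of mu under y_i = mu_i + eps_i, eps ~ N(0, sigma2*I),
    with the first-difference smoothing prior
    p(mu | lam0) ∝ exp[-lam0 * sum_i (mu_i - mu_{i+1})^2].

    The posterior is Gaussian with precision Q = I/sigma2 + 2*lam0*D'D,
    where D is the (n-1) x n first-difference matrix; lam0 is the scale
    space smoothing parameter (larger lam0 = smoother reconstruction)."""
    n = len(y)
    D = np.diff(np.eye(n), 1, axis=0)
    Q = np.eye(n) / sigma2 + 2.0 * lam0 * D.T @ D   # tridiagonal precision
    return np.linalg.solve(Q, y / sigma2)           # posterior mean E(mu | y)
```

Varying `lam0` over a grid traces out the scale space family of reconstructions; a constant series is reproduced exactly since the prior does not penalise the overall level.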
Øigård et al. (2006) made several important improvements to the original Posterior Smoothing method. These include a better prior model for the signal, the use of band-matrix algorithms for exact and computationally efficient inference, and a better way to summarise the results in a feature map. Assuming in (5) that $x_1 > 0$, an integrated Wiener process prior

$$\mu(x) \mid \{\lambda_0, \mu(0) = 0\} = \frac{1}{\sqrt{\lambda_0}} \int_0^x (x - h)\, dW(h) \qquad (12)$$

is used, where $W(h)$ is a standard Wiener process. This choice of prior allows one to evaluate the posterior distributions of the true derivatives $\mu'(x_i)$ instead of using their discrete approximations. Note that there is a well-known connection between the integrated Wiener process and the cubic splines that are used in the BSiZer prior (10) (Wahba, 1978). The key idea in improved Posterior Smoothing is to consider, instead of $\mu = [\mu(x_1), \ldots, \mu(x_n)]^T$, the augmented random vector $[\mu(t_1), \mu'(t_1), \ldots, \mu(t_n), \mu'(t_n)]^T$. It is viewed as a Gaussian Markov random field with an improper Gaussian prior distribution $p(\mu \mid \lambda_0)$ which is of type (10). This, combined with a Gaussian likelihood, produces a multivariate normal posterior $p(\mu \mid y, \lambda_0)$ whose mean and precision matrix (inverse covariance matrix) can be found exactly. The band structure of the precision matrix facilitates fast computation of the posterior mean and marginal variances. In the credibility map, an exact test of the form $|E(\mu'(t_i) \mid y, \lambda_0)| > q(\alpha/2)\, \mathrm{SD}(\mu'(t_i) \mid y, \lambda_0)$ can be applied, where $q(\alpha/2)$ is the ordinary Gaussian $\alpha/2$ quantile. A suitable range of smoothing levels for the credibility map is automatically determined using a concept of ESS that is natural for the particular Bayesian model used.
BSiZer and Posterior Smoothing interpret SiZer somewhat differently. The idea of SiZer is to make inferences about the smooths of the underlying true function $\mu$, and BSiZer aims at the same goal by first explicitly constructing its posterior distribution and then smoothing it. The separation of posterior modelling and scale space smoothing also facilitates changes to the observation model (5). Posterior Smoothing, on the other hand, implements the scale space idea by reconstructing the underlying $\mu$ using a range of alternative prior smoothing parameter values and making inferences about the features in these reconstructions.

4 Time Series
4.1 Time Domain Analyses
Time series trend estimation can be interpreted as regression with a fixed equispaced design and correlated errors. Thus, in (5), $x_i = i$ and the $\varepsilon_i$'s are dependent. In general, however, a non-constant mean of a non-stationary time series cannot be distinguished from a correlated error structure of a stationary time series on the basis of just one observed series. The time series SiZer of Rondonotti et al. (2007) assumes that the $\varepsilon_i$'s are weakly stationary and then uses a pilot smooth of the data to estimate their autocovariance. The pilot smoothing parameter $h_p$ becomes a new scale space dimension, and a correlation-adjusted SiZer map is produced for several alternative values of $h_p$, each corresponding to a particular trade-off between an assumed trend and level of dependence in the data. A small $h_p$ postulates weakly correlated errors and a highly structured trend, while a large $h_p$ allows the errors to explain most of the observed features in the data. In the spirit of a scale space, this SiZer for time series avoids commitment to a particular error model by considering a whole family of them simultaneously. However, in the final visualisation of inference results, only four values of the pilot smoothing parameter are considered that reflect a wide variety of trade-offs between the trend and error correlation. As a numerical measure of this trade-off, one uses the ratio

$$\mathrm{IR}(h_p) = \sum_{i=1}^n \hat\varepsilon_{h_p,i}^2 \Big/ \max_{h_p} \sum_{i=1}^n \hat\varepsilon_{h_p,i}^2, \qquad (13)$$

(Indicator of the Residual component), where the $\hat\varepsilon_{h_p,i}$'s are the residuals of the pilot smooth. The four pilot smooths correspond to values of $h_p$ for which the ratio is closest to 0%, 25%, 50% and 75%. Figure 4 shows an example of such scale space analysis with time series SiZer.
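The ratio (13) is straightforward to compute over a grid of candidate pilot bandwidths. A schematic version, assuming a simple Nadaraya-Watson pilot smoother on the grid $x_i = i$ (the actual method uses local linear smoothing; function names are illustrative):

```python
import numpy as np

def gauss_smooth(y, h):
    """Nadaraya-Watson smooth on the grid x_i = i with a Gaussian kernel."""
    x = np.arange(len(y))
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return (W @ y) / W.sum(axis=1)

def ir_curve(y, pilots):
    """IR of (13): residual sum of squares of each pilot smooth, normalised
    by its maximum over the candidate pilot smoothing parameters."""
    rss = np.array([np.sum((y - gauss_smooth(y, h)) ** 2) for h in pilots])
    return rss / rss.max()

def select_pilots(y, pilots, targets=(0.0, 0.25, 0.50, 0.75)):
    """Pilot bandwidths whose IR is closest to 0%, 25%, 50% and 75%."""
    ir = ir_curve(y, pilots)
    return [pilots[np.argmin(np.abs(ir - t))] for t in targets]
```

Small bandwidths nearly interpolate the data (IR near 0), while large bandwidths leave almost all structure to the residuals (IR near 1).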
Several improvements to time series SiZer were suggested in Park et al. (2009). The quantiles
in the confidence intervals were estimated using the extreme value theory ideas of Hannig &
Marron (2006) and the autocovariance function was estimated based on differenced time series.
These changes result in a method that for time series with moderately correlated errors does
not require pilot smoothing.
In a Bayesian approach, BSiZer also allows for a dependent Gaussian error structure and
Erästö & Holmström (2007) discusses how it may be used to distinguish dependence in errors
from an underlying trend.
Park et al. (2004) introduced the Dependent SiZer taking a goodness-of-fit approach to SiZer
for time series where, instead of estimating the dependent error structure from the data and
using it to adjust scale space inference accordingly, a specific time series model, such as a

Figure 4. Time series SiZer analysis of deaths data from Rondonotti et al. (2007). First row: the data, the family plot
and the IR defined in (13); the smoothing parameter is denoted by h. Green highlights the IRs that correspond to the four
pilot smoothing parameter values considered in the next rows. Second row: the smooths corresponding to the selected pilot
smoothing parameters. Third row: the corresponding residuals. Fourth row: the SiZer maps where colors are interpreted as
in Figure 3.

fractional Gaussian noise process is assumed and then one tests whether the model fits the data.
Now the SiZer map flags deviations from the model while a good fit corresponds to feature
non-significance.
The Significant Non-stationarities (SiNos) method of Olsen et al. (2007) detects deviations from stationarity of a time series in the form of a change in the mean, variance or 1-step autocorrelation. A pair of adjacent windows is slid along the time series, and sample estimates of these statistics within the two windows are compared to test the null hypothesis of no departure from stationarity. The test statistics can be expressed as ratios of quadratic forms, and a Gaussian distribution-based saddlepoint approximation is used to determine their p-values. No smoothing is involved, but the scale space idea is implemented by varying the length of the two windows used for testing. The global significance level on each scale is controlled by adopting the method of false discovery rate. In the wavelet SiZer and wavelet SiNos of Park
et al. (2007), non-stationarities are detected by performing scale space analysis on sequences
of squares (or other functions) of wavelet coefficients of a time series. Thus, for a fixed wavelet
scale, SiZer (or SiNos) is applied to the time series of squared wavelet coefficients, and the process is repeated for a number of wavelet scales. The analysis is facilitated by the fact that, for a
fixed wavelet scale, the wavelet coefficients are only weakly correlated.
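The SiNos idea of comparing adjacent windows across a range of window lengths can be illustrated with a toy mean-change scan. This sketch replaces the quadratic-form statistics and saddlepoint p-values with a simple two-sample z-statistic and tests only the mean; names are our own:

```python
import numpy as np

def sinos_mean_scan(y, window_lengths, z=3.0):
    """Toy SiNos-style scan: for each window length w (the 'scale'), compare
    sample means in adjacent windows [t-w, t) and [t, t+w) with a two-sample
    z-statistic and record the times where the mean changes significantly.

    Full SiNos also compares the variance and lag-1 autocorrelation and
    controls the false discovery rate on each scale."""
    n = len(y)
    flags = {}
    for w in window_lengths:
        hits = []
        for t in range(w, n - w):
            left, right = y[t - w:t], y[t:t + w]
            s2 = (left.var(ddof=1) + right.var(ddof=1)) / w
            stat = (right.mean() - left.mean()) / np.sqrt(s2 + 1e-12)
            if abs(stat) > z:
                hits.append(t)
        flags[w] = hits
    return flags
```

A mean shift large relative to the noise is flagged at times near the change point, with longer windows giving more power for smaller shifts.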
The MRAD (Multi-Resolution Anomaly Detection) procedure described in Zhang et al. (2008) and Zhang et al. (2014) detects outliers in time series data with long-range dependence, such as Internet traffic. The technique operates on multiple scales that correspond to the window widths used to aggregate the data. An observed time series $y_1 = [y_{11}, \ldots, y_{1n}]$ is aggregated by summing over a scale of dyadic window widths, using either non-overlapping or overlapping sliding windows. At scale $k$, an aggregated series $y_k = [y_{k1}, \ldots, y_{kn_k}]$ is produced, where $n_k = \lceil n/2^{k-1} \rceil$ and each $y_{ki}$ is a sum over $2^{k-1}$ original observations, normalised to have, under the null hypothesis of fractional Gaussian noise (no outliers), the same Gaussian marginal distribution that does not depend on $k$ or $i$. In the case of non-overlapping windows,

$$y_{ki} = 2^{-(k-1)H} \sum_{j=1}^{2^{k-1}} y_{1,\,(\lceil i/2^{k-1} \rceil - 1)2^{k-1} + j},$$

where $H$ is the Hurst exponent of the process, and similarly for overlapping windows. Such aggregation is similar to wavelet smoothing associated with Haar wavelets. The test to reject the null hypothesis at time $i$ is of the form $\max_{k=1,\ldots,M} |y_{ki}| > C_\alpha^M$, where $C_\alpha^M$ depends on the estimated Hurst exponent, the chosen significance level $\alpha$ and $M$, the number of scales examined. In the MRAD map, the pixel color at a location $(i, k)$ indicates the p-value of $|y_{ki}|$. Such multiscale testing can increase the power of outlier detection. Figure 5 shows an example of MRAD in action (Figure 3.19 of Zhang (2007)).
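The dyadic aggregation and the multiscale test can be sketched directly from the formula above (non-overlapping windows only; the threshold $C$ is supplied by the caller rather than derived from the estimated Hurst exponent as in MRAD proper):

```python
import numpy as np

def mrad_aggregate(y1, k, H):
    """Non-overlapping MRAD aggregation at scale k:
    y_{k,i} = 2^{-(k-1)H} * (sum of the i-th block of 2^{k-1} observations),
    with H the Hurst exponent. Under fractional Gaussian noise this
    normalisation keeps the marginal distribution the same at every scale."""
    w = 2 ** (k - 1)
    nk = int(np.ceil(len(y1) / w))
    pad = np.concatenate([y1, np.zeros(nk * w - len(y1))])  # pad last block
    return 2.0 ** (-(k - 1) * H) * pad.reshape(nk, w).sum(axis=1)

def mrad_flag(y1, scales, H, C):
    """Flag time i if |y_{k,i'}| > C at any scale k, mapping each original
    time to its block at scale k (a simplified version of the MRAD test)."""
    n = len(y1)
    flags = np.zeros(n, dtype=bool)
    for k in scales:
        yk = mrad_aggregate(y1, k, H)
        idx = np.arange(n) // 2 ** (k - 1)
        flags |= np.abs(yk[idx]) > C
    return flags
```

For white noise ($H = 0.5$), the aggregated series keeps unit standard deviation at every scale, which is exactly the purpose of the normalising factor.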

4.2 Frequency Domain Analyses


A local maximum in the spectral density of a stationary time series may indicate a periodic
component in the data. However, a spectrum estimate such as the periodogram usually also
shows features that are in fact just sampling artifacts. The idea in scale space spectral analysis
is the same as in SiZer for density estimation: a feature, such as a local maximum is more
certainly real if it appears in a substantial range of scales rather than just making a spurious
appearance in a narrow range of smooths.
Sørbye et al. (2009) propose a Bayesian scale space analysis of the log-spectral density. Consider an observed finite time series $z_1, \ldots, z_{2n}$ and let

$$P(\omega) = \frac{1}{2n} \left| \sum_{t=1}^{2n} z_t e^{-2\pi i \omega t} \right|^2, \quad -1/2 \le \omega \le 1/2, \qquad (14)$$

be the periodogram estimate of the spectral density. Let $\omega_j = j/(2n)$, $j = 1, \ldots, n$, be the Fourier frequencies and denote by $p$ the true spectral density of the process underlying the data. Then, asymptotically,

$$\frac{P(\omega_j)}{p(\omega_j)} \sim \begin{cases} \frac{1}{2}\chi_2^2, & \omega_j \ne 1/2, \\ \chi_1^2, & \omega_j = 1/2, \end{cases} \qquad (15)$$

Figure 5. MRAD analysis of an Internet traffic semi-experiment. Top: packet counts during an interval of one hour with an artificially injected port scan anomaly around 100000. The time unit is $10^{-2}$ s. Bottom: MRAD map where the color of a pixel indicates the p-value.

and the ratios $P(\omega_j)/p(\omega_j)$ are independent. Estimation of the log-spectral density $\mu = \log p$ is then viewed as a nonparametric regression problem, $y_j = \mu(\omega_j) + \varepsilon_j$, where $y_j = \log(P(\omega_j))$ and $\varepsilon_j$ is distributed either as $\log(\frac{1}{2}\chi_2^2)$ or $\log(\chi_1^2)$. Inference about the credible features of $\mu$ is based on the posterior density of its derivative, which is found using an approach similar to the Posterior Smoothing method of Øigård et al. (2006) described in Section 3.2. However, because the noise $\varepsilon_j$ is not Gaussian, the so-called simplified Laplace approximation method of Rue et al. is used for accurate and efficient computation of the credibility maps.
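The periodogram (14) at the Fourier frequencies is a one-line FFT computation; for white noise, (15) implies that the ordinates at $\omega_j \ne 1/2$ behave like independent mean-one $\frac{1}{2}\chi_2^2$ (exponential) variables. A sketch (function name is ours):

```python
import numpy as np

def periodogram(z):
    """Periodogram (14) of a series of even length 2n, evaluated at the
    Fourier frequencies omega_j = j/(2n), j = 1, ..., n."""
    m = len(z)                               # m = 2n
    n = m // 2
    j = np.arange(1, n + 1)
    omega = j / m
    # |sum_t z_t exp(-2 pi i omega_j t)|^2 / (2n), via the FFT.
    P = np.abs(np.fft.fft(z))[j] ** 2 / m
    return omega, P
```

For unit-variance white noise, the ordinates average to about one (the flat true spectrum), while individual ordinates fluctuate exponentially, which is exactly why smoothing across scales is needed before features can be trusted.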
Olsen et al. (2008) use the spectral density to detect change points where the time series leaves one stationary state and enters another. What should be considered a change depends on the time scale considered, which leads to a scale space approach to the problem. A change at a time point is tested by computing the periodograms $P_1$ and $P_2$ in windows of length $N$ to its left and right and then considering the ratios $P_2(\omega_j)/P_1(\omega_j)$, $j = 1, \ldots, N/2$, where $\omega_j = j/N$ is the $j$th Fourier frequency. The window length $N$ has the role of the scale space smoothing parameter.
A Bayesian approach for finding credible features in discrete wavelet power spectra was
described in Divine & Godtliebsen (2007). An i.i.d. Gaussian likelihood and a prior of the
form (11) for the wavelet coefficients lead to a posterior that can be sampled exactly. Now the
smoothing parameter in scale space analysis is the wavelet transform scale.

5 Higher Dimensional Settings and Data on Manifolds


5.1 $S^3$ and Its Variants
It is natural to consider the ideas developed for curves in a multivariate context. Godtliebsen et al. (2002) described a SiZer for two-dimensional densities. The features of a surface defined by a kernel density estimate of bivariate data $x_i \in \mathbb{R}^2$, $i = 1, \ldots, n$, were described

Figure 6. Left: the noisy example image. Middle: streamlines computed from significant gradient directions. Right: arrows indicate significant gradient directions and different colors flag different types of significant local curvature as defined by the sign and significance of the eigenvalues $\lambda_{\gamma,1}$ and $\lambda_{\gamma,2}$: hole (yellow), long valley (orange), saddle point (red), long ridge (purple), and peak (dark blue). The example was obtained from http://www.unc.edu/~marron/marron.html.

in multiple scales using the gradient and the eigenvalues of the Hessian matrix. The methodology is referred to as $S^3$ for Significance in Scale Space, and it was subsequently modified in Godtliebsen et al. (2004) for noisy digital images using a regression model. The inference and visualisation methods are similar in these two 2D settings of SiZer, and we describe the latter next in more detail.

Figure 6 shows an example of $S^3$ analysis where the underlying image is a volume of revolution of a Gaussian density with non-zero mean from which an off-centered cylinder has been removed. The noisy image is on the left, and frames from feature analyses based on streamlines as well as significant gradients and curvatures are shown in the other two panels. These panels are in fact individual frames, corresponding to a fixed scale, of whole movies that run through a range of scales, visualising features that appear in different image resolutions.
The model for a noisy image is

$$y_{ij} = \mu_{ij} + \varepsilon_{ij}, \quad i = 1, \ldots, M, \; j = 1, \ldots, N, \qquad (16)$$

where $(i, j)$ is a pixel location in an $M \times N$ image. It is assumed that $\mu_{ij} = \mu(i, j)$, where $\mu$ is a smooth function of two continuous real variables, and that the errors $\varepsilon_{ij}$ are independent. Scale space smoothing is performed using discrete Gaussian convolution with adjustment for boundary effects. Both the gradient (first order partial derivatives) and the curvature (second order partial derivatives) are needed for useful feature detection. Denote by $\mu_{\gamma,k}$ the $k$th partial derivative of the smooth $\mu_\gamma$. The significance of the magnitude of the gradient is tested by

$$\frac{\hat\mu_{\gamma,1}(i,j)^2}{\hat\sigma_{\gamma,1}^2} + \frac{\hat\mu_{\gamma,2}(i,j)^2}{\hat\sigma_{\gamma,2}^2} > q_{\chi_2^2},$$

where the partials and their variances are estimated from a Gaussian kernel smooth of the observed noisy image and the $\chi_2^2$ distribution is justified by the approximate bivariate normality of $[\hat\mu_{\gamma,1}(i,j), \hat\mu_{\gamma,2}(i,j)]^T$. However, no normal approximation is needed when the image noise is Gaussian. To explore curvature, denote by $\lambda_{\gamma,1} \ge \lambda_{\gamma,2}$ the two eigenvalues of the Hessian matrix of $\mu_\gamma$. The null distribution of the test statistic $\max\{|\hat\lambda_{\gamma,1}(i,j)|, |\hat\lambda_{\gamma,2}(i,j)|\}$ is obtained by simulation using the estimated Hessian matrix. Holes, valleys, saddle points, ridges and peaks are flagged with different colors based on the significance and sign of the eigenvalues (Figure 6). Both for gradients and curvature, the independent blocks idea of the original SiZer is applied to approximate simultaneous inference over the whole image.
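The gradient part of the $S^3$ test can be sketched with derivative-of-Gaussian filters. This toy version ignores boundary adjustment and the independent-blocks correction, and the caller supplies the $\chi_2^2$ quantile; the function name is illustrative:

```python
import numpy as np

def gradient_significance(img, gamma, sigma2, chi2_quantile):
    """Sketch of the S^3 gradient test: smooth the image at scale gamma,
    estimate the two partial derivatives with derivative-of-Gaussian
    filters, and flag pixels where the standardised squared gradient
    exceeds a chi-square(2) quantile. Inference is pointwise."""
    r = int(np.ceil(3 * gamma))
    t = np.arange(-r, r + 1)
    g = np.exp(-t ** 2 / (2 * gamma ** 2)); g /= g.sum()   # Gaussian filter
    dg = -t / gamma ** 2 * g                               # its derivative

    def sep(a, krow, kcol):                                # separable 2D conv
        tmp = np.apply_along_axis(lambda v: np.convolve(v, krow, 'same'), 0, a)
        return np.apply_along_axis(lambda v: np.convolve(v, kcol, 'same'), 1, tmp)

    d1 = sep(img, dg, g)                                   # d/d(row)
    d2 = sep(img, g, dg)                                   # d/d(col)
    # Variance of each derivative estimate under i.i.d. noise:
    # sigma2 times the squared norm of the separable filter.
    v = sigma2 * (dg ** 2).sum() * (g ** 2).sum()
    stat = d1 ** 2 / v + d2 ** 2 / v                       # approx chi^2_2
    return stat > chi2_quantile
```

On a strong ramp the gradient is flagged almost everywhere, while on pure noise only a small fraction of pixels exceed a high quantile.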
The $S^3$ method was extended to images with spatially correlated noise in Vaughan et al. (2012). The covariance of the $\varepsilon_{ij}$'s in (16) was assumed isotropic and $S^3$ was applied to

analyze numerical climate model outputs. In Thon et al. (2012), a Bayesian version of $S^3$ was
proposed. The method is called BMulA for Bayesian multiscale analysis, and it can also be
considered as a two-dimensional version of Posterior Smoothing (Godtliebsen & Øigård, 2005;
Øigård et al., 2006) in that scale space smoothing is controlled by the image prior. The image
model is again (16) but where the errors now are i.i.d. Gaussian with known variance. The image
prior is an intrinsic Gaussian Markov random field, and therefore, the posterior is Gaussian,
which eliminates the need for simulation-based inference. However, the test statistic quantile
needed for testing significance of curvature is still based on simulation. Computations are sped
up by using toroidal image boundary conditions and Fourier methods. The method is applied to
detecting hairs in dermatoscopic images of skin lesions.
Ganguli & Wand (2004) extended $S^3$ to geostatistical data. The model is

$$y_i = \mu(x_i) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (17)$$

where $y_i$ is a scalar response at a geographical location $x_i$, $\mu$ is a smooth function and the $\varepsilon_i$'s are i.i.d. Gaussian. The data $x_i$ are not equispaced and contain regions of data sparsity, and therefore the setting is different from that of Godtliebsen et al. (2004). A mixed-model approach is taken, where $\mu = X\beta + Zu$, $\mu = [\mu(x_1), \ldots, \mu(x_n)]^T$, $X$ and $Z$ are a bivariate polynomial and a low-rank thin plate spline design matrix, respectively, $\beta$ is a fixed parameter vector and $u$ is a zero-mean multivariate Gaussian random variable. The scale space smoothing parameter is $\lambda = \sigma_\varepsilon^2 / \sigma_u^2$. A spline-based smoother offers a computationally efficient estimator in regions of data sparsity. Further, the mixed-model approach allows one to use distributional theory and software that already exist for such models. Simultaneous inference over the whole spatial grid can be performed based on the asymptotic distribution theory of linear mixed models. As in Godtliebsen et al. (2004), the significant features in the spatial field are explored using the gradient and curvature. In Ganguli & Wand (2007), the authors extend their approach to discrete responses using generalised additive models.
In Pedersen et al. (2008), the $S^3$ methodology was adapted for the detection of differences between pairs of images, to analyze the difference between numerical climate model output and remote sensing observations of global surface albedo climatology.

5.2 iBSiZer
In most of the aforementioned methods, finding significant features in two-dimensional data is based on inference about the first or second partial derivatives. In contrast, iBSiZer (Bayesian SiZer for images) and the related multiresolution BSiZer are descendants of BSiZer (Section 3.1) that consider the pixel intensities as such (Holmström & Pasanen, 2012; Holmström et al., 2011). In iBSiZer, the model for an $M \times N$ image is again (16). Let $n = MN$ and denote by $\mu = [\mu_1, \ldots, \mu_n]^T$ the vectorized underlying true image. The model can then be written as

$$y = \mu + \varepsilon, \qquad (18)$$

where, in the simplest case, $\varepsilon \sim N(0, \sigma^2 I)$, but other covariance structures such as isotropic or heteroscedastic noise can also be considered. Given a smoothing matrix $S_\lambda$, iBSiZer makes inferences about the credible features of $S_\lambda \mu$ for a range of smoothing parameters $\lambda$. This is performed by analyzing the posterior distribution $p(S_\lambda \mu \mid y)$. The method is typically used for finding credible differences between two images $\mu_1$ and $\mu_2$. When the associated errors $\varepsilon_1$ and $\varepsilon_2$ are independent, the difference image $\mu = \mu_2 - \mu_1$ and its observed version $y = y_2 - y_1$ can still be assumed to satisfy (18). Because smoothing amounts to local averaging of image pixel intensities, iBSiZer can be interpreted to provide information about the salient features of the

Figure 7. First row: in the first panel is the posterior mean $E(\mu \mid y)$ of the difference of two Landsat ETM+ satellite images taken over eastern Finland on August 2, 1999 and May 29, 2002. The other four panels are posterior means $E(S_\lambda \mu \mid y)$ corresponding to different levels of smoothing. The effective size of the smoother is indicated by the yellow circle. Second row: credibility maps that show the salient features in different spatial resolutions.

truth $\mu$ in different spatial scales or image resolutions. In the i.i.d. error model, an inverse-$\chi^2$ prior is assumed for $\sigma^2$, and for $\mu$ one can use a smoothing Gaussian Markov random field prior

$$p(\mu \mid \sigma^2, \lambda_0) \propto \lambda_0^{(n-n_0)/2} \exp\left(-\frac{\lambda_0}{2\sigma^2}\, \mu^T Q \mu\right),$$

where $Q$ is a positive semidefinite matrix with null space dimension $n_0$. The scale space smoother is $S_\lambda = (I + \lambda Q)^{-1}$. Just as in BSiZer, $\lambda_0 > 0$ defines the level of smoothing when the signal is reconstructed from a noisy observation. Other, application-specific priors can also be used. Inference is based on posterior sampling, and Fourier methods speed up the computations. Figure 7 shows an iBSiZer analysis of the difference of two Landsat ETM+ satellite images taken over the same location in eastern Finland at two different times. The credibly positive and negative areas in the difference image are shown in blue and red, respectively. Gray is used for pixels that do not differ credibly from 0. Inference is simultaneous over the images. See Holmström & Pasanen (2012) for more information on this example. In Pasanen & Holmström (2015), iBSiZer was extended to the analysis of non-linear transformations of multidimensional satellite images, such as the normalized difference vegetation index used in remote sensing.
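A much-simplified iBSiZer-style analysis of a difference image can be sketched as follows. The sketch uses a flat prior with known noise variance (so the posterior of $\mu$ is simply $N(y, \sigma^2 I)$ rather than the Gaussian Markov random field posterior above), Gaussian smoothing via the FFT with toroidal boundaries, and pointwise rather than simultaneous inference; all names are ours:

```python
import numpy as np

def ibsizer_maps(ydiff, sigma2, scales, alpha=0.95, nsim=300, seed=0):
    """Toy iBSiZer-style credibility maps for a difference image: draw from
    the (flat-prior) posterior N(ydiff, sigma2*I), smooth each draw with a
    Gaussian filter of std 'lam' pixels applied in the Fourier domain, and
    flag pixels whose smooth is credibly positive (+1) or negative (-1)."""
    rng = np.random.default_rng(seed)
    M, N = ydiff.shape
    fy, fx = np.fft.fftfreq(M), np.fft.fftfreq(N)
    maps = []
    for lam in scales:
        # Transfer function of a Gaussian smoother with std lam pixels.
        H = np.exp(-2 * np.pi ** 2 * lam ** 2 * (fy[:, None] ** 2 + fx[None, :] ** 2))
        pos = np.zeros((M, N))
        for _ in range(nsim):
            mu = ydiff + np.sqrt(sigma2) * rng.standard_normal((M, N))
            smooth = np.real(np.fft.ifft2(np.fft.fft2(mu) * H))
            pos += smooth > 0
        p = pos / nsim
        maps.append(np.where(p >= alpha, 1, np.where(1 - p >= alpha, -1, 0)))
    return maps
```

A genuinely changed region is flagged at moderate scales, while a background pixel far from the change stays gray (0).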
In a sense, in iBSiZer analysis at smoothing level $\lambda_i$, all the features corresponding to the scales $\lambda \ge \lambda_i$ are still present in the image. By considering differences of smooths, the multiresolution approach (Holmström et al., 2011) attempts to remedy this by isolating, for two levels $\lambda_i < \lambda_j$, those features that are present at level $\lambda_i$ but not at $\lambda_j$. One way to think about the difference between conventional scale space analysis, such as iBSiZer, and the difference of smooths idea is that the former uses low-pass filtering whereas the latter uses band-pass filtering. Let $0 = \lambda_1 < \lambda_2 < \cdots < \lambda_{L-1} < \lambda_L = \infty$ be a set of increasing smoothing levels. We assume that $\lambda = 0$ corresponds to no smoothing, so that $S_0 \mu = \mu$, and that $S_\infty \mu$ is the mean of $\mu$. In BSiZer multiresolution analysis (MRBSiZer), inference is made about the credible features in the differences $z_i = S_{\lambda_i}\mu - S_{\lambda_{i+1}}\mu$, $i = 1, \ldots, L-1$, and $z_L = S_\infty \mu$. The term multiresolution refers to the expansion $\mu = \sum_{i=1}^L z_i$, which resolves $\mu$ into scale-dependent components. Graphical and numerical tools are available for selecting an appropriate sequence of smoothing levels $\lambda_i$. Figure 8 shows an example of MRBSiZer multiresolution decomposition using $L = 5$ components. On the first row are the underlying true image and four image multiresolution components displayed as posterior means $E(z_i \mid y)$. The mean $z_5$ has been left out. In the second row are the noisy observed image and credibility maps for the multiresolution components. Here, white and black indicate the credibly positive and negative pixels, and gray pixels are not credibly different from zero. A scale space technique based on differences of

Figure 8. iBSiZer scale space multiresolution analysis of an image of John Lennon. First row: the underlying true image and the mean components $E(z_i \mid y)$, $i = 1, \ldots, 4$. The overall mean image has been omitted. Second row: the noisy observed image and the credibility maps of the components. White and black indicate the credibly positive and negative pixels, and gray pixels are not credibly different from zero.

smooths was also proposed in Marvel et al. (2013). As in Holmström et al. (2011), the method is used to analyze numerical climate model outputs in multiple scales by applying smoothing on a sphere.
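The telescoping structure of the multiresolution expansion is easy to verify numerically. A one-dimensional sketch with $S_\lambda = (I + \lambda K)^{-1}$ and a second-difference penalty $K$ (illustrative only, not the authors' implementation):

```python
import numpy as np

def multires_components(mu, lambdas):
    """Difference-of-smooths decomposition used in scale space
    multiresolution analysis: with S_lam = (I + lam*K)^{-1}, lambda_1 = 0
    (S_0 = I) and S_inf replaced by the overall mean, the components
    z_i = S_{lam_i} mu - S_{lam_{i+1}} mu (plus z_L = mean(mu))
    sum back to mu exactly."""
    n = len(mu)
    D2 = np.diff(np.eye(n), 2, axis=0)
    K = D2.T @ D2
    smooths = [mu.copy()]                              # lam = 0: no smoothing
    for lam in lambdas:                                # increasing inner levels
        smooths.append(np.linalg.solve(np.eye(n) + lam * K, mu))
    smooths.append(np.full(n, mu.mean()))              # lam = inf: the mean
    z = [smooths[i] - smooths[i + 1] for i in range(len(smooths) - 1)]
    return z + [smooths[-1]]
```

The telescoping sum $(S_0\mu - S_{\lambda_2}\mu) + \cdots + S_\infty\mu$ collapses to $\mu$, which is the band-pass interpretation: each component carries the detail between two consecutive smoothing levels.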

5.3 Beyond 2D, Non-linear Manifolds


In principle, the inference part of many statistical scale space methods can be extended to
multivariate random fields that depend on more than two covariates. This is, for example, clearly
the case for iBSiZer and gradient-based $S^3$ methods. However, visualisation of inference results
is an integral part of scale space data analysis, and creating significance ‘maps’ in contexts more
complex than trivariate probability densities or random fields seems difficult. In a multivariate
regression setting, one solution is to apply additive modelling and perform scale space analysis
on each additive component separately (Miranda et al., 2002; Martínez-Miranda, 2005).
In Duong et al. (2008), scale space analysis of densities is considered in any dimension. Such
curvature related features of surfaces as ridges and saddle points defined by second derivatives
do not have clear counterparts in dimensions higher than two. The authors therefore focus on
the significance of modes, that is, the local maxima of a multivariate density. Both the gradient
and the Hessian matrix are used, and while the approach resembles $S^3$, the hypothesis tests
applied are not direct generalisations of Godtliebsen et al. (2002). Theoretical properties of the
required kernel estimators are provided, and visualisation methods are developed for one, two
and three dimensional data.
One could also try to approach higher dimensional problems through dimension reduction techniques such as principal component analysis or methods developed specifically for non-linear structures. Interestingly, scale space methods may be useful here, too. Wang & Marron (2008) proposed a scale space method for finding the effective dimensions in noisy data. The data are viewed within a spherical local neighbourhood whose radius has the role of the scale space smoothing parameter. Noise can mask a true underlying low-dimensional structure which becomes evident only after sufficiently large scales are considered. Given a set of noisy points sampled from a density defined on a submanifold of a Euclidean space, the authors develop statistical tests to determine the effective manifold dimension at different scales. The tests are based on the local geometry of the sample points, and their theoretical properties are also considered.

Figure 9. CircSiZer analysis of weather data from the Atlantic coast of Galicia (NW Spain). The original Figure 9 of Oliveira et al. (2014) is used with kind permission from Springer Science and Business Media. The colors along a circle of a fixed radius indicate where the underlying function of interest is increasing or decreasing, with the radius corresponding to the level of smoothing applied.

Directional data that lie on a circle or on a sphere are common in various applications. Holmström et al. (2011), Vaughan et al. (2012) and Marvel et al. (2013) considered scale space analysis on a sphere (the globe). A version of SiZer for circular data, CircSiZer, was proposed in Oliveira et al. (2014). Both density and regression estimation were considered. Smoothing is performed using the von Mises kernel, and the quantiles as well as the variance estimate in regression needed for the significance tests are obtained using the bootstrap. Examples of CircSiZer maps are shown in Figure 9. Weather data from the Atlantic coast of Galicia (NW Spain) are analyzed. In the left panel, density estimation is considered for wind directions, and in the right panel, regression of wind speed on wind direction is analyzed. The different colors are used in the same way as in the original SiZer (Section 2.1). The colors along a circle of a fixed radius indicate where the underlying function of interest is increasing or decreasing, with the radius corresponding to the level of smoothing applied. For more information on this example, see Oliveira et al. (2014). In Huckemann et al. (2014), it was pointed out that smoothing with the von Mises kernel does not satisfy scale space causality for all levels of smoothing, and the authors proposed WSiZer (Wrapped SiZer), where the wrapped Gaussian kernel is used instead.
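The von Mises kernel density estimate used by CircSiZer has the closed form $\hat f(\theta) = \frac{1}{n}\sum_{i} \exp\{\kappa \cos(\theta - \theta_i)\} / \{2\pi I_0(\kappa)\}$, where $I_0$ is the modified Bessel function of order zero and the concentration $\kappa$ acts as an inverse scale (smaller $\kappa$ means more smoothing). A sketch (function name is ours):

```python
import numpy as np

def vonmises_kde(theta_data, theta_grid, kappa):
    """Circular kernel density estimate with the von Mises kernel,
    f(theta) = (1/n) sum_i exp(kappa*cos(theta - theta_i)) / (2*pi*I0(kappa)),
    evaluated on theta_grid; np.i0 is the modified Bessel function I0."""
    diff = theta_grid[:, None] - theta_data[None, :]
    k = np.exp(kappa * np.cos(diff)) / (2 * np.pi * np.i0(kappa))
    return k.mean(axis=1)
```

Because each kernel integrates to one over the circle, the estimate is a proper density on $[-\pi, \pi)$ at every smoothing level.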

6 Variations: Scale, Derivatives, Maps


In formal scale space theory, scale is defined by the variance of the Gaussian blurring kernel. In statistical scale space practice, the concept of 'scale' is often interpreted much more liberally, simply as some particular component of a statistical model whose explorative variation helps to reveal the salient features of the phenomenon underlying the observed data. Temporal and spatial scales are the most typical choices, but we have seen in the previous sections several examples of other interpretations of scale, too.

Regularisation often has a smoothing effect on a regression estimate and can therefore be
given a scale space interpretation. Boysen et al. (2007) studied the asymptotic behavior of
piecewise constant least squares regression estimates (regressograms), when the number of
jumps in the estimate is penalised. The penalty is a regularisation parameter that the authors
propose to use for scale space smoothing. Li & Xu (2015) propose a linear regression spline
SiZer where the number of spline knots serves as the scale parameter. Fiducial inference is used
and multiple testing adjustment is made to control the false discovery rate in the SiZer map.
The Least Absolute Shrinkage and Selection Operator (LASSO) is a linear regression method where the least squares sum is penalised by the absolute sum of the regression coefficients, thus enabling model fitting even when there are more explanatory variables than observations (Tibshirani, 1996). An estimate of the regression coefficient vector is obtained by minimising a penalised least squares loss,

$$\beta^* = \operatorname*{argmin}_{\beta} \left\{ \|X\beta - y\|_2^2 + \lambda \|\beta\|_1 \right\},$$

where $\|\cdot\|_p$ is the $L_p$-norm. The regularisation parameter $\lambda$ can be viewed as a scale, which motivated Pasanen et al. (2015) to propose a scale space view of the LASSO and the Bayesian
LASSO. Hindberg (2012) introduced a test of multinormality based on weighted sums over consecutive dimensions along a data vector. Here, the number of dimensions involved in the sum serves as a scale parameter. The technique can also be used to develop a test for the k-sample problem. Like the LASSO, both of Hindberg's methods can be applied in situations where the number of observations is much smaller than the number of covariates.
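The LASSO scale space view amounts to computing $\beta^*$ over a grid of $\lambda$ values. A self-contained sketch using proximal gradient descent (ISTA); this illustrates the estimator only, not the inference machinery of Pasanen et al. (2015):

```python
import numpy as np

def lasso_path(X, y, lams, n_iter=1000):
    """Scale space view of the LASSO: solve
        beta(lam) = argmin ||X beta - y||_2^2 + lam * ||beta||_1
    by ISTA (proximal gradient) for a grid of regularisation levels lam,
    the 'scales'; larger lam zeroes out more coefficients."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2          # largest singular value squared
    path = []
    for lam in lams:
        b = np.zeros(p)
        for _ in range(n_iter):
            g = 2 * X.T @ (X @ b - y)      # gradient of the quadratic term
            u = b - g / (2 * L)            # gradient step (step size 1/(2L))
            b = np.sign(u) * np.maximum(np.abs(u) - lam / (2 * L), 0.0)  # soft threshold
        path.append(b)
    return np.array(path)                  # one coefficient vector per lam
```

Scanning `lams` from small to large traces the sparsity path: a strong coefficient survives at fine scales and is eventually shrunk to zero at coarse ones.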
Dümbgen & Walther (2008) made inferences about the slope of a univariate density (or a
failure rate) by considering all intervals whose endpoints are observations. The length of such
intervals can be thought of as a scale space parameter, although no smoothing is involved.
Multiple hypotheses about density features across different scales can be tested simultaneously.
For related theoretical work, see Dümbgen & Spokoiny (2001) and Rufibach & Walther (2010).
Yet another example of how to think of a scale is the thick pen transform of Fryzlewicz &
Oh (2011) designed for the exploration of dependence structure of a time series. The technique
is motivated by the scale space idea, and the role of a scale is played by the thickness of the pen
used to plot the time series.
In addition to the different definitions of ‘scale’, there are various approaches to detecting
interesting features in the data. The original SiZer (Chaudhuri & Marron, 1999) mostly con-
centrated on applying the first derivative, although the possibility to use the second derivative
or no derivative at all was also mentioned. The mathematical analysis of Chaudhuri & Marron
(2000) in fact considered derivatives of arbitrary order. The second derivative version of SiZer
has later been referred to as SiCon (Significance of Convexity). Chaudhuri & Marron (2002)
compared SiZer and SiCon by giving examples where the second derivative can detect struc-
ture better than the first derivative. One such case is the analysis of change points in a signal
because a jump change corresponds to a local maximum of the first derivative, hence a local
zero crossing of the second derivative. Often the use of the second derivative appears to facili-
tate the detection of interesting features of a curve that otherwise would be masked in the first
derivative analysis by an overall strong increase or decrease of the curve.
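The change point argument can be checked numerically: a smoothed jump is sigmoid-like, its first derivative peaks at the jump location, and its second derivative crosses zero there. A small sketch, where the sigmoid, the grid and the jump location are illustrative choices rather than anything taken from the cited papers:

```python
import numpy as np

# a smoothed jump: a sigmoid centred at x = 0 stands in for a step signal
# convolved with a Gaussian kernel
x = np.linspace(-5, 5, 1001)
signal = 1.0 / (1.0 + np.exp(-4.0 * x))

d1 = np.gradient(signal, x)   # first derivative: peaks at the jump
d2 = np.gradient(d1, x)       # second derivative: zero-crossing at the jump

peak = x[np.argmax(d1)]
sign_change = x[np.where(np.diff(np.sign(d2)))[0][0]]
print(peak, sign_change)      # both lie close to 0, the location of the jump
```

This is why a second derivative (SiCon-type) analysis can pinpoint a jump that a slope map only flags as a broad region of increase.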
S³ (Section 5.1) uses second derivatives to find features in two-dimensional data, but iBSiZer
and its multiresolution version (Section 5.2), the S³ adaptation of Pedersen et al. (2008), as
well as the methods developed by C. Park and his co-workers for comparing curves (Section 2.3),
are examples of approaches that use no differentiation at all. An alternative use of differentiation
was proposed in Pasanen et al. (2013), where the derivative is computed with respect to
the scale. As in multiresolution BSiZer, the goal is to decompose an observed time series y

Figure 10. Upper panels: $\sin(x) + \sin(3x) + \sin(6x)$ and its noisy version. Lower left: the scale-derivative map of the noisy signal with the minima of $\lambda \mapsto \|D_\lambda y\|$ indicated by the black horizontal lines. Lower right: the noise and three scale-dependent components extracted as differences of smooths, where the smoothing levels used correspond to the black horizontal lines in the scale-derivative map. The mean of the signal is omitted. The figures originally appeared in Pasanen et al. (2013).

into credible scale-dependent components using differences of smooths, but the method is useful
as an exploratory tool even without any associated inference. In this case, one defines the
so-called scale-derivative of y as $D_\lambda y = (d/d(\log \lambda)) S_\lambda y$, where $S_\lambda$ is a smoother. The local
minima of the function $\lambda \mapsto \|D_\lambda y\|$ then define effective multiresolution smoothing levels. An
example is shown in Figure 10, where the observed time series is a sum of three sine waves
with wavelengths $\pi/3$, $2\pi/3$, $2\pi$, and Gaussian noise. The usefulness of the scale-derivative
in finding interesting features of a time series can be understood by considering the heat
equation. Thinking of $S_\lambda y$ as a discrete-time estimate of a smooth $\mu_\lambda$ of an underlying
function $\mu$ and assuming that the smoother approximates Gaussian convolution, the heat equation
suggests that

$$\frac{\partial \mu_\lambda(x)}{\partial (\log \lambda)} = \lambda \, \frac{\partial \mu_\lambda(x)}{\partial \lambda} \propto \frac{\partial^2 \mu_\lambda(x)}{\partial x^2}$$

holds approximately. Therefore, the zero-crossings of the scale-derivative are zero-crossings of
the second time derivative of the underlying signal, places where its rate of change is largest.
Compared with SiZer and BSiZer, the method consequently has the same benefits as SiCon.
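A minimal numerical version of this exploratory procedure might look as follows. This is a sketch, not the implementation of Pasanen et al. (2013): a Gaussian kernel smoother stands in for $S_\lambda$, and the scale-derivative is approximated by finite differences over a logarithmic grid of scales.

```python
import numpy as np

def gaussian_smooth(y, s):
    """Smooth y with a Gaussian kernel of standard deviation s (reflect padding)."""
    r = int(np.ceil(4 * s))
    t = np.arange(-r, r + 1)
    k = np.exp(-t**2 / (2 * s**2))
    k /= k.sum()
    ypad = np.r_[y[r:0:-1], y, y[-2:-r - 2:-1]]
    return np.convolve(ypad, k, mode="valid")

# noisy multi-scale signal, as in the sine-wave example of Figure 10
x = np.linspace(0, 4 * np.pi, 500)
rng = np.random.default_rng(1)
y = np.sin(x) + np.sin(3 * x) + np.sin(6 * x) + 0.3 * rng.standard_normal(x.size)

# family of smooths on a logarithmic grid of scales
lams = np.exp(np.linspace(np.log(1.0), np.log(60.0), 80))
smooths = np.array([gaussian_smooth(y, s) for s in lams])

# scale-derivative D_lambda y approximated by differencing in log(lambda)
D = np.diff(smooths, axis=0) / np.diff(np.log(lams))[:, None]
norms = np.linalg.norm(D, axis=1)

# local minima of lambda -> ||D_lambda y|| suggest effective smoothing levels
minima = [i for i in range(1, len(norms) - 1)
          if norms[i] < norms[i - 1] and norms[i] < norms[i + 1]]
print("candidate smoothing levels:", np.round(lams[minima], 1))
```

The scales found at the local minima of $\lambda \mapsto \|D_\lambda y\|$ can then serve as the smoothing levels of a multiresolution decomposition obtained as differences of the corresponding smooths.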
Finally, given the great variety of available statistical scale space analyses, including the
many approaches outlined in the previous sections and the variations described
above, how should one judge the performance of the different techniques? An interesting
solution proposed by Hannig et al. (2013) is to compare the SiZer-type maps produced by
different methods using distance metrics derived from ideas originally developed for digital
Table 1. Applications of statistical scale space methods.

Application | Methods | References

Internet traffic
  Detection of anomalies, non-stationarity | MRAD, SiZer, SiNos | Zhang et al. (2008); Zhang et al. (2014); Park et al. (2007)
  Traffic data modelling | (Dependent) SiZer | Park et al. (2004); Marron et al. (2004); Park et al. (2006); Hernandez-Campos et al. (2005); Park et al. (2005)

Medical technology and health care
  Medical imaging | (Dependent) SiZer, BMulA, S³, MFD | Skrøvseth et al. (2010); Møllersen et al. (2010); Thon et al. (2012); Godtliebsen et al. (2004); Godtliebsen & Øigård (2005); Park et al. (2010a); Poline & Mazoyer (1994); Worsley et al. (1996)
  Health monitoring, disease diagnosis | c-SiZer, SiZer, SiNos, SiCon | Skrøvseth et al. (2012b); Skrøvseth et al. (2012a); Skrøvseth & Godtliebsen (2011); Mortensen et al. (2006); Rooper et al. (2014); Jacobs et al. (2012); Skrøvseth et al. (2012c); Skrøvseth et al. (2012)
  Flow cytometry, genetic and cellular data | SiZer, SiCon, Bayesian LASSO, WSiZer | Zeng et al. (2002); Salganik et al. (2005); Duong et al. (2008); Barker et al. (2008); Vekemans et al. (2012); Bekaert et al. (2002); Zhao et al. (2004); Hannig & Marron (2006); Pasanen et al. (2015); Huckemann et al. (2014)

Earth sciences
  Climatology, meteorology | SiZer, BSiZer, S³, SiCon, SiNos, Post. Smoothing, MRBSiZer, CircSiZer | Korhola et al. (2000); Holmström & Erästö (2002); Erästö & Holmström (2005); Erästö & Holmström (2006); Erästö & Holmström (2007); Weckström et al. (2006); Erästö et al. (2012); Pedersen et al. (2008); Rohling & Pälike (2005); Sørbye et al. (2009); Olsen et al. (2007); Olsen et al. (2008); Holmström et al. (2011); Divine et al. (2009); Rydén (2010); Oliveira et al. (2014); Vaughan et al. (2012); Marvel et al. (2013)
  Oceanography, glaciology | Wavelets, SiZer, SiCon, BSiZer | Divine & Godtliebsen (2007); Abram et al. (2013); Divine et al. (2005); Zamelczyk et al. (2014); Aagaard-Sørensen et al. (2014); Bolton et al. (2010); Miettinen et al. (2012); Justwan et al. (2008); McClymont et al. (2013); Wilson et al. (2011); Godtliebsen et al. (2012); Korsnes et al. (2002)
  Geology and geophysics | Post. Smoothing, SiZer | Øigård et al. (2006); Chen et al. (2005); Rudge (2008); Condie & Aster (2010)
  Ecology | SiZer, SiCon | Weis et al. (2014); Sonderegger et al. (2008)

images. Univariate regression is considered and one computes the distance of a map pro-
duced by a variant of SiZer to an ‘oracle’ SiZer map based on known regression and variance
functions. The smaller the distance, the better the performance of the SiZer-type method is
deemed to be.
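As a toy version of such a comparison, the disagreement between two categorical SiZer-type maps can be measured pixel by pixel over the (location, scale) grid. The actual metrics of Hannig et al. (2013) are more refined image-based distances; the function below is only an illustrative stand-in, and the small maps are hypothetical.

```python
import numpy as np

def map_distance(A, B):
    """Fraction of (location, scale) pixels at which two SiZer-type maps
    disagree; a crude stand-in for image-based map metrics."""
    A, B = np.asarray(A), np.asarray(B)
    return float(np.mean(A != B))

# maps coded +1 = significant increase, -1 = significant decrease, 0 = neither;
# rows index scales, columns index locations (a hypothetical 2 x 3 grid)
oracle = np.array([[1, 1, 0],
                   [0, -1, -1]])
candidate = np.array([[1, 0, 0],
                      [0, -1, 0]])

print(map_distance(oracle, candidate))   # 2 of 6 pixels differ -> 0.333...
```

A candidate method whose map lies closer to the oracle map, averaged over simulated data sets, would then be deemed the better performer.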

7 Applications
The applications tackled with statistical scale space methods are too numerous to permit
anything close to an exhaustive and detailed overview. Most of the papers reviewed in the pre-
ceding sections include analyses of real data, often comparing a proposed novel scale space
technique with an earlier analysis of the same data. Beyond such performance comparisons and
analyses of standard benchmark examples, there are a few research fields that appear to be par-
ticularly amenable to a scale space approach, and we have outlined them in Table 1. Interesting
applications of scale space techniques can be found both to technological and scientific prob-
lems, sometimes in combination with more traditional statistical methods. For each application,
Table 1 lists the scale space methods used together with some example references.
Analysis of Internet traffic data has been a popular topic. Park et al. (2004), Park et al.
(2007), Zhang et al. (2008) and Zhang et al. (2014), all discussed in Section 4, feature such
data analyses, and additional references are given in Table 1. Medical imaging applications
have considered detection of melanoma as well as analysis of fMRI and Positron Emission
Tomography (PET) images. The multifiltering detection (MFD) strategy of Poline & Mazoyer
(1994) and associated work by Worsley et al. (1996) are included in the table because they
work in the same spirit as the methods reviewed in the present article in that multiple smooths
of a PET image are used for making inferences about interesting features. Scale space meth-
ods have also been used in monitoring type 1 and type 2 diabetes patients. Other monitoring
and diagnostic applications include circulatory research, diagnosing of pleural effusion, anal-
ysis of the dependence of influenza risk on age and the incidence of whooping cough, as well
as the use of accelerometer data for early detection of chronic obstructive pulmonary dis-
ease. Other applications to medical technology include analyses of flow cytometry, cellular and
genetic data.
In basic scientific research, earth sciences have turned out to be a fertile ground for scale
space methods. For example, climate science often deals with time series and data sets for
which questions relating to temporal or spatial scales arise naturally. One particularly popular
area has been the analysis of reconstructions of the paleoenvironment. Past temperatures going
back hundreds or thousands of years can be inferred from lake or sea sediment cores and even
longer time scales can be explored using ice cores. Figure 11 shows a lake sediment core based
BSiZer analysis of the reconstructed mean July temperatures for the past 800 years in northern
Fennoscandia. The large-scale trend shows a decreasing temperature, but on the centennial scale,
the Little Ice Age (from about A.D. 1550 to 1850) shows up as a broad credible minimum. Many
other scale space analyses of past and predicted climate variation are included in Table 1. Other
work in the earth sciences includes applications to meteorology, oceanography, research on ice,
earthquakes and ecology.
Outside the aforementioned main areas of application, SiZer has also been used, for instance,
in astronomy (Park et al., 2011), electrophysiology (Roca-Pardiñas et al., 2011), experimental
psychology (Becker & Elliott, 2006) and economics (Pittau & Zelli, 2007; Marron &
de Uña-Álvarez, 2004). Ganguli & Wand (2004) analyzed fishery data using S³ for geostatistical
data and Godtliebsen & Øigård (2005) used Posterior Smoothing in a pattern recognition
application to detect bones in fish fillets. Finally, Holmström & Pasanen (2012) and Pasanen &
Holmström (2015) proposed to use iBSiZer in satellite-based remote sensing.
For those interested in trying scale space methods in their own applications, software is
available in the public domain. Matlab implementations include

Figure 11. BSiZer analysis of a diatom-based temperature reconstruction in northern Fennoscandia for the last 800 years. The upper panel: the reconstructed mean July air temperatures shown as circles together with three smooths obtained as posterior means. The lower panel: BSiZer map with the three smoothing levels shown using the same colors as in the upper panel. As opposed to the original SiZer (e.g. Figure 2), here blue and red signify credibly negative and positive slope, respectively. The credibility threshold $\alpha = 0.8$ was used. Gray indicates a non-credible slope. For details see Weckström et al. (2006).

http://www.unc.edu/~marron/marron_software.html
http://www.stat.nus.edu.sg/~zhangjt/SubPages/SiZerSS/software.htm
http://www.cs.helsinki.fi/u/kohonen/sizer/

R packages are also available,

http://CRAN.R-project.org/package=SiZer
http://CRAN.R-project.org/package=feature
http://CRAN.R-project.org/package=NPCirc

Matlab implementations of many of the Bayesian approaches discussed in this article can be
obtained at

http://cc.oulu.fi/~lpasanen/
http://mathstat.helsinki.fi/bsizer/
http://www2.telemed.no/kevin/

8 Summary
During the last 15 years, a great variety of statistical scale space methods have been devel-
oped and they have found applications in many areas of research. The popularity of the scale

space view can perhaps be explained by the highly intuitive and transparent way it deals with
scales in the data, usually through smoothing while sometimes employing some other natural
idea of a scale. For non-statisticians, the often strikingly effective graphical summaries of scale
space analyses are very appealing as they greatly facilitate the interpretation of the inferences
about the salient features in the data. In the future, the toolbox of scale space methods will
undoubtedly keep expanding, and we look forward to new applications both in the areas already well
covered as well as in other fields of research where the concept of scale plays an important role
in the analysis of data.

Acknowledgements
Research supported by Academy of Finland grant numbers 250862 and 276022.

References
Aagaard-Sørensen, S, Husum, K, Hald, M, Marchitto, T & Godtliebsen, F. (2014). Sub sea surface temperatures
in the Polar North Atlantic during the Holocene: planktic foraminiferal Mg/Ca temperature reconstructions. The
Holocene, 24(1), 93–103.
Abram, NJ, Mulvaney, R, Wolff, EW, Triest, J, Kipfstuhl, S, Trusel, LD, Vimeux, F, Fleet, L & Arrowsmith, C.
(2013). Acceleration of snow melt in an Antarctic Peninsula ice core during the twentieth century. Nat. Geosci.,
6(5), 404–411.
Azzalini, A & Bowman, AW. (1990). A look at some data on the Old Faithful geyser. Appl. Stat., 39(3), 357–365.
Babaud, J, Baudin, M, Witkin, AP & Duda, R. (1983). Uniqueness of the Gaussian kernel for scale-space filtering.
Fairchild TR 645, June 1983.
Babaud, J, Witkin, AP, Baudin, M & Duda, RO. (1986). Uniqueness of the Gaussian kernel for scale-space filtering.
IEEE Trans. Pattern Anal. Mach. Intell., PAMI-8(1), 26–33.
Barker, MS, Kane, NC, Matvienko, M, Kozik, A, Michelmore, RW, Knapp, SJ & Rieseberg, LH. (2008). Multiple
paleopolyploidizations during the evolution of the compositae reveal parallel patterns of duplicate gene retention
after millions of years. Mol. Biol. Evol., 25(11), 2445–2455.
Becker, C & Elliott, MA. (2006). Flicker-induced color and form: interdependencies and relation to stimulation
frequency and phase. Conscious Cogn., 15(1), 175–196.
Bekaert, S, Koll, S, Thas, O & Van Oostveldt, P. (2002). Comparing telomere length of sister chromatids in human
lymphocytes using three-dimensional confocal microscopy. Cytometry, 48(1), 34–44.
Bolton, CT, Wilson, PA, Bailey, I, Friedrich, O, Beer, CJ, Becker, J, Baranwal, S & Schiebel, R. (2010). Millennial-scale climate variability in the subpolar North Atlantic Ocean during the late Pliocene. Paleoceanography, 25(4).
Boysen, L, Liebscher, V, Munk, A & Wittich, O. (2007). Scale space consistency of piecewise constant least squares
estimators – another look at the regressogram. In Asymptotics: Particles, processes and inverse problems, vol. 55,
Eds. E.A. Cator, G. Jongbloed, C. Kraaikamp, H.P. Lopuhaä & J.A. Wellner, pp. 65–84. Beachwood, Ohio, USA,
Lecture Notes–Monograph Series: Institute of Mathematical Statistics.
Chaudhuri, P & Marron, JS. (2002). Curvature vs. slope inference for features in nonparametric curve estimates.
Unpublished Manuscript.
Chaudhuri, P & Marron, J S. (1999). SiZer for exploration of structures in curves. Amer. Statist. Assoc., 94(447),
807–823.
Chaudhuri, P & Marron, JS. (2000). Scale space view of curve estimation. Ann. Stat., 28(2), 408–428.
Chen, SJ, Jia, QH & Ma, L. (2005). SiZer for exploration of inhomogeneous structure in temporal distribution of
earthquakes. Acta Seismol. Sin., 18(5), 572–581.
Condie, KC & Aster, RC. (2010). Episodic zircon age spectra of orogenic granitoids: the supercontinent connection
and continental growth. Precambrian Res., 180(3-4), 227 –236.
Divine, DV, Isaksson, E, Kaczmarska, M, Godtliebsen, F, Oerter, H, Schlosser, E, Johnsen, SJ, Van Den Broeke, M &
Van De Wal, RSW. (2009). Tropical Pacific–high latitude south Atlantic teleconnections as seen in δ¹⁸O variability
in Antarctic coastal ice cores. Journal of Geophysical Research: Atmospheres, 114(D11).
Divine, DV & Godtliebsen, F. (2007). Bayesian modeling and significant features exploration in wavelet power
spectra. Nonlinear Proc. Geophys, 14, 79–88.

Divine, DV, Korsnes, R, Makshtas, AP, Godtliebsen, F & Svendsen, H. (2005). Atmospheric-driven state transfer of
shore-fast ice in the northeastern Kara Sea. Journal of Geophysical Research: Oceans, 110(C9).
Dümbgen, L & Spokoiny, VG. (2001). Multiscale testing of qualitative hypotheses. Ann. Stat., 29(1), 124–152.
Dümbgen, L & Walther, G. (2008). Multiscale inference about a density. Ann. Stat., 36(4), 1758–1785.
Duong, T, Cowling, A, Koch, I & Wand, M. (2008). Feature significance for multivariate kernel density estimation.
Comput. Stat. Data Anal., 52(9), 4225 –4242.
Erästö, P & Holmström, L. (2005). Bayesian multiscale smoothing for making inferences about features in scatter
plots. J. Comput. Graph. Stat., 14(3), 569–589.
Erästö, P & Holmström, L. (2006). Selection of prior distributions and multiscale analysis in Bayesian temperature
reconstructions based on fossil assemblages. J. Paleolimnol., 36(1), 69–80.
Erästö, P & Holmström, L. (2007). Bayesian analysis of features in a scatter plot with dependent observations and
errors in predictors. J. Stat. Comput. Simul., 77(5), 421–431.
Erästö, P, Holmström, L, Korhola, A & Weckström, J. (2012). Finding a consensus on credible features among sev-
eral paleoclimate reconstructions. Annals of Appl. Stat., 6(4), 1377–1405. Available at http://dx.doi.org/10.1214/
12-AOAS540 [Accessed date on 17 December 2015].
Fryzlewicz, P & Oh, HS. (2011). Thick pen transformation for time series. J. R. Stat. Soc. Series B Stat. Methodol.,
73(4), 499–529.
Ganguli, B & Wand, M. (2004). Feature significance in geostatistics. J. Comput. Graph. Stat., 13(4), 954–973.
Ganguli, B & Wand, M. (2007). Feature significance in generalized additive models. Stat. Comput., 17(2), 179–192.
Ghosh, AK, Chaudhuri, P & Murthy, CA. (2005). On visualization and aggregation of nearest neighbour classifiers.
IEEE Trans. Pattern Anal. Mach. Intell., 27(10), 1592–1602.
Ghosh, AK, Chaudhuri, P & Murthy, CA. (2006a). Multi-scale classification using nearest neighbour density
estimates. IEEE Trans. Syst. Man. Cybern. B., 36(5), 1139–1148.
Ghosh, AK, Chaudhuri, P & Sengupta, D. (2006b). Classification using kernel density estimates: Multiscale analysis
and visualization. Technometrics, 48(1), 120–132.
Godtliebsen, F, Holmström, L, Miettinen, A, Erästö, P, Divine, DV & Koc, N. (2012). Pairwise scale-space comparison
of time series with application to climate research. J. Geophys. Res., 117. Available at http://dx.doi.org/10.1029/2011JC007546 [Accessed on 17 December 2015].
Godtliebsen, F, Marron, J & Chaudhuri, P. (2002). Significance in scale space for bivariate density estimation. J.
Comput. Graph. Stat., 11(1), 1–21.
Godtliebsen, F, Marron, J & Chaudhuri, P. (2004). Statistical significance of features in digital images. Image Vis.
Comput., 22(13), 1093–1104.
Godtliebsen, F & Øigård, TA. (2005). A visual display device for significant features in complicated signals. Comput
Stat. Data Anal., 48(2), 317–343.
González-Manteiga, W, Martínez-Miranda, MD & Raya-Miranda, R. (2008). SiZer map for inference with additive
models. Stat. Comput., 18(3), 297–312.
Hannig, J & Lee, TCM. (2006). Robust SiZer for exploration of regression structures and outlier detection. J. Comput.
Graph. Stat., 15(1), 101–117.
Hannig, J, Lee, T & Park, C. (2013). Metrics for SiZer map comparison. Stat, 2(1), 49–60.
Hannig, J & Marron, JS. (2006). Advanced distribution theory for SiZer. Amer. Statist. Assoc., 101(474), 484–499.
Härdle, W. (1991). Smoothing Techniques: With Implementation in S, Springer Series in Statistics New York: Springer.
Hernandez-Campos, F, Jeffay, K, Park, C, Marron, JS & Resnick, SI. (2005). Extremal dependence: internet traffic
applications. Stochastic Models, 21(1), 1–35.
Hindberg, K. (2012). Scale-space methodology applied to spectral feature detection, multinormality testing and the
k-sample problem, and wavelet variance analysis, Ph.D. Thesis, University of Tromsø, Tromsø, Norway.
Holmström, L. (2010a). BSiZer. Wiley Interdisciplinary Reviews: Computational Statistics, 2(5), 526–534. Available at http://dx.doi.org/10.1002/wics.115 [Accessed on 17 December 2015].
Holmström, L. (2010b). Scale space methods. Wiley Interdisciplinary Reviews: Computational Statistics, 2(2),
150–159. Available at http://dx.doi.org/10.1002/wics.79 [Accessed on 17 December 2015].
Holmström, L & Erästö, P. (2002). Making inferences about past environmental change using smoothing in multiple
time scales. Comput. Stat. Data. Anal., 41(2), 289–309.
Holmström, L & Pasanen, L. (2012). Bayesian scale space analysis of differences in images. Technometrics, 54(1),
16–29. Available at http://dx.doi.org/10.1080/00401706.2012.648862 [Accessed on 17 December 2015].
Holmström, L, Pasanen, L, Furrer, R & Sain, SR. (2011). Scale space multiresolution analysis of random signals.
Computational Statistics & Data Analysis, 55(10), 2840 –2855. Available at http://dx.doi.org/10.1016/j.csda.2011.
04.011 [Accessed on 17 December 2015].
Huckemann, SF, Kim, KR, Munk, A, Rehfeld, F, Sommerfeld, M, Weickert, J & Wollnik, C. (2014). The circular


SiZer, inferred persistence of shape parameters and application to stem cell stress fibre structures. Available as
arXiv:1404.3300v1.
Iijima, T. (1959). Basic theory of pattern observation. Papers of Technical Group on Automata and Automatic Control,
IECE, Japan, Dec. 1959. In Japanese.
Jacobs, JH, Archer, BN, Baker, MG, Cowling, BJ, Heffernan, RT, Mercer, G, Uez, O, Hanshaoworakul, W, Viboud,
C & Schwartz, J. (2012). Searching for sharp drops in the incidence of pandemic A/H1N1 influenza by single year
of age. PloS one, 7(8), e42328.
Justwan, A, Koç, N & Jennings, AE. (2008). Evolution of the Irminger and East Icelandic Current systems through the
Holocene, revealed by diatom-based sea surface temperature reconstructions. Quat. Sci. Rev., 27(15), 1571–1582.
Kim, CS & Marron, JS. (2006). SiZer for jump detection. J. Nonparametr. Stat., 18(1), 13–20.
Koenderink, J. (1984). The structure of images. Biol. Cybern., 50(5), 363–370.
Korhola, A, Weckström, J, Holmström, L & Erästö, P. (2000). A quantitative Holocene climatic record from diatoms
in northern Fennoscandia. Quatern. Res., 54, 284–294.
Korsnes, R, Pavlova, O & Godtliebsen, F. (2002). Assessment of potential transport of pollutants into the Barents Sea
via sea ice – an observational approach. Mar. Pollut. Bull., 44(9), 861–869.
Li, N & Xu, X. (2015). Spline Multiscale Smoothing to Control FDR for Exploring Features of Regression Curves.
To appear in Journal of Computational and Graphical Statistics. DOI: 10.1080/10618600.2014.1001069.
Li, R & Marron, JS. (2005). Local likelihood SiZer map. Sankhyā, 67(3), 476–498.
Lindeberg, T. (1994). Scale-Space Theory in Computer Vision Dordrecht, the Netherlands: Kluwer Academic
Publishers.
Marron, J & Chung, SS. (2001). Presentation of smoothers: the family approach. Comput. Statist., 16(1), 195–207.
Marron, J & de Uña-Álvarez, J. (2004). SiZer for length biased, censored density and hazard estimation. J. Statist.
Plann. Inference, 121(1), 149–161.
Marron, J & Zhang, JT. (2005). Sizer for smoothing splines. Comput. Statist., 20(3), 481–502.
Marron, JS, Hernández-Campos, F & Smith, FD. (2004). A SiZer analysis of IP flow start times. In The First Erich
L. Lehmann Symposium—Optimality, Vol. 44, Eds. J. Rojo & V. Pérez-Abreu, pp. 87–105. Beachwood, OH:
Institute of Mathematical Statistics Lecture Notes–Monograph Series.
Martínez-Miranda, D. (2005). SiZer map for evaluating a bootstrap local bandwidth selector in nonparametric additive
models, Reports in Statistics and Operations Research. Santiago de Compostela, Spain: Universidade de Santiago
de Compostela, Departamento de Estatística e Investigación Operativa.
Marvel, K, Ivanova, D & Taylor, KE. (2013). Scale space methods for climate model analysis. J. Geophys. Res.
Atmos., 118(11), 5082–5097.
McClymont, EL, Sosdian, SM, Rosell-Melé, A & Rosenthal, Y. (2013). Pleistocene sea-surface temperature evo-
lution: early cooling, delayed glacial intensification, and implications for the mid-Pleistocene climate transition.
Earth Sci. Rev., 123, 173–193.
Miettinen, A, Divine, D, Koç, N, Godtliebsen, F & Hall, IR. (2012). Multicentennial variability of the sea surface
temperature gradient across the Subpolar North Atlantic over the last 2.8 kyr. J. Clim., 25(12), 4205–4219.
Minnotte, MC, Marchette, DJ & Wegman, EJ. (1998). The bumpy road to the mode forest. J. Comput. Graph. Statist.,
7, 239–251.
Minnotte, MC & Scott, D. (1993). The mode tree: a tool for visualization of nonparametric density estimates. J.
Comput. Graph. Statist., 2, 51–68.
Miranda, RR, Miranda, M & Carmona, AG. (2002). Exploring the structure of regression surfaces by using
SiZer map for additive models. In Compstat, Eds. W. Härdle & B. Rönz, pp. 361–366. Heidelberg New York:
Physica-Verlag HD.
Møllersen, K, Kirchesch, HM, Schopf, TG & Godtliebsen, F. (2010). Unsupervised segmentation for digital
dermoscopic images. Skin Res. Tech., 16(4), 401–407.
Mortensen, KE, Godtliebsen, F & Revhaug, A. (2006). Scale-space analysis of time series in circulatory research.
Am. J. Physio-Heart Circulatory Physiol., 291(6), H3012–H3022.
Müller, HG & Wai, N. (2004). Change trees and mutagrams for the visualization of local changes in sequence data.
J. Comput. Graph. Statist., 13(3), 571–585.
Øigård, T, Rue, H & Godtliebsen, F. (2006). Bayesian multiscale analysis for time series data. Comput. Statist. and
Data Analysis, 51(3), 1719–1730.
Oliveira, M, Crujeiras, R. & Rodríguez-Casal, A. (2014). CircSiZer: an exploratory tool for circular data. Environ.
Ecol. Stat., 21(1), 143–159.
Olsen, L, Chaudhuri, P & Godtliebsen, F. (2008). Multiscale spectral analysis for detecting short and long range
change points in time series. Comput. Statist. and Data Analysis, 52, 3310–3330.
Olsen, L, Sørbye, S & Godtliebsen, F. (2007). A scale-space approach for detecting non-stationarities in time series.
Scand. J. Stat., 35, 119–138.


Park, C, Ahn, J, Hendry, M & Jang, W. (2011). Analysis of long period variable stars with nonparametric tests for
trend detection. J. Amer. Statist. Assoc., 106(495), 832–845.
Park, C, Godtliebsen, F, Taqqu, M, Stoev, S & Marron, J. (2007). Visualization and inference based on wavelet
coefficients, SiZer and SiNos. Comput. Statist. and Data Analysis, 51, 5994–6012.
Park, C, Hannig, J & Kang, KH. (2009). Improved SiZer for time series. Statist. Sinica, 19(4), 1511.
Park, C, Hannig, J & Kang, KH. (2014). Nonparametric comparison of multiple regression curves in scale-space. J.
Comput. Graph. Statist., 23(3), 657–677.
Park, C, Hernández-Campos, F, Marron, J & Smith, FD. (2005). Long-range dependence in a changing internet traffic
mix. Computer Networks, 48(3), 401–422.
Park, C & Huh, J. (2013a). Nonparametric estimation of a log-variance function in scale-space. J. Statist. Plann.
Inference, 143(10), 1766 –1780.
Park, C & Huh, J. (2013b). Statistical inference and visualization in scale-space using local likelihood. Comput.
Statist. & Data Analysis, 57(1), 336–348.
Park, C & Kang, KH. (2008). SiZer analysis for the comparison of regression curves. Comput. Statist. & Data
Analysis, 52(8), 3954–3970.
Park, C, Lazar, NA, Ahn, J & Sornborger, A. (2010a). A multiscale analysis of the temporal characteristics of resting-
state fMRI data. J. Neurosci. Methods, 193(2), 334 –342.
Park, C, Lee, TC & Hannig, J. (2010b). Multiscale exploratory analysis of regression quantiles using quantile SiZer.
J. Comput. Graph. Statist., 19(3), 497–513.
Park, C, Marron, JS & Rondonotti, V. (2004). Dependent SiZer: goodness-of-fit tests for time series models. J. Appl.
Stat., 31(8), 999–1017.
Park, C, Shen, H, Marron, J, Hernández-Campos, F & Veitch, D. (2006). Capturing the elusive poissonity in web
traffic. In Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2006. MASCOTS
2006. 14th IEEE International Symposium on, Monterey, California, USA., pp. 189–196.
Pasanen, L & Holmström, L. (2015). Bayesian scale space analysis of temporal changes in satellite images. J. Appl.
Stat., 42(1), 50–70.
Pasanen, L, Holmström, L & Sillanpää, MJ. (2015). Bayesian LASSO, scale space and decision making in association
genetics. PloS one, 10(4), e0120017.
Pasanen, L, Launonen, I & Holmström, L. (2013). A scale space multiresolution method for extraction of time series
features. Stat, 2(1), 273–291.
Pedersen, CA, Godtliebsen, F & Roesch, AC. (2008). A scale-space approach for detecting significant differ-
ences between models and observations using global albedo distributions. J. Geophys. Res., 113(D10), DOI:
10.1029/2007JD009340.
Percival, DB & Walden, AT. (2006). Wavelet Methods for Time Series Analysis, Cambridge Series in Statistical and
Probabilistic Mathematics New York: Cambridge University Press.
Perona, P & Malik, J. (1990). Scale-space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal.
Mach. Intell., 12(7), 629–639.
Pittau, MG & Zelli, R. (2007). Exploring patterns of income polarization using SiZer. J. Quantitative Econ., 5(1),
101–111.
Poline, JB & Mazoyer, B. (1994). Analysis of individual brain activation maps using hierarchical description and
multiscale detection. IEEE Trans. Med. Imaging, 13(4), 702–710.
Roca-Pardiñas, J, Cadarso-Suárez, C, Pardo-Vazquez, JL, Leboran, V, Molenberghs, G, Faes, C & Acuna, C. (2011).
Assessing neural activity related to decision-making through flexible odds ratio curves and their derivatives. Stat.
Med., 30(14), 1695–1711.
Rohling, EJ & Pälike, H. (2005). Centennial-scale climate cooling with a sudden cold event around 8,200 years ago.
Nature, 434, 975–979.
Rondonotti, V, Marron, J & Park, C. (2007). SiZer for time series analysis: a new approach to the analysis of trends.
Electron. J. Stat., 1, 268–289.
Rooper, LM, Ali, SZ & Olson, MT. (2014). A minimum fluid volume of 75 ml is needed to ensure adequacy in a
pleural effusion: a retrospective analysis of 2540 cases. Cancer Cytopathology, 122(9), 657–665.
Rudge, JF. (2008). Finding peaks in geochemical distributions: A re-examination of the helium-continental crust
correlation. Earth Planet. Sci. Lett., 274(1), 179–188.
Rufibach, K & Walther, G. (2010). The block criterion for multiscale inference about a density, with applications to
other multiscale problems. J. Comput. Graph. Statist., 19(1), 175–190.
Rydén, J. (2010). Exploring possibly increasing trend of hurricane activity by a SiZer approach. Environ. Ecol. Stat.,
17(1), 125–132.
Salganik, M, Milford, E, Hardie, D, Shaw, S & Wand, M. (2005). Classifying antibodies using flow cytometry data:
class prediction and class discovery. Biom. J., 47(5), 740–754.

International Statistical Review (2017), 85, 1, 1–30


© 2016 The Authors. International Statistical Review © 2016 International Statistical Institute.

Skrøvseth, SO, Schopf, T, Thon, K, Zortea, M, Geilhufe, M, Mollersen, K, Kirchesch, H & Godtliebsen, F.
(2010). A computer aided diagnostic system for malignant melanomas. In 2010 3rd International Symposium on
Applied Sciences in Biomedical and Communication Technologies (ISABEL), Rome, Italy, pp. 1–5.
Skrøvseth, S, Årsand, E, Godtliebsen, F & Hartvigsen, G. (2012a). Mobile phone-based pattern recognition and data
analysis for patients with type 1 diabetes. Diabetes Technol. Ther., 14(12), 1098–1104.
Skrøvseth, SO, Arsand, E, Godtliebsen, F & Joakimsen, RM. (2012b). Model driven mobile care for patients with
type 1 diabetes. Stud. Health Technol. Inform., 180, 1045–1049.
Skrøvseth, SO, Bellika, JG & Godtliebsen, F. (2012c). Causality in scale space as an approach to change detection.
PLoS ONE, 7(12), e52253.
Skrøvseth, SO & Godtliebsen, F. (2011). Scale space methods for analysis of type 2 diabetes patients’ blood glucose
values. Comput. Math. Methods Med., 2011, 1–7.
Skrøvseth, SO, Dias, A, Gorzelniak, L, Godtliebsen, F & Horsch, A. (2012). Scale-space methods for live processing
of sensor data. Stud. Health Technol. Inform., 180, 138–142.
Sonderegger, DL, Wang, H, Clements, WH & Noon, BR. (2008). Using SiZer to detect thresholds in ecological data.
Front. Ecol. Environ., 7(4), 190–195.
Sørbye, S, Hindberg, K, Olsen, L & Rue, H. (2009). Bayesian multiscale feature detection of log-spectral densities.
Comput. Statist. Data Analysis, 53(11), 3746–3754.
Sporring, J. (1997). Gaussian scale-space theory, Computational Imaging and Vision. Dordrecht: Kluwer Academic
Publishers.
Thon, K, Rue, H, Skrøvseth, SO & Godtliebsen, F. (2012). Bayesian multiscale analysis of images modeled as
Gaussian Markov random fields. Comput. Statist. Data Analysis, 56(1), 49–61.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Series B Stat. Methodol., 58(1),
267–288.
Vaughan, A, Jun, M & Park, C. (2012). Statistical inference and visualization in scale-space for spatially dependent
images. J. Korean Stat. Soc., 41(1), 115–135.
Vekemans, D, Proost, S, Vanneste, K, Coenen, H, Viaene, T, Ruelens, P, Maere, S, Van de Peer, Y & Geuten, K.
(2012). Gamma paleohexaploidy in the stem lineage of core eudicots: significance for MADS-box gene and species
diversification. Mol. Biol. Evol., 29(12), 3793–3806.
Vidakovic, B. (1999). Statistical Modeling by Wavelets, Wiley Series in Probability and Statistics. New York: John
Wiley & Sons, Inc.
Wahba, G. (1978). Improper priors, spline smoothing and the problem of guarding against model errors in regression.
J. R. Stat. Soc. Series B Stat. Methodol., 40(3), 364–372.
Wang, X & Marron, JS. (2008). A scale-based approach to finding effective dimensionality in manifold learning.
Electron. J. Stat., 2, 127–148.
Wang, YP & Lee, SL. (1998). Scale-space derived from B-splines. IEEE Trans. Pattern Anal. Mach. Intell., 20,
1040–1055.
Weckström, J, Korhola, A, Erästö, P & Holmström, L. (2006). Temperature patterns over the past eight centuries in
northern Fennoscandia inferred from sedimentary diatoms. Quatern. Res., 66, 78–86.
Weickert, J, Ishikawa, S & Imiya, A. (1999). Linear scale-space has first been proposed in Japan. J. Math. Imaging
Vis., 10(3), 237–252.
Weis, AE, Wadgymar, SM, Sekor, M & Franks, SJ. (2014). The shape of selection: using alternative fitness functions
to test predictions for selection on flowering time. Evolutionary Ecology, 28(5), 885–904.
Wilson, L, Hald, M & Godtliebsen, F. (2011). Foraminiferal faunal evidence of twentieth-century Barents Sea
warming. The Holocene, 21(4), 527–537.
Witkin, AP. (1983). Scale-space filtering. In 8th International Joint Conference of Artificial Intelligence, Karlsruhe,
West Germany, pp. 1019–1022.
Witkin, AP. (1984). Scale-space filtering: a new approach to multi-scale description. In IEEE International Conference
on Acoustics, Speech, and Signal Processing (ICASSP '84), Vol. 9, San Diego, California, USA, pp. 150–153.
Worsley, KJ, Marrett, S, Neelin, P & Evans, A. (1996). Searching scale space for activation in PET images. Human
Brain Mapping, 4(1), 74–90.
Zamelczyk, K, Rasmussen, TL, Husum, K, Godtliebsen, F & Hald, M. (2014). Surface water conditions and calcium
carbonate preservation in the Fram Strait during Marine Isotope Stage 2, 28.8–15.4 kyr. Paleoceanography, 29(1),
1–12.
Zeng, Q, Wand, M, Young, AJ, Rawn, J, Milford, EL, Mentzer, SJ & Greenes, RA. (2002). Matching of flow
cytometry histograms using information theory in feature space. In Proceedings of the AMIA Symposium, American
Medical Informatics Association, San Antonio, TX, USA, pp. 929–933.
Zhang, HG & Mei, CL. (2012). SiZer inference for varying coefficient models. Commun. Stat. Simul. Comput., 41(10),
1944–1959.
Zhang, HG, Mei, CL & Wang, HL. (2013). Robust SiZer approach for varying coefficient models. Math. Probl. Eng.,
2013, 1–13.

Zhang, L. (2007). Functional singular value decomposition and multi-resolution anomaly detection, Ph.D. Thesis,
University of North Carolina at Chapel Hill, Chapel Hill, USA.
Zhang, L, Zhu, Z, Jeffay, K, Marron, J & Smith, F. (2008). Multi-resolution anomaly detection for the Internet. In
IEEE INFOCOM Workshops 2008, Phoenix, AZ, USA, pp. 1–6.
Zhang, L, Zhu, Z & Marron, JS. (2014). Multiresolution anomaly detection method for fractional Gaussian noise.
J. Appl. Stat., 41(4), 769–784.
Zhao, X, Marron, J & Wells, MT. (2004). The functional data analysis view of longitudinal data. Statist. Sinica, 14(3),
789–808.

[Received February 2015, accepted September 2015]
