Error Analysis in Circle Fitting
Abstract: We study the problem of fitting circles (or circular arcs) to data
points observed with errors in both variables. A detailed error analysis
for all popular circle fitting methods – geometric fit, Kåsa fit, Pratt fit,
and Taubin fit – is presented. Our error analysis goes deeper than the
traditional expansion to the leading order. We obtain higher order terms,
which show exactly why and by how much circle fits differ from each other.
Our analysis allows us to construct a new algebraic (non-iterative) circle
fitting algorithm that outperforms all the existing methods, including the
(previously regarded as unbeatable) geometric fit.
Contents
1 Introduction
2 Statistical model
3 Geometric circle fits
4 Algebraic circle fits
5 Error analysis: a general scheme
6 Error analysis of geometric circle fit
7 Error analysis of algebraic circle fits
8 Comparison of various circle fits
9 Experimental tests and conclusions
Acknowledgements
References
1. Introduction
Fitting circles and circular arcs to observed points is one of the basic tasks
in pattern recognition and computer vision, nuclear physics, and other areas
[5, 9, 11, 23, 24, 27, 30, 32]. Many algorithms have been developed that fit
circles to data. Some minimize the geometric distances from the circle to the data
points (we call them geometric fits). Others minimize various approximate (or
‘algebraic’) distances, they are called algebraic fits. We overview most popular
algorithms in Sections 3–4.
Geometric fit is commonly regarded as the most accurate, but it can only
be implemented by iterative schemes that are computationally intensive and
subject to occasional divergence. Algebraic fits are faster but presumably less
precise. At the same time, assessments of their accuracy have been based solely on
practical experience; no detailed theoretical comparison of the accuracy of the
various circle fits has been performed. It was shown in [8] that all the circle fits have
the same covariance matrix, to the leading order, in the small-noise limit. Thus
the differences between various fits can only be revealed by a higher-order error
analysis.
The purpose of this paper is to do just that. We employ higher-order error
analysis (a similar analysis was used by Kanatani [22] in the context of more
general quadratic models) and show exactly why and by how much the geometric
circle fit outperforms the algebraic circle fits in accuracy; we also compare the
precision of different algebraic fits. Section 5 presents our error analysis in a
general form, which can be readily applied to other curve fitting problems.
Finally, our analysis allows us to develop a new algebraic fit whose accuracy
exceeds that of the geometric fit. Its superiority is demonstrated by numerical
experiments.
2. Statistical model
We assume that each observed point (xi, yi) is a noisy version of a true point (x̃i, ỹi):

xi = x̃i + δi,   yi = ỹi + εi,   i = 1, . . . , n, (2.1)

where (δi, εi) represent isotropic Gaussian noise. Precisely, δi's and εi's are i.i.d.
normal random variables with mean zero and variance σ 2 .
The true points (x̃i, ỹi) are supposed to lie on a 'true circle', i.e. satisfy

(x̃i − ã)² + (ỹi − b̃)² = R̃², (2.2)

where (ã, b̃, R̃) denote the 'true' (unknown) parameters. Therefore

x̃i = ã + R̃ cos ϕi,   ỹi = b̃ + R̃ sin ϕi, (2.3)
where ϕ1 , . . . , ϕn specify the locations of the true points on the true circle.
The angles ϕ1 , . . . , ϕn are regarded as fixed unknowns and treated as additional
parameters of the model (called incidental or latent parameters). For brevity we
denote
ũi = cos ϕi = (x̃i − ã)/R̃, ṽi = sin ϕi = (ỹi − b̃)/R̃. (2.4)
Note that ũi² + ṽi² = 1 for every i.
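To make the model concrete, here is a small Python sketch (our illustration; the function name and the semicircle example are not part of the paper) that generates observations according to (2.1)–(2.4):

```python
import numpy as np

def simulate_circle_data(a, b, R, phi, sigma, rng):
    # true points on the circle, located at the angles phi (cf. (2.3))
    x_true = a + R * np.cos(phi)
    y_true = b + R * np.sin(phi)
    # isotropic Gaussian noise: delta_i, eps_i i.i.d. N(0, sigma^2)
    x = x_true + rng.normal(0.0, sigma, size=phi.shape)
    y = y_true + rng.normal(0.0, sigma, size=phi.shape)
    return x, y

rng = np.random.default_rng(0)
phi = np.linspace(0.0, np.pi, 20)          # 20 true points on a semicircle
x, y = simulate_circle_data(0.0, 0.0, 1.0, phi, 0.05, rng)
```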
Remark. In our paper δi and εi have common variance σ 2 , i.e. our noise is
homoscedastic. In many studies the noise is heteroscedastic [25, 35], i.e. the
normal vector (δi , εi ) has point-dependent covariance matrix σ 2 Ci , where Ci is
known and depends on i, and σ 2 is an unknown factor. Our analysis can be
extended to this case, too, but the resulting formulas will be somewhat more
complex, so we leave it out.
3. Geometric circle fits

The geometric fit minimizes the sum of squares of the orthogonal (geometric) distances,

F(a, b, R) = Σ di², (3.1)

where di stands for the distance from (xi, yi) to the circle, i.e.

di = ri − R,   ri = √((xi − a)² + (yi − b)²), (3.2)
where (a, b) denotes the center, and R the radius of the circle.
In the context of the functional model, the geometric fit returns the maximum
likelihood estimates (MLE) of the circle parameters [6], i.e.

(â, b̂, R̂) = argmin F(a, b, R). (3.3)
A major concern with the geometric fit is that the above minimization prob-
lem has no closed form solution. All practical algorithms of minimizing F
are iterative; some implement a general Gauss-Newton [6, 15] or Levenberg-
Marquardt [9] schemes, others use circle-specific methods proposed by Landau
[24] and Späth [30]. The performance of iterative algorithms heavily depends on
the choice of the initial guess. They often take dozens or hundreds of iterations
to converge, and there is always a chance that they would be trapped in a local
minimum of F or diverge entirely. These issues are explored in [9].
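For illustration, a bare-bones Gauss–Newton iteration for the geometric fit takes only a few lines of Python. This is our own sketch, not the safeguarded Levenberg–Marquardt implementations cited above; it requires a reasonable initial guess (a0, b0, R0), e.g. from an algebraic fit, and it inherits the convergence caveats just discussed.

```python
import numpy as np

def geometric_fit(x, y, a0, b0, R0, max_iter=100, tol=1e-12):
    # Gauss-Newton minimization of F = sum_i (r_i - R)^2
    a, b, R = float(a0), float(b0), float(R0)
    for _ in range(max_iter):
        dx, dy = x - a, y - b
        r = np.hypot(dx, dy)              # r_i = sqrt((x_i - a)^2 + (y_i - b)^2)
        res = r - R                       # orthogonal distances d_i
        # Jacobian of the residuals with respect to (a, b, R)
        J = np.column_stack((-dx / r, -dy / r, -np.ones_like(r)))
        step, *_ = np.linalg.lstsq(J, -res, rcond=None)
        a, b, R = a + step[0], b + step[1], R + step[2]
        if np.linalg.norm(step) < tol:
            break
    return a, b, R
```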
A peculiar feature of the maximum likelihood estimates (â, b̂, R̂) of the circle
parameters is that they have infinite moments [7], i.e.

E(|â|) = E(|b̂|) = E(R̂) = ∞ (3.4)

for any set of true values (ã, b̃, R̃); here E denotes the mean value. This happens
because the distributions of these estimates have somewhat heavy tails, even
though those tails barely affect the practical performance of the MLE (the same
happens when one fits straight lines to data with errors in both variables [2, 3]).
To ensure the existence of moments one can adopt a different parameter
scheme. An elegant scheme was proposed by Pratt [27] and others [13], which
describes circles by an algebraic equation
A(x² + y²) + Bx + Cy + D = 0 (3.5)
Completing the square shows that (3.5) defines a circle if and only if
B² + C² − 4AD > 0. As the parameters (A, B, C, D) only need to be determined
up to a scalar multiple, it is natural to impose a constraint

B² + C² − 4AD = 1. (3.7)

The natural circle parameters are then recovered via

a = −B/(2A),   b = −C/(2A),   R = √(B² + C² − 4AD) / (2|A|). (3.8)
4. Algebraic circle fits

Kåsa fit. The simplest algebraic fit minimizes the sum of squares of the algebraic
distances fi = zi + Bxi + Cyi + D, i.e. the function

FK(B, C, D) = Σ (zi + Bxi + Cyi + D)²,

where we denote zi = xi² + yi² for brevity (we intentionally omit symbol A here
to make our formulas consistent with the subsequent ones). Now the problem re-
duces to a system of linear equations (normal equations) with respect to B, C, D
that can be easily solved, and then one recovers the natural circle parameters
a, b, R via (3.8).
This method was introduced in the 1970s by Delogne [11] and Kåsa [23], and
then rediscovered and published independently by many authors, see references
in [9]. It remains popular in practice. We call it Kåsa fit.
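Since the Kåsa fit amounts to a linear least squares problem, it is a few lines of code; the sketch below (ours) solves for B, C, D with A = 1 and recovers (a, b, R) via (3.8):

```python
import numpy as np

def kasa_fit(x, y):
    # minimize sum_i (z_i + B x_i + C y_i + D)^2 with z_i = x_i^2 + y_i^2
    z = x**2 + y**2
    X = np.column_stack((x, y, np.ones_like(x)))
    (B, C, D), *_ = np.linalg.lstsq(X, -z, rcond=None)
    # recover the natural parameters (the A = 1 case of (3.8))
    a, b = -B / 2.0, -C / 2.0
    R = np.sqrt(a**2 + b**2 - D)
    return a, b, R
```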
The Kåsa method is perhaps the fastest circle fit, but its accuracy suffers
when one observes incomplete circular arcs (partially occluded circles); then the
Kåsa fit is known to be heavily biased toward small circles [9]. The reason for
the bias is that the algebraic distances fi provide a poor approximation to the
geometric distances di; in fact,

fi = ri² − R² = di (ri + R) ≈ 2R di,

hence the Kåsa fit minimizes FK ≈ 4R² Σ di², and it often favors smaller circles
minimizing R² rather than the distances di.
Pratt fit. To improve the performance of the Kåsa method one can minimize
another function, FP = FK/(4R²), which provides a better approximation to Σ di².
This new function, expressed in terms of A, B, C, D, reads

FP = Σ [A zi + B xi + C yi + D]² / (B² + C² − 4AD), (4.4)
due to (3.8). Equivalently, one can minimize

F(A, B, C, D) = Σ [A zi + B xi + C yi + D]² (4.5)

subject to the constraint (3.7). This method was proposed by Pratt [27].
Taubin fit. A slightly different method was proposed by Taubin [32], who min-
imizes the function

FT = Σ [(xi − a)² + (yi − b)² − R²]² / (4n⁻¹ Σ [(xi − a)² + (yi − b)²]). (4.6)
General remarks. Note that the minimization of (4.5) must use some con-
straint, to avoid a trivial solution A = B = C = D = 0. Pratt and Taubin fits
utilize constraints (3.7) and (4.8), respectively. Kåsa fit also minimizes (4.5),
but subject to constraint A = 1.
While the Pratt and Taubin estimates of the parameters A, B, C, D have
finite moments, the corresponding estimates of a, b, R have infinite moments,
just like the MLE (3.4). On the other hand, Kåsa’s estimates of a, b, R have
finite moments whenever n ≥ 4; see [37].
All the above circle fits have an important property – they are independent
of the choice of the coordinate system, i.e. their results are invariant under
translations and rotations; see a proof in [14].
Practical experience shows that the Pratt and Taubin fits are more stable
and accurate than the Kåsa fit, and they perform nearly equally well, see [9].
Taubin [32] intended to compare his fit to Pratt’s theoretically, but no such
analysis was ever published. We make such a comparison below.
There are many other approaches to the circle fitting problem in the modern
literature [4, 10, 35, 28, 29, 31, 33, 36, 38], but most of them are either quite
slow or can be reduced to one of the algebraic fits [14, Chapter 8].
All three algebraic fits can be written in a common matrix form. Let A = (A, B, C, D)ᵀ
and let Z denote the n × 4 matrix with rows (zi, xi, yi, 1). Each algebraic fit minimizes
F(A) = AᵀMA, where M = n⁻¹ZᵀZ, subject to a constraint AᵀNA = 1 with a suitable
symmetric constraint matrix N (N = P for the Pratt fit, N = T for the Taubin fit).
Introducing a Lagrange multiplier η yields the first order conditions

MA = ηNA (4.14)

and

AᵀNA = 1, (4.15)

thus A must be a generalized eigenvector for the matrix pair (M, N), which also
satisfies AᵀNA = 1. The problem (4.14)–(4.15) may have several solutions. To
choose the right one we note that for each solution (η, A)

AᵀMA = ηAᵀNA = η, (4.16)

i.e. the value of the objective function equals η; hence one should pick the
solution with the smallest non-negative η.
The singular case and summary. The matrix M is singular if and only
if the observed points lie on a circle (or a line); in this case the eigenvector
A0 corresponding to η = 0 satisfies ZA0 = 0, i.e. it gives the interpolating
circle (line), which is obviously the best possible fit. However it may happen
that for some (poorly chosen) matrices N we have AT0 NA0 < 0, so that the
geometrically perfect solution has to be rejected. Such algebraic fits are not
worth considering. For all constraint matrices N in this paper we have AT NA ≥
0 whenever AT MA = 0 (the reader can verify this directly).
Summarizing, we conclude that each algebraic circle fit can be computed in
two steps: first we find all solutions (η, A) of the generalized eigenvalue problem
(4.14), and then we pick the one with the minimal non-negative η.
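In code, this two-step recipe is short. The sketch below is our own illustration: the explicit matrix P (which represents the quadratic form B² + C² − 4AD of (3.7)) is written out by us, and for simplicity the generalized eigenvalue problem is solved through N⁻¹M. This assumes N is invertible, which holds for Pratt's P but not, for example, for Kåsa's or Taubin's constraint matrices; those require a generalized eigensolver or an SVD-based scheme such as the one described for the Hyper fit below.

```python
import numpy as np

# Pratt's constraint matrix: A^T P A = B^2 + C^2 - 4 A D, cf. (3.7)
P = np.array([[ 0.0, 0.0, 0.0, -2.0],
              [ 0.0, 1.0, 0.0,  0.0],
              [ 0.0, 0.0, 1.0,  0.0],
              [-2.0, 0.0, 0.0,  0.0]])

def algebraic_fit(x, y, N):
    # form the data matrix Z and M = n^{-1} Z^T Z
    z = x**2 + y**2
    Z = np.column_stack((z, x, y, np.ones_like(z)))
    M = (Z.T @ Z) / len(x)
    # generalized eigenvalue problem M A = eta N A, via eigenvalues of N^{-1} M
    # (the exactly singular case of interpolating data is not treated here)
    eta, vecs = np.linalg.eig(np.linalg.solve(N, M))
    eta = np.real(eta)
    k = np.argmin(np.where(eta >= 0.0, eta, np.inf))   # smallest non-negative eta
    A, B, C, D = np.real(vecs[:, k])
    # convert (A, B, C, D) to the natural parameters via (3.8)
    a, b = -B / (2.0 * A), -C / (2.0 * A)
    R = np.sqrt(B**2 + C**2 - 4.0 * A * D) / (2.0 * abs(A))
    return a, b, R
```

With N = P this gives the Pratt fit.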
5. Error analysis: a general scheme

We now work in a more general setting and fit the data points (xi, yi), i = 1, . . . , n,
by a curve from a family

P(x, y; Θ) = 0, (5.1)

where Θ denotes the vector of unknown parameters. Let X = (x1, y1, . . . , xn, yn)ᵀ
be the vector of observations, X̃ its true value, and E = X − X̃ the (Gaussian)
noise vector. Each component θ̂m of an estimator Θ̂(X) admits the Taylor
expansion

θ̂m(X) = θ̂m(X̃) + GmᵀE + ½ EᵀHmE + OP(σ³). (5.2)

Here Gm = ∇θ̂m and Hm = ∇²θ̂m denote the gradient (the vector of the first
order partial derivatives) and the Hessian matrix of the second order partial
derivatives of θ̂m, respectively, taken at the true vector X̃. The remainder term
OP(σ³) in (5.2) is a random variable R such that σ⁻³R is bounded in proba-
bility.
Expansion (5.2) shows that Θ̂(X) → Θ̂(X̃) in probability, as σ → 0. It is
convenient to assume that
Θ̂(X̃) = Θ̃. (5.3)
Precisely (5.3) means that whenever σ = 0, i.e. when the true points are observed
without noise, then the estimator returns the true parameter vector, i.e. finds
the true curve. Geometrically, it means that if there is a model curve that
interpolates the data points, then the algorithm finds it.
With some degree of informality, one can assert that whenever (5.3) holds,
the estimate Θ̂ is consistent in the limit σ → 0. This is regarded as a minimal re-
quirement for any sensible fitting algorithm. For example, if the observed points
lie on one circle, then every circle fitting algorithm finds that circle uniquely.
Kanatani [20] remarks that algorithms which fail to follow this property “are
not worth considering”.
Under the assumption (5.3) we rewrite (5.2) as

∆θ̂m(X) = GmᵀE + ½ EᵀHmE + OP(σ³), (5.4)

where ∆θ̂m(X) = θ̂m(X) − θ̃m is the statistical error of the parameter estimate.
The accuracy of an estimator θ̂ in statistics is characterized by its Mean
Squared Error (MSE)
E[(θ̂ − θ̃)²] = Var(θ̂) + [bias(θ̂)]², (5.5)
where bias(θ̂) = E(θ̂)− θ̃. But it often happens that exact (or even approximate)
values of E(θ̂) and Var(θ̂) are unavailable because the probability distribution
of θ̂ is overly complicated, which is common in curve fitting problems, even if
one fits straight lines to data points; see [2, 3]. There are also cases where the
estimates have theoretically infinite moments because of somewhat heavy tails,
which on the other hand barely affect their practical performance. Thus their
accuracy should not be characterized by the theoretical moments which happen
to be affected by heavy tails; see also [2]. In all such cases one usually constructs
a good approximate probability distribution for θ̂ and judges the quality of θ̂
by the moments of that distribution.
It is standard [1–3, 12, 34] to construct a normal approximation to θ̂ and
treat its variance as an ‘approximative’ MSE of θ̂. The normal approximation is
usually based on the leading term in the Taylor expansion, like GmᵀE in (5.4).
For circle fitting algorithms, the resulting variance (see below) will be the same
for all known methods, so we will go one step further and use the second order
term. This gives us a better approximative distribution and allows us to compare
circle fitting methods. In our formulas, E(θ̂m ) and Var(θ̂m ) denote the mean and
variance of the resulting approximative distribution.
The first term in (5.4) is a linear combination of i.i.d. normal random variables
that have zero mean, hence it is itself a normal random variable with zero mean.
The second term is a quadratic form of i.i.d. normal variables. Since Hm is a
symmetric matrix, we have Hm = QmᵀDmQm, where Qm is an orthogonal
matrix and Dm = diag{d1, . . . , d2n} is a diagonal matrix. The vector Em =
Qm E has the same distribution as E does, i.e. its components are i.i.d. normal
random variables with mean zero and variance σ². Thus

EᵀHmE = EmᵀDmEm = σ² Σ di Zi², (5.6)
where the Zi ’s are i.i.d. standard normal random variables, and the mean value
of (5.6) is
E(EᵀHmE) = σ² tr Dm = σ² tr Hm. (5.7)
Classification of higher order terms. In the MSE expansion (5.9), the lead-
ing term σ 2 GTm Gm is the most significant. The terms of order σ 4 are often given
by long complicated formulas. Even the expression for the bias (5.8) may con-
tain several terms of order σ 2 , as we will see below. Fortunately, it is possible
to sort them out keeping only the most significant ones, see next.
Kanatani [22] recently derived formulas for the bias of certain ellipse fitting
algorithms. First he found all the terms of order σ 2 , but in the end he noticed
that some terms were of order σ 2 (independent of n), while the others of order
σ 2 /n. The magnitude of the former was clearly larger than that of the latter,
and when Kanatani made his conclusions he ignored the terms of order σ 2 /n.
Here we formalize Kanatani’s classification of higher order terms as follows:
– In the expression for the bias (5.8) we keep terms of order σ 2 (independent of
n) and ignore terms of order σ 2 /n.
– In the expression for the mean squared error (5.9) we keep terms of order σ 4
(independent of n) and ignore terms of order σ 4 /n.
These rules agree with our assumption that not only σ → 0, but also n → ∞,
although n increases rather slowly (n ≪ 1/σ 2 ). Such models were studied by
Amemiya, Fuller and Wolter [1, 34] who made a more rigid assumption that
n ∼ σ −a for some 0 < a < 2.
Now it turns out (we omit detailed proofs; see [14]) that the main term
σ²GmᵀGm in our expression for the MSE (5.9) is of order σ²/n; so it will never
be ignored. Of the fourth order terms, ½σ⁴‖Hm‖²F is of order σ⁴/n, hence it
will be discarded, and the same applies to all the terms involving third order
partial derivatives mentioned above.
The bias σ² tr Hm in (5.8) is, generally, of order σ² (independent of n), thus
its contribution to the mean squared error (5.9) is significant. However the full
expression for the bias may contain terms of order σ 2 and of order σ 2 /n, of
which the latter will be ignored; see below.
Now the terms in (5.9) have the following orders of magnitude:
E[(∆θ̂m)²] = O(σ²/n) + O(σ⁴) + O(σ⁴/n) + O(σ⁶), (5.10)
Table 1
The order of magnitude of the four terms in (5.9)

                               σ²/n    σ⁴     σ⁴/n    σ⁶
  small samples (n ∼ 1/σ)      σ³      σ⁴     σ⁵      σ⁶
  large samples (n ∼ 1/σ²)     σ⁴      σ⁴     σ⁶      σ⁶
where each big-O simply indicates the order of the corresponding term in (5.9).
It is interesting to roughly compare their values numerically. In typical computer
vision applications, σ does not exceed 0.05; see [5]. The number of data points
normally varies between 10-20 (on the low end) and a few hundred (on the high
end). For simplicity, we can set n ∼ 1/σ for smaller samples and n ∼ 1/σ 2 for
larger samples. Then Table 1 presents the corresponding typical magnitudes of
each of the four terms in (5.9).
We see that for larger samples the fourth order term coming from the bias
may be just as big as the leading second-order term, hence it would be unwise
to ignore it. Earlier studies, see e.g. [5, 8, 17], usually focused on the leading,
i.e. second-order, terms only, disregarding all the fourth-order terms, and this
is where our analysis is different. We make one step further – we keep all the
terms of order O(σ²/n) and O(σ⁴). The less significant terms of order O(σ⁴/n)
and O(σ⁶) are discarded.
Now combining all our results gives a matrix formula for the (total) mean
squared error (MSE)
E[(∆Θ̂)(∆Θ̂)ᵀ] = σ² G Gᵀ + σ⁴ B Bᵀ + · · · , (5.11)
KCR lower bound. The matrix V representing the leading terms of the vari-
ance has a natural lower bound (an analogue of the Cramer-Rao bound): for
every curve family (5.1) there is a symmetric positive semi-definite matrix Vmin
such that for every estimator satisfying (5.3)
V ≥ Vmin = [ Σ PΘi PΘiᵀ / ‖Pxi‖² ]⁻¹, (5.14)

where PΘi denotes the gradient of P with respect to the parameters Θ, and Pxi
stands for the gradient with respect to the planar variables x and y; both gradients are
taken at the true point x̃i = (x̃i , ỹi ). For example in the case of fitting circles
defined by P = (x − a)² + (y − b)² − R², we have

PΘi = −2 [(x̃i − ã), (ỹi − b̃), R̃]ᵀ,    Pxi = 2 [(x̃i − ã), (ỹi − b̃)]ᵀ. (5.17)
Therefore,
Vmin = (WᵀW)⁻¹, (5.18)

where W denotes the n × 3 matrix with rows Wi = (ũi, ṽi, 1), i = 1, . . . , n, (5.19)
and ũi , ṽi are given by (2.4).
The general inequality (5.14) was proved by Kanatani [17, 18] for unbiased
estimators Θ̂ and then extended by Chernov and Lesort [8] to all estimators
satisfying (5.3). The geometric fit (which minimizes orthogonal distances) always
satisfies (5.3) and attains the lower bound Vmin ; this was proved by Fuller
(Theorem 3.2.1 in [12]) and independently by Chernov and Lesort [8], who
named the inequality (5.14) Kanatani-Cramer-Rao (KCR) lower bound. See
also survey [25] for the more general case of heteroscedastic noise.
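For circles, the bound is given explicitly by (5.18)–(5.19) and is easy to evaluate numerically; the short sketch below (ours) returns σ²(WᵀW)⁻¹ for true points at prescribed angles ϕi.

```python
import numpy as np

def kcr_bound(phi, sigma):
    # W has rows (u_i, v_i, 1) with u_i = cos(phi_i), v_i = sin(phi_i), cf. (2.4), (5.19)
    W = np.column_stack((np.cos(phi), np.sin(phi), np.ones_like(phi)))
    return sigma**2 * np.linalg.inv(W.T @ W)

# lower bound for the covariance of (a-hat, b-hat, R-hat): 100 points on a semicircle
print(kcr_bound(np.linspace(0.0, np.pi, 100), 0.05))
```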
Thus the fits must be compared by their essential bias: better estimates should have
smaller essential biases. It appears that
there is no natural minimum for kB1 k, in fact there exist estimators which have
a minimum variance V = Vmin and a zero essential bias, i.e. B1 = 0. We will
construct such an estimator in Section 7.
6. Error analysis of geometric circle fit

Here we apply the general method of the previous section to the geometric circle
fit, i.e. to the estimator Θ̂ = (â, b̂, R̂) of the circle parameters minimizing the
sum Σ di² of orthogonal (geometric) distances from the data points to the fitted
circle.
Variance of the geometric circle fit. We start with the main part of our
error analysis – the variance term represented by σ 2 V in (5.13). The distances
di = ri − R can be expanded as
di = √[((x̃i + δi) − (ã + ∆a))² + ((ỹi + εi) − (b̃ + ∆b))²] − R̃ − ∆R
   = √[R̃² + 2R̃ũi(δi − ∆a) + 2R̃ṽi(εi − ∆b) + OP(σ²)] − R̃ − ∆R
   = ũi(δi − ∆a) + ṽi(εi − ∆b) − ∆R + OP(σ²), (6.1)
see (2.4). Minimizing Σ di² to the first order is equivalent to minimizing

Σ (ũi ∆a + ṽi ∆b + ∆R − ũi δi − ṽi εi)²; (6.2)
of course this does not include the OP (σ 2 ) terms. Thus the variance of our
estimator, to the leading order, is

Var(Θ̂) = σ² (WᵀW)⁻¹, (6.6)

where the higher order (of σ⁴) terms are not included. Comparing this to (5.18)
confirms that the geometric fit attains the minimal possible covariance matrix V.
Bias of the geometric circle fit. To find the bias we expand the estimates to the
second order:

a = ã + ∆1a + ∆2a + OP(σ³),
b = b̃ + ∆1b + ∆2b + OP(σ³), (6.7)
R = R̃ + ∆1R + ∆2R + OP(σ³).
Since we already found ∆1a, ∆1b, ∆1R, the only unknowns are ∆2a, ∆2b, ∆2R.
Minimizing Σ di² is now equivalent to minimizing

Σ (ũi ∆2a + ṽi ∆2b + ∆2R − fi)², (6.9)
where fi is a certain second order expression in δi, εi and ∆1a, ∆1b, ∆1R (we
omit its explicit form). Solving this least squares problem and taking the
expectation gives

E(∆Θ̂) = E(∆2Θ̂) = σ²/(2R̃) [ (WᵀW)⁻¹Wᵀ1 + (WᵀW)⁻¹WᵀS ], (6.12)

where 1 = (1, 1, . . . , 1)ᵀ and S = (s1, . . . , sn)ᵀ; here si is a scalar (whose
explicit form we also omit).
The second term in (6.12) is of order O(σ 2 /n), thus the essential bias is
given by the first term only, and it can be simplified. Since the last column of
W consists of ones, we have (WᵀW)⁻¹Wᵀ1 = (0, 0, 1)ᵀ, hence

E(∆Θ̂) =ess σ²/(2R̃) · [0, 0, 1]ᵀ. (6.14)
Thus the estimates of the circle center, â and b̂, have no essential bias, while
the estimate of the radius has essential bias
E(∆R̂) =ess σ²/(2R̃), (6.15)
which is independent of the number and location of the true points. These facts
are consistent with the results obtained by Berman [5] under the assumptions
that σ > 0 is fixed and n → ∞.
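For instance, with the values used in our experiments below (σ = 0.05 and R̃ = 1) the essential bias (6.15) equals σ²/(2R̃) = 0.00125; its square, 1.5625 · 10⁻⁶ = 0.0156 · 10⁻⁴, is exactly the '(ess. bias)²' value reported for the geometric fit in Tables 2 and 3.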
7. Error analysis of algebraic circle fits

Here we analyze algebraic circle fits using their matrix representation. We re-
call that A is a solution of the generalized eigenvalue problem MA = ηNA
corresponding to the smallest non-negative η. We also require ‖A‖ = 1.
Since the true points lie on the true circle, Z̃Ã = 0, as well as M̃Ã = 0 (hence
M̃ is a singular matrix). Therefore
Ãᵀ∆1M Ã = n⁻¹ Ãᵀ (Z̃ᵀ∆1Z + ∆1ZᵀZ̃) Ã = 0, (7.4)
hence AT MA = OP (σ 2 ), and premultiplying (4.14) by AT yields η = OP (σ 2 ).
Next, substituting the expansions of M, A, and N into (4.14) gives
(M̃ + ∆1 M + ∆2 M)(Ã + ∆1 A + ∆2 A) = η ÑÃ (7.5)
(recall that N is data-dependent for the Taubin method, but only its ‘true’
value Ñ matters, as η = OP (σ 2 ), hence the use of the observed values only adds
higher order terms). Now using M̃Ã = 0 yields
(M̃ ∆1 A + ∆1 M Ã) + (M̃ ∆2 A + ∆1 M ∆1 A + ∆2 M Ã) = η ÑÃ (7.6)
The left hand side of (7.6) consists of a linear part (M̃ ∆1A + ∆1M Ã) and a
quadratic part (M̃ ∆2A + ∆1M ∆1A + ∆2M Ã), while the right hand side is
quadratic, since η = OP(σ²). Separating them gives

M̃ ∆1A + ∆1M Ã = 0 (7.7)

and

M̃ ∆2A + ∆1M ∆1A + ∆2M Ã = η ÑÃ. (7.8)
Note that M̃ is a singular matrix (because M̃Ã = 0), but whenever there are
at least three distinct true points, they determine a unique true circle, thus the
kernel of M̃ is one-dimensional, and it coincides with span(Ã). Also, we set
‖A‖ = 1, hence ∆1A is orthogonal to Ã, and we can write

∆1A = −M̃⁻ ∆1M Ã, (7.9)

where M̃⁻ denotes the Moore-Penrose pseudoinverse. Now one can easily check
that E(∆1 M ∆1 A) = O(σ 2 /n) and E(∆1 A) = 0; these facts will be useful in
the upcoming analysis.
where

Z̃i = (z̃i, x̃i, ỹi, 1)ᵀ   and   ∆1Zi = (2x̃iδi + 2ỹiεi, δi, εi, 0)ᵀ (7.11)
denote the columns of the matrices Z̃T and ∆1 ZT , respectively. Next,
E[(∆1Zi)(∆1Zj)ᵀ] = 0 whenever i ≠ j,   and   E[(∆1Zi)(∆1Zi)ᵀ] = σ² T̃i, (7.12)

where

T̃i = [ 4z̃i  2x̃i  2ỹi  0
       2x̃i   1    0   0
       2ỹi   0    1   0        (7.13)
        0    0    0   0 ].
Note that n⁻¹ Σ T̃i = T̃ and ÃᵀT̃iÃ = ÃᵀPÃ = B̃² + C̃² − 4ÃD̃ for each i; recall
(4.11) and (4.12). Hence

Σ Z̃i (ÃᵀT̃iÃ) Z̃iᵀ = (ÃᵀPÃ) Σ Z̃i Z̃iᵀ = n (ÃᵀPÃ) M̃. (7.14)
Remarkably, the variance of algebraic fits does not depend on the constraint
matrix N, hence all algebraic fits have the same variance (to the leading order).
In the next section we will derive the variance of algebraic fits in the natural
circle parameters (a, b, R) and see that it coincides with the variance of the
geometric fit (6.6).
(Here Ã also denotes the first component of the vector Ã.) Next, note that
∆2Zi = (δi² + εi², 0, 0, 0)ᵀ, and so

E(Z̃ᵀ∆2Z) Ã = 2Ãσ² Σ Z̃i. (7.19)
Therefore the essential bias is given by

E(∆2A) =ess −σ² M̃⁻ ( 4Ã n⁻¹ Σ Z̃i + PÃ − (ÃᵀPÃ / ÃᵀÑÃ) ÑÃ ). (7.20)
A more detailed analysis (which we omit) gives the following expression con-
taining all the O(σ²) and O(σ²/n) terms:

E(∆2A) = −σ² M̃⁻ ( 4Ã n⁻¹ Σ Z̃i + (1 − 4/n) PÃ − ((1 − 3/n)(ÃᵀPÃ) / (ÃᵀÑÃ)) ÑÃ
                  − 4Ã n⁻² Σ (Z̃iᵀ M̃⁻ Z̃i) Z̃i ) + O(σ⁴). (7.21)
In fact, this term will play the key role in the subsequent analysis.
8. Comparison of various circle fits

Bias of the Pratt and Taubin fits. We have seen that all the algebraic fits
have the same main characteristic – the variance (7.15), to the leading order.
We will see below that their variance coincides with that of the geometric circle
fit. Thus the difference between all our circle fits should be traced to the higher
order terms, especially to their essential biases.
First we compare the Pratt and Taubin fits. For the Pratt fit, the constraint
matrix is N = Ñ = P, hence its essential bias (7.20) becomes
E(∆2APratt) =ess −4σ² Ã [0, 0, 0, 1]ᵀ. (8.1)
In other words, the Pratt constraint N = P cancels the second (middle) term
in (7.20); it leaves the first term intact.
For the Taubin fit, the constraint matrix is N = T and its ‘true’ value is
Ñ = T̃ = n⁻¹ Σ T̃i; also note that T̃iÃ = 2ÃZ̃i + PÃ for every i. Hence
Taubin's bias is

E(∆2ATaubin) =ess −2σ² Ã [0, 0, 0, 1]ᵀ. (8.2)
Thus, the Taubin constraint N = T cancels the second term in (7.20) and a
half of the first term; it leaves only a half of the first term in place.
As a result, the Taubin fit's essential bias is twice as small as that of the Pratt
fit. Given that their main terms (variances) are equal, we see that the Taubin fit
is statistically more accurate than Pratt's. We believe our analysis answers
the question posed by Taubin [32], who intended to compare his fit to Pratt's.
Hyperaccurate fit. The formula (7.20) also shows how to eliminate the essential
bias altogether. Take the constraint matrix N = H = 2T − P. Then, by the identity
T̃iÃ = 2ÃZ̃i + PÃ used above, H̃Ã = 2T̃Ã − PÃ = 4Ã n⁻¹ Σ Z̃i + PÃ and
ÃᵀH̃Ã = ÃᵀPÃ, so the expression in parentheses in (7.20) vanishes and the
essential bias is exactly zero.
We call this fit hyperaccurate, or ‘Hyper’ for short. The term hyperaccuracy was
introduced by Kanatani [21, 22], who was first to employ Taylor expansion up
to the terms of order σ⁴ for the purpose of comparing various algebraic fits and
designing better fits.
We note that the Hyper fit is invariant under translations and rotations
because its constraint matrix H is a linear combination of two others, T and P,
that satisfy the invariance requirements; see a proof in [14].
As any other algebraic circle fit, the Hyper fit minimizes the function F (A) =
AT MA subject to the constraint AT NA = 1 (with N = H), hence we need to
solve the generalized eigenvalue problem MA = ηHA and choose the solution
with the smallest non-negative eigenvalue η (see the end of Section 4).
We note that the matrix H is not singular, three of its eigenvalues are positive
and one is negative (these facts can be easily derived from the following simple
observations: det H = −4, trace H = 8z̄ + 2 > 1, and λ = 1 is one of its
eigenvalues). If M is positive definite, then by Sylvester’s law of inertia the
matrix H−1 M has the same signature as H does, i.e. the eigenvalues η of H−1 M
are all real, exactly three of them are positive and one is negative. In this sense
the Hyper fit is similar to the Pratt fit, as the constraint matrix P also has three
positive and one negative eigenvalues.
The Hyper fit can be computed by a numerically stable procedure involv-
ing singular value decomposition (SVD). First, we compute the (short) SVD,
Z = UΣVᵀ, of the matrix Z. If its smallest singular value, σ4, is less than
a predefined tolerance ε (we suggest ε = 10⁻¹²), then A is the corresponding
right singular vector, i.e. the fourth column of the V matrix. In the regular
case (σ4 ≥ ε), one forms Y = VΣVT and finds the eigenpairs of the symmet-
ric matrix YH−1 Y. Selecting the eigenpair (η, A∗ ) with the smallest positive
eigenvalue and computing A = Y−1 A∗ completes the solution. The prior trans-
lation of the coordinate system to the centroid of the data set (which ensures
that x̄ = ȳ = 0) makes the computation of H−1 particularly simple. The corre-
sponding MATLAB code is available from our web page [14].
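This procedure translates almost line by line into code. The sketch below is ours; the explicit (centered) form of the constraint matrix H = 2T − P is written out by us, and the final conversion to (a, b, R) uses (3.8).

```python
import numpy as np

def hyper_fit(x, y, eps=1e-12):
    # translate the coordinate system to the centroid (so x-bar = y-bar = 0)
    xm, ym = x.mean(), y.mean()
    xc, yc = x - xm, y - ym
    z = xc**2 + yc**2
    Z = np.column_stack((z, xc, yc, np.ones_like(z)))
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)     # short SVD, Z = U S V^T
    if s[3] < eps:
        A = Vt[3]                                        # interpolating circle (or line)
    else:
        # constraint matrix H = 2T - P for centered data (x-bar = y-bar = 0)
        zbar = z.mean()
        H = np.array([[8.0 * zbar, 0.0, 0.0, 2.0],
                      [0.0,        1.0, 0.0, 0.0],
                      [0.0,        0.0, 1.0, 0.0],
                      [2.0,        0.0, 0.0, 0.0]])
        Y = Vt.T @ np.diag(s) @ Vt                       # Y = V S V^T (symmetric)
        eta, E = np.linalg.eigh(Y @ np.linalg.inv(H) @ Y)
        k = np.argmin(np.where(eta > 0.0, eta, np.inf))  # smallest positive eigenvalue
        A = np.linalg.solve(Y, E[:, k])
    A0, B, C, D = A
    a = -B / (2.0 * A0) + xm
    b = -C / (2.0 * A0) + ym
    R = np.sqrt(B**2 + C**2 - 4.0 * A0 * D) / (2.0 * abs(A0))
    return a, b, R
```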
where the matrix (WT W)−1 appears in (6.12) and the matrix M̃− appears in
(7.15). The identity (8.9) is easy to verify directly for the unit circle ã = b̃ = 0
and R̃ = 1, and then one can check that it remains valid under translations and
similarities.
Equation (8.9) implies that for every true point (x̃i , ỹi )
4Ã²R̃² J̃ M̃⁻ Z̃i Z̃iᵀ M̃⁻ J̃ᵀ = n² (WᵀW)⁻¹ Wi Wiᵀ (WᵀW)⁻¹, (8.10)

where Wi = (ũi, ṽi, 1)ᵀ denote the columns of the matrix Wᵀ, cf. (5.19). Summing
up over i and using Σ Z̃iZ̃iᵀ = nM̃, Σ WiWiᵀ = WᵀW and M̃⁻M̃M̃⁻ = M̃⁻ gives

4Ã²R̃² J̃ M̃⁻ J̃ᵀ = n (WᵀW)⁻¹.
Thus the variance of all the algebraic circle fits (to the leading order) coincides
with that of the geometric circle fit, cf. (6.6). Therefore the difference between
all the circle fits should be then characterized in terms of their biases, which we
do next.
The essential bias of the Pratt fit is, due to (8.1),
E(∆2Θ̂Pratt) =ess 2σ² R̃⁻¹ [0, 0, 1]ᵀ. (8.13)
Observe that the estimates of the circle center are essentially unbiased, and
the essential bias of the radius estimate is 2σ 2 /R̃, which is independent of the
number and location of the true points. We know that the essential bias of the
Taubin fit is twice as small, hence
E(∆2Θ̂Taubin) =ess σ² R̃⁻¹ [0, 0, 1]ᵀ. (8.14)
Comparing to (6.14) shows that the geometric fit has an essential bias that is
twice as small as that of Taubin and four times smaller than that of Pratt.
Therefore, the geometric fit has the smallest bias among all the popular circle
fits, i.e. it is statistically most accurate.
The formulas for the bias of the Kåsa fit can be derived, too, but in gen-
eral they are complicated. However recall that all our fits, including Kåsa, are
independent of the choice of the coordinate system, hence we can choose it
so that the true circle has center at (0, 0) and radius R̃ = 1. For this circle
à = √12 [1, 0, 0, −1]T , hence Pà = 2à and so M̃− Pà = 0, i.e. the middle term
in (7.20) is gone.
√ Also note that ÃT PÃ = 2, hence the last term in parentheses
T
in (7.20) is 2 2 [1, 0, 0, 0] .
Next, assume for simplicity that the true points are equally spaced on an
arc of size θ (a typical arrangement in many studies). Choosing the coordinate
system so that the east pole (1, 0) is at the center of that arc (see Figure 1)
ensures ȳ = xy = 0. It is not hard to see now that
M̃⁻ [1, 0, 0, 0]ᵀ = (1/4) (xx − x̄²)⁻¹ [xx, −2x̄, 0, xx]ᵀ. (8.15)
Using the formula (8.6) we obtain (omitting details as they are not so relevant)
the essential bias of the Kåsa fit in the natural parameters (a, b, R):
E(∆2Θ̂Kasa) =ess 2σ² [0, 0, 1]ᵀ − σ² (xx − x̄²)⁻¹ [−x̄, 0, xx]ᵀ. (8.16)
The first term here is the same as in (8.13) (recall that R̃ = 1), but it is
the second term above that causes serious trouble: it grows to infinity because
xx − x̄2 → 0 as θ → 0. This explains why the Kåsa fit develops a heavy bias
toward smaller circles when data points are sampled from a small arc.
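A quick numerical illustration of this effect (ours, with arbitrarily chosen arc lengths): on a full semicircle the Kåsa radius estimate stays close to the true value, while on a short arc it collapses toward a much smaller circle.

```python
import numpy as np

def kasa_radius(x, y):
    # Kasa fit (A = 1): linear least squares for B, C, D, then R from (3.8)
    z = x**2 + y**2
    X = np.column_stack((x, y, np.ones_like(x)))
    (B, C, D), *_ = np.linalg.lstsq(X, -z, rcond=None)
    return np.sqrt(B**2 / 4.0 + C**2 / 4.0 - D)

rng = np.random.default_rng(2)
for theta in (np.pi, 0.5):                        # semicircle vs. a short arc (radians)
    phi = np.linspace(-theta / 2, theta / 2, 50)
    x = np.cos(phi) + rng.normal(0.0, 0.05, 50)   # true circle: center (0,0), R = 1
    y = np.sin(phi) + rng.normal(0.0, 0.05, 50)
    print(f"arc = {theta:.2f} rad:  Kasa R-hat = {kasa_radius(x, y):.3f}")
```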
9. Experimental tests and conclusions

Table 2
Mean square error (and its components) for four circle fits (10⁴ × values are shown). In this
test n = 100 points are placed (equally spaced) along a semicircle of radius R = 1 and the
noise level is σ = 0.05

            total MSE  =  variance  +  (ess. bias)²  +  rest of MSE
  Pratt       1.5164       1.2647        0.2500          0.0017
  Taubin      1.3451       1.2647        0.0625          0.0117
  Geom.       1.2952       1.2647        0.0156          0.0149
  Hyper.      1.2892       1.2647        0.0000          0.0244
Table 3
Mean square error (and its components) for four circle fits (10⁶ × values are shown). In this
test n = 10000 points are placed (equally spaced) along a semicircle of radius R = 1 and the
noise level is σ = 0.05

            total MSE  =  variance  +  (ess. bias)²  +  rest of MSE
  Pratt      25.5520       1.3197       25.0000         -0.76784
  Taubin      7.4385       1.3197        6.2500         -0.13126
  Geom.       2.8635       1.3197        1.5625         -0.01876
  Hyper.      1.3482       1.3197        0.0000         -0.02844
To illustrate our analysis of various circle fits we have run a few computer
experiments where we set n true points equally spaced along a semicircle of
radius R = 1. Then we generated random samples by adding a Gaussian noise
at level σ = 0.05 to each true point, and after that applied various circle fits to
estimate the parameters (a, b, R).
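The experiment is easy to reproduce in outline. The following sketch is ours: it condenses the Pratt and Hyper fits into one helper, uses 10,000 random samples instead of the 10⁷ behind the tables, and omits the geometric and Taubin fits for brevity; it estimates the mean square error of the radius estimate R̂ in the setting just described.

```python
import numpy as np

def fit_radius(x, y, method):
    # compact algebraic fit: solve M A = eta N A and return the radius estimate
    z = x**2 + y**2
    Z = np.column_stack((z, x, y, np.ones_like(z)))
    M = (Z.T @ Z) / len(x)
    if method == "pratt":
        N = np.array([[0.0, 0, 0, -2], [0, 1, 0, 0], [0, 0, 1, 0], [-2, 0, 0, 0]])
    else:  # "hyper": H = 2T - P built from the data means
        zb, xb, yb = z.mean(), x.mean(), y.mean()
        N = np.array([[8 * zb, 4 * xb, 4 * yb, 2.0], [4 * xb, 1, 0, 0],
                      [4 * yb, 0, 1, 0], [2.0, 0, 0, 0]])
    eta, V = np.linalg.eig(np.linalg.solve(N, M))
    eta = np.real(eta)
    A, B, C, D = np.real(V[:, np.argmin(np.where(eta >= 0.0, eta, np.inf))])
    return np.sqrt(B**2 + C**2 - 4.0 * A * D) / (2.0 * abs(A))

rng = np.random.default_rng(1)
n, sigma, trials = 100, 0.05, 10_000
phi = np.linspace(0.0, np.pi, n)              # true points: semicircle of radius 1
xt, yt = np.cos(phi), np.sin(phi)
err = {"pratt": [], "hyper": []}
for _ in range(trials):
    x = xt + rng.normal(0.0, sigma, n)
    y = yt + rng.normal(0.0, sigma, n)
    for m in err:
        err[m].append(fit_radius(x, y, m) - 1.0)
for m in err:
    print(m, "MSE of R-hat:", np.mean(np.square(err[m])))
```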
Table 2 summarizes the results of the first test, with n = 100 points; it shows
the mean square error (MSE) of the radius estimate R̂ for each circle fit (ob-
tained by averaging over 107 randomly generated samples). The table also gives
the breakdown of the MSE into three components. The first two are the vari-
ance (to the leading order) and the square of the essential bias, both computed
according to our theoretical formulas. These two components do not account
for the entire mean square error, due to higher order terms which our analysis
discarded. The remaining part of the MSE is shown in the last column, which is
relatively small. (We note that only the total MSE can be observed in practice;
all the other columns of this table are the results of our theoretical analysis.)
We see that all the circle fits have the same (leading) variance, which accounts
for the ‘bulk’ of the MSE. Their essential biases differ: the bias is highest for the
Pratt fit and smallest (zero) for the Hyper fit. Algorithms with smaller essential
biases perform overall better, i.e. have smaller mean square error. The Hyper fit
is the best in our experiment; it outperforms the (usually unbeatable) geometric
fit.
To highlight the superiority of the Hyper fit, we repeated our experiment
increasing the sample up to n = 10000, see Table 3 and Figure 2. We see that
Fig 2. MSE for the Pratt, Taubin, geometric, and Hyper fits (on the logarithmic scale) versus
the sample size n (from 10 to 10⁴).
when the number of points is high, the Hyper fit becomes several times more
accurate than the geometric fit. Thus, our analysis disproves the popular belief
in the statistical community that there is nothing better than minimizing the
orthogonal distances.
Needless to say, the geometric fit involves iterative approximations, which
are computationally intensive and subject to occasional divergence, while our
Hyper fit is a fast non-iterative procedure, which is 100% reliable.
Summary. All the known circle fits (geometric and algebraic) have the same
variance, to the leading order. The relative difference between them can be
traced to higher order terms in the expansion for the mean square error. The
second leading term in that expansion is the essential bias, for which we have
derived explicit expressions. Circle fits with smaller essential bias perform better
overall. This explains the poor performance of the Kåsa fit, the moderate perfor-
mance of the Pratt fit, and the good performance of the Taubin and geometric
fits (in this order). We showed that while there is a natural lower bound on the
variance to the leading order (the KCR bound), there is no lower bound on the
essential bias. In fact there exists an algebraic fit with zero essential bias (the
Hyper fit), which outperforms the geometric fit in accuracy. We plan to perform
a similar analysis for ellipse fitting algorithms in the near future.
Acknowledgements
The authors are grateful to the anonymous referees for many helpful sugges-
tions. N.C. was partially supported by the National Science Foundation, grant DMS-
0652896.
References
[1] Amemiya, Y. and Fuller, W.A. (1988). Estimation for the nonlinear
functional relationship. Annals Statist. 16 147–160. MR0924862
[2] Anderson, T.W. (1976). Estimation of linear functional relationships:
Approximate distributions and connections with simultaneous equations in
econometrics. J. R. Statist. Soc. B 38 1–36. MR0411025
[3] Anderson, T.W. and Sawa, T. (1982). Exact and approximate distri-
butions of the maximum likelihood estimator of a slope coefficient. J. R.
Statist. Soc. B 44 52–62. MR0655374
[4] Atieg, A. and Watson, G.A. (2004). Fitting circular arcs by orthog-
onal distance regression. Appl. Numer. Anal. Comput. Math. 1 66–76.
MR2168317
[5] Berman, M. (1989). Large sample bias in least squares estimators of a
circular arc center and its radius. CVGIP: Image Understanding 45 126–
128.
[6] Chan, N.N. (1965). On circular functional relationships. J. R. Statist. Soc.
B 27 45–56. MR0189163
[7] Chernov, N. Fitting circles to scattered data: parameter estimates have
no moments. Manuscript, see http://www.math.uab.edu/~chernov/cl.
[8] Chernov, N. and Lesort, C. (2004). Statistical efficiency of curve fitting
algorithms. Comp. Stat. Data Anal 47 713–728. MR2101548
[9] Chernov, N. and Lesort, C. (2005). Least squares fitting of circles. J.
Math. Imag. Vision 23 239–251. MR2181705
[10] Chernov, N. and Sapirstein, P. (2008). Fitting circles to data with
correlated noise. Comput. Statist. Data Anal 52 5328–5337.
[11] Delogne, P. (1972). Computer optimization of Deschamps’ method and
error cancellation in reflectometry. In Proc. IMEKO-Symp. Microwave
Measurement (Budapest) 117–123.
[12] Fuller, W.A. (1987). Measurement Error Models. J. Wiley & Sons, New
York. MR0898653
[13] Gander, W., Golub, G.H., and Strebel, R. (1994). Least squares fit-
ting of circles and ellipses. BIT 34 558–578. MR1430909
[14] http://www.math.uab.edu/chernov/cl.
[15] Joseph, S.H. (1994). Unbiased least-squares fitting of circular arcs. Graph.
Mod. Image Process. 56 424–432.
[16] Kadane, J.B. (1970). Testing overidentifying restrictions when the distur-
bances are small. J. Amer. Statist. Assoc. 65 182–185.
[17] Kanatani, K. (1996). Statistical Optimization for Geometric Compu-
tation: Theory and Practice. Elsevier Science, Amsterdam, Netherlands.
MR1392697
[18] Kanatani, K. (1998). Cramer-Rao lower bounds for curve fitting. Graph.
Mod. Image Process 60 93–99.
[19] Kanatani, K. (2004). For geometric inference from images, what kind of
statistical model is necessary? Syst. Comp. Japan 35 1–9.