A Continuous Normal Approximation to the Binomial Distribution
Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
[email protected]
doi: 10.13140/RG.2.2.26710.05447
Cite as: Hernandez, H. (2024). A Continuous Normal Approximation to the Binomial Distribution. ForsChem Research Reports, 9, 2024-05, 1 - 25. Publication Date: 11/04/2024.
Abstract
The binomial distribution is a well-known example of a discrete probability distribution. Only two
outcomes are possible for each independent trial in a binomial experiment. In this report, a
continuous approximation is proposed for describing the discrete binomial probability
function, which can then be used to represent an analogous binomial continuous variable. The
proposed approximation consists of a correction to the combinatorial number approximated
by using Stirling’s equation, followed by a Taylor series approximation truncated after the
second power. As a result, a normal or Gaussian distribution function is obtained. The error of
the proposed approximation decays with the number of trials considered. However, even for
small numbers of trials (e.g. less than 10), the approximation can be considered satisfactory.
Keywords
Bernoulli trials, Binomial Distribution, Combinatorial, Continuous Approximation, Factorial,
Gamma Function, Normal Distribution, Probability, Taylor Series, Stirling’s Approximation
1. Introduction
Any situation where multiple trials can be independently performed under similar conditions,
each of which has only two possible outcomes A and B¹, is denoted as a binomial
experiment [1], and the trials are denoted as Bernoulli trials. Binomial experiments are also
commonly denoted as combinatorial problems and have been observed and investigated since
ancient times (with the earliest written records dating from about the 5th century BC in ancient
India) [2].
¹ Typically, the binomial outcomes are denoted as success and failure, but the outcomes A and B will be used here as a generalization.
Since only the two outcomes A and B are possible in a single Bernoulli trial, their probabilities must satisfy:

p_A + p_B = 1
(1.1)

or equivalently,

p_B = 1 - p_A
(1.2)
In the case of multiple independent trials, let us say n trials, the total probability of the
different outcomes for the multiple trials, which is the product of the probabilities of each
independent trial, will be given by:

(p_A + p_B)^n = 1
(1.3)
Notice that the binomial power in Eq. (1.3) can be alternatively expressed using the Binomial
Theorem [3,4] as follows:
(p_A + p_B)^n = \sum_{x=0}^{n} \binom{n}{x} p_A^x \, p_B^{\,n-x}
(1.4)

where \binom{n}{x} represents a combinatorial number defined as:

\binom{n}{x} = \frac{n!}{x!\,(n-x)!}
(1.5)

and n! is the factorial number given by:

n! = \prod_{i=1}^{n} i = n\,(n-1)\cdots 2\cdot 1
(1.6)
Notice that each term of the sum in Eq. (1.4) represents the probability P(x) of obtaining
outcome A in x trials and outcome B in the remaining n - x trials, in a binomial experiment of n
independent trials. Then, considering Eq. (1.2), we obtain:

P(x) = \binom{n}{x} p_A^x \,(1-p_A)^{n-x}
(1.7)

representing the probability distribution of the binomial experiment, also known as the binomial
distribution. For x < 0 or x > n, the corresponding probability is zero.
The expected value of the binomial distribution is:

E(x) = \sum_{x=0}^{n} x\,P(x) = \sum_{x=0}^{n} x\binom{n}{x} p_A^x\,(1-p_A)^{n-x} = n\,p_A\sum_{x=1}^{n}\binom{n-1}{x-1} p_A^{x-1}\,(1-p_A)^{n-x} = n\,p_A\,\big(p_A + (1-p_A)\big)^{n-1} = n\,p_A
(1.8)
and its variance is:

\mathrm{Var}(x) = E(x^2) - \big(E(x)\big)^2 = E\big(x(x-1)\big) + E(x) - \big(E(x)\big)^2 = n(n-1)\,p_A^2 + n\,p_A - n^2 p_A^2 = n\,p_A\,(1-p_A)
(1.9)
In addition, the cumulative probability function of the binomial distribution is:

F(x) = \sum_{k=0}^{\lfloor x\rfloor} P(k) = \sum_{k=0}^{\lfloor x\rfloor}\binom{n}{k} p_A^k\,(1-p_A)^{n-k}
(1.10)
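As a quick numerical illustration of Eqs. (1.7) to (1.10), the following Python sketch compares the expressions above with a standard library implementation (scipy.stats.binom); the values n = 10 and p_A = 0.3 are arbitrary illustrative choices, not taken from this report.

# Sketch: check Eqs. (1.7)-(1.10) against scipy.stats.binom.
# n = 10 and pA = 0.3 are arbitrary illustrative values (not from the report).
from math import comb
from scipy.stats import binom

n, pA = 10, 0.3

# Eq. (1.7): P(x) = C(n, x) * pA^x * (1 - pA)^(n - x)
P = [comb(n, x) * pA**x * (1 - pA) ** (n - x) for x in range(n + 1)]
assert all(abs(P[x] - binom.pmf(x, n, pA)) < 1e-12 for x in range(n + 1))

# Eq. (1.8) and (1.9): E(x) = n*pA, Var(x) = n*pA*(1 - pA)
mean = sum(x * P[x] for x in range(n + 1))                 # -> 3.0
var = sum(x**2 * P[x] for x in range(n + 1)) - mean**2     # -> 2.1

# Eq. (1.10): cumulative probability F(x)
F = [sum(P[: x + 1]) for x in range(n + 1)]
print(mean, var, F[3])   # F(3) = P(x <= 3)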
The binomial probability distribution function is a discrete function. However, discrete
functions can be approximated by continuous functions, which in certain cases facilitates the
calculation of probabilities and other mathematical operations.
A direct transformation of Eq. (1.7) into a continuous probability density function is obtained by
replacing the factorial function by the corresponding gamma function \Gamma [5] and considering a
local average of the density function [6] for -1/2 ≤ x ≤ n + 1/2, as follows:

f(x) = K\,\frac{\Gamma(n+1)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p_A^x\,(1-p_A)^{n-x}
(1.11)

where

\Gamma(z) = \int_{0}^{\infty} t^{\,z-1} e^{-t}\,dt
(1.12)

such that

\Gamma(n+1) = n!
(1.13)

and

K = \left(\int_{-1/2}^{\,n+1/2}\frac{\Gamma(n+1)}{\Gamma(x+1)\,\Gamma(n-x+1)}\, p_A^x\,(1-p_A)^{n-x}\,dx\right)^{-1}
(1.14)

is a correction term required to satisfy a necessary condition of probability
density functions²:

\int_{-1/2}^{\,n+1/2} f(x)\,dx = 1
(1.15)
In addition, the cumulative probability function of the continuous binomial distribution will be:

F(x) = \int_{-1/2}^{\,x} f(t)\,dt
(1.16)

Alternatively, the cumulative probability function can be expressed in terms of the regularized
incomplete beta function I_z(a,b) [7] as follows:

F(x) \approx \begin{cases} 0, & x < -\frac{1}{2}\\[4pt] I_{1-p_A}(n-x,\,x+1), & -\frac{1}{2}\le x\le n+\frac{1}{2}\\[4pt] 1, & x > n+\frac{1}{2}\end{cases}
(1.17)

where

I_z(a,b) = \frac{\int_{0}^{z} t^{\,a-1}(1-t)^{\,b-1}\,dt}{\int_{0}^{1} t^{\,a-1}(1-t)^{\,b-1}\,dt}
(1.18)
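The gamma-function density of Eq. (1.11) and the beta-function form of the cumulative probability (Eqs. 1.17 and 1.18) can be evaluated numerically, for instance with scipy. The following sketch assumes the reconstruction given above (in particular, the identity F(x) = I_{1-p_A}(n-x, x+1) at integer x); n = 12 and p_A = 0.4 are illustrative values only.

# Sketch: gamma-function density (Eq. 1.11, up to the normalization K) and the
# CDF via the regularized incomplete beta function (Eqs. 1.17-1.18).
# n = 12 and pA = 0.4 are arbitrary illustrative values.
import numpy as np
from scipy.special import gammaln, betainc
from scipy.stats import binom

n, pA = 12, 0.4

def log_density_unnormalized(x):
    # ln[ Gamma(n+1) / (Gamma(x+1) Gamma(n-x+1)) * pA^x * (1-pA)^(n-x) ]
    return (gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
            + x * np.log(pA) + (n - x) * np.log(1 - pA))

# At integer x the (unnormalized) density reproduces the discrete pmf:
x = np.arange(0, n + 1)
print(np.allclose(np.exp(log_density_unnormalized(x)), binom.pmf(x, n, pA)))

# Eq. (1.17): F(k) = I_{1-pA}(n - k, k + 1) for integer 0 <= k < n
k = 5
print(betainc(n - k, k + 1, 1 - pA), binom.cdf(k, n, pA))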
In this report, an alternative continuous approximation is presented, which can then be related
to the normal distribution. The proposed approximation results from considering, first,
Stirling's approximation (Section 2) for the factorial (or the gamma function) and the combinatorial
number; then incorporating a correction to the combinatorial number approximation (Section
3); and finally, using a Taylor series approximation for the logarithm of the probability function
(Section 4). The probability density and cumulative probability functions for a continuous
binomial variable are described in Section 5.
² Alternatively, we may simply use a different upper integration limit together with a normalization factor to fulfill condition (1.15).
2. Stirling’s Approximation
The factorial of a positive integer n can be expressed as:

n! = \prod_{i=1}^{n} i
(2.1)

Thus, the logarithm of the factorial will be:

\ln n! = \sum_{i=1}^{n}\ln i
(2.2)

Assuming a continuous variable x, Eq. (2.2) can be considered as a midpoint rule
Riemann sum [8] of \ln x, equivalent to the following integral:

\ln n! \approx \int_{1/2}^{\,n+1/2}\ln x\,dx = \left(n+\tfrac{1}{2}\right)\ln\!\left(n+\tfrac{1}{2}\right) - n + \tfrac{1}{2}\ln 2
(2.3)
Figure 1 illustrates the performance of approximation (2.3) on the estimation of \ln n!, for
selected values of n. The mean absolute error in the estimation of \ln n! remains small over the
whole range of n considered.

Figure 1. Comparison between \ln n! and the continuous approximation shown in Eq. (2.3), for selected values of n.
³ See: https://www.wolframalpha.com/input?i=lim%28%28x%2B1%2F2%29*ln%281%2B1%2F%282x%29%29%29%2C+x%3Dinf
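A minimal numerical sketch of the midpoint-rule approximation (2.3), assuming the closed form reconstructed above; the selected values of n are arbitrary.

# Sketch: midpoint-rule integral approximation of ln(n!) (Eq. 2.3).
# The closed form follows from integrating ln(x) between 1/2 and n + 1/2.
import math

def ln_factorial_exact(n):
    return math.lgamma(n + 1)

def ln_factorial_eq23(n):
    # (n + 1/2) ln(n + 1/2) - n + (1/2) ln 2
    return (n + 0.5) * math.log(n + 0.5) - n + 0.5 * math.log(2.0)

for n in (5, 10, 50, 100):
    exact, approx = ln_factorial_exact(n), ln_factorial_eq23(n)
    print(n, exact, approx, abs(exact - approx))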
Notice that the first term in Eq. (2.3) can be expanded as:

\left(n+\tfrac{1}{2}\right)\ln\!\left(n+\tfrac{1}{2}\right) = \left(n+\tfrac{1}{2}\right)\ln n + \left(n+\tfrac{1}{2}\right)\ln\!\left(1+\tfrac{1}{2n}\right)
(2.4)

and therefore, for large values of n (since the second term approaches 1/2 in the limit³):

\left(n+\tfrac{1}{2}\right)\ln\!\left(n+\tfrac{1}{2}\right) \approx \left(n+\tfrac{1}{2}\right)\ln n + \tfrac{1}{2}
(2.5)

Replacing this result in Eq. (2.3) yields:

\ln n! \approx \left(n+\tfrac{1}{2}\right)\ln n - n + \tfrac{1+\ln 2}{2}
(2.6)

The performance of approximation (2.6) is illustrated in Figure 2. The mean absolute error of
this approximation is slightly larger than that of Eq. (2.3).

Figure 2. Comparison between \ln n! and the continuous approximation shown in Eq. (2.6), for selected values of n.

Figure 3. Comparison between \ln n! and the continuous approximation shown in Eq. (2.7), for selected values of n.
The advanced work on integrals, series and factorials performed by John Wallis, Abraham de
Moivre and James Stirling during the 17th and 18th centuries led to a more precise
approximation of n!, nowadays known as Stirling's approximation [9,10]:

n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n
(2.7)

The performance of Stirling's approximation (Eq. 2.7) is illustrated in Figure 3. While the
differences in performance between approximations (2.3), (2.6) and (2.7) are barely noticeable
in practice, Stirling's approximation yields the lowest mean absolute error of the three.
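The three approximations can be compared numerically as follows (a sketch assuming the reconstructed forms of Eqs. 2.3, 2.6 and 2.7 as written above; the values of n are illustrative only):

# Sketch: compare the approximations of ln(n!) in Eqs. (2.3), (2.6) and (2.7).
import math

def eq_2_3(n):  # midpoint-rule integral
    return (n + 0.5) * math.log(n + 0.5) - n + 0.5 * math.log(2.0)

def eq_2_6(n):  # large-n simplification of Eq. (2.3)
    return (n + 0.5) * math.log(n) - n + 0.5 * (1.0 + math.log(2.0))

def eq_2_7(n):  # Stirling's approximation
    return 0.5 * math.log(2.0 * math.pi * n) + n * (math.log(n) - 1.0)

for n in (2, 10, 100):
    exact = math.lgamma(n + 1)
    errors = [abs(f(n) - exact) for f in (eq_2_3, eq_2_6, eq_2_7)]
    print(n, [round(e, 4) for e in errors])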
3. Approximation of the Combinatorial Number

Now, considering Stirling's approximation, the logarithm of the combinatorial number becomes:

\ln\binom{n}{x} = \ln n! - \ln x! - \ln(n-x)! \approx n\ln n - x\ln x - (n-x)\ln(n-x) + \frac{1}{2}\ln\!\left(\frac{n}{2\pi\,x\,(n-x)}\right)
(3.1)

Thus,

\binom{n}{x} \approx \frac{n^n}{x^x\,(n-x)^{\,n-x}\,\sqrt{2\pi\,\dfrac{x\,(n-x)}{n}}}
(3.2)
Figure 4 illustrates the behavior of approximation (3.2) for the estimation of the combinatorial
number, considering different values of n. As can be seen, the approximation works well for
intermediate values of x, but dramatically fails at the extremes of the interval, that is, when x
approaches 0 or n. In fact, in the limits x = 0 or x = n, the approximated combinatorial number tends
to infinity, as the denominator tends to 0. It can also be observed that the estimated value is always
slightly higher than the exact combinatorial number. Neglecting the extreme values, the mean
relative absolute error (MRE) observed for the estimation of the combinatorial number decays
with the number of trials, approximately following the empirical expression given in Eq. (3.3)
and illustrated in Figure 5.

Figure 4. Comparison between \binom{n}{x} and the continuous approximation shown in Eq. (3.2). Selected values
of the total number of trials (n).

Figure 5. Mean relative absolute error observed for Eq. (3.2) as a function of the number of trials (n), as
described by Eq. (3.3), neglecting the extreme values (x = 0 and x = n).
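The divergence of approximation (3.2) at the extremes can be verified numerically; the following sketch uses n = 20 as an arbitrary illustrative value.

# Sketch: Stirling-based approximation of the combinatorial number (Eq. 3.2)
# compared with the exact value; note the growing error toward x = 0 and x = n.
import math

def comb_eq_3_2(n, x):
    # n^n / (x^x (n-x)^(n-x) sqrt(2 pi x (n-x) / n)), valid for 0 < x < n
    return (n**n / (x**x * (n - x) ** (n - x))
            / math.sqrt(2.0 * math.pi * x * (n - x) / n))

n = 20
for x in (1, 5, 10, 15, 19):
    exact = math.comb(n, x)
    approx = comb_eq_3_2(n, x)
    print(x, exact, round(approx, 2), f"{abs(approx - exact) / exact:.2%}")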
Since the main issue with Eq. (3.2) is that the denominator becomes zero for x = 0 or x = n,
the following empirical correction is proposed in this report:

\binom{n}{x} \approx \frac{n^n}{x^x\,(n-x)^{\,n-x}\,\sqrt{2\pi\,\dfrac{(x+\epsilon)\,(n-x+\epsilon)}{n}}}
(3.4)

where \epsilon is a small positive constant ensuring that the denominator is never zero for 0 ≤ x ≤ n.
Its value can be chosen so that the corrected expression remains consistent with the exact
combinatorial numbers at the extremes of the interval, \binom{n}{0} = \binom{n}{n} = 1:

1 \approx \frac{1}{\sqrt{2\pi\,\dfrac{\epsilon\,(n+\epsilon)}{n}}}
(3.5)

This approximate relation is satisfied with small error using the constant value of \epsilon reported in
Eq. (3.6), as illustrated in Figure 6.
The performance of Eq. (3.4) for different numbers of trials is graphically shown in Figure 7. The
mean relative absolute error obtained with Eq. (3.4), including the extreme values, can be
described by the empirical relation given in Eq. (3.7) and presented in Figure 8. This MRE is about
one order of magnitude lower than the value obtained for Eq. (3.2).

Figure 7. Comparison between \binom{n}{x} and the continuous approximation shown in Eq. (3.4), using the
value of \epsilon from Eq. (3.6). Selected values of the total number of trials (n).

Figure 8. Mean relative absolute error observed for Eq. (3.4) as a function of the number of trials (n), as
described by Eq. (3.7), including the extreme values (x = 0 and x = n).
Using approximation (3.4), the binomial probability distribution function (Eq. 1.7) becomes:

P(x) \approx \frac{n^n\, p_A^x\,(1-p_A)^{n-x}}{x^x\,(n-x)^{\,n-x}\,\sqrt{2\pi\,\dfrac{(x+\epsilon)\,(n-x+\epsilon)}{n}}}
(3.8)

Now, the continuous binomial probability density function can be approximated as follows:

f(x) \approx K_c\,\frac{n^n\, p_A^x\,(1-p_A)^{n-x}}{x^x\,(n-x)^{\,n-x}\,\sqrt{2\pi\,\dfrac{(x+\epsilon)\,(n-x+\epsilon)}{n}}}
(3.9)

In this case, a constant value (K_c) is used to normalize the probability density function, such
that:

K_c\int_{0}^{\,n}\frac{n^n\, p_A^x\,(1-p_A)^{n-x}}{x^x\,(n-x)^{\,n-x}\,\sqrt{2\pi\,\dfrac{(x+\epsilon)\,(n-x+\epsilon)}{n}}}\,dx = 1
(3.10)

While it may seem reasonable to set K_c = 1, it is not necessarily the case, as
was shown in Eq. (1.11).
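The normalization constant of Eq. (3.10) can be obtained by numerical quadrature. The sketch below assumes the epsilon-corrected form reconstructed in Eq. (3.4) and uses an arbitrary placeholder value of epsilon (not the value reported in Eq. 3.6), together with illustrative values of n and p_A.

# Sketch: numerical normalization constant K_c of Eq. (3.10).
# The epsilon-corrected denominator follows the reconstructed Eq. (3.4);
# eps = 0.16 is an arbitrary placeholder, not the value reported in Eq. (3.6).
import math
from scipy.integrate import quad

n, pA, eps = 15, 0.35, 0.16

def unnormalized_density(x):
    # Continuous version of Eq. (3.8); x^x is handled as exp(x ln x), with 0^0 = 1.
    xlnx = x * math.log(x) if x > 0 else 0.0
    ylny = (n - x) * math.log(n - x) if x < n else 0.0
    log_p = (n * math.log(n) - xlnx - ylny
             + x * math.log(pA) + (n - x) * math.log(1 - pA)
             - 0.5 * math.log(2.0 * math.pi * (x + eps) * (n - x + eps) / n))
    return math.exp(log_p)

# Integration limits 0 to n are an assumption of this sketch.
area, _ = quad(unnormalized_density, 0.0, n)
K_c = 1.0 / area
print(K_c)   # close to, but not exactly, 1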
In terms of the probability logarithm, we obtain:

\ln P(x) \approx n\ln n - x\ln x - (n-x)\ln(n-x) + x\ln p_A + (n-x)\ln(1-p_A) - \frac{1}{2}\ln\!\left(2\pi\,\frac{(x+\epsilon)\,(n-x+\epsilon)}{n}\right)
(3.11)

Similarly, the probability density logarithm is:

\ln f(x) \approx \ln K_c + \ln P(x)
(3.12)
4. Taylor Series Approximation

Let us now use a Taylor series approximation of \ln P(x) about the expected value of the
distribution, E(x) = n p_A. Assuming n and p_A to be constants, let us consider the following
function (from Eq. 3.11):

g(x) = \ln P(x) \approx n\ln n - x\ln x - (n-x)\ln(n-x) + x\ln p_A + (n-x)\ln(1-p_A) - \frac{1}{2}\ln\!\left(2\pi\,\frac{(x+\epsilon)\,(n-x+\epsilon)}{n}\right)
(4.1)

Differentiating g(x) successively with respect to x and evaluating the derivatives at x = E(x) = n p_A
(Eqs. 4.2 to 4.12), the first derivative,

\frac{dg}{dx} = \ln\!\left(\frac{(n-x)\,p_A}{x\,(1-p_A)}\right) - \frac{1}{2(x+\epsilon)} + \frac{1}{2(n-x+\epsilon)}

becomes negligible at x = n p_A for a large number of trials, while the second derivative,

\frac{d^2 g}{dx^2} = -\frac{1}{x} - \frac{1}{n-x} + \frac{1}{2(x+\epsilon)^2} + \frac{1}{2(n-x+\epsilon)^2}

approaches -1/(n p_A (1-p_A)) at x = n p_A. The complete Taylor series expansion of g(x) about
x = n p_A is then:

\ln P(x) = \ln P(n p_A) + \sum_{k=1}^{\infty} \frac{1}{k!}\left.\frac{d^k g}{dx^k}\right|_{x = n p_A} (x - n p_A)^k
(4.13)
Truncating after the second-order term we obtain:

\ln P(x) \approx \ln P(n p_A) + \left.\frac{dg}{dx}\right|_{n p_A}(x - n p_A) + \frac{1}{2}\left.\frac{d^2 g}{dx^2}\right|_{n p_A}(x - n p_A)^2
(4.14)

From which:

P(x) \approx P(n p_A)\exp\!\left(\left.\frac{dg}{dx}\right|_{n p_A}(x - n p_A) + \frac{1}{2}\left.\frac{d^2 g}{dx^2}\right|_{n p_A}(x - n p_A)^2\right)
(4.15)

Then, for large values of n, the terms involving \epsilon become negligible, so that the first derivative
vanishes and the second derivative approaches -1/(n p_A (1-p_A)), resulting in:

P(x) \approx P(n p_A)\exp\!\left(-\frac{(x - n p_A)^2}{2\, n\, p_A\, (1-p_A)}\right)
(4.16)
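The key step behind this truncation, namely that the second derivative of \ln P(x) at x = n p_A approaches -1/(n p_A (1-p_A)), can be checked numerically. The sketch below uses the gamma-function form of the probability (Eq. 1.11), so no epsilon correction is needed; n = 200 and p_A = 0.3 are illustrative values.

# Sketch: check the second-order Taylor coefficient of g(x) = ln P(x) at x = n*pA.
# The gamma-function form (Eq. 1.11) of the probability is used here.
import numpy as np
from scipy.special import gammaln

n, pA = 200, 0.3
mu = n * pA

def g(x):
    return (gammaln(n + 1) - gammaln(x + 1) - gammaln(n - x + 1)
            + x * np.log(pA) + (n - x) * np.log(1 - pA))

h = 1e-3
second_derivative = (g(mu + h) - 2 * g(mu) + g(mu - h)) / h**2
# The finite-difference estimate approaches -1/(n*pA*(1-pA)) as n grows.
print(second_derivative, -1.0 / (n * pA * (1 - pA)))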
In the case of the uniform binomial distribution (p_A = 1/2), and large values of n,
P(n/2) \approx \sqrt{2/(\pi n)}, and therefore:

P(x) \approx \sqrt{\frac{2}{\pi n}}\exp\!\left(-\frac{2\left(x - \frac{n}{2}\right)^2}{n}\right)
(4.17)

The approximation shown in Eq. (4.17) corresponds to a truncated normal distribution function
with \mu = n/2 and \sigma = \sqrt{n}/2. Notice that for a uniform binomial distribution we have exactly
(from Eq. 1.8 and 1.9) E(x) = n/2 and \mathrm{Var}(x) = n/4, which are consistent with the
estimations obtained with the second-order approximation for large n. Approximation (4.17) is
compared to the exact binomial probability values (Eq. 1.7) for selected values of n in Figure 9.

Figure 9. Comparison between P(x) and the continuous approximation shown in Eq. (4.17). Selected
values of the total number of trials (n).
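A minimal sketch comparing the exact uniform binomial probabilities with Eq. (4.17), assuming the form reconstructed above (n = 10 is an arbitrary illustrative value):

# Sketch: uniform binomial (pA = 1/2) vs the normal-type approximation of Eq. (4.17).
import math

n = 10
for x in range(n + 1):
    exact = math.comb(n, x) * 0.5**n
    approx = math.sqrt(2.0 / (math.pi * n)) * math.exp(-2.0 * (x - n / 2) ** 2 / n)
    print(x, round(exact, 5), round(approx, 5))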
Even when Eq. (4.17) was obtained assuming large values of n, the estimation of the
probability for a small number of trials is reasonably satisfactory. Figure 10 shows the effect of the
number of trials on the mean absolute error (MAE) in the estimation of the probability function
for the uniform binomial distribution. The MAE steadily decreases as n increases, and its behavior
can be described by the empirical expression given in Eq. (4.18).

Figure 10. Mean absolute error observed for Eq. (4.17) as a function of the number of trials (n), as
described by Eq. (4.18).

In the general case (p_A not necessarily 1/2), evaluating the pre-exponential factor P(n p_A) in
Eq. (4.16) yields:

P(x) \approx \frac{\exp\!\left(-\frac{(x-n p_A)^2}{2\, n\, p_A\, (1-p_A)}\right)}{\sqrt{2\pi\, n\, p_A\, (1-p_A)}}\,\sqrt{\frac{n^2\, p_A\, (1-p_A)}{(n p_A+\epsilon)\,(n(1-p_A)+\epsilon)}}
(4.19)

where the second factor approaches 1 for a large number of trials, so that the probability
approximation simplifies into:

P(x) \approx \frac{\exp\!\left(-\frac{(x-n p_A)^2}{2\, n\, p_A\, (1-p_A)}\right)}{\sqrt{2\pi\, n\, p_A\, (1-p_A)}}
(4.20)

Figure 11. Behavior of P(x) according to the continuous approximation shown in Eq. (4.20). Selected
values of the probability of outcome A (p_A) and total number of trials (n).

Figure 12. Mean absolute error observed for Eq. (4.20) as a function of the number of trials (n), as
approximately described by Eq. (4.21).
Figure 11 shows the behavior of the probability function using approximation (4.20), compared
to the exact probabilities (Eq. 1.7), for different values of p_A and n. The differences are quantified
using the mean absolute error, which can be approximately described by the empirical
expression given in Eq. (4.21). The behavior of the MAE with the number of trials is illustrated in
Figure 12. As expected, the error decreases with the number of trials, and the approximation
might be considered suitable even for relatively small numbers of trials.
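The decay of the error with n can be reproduced with a short sketch that computes the mean absolute error of Eq. (4.20) against the exact probabilities (the values of n and p_A are illustrative only; the empirical fit of Eq. 4.21 is not reproduced here):

# Sketch: mean absolute error of the normal-type approximation (Eq. 4.20)
# with respect to the exact binomial probabilities (Eq. 1.7).
import math

def mae(n, pA):
    s = 0.0
    for x in range(n + 1):
        exact = math.comb(n, x) * pA**x * (1 - pA) ** (n - x)
        var = n * pA * (1 - pA)
        approx = math.exp(-((x - n * pA) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
        s += abs(exact - approx)
    return s / (n + 1)

for n in (5, 10, 50, 200):
    print(n, mae(n, 0.3))   # the error decreases as n grows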
Of course, lower errors are expected for higher order Taylor approximations, but at the
expense of more complicated analytical expressions.
5. Continuous Binomial Distribution

The approximate probability function (4.20) can be transformed into a continuous probability
density function by means of a normalization constant (Eq. 3.12):

f(x) \approx K_c\,\frac{\exp\!\left(-\frac{(x-n\,p_A)^2}{2\,n\,p_A\,(1-p_A)}\right)}{\sqrt{2\pi\, n\,p_A\,(1-p_A)}}
(5.1)

where, in this case,

K_c = \left(\int_{-1/2}^{\,n+1/2}\frac{\exp\!\left(-\frac{(x-n\,p_A)^2}{2\,n\,p_A\,(1-p_A)}\right)}{\sqrt{2\pi\, n\,p_A\,(1-p_A)}}\,dx\right)^{-1} = \frac{2}{\mathrm{erf}\!\left(\frac{n(1-p_A)+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)}
(5.2)

where \mathrm{erf} represents the error function. The error function terms emerge due to the truncation
of the normal distribution between -1/2 and n + 1/2 [11].
Considering that the difference between two consecutive values of the discrete binomial
distribution is 1, each integer value x is the representative value (midpoint class
mark) of the interval between x - 1/2 and x + 1/2 [6]. So, instead of setting the limits as 0
and n, we need to consider the whole intervals at the extremes, that is, -1/2 and n + 1/2.

Thus,

f(x) \approx \frac{2\exp\!\left(-\frac{(x-n\,p_A)^2}{2\,n\,p_A\,(1-p_A)}\right)}{\sqrt{2\pi\, n\,p_A\,(1-p_A)}\left[\mathrm{erf}\!\left(\frac{n(1-p_A)+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)\right]}, \quad -\tfrac{1}{2}\le x\le n+\tfrac{1}{2}
(5.3)
The probability density function can be shifted by half a unit to transform the midpoint class mark
into an upper limit class mark, resulting in:

f(x) \approx \frac{2\exp\!\left(-\frac{\left(x+\frac{1}{2}-n\,p_A\right)^2}{2\,n\,p_A\,(1-p_A)}\right)}{\sqrt{2\pi\, n\,p_A\,(1-p_A)}\left[\mathrm{erf}\!\left(\frac{n(1-p_A)+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)\right]}, \quad -1\le x\le n
(5.4)
On the other hand, the cumulative probability can be approximated as follows:

F(x) \approx \int_{-1}^{x} f(t)\,dt \approx \frac{\mathrm{erf}\!\left(\frac{x+\frac{1}{2}-n\,p_A}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)}{\mathrm{erf}\!\left(\frac{n(1-p_A)+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)}
(5.5)
Figure 13 shows the behavior of approximation (5.5) compared to the exact cumulative
probability of the discrete distribution (Eq. 1.10) for selected values of p_A and n. The mean
absolute error obtained can be approximately described by the empirical expression given in
Eq. (5.6) and presented graphically in Figure 14. Even for a small number of trials, the mean
absolute error observed in the estimation of the cumulative probability is already small.

Figure 13. Behavior of F(x) according to the continuous approximation shown in Eq. (5.5), compared
to the exact results (Eq. 1.10). Selected values of the outcome probability (p_A) and total number of trials (n).

Figure 14. Mean absolute error observed for Eq. (5.5) as a function of the number of trials (n), as
approximately described by Eq. (5.6).
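The cumulative approximation can be evaluated with the error function available in the Python standard library. The sketch below assumes the truncated, continuity-corrected form reconstructed in Eq. (5.5); n = 12 and p_A = 0.4 are illustrative values.

# Sketch: cumulative probability approximation (Eq. 5.5) vs the exact CDF (Eq. 1.10).
import math

def F_approx(x, n, pA):
    s = math.sqrt(2.0 * n * pA * (1 - pA))
    num = math.erf((x + 0.5 - n * pA) / s) + math.erf((n * pA + 0.5) / s)
    den = math.erf((n * (1 - pA) + 0.5) / s) + math.erf((n * pA + 0.5) / s)
    return num / den

def F_exact(k, n, pA):
    return sum(math.comb(n, i) * pA**i * (1 - pA) ** (n - i) for i in range(k + 1))

n, pA = 12, 0.4
for k in (2, 5, 8):
    print(k, round(F_exact(k, n, pA), 4), round(F_approx(k, n, pA), 4))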
The expected value obtained with the approximate probability density shown in Eq. (5.4)
involves additional exponential and error-function terms arising from the truncation of the
distribution (Eq. 5.7), but for a large number of trials it reduces to:

E(x) \approx n\,p_A
(5.8)

Similarly, the variance of the approximate probability density function (Eq. 5.9) reduces, for a
large number of trials, to:

\mathrm{Var}(x) \approx n\,p_A\,(1-p_A)
(5.10)

Eq. (5.8) and (5.10) correspond to the expected value and variance of the binomial distribution
(Eq. 1.8 and 1.9).
Finally, a type I standard continuous binomial random variable can be defined as follows [12]
(assuming a large number of trials):

z = \frac{x - n\,p_A}{\sqrt{n\,p_A\,(1-p_A)}}
(5.11)

with cumulative probability (from Eq. 5.5):

F(z) \approx \frac{\mathrm{erf}\!\left(\frac{z}{\sqrt{2}} + \frac{1}{2\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)}{\mathrm{erf}\!\left(\frac{n(1-p_A)+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)}
(5.12)

Since we are assuming a large number of trials, the standard continuous binomial random
variable might be approximated by the standard normal random variable z, with:

F(z) \approx \frac{1}{2}\left(1 + \mathrm{erf}\!\left(\frac{z}{\sqrt{2}}\right)\right)
(5.13)
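As a final sketch, the standardized variable of Eq. (5.11) can be compared with the standard normal cumulative probability of Eq. (5.13); a continuity correction of one half is applied when evaluating at integer values, and the parameter values are illustrative only.

# Sketch: standardized continuous binomial variable (Eq. 5.11) compared with
# the standard normal CDF (Eq. 5.13). n and pA are illustrative values.
import math

n, pA = 100, 0.25
mu, sigma = n * pA, math.sqrt(n * pA * (1 - pA))

for k in (15, 20, 25, 30, 35):
    z = (k + 0.5 - mu) / sigma                            # continuity-corrected standardization
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))      # Eq. (5.13)
    exact = sum(math.comb(n, i) * pA**i * (1 - pA) ** (n - i) for i in range(k + 1))
    print(k, round(exact, 4), round(phi, 4))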
6. Summary
The probability distribution function of a discrete binomial random variable x, obtained from n
independent Bernoulli trials with outcome probability p_A, is:

P(x) = \binom{n}{x} p_A^x\,(1-p_A)^{n-x}
(1.7)

with the corresponding discrete cumulative probability function:

F(x) = \sum_{k=0}^{\lfloor x\rfloor}\binom{n}{k} p_A^k\,(1-p_A)^{n-k}
(1.10)

where

\binom{n}{x} = \frac{n!}{x!\,(n-x)!}
(1.5)

Assuming that the factorials in the combinatorial number can be approximated by Stirling's
formula:

n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n
(2.7)

Then, we obtain:

\binom{n}{x} \approx \frac{n^n}{x^x\,(n-x)^{\,n-x}\,\sqrt{2\pi\,\dfrac{x\,(n-x)}{n}}}
(3.2)
Unfortunately, this expression diverges for x = 0 and x = n, as the denominator approaches a
value of 0. For this reason, the following approximation is proposed:

\binom{n}{x} \approx \frac{n^n}{x^x\,(n-x)^{\,n-x}\,\sqrt{2\pi\,\dfrac{(x+\epsilon)\,(n-x+\epsilon)}{n}}}
(3.4)

where \epsilon is the small positive constant given in Eq. (3.6).

Thus, the binomial probability distribution becomes (as a logarithm):

\ln P(x) \approx n\ln n - x\ln x - (n-x)\ln(n-x) + x\ln p_A + (n-x)\ln(1-p_A) - \frac{1}{2}\ln\!\left(2\pi\,\frac{(x+\epsilon)\,(n-x+\epsilon)}{n}\right)
(3.11)

which is valid for continuous values of x and n.
The previous expression can be further approximated using a Taylor series expansion at
x = E(x) = n p_A, yielding:

\ln P(x) = \ln P(n p_A) + \sum_{k=1}^{\infty} \frac{1}{k!}\left.\frac{d^k \ln P}{dx^k}\right|_{x = n p_A} (x - n p_A)^k
(4.13)

which can be truncated after the second power (neglecting the first-order term, which vanishes
for a large number of trials), resulting in:

P(x) \approx \frac{\exp\!\left(-\frac{(x-n p_A)^2}{2\, n\, p_A\, (1-p_A)}\right)}{\sqrt{2\pi\, n\, p_A\, (1-p_A)}}\,\sqrt{\frac{n^2\, p_A\, (1-p_A)}{(n p_A+\epsilon)\,(n(1-p_A)+\epsilon)}}
(4.19)

Now, considering a large number of trials, the second factor approaches 1, and the probability
approximation simplifies into:

P(x) \approx \frac{\exp\!\left(-\frac{(x-n p_A)^2}{2\, n\, p_A\, (1-p_A)}\right)}{\sqrt{2\pi\, n\, p_A\, (1-p_A)}}
(4.20)

In the case of the uniform binomial distribution (p_A = 1/2), and large values of n:

P(x) \approx \sqrt{\frac{2}{\pi n}}\exp\!\left(-\frac{2\left(x - \frac{n}{2}\right)^2}{n}\right)
(4.17)
The probability function approximation can also be used to obtain the probability density
function of an equivalent continuous binomial random variable using:

\ln f(x) \approx \ln K_c + \ln P(x)
(3.12)

resulting in:

f(x) \approx \frac{2\exp\!\left(-\frac{\left(x+\frac{1}{2}-n\,p_A\right)^2}{2\,n\,p_A\,(1-p_A)}\right)}{\sqrt{2\pi\, n\,p_A\,(1-p_A)}\left[\mathrm{erf}\!\left(\frac{n(1-p_A)+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)\right]}
(5.4)

with cumulative probability:

F(x) \approx \frac{\mathrm{erf}\!\left(\frac{x+\frac{1}{2}-n\,p_A}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)}{\mathrm{erf}\!\left(\frac{n(1-p_A)+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)+\mathrm{erf}\!\left(\frac{n\,p_A+\frac{1}{2}}{\sqrt{2\,n\,p_A\,(1-p_A)}}\right)}
(5.5)
Finally, for a large number of trials, the standardized continuous binomial variable approximately
follows a standard normal distribution, so that the continuous binomial variable can be expressed
as:

x \approx n\,p_A + \sqrt{n\,p_A\,(1-p_A)}\; z
(6.2)

where z is a standard normal random variable.
This report provides data, information and conclusions obtained by the author(s) as a result of original
scientific research, based on the best scientific knowledge available to the author(s). The main purpose
of this publication is the open sharing of scientific knowledge. Any mistake, omission, error or inaccuracy
published, if any, is completely unintentional.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-
for-profit sectors.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC
4.0). Anyone is free to share (copy and redistribute the material in any medium or format) or adapt
(remix, transform, and build upon the material) this work under the following terms:
Attribution: Appropriate credit must be given, providing a link to the license, and indicating if
changes were made. This can be done in any reasonable manner, but not in any way that
suggests endorsement by the licensor.
NonCommercial: This material may not be used for commercial purposes.
References
[1] Devore, J. L. (2016). Probability and Statistics for Engineering and the Sciences. 9th Edition.
Cengage Learning. Boston, MA. Section 3.4. The Binomial Probability Distribution. pp. 117-125.
https://www.cengage.com/c/probability-and-statistics-for-engineering-and-the-sciences-9e-
devore/9781305251809PF/.
[2] García-García, J. I., Fernández Coronado, N. A., Arredondo, E. H., & Imilpán Rivera, I. A. (2022). The
binomial distribution: Historical origin and evolution of its problem situations. Mathematics, 10
(15), 2680. doi: 10.3390/math10152680.
[3] Coolidge, J. L. (1949). The story of the binomial theorem. The American Mathematical Monthly, 56
(3), 147-157. doi: 10.2307/2305028.