A Distributed and Adaptive Signal Processing Approach To Reducing Energy Consumption in Sensor Networks
Abstract— We propose a novel approach to reducing energy consumption in sensor networks using a distributed adaptive signal processing framework and an efficient algorithm.¹ While the topic of energy-aware routing to alleviate energy consumption in sensor networks has received attention recently [1], [2], in this paper we propose an orthogonal approach to previous methods. Specifically, we propose a distributed way of continuously exploiting existing correlations in sensor data based on adaptive signal processing and distributed source coding principles. Our approach enables sensor nodes to blindly compress their readings with respect to one another without the need for explicit and energy-expensive inter-sensor communication to effect this compression. Furthermore, the distributed algorithm used by each sensor node is extremely low in complexity and easy to implement (i.e., one modulo operation), while an adaptive filtering framework is used at the data gathering unit to continuously learn the relevant correlation structures in the sensor data. Our simulations show the power of our proposed algorithms, revealing their potential to effect significant energy savings (from 10%-65%) for typical sensor data corresponding to a multitude of sensor modalities.

¹ This work was supported in part by DARPA-F30602-00-2-0538, NSF-CCR-0219722 and Intel.

I. INTRODUCTION

Advances in wireless networking and embedded microprocessor designs have enabled the creation of dense low-power sensor networks. These sensor networks consist of nodes endowed with a multitude of sensing modalities such as temperature, pressure, light, magnetometer, infrared, audio, video, etc. The nodes are typically of small physical dimensions and operated by battery power, making energy consumption a major concern. For example, failure of a set of nodes in the sensor network due to energy depletion can lead to a partition of the sensor network and loss of potentially critical information. Motivated by this, there has been considerable recent interest in the area of energy-aware routing for ad hoc and sensor networks [1], [2], [3] and efficient information processing [4], [5] to reduce the energy usage of sensor nodes. For example, one method of conserving energy in a sensor node is to aggregate packets along the sensor paths to reduce header overhead. In this paper, we propose a new method of conserving energy in sensor networks that is orthogonal to the above approaches and can be used in combination with them to increase energy reduction.

Our approach is based on judiciously exploiting existing sensor data correlations in a distributed manner. Correlations in sensor data are brought about by the spatio-temporal characteristics of the physical medium being sensed. Dense sensor networks are particularly rich in correlations, since spatially dense nodes are typically needed to acquire fine spatial resolution in the data being sensed, and for fault tolerance against individual node failures. Examples of correlated sensors include temperature and humidity sensors in a similar geographic region, or magnetometric sensors tracking a moving vehicle. Another interesting example of correlated sensor data involves audio field sensors (microphones) that sense a common event such as a concert or whale cries. Audio data is particularly interesting in that it is rich in spatial correlation structure due to the presence of echoes, causing multiple sensors to pick up attenuated and delayed versions of a common sound origin.

We propose to remove the redundancy caused by these inherent correlations in the sensor data through a distributed compression algorithm which obviates the need for the sensors to exchange their data among each other in order to strip their common redundancy. Rather surprisingly, we will show that compression can be effected in a fully blind manner without the sensor nodes ever knowing what the other correlated sensor nodes have measured. Our proposed paradigm is particularly effective for sensor network architectures having two types of nodes: sensing nodes and data-gathering nodes. The sensing nodes gather data of a specific type and transmit this data upon being queried. The data gathering node queries specific sensors in order to gather information in which it is interested (see Fig. 1).
[Fig. 1. An example sensor network: a computer acts as the data gathering node, and queries various sensors to collect data.]

We will assume this architecture (Fig. 1) for the rest of the paper and show that for such an architecture, we can devise compression algorithms that have very lightweight encoders, yet can achieve significant savings. Note that we target very lightweight encoders in this paper because we assume that the sensors have limited compute power, but the constructions introduced in this paper can be easily strengthened given greater compute power at the sensors. The savings are achieved by having the data gathering node track the correlation structure among nodes and then use this information to effect distributed sensor data compression. The correlation structure is determined by using an adaptive prediction algorithm. The sensors, however, do not need to know the correlation structure; they need to know only the number of bits that they should use for encoding their measurements. As a result, each sensor node is required to perform very few operations in order to encode its data. The decoder, however, is considerably more complex, but it resides on the data gathering node, which is not assumed to be energy constrained. Preliminary results based on our distributed compression and adaptive prediction algorithms perform well in realistic scenarios, achieving 10-65% energy savings for each sensor in typical cases. In addition, our distributed compression architecture can be combined with other energy saving methods such as packet/data aggregation to achieve further gains.

Two of the main challenges in designing a system as described above include (1) devising a computationally inexpensive encoder that can support multiple compression rates and (2) determining an adaptive correlation-tracking algorithm that can continuously track the amount of correlation that exists between the sensor nodes. We will look at these two issues in the following sections. In the next section, we start by devising a computationally inexpensive compression algorithm for the sensor nodes. In Section III, we present the correlation tracking algorithm. In Section IV, we integrate the above components into a complete system. Simulation results are given in Section V and we conclude with some remarks in Section VI.

II. DISTRIBUTED COMPRESSION

The appeal of using distributed compression lies in the fact that each sensor can compress its data without knowing what the other sensors are measuring. In fact, the sensors do not even need to know the correlation structure between their data and that of the other sensors. As a result, an end-to-end compression system that achieves significant savings across the network can be built, where the endpoints consist of the sensor node and the data gathering node.

To build a distributed compression system, we propose to use an asymmetric coding method among the sensors. Specifically, we propose to build upon the architecture of Fig. 2, which is designed for two nodes. In Fig. 2, there are two nodes, each of which measures data using an Analog-to-Digital (A/D) converter. One of the sensor nodes will either transmit its data Y directly to the data gathering node or compress its readings with respect to its own previous readings, while the other sensor node compresses its data X with respect to its own previous readings and readings from other sensors and then transmits the compressed data m to the data gathering node. The decoder will then try to decode m to X, given that Y is correlated to X. In the discrete alphabet case, it can be shown that the compression performance of the above architecture can match the case where Y is available to the sensor node that is measuring X.

[Fig. 2. Distributed compression: the encoder compresses X given that the decoder has access to Y, which is correlated to X.]

To extend the above architecture (Fig. 2) to n nodes, we will have one node send its data either uncoded (i.e., Y) or compressed with respect to its past. The data gathering node can decode this reading without receiving anything from the other sensors. The other sensors can compress their data with respect to Y, without even knowing their correlation structure with respect to Y. The data gathering node will keep track of the correlation structure and inform the sensors of the number of bits that they shall use for encoding. In the compression literature, Y is often referred to as side-information and the above architectures are often referred to as compression with side information [6].

To develop code constructions for distributed compression, we will start by giving some background information on source coding with side information and then introduce a code construction that achieves good performance at a low encoding cost.

A. Background on compression with side information

In 1973, Slepian and Wolf presented a surprising result to the source coding (compression) community [6]. The result states that if two discrete alphabet random variables X and Y are correlated according to some arbitrary probability distribution p(x, y), then X can be compressed without access to Y without losing any compression performance with respect to the case where X is compressed with access to Y. More formally, without having access to Y, X can be compressed using H(X|Y) bits, where

H(X|Y) = -\sum_{y} P_Y(y) \sum_{x} P_{X|Y}(x|y) \log_2 P_{X|Y}(x|y)    (1)

The quantity H(X|Y) is often interpreted as the "uncertainty" remaining in the random variable X given the observation of Y [7]. This is the same compression performance that would be achieved if X were compressed while having access to Y.
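As a quick numerical illustration of (1), the short Python sketch below computes H(X|Y) for a small joint distribution p(x, y); the particular distribution is a hypothetical one chosen only for this example.

```python
import math

# Hypothetical joint distribution p(x, y) over binary X and Y,
# chosen only to illustrate equation (1); not data from the paper.
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginal P_Y(y)
p_y = {}
for (x, y), p in p_xy.items():
    p_y[y] = p_y.get(y, 0.0) + p

# H(X|Y) = -sum_y P_Y(y) sum_x P_{X|Y}(x|y) log2 P_{X|Y}(x|y)
h_x_given_y = 0.0
for y, py in p_y.items():
    for x in (0, 1):
        p_x_given_y = p_xy[(x, y)] / py
        if p_x_given_y > 0:
            h_x_given_y -= py * p_x_given_y * math.log2(p_x_given_y)

print(h_x_given_y)  # about 0.722 bits, versus H(X) = 1 bit without side information
```

In this toy case, knowing Y at the decoder reduces the number of bits needed to describe X from 1 to roughly 0.72, even though the encoder itself never sees Y.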
To provide the intuition behind this result, we provide the following example.

Example 1: Consider X and Y to be equiprobable 3-bit data sets which are correlated in the following way: d_H(X, Y) ≤ 1,
where d_H(·,·) denotes Hamming distance. When Y is known both at the encoder and decoder, we can compress X to 2 bits, conveying the information about the uncertainty of X given Y (i.e., the modulo-two sum of X and Y, which takes one of the values (000), (001), (010) and (100)). Now, if Y is known only at the decoder, we can surprisingly still compress X to 2 bits. The method of construction stems from the following argument: if the decoder knows that X=000 or X=111, then it is wasteful to spend any bits to differentiate between the two. In fact, we can group X=000 and X=111 into one coset (it is exactly the so-called principal coset of the length-3 repetition code). In a similar fashion, we can partition the remaining space of 3-bit binary codewords into 3 different cosets, with each coset containing the original codewords offset by a unique and correctable error pattern. Since there are 4 cosets, we need to spend only 2 bits to specify the coset in which X belongs. The four cosets are given as

coset-1 = (000, 111), coset-2 = (001, 110),
coset-3 = (010, 101), coset-4 = (011, 100)

The decoder can recover X perfectly by decoding Y to the closest (in Hamming distance) codeword in the coset specified by the encoder. Thus the encoder does not need to know the realization of Y for optimal encoding.
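A minimal Python sketch of Example 1's binning is given below. It assumes the coset of a 3-bit word is identified by a 2-bit syndrome of the length-3 repetition code; the particular parity checks used are one valid choice made for illustration, not prescribed by the text. The encoder sends only the coset index, and the decoder picks the member of that coset closest in Hamming distance to Y.

```python
from itertools import product

def hamming(a, b):
    return bin(a ^ b).count("1")

def coset_index(x):
    # 2-bit syndrome for the length-3 repetition code {000, 111}:
    # parity checks x0^x1 and x0^x2 (one possible choice of checks).
    b0, b1, b2 = (x >> 0) & 1, (x >> 1) & 1, (x >> 2) & 1
    return (b0 ^ b1) | ((b0 ^ b2) << 1)

def decode(index, y):
    # Closest (in Hamming distance) 3-bit word to Y inside the signalled coset.
    coset = [x for x in range(8) if coset_index(x) == index]
    return min(coset, key=lambda c: hamming(c, y))

# Exhaustive check: whenever d_H(X, Y) <= 1, the 2-bit coset index recovers X exactly.
assert all(decode(coset_index(x), y) == x
           for x, y in product(range(8), repeat=2) if hamming(x, y) <= 1)
```

Sending the 2-bit coset index instead of the 3-bit word matches the H(X|Y) = 2 bits that (1) predicts for this correlation model.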
The above results were established only for discrete random variables. In 1976, Wyner and Ziv extended the results of [6] to lossy distributed compression by proving that, under certain conditions [8], there are no performance degradations for lossy compression with side information available at the decoder as compared to lossy compression with side information available at both the encoder and decoder.

The results established by [6] and [8] are theoretical results, however, and as a result do not provide intuition as to how one might achieve the predicted theoretical bounds practically. In 1999, Pradhan and Ramchandran [9] prescribed practical constructions for distributed compression in an attempt to achieve the bounds predicted by [6] and [8]. The resulting codes perform well, but cannot be directly used for sensor networks because they are not designed to support different compression rates. To achieve distributed compression in a sensor network, it is desirable to have one underlying codebook that is not changed among the sensors but can also support multiple compression rates. The reason for needing a codebook that supports multiple compression rates is that the compression rate is directly dependent on the amount of correlation in the data, which might be time-varying. Motivated by the above, we have devised a tree-based distributed compression code that can provide variable-rate compression without the need for changing the underlying codebook.

[Fig. 3. A tree-based construction for compression with side information. The root of the tree contains 2^4 values, and two partitions of the root quantizer are shown.]

B. Code construction

In this section we propose a codebook construction that will allow an encoder to encode a random variable X given that the decoder has access to a correlated random variable Y. This construction can then be applied to a sensor network as shown in Fig. 2. The main design goal of our code construction is to support multiple compression rates, in addition to being computationally inexpensive. In support of our goal of minimizing the computations for each sensor node, we will not be looking into code constructions that use complicated error correction codes. Error correction codes can, however, be easily incorporated into our construction, but this will lead to more complexity for each sensor node. Our uncoded code construction is as follows. We start with a root codebook that contains 2^n representative values on the real axis. We then partition the root codebook into two subsets consisting of the even-indexed representations and the odd-indexed representations. We represent these two sub-codebooks as children nodes of the root codebook. We further partition each of these nodes into sub-codebooks and represent them as children nodes in the second level of the tree structure. This process is repeated n times, resulting in an n-level tree structure that contains 2^n leaf nodes, each of which represents a subcodebook that contains one of the original 2^n values. An example partition is given in Fig. 3, where we use n = 4 and show only 2 levels of the partition. Note from this tree-based codebook construction that if the spacing between representative values is denoted by ∆, then each of the subcodebooks at level i in the tree will contain representative values that are spaced apart by 2^i ∆. In a sensor network, a reading will typically be represented as one of the 2^n values in the root codebook, assuming that the sensor uses an n-bit A/D converter. Instead of transmitting n bits to represent the sensor reading, as would traditionally be done, we can transmit i < n bits if there is side-information available at the decoder.
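The recursive even/odd partition described above can be sketched in a few lines of Python. It assumes a root codebook of 2^n uniformly spaced values (the spacing ∆ = 0.1 below is an illustrative value) and only verifies the stated property that a level-i subcodebook keeps every 2^i-th value, i.e., has spacing 2^i ∆.

```python
def root_codebook(n, delta):
    # 2^n representative values spaced delta apart (uniform spacing assumed).
    return [k * delta for k in range(2 ** n)]

def subcodebook(codebook, bits):
    # Descend the tree by repeatedly splitting into even-/odd-indexed halves,
    # consuming the given bits from the least significant one upward.
    for b in bits:
        codebook = codebook[b::2]
    return codebook

n, delta = 4, 0.1
root = root_codebook(n, delta)
level2 = subcodebook(root, [1, 0])  # LSB-first path for the 2-bit message 01 -> indices 1, 5, 9, 13
print(level2)                       # [0.1, 0.5, 0.9, 1.3]: spacing 4*delta = 2^2 * delta
```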
1) Encoder: The encoder will receive a request from the data gathering node requesting that it encode its readings using i bits. The first thing that the encoder does is find the closest representation of the data from the 2^n values in the root codebook (this is typically done by the A/D converter). Next, the encoder determines the subcodebook that X belongs to at level i. The path through the tree to this subcodebook will specify the bits that are transferred to the data gathering node. The mapping from X to the bits that specify the subcodebook at level i can be done through the following deterministic mapping:

f(X) = index(X) mod 2^i    (2)

where f(X) represents the bits to be transmitted to the decoder and index(·) is a mapping from values in the root codebook to their respective indices. For a given X and i, f(X) will be an i-bit value which the data gathering node will use to traverse the tree.

2) Decoder: The decoder (at the data gathering node) will receive the i-bit value, f(X), from the encoder and will traverse the tree starting with the least-significant bit (LSB) of f(X) to determine the appropriate subcodebook, S, to use. The decoder will then decode the side-information, Y, to the closest value in S:

\hat{X} = \arg\min_{r_i \in S} ||Y - r_i||    (3)

where r_i represents the i-th codeword in S. Assuming that Y is less than 2^{i-1}∆ away from X, where ∆ is the spacing in the root codebook, the decoder will be able to decode Y to the exact value of X, and recover X perfectly. The following example will elucidate the encoding/decoding operations.

[Fig. 4. An example of the tree-based codebook. The encoder is asked to encode X using 2 bits, so it transmits 01 to the decoder. The decoder uses the bits of 01 in ascending order from the LSB to determine the path to the subcodebook with which to decode Y.]

Example 2: Consider the 4-level tree codebook of Fig. 4. Assume that the data is represented by the value r9 = 0.9 in the root codebook and the data gathering node asks the sensor to encode X using 2 bits. The index of r9 is 9, so f(X) = 9 mod 4 = 1. Thus, the encoder will send the two bits, 01, to the data gathering node (see Fig. 4). The data gathering node will receive 01 and descend the tree using the least-significant bit first (i.e., 1 and then 0) to determine the subcodebook with which to decode the side-information. In this example, we assume that the side-information, Y, is 0.8, and we decode Y in the subcodebook located at the path 1, 0 in the tree to find the closest codeword. This codeword is r9, which is exactly the value representing X. Thus, we have used 2 bits to convey the value of X instead of the 4 bits that would have been needed if we had not done any encoding.
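The encoding and decoding steps of Example 2 fit in a few lines of Python. This is only a sketch: it assumes a root codebook r_k = k·∆ with ∆ = 0.1 and n = 4, matching Fig. 4, and it identifies a level-i subcodebook by the index residue of (2) rather than walking an explicit tree.

```python
DELTA, N = 0.1, 4
ROOT = [k * DELTA for k in range(2 ** N)]   # r0 ... r15, assumed uniformly spaced

def encode(x, i):
    # Equation (2): send only the i LSBs of the index of the closest root value.
    index = min(range(len(ROOT)), key=lambda k: abs(ROOT[k] - x))
    return index % (2 ** i)

def decode(message, i, y):
    # Equation (3): pick the codeword closest to the side information Y
    # within the subcodebook whose indices are congruent to the message mod 2^i.
    sub = [r for k, r in enumerate(ROOT) if k % (2 ** i) == message]
    return min(sub, key=lambda r: abs(r - y))

m = encode(0.9, 2)          # index(r9) = 9, so m = 9 mod 4 = 1, i.e. the bits 01
print(m, decode(m, 2, 0.8)) # 1 0.9  -> X is recovered exactly from 2 bits plus Y
```

The recovery is exact here because |Y − X| = 0.1 is smaller than 2^{i-1}∆ = 0.2, the condition stated above.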
III. CORRELATION TRACKING

In the above encoding/decoding operations we assume that the decoder for sensor j has available to it at time k some side-information Y_k^{(j)} that is correlated to the sensor reading, X_k^{(j)}. In practice, we choose to use a linear predictive model where Y_k^{(j)} is a linear combination of values that are available at the decoder:

Y_k^{(j)} = \sum_{l=1}^{M} \alpha_l X_{k-l}^{(j)} + \sum_{i=1}^{j-1} \beta_i X_k^{(i)}    (4)

where X_{k-l}^{(j)} represents past readings for sensor j and X_k^{(i)} represents present sensor readings from sensor i.² The variables α_l and β_i are weighting coefficients. We can then think of Y_k^{(j)} as a linear prediction of X_k^{(j)} based on past values (i.e., X_{k-l}^{(j)}; l = 1, ..., M) and other sensor readings that have already been decoded at the data gathering node (i.e., X_k^{(i)}; i = 1, ..., j-1, where i indexes the sensor and j-1 represents the number of readings from other sensors that have already been decoded). We choose to use a linear predictive model because it is not only analytically tractable but also optimal in the limiting case where the readings can be modeled as i.i.d. Gaussian random variables.

In order to leverage the inter-node correlations, we require that one of the sensors always sends its data either uncoded or compressed with respect to its own past data. Furthermore, we number the sensors in the order that they are queried. For example, at each time instant, one of the sensors will send its reading, X_k^{(1)}, either uncoded or coded with respect to its own past. The reading for sensor 2 can then be decoded with respect to

Y_k^{(2)} = \sum_{l=1}^{M} \alpha_l X_{k-l}^{(2)} + \beta_1 X_k^{(1)}    (5)

² Note that for simplicity, our above prediction model is based on a finite number of past values and a single present value for each of the other sensor readings that have been decoded. This model can be generalized to the case where past values of other sensors are also included in the prediction.
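As a rough illustration of how the data gathering node can form this side information and keep the coefficients current, the sketch below predicts X_k^{(j)} from the regressor of (4) and refines the weights with the LMS update that is derived in the remainder of this section. The step size mu, the regressor layout and the numeric readings are assumptions made for illustration, not values from the paper.

```python
import numpy as np

def predict_and_update(gamma, past_j, current_others, x_kj, mu=0.01):
    """One round of side-information generation at the data gathering node.

    gamma          : current coefficient vector (alphas then betas), shape (M + j - 1,)
    past_j         : last M decoded readings of sensor j, newest first
    current_others : already-decoded readings of sensors 1 .. j-1 at time k
    x_kj           : the decoded reading X_k^(j), available after decoding
    """
    z = np.concatenate([past_j, current_others])   # regressor Z_{k,j}
    y_kj = float(gamma @ z)                        # prediction, equation (4)
    n_kj = x_kj - y_kj                             # correlation noise N_{k,j}
    gamma = gamma + mu * n_kj * z                  # LMS correction (derived below)
    return y_kj, gamma

# Hypothetical usage for sensor j = 2 with M = 4 past readings:
gamma = np.zeros(5)
y, gamma = predict_and_update(gamma, past_j=np.array([21.2, 21.1, 21.0, 20.9]),
                              current_others=np.array([20.5]), x_kj=21.3)
```

In the full system the prediction y_kj is used first to decode the i-bit message from sensor j (as in Section II-B), and only then does the decoded reading drive the coefficient update.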
and R_{x_j x_i} and R_{x_i x_i} are given as:

R_{x_j x_i} = \begin{bmatrix} r_{x_j x_1}(1) & r_{x_j x_2}(1) & \cdots & r_{x_j x_{j-1}}(1) \\ r_{x_j x_1}(2) & r_{x_j x_2}(2) & \cdots & r_{x_j x_{j-1}}(2) \\ \vdots & \vdots & \ddots & \vdots \\ r_{x_j x_1}(M) & r_{x_j x_2}(M) & \cdots & r_{x_j x_{j-1}}(M) \end{bmatrix}

A gradient-descent update for the coefficient vector Γ_j is then

\Gamma_j^{(k+1)} = \Gamma_j^{(k)} - \tfrac{1}{2}\mu\left(-2P_j + 2R_{zz}^{j}\Gamma_j^{(k)}\right)    (10)

In practice, however, the data gathering node will not have knowledge of P_j and R_{zz}^{j} and will therefore need an efficient method for estimating P_j and R_{zz}^{j}. One standard estimate is to replace them by their instantaneous values, so that (10) becomes

\Gamma_j^{(k+1)} = \Gamma_j^{(k)} - \mu Z_{k,j}\left(-X_k^{(j)} + Z_{k,j}^{T}\Gamma_j^{(k)}\right) = \Gamma_j^{(k)} + \mu Z_{k,j} N_{k,j}    (11)

where the second equality follows from the fact that Y_k^{(j)} = Z_{k,j}^{T}\Gamma_j^{(k)} and N_{k,j} = X_k^{(j)} - Y_k^{(j)}. The equation described by (11) is well known in the adaptive filtering literature as the Least-Mean-Squares (LMS) algorithm, and the steps in calculating the LMS solution are summarized below:

1. Y_k^{(j)} = \Gamma_j^{(k)T} Z_{k,j}
2. N_{k,j} = X_k^{(j)} - Y_k^{(j)}
3. \Gamma_j^{(k+1)} = \Gamma_j^{(k)} + \mu Z_{k,j} N_{k,j}

To use the LMS algorithm, the data gathering node will start by querying all of the sensors for uncoded data for the first K rounds of requests. The value of K should be chosen to be large enough to allow the LMS algorithm to converge. After K rounds of requests have been completed, the data gathering node can then ask for coded values from the sensor nodes and decode the coded value for sensor j with respect to its corresponding side information, Y_k^{(j)} = \Gamma_j^{T} Z_{k,j}. The value of Γ_j will continue to be updated to adjust to changes in the statistics of the data. More specifically, for each round of requests and each value reported by a sensor, the decoder will decode Y_k^{(j)} to the closest codeword in the subcodebook, S, specified by the corresponding sensor:

\hat{X}_k^{(j)} = \arg\min_{r_i \in S} ||Y_k^{(j)} - r_i||    (12)

From Section II-B, we know that \hat{X}_k^{(j)} will always equal X_k^{(j)} as long as the sensor node encodes X_k^{(j)} using i bits so that 2^{i-1}∆ > |N_{k,j}|. If |N_{k,j}| > 2^{i-1}∆, however, then a decoding error will occur. We can use Chebyshev's inequality [11] to bound this probability of error:

P\left[|N_{k,j}| > 2^{i-1}\Delta\right] \le \frac{\sigma_{N_j}^2}{(2^{i-1}\Delta)^2}    (13)

where N_{k,j} is drawn from a distribution with zero mean and variance σ²_{N_j}. Thus, to ensure that P[|N_{k,j}| > 2^{i-1}∆] is less than some probability of error, P_e, we can choose σ²_{N_j}/(2^{i-1}∆)² = P_e. The value of i that will ensure this probability of error is then given as

i = \frac{1}{2}\log_2\left(\frac{\sigma_{N_j}^2}{\Delta^2 P_e}\right) + 1    (14)

The variance σ²_{N_j} can be estimated as

\sigma_{N_j}^2 = \frac{1}{K-1}\sum_{k=1}^{K} N_{k,j}^2    (15)

during the first K rounds of requests. To update σ²_{N_j}, the data gathering node can form the following filtered estimate:

\sigma_{N_j,\mathrm{new}}^2 = (1-\gamma)\,\sigma_{N_j,\mathrm{old}}^2 + \gamma N_{k,j}^2    (16)

where σ²_{N_j,old} is the previous estimate of σ²_{N_j} and γ is a "forgetting factor" [10]. We choose to use a filtered estimate to adapt to changes in statistics.

B. Decoding error

As mentioned above, it is always possible for the data gathering node to make a decoding error if the magnitude of the correlation noise, |N_{k,j}|, is larger than 2^{i-1}∆, where i is the number of bits used to encode the sensor reading for sensor j at time k. We propose two approaches for dealing with such errors. One method is to use error detection codes and the other method entails using error correction codes.

To use error detection, each sensor node can transmit a cyclic redundancy check (CRC) [12] for every m readings that it transmits. The data gathering node will decode the m readings using the tree-structured codebook as above and compare its own calculation of the CRC (based on the m readings it decodes) to the CRC transmitted by the sensor. If an error is detected (i.e., the CRC does not match), then the data gathering node can either drop the m readings or ask for a retransmission of the m readings. Whether the data gathering node drops the m readings or asks for a retransmission is application dependent, and we do not address this issue in this paper. Furthermore, by using Chebyshev's inequality (13), the data gathering node can make the probability of decoding error as small as it desires, which translates directly into a lower probability of data drops or retransmissions.

The other method of guarding against decoding error is to use error-correction codes. We propose using a non-binary error correction code such as an (M, K) Reed-Solomon code [13] that can operate on K sensor readings and generate M − K parity check symbols. These M − K parity check symbols can be transmitted to the data gathering node along with the K encoded sensor readings. The data gathering node will decode the K sensor readings using the tree-based structure mentioned above and, upon receiving the M − K parity check symbols, it can correct any errors that occurred in the K sensor readings. If more than (M − K)/2 errors exist in the K sensor readings, then the Reed-Solomon decoder will not be able to correct them.
[Fig. 5. Tolerable noise vs. prediction noise for 18,000 samples of humidity. The tolerable noise is the amount of noise that can exist between the prediction of a sensor reading and the actual sensor reading without inducing a decoding error.]

by sensor j as:

Y_k^{(j)} = \sum_{l=1}^{4} \alpha_l X_{k-l}^{(j)} + X_k^{(m)}    (17)

where m ≠ j. In other words, the prediction of the reading for sensor j is derived from its own past values and one other sensor. To test the correlation tracking algorithm, we measured the tolerable noise that the correlation tracking algorithm calculates at each time instant. The tolerable noise is the amount of noise that can exist between the prediction of a sensor reading and the actual sensor reading without inducing a decoding error. Tolerable noise is calculated by using (14), and noting that the tolerable noise will be given as 2^{i-1}∆, where i is the number of bits that are requested from the sensor and ∆ is the spacing of values in the A/D converter. We set the bound on the probability of decoding error to be less than 1 in 100 and simulated the data gathering algorithm and the sensor node algorithms over 18,000 samples of light, temperature and humidity for each sensor (a total of 90,000 samples).
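The per-reading bit request and the resulting tolerable noise follow directly from (14) and (16). A minimal sketch of that calculation is shown below; the numeric values (∆, P_e, γ and the initial variance) are illustrative assumptions only, not the parameters used in our simulations.

```python
import math

def bits_to_request(var_n, delta, p_e, n_bits=10):
    # Equation (14): smallest i with P[|N| > 2^(i-1) * delta] <= p_e (Chebyshev bound),
    # clipped to the range of the n-bit A/D converter.
    i = 0.5 * math.log2(var_n / (delta ** 2 * p_e)) + 1
    return min(max(1, math.ceil(i)), n_bits)

def update_variance(var_old, n_kj, gamma=0.05):
    # Equation (16): filtered variance estimate with forgetting factor gamma.
    return (1 - gamma) * var_old + gamma * n_kj ** 2

delta, p_e, var_n = 0.1, 0.01, 0.004          # assumed A/D spacing, error bound, noise variance
i = bits_to_request(var_n, delta, p_e)        # -> 4 bits requested from the sensor
tolerable = 2 ** (i - 1) * delta              # -> 0.8, the "tolerable noise" plotted in Fig. 5
var_n = update_variance(var_n, n_kj=0.03)     # refresh the estimate after each decoded reading
```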
A plot of the tolerable noise vs. the actual prediction noise is given in Fig. 5, where the top graph represents the tolerable noise and the bottom graph represents the actual prediction noise. From the plot it can be seen that the tolerable noise is much larger than the actual prediction noise. The reason for this is that we were conservative in choosing the parameters for estimating the number of bits to request from the sensors. The tolerable noise can be lowered to achieve higher efficiency, but this also leads to a higher probability of decoding error. For the simulations that we ran, zero decoding errors were made for 90,000 samples of humidity, temperature and light.

One other thing to note from the plot is that there are many spikes in the tolerable noise. These spikes occur be-

B. Energy savings

The next set of simulations were run to measure the amount of energy savings that the sensor nodes achieved. The energy savings were calculated as the total reduction in energy that resulted from transmission and reception. Note that for reception, energy expenditure is actually not reduced but increased, because the sensor nodes need to receive the extra bits that specify the number of bits to encode each sensor reading with. For an n-bit A/D converter, an extra log(n) bits need to be received each time the data gathering node informs a sensor of the number of bits needed for encoding. We assume that the energy used to transmit a bit is equivalent to the energy used to receive a bit. To reduce the extra energy needed for reception, we simulated the data gathering node to only specify the number of encoding bits periodically. In our simulations, we chose this period to be 100 samples for each sensor node. The 5 sensor nodes were alternately queried to send back readings that were compressed only with respect to their own past readings, so that compressed readings from other sensors could be decoded with respect to these readings. The overall average savings in energy is given in Table I.

TABLE I
AVERAGE ENERGY SAVINGS OVER AN UNCODED SYSTEM FOR SENSOR NODES MEASURING TEMPERATURE, HUMIDITY AND LIGHT

Data Set            | Temperature | Humidity | Light
Ave Energy Savings  | 66.6%       | 44.9%    | 11.7%

To assess the performance of our algorithm, we choose to use the work of [14] as a benchmark for comparison. The work of [14] is also based on a distributed coding framework, but the prediction algorithm uses a filtered estimate for the prediction coefficients instead of a gradient descent algorithm such as LMS to determine the prediction coefficients. Furthermore, in [14] the prediction algorithm only uses one measurement from a neighboring sensor to form the prediction estimate. Thus, in order to perform a fair comparison, we changed the model of (17) to only use one measurement from another sensor to form the prediction estimate and, surprisingly, were able to achieve roughly the same performance as given in Table I. The results for humidity are approximately 24% better than the results cited in [14] for the same data set. Similarly, the results for temperature and light are approximately 16% and 3% better, respectively, than the results cited in [14] for the respective data sets. Thus, it is clear that the LMS algorithm is better suited for tracking correlations than the methods given in [14].

One can achieve even larger energy savings than the savings cited above by using a less conservative estimate of the bits