Information Extraction From Sensor Networks Using The Watershed Transform Algorithm
Mohammad Hammoudeh (a), Robert Newman (b)
(a) Manchester Metropolitan University, Chester Street, Manchester M1 5GD, UK
(b) University of Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, UK
Article info
Article history:
Available online xxxx
Keywords:
Information extraction
Watershed segmentation
Query scoping
Macroprogramming
Wireless sensor networks
Abstract
Wireless sensor networks are an effective tool to provide fine resolution monitoring of the physical environment. Sensors generate continuous streams of data, which leads to several computational challenges. As sensor nodes become increasingly active devices, with more processing and communication resources, various methods of distributed data processing and sharing become feasible. The challenge is to extract information from the gathered sensory data with a specified level of accuracy in a timely and power-efficient manner. This paper presents a new solution to distributed information extraction that makes use of the morphological Watershed algorithm. The Watershed algorithm dynamically groups sensor nodes into homogeneous network segments with respect to their topological relationships and their sensing-states. This setting allows network programmers to manipulate groups of spatially distributed data streams instead of individual nodes. This is achieved by using network segments as programming abstractions on which various query processes can be executed. To this end, we present a reformulation of the global Watershed algorithm. The modified Watershed algorithm is fully asynchronous: sensor nodes can autonomously process their local data in parallel and in collaboration with neighbouring nodes. Experimental evaluation shows that the presented solution is able to considerably reduce query resolution cost without sacrificing the quality of the returned results. When compared to similar purpose schemes, such as Logical Neighborhood, the proposed approach reduces the total query resolution overhead by up to 57.5%, reduces the number of nodes involved in query resolution by up to 59%, and reduces the setup convergence time by up to 65.1%.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
Wireless Sensor Networks (WSNs) are enabling applications that previously were not practical. They are currently being applied to a variety of domains ranging from habitat monitoring to space exploration, and from scientific to military. Such applications share several aspects: (1) the demand for information; (2) the response to this demand generally exists in multiple unstructured, potentially unbounded sequences of data points; (3) the generation of a great volume of data that is imperfect in nature and characterised by significant redundancy. The characteristics of the sensed data, coupled with the resource constraints on sensor nodes, necessitate the development of resource-efficient WSN applications. In-network information extraction is one method to minimise resource utilisation while achieving application objectives, because local processing of sensed data is more energy- and bandwidth-efficient than transferring raw data to a central location for processing.
Information extraction is the sub-discipline of artificial intelligence [1] that selectively structures, identifies, filters, classifies, and merges multi-modal data produced by multiple sensor nodes to discover recurring patterns that form coherent and meaningful information. Initially, information extraction converts possibly unbounded sequences of raw data into a more uniform and convenient structural format, preparing it for further processing and analysis. It exploits domain-specific knowledge and data structural properties to harvest information by finding and combining relevant data while excluding irrelevant and erroneous data. This definition will be used throughout this work to describe the term information extraction.
This work addresses distributed query-based information extraction systems that use in-network computation to provide cost-effective query responses. Nevertheless, the proposed solution is general enough to be applicable to other information extraction models, e.g., periodic or threshold-based. The query-based model of information extraction is widely used in data-intensive applications, where data is stored at multiple locations. Such systems are user-controlled: users specify and inject a request for the information they require through simple queries; then the
1566-2535/$ - see front matter 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.inffus.2013.07.001
$$\ldots\ \frac{f(p) - f(p')}{\mathrm{dist}(p, p')} \;\Big|\; f(p') \le f(p) \Big\}$$
Definition 3 (Steepest Descending Path). ∀p ∈ X, the Steepest Descending Path (SDP) is the set of points p' ∈ N(p) defined as follows:

$$\Big\{ p' \in N(p) \;\Big|\; \frac{f(p) - f(p')}{\mathrm{dist}(p, p')} = LS(p),\; f(p') < f(p) \Big\}$$
i.e., the SDP is a chain of linked points in which every point has a grey-value strictly less than that of its predecessor. Each point may have more than one neighbour with a grey-value less than its own value; therefore, there may be many descending paths from a given point. The SDP is the path in which every point is linked to the neighbour with the lowest grey-value. In keeping with the rain analogy, a droplet that falls on a point of the surface flows to the regional minimum along the path of steepest descent. Fig. 2 shows the SDP from the point (0, 0) to the regional minimum situated at (4, 4).
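To make the lower slope and Definition 3 concrete, the following minimal Python sketch computes LS(p) and follows the SDP on a small grey-value grid stored as a dictionary from (x, y) to f(p). It assumes an 8-connected pixel neighbourhood, Euclidean dist, LS(p) = 0 when p has no neighbour at or below its own value, and a coordinate-based tie-break when several neighbours share the steepest slope. The function names and the bowl-shaped test surface are illustrative, not taken from the paper.

```python
import math

def neighbours(p, grid):
    """8-connected neighbours of p that exist in the grid (assumed connectivity)."""
    x, y = p
    return [(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and (x + dx, y + dy) in grid]

def lower_slope(p, grid):
    """LS(p): largest slope from p to a neighbour at or below f(p); 0 if none."""
    return max(((grid[p] - grid[q]) / math.dist(p, q)
                for q in neighbours(p, grid) if grid[q] <= grid[p]),
               default=0.0)

def sdp_successors(p, grid):
    """Definition 3: strictly lower neighbours whose slope equals LS(p)."""
    ls = lower_slope(p, grid)
    return [q for q in neighbours(p, grid)
            if grid[q] < grid[p]
            and math.isclose((grid[p] - grid[q]) / math.dist(p, q), ls)]

def steepest_descending_path(p, grid):
    """Follow SDP successors from p until no strictly lower neighbour is left."""
    path = [p]
    while True:
        successors = sdp_successors(path[-1], grid)
        if not successors:
            return path
        path.append(min(successors))     # arbitrary tie-break on coordinates
```

For the bowl f(x, y) = (x − 2)² + (y − 2)² on a 5 × 5 grid, the path from (0, 0) runs diagonally through (1, 1) to the regional minimum at (2, 2), because the diagonal step has the largest drop per unit distance.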
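Definition 4 (Cost function based on lower slope). The slope of the surface has a strong influence on path selection; a flatter surface allows for faster, easier, and more direct walking. The slope-based cost function, $\mathrm{cost}(p_{i-1}, p_i)$, for travelling from point $p_{i-1}$ to $p_i \in N(p_{i-1})$ is:

$$\mathrm{cost}(p_{i-1}, p_i) = \begin{cases} LS(p_{i-1})\,\mathrm{dist}(p_{i-1}, p_i) & \text{if } f(p_{i-1}) > f(p_i) \\ LS(p_i)\,\mathrm{dist}(p_{i-1}, p_i) & \text{if } f(p_{i-1}) < f(p_i) \\ \tfrac{1}{2}\big(LS(p_{i-1}) + LS(p_i)\big)\,\mathrm{dist}(p_{i-1}, p_i) & \text{if } f(p_{i-1}) = f(p_i) \end{cases}$$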
Definition 5 (Topographic distance). Distance is the primary cost of walking on a surface. The topographic distance between two points p and q is the minimal π-distance among all paths π between p and q:
Fig. 1. An example of a regional minimum.
Fig. 2. An example of a steepest descending path.
4 M. Hammoudeh, R. Newman/ Information Fusion xxx (2013) xxxxxx
Please cite this article in press as: M. Hammoudeh, R. Newman, Information extraction from sensor networks using the Watershed transform algorithm,
Informat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001
$$TD_f(p, q) = \inf_{\pi} TD_f^{\pi}(p, q)$$

where $TD_f^{\pi}(p, q) = \sum_{i=2}^{n} \mathrm{cost}(p_{i-1}, p_i)$ is the topographical distance of a path $\pi = (p = p_1, p_2, \ldots, p_n = q)$ such that $\forall i$, $p_i \in N(p_{i-1})$ and $p_i \in X$.
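The cost function of Definition 4 and the topographic distance of Definition 5 can be combined in a short sketch. Because every step cost is non-negative, the infimum over paths can be computed with Dijkstra's shortest-path algorithm; that choice of search method, the dictionary grid, the 8-connected neighbourhood and the function names are assumptions made for illustration.

```python
import heapq
import math

def neighbours(p, grid):
    """8-connected neighbours of p that exist in the grid (assumed connectivity)."""
    x, y = p
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and (x + dx, y + dy) in grid]

def lower_slope(p, grid):
    """LS(p): largest slope from p to a neighbour at or below f(p); 0 if none."""
    return max(((grid[p] - grid[q]) / math.dist(p, q)
                for q in neighbours(p, grid) if grid[q] <= grid[p]), default=0.0)

def cost(a, b, grid):
    """Definition 4: slope-based cost of stepping from a to its neighbour b."""
    d = math.dist(a, b)
    if grid[a] > grid[b]:
        return lower_slope(a, grid) * d
    if grid[a] < grid[b]:
        return lower_slope(b, grid) * d
    return 0.5 * (lower_slope(a, grid) + lower_slope(b, grid)) * d

def topographic_distance(p, q, grid):
    """Definition 5: minimum summed cost over all paths from p to q (Dijkstra)."""
    best = {p: 0.0}
    heap = [(0.0, p)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == q:
            return d
        if d > best[u]:
            continue                      # stale heap entry
        for v in neighbours(u, grid):
            nd = d + cost(u, v, grid)
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf                       # q unreachable from p
```

On a strictly descending one-row ramp with values 3, 2, 1, 0, the topographic distance from the first point to the minimum is 3, i.e. f(p) − f(m), which is the property of SDPs used in Section 4.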
Definition 6 (Catchment basin based on topographic distance). The catchment basin $CB_{TD}(m_i)$ of a local minimum $m_i$ is defined as the set of points p ∈ X that are topographically closer to $m_i$ than to any other regional minimum $m_j$:

$$CB_{TD}(m_i) = \big\{ p \in X \;\big|\; f(m_i) + TD_f(p, m_i) < f(m_j) + TD_f(p, m_j),\ \forall j \ne i \big\}$$
Specifically, a CB is composed of a uniquely labelled regional minimum and all the points whose SDPs lead to it [24]. In keeping with the rain simulation analogy, a CB is the zone to which all the rain falling on it is conveyed. Fig. 3 shows a digital image segmented into two CBs (highlighted with different grey levels). The squares indicate the regional minimum of each CB and the arrows represent the steepest descending paths, indicating the direction of descent.
4. Suitability of the Watershed algorithm for WSNs
The Watershed algorithm is intuitively understandable and offers a wide range of prospective modifications to fit specific goals. It can be easily customised to include diverse external factors that may influence the segmentation outcome, e.g., physical obstacles between nodes. The Watershed algorithm can also be easily adapted to work with multimodal WSNs: each sense modality can be viewed as a separate digital image to be segmented. A node may have many SDPs leading to different minima, and it may join multiple catchment basins, each corresponding to one sense modality. The segmentation steps across the different sense modalities can be combined in one step to reduce resource utilisation.
The concept of Watersheds can be implemented in a distributed, asynchronous mode with fast computation time. The localisation of the process of building catchment basins is beneficial for large and frequently varying data sets, making it appropriate for WSN applications. A parallel and asynchronous implementation is expected to reduce data communication across the network and avoid the need for global re-computation/synchronisation of all network segments when one or more observations change. The Watershed algorithm can also be modified to base the segmentation process on local conditions, which helps in parallelising this process (see Section 5).
In the Watershed algorithm, the topographical distance between a point and its regional minimum equals $f(p) - f(m_i)$. Hence, following an SDP yields a minimal-cost path to the regional minimum, which translates into a minimal communication cost. Minimising the bridging distance between a point and its regional minimum helps the WSN save energy. The effective energy gain is the minimum squared distance between two points. Finally, the Watershed algorithm requires few parameter decisions and does not make any assumptions that limit its applicability.
5. The modified Watershed algorithm
5.1. Neighbours definition using Shepard's method
In the wide body of literature, most Watershed transform algorithm variations use a 4- or 8-pixel neighbourhood to define the boundary of a point. However, the topographical-distance-based transform may result in non-minima plateaus with a nonempty interior in situations where non-minima points do not have a neighbour of lower value. Moreover, nodes may be located at different topographic distances from the node of interest. Hence, an additional ordering relation between such points is required. To address these issues, we propose using Shepard's method [32] to construct the set of neighbours of a point p. Shepard defined two metrics:
(a) Arbitrary distance metric:
This metric computes the geodesic distance to the lower boundary of the plateau. All points located within radius r from p are considered in determining the SDP. The calculation of this metric is computationally simple, but there may be zero or an excessive number of neighbours within radius r.
(b) Arbitrary number metric:
Only the nearest n neighbours are included in determining the SDP. This metric disregards the spacing and relative location of the points and necessitates a complicated ranking process and an expensive search for points. Moreover, it assumes that a fixed number, n, of neighbouring nodes is optimal.
A combination of the two metrics captures their advantages. An initial search radius r is established depending on the total number of points. On average, a maximum of seven points was defined to limit the complexity and the amount of computation required. r is defined as:

$$\pi r^2 = \frac{7A}{N}$$

where A is the area of the minimum bounding box enclosing all points and N is the total number of points. The authors of [33] present a detailed study of the suitability of this inclusion function for WSNs.
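The inclusion rule above can be sketched as follows. The sketch assumes 2-D node coordinates and computes A from the axis-aligned bounding box of all nodes; as an illustrative interpretation of "a maximum of seven points", it keeps at most the seven nearest nodes inside radius r. All function names are hypothetical.

```python
import math

def bounding_box_area(points):
    """Area A of the axis-aligned minimum bounding box enclosing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))

def shepard_radius(points, target=7):
    """Radius r with pi * r**2 = target * A / N, so that a circle of radius r
    holds about `target` points at the average node density N / A."""
    return math.sqrt(target * bounding_box_area(points) / (math.pi * len(points)))

def shepard_neighbours(p, points, r, max_n=7):
    """Nodes within radius r of p, nearest first, capped at max_n (assumed cap)."""
    inside = sorted((math.dist(p, q), q) for q in points
                    if q != p and math.dist(p, q) <= r)
    return [q for _, q in inside[:max_n]]
```

On a 10 × 10 unit grid (A = 81, N = 100), r ≈ 1.34, so an interior node keeps only its four axis-aligned neighbours while the diagonal ones at distance √2 fall outside the radius. This illustrates the trade-off noted for metric (a): r is tuned to the average density rather than to any particular node.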
5.2. Incorporating the hop count in determining the SDP
A path from location x to location y on a terrain is a descending path if the value of a point p never increases as we move p along the path from x to y. If p always moves through the neighbour with the smallest value, then the path from x to y is the SDP. It is possible to have several descending paths from p to its local minimum; the selection among them is mostly nondeterministic or implementation dependent. However, when the Watershed algorithm is applied in WSN applications, a computationally efficient descending path is preferred. The authors of [34] identified communication as the most energy-expensive operation on a sensor node; they calculated the energy consumption and found that a general-purpose processor could execute 3 million instructions for the same amount of energy used to transmit 1 Kbit of data by radio over 100 m. The communication cost depends on the transmission distance, the number of intermediate communication hops, link quality, and other factors. Hop count is a metric commonly used by routing protocols to determine the distance between two hosts or to organise nodes into resource-efficient clusters (e.g., [35]). The energy spent in communication is proportional to the square of the communication distance between the sending and receiving hosts.

Fig. 3. Image segmented into two catchment basins.

The distance between two nodes
can be estimated on the basis of incoming signal strengths or directly using the global positioning system. The energy expenditure is also proportional to the scale of the network [36]. Consequently, communication distance should be considered as a function of both the inter-sensor Euclidean distance and the number of intermediate communication hops. The resulting distance function can be defined as the sum of the single inter-node Euclidean distances divided by the hop count:

$$\mathrm{Dist}(p, p') = \frac{\sum \mathrm{dist}(p_i, p_j)}{hc}$$

where i, j < hc + 1 and hc is the hop count.
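A minimal sketch of this hop-aware distance follows, under the interpretation that the sum runs over consecutive hops of the route from p to p'. The helper `radio_energy` is an illustrative addition that applies the squared-distance energy model mentioned above, ignoring constant factors; both names are hypothetical.

```python
import math

def hop_aware_distance(route):
    """Dist(p, p') for a multi-hop route [p, ..., p']: the summed Euclidean
    length of consecutive hops divided by the hop count hc = len(route) - 1."""
    hc = len(route) - 1
    if hc < 1:
        raise ValueError("a route needs at least two nodes")
    total = sum(math.dist(a, b) for a, b in zip(route, route[1:]))
    return total / hc

def radio_energy(route):
    """Relative transmit energy under the distance-squared model (constants dropped)."""
    return sum(math.dist(a, b) ** 2 for a, b in zip(route, route[1:]))
```

For a 10-unit link, relaying through a midpoint gives Dist = 5 instead of 10 and halves the squared-distance energy (50 versus 100), which is why the function rewards routes with more, shorter hops.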
6. Segments as a scoping mechanism
In the wider body of literature [49], most existing programming abstractions are part of the query language used, thus limiting their applicability. In fact, system developers are concerned with the application logic as well as with specifying the nodes that participate in a specific task and how to communicate with them. The benefits of restricting the number of nodes participating in a task are reduced energy and bandwidth consumption, along with improved information accuracy, by removing task-irrelevant nodes from the task computation. For instance, querying temperature data over the whole network might return an amount of data that scales with the density of sensors but is not necessarily pertinent to the monitored phenomena.
A key contribution of the work presented here is the creation of logical groups of nodes within the network, called segments. These segments are used to constrain the dissemination of queries to a subset of the most relevant nodes. Thus, each query is sent to the smallest possible number of nodes, and per-node query dissemination overhead does not grow with network size. Segments can also be useful for many services other than information extraction; e.g., the steepest path can be used by the routing service to determine energy-efficient routes.
The user can exploit segments as a component of a query, or segments can be used by the system to autonomously decide where a query should be disseminated. The former requires extending existing query languages to incorporate the segment construct and developing an associated parser for the language. Building query execution abstractions into the query language is beyond the scope of this paper. In the latter, segments are a vital factor in deciding where a query should be disseminated, how the query will be processed within and across segments, and where the query answer will be finally resolved.
In the absence of dedicated query constructs, the concept of scoping in the context of WSN macroprogramming presents a practical solution. Our Watershed-based approach allows developers to reason about the network as a whole, instead of viewing it as a set of isolated nodes. Segments allow dynamic, complex interactions between system portions, which results in less programming effort, reduced complexity, and more reliable code. In a segmented network, query dissemination and the generation of query execution plans are based on Watershed-defined (distance and sensor value) neighbourhoods instead of the unreliable wireless communication range. Watershed segments are dynamically updated at low cost, owing to the localised, independent processing of segment membership. When the current state of a node, or the state of a node in its steepest path, changes significantly, the node initiates a local update process independently of other nodes. Local computation results in reduced calculation times and costs.
The Watershed algorithm groups sensor nodes into segments such that each segment is homogeneous with respect to some property. For instance, a segment may be a collection of nodes that are geographically adjacent and share a common temperature reading. When performing segmentation, the Watershed algorithm uses information about node location, its sensory values, and the boundaries of segmented nodes. Hence, the Watershed algorithm integrates the topological (e.g., [16,17]) and data-centric (e.g., [37,38]) group abstraction methods to eliminate their drawbacks. This property allows for obtaining accurate results and capturing several local behaviours effectively, making it possible to develop programs that articulate higher-level behaviour beyond that of standard query-based approaches. To address this emerging requirement, additional programming abstractions that distribute the computational load while maintaining result effectiveness are essential. The logical segments produced by the Watershed algorithm replace the rigid geographical neighbourhood offered by the radio range with application-based, higher-level abstractions. Such segments are generated so that the logical notion of proximity is defined declaratively and dynamically according to node properties, besides restrictions on communication costs (particularly the width of the segment). A network segment created using the concept of a logical neighbourhood specified by applicative conditions is, thus, able to extract requested information with high fidelity.
All nodes within the same segment are automatically labelled with an integer identifier called a marker. The Watershed algorithm is supplied with membership templates. These templates encapsulate a set of logical constraints that a node has to satisfy to belong to one or more segments. Templates are also used to dynamically update and maintain segment membership. Such segments give system programmers the ability to manipulate logical segments rather than isolated nodes or groups of adjacent nodes. Programmers are still capable of manipulating individual nodes or broadcast regions, but they may also declaratively indicate the part of the network to involve and thereby manage the scope of a query to reduce computation cost.
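The following sketch illustrates how a programmer might scope a query by segment marker and membership template. The paper does not specify a template language, so a template is modelled here as a plain Python predicate over a node's local state; `SensorNode`, `Template` and `scope` are hypothetical names, and the filtering is shown centrally for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class SensorNode:
    node_id: int
    marker: int                                   # segment identifier from the Watershed labelling
    readings: Dict[str, float] = field(default_factory=dict)

# A membership template is modelled as a predicate over a node's local state.
Template = Callable[[SensorNode], bool]

def scope(nodes: List[SensorNode], marker: int, template: Template) -> List[int]:
    """IDs of nodes in segment `marker` that also satisfy `template`."""
    return [n.node_id for n in nodes if n.marker == marker and template(n)]
```

In a deployment the marker test would be applied during dissemination rather than over a global node list, so nodes outside the chosen segment would never be contacted.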
7. Parallel asynchronous watershed implementation
This section presents our proposed reformulation of the Watershed algorithm that complies with the constraints of sensor nodes, i.e., limited memory, energy, processing speed, and bandwidth. Previously published Watershed algorithms required at least three global synchronisation points: minima detection, labelling, and flooding. This work describes a new fully Parallel and Asynchronous Watershed algorithm (PA Watershed for short), in which sensor nodes perform local computations independently and in parallel without a global synchronisation point. This is of particular importance for in-network processing in WSNs, where the problem of synchronisation is exacerbated by slow and unreliable data communication, and mitigation using sophisticated synchronisation protocols is difficult due to the bandwidth constraints imposed by power consumption.
To apply the PA Watershed to WSN segmentation, each pixel of a three-dimensional image is considered a sensor node. In this representation, the numerical value of each pixel represents the reading of the corresponding sense modality of the node located at position (x, y). The image resolution corresponds to the network density, i.e., the number of nodes per unit area. To simplify the implementation of the PA Watershed algorithm, we assume a virtual grid is formed throughout the deployed network. Each node is assigned to a virtual grid cell, and each cell can contain only one node.
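A minimal sketch of the virtual grid assignment is shown below. It assumes each node knows its planar position (e.g. from localisation) and that a square cell size is given; the paper does not state how the cell size is chosen, and `assign_grid_cells` is a hypothetical name.

```python
import math
from typing import Dict, Tuple

def assign_grid_cells(positions: Dict[int, Tuple[float, float]],
                      cell_size: float) -> Dict[Tuple[int, int], int]:
    """Map each node to the virtual grid cell (column, row) that contains it.
    Raises ValueError if two nodes fall into the same cell, because the
    model assumes at most one node per cell."""
    cells: Dict[Tuple[int, int], int] = {}
    for node_id, (x, y) in positions.items():
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        if key in cells:
            raise ValueError(f"nodes {cells[key]} and {node_id} share cell {key}")
        cells[key] = node_id
    return cells
```

Once every node owns one cell, the cell coordinates play the role of pixel coordinates and each node's reading plays the role of the pixel value.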
A crucial aspect of the design of the PA Watershed algorithm is to exploit data locality to reduce the algorithm's processing and communication cost. The PA Watershed algorithm is based on the
rainfall simulation model because it involves less communication overhead than immersion-based approaches, which are global in nature. The PA Watershed algorithm may be regarded as an asynchronous relaxation of the Hill Climbing algorithm [30]. This algorithm is similar to the rainfall simulation except that it begins from the minima and climbs by the steepest ascending slope. In this algorithm, most computations can be viewed as a collection of independent tasks that can be run in parallel on different sensor nodes. Each node runs a simple finite state machine linked to every individual sensed modality. Moreover, nodes use non-blocking communication to improve performance by overlapping computation and communication. This approach avoids the need for any global synchronisation among sensor nodes and results in very high computational efficiency and running speed of the algorithm.
The PA Watershed algorithm labels each node with the identifier of its catchment basin by walking along the SDP towards a regional minimum. Initially, all nodes are marked with temporary labels and assumed to be non-minima nodes. Each node starts searching for the steepest descent among its neighbours. If a steepest descent is detected, the node is relabelled to match the label of its corresponding steepest neighbour. When a regional minimum is finally reached, its label is propagated up to all nodes along the path. The sequence of nodes along the path is relabelled to mark the association with that regional minimum, thereby concluding the search process. Because the search process is executed and terminated autonomously by individual nodes, the total idle time and communication overhead are reduced significantly. In situations where a regional minimum cannot be reached, e.g., on a flat zone, the steepest descent must be calculated from the complete set of lower borders.
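The labelling walk described above can be simulated centrally in a few lines. This is a sketch, not the distributed PA Watershed itself: it assumes an 8-connected grid without plateaus, uses each minimum's coordinates as its label, and caches labels so that a walk stops as soon as it meets an already labelled node, mimicking how a minimum's label propagates back up each SDP.

```python
import math

def neighbours(p, grid):
    """8-connected neighbours of p that exist in the grid (assumed connectivity)."""
    x, y = p
    return [(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0) and (x + dx, y + dy) in grid]

def steepest_lower(p, grid):
    """SDP successor of p, or None if p has no strictly lower neighbour."""
    lower = [q for q in neighbours(p, grid) if grid[q] < grid[p]]
    if not lower:
        return None
    # steepest slope wins; ties are broken deterministically by coordinates
    return max(lower, key=lambda q: ((grid[p] - grid[q]) / math.dist(p, q), q))

def rainfall_labels(grid):
    """Label each point with the regional minimum its SDP reaches (no plateaus)."""
    label = {}
    for start in grid:
        path, p = [], start
        while p not in label:
            nxt = steepest_lower(p, grid)
            if nxt is None:               # p is a regional minimum: it labels itself
                label[p] = p
                break
            path.append(p)
            p = nxt
        for q in path:                    # propagate the minimum's label back up the path
            label[q] = label[p]
    return label
```

On a one-row surface with values 1, 2, 3, 2, 0 there are two minima, at x = 0 and x = 4, and the crest at x = 2 drains towards x = 4 under the coordinate tie-break. Plateau handling is deliberately omitted here.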
Algorithm 1. The proposed PA Watershed algorithm.
Input: state S(u) of node u; current sensor reading d(u); sensor readings d(v) of neighbours
Output: segment membership after processing new data
case (S(u) = initial)
    u broadcasts d(u) and label l(u) to its neighbours N(u)
    u listens for data from every neighbour v_i ∈ N(u)
    u computes the following:
        N^=(u) ← all neighbours with sensor value equal to d(u)
        N^≥(u) ← all neighbours with sensor reading greater than d(u)
        N^Ln(u) ← {v_i}, a unit set where slope(u, v_i) = LS(u) and v_i ∈ N(u)
    if N^Ln(u) = ∅
        l(u) ← min(l(u), min_{v ∈ N^=(u)} l(v))
        S(u) ← MP
    else
        S(u) ← NP
case (S(u) = MP)
    u listens for new data, h'(v), from any neighbour
    if h'(v) < h(u)
        N^Ln(u) ← {v}; S(u) ← NP
    else
        l(u) ← min(l(u), l(v)); S(u) ← MP
case (S(u) = NP)
    u waits for data from N^Ln(u)
    l(u) ← min(l(u), l(v))
    S(u) ← NP
In all cases, u sends any new reading that changes by more than T to all nodes in N^≥(u)
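The per-node state machine of Algorithm 1 can be sketched as a small Python class, with message passing replaced by direct method calls. Where the pseudocode's min() updates are ambiguous, the sketch follows the prose: an NP node adopts the label of its steepest lower neighbour, and an MP node merges labels only with equal-reading neighbours. The threshold broadcast is omitted. `PANode`, `classify` and `on_message` are hypothetical names for one sense modality.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class PANode:
    """Per-node, per-modality state of the Algorithm 1 sketch (names hypothetical)."""
    node_id: int
    reading: float
    label: int
    state: str = "initial"
    lower: Optional[int] = None          # N^Ln(u): the chosen steepest lower neighbour

    def classify(self, readings: Dict[int, float], labels: Dict[int, int],
                 dists: Dict[int, float]) -> None:
        """'initial' case: sort neighbours by reading and pick the SDP successor."""
        equal = [v for v, d in readings.items() if d == self.reading]
        lower = [v for v, d in readings.items() if d < self.reading]
        if not lower:                    # N^Ln(u) empty: u lies on a (minimum) plateau
            self.label = min([self.label] + [labels[v] for v in equal])
            self.state = "MP"
        else:                            # steepest slope first, nearest node on ties
            self.lower = max(lower, key=lambda v: ((self.reading - readings[v]) / dists[v],
                                                   -dists[v]))
            self.label = labels[self.lower]
            self.state = "NP"

    def on_message(self, v: int, reading: float, label: int) -> None:
        """MP/NP cases: react to a neighbour's broadcast of (reading, label)."""
        if self.state == "MP":
            if reading < self.reading:   # a lower neighbour appeared: leave the plateau
                self.lower, self.label, self.state = v, label, "NP"
            elif reading == self.reading:
                self.label = min(self.label, label)   # merge labels within the plateau
        elif self.state == "NP" and v == self.lower:
            self.label = label           # follow the label of the SDP successor
```

With three nodes in a line reading 1, 2 and 3, the first becomes an MP and the other two become NPs; once the middle node's updated label is broadcast, the top node adopts the minimum's label 1.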
In multi-modal sensor networks, a number of paths from a single source to multiple destinations exist. Each path represents a different sense modality. Sensor nodes are assigned a separate label for each sense modality. The search processes for the regional minima associated with each sense modality can be combined in one step. This does not add extra cost on the nodes, or on the network in general, due to the broadcast nature of wireless communication. In practice, a node's sensor reading can change frequently, necessitating relabelling all nodes above it in the SDP with the new minimum label. A threshold can be defined to avoid unnecessary updates resulting from insignificant changes in sensor readings. The segment membership update process can be merged with other communication tasks, e.g., data transmission, to reduce membership management overhead. Depending on the application requirements, the membership updates for various sense modalities can be aggregated in one step. To keep track of segment membership updates along the SDP, a flag (called reset) is defined at each node to record whether a change has taken place since the last data communication.
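A minimal sketch of the threshold and reset-flag bookkeeping follows. It assumes the change is measured against the last reported value, so slow drift still triggers an update once it exceeds T, and that pending membership changes are piggy-backed on the next outgoing data packet; `ReadingFilter` and its methods are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class ReadingFilter:
    """Hypothetical per-node helper for threshold-gated membership updates."""
    threshold: float          # T: smallest change worth reporting
    last_reported: float      # last reading sent to neighbours
    reset: bool = False       # a change happened since the last data transmission

    def observe(self, reading: float) -> bool:
        """Return True (and set reset) if the reading moved by more than T."""
        if abs(reading - self.last_reported) > self.threshold:
            self.last_reported = reading
            self.reset = True
            return True
        return False

    def on_transmit(self) -> bool:
        """Called when a data packet is sent: report and clear any pending change."""
        pending, self.reset = self.reset, False
        return pending
```

With T = 0.5 and a last report of 20.0, a reading of 20.3 is suppressed, while 20.7 is broadcast and flags a membership update for the next transmission.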
The PA Watershed does not require frequent synchronisation among nodes. It takes advantage of asynchronous computation both at the programming level and in its implementation. Minima detection, labelling, and climbing of the SDPs run on all nodes asynchronously and in parallel. An important shortcoming of the PA Watershed is that inner nodes of a plateau cannot independently determine whether they belong to a non-minimum plateau (NP) or a minimum plateau (MP). In its current form, the PA Watershed requires global synchronisation to recognise and label MPs. To overcome this problem, all inner nodes of a plateau are classified into two groups: plateau and minimum. The plateau is the group of nodes whose neighbours all have sensor values equal to or greater than their own. The minimum contains every node of the plateau that has a neighbour with a sensor value less than its own. Nodes in the minimum serve as seeds for propagating their labels to inner nodes of the plateau. The labels propagate to progressively fill the entire plateau.
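The seed-based filling of a plateau can be sketched as a breadth-first search, shown centrally here although in the network each step would be a neighbour-to-neighbour message. The sketch assumes the plateau membership, the adjacency lists and the seed labels are already known; `flood_plateau` is a hypothetical name.

```python
from collections import deque

def flood_plateau(plateau, adjacency, seed_labels):
    """Spread seed labels over a plateau by breadth-first search.

    plateau:     set of node IDs sharing the same reading
    adjacency:   dict mapping node ID -> iterable of neighbour IDs
    seed_labels: dict mapping seed node -> label (plateau nodes with a lower neighbour)
    Each inner node receives the label of the seed that reaches it first (fewest hops)."""
    labels = dict(seed_labels)
    queue = deque(seed_labels)
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v in plateau and v not in labels:
                labels[v] = labels[u]
                queue.append(v)
    return labels
```

On a five-node chain with seeds at both ends, the inner nodes split between the two labels according to hop distance; the middle node, equidistant from both seeds, takes the label of the seed dequeued first.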
Algorithm 1 shows the pseudocode of the PA Watershed algorithm, which runs on every node. Regardless of the network size, each node only communicates with a limited number of neighbours located within its radio range. Every node learns only the next hop, not the complete hop-by-hop route, to the minima. The algorithm starts with each node broadcasting its ID, its current state, and its sensor readings and corresponding labels (d_i(u), l_i(u)) to its neighbours. Then, nodes use the gathered information to group their neighbours into three sets: N^=(u) contains all neighbours with sensor values equal to d(u); N^≥(u) is the set of all neighbouring nodes with sensor readings greater than d(u); and N^Ln(u) = {v_i} is a unit subset of Γ(u), where Γ(u) = {v_i ∈ N(u) | slope(u, v_i) = LS(u)}. If Γ(u) has two or more nodes, then the node nearest to u is chosen from Γ(u); if the set is empty, then the node is on an MP. The node keeps monitoring changes in its neighbours' data. When the node's current state is minimum or plateau and it receives new data such that h'(v) < h(u), it changes its state to NP; otherwise, a representative label is calculated for that plateau or minimum. The representative label is that of the node with the smallest reading in the plateau. This means that all neighbours of a minimum node that have a reading equal to that node's belong to one connected component. A connected component of an undirected graph is a subgraph in which any pair of vertices is connected by a path and which is connected to no additional vertices [39]. When a non-minimum plateau node receives a new data message, it updates its label to the minimal label of a minimum plateau. Therefore, all changes are dealt with locally and independently of other nodes, which keeps the cost of dynamic segmentation very low. Nodes exploit the broadcast nature of wireless communication in updating their state or label. When a node forwards the data of its predecessors, or when it overhears neighbours' broadcast messages, it can use these messages to update its label. To increase the algorithm's efficiency, a threshold T is defined to indicate
significant changes that need to be reported to neighbours. Since data messages can be overheard by nodes owing to the broadcast nature of wireless communication, nodes also use the threshold to decide locally whether or not they should start an update process.
8. Query resolution mechanism in PA Watershed
The PA Watershed divides the feature space into distinct segments, returning a hierarchical description of the data in the form of a tree. Moreover, it provides each sensor with a path, i.e., the SDP, over which queries and their answers can be transmitted to the querying node. SDPs can be exploited by queries to explore sensor data starting at the root and going only as deep as is necessary to provide answers with the desired quality. This greedy ascent algorithm is reasonably straightforward: the query is initiated at the root of the tree, i.e., at the local minimum, and every node in the segment decides independently, based on whether its local data satisfies the query conditions. If an ascent is necessary, the query is forwarded to the next node on the SDP.
A traversal of the SDP using our greedy algorithm will stop at nodes at various levels that all satisfy the query. Let the section S of the SDP be the set of nodes at which the query stops. Then, the structure of the tree above S, i.e., all nodes that have a predecessor in S, is irrelevant to the cost of the query, as the query traversal will never ascend that far. In this simple version of the query resolution method, if a decision is made to forward, then only the next child will receive the query. More elaborate schemes can be designed to assign different priorities to children and perform more complex selective forwarding.
Consider the following example to explain the mechanism for resolving a specific multi-modal query: "Return the temperature at each sensor node where the air pressure is less than 950 mbar." This query is expressed using a simple inclusion predicate, i.e., a condition that compares the current reading against a threshold. We call such predicates, whose evaluation also depends on the readings taken by the nodes, dynamic predicates, as they specify which nodes should include their response in the query, i.e., nodes whose values pass the threshold test.
First, a query is propagated from the sink toward relevant local minima by end-to-end routing. A minimum is defined by some criterion, for instance that the pressure is ≤ 950. Once the set of segments to sample from has been selected, data must be queried from the subtree rooted at each local minimum. The choice of the data point can be framed as the problem of ascending the subtree until a node that is qualified to reply to the query is found. After the query traverses a node with a pressure reading ≥ 950, or when the query reaches a leaf node, sensor nodes with pressure readings falling into the query window recursively report their temperature readings to the nodes from which they received the query. Partial results on every SDP are returned level by level back towards the root node. Whenever possible, data is aggregated as query responses move down the tree and sent in the same data packet. After aggregating the data from all SDPs, the aggregated result is returned to the sink node, again by an end-to-end routing protocol.
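The greedy ascent for the example query can be sketched recursively over the SDP tree rooted at a local minimum. The sketch assumes the tree is given as a children map (a node's children are the nodes whose SDP successor it is) and that pressure does not decrease when ascending, so a node failing the predicate ends the ascent on its branch. `resolve` and its parameters are hypothetical names, and the recursion stands in for hop-by-hop forwarding.

```python
from typing import Dict, List, Tuple

def resolve(root: int, children: Dict[int, List[int]],
            pressure: Dict[int, float], temperature: Dict[int, float],
            limit: float = 950.0) -> List[Tuple[int, float]]:
    """Greedy ascent from a local minimum for 'temperature where pressure < limit'.
    A node that fails the predicate stops the ascent on its branch; qualifying
    (node, temperature) pairs are returned back towards the root."""
    if pressure[root] >= limit:
        return []                      # query stops here; nodes above are never visited
    results = [(root, temperature[root])]
    for child in children.get(root, []):
        results.extend(resolve(child, children, pressure, temperature, limit))
    return results
```

In a five-node segment where one branch crosses 950 mbar at its first node, that branch contributes nothing and its upper node is never queried, which is the saving the section S argument describes.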
9. Evaluation
In this section, the efficiency of the programming abstraction produced by the PA Watershed is evaluated in comparison with more traditional query resolution methods, in-network and centralised processing, as a baseline. Moreover, in the absence of recent directly applicable approaches to compare against, we chose the Logical Neighborhoods [9] approach. To this end, we implemented the PA Watershed and Logical Neighborhood abstractions on top of the MuMHR [35] routing protocol and evaluated them using the Dingo [40] simulator. In our experiments, programming abstractions are used as a query scoping mechanism to constrain the dissemination of queries so that the entire network is not flooded.
9.1. Dingo WSNs simulator
Dingo provides tools for the simulation and deployment of
high-level Python code on real WSNs. Dingo users write high-level
simulations, which can then be refined for various hardware
platforms (e.g., Gumstix [41]). In addition, Dingo allows mixed-mode
simulation using a combination of real and simulated nodes. The
authors found Dingo not only easy to use, but also powerful enough
to model and simulate the behaviour of systems at various design
stages. In Dingo, nodes can obtain their sensed data from a database
or from graphical objects such as maps. The use of real-world data
improves the fidelity of simulations, as it makes it possible to
check the simulation results against the real data. Furthermore,
Dingo provides several features in the form of plugins, which can be
activated or deactivated from the plugin menu. Network topologies
can be loaded and saved, and the Topology menu can be used to switch
a simulation between a random topology and a grid.
9.2. MuMHR routing protocol
In our experiments we make use of MuMHR (Multi-hop Multi-path
Hierarchical Routing) because it has been shown to be an
energy-efficient and robust routing protocol. MuMHR is already
implemented in Dingo, which allowed the authors to focus their
implementation effort on the programming abstractions.
9.3. Simulation setup and scenario
In all simulations, we configured a number of parameters based on
the Crossbow [42] wireless sensor hardware platform. At the same
time, we tried to conform to the values set in the synthetic
scenario in [9] for a fairer comparison. Nodes were dispersed over
the monitored region such that no two nodes share the same location,
and the transmission range of each node is limited to 75 m. The
bandwidth of the channel was set to 1 Mbps, each data message was
500 bytes long, and the packet header for the various message types
was fixed at 30 bytes. A simple model for radio hardware energy
dissipation is assumed: the transmitter dissipates energy to run the
radio electronics and the power amplifier, and the receiver
dissipates energy to run the radio electronics. All nodes were given
an equal initial supply of energy. The processing delay for
transmitting a message is chosen randomly between 0 and 5 ms,
simulating the real-world characteristics of low-power radio
transmission.
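The timing and energy parameters above can be turned into a short calculation. The sketch below assumes the common first-order radio model, in which the transmitter spends `E_ELEC` joules per bit on electronics plus `EPS_AMP` joules per bit per square metre on the amplifier, and the receiver spends `E_ELEC` per bit. Both constants are illustrative assumptions (values often used with this model), not figures reported in this paper, and the 30-byte header is assumed to be added on top of the 500-byte message.

```python
import random

# Assumed first-order radio model constants (illustrative, not from the paper).
E_ELEC = 50e-9      # J/bit spent by transmitter or receiver electronics
EPS_AMP = 100e-12   # J/bit/m^2 spent by the transmit power amplifier

BANDWIDTH_BPS = 1_000_000  # 1 Mbps channel
PAYLOAD_BYTES = 500        # data message length
HEADER_BYTES = 30          # assumed to be added on top of the payload
RANGE_M = 75.0             # maximum transmission range


def tx_energy(bits: int, distance_m: float) -> float:
    """Transmitter: radio electronics plus amplifier with d^2 path loss."""
    return E_ELEC * bits + EPS_AMP * bits * distance_m ** 2


def rx_energy(bits: int) -> float:
    """Receiver: radio electronics only."""
    return E_ELEC * bits


def send_time(bits: int, rng: random.Random) -> float:
    """Serialisation time plus a uniformly random 0-5 ms processing delay."""
    return bits / BANDWIDTH_BPS + rng.uniform(0.0, 0.005)


bits = (PAYLOAD_BYTES + HEADER_BYTES) * 8
print(bits, bits / BANDWIDTH_BPS)                  # frame size and airtime (s)
print(tx_energy(bits, RANGE_M), rx_energy(bits))   # joules per frame
print(send_time(bits, random.Random(1)))
```

Under these assumptions a 530-byte frame occupies the 1 Mbps channel for 4.24 ms, before the random processing delay is added.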
The simulated scenario is the deployment of a WSN to monitor the
temperature in an environmentally sensitive area. Each node is
initially configured with a single attribute, i.e., temperature. A
node's temperature reading is taken as the colour intensity
corresponding to its position on a thermal map. All simulations
employ a thermal map, Fig. 4(a), taken from [43]. This thermal map
is fed into the simulator as an image at system startup. To simulate
dynamic conditions, a list of events, in the form of significant
increases in the temperature readings of some nodes, was configured.
The set of nodes at which these events occur was chosen
nondeterministically before the start of each simulation run. In
each run, between 5 and 10 events occur at predefined intervals.
Each event occurrence is followed by a query to locate the lowest
and highest temperature readings in the entire network. Each
simulation run lasted 1000 s.
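The mapping from the thermal map to node readings can be sketched as follows. This is not Dingo's own image-loading code; it assumes the map has already been converted to a grey-level array with intensities in [0, 255], that intensity maps linearly onto a temperature range (`t_min` and `t_max` are hypothetical parameters), and that row 0 of the image corresponds to y = 0 in the simulation window.

```python
import numpy as np


def node_temperatures(image, positions, window, t_min=0.0, t_max=50.0):
    """Sample a grey-level thermal map at each node position.

    image: 2-D array of intensities in [0, 255]; row 0 is taken as y = 0.
    positions: sequence of (x, y) node coordinates inside the window.
    window: (width, height) of the simulation window.
    t_min, t_max: hypothetical temperatures mapped to intensities 0 and 255.
    """
    img = np.asarray(image, dtype=float)
    rows_px, cols_px = img.shape
    width, height = window
    pos = np.asarray(positions, dtype=float)
    # Scale window coordinates to pixel indices, clamping to the image border.
    cols = np.clip((pos[:, 0] / width * cols_px).astype(int), 0, cols_px - 1)
    rows = np.clip((pos[:, 1] / height * rows_px).astype(int), 0, rows_px - 1)
    intensity = img[rows, cols] / 255.0
    return t_min + intensity * (t_max - t_min)


# Usage: a 2x2 "map" and three nodes in a 100 m x 100 m window.
image = np.array([[0, 255], [128, 64]])
temps = node_temperatures(image, [(10, 10), (90, 10), (10, 90)], window=(100, 100))
print(temps)
```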
8 M. Hammoudeh, R. Newman/ Information Fusion xxx (2013) xxxxxx
Please cite this article in press as: M. Hammoudeh, R. Newman, Information extraction from sensor networks using the Watershed transform algorithm,
Informat. Fusion (2013), http://dx.doi.org/10.1016/j.inffus.2013.07.001
The PA Watershed algorithm has been implemented starting from the
Python code of the SciPy Cookbook available at [44]. Each node runs
a copy of Algorithm 1 until stabilisation. Then, a request (query)
is issued to the sink node through a simple graphical user
interface. Dingo provides a chart pane in which individual node data
or network data can be plotted during the simulation.
Because network density and node distribution affect the
performance analysis results, multiple simulation runs are combined
to estimate the uncertainty in the simulations. In other words, to
demonstrate that the results are not biased towards a specific
network setup, we ran the same experiments for 5 different node
distributions. This makes our simulation a Monte Carlo simulation,
as repeated sampling from a distribution is performed. Accordingly,
in each simulation run, network nodes were randomly positioned in
the simulation window. At each point, the measured metric was
averaged over the five runs.
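The averaging procedure can be sketched as a small Monte Carlo loop. The window size, the helper names and the example metric (mean node degree under the 75 m transmission range) are assumptions for illustration; in the experiments the metric would instead be, for instance, the number of transmissions needed to resolve a query. The sample standard deviation is reported as a simple uncertainty estimate.

```python
import math
import random
import statistics


def random_deployment(n, width, height, rng):
    """Place n nodes uniformly at random; a set guarantees distinct locations."""
    placed = set()
    while len(placed) < n:
        placed.add((rng.uniform(0.0, width), rng.uniform(0.0, height)))
    return sorted(placed)


def mean_degree(nodes, radio_range=75.0):
    """Example metric: average number of neighbours within radio range."""
    links = sum(
        1
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
        if math.dist(nodes[i], nodes[j]) <= radio_range
    )
    return 2 * links / len(nodes)


def monte_carlo(metric, densities, runs=5, width=500.0, height=500.0, seed=0):
    """For each density, average the metric over `runs` random deployments."""
    rng = random.Random(seed)
    summary = {}
    for n in densities:
        samples = [metric(random_deployment(n, width, height, rng)) for _ in range(runs)]
        summary[n] = (statistics.mean(samples), statistics.stdev(samples))
    return summary


for n, (mean, sd) in monte_carlo(mean_degree, [50, 100, 200]).items():
    print(f"{n} nodes: mean degree {mean:.2f} (sd {sd:.2f})")
```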
9.4. Performance metrics, results and analysis
Query resolution is implemented in four algorithms: PA Watershed,
where segments are used as a high-level programming model to support
query resolution; Logical Neighborhood, where the network is
partitioned into logical groups as described in [9]; in-network
processing, by data aggregation up the network spanning tree; and
centralised processing, by collecting and analysing all data at the
sink.
Fig. 4(b) shows the network segmentation results obtained with the
PA Watershed algorithm. The segmentation results can be
straightforwardly extended to the whole thermal map using a
generalised Voronoi partition, i.e., every location where there is
no sensor node is allocated to its closest segment, Fig. 4(c). Since
wireless communication is the most power-consuming operation, the
cost of resolving a query is measured as the traffic overhead
generated to answer it.
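The generalised Voronoi extension used for Fig. 4(c) amounts to nearest-neighbour labelling. A minimal NumPy sketch, assuming the segment label of every node is already known, is given below; assigning a location to the segment of its nearest node is the same as assigning it to the segment whose nearest member is closest. The function name and grid step are hypothetical, and ties go to the node listed first.

```python
import numpy as np


def extend_segmentation(node_xy, node_labels, width, height, step=1.0):
    """Label every grid location with the segment of its nearest sensor node
    (a generalised Voronoi partition). Ties go to the node listed first."""
    node_xy = np.asarray(node_xy, dtype=float)              # (N, 2)
    node_labels = np.asarray(node_labels)                   # (N,)
    gx, gy = np.meshgrid(np.arange(0.0, width, step), np.arange(0.0, height, step))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)       # (H*W, 2)
    # Squared distance from every grid location to every node: (H*W, N).
    d2 = ((grid[:, None, :] - node_xy[None, :, :]) ** 2).sum(axis=2)
    return node_labels[d2.argmin(axis=1)].reshape(gx.shape)


# Usage: two nodes from different segments on a 10 x 1 strip.
label_map = extend_segmentation([(0.0, 0.0), (10.0, 0.0)], [1, 2], width=10, height=1)
print(label_map)
```

The brute-force distance matrix needs memory proportional to the number of grid cells times the number of nodes; for large maps, a k-d tree query (for example SciPy's `cKDTree`) is the usual alternative.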
Fig. 5 shows the communication overhead generated by each of the
four studied query resolution methods. The results indicate that, on
average, segment-based query processing reduced the total number of
transmissions by almost 79.5% compared with the centralised
approach, by 66% compared with the in-network processing approach,
and by 40% compared with the Logical Neighborhood approach.
Centralised query collection incurs the highest messaging over-
head amongst the four tested approaches. This is because the query
is received by every node in the network. If a node is carrying data
relevant to the posed query, then it sends it to the base station
through multi-hop communication.
In-network data aggregation attempts to limit the number of
exchanged messages while providing accurate answers to posed
queries. Sensor nodes form an aggregation tree, in which parent
nodes aggregate the data received from their children and forward
the result to their own parents. However, like centralised query
resolution, in-network aggregation does not provide a mechanism for
isolating nodes that carry data irrelevant to a query. Some queries
specify the characteristics of the nodes that should participate in
the response; however, the query still has to be received by all
nodes in the network. Moreover, when the query targets multiple
elements, the size of the collected data can become excessive. This
may require the data to be fragmented and transmitted in multiple
messages, which results in a dramatic increase in the number of
transmissions as the data moves towards the base station.
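To make the message-count argument concrete, the sketch below answers the scenario's query (lowest and highest temperature) by in-network aggregation up a spanning tree, where each non-root node sends one partial (min, max) record to its parent, and compares it with relaying every reading hop by hop to the sink. The tree representation and function names are hypothetical, and only response messages are counted; query dissemination, which both schemes pay in full, is ignored.

```python
def aggregate_min_max(tree, readings, node):
    """In-network aggregation of (min, max) over node's subtree.

    tree maps a parent id to its child ids; readings maps node id -> temperature.
    Returns ((min_value, min_id), (max_value, max_id), transmissions), where each
    non-root node sends exactly one partial record to its parent.
    """
    lo = hi = (readings[node], node)
    tx = 0
    for child in tree.get(node, []):
        c_lo, c_hi, c_tx = aggregate_min_max(tree, readings, child)
        lo, hi = min(lo, c_lo), max(hi, c_hi)
        tx += c_tx + 1  # the child's single partial record to this node
    return lo, hi, tx


def centralised_transmissions(tree, node, depth=0):
    """Every reading is relayed hop by hop to the sink, costing `depth` hops."""
    return depth + sum(centralised_transmissions(tree, c, depth + 1) for c in tree.get(node, []))


# Usage: sink 0 with a small spanning tree.
tree = {0: [1, 2], 1: [3, 4]}
readings = {0: 20.0, 1: 25.0, 2: 18.5, 3: 31.0, 4: 22.0}
lowest, highest, tx = aggregate_min_max(tree, readings, 0)
print(lowest, highest, tx, centralised_transmissions(tree, 0))
```

For this five-node tree, aggregation needs 4 transmissions while centralised collection needs 6; the gap grows with tree depth.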
It is clear from Fig. 5 that Watershed-based query resolution
outperforms Logical Neighborhood in terms of messaging overhead.
Separating Logical Neighborhood from its expressly devised routing
protocol resulted in increased communication overhead, because its
group membership function incorporates a protocol-dependent
communication cost. As a result, the span of communication also
changes in an undesirable manner. The Watershed segment abstraction,
in contrast, leverages localised interactions without pre-installing
neighborhood templates.
The number of nodes that participate in answering a specific query
has an impact on the total response time, energy consumption, and
accuracy of the response to that query. Eliminating nodes that carry
information unrelated to a specific query reduces energy consumption
and data analysis time, and improves response accuracy. Fig. 6
presents the number of nodes that participated in resolving a query
at various network densities using the PA Watershed and the Logical
Neighborhoods algorithms. A node is counted as involved in resolving
a query if it sends or forwards the query message or a response
message associated with this query. The results show that, on
average, segment-based query processing reduced the total number of
nodes that participated in answering the query by almost 24.5%
compared with the Logical Neighborhoods approach, at an identical
accuracy level. Excluding nodes that are irrelevant to the query
from the initial search allows drilling down into the exact
response; this new capability is a positive indicator of the
scalability of the approach. Watershed leverages knowledge of the
information distribution, which makes it more suitable than Logical
Neighborhoods and other traditional approaches. In the
Watershed-based approach, the neighbour set is built using the
steepest paths metric, which provides an efficient way to sort nodes
according to the data they carry. Accordingly, many query types,
such as maximum, minimum, or average, are resolved by involving only
small portions of the network. Moreover, the Shepard distance metric
adopted in the Watershed algorithm helps to enhance localised
interactions by limiting the span of the neighbourhood. The combined
Shepard neighbourhood metric produced better results than the cost
construct defined in Logical Neighborhood, where cost is measured in
credits on a per-node basis without considering local and global
node density. Thus, our approach performed better in determining
which network sections should participate in resolving the posed
query and how to reach them.
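The precise Shepard metric used by PA Watershed is not restated in this section. As an illustration only, the sketch below assumes the widely used modified Shepard weight w(d) = ((R - d) / (R d))^2 for d < R and 0 otherwise, which shows how a Shepard-type metric limits the span of a neighbourhood: nodes at or beyond the radius R receive zero weight and take no part in the interaction. The function names and the choice of R = 75 m (matching the transmission range) are assumptions.

```python
import math


def shepard_weight(d, radius):
    """Assumed modified Shepard weight: large for close nodes, zero at or beyond radius."""
    if d <= 0.0:
        raise ValueError("distance must be positive")
    if d >= radius:
        return 0.0
    return ((radius - d) / (radius * d)) ** 2


def shepard_neighbourhood(node, positions, radius):
    """Normalised weights of the nodes that fall inside node's Shepard radius."""
    raw = {}
    for other, xy in positions.items():
        if other == node:
            continue
        w = shepard_weight(math.dist(positions[node], xy), radius)
        if w > 0.0:
            raw[other] = w
    total = sum(raw.values())
    return {other: w / total for other, w in raw.items()} if total else {}


# Usage: node 3 lies outside a 75 m radius and is excluded from node 0's neighbourhood.
positions = {0: (0.0, 0.0), 1: (10.0, 0.0), 2: (30.0, 0.0), 3: (80.0, 0.0)}
weights = shepard_neighbourhood(0, positions, radius=75.0)
print(weights)
```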
Fig. 4. (a) Test thermal map; (b) PA Watershed segmentation results; and (c)
extended segmentation results.
Fig. 5. Comparison of the messaging overhead involved in resolving the query at
different network densities.
Fig. 7 shows the convergence time of PA Watershed and Logical
Neighborhoods, averaged across multiple runs at different network
densities. It can be seen that Watershed-based segmentation always
generates better-quality logical groups without incurring an
unacceptable increase in convergence time (30%) compared to Logical
Neighborhoods. This low convergence time is due to its asynchronous
operation, which makes segmentation a fast process. Also, the time
required to update segments is small, because each node depends
only on the set of nodes on its path to its local minimum.
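The dependency argument can be illustrated with a centralised emulation of steepest-descent labelling. This is not Algorithm 1: it is a sequential sketch that assumes each node knows its neighbours' readings, points to its lowest strictly lower neighbour (ties broken by node id, a simplification that sidesteps plateau handling), and takes the label of the local minimum at the end of that path. A node's label therefore depends only on its own steepest path and the neighbours examined along it.

```python
def steepest_neighbour(node, readings, neighbours):
    """Lowest neighbour if it is strictly lower than node, else None.
    Ties are broken by node id, a simplification that avoids plateau handling."""
    best = min(neighbours[node], key=lambda n: (readings[n], n), default=None)
    if best is not None and (readings[best], best) < (readings[node], node):
        return best
    return None


def watershed_labels(readings, neighbours):
    """Label each node with the local minimum its steepest-descent path reaches."""
    labels = {}
    for start in readings:
        path, node = [], start
        while node not in labels:
            path.append(node)
            nxt = steepest_neighbour(node, readings, neighbours)
            if nxt is None:          # local minimum: it labels its own segment
                labels[node] = node
                break
            node = nxt
        for visited in path:         # every node on the path joins that segment
            labels[visited] = labels[node]
    return labels


# Usage: a five-node chain with two local minima (nodes 0 and 3).
readings = {0: 1.0, 1: 3.0, 2: 5.0, 3: 2.0, 4: 4.0}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(watershed_labels(readings, neighbours))
```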
10. Conclusion
This paper advocates PA Watershed segments as a high-level
programming abstraction for query-based information extraction
systems. It presents network segmentation as a potentially effective
solution to the limitations of pure query-based systems. The logical
constructs provided by Watershed offer an effective method for
processing user queries. Segments abstract away the low-level
programming of nodes (the establishment of network topology and of
radio communications) while still providing users with the ability
to extract high-level information from the network. The advantages
of Watershed segments include: dynamic construction and updating of
segments; no need for compilers or new programming constructs;
support for node-level communication; localised computation; segment
setup that considers the physical topology provided by wireless
broadcast; and a segment span that improves response accuracy while
reducing the processing cost.
Experimental results presented in Section 9 show that in-network,
segment-based query resolution results in substantial energy savings
compared to the baseline and Logical Neighborhood schemes. PA
Watershed groups nodes into logical, energy-efficient segments that
minimise unnecessary communication and enhance the accuracy of query
responses. Compared to the original Watershed algorithm, the key
enhancement in PA Watershed is that travelling along the steepest
paths and labelling run concurrently and locally, according to node
states, throughout the complete segmentation process. Some
shortcomings of the work remain to be addressed in future work, for
instance a detailed study of segmentation setup and maintenance
cost.
11. Future work
Leading directly from the work in this paper, there are a number of
avenues to be followed. Most importantly, we intend to investigate
the usefulness of the Watershed gradients for various network and
application functions. For instance, in agent-oriented WSN systems
such as [12], the gradient can be used for itinerary planning of
mobile agents. The order in which sensor nodes are visited by a
mobile agent and the number of nodes it migrates to can have a
significant impact on energy consumption. Similarly, the gradient
can be used to generate more efficient query execution plans, i.e.,
the number of nodes to process a query and the order in which the
query reaches these nodes. For example, if the temperature at
location (x, y) is 50 °C, then the temperatures at nearby locations
should be correlated with it according to distance. The presented
work uses this natural gradient as a key characteristic for
forwarding the query towards the heat source. Selecting an optimal
subset of sensor nodes and deciding on an optimal order in which to
incorporate their measurements can equally well be supported by
other methods, such as exploiting the relationships among
multi-modal sensed data. As information about a particular event of
interest is usually captured in multiple sensed modalities, it is
feasible to exploit multi-modal relationships to merge query
dissemination and data collection for different modalities.
Mathematically, this can be viewed as calculating a gradient, i.e.,
the derivative of a multi-variable function. However, there are now
multiple directions to consider: the direction of the gradient is no
longer simply up or down along the x-axis, as it is for a function
of a single variable. The Watershed gradients can also be useful for
WSN applications such as flood management. The gradient can be
utilised to create high-risk flood maps to predict, report, and
control floods. These maps contain answers to queries relevant to
such tasks; for example, when rain falls on a catchment, what amount
of rainwater reaches the waterways?
We also intend to investigate possible ways to exploit the gradient
to give meaning to the collected data. For instance, in forest fire
monitoring applications, the gradient at each point shows the
direction in which the temperature rises most quickly, and its
magnitude determines how fast the temperature rises in that
direction. This information can be useful for firefighting
operations. By taking a dot product with a unit vector, it is
possible to measure how the temperature changes in directions other
than the direction of greatest change. Moreover, the gradient can be
used as a visualisation tool similar to a vector map. It can be used
to visually display the locations of sensor nodes as well as the
direction and magnitude of the data. This representation can be
overlaid on geographic maps to allow users from different
communities to draw conclusions based on visual information. Such a
visualisation and analysis tool is communicative: it provides
information on spatial patterns and indicates the distributions and
states of various phenomena and the relations between them.
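The gradient operations described above can be sketched with NumPy on a regularly gridded temperature field, for example one interpolated from node readings (an assumption, since the sensor field itself is irregular). `np.gradient` returns finite-difference derivatives along rows (y) and then columns (x); the magnitude gives the rate of steepest increase, and the dot product with a unit vector gives the directional derivative. The function names are hypothetical.

```python
import numpy as np


def temperature_gradient(temps, spacing=1.0):
    """Finite-difference gradient of a gridded field; rows index y, columns index x."""
    dT_dy, dT_dx = np.gradient(np.asarray(temps, dtype=float), spacing)
    return dT_dx, dT_dy


def directional_derivative(dT_dx, dT_dy, direction):
    """Rate of temperature change along `direction`: the dot product grad(T) . u."""
    u = np.asarray(direction, dtype=float)
    ux, uy = u / np.linalg.norm(u)
    return dT_dx * ux + dT_dy * uy


# Usage: a synthetic linear field T(x, y) = 2x + y on a 5 x 5 grid.
y, x = np.mgrid[0:5, 0:5]
temps = 2.0 * x + y
gx, gy = temperature_gradient(temps)
magnitude = np.hypot(gx, gy)                 # rate of steepest increase
along_diag = directional_derivative(gx, gy, (1, 1))
print(magnitude[2, 2], along_diag[2, 2])
```

For this linear field the gradient is (2, 1) everywhere, so the steepest rise is √5 per unit distance, while the rise along the diagonal (1, 1) is only 3/√2.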
References
[1] B. Liu, H. Ju, Y. Yao, Object recognition and centroid detection based on
machine vision, in: Second International Conference on Mechanic Automation
and Control Engineering, 2011, pp. 5945–5947.
Fig. 6. The number of nodes that responded by sending their data to resolve the query.
Fig. 7. Watershed and Logical Neighborhoods convergence time with logarithmic
regression.
[2] F2F, RFID from Farm to Fork, 2013. <http://www.rd-f2f.eu/> (accessed 15.03.13).
[3] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm based
on immersion simulations, IEEE Trans. Pattern Anal. Mach. Intell. 13 (6) (1991)
583–598.
[4] A. Pathak, V.K. Prasanna, Energy-efficient task mapping for data-driven sensor
network macroprogramming, IEEE Trans. Comput. 59 (7) (2010) 955–968.
[5] T.W. Hnat, T.I. Sookoor, P. Hooimeijer, W. Weimer, K. Whitehouse, A modular
and extensible macroprogramming compiler, in: Proceedings of the 2010 ICSE
Workshop on Software Engineering for Sensor Network Applications, SESENA
'10, 2010, pp. 49–54.
[6] A. Pathak, M.K. Gowda, Srijan: a graphical toolkit for sensor network
macroprogramming, in: Proceedings of the 7th Joint Meeting of the
European Software Engineering Conference and the ACM SIGSOFT
Symposium on The Foundations of Software Engineering, ESEC/FSE '09, 2009,
pp. 301–302.
[7] T.I. Sookoor, T.W. Hnat, K. Whitehouse, Programming cyber-physical systems
with macrolab, in: Proceedings of the 6th ACM Conference on Embedded
Network Sensor Systems, SenSys '08, 2008, pp. 363–364.
[8] R. Newton, G. Morrisett, M. Welsh, The regiment macroprogramming system,
in: Proceedings of the 6th International Conference on Information Processing
in Sensor Networks, IPSN '07, 2007, pp. 489–498.
[9] L. Mottola, G.P. Picco, Using logical neighborhoods to enable scoping in
wireless sensor networks, in: Proceedings of the 3rd International Middleware
Doctoral Symposium, 2006, pp. 6–12.
[10] L. Mottola, G.P. Picco, Programming wireless sensor networks: fundamental
concepts and state of the art, ACM Comput. Surv. 43 (3) (2011) 19:1–19:51.
[11] F. Bellifemine, G. Fortino, R. Giannantonio, R. Gravina, A. Guerrieri, M. Sgroi,
Spine: a domain-specific framework for rapid prototyping of WBSN
applications, Softw. Pract. Exp. 41 (3) (2011) 237–265. http://dx.doi.org/10.1002/spe.998.
[12] F. Aiello, G. Fortino, R. Gravina, A. Guerrieri, A java-based agent platform for
programming wireless sensor networks, Comput. J. 54 (3) (2011) 439–454.
http://dx.doi.org/10.1093/comjnl/bxq019.
[13] Oracle Labs, Small Programmable Object Technology (Sun SPOT), 2012.
<http://www.sunspotworld.com/> (accessed 26.09.12).
[14] N. Raveendranathan, S. Galzarano, V. Loseu, R. Gravina, R. Giannantonio, M.
Sgroi, R. Jafari, G. Fortino, From modeling to implementation of virtual sensors
in body sensor networks, IEEE Sens. J. 12 (3) (2012) 583–593.
http://dx.doi.org/10.1109/JSEN.2011.2121059.
[15] D. Roggen, C. Lombriser, M. Rossi, G. Tröster, Titan: an enabling framework for
activity-aware pervasive apps in opportunistic personal area networks,
EURASIP J. Wirel. Commun. Netw. (2011) 1:1–1:22.
http://dx.doi.org/10.1155/2011/172831.
[16] K. Whitehouse, C. Sharp, E. Brewer, D. Culler, Hood: a neighborhood
abstraction for sensor networks, in: Proceedings of the 2nd International
Conference on Mobile Systems, Applications, and Services, 2004, pp. 99–110.
[17] M. Welsh, G. Mainland, Programming sensor networks using abstract regions,
in: Proceedings of the 1st Conference on Symposium on Networked Systems
Design and Implementation, vol. 1, 2004, pp. 3–3.
[18] R. Gummadi, N. Kothari, R. Govindan, T. Millstein, Kairos: a macro-
programming system for wireless sensor networks, in: Proceedings of the
Twentieth ACM Symposium on Operating Systems Principles, SOSP '05, 2005,
pp. 1–2.
[19] T. Abdelzaher, B. Blum, Q. Cao, Y. Chen, D. Evans, J. George, S. George, L. Gu, T.
He, S. Krishnamurthy, L. Luo, S. Son, J. Stankovic, R. Stoleru, A. Wood,
Envirotrack: Towards an environmental computing paradigm for distributed
sensor networks, in: Proceedings of the 24th International Conference on
Distributed Computing Systems (ICDCS'04), 2004, pp. 582–589.
[20] L. Mottola, G.P. Picco, Logical neighborhoods: a programming abstraction for
wireless sensor networks, in: Proceedings of the Second IEEE International
Conference on Distributed Computing in Sensor Systems, Springer-Verlag,
Berlin, Heidelberg, 2006, pp. 150–168.
[21] V. Osma-Ruiz, J.I. Godino-Llorente, N. Sáenz-Lechón, P. Gómez-Vilda, An
improved watershed algorithm based on efficient computation of shortest
paths, Pattern Recognit. Lett. 40 (2007) 1078–1090.
[22] C. Kuo, S. Odeh, M. Huang, Image segmentation with improved watershed
algorithm and its FPGA implementation, in: The 2001 IEEE International
Symposium on Circuits and Systems, vol. 2, 2001, pp. 753–756.
[23] A. Bleau, L.J. Leon, Watershed-based segmentation and region merging,
Comput. Vis. Image Underst. 77 (2000) 317–370.
[24] A. Bieniek, A. Moga, An efficient watershed algorithm based on connected
components, Pattern Recognit. Lett. 33 (6) (2000) 907–916.
[25] V. Grau, A.U.J. Mewes, M. Alcaniz, R. Kikinis, S.K. Warfield, Improved
watershed transform for medical image segmentation using prior
information, IEEE Trans. Med. Imaging 23 (4) (2004) 447–458.
[26] H. Sun, J. Yang, M. Ren, A fast watershed algorithm based on chain code and its
application in image segmentation, Pattern Recogn. Lett. 26 (2005) 1266–1274.
[27] C. Rambabu, T. Rathore, I. Chakrabarti, A new watershed algorithm based on
hillclimbing technique for image segmentation, in: Conference on Convergent
Technologies for the Asia-Pacific Region, vol. 4, 2003, pp. 1404–1408.
[28] M. S