Neural Network Tomography


Liang Ma, Member, IEEE, Ziyao Zhang, Student Member, IEEE, and Mudhakar Srivatsa, Senior Member, IEEE

arXiv:2001.02942v1 [cs.NI] 9 Jan 2020

Abstract—Network tomography, a classic research problem in the realm of network monitoring, refers to the methodology of inferring unmeasured network attributes using selected end-to-end path measurements. In the research community, network tomography is generally investigated under the assumptions of known network topology, correlated path measurements, a bounded number of faulty nodes/links, or even special network protocol support. These strong assumptions considerably constrain the applicability of network tomography and therefore frequently confine it to the theoretical world. In this regard, we revisit network tomography from a practical perspective by establishing a generic framework that relies neither on any of these assumptions nor on the type of performance metric. Given only the end-to-end path performance metrics of sampled node pairs, the proposed framework, NeuTomography, utilizes deep neural networks and data augmentation to predict the unmeasured performance metrics by learning the non-linear relationships between node pairs and the underlying unknown topological/routing properties. In addition, NeuTomography can be employed to reconstruct the original network topology, which is critical to most network planning tasks. Extensive experiments using real network data show that, compared to baseline solutions, NeuTomography can predict network characteristics and reconstruct network topologies with significantly higher accuracy and robustness using only limited measurement data.

I. INTRODUCTION

Accurate and timely knowledge of network states (e.g., delays and congestion levels on individual paths) is essential for various network operations such as route selection, resource allocation, fault diagnosis, and service migration. Directly measuring the path quality with respect to (w.r.t.) each individual node pair, however, is costly and not always feasible due to the large traffic overhead of the measurement process and the lack of protocol support at internal network elements for making such measurements [1]. These limitations motivate indirect approaches, in which we infer the network states of interest by measuring the performance along selected paths w.r.t. a subset of node pairs.

Depending on the granularity of observations, indirect approaches can be classified as hop-by-hop or end-to-end approaches. The former rely on special diagnostic tools such as traceroute, pathchar [2], Network Characterization Service (NCS) [3], or Time Series Latency Probes (TSLP) [4] to reveal fine-grained performance metrics of individual links by sending active probes. Specifically, traceroute reports the delay of each hop on the probed path by gradually increasing the time-to-live (TTL) [5] field of probing packets. Its refinement, pathchar, returns hop-by-hop delays and loss rates. Later advancements, NCS and TSLP, return the capacity of each link. While providing fine-grained information, the above tools require that the Internet Control Message Protocol (ICMP) be supported at each internal node. Even then, they cause extra load and suffer from inaccuracies caused by the different priorities of ICMP/data packets. In risk-sensitive applications, security policies may even block such measurements.

Alternatively, the end-to-end approach provides a solution that requires neither the cooperation of internal network elements nor the equal treatment of control/data packets. It relies on end-to-end path performance metrics (e.g., path delays or bandwidths) experienced by data packets to infer the unknown network information using network tomography. Formally, network tomography [6] refers to the methodology of inferring unmeasured network characteristics by measuring the end-to-end performance of selected paths.

As a generic case, network tomography can be used to infer path performance metrics for any unmeasured node pair, which covers link performance metric inference, since links essentially correspond to 1-hop paths. In real networks, the path performance metrics of interest can be additive, i.e., the combined metric over multiple links is the sum of the individual link metrics, or non-additive, i.e., the path metric is a non-linear function of the involved link metrics. For instance, delays are additive, while a multiplicative metric (e.g., packet delivery ratio) can be expressed in an additive form using the function $\log(\cdot)$. By contrast, path congestion levels are determined by the worst-performing link on the path and are thus non-additive. The general goal of network tomography is to utilize the path measurements between a small subset of node pairs to infer the path performance metrics of the remaining unmeasured node pairs.

Existing work on network tomography emphasizes extracting as much network information as possible from the available measurements. However, past experience shows that network tomography remains mostly at the theoretical level under strong assumptions, which makes it difficult to apply to real network monitoring tasks. For example, prior works [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38] almost always require precise knowledge of the network topology, and [39], [40], [41], [42], [15] even assume that the network topology follows a tree structure. Unfortunately, in real applications, network topologies are frequently concealed for security reasons. Furthermore, special routing mechanisms such as source routing are needed in [12], [13], [14], [19], [18], [20], [21], [22], [23], [24], [25], [26], [27], and multicast is employed in [10], [39], [40], [41], [42]; however, practical networks usually do not support such strong routing requirements. Moreover, for additive performance metrics, the inference task often requires not only the end-to-end path performance metric but also the nodes/links involved in each path. For non-additive performance metrics, an upper bound on the number of problematic nodes/links is usually imposed as a common constraint [28], [29], [30], [31], [32], [33], [34],
[35]. Finally, little is known about solutions that simultaneously address both additive and non-additive tomography problems.

In this paper, we establish a generic and lightweight tomography framework that removes all of the above assumptions and constraints and is thus applicable to most practical network settings. The input to our tomography framework is only a set of end-to-end path measurements w.r.t. some node pairs, and the output is the predicted path performance metrics for all unmeasured node pairs. For each input data point, the only available information is the starting/terminating nodes and their corresponding path performance metric. The proposed framework, called NeuTomography, is based on deep neural networks [43], which learn the non-linear relationship between node pairs and their path performance metrics.

Compared to existing tomography solutions, NeuTomography is generic and easily applicable in that it neither requires additional network knowledge nor relies on a specific performance metric type (additive or non-additive). Moreover, since the measured node pairs can potentially be any subset of all node pairs in the network, i.e., there may be measurement bias, we further propose Path Augmented Tomography (PAT), which proactively constructs additional input data by estimating the performance bounds of unmeasured node pairs from the given path measurements. Extensive experiments on both Rocketfuel [44] and CAIDA's ITDK [45] network data show that by measuring only 30% of node pairs, NeuTomography accurately predicts the path performance of the remaining 70% of node pairs, with a mean absolute percentage error (MAPE) as small as 2%, and PAT further reduces MAPE by up to 50%; these results are orders of magnitude better than the benchmarks. Finally, although no information about the network topology is given, NeuTomography provides a solution to efficiently reconstruct the network topology at different granularities using only the given end-to-end node pair measurements, thus giving network operators more insight for resource optimization.

A. Further Discussions on Related Work

Network tomography can be categorized into passive [46] and active (proactive) tomography [8]. Passive tomography infers network performance metrics by passively observing existing traffic attributes [47]. However, passive tomography requires additional assumptions, e.g., correlated performance metrics, to assist the inference task, which limits its applicability. In contrast, active tomography proactively measures some performance metrics; it is more useful in practical network monitoring tasks and is therefore the focus of this paper.

For active tomography, the most important branch is identifying additive performance metrics. Under the assumption of known network topology and known node/link involvement in each measurement path, the problem is formulated as solving a system of linear equations. Yet, even under such assumptions, it is frequently impossible to uniquely identify all unmeasured link metrics from path measurements because the linear system is not always invertible [7], [8], [9], and statistical approaches [40], [1], [48] and multicast [49], [50], [10], [39] are needed to estimate the link metric distributions. When all but $k$ link metrics are zero, compressive sensing techniques are used to identify the $k$ non-zero link metrics [51], [52], [47]. With the additional assumption of controllable routing, [53] derives necessary and sufficient conditions on the network topology for identifying all link metrics, given that monitors can measure any cycle. A similar study in [54] quantifies the minimum number of measurements needed to identify a broader set of link metrics. Moreover, [55], [56], [57] develop measurement vantage placement algorithms for efficient path measurements. Since routing along cycles is typically prohibited, these methods are not widely applicable. If only cycle-free paths can be measured, [12], [13] establish the necessary and sufficient topological conditions for link metric determination. Yet, such cycle-free controllable source routing still has limited support in practice.

When the performance metrics are non-additive, additional constraints are typically imposed. Under the assumption that multiple simultaneous failures occur with negligible probability, [5], [58], [59], [60], [55], [57] aim to detect and localize the bottleneck in the network. To improve the resolution in characterizing failures, range tomography [61] not only localizes the failure but also estimates its severity (e.g., congestion level). These works, however, ignore the fact that multiple failures occur more frequently than one may imagine [62]. To address multiple failures, [41], [42], [63], [64], [65], [66] attempt to find the minimum set of network elements whose failures explain the observed path states. In a Bayesian formulation, [67], [68] estimate the failure probabilities of different links. For binary performance metrics (failed or normal), if the number of failed links is upper bounded by $k$ and the measuring vantages can probe arbitrary cycles or paths, [69], [70], [31] focus on placing measuring vantages and constructing measurement paths to localize a given number of failures. Furthermore, in [28], [29], [30], [32], [33], [34], [35], efficient testing conditions and algorithms are proposed to quantify the capability of localizing node failures. However, for arbitrarily valued non-additive link metrics, few positive results are known. In this regard, we build a neural-network-based tomography framework for such general non-additive performance metrics.

B. Summary of Contributions

Our contributions are four-fold:
1) We propose, for the first time, a deep neural-network-based, generic, and lightweight tomography framework (NeuTomography) for network monitoring tasks that uses only end-to-end path measurements of a subset of node pairs, without requiring any additional assumptions on the network.
2) We build the Path Augmented Tomography (PAT) algorithm, which improves the performance prediction accuracy by using estimated performance bounds as augmented input data.
3) Although no prior knowledge about the network topology is given, we establish a method that uses the proposed tomography framework to reconstruct the network topology.
4) Extensive experiments using real data confirm the high accuracy of NeuTomography in predicting path performance
metrics of unmeasured node pairs; the reconstructed network topologies also exhibit small errors. These results are orders of magnitude better than those of the baseline solutions.

The rest of the paper is organized as follows. Section II formulates the problem. Section III discusses the challenges. Section IV presents the proposed tomography framework. Real network data are employed in Section V for evaluations. Finally, Section VI concludes the paper.

II. PROBLEM STATEMENT

In this paper, we consider the most challenging and practical problem setting for network tomography, in which the strong and/or unrealistic assumptions of prior works are removed.

A. Path Performance Metric

The end-to-end path performance metric w.r.t. a node pair can be broadly classified into two categories:
1) Additive performance metrics: A path performance metric is additive if the combined metric over a path is the sum of all involved individual link metrics. For instance,
• path length (e.g., number of hops on the path) and path delay are directly additive;
• some statistical attributes are additive, e.g., path jitter is the combination of individual link jitters if all link latencies are mutually independent;
• a multiplicative path metric (e.g., packet delivery ratio) can be expressed in an additive form of individual link metrics by using the $\log(\cdot)$ function.
2) Non-additive performance metrics: A path performance metric is non-additive if it is not the sum of all involved individual link metrics. For instance,
• binary path status (normal or failed), i.e., the end-to-end path is normal if all involved links are normal, and failed if at least one link on the path has failed;
• path congestion level and bandwidth, i.e., they are determined by the most problematic link on the path.

In this paper, we do not impose any constraints on the type of path performance metric. Our goal is to establish a generic and practical network tomography framework that is capable of handling any performance metric of interest.

B. Problem Input: Path Measurements

For a network $G$ with $n$ nodes, there are $\binom{n}{2}$ node pairs. Let $V$ denote the set of nodes in $G$ ($|V| = n$) and $\{v_i, v_j\}$ ($v_i, v_j \in V$, $v_i \neq v_j$) a node pair. Then $T = \{\{v_i, v_j\} \mid v_i, v_j \in V, v_i \neq v_j\}$ is the set containing all node pairs in $V$ (i.e., $|T| = \binom{n}{2}$). For a given performance metric of interest (any metric as discussed in Section II-A), suppose we are given the measured end-to-end path performance metric associated with each node pair in a subset $S$ of $T$, i.e., $S \subset T$. We then explore how to infer the unmeasured path information as accurately as possible based purely on this available set of path measurements. To make our problem more practical and applicable to real networks, we neither constrain the way the set of path measurements is obtained nor make additional assumptions. Specifically,
1) For any node pair $\{v_i, v_j\} \in S$, we only know the performance metric of the end-to-end path between $v_i$ and $v_j$. Besides the end-points $v_i$ and $v_j$, we have no knowledge of which other nodes are on this path.
2) The path between node pair $\{v_i, v_j\}$ in $S$ is constructed via the underlying routing protocol(s), which are unknown.
3) We do not know the network topology or even the number of links in the network.
4) No constraints are imposed on how the given node pair set $S$ is obtained; therefore, $S$ can potentially be any subset of $T$. Hence, the path metric distribution associated with node pairs in $S$ might differ from the actual path metric distribution of node pairs in $T$. In this paper, we study how such path metric distribution inconsistency between $S$ and $T$ affects the accuracy of the network tomography task.

In sum, as input, we are only given the basic end-to-end path information for a set of node pairs. We then investigate whether such limited information is sufficient to reveal unmeasured network information; see Figure 1 for an illustration of the problem.

Figure 1. Network Tomography in a Sample Network. [A six-node network ($v_1, \dots, v_6$) whose topology is unknown. Given: end-to-end path measurements (intermediate traversed nodes unknown) of node pairs $\{v_1, v_2\}$, $\{v_1, v_4\}$, $\{v_2, v_6\}$, $\{v_3, v_4\}$. Question: what are the path performance metrics of node pairs $\{v_1, v_3\}$, $\{v_2, v_5\}$, $\{v_1, v_5\}$, $\{v_3, v_6\}$, $\{v_1, v_6\}$, $\{v_4, v_5\}$, $\{v_2, v_3\}$, $\{v_4, v_6\}$, $\{v_2, v_4\}$, $\{v_5, v_6\}$, $\{v_3, v_5\}$?]

Remark: Though the measured node pair set $S$ can be any subset of $T$, $S$ needs to cover all nodes in the network. In other words, each node in $V$ appears at least once in a node pair of $S$, i.e., $\bigcup_{\phi \in S} \phi = V$. This is because if some node $w \in V$ is not covered by $S$, then we do not know that $w$ exists; therefore, there can be no request to infer the performance metric of a path starting or terminating at $w$.

C. Objective

For a performance metric of interest (e.g., delay, congestion level, etc.), suppose we are only given the measured end-to-end path performance metrics w.r.t. all node pairs in set $S$. Our first objective is to develop a generic framework to infer the end-to-end path performance metrics for all unmeasured node pairs, i.e., those in set $T \setminus S$ (recall that $T$ is the complete node pair set). Our second objective is to reconstruct the original network topology based on the given path performance measurements of node pairs in set $S$ when the type of performance metric allows; see Section IV-C for details.

Discussions: In practical networks, generally only a small portion of end-to-end path measurements is available. Therefore, for the first objective, we aim to establish a framework that exhibits high accuracy when $|S|$ is small and is thus applicable to real network monitoring tasks. Note that in cases where
some node pairs in $T \setminus S$ are directly connected by links, i.e., they are neighbors in the underlying unknown network topology, their inferred path performance metrics correspond to their link performance metrics if these links are selected for routing. As for the second objective, its significance is that topological information is critical to many network applications and operations, e.g., traffic engineering and fault localization.

III. PROBLEM CHALLENGES AND RELATIONS TO CLASSICAL NETWORK TOMOGRAPHY PROBLEMS

Our problem formulation in Section II is a generalization of classical network tomography problems. In particular, let $c$ denote the path performance metric vector of all node pairs in set $S$. Then the network tomography problem can be described by

$$R \otimes w = c, \qquad (1)$$

where $w$ is the performance metric vector of all links in the network, with entry $w_i$ denoting the performance metric of link $l_i$; $R = (R_{i,j})$ is called the routing matrix, with each entry $R_{i,j} \in \{0, 1\}$ representing whether link $l_j$ is present on the path between the $i$-th node pair in $S$; and $\otimes$ is the operator indicating how link metrics are related to the end-to-end path metrics. In particular, the meaning of $\otimes$ depends on the problem considered, as described below.
1) Additive metric tomography: For this class of problems, the common assumption is that $R$ and $c$ are both known, and $\otimes$ is simply matrix multiplication.
2) Boolean metric tomography: For Boolean metric tomography, all performance metrics are binary, where 0 represents "normal" and 1 represents "failed". In this case, $\otimes$ is the Boolean matrix product, i.e., $c_i = \bigvee_j (R_{i,j} \wedge w_j)$.
3) Congestion level (or bandwidth) tomography: For congestion level tomography, the operator $\otimes$ finds the most congested link on the given path, i.e., $c_i = \max_j (R_{i,j} w_j)$. For bandwidth tomography, $\otimes$ finds the link with the minimum bandwidth, i.e., $c_i = \min_{j:\, R_{i,j} = 1} w_j$, where the minimum is taken only over the links on the path.

Discussions: For all these classical problems, $R$ is assumed to be known and the number of faulty links is generally bounded. Then, $w$ and $R'$, the routing matrix corresponding to the unmeasured node pairs in $T \setminus S$, are inferred for computing the path performance metrics of node pairs in $T \setminus S$ via $R' \otimes w$. In comparison, our problem setting is more relaxed. Without any assumptions on $w$ or $R$, we are tasked with determining $R' \otimes w$ directly from the given $c$. This problem is extremely challenging, as we are only given $c$, and the number of entries in $w$ is unknown since we have no knowledge about the network topology. We design our network tomography framework to infer $R' \otimes w$ directly because even if the metrics $w$ were somehow inferred, they would still not be sufficient to determine the end-to-end path metrics: without knowing the underlying principles that govern end-to-end routing, $w$ cannot uniquely determine the end-to-end path performance. Hence, we generalize classical tomography problems by removing all assumptions made on $w$ and $R$.

Figure 2. Neural-network-based network tomography framework (NeuTomography). [An $n$-dimensional input vector with two ones feeds hidden layer 1 through weight matrix $M_{n \times \gamma}$; hidden layers 1, 2, ..., $k$ are each $\gamma$-dimensional and connected by $M_{\gamma \times \gamma}$ matrices; hidden layer $k$ connects through $M_{\gamma \times 1}$ to a single summation ($\Sigma$) neuron in the output layer.]

IV. GENERIC NETWORK TOMOGRAPHY FRAMEWORK

In this section, we propose a neural-network-based network tomography framework to address our objectives.

A. Neural-Network-Based Tomography Framework (NeuTomography)

Neural networks have been shown to be exceptionally powerful in the field of machine learning [43]. Moreover, the universal approximation theorem [71] shows that a neural network can approximate any continuous function on a compact domain arbitrarily well as long as its hidden layer consists of a sufficient number of neurons. As such, we use neural networks to address the potential non-linearity in our network tomography problem.

A neural network is a mathematical architecture whose training variables are continuous. In contrast, both routing matrices $R$ and $R'$ in our problem are binary (see the discussions in Section III). Nevertheless, our ultimate goal is to determine $R' \otimes w$ rather than the individual $R'$ and $w$. Therefore, we propose to relax the binary values in the routing matrices $R$ and $R'$ to continuous values ranging from 0 to 1, thus forming stochastic routing matrices, denoted by $\tilde{R}$ and $\tilde{R}'$, respectively. Each entry $\tilde{R}_{i,j}$ in $\tilde{R}$ (or $\tilde{R}'_{i,j}$ in $\tilde{R}'$) then indicates the probability that link $l_j$ lies on the path connecting the $i$-th node pair in $S$ (or $T \setminus S$). Using this routing matrix relaxation, we propose a neural-network-based network tomography framework, called NeuTomography, as shown in Figure 2.

Figure 2 shows a neural network model consisting of $k$ fully connected hidden layers [43], each containing $\gamma$ neurons. Here, $\gamma$ is the estimated number of links in the network. Note that the exact number of links is unknown; we discuss later how the value of $\gamma$ is estimated and how $\gamma$ affects the inference accuracy. In this model, each input is a node pair from set $S$. At the input layer, the node pair, say $v_1$ and $v_2$, is mapped to an $n$-dimensional "one-hot" vector $v_0$ (recall that $n$ is the total number of nodes), in which only the positions corresponding to $v_1$ and $v_2$ are 1 and all others are 0. Next, as in typical fully connected neural networks, $v_0^T$ is multiplied by an $n \times \gamma$ matrix $M_1$ and then added to a bias vector $b_1$. The resulting $v_0^T M_1 + b_1^T$ is passed to hidden layer 1 and taken as input by an activation function [43] $\sigma(\cdot)$, i.e.,
hidden layer 1 outputs $\sigma(v_0^T M_1 + b_1^T)$. Each of the subsequent hidden layers has the same activation function $\sigma(\cdot)$ and operates in the same way. Thus, letting the output vector from hidden layer $j-1$ be $v_{j-1}$, the output from hidden layer $j$ ($j \le k$) is $v_j^T = \sigma(v_{j-1}^T M_j + b_j^T)$, where $M_j$ is a $\gamma \times \gamma$ weight matrix between hidden layers $j-1$ and $j$, and $b_j$ is the corresponding bias vector. Finally, $v_k^T$ generated by hidden layer $k$ is multiplied by a $\gamma \times 1$ weight vector $m$, and $v_k^T m$ is the final path performance metric between the input node pair $v_1$ and $v_2$ at the output layer (which has only one neuron and no activation function or bias).

Design Intuitions of NeuTomography: In NeuTomography, we select the sigmoid function [43] as the activation function $\sigma(\cdot)$ across all hidden layers so that each hidden-layer output, which lies between 0 and 1, can represent the probability of a link appearing on the path. The design intuition is as follows:
1) When performance metrics are additive, for the input node pair, say the $i$-th node pair in set $S$, the purpose of all hidden layers is intuitively to compute the $i$-th row $\tilde{R}_{i,:}$ of the stochastic routing matrix $\tilde{R}$ (each entry value is between 0 and 1). The weight vector $m$ connecting to the output layer then represents the approximated metrics of all links in the network. Moreover, the mapping from node pairs to the routing matrix is highly complicated and likely to be non-linear. Therefore, we use multiple ($k$) hidden layers to capture such relations, where each additional layer refines the probability of each link appearing on a particular path.
2) When performance metrics are non-additive, e.g., congestion level and bandwidth, the design intuition is that for the $i$-th input node pair, the goal of the $k$-th hidden layer is to output a "one-hot" vector $v_k$ with 1 in exactly one position and 0 elsewhere. In this "one-hot" vector, the position with value 1 corresponds to the most problematic link for congestion level and bandwidth tomography. On the other hand, since our objective is to accurately predict the product of $v_k^T$ and $m$, and $v_k$ and $m$ are not unique for the same product, we only need to train the neural network model for this product instead of the individual $v_k$ and $m$, which eases the training process.

Discussions: Our goal is to predict $R' \otimes w$ by only using $R \otimes w$, where $R$ and $R'$ are related by the underlying routing protocol(s), and no prior knowledge about $R$, $R'$, or $w$ is available. The gist of NeuTomography is to capture such relations among $R$, $R'$, and $w$ through the estimated number of links $\gamma$ and multiple hidden layers. In Section V, through extensive experiments, we show that by using only a small portion of measurements as the input data, NeuTomography is capable of learning accurate relations between $R$ and $R'$ irrespective of the type of performance metric. Moreover, we show that NeuTomography is robust against estimation errors in the number of links ($\gamma$).

B. Path Measurement Augmentation

In Section IV-A, NeuTomography purely utilizes the given measurements of node pairs in $S$ to predict the path performance of node pairs in $T \setminus S$. However, the measured performance metric distribution might differ from the actual performance metric distribution, as $S$ can be any subset of $T$; moreover, only a small percentage of node pairs is measured for the training data (e.g., $|S|/|T| \le 30\%$ in the experiments in Section V), which potentially causes model overfitting [43]. For instance, if the input data only include node pairs that are less than 3 hops apart, then the predicted distances for unmeasured node pairs are also at most 3 hops, even though the network diameter (which is not directly given in the input data) might be substantially larger than 3. As such, we propose an algorithm that leverages $S$ to construct additional input data to improve the prediction accuracy.

1) Motivation and Algorithm Sketch: For each node pair $\phi \in S$, let $d_\phi$ denote the measured path performance metric w.r.t. $\phi$, and let $V$ be the set of nodes appearing in the node pairs of set $S$. Then, the measurement data can be directly mapped to a weighted graph $G' = (V, S, \{d_\phi\}_{\phi \in S})$, where $V$ is the set of vertices, $S$ specifies the end-points of all edges in $G'$, and $\{d_\phi\}_{\phi \in S}$ (i.e., the path performance metrics w.r.t. $S$ in the input data) are the corresponding edge weights in $G'$. If $G'$ is a connected graph, then for each node pair $\mu$ in $T \setminus S$ there exists a path $P_\mu$ in $G'$ connecting the nodes in $\mu$. If $G'$ is disconnected and some node pairs are not connected by any path in $G'$, then these node pairs are not selected for augmented input data. Our idea is to use the performance metric of $P_\mu$, denoted by $\tilde{d}_\mu$, as the estimate of the real path metric of the unmeasured node pair $\mu$, and feed $(\mu, \tilde{d}_\mu)$ to NeuTomography as additional data. Then $\tilde{d}_\mu$ is updated iteratively using its initial estimate and the value predicted by NeuTomography. Lastly, the refined $\tilde{d}_\mu$ is returned as the final inferred path performance metric for $\mu \in T \setminus S$. Based on this idea, we propose a tomography algorithm with augmented data, called Path Augmented Tomography (PAT).

2) Path Augmented Tomography (PAT): The complete PAT algorithm is presented in Algorithm 1. In PAT, we first compute the path performance estimate $\tilde{d}_\mu$ for each node pair $\mu$ in $T \setminus S$ in lines 1–3. Path performance estimation is carried out such that $\tilde{d}_\mu$ corresponds to the path with the best performance metric on $G'$ w.r.t. the given tomography task. Next, with these path performance estimates, we iteratively train the neural network framework. Specifically, from the $|T \setminus S|$ unmeasured node pairs, $\alpha|T \setminus S|$ random node pairs with estimated path performance values are selected (line 5) and combined with the given measurement data (line 6) as the augmented training data to train NeuTomography (line 7). After this training process, the path performance estimates $\{\tilde{d}_\mu\}_{\mu \in T \setminus S}$ are updated in lines 8–10. In particular, the parameter $\beta$ ($0 \le \beta < 1$) is employed in line 9 to balance the estimated and the predicted values so as to avoid overfitting. Note that $\beta$ in line 9 equals zero for node pairs that are not within the same component when $G'$ is disconnected. This training and value-updating process is repeated until the maximum number of iterations is reached. Finally, Algorithm 1 outputs $\{\tilde{d}_\mu\}_{\mu \in T \setminus S}$.

Discussions of Algorithm 1: There are two key operations in Algorithm 1: the first "foreach" loop, which initializes the path performance metric estimates for unmeasured node pairs based on $G'$, and the second "while" loop, which iteratively updates these estimates using predictions made by NeuTomography.
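The NeuTomography forward pass of Section IV-A can be sketched as follows. This is a minimal, dependency-free illustration of the forward computation only (input encoding, $k$ sigmoid hidden layers of width $\gamma$, and a bias-free linear output), written under our own assumptions rather than taken from the authors' implementation: the helper names `pair_input`, `dense`, `init_params`, and `forward` and the random Gaussian initialization are hypothetical, and training the weights on the measured pairs in $S$ (e.g., by gradient descent on a regression loss) is omitted.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight matrix M_j, bias b_j)

def sigmoid(x: float) -> float:
    """Numerically stable logistic function; output lies in (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def pair_input(n: int, i: int, j: int) -> List[float]:
    """Input vector v_0: n entries, 1 at the two end-nodes of the pair, 0 elsewhere."""
    v = [0.0] * n
    v[i] = 1.0
    v[j] = 1.0
    return v

def dense(v: List[float], M: List[List[float]], b: List[float]) -> List[float]:
    """Affine map v^T M + b^T, where M has len(v) rows and len(b) columns."""
    return [sum(v[r] * M[r][c] for r in range(len(v))) + b[c]
            for c in range(len(b))]

def init_params(n: int, gamma: int, k: int,
                seed: int = 0) -> Tuple[List[Layer], List[float]]:
    """Hypothetical random initialization: k hidden layers of width gamma plus m."""
    rng = random.Random(seed)
    layers: List[Layer] = []
    fan_in = n                                   # M_1 is n x gamma
    for _ in range(k):
        M = [[rng.gauss(0.0, 1.0 / math.sqrt(fan_in)) for _ in range(gamma)]
             for _ in range(fan_in)]
        layers.append((M, [0.0] * gamma))
        fan_in = gamma                           # M_2, ..., M_k are gamma x gamma
    m = [rng.gauss(0.0, 1.0) for _ in range(gamma)]  # gamma x 1 output weights
    return layers, m

def forward(v0: List[float], layers: List[Layer], m: List[float]) -> float:
    """v_j^T = sigmoid(v_{j-1}^T M_j + b_j^T); output v_k^T m (no bias/activation)."""
    v = v0
    for M, b in layers:
        v = [sigmoid(x) for x in dense(v, M, b)]
    return sum(vi * mi for vi, mi in zip(v, m))
```

For example, `forward(pair_input(n, i, j), *init_params(n, gamma, k))` returns the (untrained) predicted metric for node pair $\{v_i, v_j\}$. Keeping the output layer free of bias and activation mirrors the text: each sigmoid output plays the role of a relaxed entry of $\tilde{R}_{i,:}$, and $m$ plays the role of the link metrics.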

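Algorithm 1 can likewise be sketched in Python. This sketch assumes an additive metric for which smaller is better (e.g., hop count or delay), so the initial estimate $\tilde{d}_\mu$ of lines 1–3 is taken as the minimum-weight path in $G'$, computed with Dijkstra's algorithm; if routes are shortest paths w.r.t. the measured metric, this estimate upper-bounds the true value by the triangle inequality. The `fit` argument is a hypothetical stand-in for training NeuTomography (line 7): it receives the augmented training data and returns a predictor playing the role of $NT(\cdot)$. Pairs lying in different components of $G'$ are never sampled into $A$ and are updated with $\beta = 0$, as described above.

```python
import heapq
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Pair = FrozenSet[int]
Predictor = Callable[[Pair], float]

def shortest_path_estimates(measured: Dict[Pair, float],
                            nodes: List[int]) -> Dict[Pair, float]:
    """Lines 1-3: estimate each unmeasured pair by the minimum-weight path in G'
    (float('inf') if the two nodes lie in different components of G')."""
    adj: Dict[int, List[Tuple[int, float]]] = {v: [] for v in nodes}
    for pair, d in measured.items():
        a, b = tuple(pair)
        adj[a].append((b, d))
        adj[b].append((a, d))
    est: Dict[Pair, float] = {}
    for src in nodes:
        dist = {src: 0.0}
        heap = [(0.0, src)]
        while heap:  # Dijkstra from src
            du, u = heapq.heappop(heap)
            if du > dist[u]:
                continue
            for v, w in adj[u]:
                if du + w < dist.get(v, float("inf")):
                    dist[v] = du + w
                    heapq.heappush(heap, (du + w, v))
        for dst in nodes:
            pair = frozenset((src, dst))
            if src < dst and pair not in measured:
                est[pair] = dist.get(dst, float("inf"))
    return est

def pat(measured: Dict[Pair, float], nodes: List[int],
        fit: Callable[[List[Tuple[Pair, float]]], Predictor],
        alpha: float = 0.3, beta: float = 0.5,
        iterations: int = 10, seed: int = 0) -> Dict[Pair, float]:
    rng = random.Random(seed)
    est = shortest_path_estimates(measured, nodes)
    disconnected = {p for p, d in est.items() if d == float("inf")}
    reachable = [p for p in est if p not in disconnected]
    for _ in range(iterations):                                 # line 4
        k = min(int(alpha * len(est)), len(reachable))
        augmented = rng.sample(reachable, k)                     # line 5
        training = list(measured.items()) + [(p, est[p]) for p in augmented]  # line 6
        predict = fit(training)                                  # line 7 (stand-in)
        for p in est:                                            # lines 8-10
            if p in disconnected:
                est[p] = predict(p)          # beta = 0 across components
            else:
                est[p] = beta * est[p] + (1 - beta) * predict(p)
    return est                                                   # line 12
```

Passing a constant stand-in such as `fit = lambda data: (lambda pair: 2.0)` makes the update rule of line 9 easy to check by hand; in practice `fit` would train NeuTomography on `training` and return its prediction function.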
Algorithm 1: Path Augmented Tomography (PAT)
input: Path performance measurements for each node pair in $S$: $\{(\phi, d_\phi)\}_{\phi \in S}$; proportion of additional unknown data $\alpha$ ($0 < \alpha < 1$); update weight $\beta$ ($0 \le \beta < 1$)
output: Inferred path performance metric for each node pair in $T \setminus S$
1  foreach node pair $\mu$ in $T \setminus S$ do
2      $\tilde{d}_\mu \leftarrow$ path performance estimation for $\mu$ using the input measurement data $\{(\phi, d_\phi)\}_{\phi \in S}$;
3  end
4  while the maximum number of iterations is not reached do
5      randomly choose $\alpha|T \setminus S|$ node pairs from $T \setminus S$ to form the augmented node pair set $A$;
6      training data $= \{(\phi, d_\phi)\}_{\phi \in S} \cup \{(\theta, \tilde{d}_\theta)\}_{\theta \in A}$;
7      use the above training data to train NeuTomography;
8      foreach $\mu \in T \setminus S$ do
9          $\tilde{d}_\mu \leftarrow \beta \cdot \tilde{d}_\mu + (1 - \beta) \cdot NT(\mu)$;  // $NT(\mu)$: the path performance metric for $\mu$ predicted by NeuTomography after training in line 7
10     end
11 end
12 return $\tilde{d}_\mu$ for each $\mu \in T \setminus S$;

Table I
AS TOPOLOGIES IN ROCKETFUEL AND ITDK

AS      # nodes   D    ASPL   unknown # links   unknown link weights (mean / var)   unknown link congestion levels (mean / var)
3967    201       11   4.8    434               4.2 / 14.5                          5.2 / 78.0
3257    240       14   5.1    404               4.8 / 18.7                          4.3 / 41.3
1221    318       13   5.0    758               2.0 / 1.1                           2.8 / 9.3
15706   325       7    3.17   874               2.0 / 1.1                           2.8 / 9.3

contains the per-hop information of all traversed nodes; such information, by contrast, is not available in our input data.

Given that only the path measurements for set $S$ are available, among all the performance metrics of interest, only the minimum number of hops between two nodes provides useful information on the underlying network topology. This is because, for other path performance metrics, e.g., delay and congestion level, even if a link exists between two nodes, the link may not be used in any communication path due to its poor performance or routing constraints. Therefore, the link performance metric between two neighboring nodes can be arbitrarily larger or smaller than their path performance metric. Thus, for network topology reconstruction, we only focus in this section on the case where the path performance metric is the number of hops.

However, even when the path measurements w.r.t. set $S$ are hop counts, the corresponding network topology is not unique. For instance, given the same measurements, both topologies in Figure 3 are consistent with them and are equally probable. In this regard, we introduce the following definition to extend the description of network topologies.

Figure 3. [Two candidate network topologies that are both consistent with the same hop-count measurements: 1→2: 1 hop; 2→3: 1 hop; 4→5: 1 hop; 4→6: 1 hop; 1→7: 2 hops; 2→6: 2 hops; 1→5: 3 hops; 3→4: 4 hops.]

Definition 1: For network $G$, $A^{(m)}$ ($m$ is an integer) is the $m$-extended adjacency matrix of $G$, where $A^{(m)}_{i,j} = 1$ if nodes
3 3 5à7: 1 hop (m)
7 7 6à7: 1 hop i and j are m-hop away in G and Ai,j = 0 otherwise.
(a) (b)
When m = 1, the extended adjacency matrix is reduced
Figure 3. Non-uniqueness of network topologies corresponding to the same to a regular adjacency matrix. Comparing A(m) (m > 1)
set of path measurements. to A(1) , A(m) establishes a coarse-grained representation of
the network topology. Specifically, since A(1) is generally not
unique (see Figure 3), if A(m) is accurate when m is small,
raphy while it is being trained. The estimation update process then the collection of the constructed extended adjacency
is regulated by the parameter β which controls the amount of matrices {A(1) , A(2) , . . . , A(m) } jointly determine the net-
new estimations allowed. Such soft update process ensures the work topology with various granularity and accuracy tradeoffs,
stability of the training process of NeuTomography and can be through which the network operators can perform network
found in some existing works on machine learning. Intuitively, planning tasks based on the optimization objectives and the
as the predictions made by NeuTomography become more topology granularity/accuracy preference levels. In Section V,
accurate while it is being trained, the higher accuracy of the we show that although A(1) is likely to be inaccurate, A(2)
predicted data in turn helps the training process. In Section V, already achieves high accuracy for some networks.
we conduct extensive experiments to understand how such Thus, with Definition 1, we reconstruct the m-extended
path augmented tomography and tunable parameters (α, β, adjacency matrix for different m. The reconstruction algo-
and #iterations) affect the performance inference accuracy. rithm for approximating A(m) is simple: For each node pair
{i, j} ∈ T , if the (inferred or given) performance metric of the
path between i and j falls into the region of (m−0.5, m+0.5],
C. Network Topology Reconstruction (m) (m)
then Ai,j = 1; otherwise, Ai,j = 0.
Our network topology reconstruction task is more challeng-
V. E XPERIMENTS
ing than prior traceroute-based router-level topology inference
works [72], [73], [74], [75], [76], [77]. This is because tracer- A. Input Data
oute (may be blocked in real networks for security reasons) We evaluate NeuTomography through extensive experiments
enables a hop-by-hop approach, where each measurement path on Autonomous System (AS) networks from both the Rock-
7

etfuel [44] and ITDK [45] projects, which represent IP- B. Benchmark Solutions
level connections between backbone/gateway routers of several
ASes from major Internet Service Providers (ISPs) around To study the performance of NeuTomography, it is com-
the globe. The parameters of selected networks obtained from pared against the following benchmarks.
these two projects are listed in Table I, where AS15706 is from 1) Minimum Monitor Placement and Determination of All
ITDK and others are from Rocketfuel and the last five columns Identifiable Links (MMP+DAIL). MMP+DAIL [12], [25] is a
are unknown to NeuTomography (as discussed in Section II). state-of-the-art tomographic solution for additive performance
Note that “D” and “ASPL” in Table I stand for network metrics, under the assumption of known network topology
diameter and average shortest path length, respectively; both and controllable cycle-free routing. In particular, MMP places
in terms of the number of hops. Since the Rocketfuel and the minimum number of measuring vantages to ensure all
ITDK projects do not directly provide path measurements, measurement paths are sufficient to accurately compute all link
we consider the following three aspects to generate path metrics. While DAIL determines all links whose performance
measurements using the available network data for evaluating metrics are accurately inferable under the given measuring
NeuTomography. vantages. To employ MMP+DAIL as a benchmark, we test
it under erroneous topological information as follows. Let
Remark: The purpose here is only to provide a method G = (V, E) be the actual network topology (V /E set of
to generate measurement paths to evaluate NeuTomography. nodes/links in G) and G 0 = (V 0 , E 0 ) the perceived network
Besides the end-to-end path metrics of selected node pairs, topology by MMP with the topological information error being
NeuTomography does not know anything about link metrics,  (0 <   1), where V = V 0 and E 6= E 0 . Specifically, for
network topologies, routing strategies, or sampling methods link e ∈ E, e ∈ E 0 with probability 1 − ; for link e ∈ / E,
that are used to generate data as discussed below. e ∈ E 0 with probability . Based on the measuring vantages
placed by MMP in G 0 , DAIL determines all link metrics.
1) Link metrics. Unlike Rocketfuel, ITDK in [45] does not However, links in E \ E 0 are not visible to DAIL and links
provide the link metric information; therefore, for the experi- in E 0 \ E do not exist. Therefore, DAIL only utilizes links
ment purpose, regarding AS15706, its link metric distribution in E ∩ E 0 to construct measurement paths for determining
is approximated by AS1221 in Rocketfuel (which has the link metrics in E 0 . For edges in E 0 whose performance
similar number of nodes). Furthermore, besides these link metrics cannot be uniquely determined, they are assigned
metric information in Table I, to extensively study NeuTo- arbitrary values according to the distribution of the inferable
mography, we also consider two other types of link metrics: link metrics. With these link metrics, if the underlying routing
(i) unweighted link metrics, where there is no link metric, and mechanism is given, then the path performance metric for any
(ii) uniform link metrics, where link metrics in the network node pair is computable.
are uniformly distributed between 1 and 10. 2) Arbitrary-valued Non-additive Metric Identification
(ANMI). For non-additive performance metrics, e.g., conges-
2) Routing strategies. To construct a path between two
tions, most existing tomography approaches [28], [29], [30],
nodes, two routing strategies are employed: (i) Min-Hop
[31], [32], [33], [34], [35] target to localize the problematic
Routing (MHR), where a path incurring the minimum number
links under the assumption of known network topology. Cur-
of hops is selected, and (ii) Best Performance Routing (BPR),
rently, the state-of-the-art approaches are capable of either
where w.r.t. a given performance metric of interest, the path
uniquely locate up to k binary link metrics (normal/failed)
with the best performance metric is selected, e.g., shortest path
[31] or locating only one problematic link and determining its
for the metric of delay, least congested path for the metric of
arbitrary-valued link metric [61]. To the best of our knowledge,
congestion level.
there is no tomographic approach that is capable of handling
3) Sampling methods. When the above 1) and 2) are known, arbitrary-valued non-additive performance metrics without the
the end-to-end path performance metrics can be obtained for constraint of the number of problematic links. In this regard,
all node pairs (in set T ). We then sample a subset S of T to we employ an artificial method called Arbitrary-valued Non-
form the input data. We first consider random sampling, where additive Metric Identification (ANMI) that is similar to range
S is randomly picked from T . Since there may exist constraints tomography, but without the constraint of the number of prob-
on measurable pairs in real networks, we next consider an al- lematic links. Specifically, given a tunable threshold parameter
ternative method, called monitor-based sampling. In monitor- τ , when a link performance metric is less than τ , then this link
based sampling, we first randomly select ρ nodes as monitors; is regarded as normal, and problematic otherwise. Suppose
then each monitor pings all other nodes (both monitors and there exists a method that can uniquely localize all problematic
non-monitors) to measure the end-to-end path performance links in the network when the network topology is known.
between them. Thus, under monitor-based sampling, each node Then assuming we are also given the precise performance
pair in S contains at least one monitor. For each of these metric distributions of normal and problematic links in the
sampling methods, the sampling ratio |S|/|T |, is selected from network, we further estimate the fine-grained link metrics
{20%, 25%, 30%} (the number of monitors ρ under monitor- by generating the estimated values according to these given
based sampling is tuned such that the required |S|/|T | is metric distributions. Finally, similar to MMP+DAIL, the path
reached). All path metrics associated with node pairs in T \ S performance metric for any node pair is computable if the
serve as the testing data. underlying routing mechanism is given.
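To make the two tomographic benchmarks more concrete, the Python sketch below shows (i) how a perceived link set E′ with topological error ε can be drawn from the true link set E, as described for MMP+DAIL, and (ii) an ANMI-style re-estimation in which links are split at τ and each link receives a value drawn from the metric distribution of its class. It is an illustrative sketch under stated assumptions, not the authors' code: the function names `perturb_topology` and `anmi_estimate` are hypothetical, τ is derived from a threshold ratio as min + ratio × (max − min) (following the threshold-ratio definition used with Table VII), and the empirical distribution of the true normal/problematic link metrics stands in for the "precise performance metric distributions" that ANMI assumes. Routing and path-metric computation are omitted.

```python
import itertools
import random
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[int]


def perturb_topology(nodes: Iterable[int], edges: Iterable[Tuple[int, int]],
                     eps: float, seed: int = 0) -> Set[Edge]:
    """Draw a perceived link set E' from the true link set E (V' = V assumed).

    Each true link is kept with probability 1 - eps, and each node pair that
    is not a link is wrongly added with probability eps.
    """
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    rng = random.Random(seed)
    true_edges = {frozenset(e) for e in edges}
    perceived: Set[Edge] = set()
    for u, v in itertools.combinations(sorted(nodes), 2):
        pair = frozenset((u, v))
        p_in = 1.0 - eps if pair in true_edges else eps
        if rng.random() < p_in:
            perceived.add(pair)
    return perceived


def anmi_estimate(link_metrics: Dict[Edge, float], threshold_ratio: float,
                  seed: int = 0) -> Dict[Edge, float]:
    """ANMI-style re-estimation of link metrics (hypothetical helper).

    tau = min + threshold_ratio * (max - min). A link is 'normal' if its
    metric is below tau and 'problematic' otherwise; the true labels play the
    role of the assumed oracle localization. Each link is then assigned a
    value drawn from the empirical metric distribution of its own class.
    """
    rng = random.Random(seed)
    lo, hi = min(link_metrics.values()), max(link_metrics.values())
    tau = lo + threshold_ratio * (hi - lo)
    normal = [v for v in link_metrics.values() if v < tau]
    problematic = [v for v in link_metrics.values() if v >= tau]
    return {link: rng.choice(normal if v < tau else problematic)
            for link, v in link_metrics.items()}


if __name__ == "__main__":
    E = [(0, 1), (1, 2), (2, 3)]
    true_set = {frozenset(e) for e in E}
    E_prime = perturb_topology(range(4), E, eps=0.2, seed=1)
    print("usable by DAIL (E & E'):", sorted(sorted(e) for e in true_set & E_prime))
    print("invisible (E - E'):", sorted(sorted(e) for e in true_set - E_prime))
    print("non-existent (E' - E):", sorted(sorted(e) for e in E_prime - true_set))
```

With ε = 0 the perceived set equals E, and with ε = 1 it is exactly the complement of E, which is a quick sanity check on the sampling logic.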
In addition to MMP+DAIL and ANMI, which are designed specifically for network tomography, we also compare NeuTomography against two solutions established in related areas, described as follows.

3) Non-negative Matrix Factorization (NMF) [78]. NMF is widely used in recommendation systems, where the goal is to complete the non-negative user-item rating matrix via the product of two lower-dimensional matrices. At a high level, recommendation systems and network tomography share similar objectives, as both aim to predict the unknown non-negative entries of a matrix from some given entry values. As such, we use NMF as one benchmark solution.

4) Neural Matrix Factorization (NeuMF) [79]. NeuMF is a neural-network-based solution employing both neural collaborative filtering [80] and matrix factorization for recommendation systems. To use NeuMF as a benchmark, we adapt our measurement data to the input format of NeuMF, which requires the user-item rating to be between 0 (dislike) and 1 (like). For performance metrics where larger values are worse, e.g., delay and the number of hops, we use the reciprocal of the measured path metrics as the output of NeuMF; for performance metrics where larger values are better, e.g., bandwidth and delivery ratio, we normalize the path metrics as the NeuMF output. In this way, path metrics (≥ 1) with superior (or poor) performance are mapped to values close to 1 (or 0).

Remark: There are no existing tomographic solutions that operate under our highly relaxed problem settings. In this regard, we choose MMP+DAIL and ANMI, adapted to our simulation settings, as representative solutions for traditional tomography problems under strong assumptions. Moreover, comparing NeuTomography with NMF and NeuMF demonstrates that although the latter two work well in tasks with objectives similar to our tomography tasks, NeuTomography outperforms them owing to its customized design.

C. Experiment Settings
1) Input Data for Training: We evaluate NeuTomography on three path performance metrics: (i) the number of hops, (ii) accumulated path delay, and (iii) path congestion level. Note that path bandwidth and the binary normal/failed link metric are similar to the path congestion level, since all of them are determined by the worst-performing link; therefore, we use the path congestion level as a representative non-additive metric in this section. For each path metric, we generate the input training data via combinations of link metric types, routing strategies, sampling methods, and sampling ratios, as discussed in Section V-A.

2) Framework Parameters: For NeuTomography, we select the following parameters. As discussed in Section II-B, the given measurement data already covers all nodes in the network; therefore, the dimension of the input layer, n, in Figure 2 is determined. For the number of links γ (i.e., the number of neurons in hidden layers), we estimate it from the average node degree (defined as 2γ/n) in real networks. As shown in [81], in real communication networks, the average node degree is generally between 1 and 5. In this regard, we set γ = 2.5n, which corresponds to an average node degree of 2γ/n = 5. This γ is an overestimation for AS3967, AS3257, and AS1221, but an underestimation for AS15706 (see Table I). In Section V-D, we study how such estimation inaccuracy affects the performance. For the number of hidden layers k, to balance accuracy and training time, we set k = 2. Furthermore, we select the mean square error (MSE) as the loss function, Adam [82] (a stochastic gradient descent-based method) as the optimizer, and 1000 epochs for training. In addition, when the enhanced algorithm PAT is employed, we set α = 15%, β = 0.6, and #iterations = 6 (line 4 in Algorithm 1).

D. Path Metric Prediction Accuracy
To show the advantages of NeuTomography, we first illustrate the distribution of the predicted performance metrics. For instance, in Figure 4, the metric distribution predicted by NeuTomography almost overlaps with the actual distribution for different performance metric types and sampling methods (PAT is used for monitor-based sampling), while the results of NMF and NeuMF deviate significantly from the actual distribution. Note that since MMP+DAIL requires the network topology as input, it cannot be compared with the other solutions under the same settings and is thus omitted. In Figure 4, the path performance metrics in the input training data have different distributions under different sampling methods; nevertheless, NeuTomography can recover the actual performance metric distribution for all unmeasured node pairs.

Beyond the predicted performance metric distribution, and more importantly, we need to evaluate the path metric prediction accuracy for each unmeasured node pair. As such, in this section, we use the mean absolute percentage error (MAPE) as the accuracy evaluation metric; the corresponding results under different experiment settings, i.e., (i) link metrics that are unweighted (UN), from real data (Re), or uniformly distributed (UD), (ii) best performance routing (BPR) or min-hop routing (MHR), and (iii) random or monitor-based sampling, are reported in Tables II–IX. We also repeated each experiment under γ = 2n and γ = 3n and obtained similar results, which are omitted due to page limitations. These results therefore confirm the robustness of NeuTomography against errors in the estimated number of links.

1) Additive Metrics: The results for additive performance metrics are shown in Table II, where MAPEs less than 15% are highlighted. For each AS, 20%–30% of node pairs are measured to infer the path performance of the remaining 70%–80% unmeasured node pairs. In Table II, as expected, as the portion of measured node pairs increases (from 20% to 30%), the prediction accuracy improves for all cases. Moreover, under random sampling, we observe that NeuTomography is exceptionally accurate, irrespective of networks, link weight distributions, or the underlying routing strategies. When 30% random node pairs are measured, the corresponding MAPE ranges from 2% to 7% for BPR, and 5%–15% for MHR (with 22% MAPE for AS3967 as the only exception). Furthermore, when the underlying routing strategy coincides with the performance metric of interest (i.e., BPR), such high prediction accuracy
[Figure 4: probability density (pdf) versus path performance metrics for NeuTomography, NMF, NeuMF, and the ground truth; panels: (a) random sampling (additive), (b) monitor-based sampling (additive), (c) random sampling (non-additive), (d) monitor-based sampling (non-additive).]

Figure 4. Distribution of the predicted (additive/non-additive) path performance metrics (AS3257 under |S|/|T| = 30%, link weights from real data, and best performance routing).

Table II
Path Performance Prediction Error (MAPE in %) of NeuTomography for Additive Metrics (UN/Re/UD: link metrics that are unweighted/from real data/uniformly distributed; BPR: best performance routing; MHR: min-hop routing; monitor: monitor-based sampling; m+PAT: monitor-based sampling input data with PAT applied)

                    AS3967                        AS3257                        AS1221                        AS15706
                    BPR               MHR         BPR               MHR         BPR               MHR         BPR               MHR
|S|/|T| sampling     UN    Re    UD    Re    UD    UN    Re    UD    Re    UD    UN    Re    UD    Re    UD    UN    Re    UD    Re    UD
20%     random      8.8   7.3   7.7  36.0  22.4   7.9   5.9   6.0  18.0  14.6   2.9   3.4   3.0   6.7   6.3   1.3   1.8   1.5   2.3   3.1
20%     monitor    40.4  36.4  41.9  75.9  46.7  36.2  28.5  26.8  54.8  45.1  45.1  24.3  19.3  41.4  51.1  26.7  14.7   8.4  11.4  39.6
20%     m+PAT      23.3  22.4  21.4  34.2  32.9  23.8  12.1  15.4  28.6  21.4  26.6  11.1   9.7  17.0  22.7  23.6   8.3   7.0   9.9  12.3
25%     random      8.1   8.0   7.4  28.3  15.9   6.7   5.4   5.7  15.4  10.0   2.0   3.0   2.2   5.1   6.5   0.9   1.4   1.4   2.1   2.6
25%     monitor    35.8  36.9  39.5  75.8  41.6  32.9  16.9  25.4  39.3  42.0  42.0  19.5  17.3  34.5  40.2  15.2   8.1   7.3  10.2   9.0
25%     m+PAT      20.3  14.6  17.4  32.8  28.0  18.1  11.5  14.3  26.1  20.8  18.7   9.2   9.2  15.4  21.5  14.9   5.3   4.3     9   6.8
30%     random      6.5   5.7   6.6  22.2  14.4   5.9   4.0   4.5  13.2  10.0   2.4   2.6   2.3   5.2   5.2   0.6   1.2   1.3   2.1   2.5
30%     monitor    32.7  31.0  18.2  49.2  40.6   9.1  14.5  21.0  28.5  40.6  35.8  17.1  14.0  25.1  39.0  13.3   8.1   3.0   3.9   7.9
30%     m+PAT      18.7  14.2  12.4  33.7  28.1   9.1   8.6  13.2  25.7  19.5  18.4   7.2   6.8  14.9  19.9  13.0   5.3   3.0   3.8   6.8

Table III
Path Performance Prediction Error (MAPE in %) of MMP+DAIL for Additive Metrics

                    AS3967                        AS3257                        AS1221                        AS15706
                    BPR               MHR         BPR               MHR         BPR               MHR         BPR               MHR
topology error       UN    Re    UD    Re    UD    UN    Re    UD    Re    UD    UN    Re    UD    Re    UD    UN    Re    UD    Re    UD
0.5%               21.1  23.7  29.0  36.7  30.6  21.9  35.9  32.1  35.8  28.8  22.7  30.4  29.6  30.9  28.4   6.3  20.6  29.6  22.6  24.3
1%                 27.1  32.5  34.2  41.9  33.5  32.8    44  50.5  49.8  43.3  30.6  41.2  43.8  40.4  39.7   9.9  30.3  39.4  33.1  37.5
2%                 35.3  44.1  47.4  47.6  43.0  39.9  54.1  57.2  56.8  46.7  38.2  50.2  54.8  46.9  46.9  15.9  38.2  48.8  38.8  42.0

Table IV
Path Performance Prediction Error (MAPE in %) of NMF for Additive Metrics

                    AS3967                        AS3257                        AS1221                        AS15706
                    BPR               MHR         BPR               MHR         BPR               MHR         BPR               MHR
|S|/|T| sampling     UN    Re    UD    Re    UD    UN    Re    UD    Re    UD    UN    Re    UD    Re    UD    UN    Re    UD    Re    UD
20%     random     34.4  37.2  41.3  59.4  49.7  33.2  47.4  40.7  56.2  45.0  33.9  35.6  41.5  46.7  45.4  22.0  29.5  29.5  28.9  31.3
20%     monitor    55.9  55.7  56.4  57.9  58.3  61.2  40.7  60.2  63.3  60.2  71.2  37.7  40.5  69.6  69.4  69.0  30.8  33.3  33.8  69.3
25%     random     34.3  35.4  41.8  58.9  49.6  32.6  44.1  38.5  50.1  43.5  33.2  34.8  41.1  41.9  43.0  20.5  24.6  28.8  25.6  29.1
25%     monitor    54.3  54.0  55.4  57.2  56.5  60.8  40.4  40.4  62.9  60.5  69.7  35.7  39.5  69.1  69.3  69.5  29.2  33.2  30.5  69.2
30%     random     33.8  35.3  37.0  55.5  46.6  31.9  34.9  36.4  50.0  42.9  32.6  33.7  37.1  38.6  43.1  19.2  22.2  26.0  25.1  28.7
30%     monitor    53.8  53.7  54.7  57.6  56.6  59.5  39.1  59.1  61.2  61.9  68.7  34.8  36.5  68.6  69.8  69.4  28.8  30.3  28.7  69.6

is achievable even when only 20% of node pair measurements are available. By comparison, under monitor-based sampling, the resulting MAPE is relatively large, especially when |S|/|T| = 20%. Nevertheless, when |S|/|T| is increased to 30%, MAPE is reduced by half in many cases. On the other hand, even without increasing the amount of training data, applying PAT alone improves the prediction accuracy significantly. As shown in Table II, MAPEs are almost halved (some even fall below 15%) after applying PAT, which demonstrates the effectiveness of PAT in reducing the prediction error.

Table II also reveals some insights into the performance of NeuTomography. First, Table I shows that link weights in AS3967 and AS3257 have higher variance than those in AS1221 and AS15706, which potentially makes link performance metrics harder to predict. Nevertheless, when the underlying routing mechanism is BPR, NeuTomography is robust against link weight variance and achieves high prediction accuracy for all networks. Intuitively, this is because under BPR, the routing mechanism considers both the network topology and the link weight information; in other words, node pair measurements incorporate more network information, including the link weight variance. Such rich information therefore enables the high prediction accuracy
Table V
Path Performance Prediction Error (MAPE in %) of NeuMF for Additive Metrics

                    AS3967                        AS3257                        AS1221                        AS15706
                    BPR               MHR         BPR               MHR         BPR               MHR         BPR               MHR
|S|/|T| sampling     UN    Re    UD    Re    UD    UN    Re    UD    Re    UD    UN    Re    UD    Re    UD    UN    Re    UD    Re    UD
20%     random     64.3  74.5  84.1  98.2  66.3  71.9  39.0  55.8  48.8  58.9  65.7  16.1  38.9  28.0  47.0  84.5   4.7  26.8   7.2  30.7
20%     monitor    52.1  59.6  57.2  70.7  58.5  48.8  49.0  55.6  61.4  57.6  72.5  23.3  59.6  35.3  61.7  47.3  13.9  39.9  10.1  50.2
25%     random     45.9  43.8  45.2  80.8  58.5  60.7  37.1  52.3  49.6  55.6  62.0  14.3  36.9  27.0  46.6  74.6   4.4  26.1   5.3  27.1
25%     monitor    45.7  47.9  53.7  64.0  55.8  42.5  46.6  51.2  55.4  56.9  65.2  23.0  50.6  32.4  57.3  43.5   8.0  26.6   9.5  37.2
30%     random     42.0  43.8  39.4  65.3  56.3  38.1  34.5  44.5  46.0  53.2  58.5  13.5  34.9  23.0  43.0  46.7   3.9  20.9   5.3  25.0
30%     monitor    42.0  46.3  47.4  55.2  54.6  40.7  43.6  49.3  51.6  53.1  60.2  17.8  49.2  30.6  55.9  35.0   7.9  26.9   7.8  36.1

Table VI
Path Performance Prediction Error (MAPE in %) of NeuTomography for Non-Additive Metrics

                    AS3967                  AS3257                  AS1221                  AS15706
                    BPR         MHR         BPR         MHR         BPR         MHR         BPR         MHR
|S|/|T| sampling     Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD
20%     random      0.7   1.0  35.2  14.5   0.2   1.8  25.2   7.9   1.1   0.9   9.7   7.0   0.2   0.5   1.8   2.0
20%     monitor     3.3   3.0  57.0  17.9   0.6   1.2  34.2  11.9   0.6   2.1  15.3  13.9   0.3   2.1   8.8   5.7
20%     m+PAT         -     -  47.8  18.0     -     -  31.6  11.8     -     -  14.0  13.0     -     -   8.0   5.8
25%     random      0.8   1.1  28.3  12.3   0.5   1.0  18.7   7.7   0.3   0.6   7.3   6.0   0.2   0.1   1.2   1.8
25%     monitor     1.6   2.6  53.4  14.4   0.6   1.7  31.5   9.8   0.7   2.0  13.3  10.3   0.3   1.3   6.0   4.2
25%     m+PAT         -     -  53.0  15.0     -     -  31.0   9.8     -     -  13.5   1.6     -     -   6.1   4.4
30%     random      0.4   1.2  28.2  11.0   0.3   0.9  16.1   8.1   0.2   0.5   6.1   5.9   0.2   0.3   1.8   1.7
30%     monitor     1.5   2.0  50.7  13.6   0.6   1.3  29.7   8.9   0.2   1.7  11.6   9.5   0.3   0.8   2.2   3.7
30%     m+PAT         -     -  50.0  14.1     -     -  29.0   9.0     -     -  12.0   9.6     -     -   2.5   4.0

Table VII
Path Performance Prediction Error (MAPE in %) of ANMI for Non-Additive Metrics

                    AS3967                  AS3257                  AS1221                  AS15706
                    BPR         MHR         BPR         MHR         BPR         MHR         BPR         MHR
threshold ratio      Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD
30%                15.0  33.0  24.8  33.5  30.1  31.3  39.2  33.4  10.3  32.9  16.9  34.2  19.6  35.4  22.4  36.0
50%                15.4  46.1  25.1  52.4  30.1  49.4  39.1  51.1  11.5  50.0  18.6  54.6  22.3  48.8  25.8  52.6
70%                19.7  62.2  29.9  65.5  31.2  64.1  39.9  65.5  12.6  65.5  20.4  67.4  23.7  71.1  26.7  74.1

Table VIII
Path Performance Prediction Error (MAPE in %) of NMF for Non-Additive Metrics

                    AS3967                  AS3257                  AS1221                  AS15706
                    BPR         MHR         BPR         MHR         BPR         MHR         BPR         MHR
|S|/|T| sampling     Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD
20%     random     46.2  19.5  64.7  24.2  23.3  23.2  80.7  20.4  24.3  27.3  41.1  22.5  26.1  26.9  45.9  22.4
20%     monitor    30.0  24.7  57.4  28.8  22.1  26.4  71.6  27.8  22.5  32.5  35.6  30.3  24.8  33.4  29.7  30.5
25%     random     29.5  16.9  60.1  23.4  17.2  21.5  75.0  18.7  17.4  26.4  40.1  20.0  23.4  26.9  34.2  22.2
25%     monitor    29.4  24.9  54.7  26.4  22.1  26.3  62.3  25.2  22.0  28.6  35.5  27.0  24.8  32.5  28.9  28.7
30%     random     28.5  15.2  51.9  21.1  17.0  20.5  68.9  17.2  16.7  26.3  28.3  18.7  18.6  25.7  26.4  18.7
30%     monitor    25.0  22.2  54.2  23.6  21.6  24.5  57.1  23.4  21.7  26.8  35.2  24.2  24.5  32.0  28.6  26.4

for NeuTomography. Second, compared with BPR, the average MAPE under MHR is substantially larger. This is because MHR only captures the network topology information, while the link weight information is lost. Nevertheless, under random sampling and UD, MAPE is generally less than 15%, even if the underlying routing mechanism is MHR. Furthermore, Table II implies that the effect of link weight variance becomes prominent under MHR. Specifically, MAPE is lower (even less than 3%) in AS1221 and AS15706, which have smaller link weight variance. This observation suggests that NeuTomography is capable of dealing with different types of routing mechanisms as long as the link weight variance is relatively small. Finally, recall that links in AS1221 and AS15706 have the same weight distribution; however, we observe that the average MAPE in AS15706 is smaller, which can be explained by the network structural properties: the average node degree is 4.7 and 5.4 for AS1221 and AS15706, respectively. Thus, AS15706 is relatively densely connected, which provides more next-hop options when constructing end-to-end paths, thus leading to smaller MAPE.

We now compare our solution to the benchmarks (in Tables III–V). For MMP+DAIL, we know from [12], [25] that it does not incur any error if all assumptions are completely satisfied. However, as shown in Table III, it is extremely vulnerable to topology errors. Specifically, when the topology error is only 0.5%, the MAPE can be up to 37%; when the topology error increases to 2%, MAPE deteriorates to around 50%. This result shows the advantage of NeuTomography, which requires no additional network information while still achieving superior performance. In addition, under the same experiment settings, our solution significantly outperforms NMF and NeuMF, with up to one order of magnitude reduction in MAPE, which demonstrates the superiority of NeuTomography. We note that NeuMF performs
Table IX
Path Performance Prediction Error (MAPE in %) of NeuMF for Non-Additive Metrics

                    AS3967                  AS3257                  AS1221                  AS15706
                    BPR         MHR         BPR         MHR         BPR         MHR         BPR         MHR
|S|/|T| sampling     Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD    Re    UD
20%     random     16.3   3.5  55.8  65.9   7.0   5.5  34.7  79.0   3.6   7.5  46.8  89.4   5.8   8.6  20.7  34.5
20%     monitor    15.7   6.2  58.1  71.6   8.6   9.2  37.5  64.8   3.8   6.5  38.1  58.4   5.3   7.3  19.6  39.0
25%     random     16.0   4.0  53.4  45.8   6.0   4.5  34.1  30.7   3.3   5.7  30.7  50.9   3.8   4.1  17.3  20.5
25%     monitor    14.0   5.8  57.1  57.9   7.2   8.5  33.7  55.5   3.6   6.4  33.0  51.7   4.9   6.0  16.3  30.1
30%     random     16.8   3.4  52.6  44.2   4.4   3.6  27.1  26.9   2.7   2.6  28.3  44.0   3.3   2.1  14.1  18.5
30%     monitor    12.9   4.6  56.8  51.7   7.6   5.6  33.5  46.7   3.3   6.2  26.1  49.5   4.0   3.6  14.7  29.2

relatively well for AS15706 when link metrics are from real path is independent of the link congestion level. Hence, when
data. This is because unlike NMF that only leverages linear the variance of the link congestion level is large, there is a
transformations, NeuMF is a neural-network-based model, lack of link information embedded in the path measurements.
which is equipped with the capability in capturing non-linear Nevertheless, when the link congestion variance is relatively
relationships. However, even with such improvement, NeuMF small, i.e., in AS1221 and AS15706, NeuTomography exhibits
is still inferior to our proposed solution. high accuracy. Furthermore, the large node degree in AS1221
and AS15706 also shortens the average path length under
Discussions on PAT: In Table II, only the results of PAT for
MHR, which reduces the likelihood of constructing a path with
monitor-based sampling are presented. We also test PAT under
a bottleneck link, thus improving the prediction accuracy.
random sampling and get similar results (thus omitted for
page limitations). This is because PAT is proposed mainly for Besides, we also observe that monitor-based sampling gen-
addressing the overfitting problem during training. For random erally incurs larger MAPE under MHR, especially when the
sampling, it is unbiased in the sense that path performance link congestion level is from the real data and the networks
metrics in the input data closely represents the performance are AS3967 and AS3257. Again, this observation shows that
metric distribution in the testing data. However, monitor-based when the variance of the link congestion level is small,
sampling is biased, which causes overfitting, and PAT is able NeuTomography is able to recover the network information
to alleviate the effect of biased sampling. that is critical to the performance prediction even when the
2) Non-additive Metrics: For non-additive performance underlying routing mechanism is independent of the perfor-
metrics (Table VI), with the increased portion of measured node pairs (from 20% to 30%), the prediction accuracy is also improved. Furthermore, under BPR, NeuTomography achieves high accuracy (MAPE < 3.5%) in all cases. This is because, for the performance metric of interest (congestion level), BPR selects the least congested path. Specifically, for a node pair, if there exists a path bypassing all highly congested links, then its performance metric is small. For the tested AS topologies, most end-to-end path performance metrics fall into the narrow range of 3–9, which simplifies the performance inference task. Nevertheless, our objective is not only to identify the coarse-grained congestion range, but also to determine the fine-grained congestion level. In this regard, NeuTomography achieves high inference accuracy for each node pair without any other network information as prior knowledge. For non-additive metrics, since measuring only 20% of node pairs already yields high accuracy (MAPE < 3.5%) under BPR, PAT is not employed in this case. By contrast, under MHR, MAPE varies considerably. When link congestion levels are uniformly distributed, the corresponding MAPE is between 1.6% and 18% for all networks. However, under the real link congestion level distribution, MAPE is large for AS3967 and AS3257, while still small for AS1221 and AS15706 (1%–15%). As discussed before, this is caused by the link variance and the limited information in the path measurements. In Table I, the variance of the link congestion level is severe for AS3967 and AS3257 (78 and 41.3, respectively). Moreover, under MHR, the routing

mance metric of interest. Furthermore, under MHR, we test PAT for monitor-based sampling. The results in Table VI show that when the performance metric is non-additive, PAT only slightly reduces MAPE or achieves performance similar to that without PAT. This implies that under MHR and non-additive performance metrics, it is difficult to know the performance bound of an unmeasured node pair, especially when the link congestion level exhibits high variance.

Regarding benchmark solutions, the evaluation results of ANMI are shown in Table VII. The threshold ratio here is the ratio of the normalized $\tau$ (i.e., $\tau$ minus the minimum link metric value) to the range of the metric value distribution (i.e., the difference between the maximum and the minimum link metric values). From the results in Table VII, we can see that although ANMI performs relatively well for AS1221 under BPR with real link metrics, NeuTomography still offers an order-of-magnitude improvement in the same setting. Moreover, NeuTomography outperforms NMF by up to two orders of magnitude (Table VIII). As for NeuMF, although its MAPE is below 14% under BPR (see Table IX), NeuTomography still improves on it by one order of magnitude on average.

In sum, the results on both additive and non-additive performance metrics confirm the high efficiency and applicability of NeuTomography in real networks without relying on knowledge of additional network information (e.g., network topology) or rigorous assumptions (e.g., controllable routing), thus providing a lightweight and robust solution.
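MAPE (mean absolute percentage error) is the accuracy measure used throughout this evaluation. For reference, a minimal Python sketch of the usual definition follows. It assumes MAPE is averaged over the unmeasured node pairs whose path metrics are predicted and that every true metric is non-zero; the function name `mape` and the sample values are hypothetical and not taken from the paper's code or tables.

```python
def mape(true_values, predicted_values):
    """Mean absolute percentage error, in percent.

    Assumes every true value is non-zero (e.g., positive path metrics of
    unmeasured node pairs); raises ValueError otherwise.
    """
    if not true_values or len(true_values) != len(predicted_values):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, y_hat in zip(true_values, predicted_values):
        if y == 0:
            raise ValueError("MAPE is undefined when a true value is zero")
        total += abs(y_hat - y) / abs(y)  # relative error of one node pair
    return 100.0 * total / len(true_values)


# Hypothetical congestion levels of three unmeasured node pairs.
print(mape([4.0, 5.0, 8.0], [4.1, 5.0, 7.6]))  # ≈ 2.5
```

Because each error is divided by the true value, the same absolute miss costs more on a small path metric than on a large one, which is one reason the narrow 3–9 range under BPR keeps MAPE low.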

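The reason BPR eases non-additive inference can be made concrete with a small sketch. It assumes, for illustration only, that the congestion level of a path is the maximum link congestion level along it (a bottleneck metric), that BPR returns the path minimizing this value, and that MHR returns a minimum-hop path regardless of congestion (ties broken by neighbor order). The helper names, the four-node graph, and its link values are hypothetical.

```python
import heapq
from collections import deque


def add_link(graph, u, v, congestion):
    """Add an undirected link with a congestion level (hypothetical helper)."""
    graph.setdefault(u, {})[v] = congestion
    graph.setdefault(v, {})[u] = congestion


def bpr_congestion(graph, src, dst):
    """Least congested (minimax) path value: Dijkstra with max instead of sum."""
    best = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, c in graph[node].items():
            new_cost = max(cost, c)  # bottleneck of the extended path
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                heapq.heappush(heap, (new_cost, nbr))
    return float("inf")  # dst unreachable


def mhr_congestion(graph, src, dst):
    """Bottleneck congestion of the first minimum-hop path found by BFS."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    if dst not in parent:
        return float("inf")
    worst, node = 0.0, dst
    while parent[node] is not None:  # walk back along the BFS path
        worst = max(worst, graph[node][parent[node]])
        node = parent[node]
    return worst


g = {}
add_link(g, "A", "D", 9)  # one hop, but highly congested
add_link(g, "A", "B", 3)
add_link(g, "B", "C", 2)
add_link(g, "C", "D", 4)
print(bpr_congestion(g, "A", "D"))  # → 4 (detour A-B-C-D avoids the hot link)
print(mhr_congestion(g, "A", "D"))  # → 9 (shortest path is forced through it)
```

Under these assumptions, BPR hides a single hot link whenever a detour exists, compressing path values into a narrow band, whereas MHR exposes whatever congestion lies on the shortest path, so high link variance propagates directly into the path metrics.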
Table X
Topology Reconstruction Error (FPR and FNR in %) w.r.t. Extended Adjacency Matrix $A^{(m)}$
Each cell gives FPR/FNR in % (FPR/FNR: False Positive/Negative Rate). |S|/|T|: percentage of measured node pairs. m+PAT: monitor-based sampling with PAT.

AS3967
|S|/|T|  Sampling  m=1        m=2        m=3        m=4        m=5
20%      random    0.4/52.9   2.7/24.3   3.9/18.7   3.5/14.2   2.4/10.0
         monitor   2.1/73.0   4.6/68.1   7.4/62.7   9.3/57.2   14.2/53.7
         m+PAT     0.0/72.0   0.4/65.9   3.0/54.8   8.6/43.0   12.4/33.7
25%      random    0.5/38.9   2.1/21.7   3.1/16.1   3.0/11.3   2.5/9.3
         monitor   1.6/67.5   4.2/58.7   7.8/52.1   10.9/50.9  12.3/50.8
         m+PAT     0.0/72.9   0.5/59.8   3.1/46.0   7.7/32.5   9.9/23.6
30%      random    0.3/43.7   1.5/16.4   1.8/8.6    1.2/4.8    0.8/3.2
         monitor   0.6/61.5   2.7/56.1   6.0/48.5   8.9/43.8   10.5/39.1
         m+PAT     0.0/60.2   0.6/53.3   2.8/42.7   7.4/32.4   9.4/21.9

AS3257
|S|/|T|  Sampling  m=1        m=2        m=3        m=4        m=5
20%      random    0.2/59.4   1.5/26.2   2.8/20.2   2.6/12.5   1.7/6.8
         monitor   0.5/77.1   1.8/69.0   4.3/63.1   6.6/57.4   10.2/52.5
         m+PAT     0.0/76.5   0.1/67.0   1.9/62.1   6.6/55.2   10.2/49.9
25%      random    0.2/50.9   1.3/20.5   2.1/14.6   1.8/9.2    1.1/4.9
         monitor   0.2/71.4   1.5/60.1   4.0/53.7   6.6/49.2   9.5/42.2
         m+PAT     0.0/71.5   0.2/59.1   2.3/49.8   5.9/40.0   8.2/32.8
30%      random    0.2/43.0   1.1/16.8   1.7/11.6   1.3/6.7    0.8/3.9
         monitor   0.1/52.6   1.0/27.2   2.5/18.2   2.7/12.2   2.0/7.8
         m+PAT     0.0/52.9   0.2/27.1   2.2/17.1   4.0/11.2   2.0/7.9

AS1221
|S|/|T|  Sampling  m=1        m=2        m=3        m=4        m=5
20%      random    0.2/22.4   0.5/8.1    1.0/3.8    0.8/3.0    0.4/2.1
         monitor   0.9/76.0   3.2/64.9   6.4/64.6   10.6/62.3  11.6/57.4
         m+PAT     0.0/75.5   0.2/61.8   2.5/62.6   10.9/56.1  8.9/42.1
25%      random    0.1/21.2   0.4/5.6    0.4/3.0    0.2/1.1    0.2/0.7
         monitor   0.4/67.4   2.3/54.9   5.7/54.7   8.5/55.6   11.6/49.2
         m+PAT     0.0/65.0   0.3/49.6   2.0/43.2   7.7/32.3   8.6/26.0
30%      random    0.1/19.5   0.5/3.6    0.6/1.9    0.4/1.7    0.2/1.4
         monitor   0.1/58.6   1.7/43.4   5.1/38.2   5.3/44.1   7.8/37.9
         m+PAT     0.0/55.2   0.3/41.7   2.5/38.6   5.5/29.6   7.9/23.5

AS15706
|S|/|T|  Sampling  m=1        m=2        m=3        m=4        m=5
20%      random    0.1/7.4    0.3/1.1    0.8/0.7    0.5/1.4    0.2/5.0
         monitor   5.0/60.3   7.6/50.9   21.3/39.0  18.3/39.6  4.3/40.7
         m+PAT     1.7/55.6   6.3/43.5   13.4/30.0  15.7/33.4  4.0/33.1
25%      random    0.1/6.6    0.2/0.5    0.4/0.6    0.1/0.8    0.1/1.8
         monitor   2.5/47.2   5.4/24.9   13.2/20.6  8.9/20.2   1.6/18.2
         m+PAT     2.4/46.5   5.0/21.1   12.0/18.5  8.3/15.5   1.4/16.5
30%      random    0.0/5.2    0.1/0.3    0.2/0.2    0.2/0.3    0.0/1.3
         monitor   0.3/44.4   4.4/18.7   8.2/19.4   8.2/15.7   1.8/17.8
         m+PAT     0.3/41.2   4.0/13.2   6.1/14.7   7.8/12.3   1.6/14.6
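The FPR and FNR reported in Table X are preferred over the raw matrix difference because a sparse topology makes the latter misleadingly small. The NumPy sketch below computes both the matrix difference and FPR/FNR following the definitions given with the topology reconstruction results (FPR divides by $n^2 - \tau$, FNR by $\tau$, with all $n^2$ entries counted), and reproduces that failure case. The function names and the 10-node ring are hypothetical illustrations, not taken from the paper's evaluation code.

```python
import numpy as np


def matrix_difference(A, A_hat):
    """Naive metric: sum_i sum_j |A_ij - A'_ij| / n^2."""
    A = np.asarray(A, dtype=float)
    A_hat = np.asarray(A_hat, dtype=float)
    return float(np.abs(A - A_hat).sum() / A.size)


def fpr_fnr(A, A_hat):
    """Return (FPR, FNR) in percent for a reconstructed extended adjacency matrix.

    A is the real matrix A^(m), A_hat the reconstruction A'^(m); any non-zero
    entry counts as present. All n^2 entries are counted, so FPR divides by
    n^2 - tau and FNR divides by tau.
    """
    real = np.asarray(A) != 0
    recon = np.asarray(A_hat) != 0
    tau = int(real.sum())                    # non-zero entries in A^(m)
    false_pos = int((recon & ~real).sum())   # non-zero in A', zero in A
    false_neg = int((~recon & real).sum())   # zero in A', non-zero in A
    fpr = 100.0 * false_pos / (real.size - tau) if real.size > tau else 0.0
    fnr = 100.0 * false_neg / tau if tau > 0 else 0.0
    return fpr, fnr


# Hypothetical 10-node ring: A^(1) has tau = 20 non-zero entries out of n^2 = 100.
n = 10
A1 = np.zeros((n, n))
for i in range(n):
    A1[i, (i + 1) % n] = A1[(i + 1) % n, i] = 1

all_zero = np.zeros_like(A1)
print(matrix_difference(A1, all_zero))  # 0.2: looks small...
print(fpr_fnr(A1, all_zero))            # (0.0, 100.0): ...yet every link is missed
```

The example also shows why FPR stays low across Table X: with $\tau \ll n^2$, even a handful of spurious entries is divided by a large denominator, so FNR is the more informative of the two rates.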

E. Topology Reconstruction Accuracy

When the performance metric of interest is the minimum number of hops, we use NeuTomography to reconstruct the network topology in terms of the extended adjacency matrix. To test the reconstruction accuracy, one could intuitively use the matrix difference $\big(\sum_i \sum_j |A^{(m)}_{i,j} - A'^{(m)}_{i,j}|\big)/n^2$ as the evaluation metric, where $A^{(m)}$ and $A'^{(m)}$ are the real and constructed extended adjacency matrices, respectively. However, since the number of links in a network is generally much smaller than $n^2$, even an all-zero matrix $A'^{(m)}$ leads to a small matrix difference. Therefore, we use the False Positive Rate (FPR) and False Negative Rate (FNR) as the evaluation metrics. Specifically, let $A^{(m)}$ and $A'^{(m)}$ be the real and constructed extended adjacency matrices, and $\tau$ the number of non-zero elements in $A^{(m)}$. Then FPR is the number of non-zero elements in $A'^{(m)}$ that are zero in $A^{(m)}$, divided by $n^2 - \tau$; similarly, FNR is the number of zero elements in $A'^{(m)}$ that are non-zero in $A^{(m)}$, divided by $\tau$. The reconstructed network topology is accurate if both FPR and FNR are small. The corresponding results are reported in Table X, where entries with both FPR and FNR below 15% are highlighted. First, for extended adjacency matrices, FPR is small in all cases because usually $\tau \ll n^2$, so the denominator $n^2 - \tau$ is much larger than the numerator. Second, as expected, increasing the number of measured node pairs improves topology reconstruction accuracy. Third, under random sampling, $A^{(1)}$ (i.e., $m = 1$) is mostly inaccurate (except for AS15706). Nevertheless, when $m$ is increased to 2, FNR is reduced by more than half. Specifically, w.r.t. $A^{(2)}$, for both AS1221 and AS15706, FPR and FNR are below 4% when 30% of node pairs are randomly selected and measured. Fourth, monitor-based sampling yields high topology reconstruction error; nevertheless, for some networks (e.g., AS15706), PAT reduces FNR below 15%. Finally, since AS1221 and AS15706 have the same link congestion level distribution, Table X demonstrates that for the denser network AS15706, even $A^{(1)}$ is accurate. Moreover, under monitor-based sampling, PAT yields low reconstruction error for $m \geq 2$. This result implies that topology reconstruction is more accurate in networks with high density. This is because dense networks contain more node pairs that are close to each other; such close node pairs are therefore more likely to be selected for measurement, which assists the learning process in NeuTomography. The benchmarks NMF and NeuMF can also be used to construct $A^{(m)}$; however, their performance is substantially worse than that of NeuTomography, so these results are omitted due to page limitations. In sum, NeuTomography provides a state-of-the-art solution for reconstructing network topologies at various granularities using only a small percentage of node pair measurements and no additional network knowledge.

VI. CONCLUSION

We revisited the problem of network tomography from a practical perspective. Without relying on the assumptions about network topologies, protocol support, or measurement metric properties made in the literature, we established a generic tomography framework, NeuTomography, to infer unknown network characteristics using only end-to-end path performance metrics of selected node pairs. Next, to address potential overfitting, we proposed an algorithm that uses active performance bound estimation as augmented data to iteratively improve the performance prediction accuracy. Furthermore, we investigated the feasibility of employing NeuTomography to reconstruct the network topology from limited measurement data. Extensive experiments using real network data show that NeuTomography is robust against network parameter errors and achieves high prediction accuracy for both additive and non-additive performance metrics, improving on benchmark solutions by up to orders of magnitude. Moreover, with small errors in terms of extended adjacency matrices, the reconstructed network topologies also provide vital insights for network operational optimization.

REFERENCES

[1] F. Lo Presti, N. Duffield, J. Horowitz, and D. Towsley, “Multicast-based inference of network-internal delay distributions,” IEEE/ACM Transactions on Networking, vol. 10, no. 6, pp. 761–775, Dec. 2002.
[2] A. B. Downey, “Using pathchar to estimate internet link characteristics,” in ACM SIGCOMM, 1999.
[3] G. Jin, G. Yang, B. R. Crowley, and D. A. Agarwal, “Network characterization service (NCS),” in IEEE HPDC, 2001.
[4] A. Dhamdhere, D. D. Clark, A. Gamero-Garrido, M. Luckie, R. K. P. Mok, G. Akiwate, K. Gogia, V. Bajpai, A. C. Snoeren, and K. Claffy, “Inferring persistent interdomain congestion,” in ACM SIGCOMM, 2018.
[5] K. Lai and M. Baker, “Measuring link bandwidths using a deterministic model of packet delay,” in ACM SIGCOMM, 2000.
[6] M. Coates, A. O. Hero, R. Nowak, and B. Yu, “Internet tomography,” IEEE Signal Processing Magazine, vol. 19, pp. 47–65, 2002.
[7] O. Gurewitz and M. Sidi, “Estimating one-way delays from cyclic-path delay measurements,” in IEEE INFOCOM, 2001.
[8] Y. Chen, D. Bindel, H. Song, and R. H. Katz, “An algebraic approach to practical and scalable overlay network monitoring,” in ACM SIGCOMM, 2004.
[9] A. Chen, J. Cao, and T. Bu, “Network tomography: Identifiability and Fourier domain estimation,” in IEEE INFOCOM, 2007.
[10] E. Lawrence and G. Michailidis, “Network tomography: A review and recent developments,” Frontiers in Statistics, vol. 54, 2006.
[11] R. Mahajan, N. Spring, D. Wetherall, and T. Anderson, “Inferring link weights using end-to-end measurements,” in ACM SIGCOMM Internet Measurement Workshop, 2002.
[12] L. Ma, T. He, K. K. Leung, A. Swami, and D. Towsley, “Identifiability of link metrics based on end-to-end path measurements,” in ACM IMC, 2013.
[13] L. Ma, T. He, K. K. Leung, A. Swami, and D. Towsley, “Inferring link metrics from end-to-end path measurements: Identifiability and monitor placement,” IEEE/ACM Transactions on Networking, vol. 22, no. 4, pp. 1351–1368, 2014.
[14] L. Ma, T. He, K. Leung, D. Towsley, and A. Swami, “Efficient identification of additive link metrics via network tomography,” in IEEE ICDCS, 2013.
[15] C. Liu, T. He, A. Swami, D. Towsley, T. Salonidis, A. I. Bejan, and P. Yu, “Multicast vs. unicast for loss tomography on tree topologies,” in IEEE MILCOM, 2015.
[16] T. He, C. Liu, A. Swami, D. Towsley, T. Salonidis, A. I. Bejan, and P. Yu, “Fisher information-based experiment design for network tomography,” in ACM SIGMETRICS, 2015.
[17] S. Tati, S. Silvestri, T. He, and T. La Porta, “Robust network tomography in the presence of failures,” in IEEE ICDCS, 2014.
[18] Y. Gao, W. Wu, W. Dong, C. Chen, X.-Y. Li, and J. Bu, “Preferential link tomography: Monitor assignment for inferring interesting link metrics,” in IEEE ICNP, 2014.
[19] Y. Gao, W. Dong, W. Wu, C. Chen, X.-Y. Li, and J. Bu, “Scalpel: Scalable preferential link tomography based on graph trimming,” IEEE/ACM Transactions on Networking, vol. 24, no. 3, pp. 1392–1403, 2016.
[20] Y. Gao, W. Dong, C. Chen, J. Bu, T. Chen, M. Xia, X. Liu, and X. Xu, “Domo: Passive per-packet delay tomography in wireless ad-hoc networks,” in IEEE ICDCS, 2014.
[21] H. Li, Y. Gao, W. Dong, and C. Chen, “Taming both predictable and unpredictable link failures for network tomography,” IEEE/ACM Transactions on Networking, vol. 26, no. 3, pp. 1460–1473, 2018.
[22] W. Ren and W. Dong, “Robust network tomography: k-identifiability and monitor assignment,” in IEEE INFOCOM, 2016.
[23] W. Dong, Y. Gao, W. Wu, J. Bu, C. Chen, and X.-Y. Li, “Optimal monitor assignment for preferential link tomography in communication networks,” IEEE/ACM Transactions on Networking, vol. 25, no. 1, pp. 210–223, 2017.
[24] L. Ma, T. He, K. K. Leung, A. Swami, and D. Towsley, “Link identifiability in communication networks with two monitors,” in IEEE GLOBECOM, 2013.
[25] L. Ma, T. He, K. K. Leung, A. Swami, and D. Towsley, “Monitor placement for maximal identifiability in network tomography,” in IEEE INFOCOM, 2014.
[26] T. He, L. Ma, A. Gkelias, K. K. Leung, A. Swami, and D. Towsley, “Robust monitor placement for network tomography in dynamic networks,” in IEEE INFOCOM, 2016.
[27] T. He, A. Gkelias, L. Ma, K. K. Leung, A. Swami, and D. Towsley, “Robust and efficient monitor placement for network tomography in dynamic networks,” IEEE/ACM Transactions on Networking, vol. 25, no. 3, pp. 1732–1745, 2017.
[28] L. Ma, T. He, A. Swami, D. Towsley, K. K. Leung, and J. Lowe, “Node failure localization via network tomography,” in ACM IMC, 2014.
[29] L. Ma, T. He, A. Swami, D. Towsley, and K. K. Leung, “Network capability in localizing node failures via end-to-end path measurements,” IEEE/ACM Transactions on Networking, vol. 25, no. 1, pp. 434–450, 2017.
[30] N. Bartolini, T. He, and H. Khamfroush, “Fundamental limits of failure identifiability by boolean network tomography,” in IEEE INFOCOM, 2017.
[31] L. Ma, T. He, A. Swami, D. Towsley, and K. K. Leung, “On optimal monitor placement for localizing node failures via network tomography,” Performance Evaluation, vol. 91, no. C, pp. 16–37, 2015.
[32] T. He, N. Bartolini, H. Khamfroush, I. Kim, L. Ma, and T. La Porta, “Service placement for detecting and localizing failures using end-to-end observations,” in IEEE ICDCS, 2016.
[33] T. He, “Distributed link anomaly detection via partial network tomography,” SIGMETRICS Performance Evaluation Review, vol. 45, pp. 29–42, 2017.
[34] D. Z. Tootaghaj, T. He, and T. La Porta, “Parsimonious tomography: Optimizing cost-identifiability trade-off for probing-based network monitoring,” SIGMETRICS Performance Evaluation Review, vol. 45, no. 3, pp. 43–55, 2018.
[35] N. Galesi and F. Ranjbar, “Tight bounds for maximal identifiability of failure nodes in boolean network tomography,” in IEEE ICDCS, 2018.
[36] D. Ghita, H. Nguyen, M. Kurant, K. Argyraki, and P. Thiran, “NetScope: Practical network loss tomography,” in IEEE INFOCOM, 2010.
[37] Z. Zhang, O. Mara, and K. Argyraki, “Network neutrality inference,” in ACM SIGCOMM, 2014.
[38] D. Ghita, K. Argyraki, and P. Thiran, “Network tomography on correlated links,” in ACM IMC, 2010.
[39] T. Bu, N. Duffield, and F. Lo Presti, “Network tomography on general topologies,” in ACM SIGMETRICS, 2002.
[40] N. Duffield and F. Lo Presti, “Multicast inference of packet delay variance at interior network links,” in IEEE INFOCOM, 2000.
[41] N. Duffield, “Simple network performance tomography,” in ACM IMC, 2003.
[42] N. Duffield, “Network tomography of binary network performance characteristics,” IEEE Transactions on Information Theory, vol. 52, pp. 5373–5388, 2006.
[43] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. The MIT Press, 2016.
[44] University of Washington, “Rocketfuel: An ISP topology mapping engine,” 2002. [Online]. Available: http://www.cs.washington.edu/research/networking/rocketfuel/
[45] The Cooperative Association for Internet Data Analysis (CAIDA), “Macroscopic Internet Topology Data Kit (ITDK),” April 2013. [Online]. Available: http://www.caida.org/data/active/internet-topology-data-kit/
[46] A. Medina, N. Taft, K. Salamatian, S. Bhattacharyya, and C. Diot, “Traffic matrix estimation: Existing techniques and new directions,” in ACM SIGCOMM, 2002.
[47] Y. Zhang, M. Roughan, W. Willinger, and L. Qiu, “Spatio-temporal compressive sensing and internet traffic matrices,” in ACM SIGCOMM, 2009.
[48] Y. Xia and D. Tse, “Inference of link delay in communication networks,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 12, pp. 2235–2248, 2006.
[49] A. Adams, T. Bu, T. Friedman, J. Horowitz, D. Towsley, R. Caceres, N. Duffield, F. Lo Presti, and V. Paxson, “The use of end-to-end multicast measurements for characterizing internal network behavior,” IEEE Communications Magazine, vol. 38, no. 5, pp. 152–159, May 2000.
[50] R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu, “Network tomography: Recent developments,” Statistical Science, 2004.
[51] M. H. Firooz and S. Roy, “Network tomography via compressed sensing,” in IEEE GLOBECOM, 2010.
[52] W. Xu, E. Mallada, and A. Tang, “Compressive sensing over graphs,” in IEEE INFOCOM, 2011.
[53] A. Gopalan and S. Ramasubramanian, “On identifying additive link metrics using linearly independent cycles and paths,” IEEE/ACM Transactions on Networking, vol. 20, no. 3, 2012.
[54] N. Alon, Y. Emek, M. Feldman, and M. Tennenholtz, “Economical graph discovery,” in Symposium on Innovations in Computer Science, 2011.
[55] Y. Bejerano and R. Rastogi, “Robust monitoring of link delays and faults in IP networks,” in IEEE INFOCOM, 2003.
[56] R. Kumar and J. Kaur, “Practical beacon placement for link monitoring using network tomography,” IEEE Journal on Selected Areas in Communications, vol. 24, no. 12, pp. 2196–2209, Dec. 2006.

[57] J. D. Horton and A. López-Ortiz, “On the number of distributed measurement points for network tomography,” in ACM IMC, 2003.
[58] R. L. Carter and M. E. Crovella, “Measuring bottleneck link speed
in packet-switched networks,” Performance Evaluation, vol. 27-28, pp.
297–318, 1996.
[59] M. Jain and C. Dovrolis, “End-to-end available bandwidth: Measurement
methodology, dynamics, and relation with TCP throughput,” IEEE/ACM
Transactions on Networking, vol. 11, no. 4, pp. 537–549, 2003.
[60] S. Alouf, P. Nain, and D. Towsley, “Inferring network characteristics via
moment-based estimators,” in IEEE INFOCOM, 2001.
[61] S. Zarifzadeh, M. Gowdagere, and C. Dovrolis, “Range tomography:
Combining the practicality of boolean tomography with the resolution
of analog tomography,” in ACM IMC, 2012.
[62] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, and
C. Diot, “Characterization of failures in an IP backbone,” in IEEE
INFOCOM, 2004.
[63] R. R. Kompella, J. Yates, A. G. Greenberg, and A. C. Snoeren, “Detection and localization of network black holes,” in IEEE INFOCOM, 2007.
[64] H. Zeng, P. Kazemian, G. Varghese, and N. McKeown, “Automatic test
packet generation,” in ACM CoNEXT, 2012.
[65] A. Dhamdhere, R. Teixeira, C. Dovrolis, and C. Diot, “NetDiagnoser:
Troubleshooting network unreachabilities using end-to-end probes and
routing data,” in ACM CoNEXT, 2007.
[66] Y. Huang, N. Feamster, and R. Teixeira, “Practical issues with using
network tomography for fault diagnosis,” ACM SIGCOMM Computer
Communication Review, vol. 38, pp. 53–58, 2008.
[67] H. X. Nguyen and P. Thiran, “The boolean solution to the congested IP
link location problem: Theory and practice,” in IEEE INFOCOM, 2007.
[68] D. Ghita, C. Karakus, K. Argyraki, and P. Thiran, “Shifting network
tomography toward a practical goal,” in ACM CoNEXT, 2011.
[69] S. S. Ahuja, S. Ramasubramanian, and M. Krunz, “SRLG failure
localization in all-optical networks using monitoring cycles and paths,”
in IEEE INFOCOM, 2008.
[70] S. Cho and S. Ramasubramanian, “Localizing link failures in all-optical
networks using monitoring tours,” Elsevier Computer Networks, vol. 58,
pp. 2–12, 2014.
[71] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks, vol. 4, no. 2, pp. 251–257, 1991.
[72] N. Spring, R. Mahajan, and D. Wetherall, “Measuring ISP topologies
with Rocketfuel,” in ACM SIGCOMM, 2002.
[73] M. H. Gunes and K. Sarac, “Inferring subnets in router-level topology
collection studies,” in ACM IMC, 2007.
[74] Z. M. Mao, J. Rexford, J. Wang, and R. H. Katz, “Towards an accurate
AS-level traceroute tool,” in ACM SIGCOMM, 2003.
[75] B. Eriksson, P. Barford, and R. Nowak, “Network discovery from passive
measurements,” in ACM SIGCOMM, 2008.
[76] B. Eriksson, P. Barford, J. Sommers, and R. Nowak, “DomainImpute:
Inferring unseen components in the Internet,” in IEEE INFOCOM, 2011.
[77] B. Yao, R. Viswanathan, F. Chang, and D. Waddington, “Topology
inference in the presence of anonymous routers,” in IEEE INFOCOM,
2003.
[78] D. D. Lee and H. S. Seung, “Algorithms for non-negative matrix
factorization,” in NIPS, 2000.
[79] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural
collaborative filtering,” in WWW, 2017.
[80] B. Marlin, “Modeling user rating profiles for collaborative filtering,” in
NIPS, 2003.
[81] A.-L. Barabási and M. Pósfai, Network Science. Cambridge University
Press, 2016.
[82] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,”
CoRR, vol. abs/1412.6980, 2014.
