Optimal In-Network Distribution of Learning Functions For A Secure-by-Design Programmable Data Plane of Next-Generation Networks
The authors are with the University of Calabria, Italy. M. G. Spina, F. De Rango, and A. Iera are also with CNIT, Italy. This work was partially supported by the European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU, partnership on "Telecommunications of the Future" (PE00000001 - program "RESTART"). This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Abstract—The rise of programmable data plane (PDP) and in-network computing (INC) paradigms paves the way for the development of network devices (switches, network interface cards, etc.) capable of performing advanced computing tasks. This allows the execution of algorithms of various nature, including machine learning ones, within the network itself to support user and network services. In particular, this paper delves into the issue of implementing in-network learning models to support distributed intrusion detection systems (IDS). It proposes a model that optimally distributes the IDS workload, resulting from the subdivision of a "Strong Learner" (SL) model into lighter distributed "Weak Learner" (WL) models, among data plane devices; the objective is to ensure complete network security without excessively burdening their normal operations. Furthermore, a meta-heuristic approach is proposed to reduce the long computational time required by the exact solution provided by the mathematical model, and its performance is evaluated. The analysis conducted and the results obtained demonstrate the enormous potential of the proposed new approach to the creation of intelligent data planes that effectively act as a first line of defense against cyber attacks, with minimal additional workload on network devices.

Index Terms—In-Network Computing, Distributed AI, IDS, Programmable Data Plane, Security by Design.

I. INTRODUCTION

The evolving cyber threat landscape requires increasingly agile and adaptable cyber-security solutions. The emerging paradigms of in-network computing (INC) and in-network distributed learning (INDS), coupled with the concept of distributed Intrusion Detection Systems (IDS), emerge as key components to address the challenge. The integration of these concepts has in fact the potential to revolutionize network security by offering a robust, scalable, and resilient defense against ever-evolving threats.

INC exploits the idea of distributing computational tasks across the network infrastructure, rather than relying solely on edge or cloud computing resources. To this end, it leverages the capabilities of network devices, such as switches, routers, and network interface cards (NICs), to perform data processing or caching. An interesting subfield of In-Network Computing that focuses on the use of distributed artificial intelligence (AI) techniques is the so-called "In-network distributed intelligence", which aims to enable network devices to collaborate and make intelligent decisions autonomously, without the need for centralized control. This paradigm can make networks more scalable and fault-tolerant (as they become less dependent on centralized controls) and highly adaptable to changing conditions and traffic distributions in real time, through intelligent decisions about traffic routing, resource management, and network performance optimization.

Recently, interest is emerging in solutions that go beyond the standard uses of distributed intelligence on the network (such as supporting Self-optimizing networks, Autonomous network management, and Context-aware networking), aiming to improve network security by allowing AI-enhanced network devices to autonomously distinguish between legitimate and anomalous traffic flows. This can, at the same time, improve the accuracy and increase the speed of intrusion detection. For their part, the fixed-perimeter nature of traditional IDSs is no longer adequate for the highly pervasive and dynamic nature of next-generation networks. Even recent solutions in the literature, which rely on in-network telemetry and traffic data forwarding to a centralized SDN controller that runs the detection module and completes the decision-making process, do not meet the mentioned requirements.

Next-generation networks require Active IDSs (also called Intrusion Prevention Systems - IPS), which leverage the INC and distributed intelligence paradigms to process and analyze network data within Programmable Data Plane (PDP) devices, and enable the devices themselves to block threats through completely decentralized procedures, thereby improving the effectiveness and timeliness of intrusion detection and ensuring greater scalability, resilience, and fault tolerance.

In this paper we refer to a new paradigm of Active Intrusion Detection Systems, recently proposed in [1], which leverages the concept of AI model splitting to split a Strong Learner (SL) model into its individual Weak Learner (WL) components. The latter are mapped into Virtual Network Functions (VNF), with both threat detection and response capabilities, that can be distributed among the PDP devices of a next-generation network.
For the aforementioned paradigm to be truly effective, orchestration is required to always implement an optimal distribution of learning functions that truly allows the network to (i) continuously improve the accuracy of intrusion detection by adapting to new threats, (ii) reduce the processing load, and (iii) reduce both the impact on the standard functionality of the involved network devices (e.g., packet forwarding) and the reaction time to threats. The main contributions of this paper can therefore be summarized as follows:

• demonstrate the potential of jointly using PDP devices and in-network distributed learning to enable the network user plane to implement a fully distributed active IDS, and increase the effectiveness of this new functionality;
• propose an optimization model for efficient deployment of in-network learning models for distributed Active IDS, which balances security coverage with performance;
• propose a meta-heuristic approach providing a practical and scalable solution to the optimization problem;
• conduct a comprehensive performance analysis aimed at demonstrating the effectiveness of the proposed approach in enhancing the protection of the network against cyber threats while minimizing the impact on the overall network performance.

The remainder of the paper is organized as follows. Section II presents the main related works in the key reference areas of this research. In Section III, an innovative paradigm that exploits distributed in-network learning models to implement a "secure-by-design" data plane is introduced, while Section IV illustrates a model for the optimization of the in-network distribution of learning elements and the related meta-heuristic solution. The results of a comprehensive performance evaluation campaign are presented in Section V. Finally, in Section VI, conclusions are drawn and future work is outlined.

II. RELATED WORKS

A. In-Network Security: ML/DL-aided Traffic Classification

With the advent of Programmable Data Plane (PDP) and INC capabilities, recent efforts have focused on the design of in-network IDS solutions (also referred to as in-network classifiers) to address security-related challenges. A significant area of research investigated the use of the programmable PISA (Protocol Independent Switch Architecture) switch architecture by means of Reconfigurable Match Tables (RMT), enabled by the introduction of the P4 language [2]. In [3] the authors proposed N2Net, a solution that implements the forwarding pass of a Binary Neural Network (BNN) in a P4-enabled switch, outlining the limitations of modern programmable networking devices in accommodating complex ML/DL models characterized by intricate computations and mathematical operations. Following this direction, the authors of BaNaNa Split [4] extended the use of the BNN to SmartNICs to overcome the mentioned limitations through the joint work of programmable networking devices and end-host applications. Nevertheless, the proposed solution does not fit well the concept of ubiquitous and pervasive in-network security, since it does not work without a server that shares the workload with the networking device.

With Taurus [5] and Homunculus [6], Swamy et al. proposed to equip the programmable networking devices with dedicated hardware capable of supporting the map-reduce abstraction to perform complex mathematical operations. The main challenge of this approach is the need to redesign networking devices with custom and expensive hardware to enable them to perform ML/DL-relevant tasks.

Parallel efforts have focused on encoding ML models within programmable networking devices, particularly Random Forests (RFs) and Decision Trees (DTs). In this direction, SwitchTree [7] and Forest [8] stand out as the most valuable examples. Both proposals strove to find the best encoding methodology to embed DTs and RFs within constrained and instruction set-limited PDPs. Following this trend, the works in [9]–[12] show effort in designing a framework capable of encoding general RFs/DTs within P4-enabled networking devices. Recent research has demonstrated the remarkable capabilities of eBPF (extended Berkeley Packet Filter), showing nearly equivalent performance to P4 in managing general-purpose tasks offloaded to networking devices [13]. An important contribution in this domain is found in [14], where the authors focus on developing an efficient and effective encoding of a DNN using eBPF technology.

A common effort emerging from the literature is the search for optimal encodings of entire (sometimes complex) ML/DL models to adapt them to network devices with reduced impact on packet forwarding performance. None of them addresses how to intelligently distribute in-network classification modules to achieve pervasive and ubiquitous security through a fully distributed and collaborative approach of such modules, which is the objective of the novel paradigm studied in our paper.

B. In-Network Learning Distribution

The paradigm of the distribution of computational functions relevant to AI (both training and inference) finds its first evidence in the context of Edge and Cloud Computing. Many works in the literature addressed the concept of decomposing a deep neural network (DNN) into its layers to distribute the workload between an edge mobile device and the cloud, proposing optimization models for this purpose. Among others, in [15] the best split is determined via regression models that predict the computational and energy consumption of each DNN layer, while in [16] the optimal solution is determined by considering device and network resource utilization to minimize end-to-end latency between the edge and the cloud.

Only recently, with the emergence of the potential of the in-network computing paradigm [17], attention has shifted towards a distribution of learning functions that also exploits the network segments that connect Edge and Cloud. Understanding the close and crucial integration between artificial intelligence and future 6G networks, the authors of [18], [19] and [20] envisaged and analyzed the structural changes needed for the future 6G networks to naturally accommodate distributed artificial intelligence activities within their Data Plane.

Instead, Saquetti et al. [21] focus on the constrained nature of PDP devices as well as the limitations imposed by the reference PDP programming language (i.e., P4) when dealing with distributed intelligence in the network.
Through a simple PoC – a neural network with 3 layers and a total of seven neurons – they proposed an optimization model to distribute the DNN within the network at single-neuron granularity, with a one-to-one mapping between PDP and neuron. However, it turns out that this type of distribution is not feasible when the neural network is complex, severely limiting the applicability of the proposal.

In the wake of the recent effort to deploy intelligence "in-the-network" by leveraging key enablers envisioned for upcoming 6G networks, our research aims to help fill a crucial literature and structural gap regarding network security for future generation networks. The close integration between AI and networks is a key factor for pursuing the concept of integrated security. By deploying virtualized anomaly detection and response functions across the network and enabling their collaborative action, a security fabric can be created that makes the data plane secure by design.

III. DEPLOYMENT OF AN ML-ENABLED ACTIVE IDS IN A NETWORK DATA PLANE

The reference for the research reported in this paper is the one presented by the authors in [1], where a new paradigm is introduced according to which anomaly detection capabilities are natively embedded in the devices of a typical data plane of a future programmable network. That work in fact reports only a simple proof-of-concept of the resulting ML-enabled Active Intrusion Detection System; in the present paper, instead, we propose an effective method for optimizing the deployment of the learning functions in the devices and their related chaining. For the benefit of the reader, we briefly report the basic concepts, referring to the aforementioned paper for the details of the hypothesized architecture.

A. Projecting the Ensemble Learning over the Network

The reference framework includes all the functionalities to implement the proposed paradigm, distributed over three logical levels: the Artificial Intelligence Plane (AIP), the Control & Orchestration Plane (C&OP), and the AI-enhanced Programmable Data Plane (AIePDP) [1].

The proposed paradigm envisages that, through ad hoc functions included in the first level, the model that must be embedded in the PDP is trained, its partitioning is performed, and the VNFs that will carry out detection and response to attacks are created. An SL appropriately trained for the purpose is then broken down into individual WLs coded as WL-VNFs and made available to the orchestration functions that are in the second level. This process is depicted in Fig. 1. An optimal distribution strategy of the WL-VNFs among the PDP devices is then decided, which allows the selected switches that host the WL-VNFs to operate cooperatively as an active IDS in the network. The activities described in this paper refer to what is only theorized at the second level of the mentioned architecture, but not previously developed. Specifically, the goal is to find the set of WL-VNFs and the switches that host them in such a way as to maximize the security coverage of the considered network, i.e., the effectiveness in detecting and reacting to the maximum number of attacks.

Fig. 1. Strong Learner (SL) decomposition and mapping into WL-VNFs.

The functions that will then perform this activity are hosted in the lowest level of the architecture (as shown in Fig. 1), i.e., the AI-enhanced programmable data plane. Here, the cooperative policy that the group of WLs implements provides that all flows are analyzed and the suspicious ones are properly marked by each WL to signal this to the following WLs that must be executed on the flow to reconstruct the original SL. The flow, as it passes through the network, is analyzed by the various WLs that constitute the SL, and each one signals the result of its inference to the others. If a WL realizes that it is the last of the set that constitutes an SL and that all the others have already performed the flow analysis, it completes its analysis and, through a majority voting algorithm, takes the final decision, blocking the flows that the WL chain deems malicious. The algorithm is completely distributed and does not require the involvement of humans or of the network controller. To allow distributed WL-VNFs to inform each other on the inferences carried out for a network flow, a custom P4-encoded header is considered, as well as a procedure carried out by the PDP device augmented with the WL-VNFs.
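To make the cooperative policy concrete, the following Python sketch mimics the marking-and-voting logic described above. The header layout (a bitmap of executed WLs and a bitmap of malicious verdicts) and the function names are assumptions made only for illustration; they are not the actual P4 header and primitives defined in [1].

```python
"""Illustrative simulation of the cooperative WL-VNF chain (not the P4 code of [1]).
The visited/votes bitmaps stand in for the custom header carried by the flow."""

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WLHeader:
    visited: int = 0   # bit i set -> WL_i has already run its inference on the flow
    votes: int = 0     # bit i set -> WL_i flagged the flow as malicious


def process_at_wl(wl_id: int, n_wl: int, features: List[float],
                  infer: Callable[[List[float]], bool], hdr: WLHeader) -> str:
    """Logic executed by the PDP device hosting WL_{wl_id} (0-based index)."""
    if infer(features):                  # local weak-learner inference
        hdr.votes |= 1 << wl_id          # signal the verdict to the following WLs
    hdr.visited |= 1 << wl_id            # mark this WL as executed

    if hdr.visited != (1 << n_wl) - 1:
        return "forward"                 # other WLs of the SL still have to run

    # Last WL of the chain: final decision via distributed majority voting.
    malicious_votes = bin(hdr.votes).count("1")
    return "block" if malicious_votes > n_wl // 2 else "forward"


if __name__ == "__main__":
    # Toy SL split into three threshold-based "weak learners".
    weak_learners = [lambda f, t=t: f[0] > t for t in (0.3, 0.5, 0.7)]
    hdr, flow = WLHeader(), [0.6]
    for wl_id, wl in enumerate(weak_learners):
        print(f"WL{wl_id + 1}:", process_at_wl(wl_id, len(weak_learners), flow, wl, hdr))
```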
IV. FORMULATION

In this section, we propose a variant of the shortest path problem to optimize the deployment of the WL-VNFs.
We represent a network using a graph. The nodes in our model represent the network nodes in which the WL-VNFs can be deployed, while the edges denote the links between network elements. We use node coloring to represent the implementation of specific WL-VNFs, where each color¹ corresponds to a different type of WL-VNF and the coloring cost corresponds to the associated implementation cost. For instance, an SL composed of three WLs will determine three WL-VNFs and therefore three different colors (e.g., red, green, and blue), as shown in Fig. 2.

¹The terms "color" and "SL/WL-VNF" will be used interchangeably. More specifically, SL-VNF refers to a scenario in which only one color is needed.

Fig. 2. From WL-VNFs to Colors domain.

The graph edges are weighted to reflect a network connection characteristic, such as latency or bandwidth. Our objective is to find the optimal deployment of WL-VNFs to ensure comprehensive network security coverage. This approach guarantees pervasive and ubiquitous network protection, aligning with the need for robust cybersecurity measures in the evolving landscape of next-generation networks. The objective function to be minimized includes both the costs of the different paths between pairs of source nodes and target nodes (ensuring that each path passes through at least one colored node for each color) and the cost of coloring the nodes themselves. In the remainder of the section, we propose a detailed mathematical model that represents the problem and a meta-heuristic approach, based on a Biased Random-Key Genetic Algorithm (BRKGA), providing a practical and scalable solution to the optimization problem.

A. Exact Model

This section delves into the mathematical complexities of the APSPC problem through the development of an Integer Linear Programming (ILP) model. The problem is formulated on an undirected connected loopless graph G = (V, E), with the goal of determining the simple shortest paths between all pairs of nodes (source-target) such that each path includes at least one vertex colored for each color in the set C. Despite the undirected nature of the graph, this model incorporates directed flow constraints, which are necessary for the formal definition of paths from a source node s to a target node t. For this reason, with an abuse of terminology, once the nodes s and t have been fixed, any node can have outgoing and incoming edges. Three sets of binary variables are introduced to indicate whether an edge is traversed and whether a vertex is colored with a specific color; specifically, let $x_{ij}^{st}$ be a binary variable equal to 1 if and only if the edge (i, j) is visited in the path s–t, and $y_{ic}$ be a binary variable equal to 1 if and only if the vertex i is colored by c in the graph. The last set of variables keeps track of the coloring of the nodes in each path s–t. In particular, given the color c and once the source s and the target t have been fixed, $z_{ic}^{st}$ must be equal to 1 if and only if, in the path s–t, the vertex i is colored with c and is traversed. In addition, let $w_{ij} \in \mathbb{Z}^{+}$ be the positive weight associated with each edge (i, j) and $p_c \in \mathbb{Z}^{+}$ the cost of coloring a node with color c. The All-Pairs Shortest Path Coloring problem presented can be formulated using the following programming model.

$$\min \sum_{(s,t)\in V\times V} \; \sum_{\substack{(i,j)\in E:\\ i\neq t \,\wedge\, j\neq s}} w_{ij}\, x_{ij}^{st} \;+\; \sum_{(i,c)\in V\times C} p_c\, y_{ic} \qquad (1)$$

subject to

$$\sum_{j\in V\setminus\{s\}} x_{ij}^{st} - \sum_{j\in V\setminus\{t\}} x_{ji}^{st} = \begin{cases} 1 & \text{if } i = s\\ -1 & \text{if } i = t\\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i, s, t \in V \qquad (2)$$

$$\sum_{(i,j)\in E(S)} x_{ij}^{st} \;\le\; \sum_{i\in S\setminus\{k\}} \sum_{j\in V\setminus\{s\}} x_{ij}^{st} \qquad \forall\, s,t\in V;\ \forall k\in S;\ \forall S\subsetneq V\setminus\{s,t\}: |S|\ge 2 \qquad (3)$$

$$\sum_{c\in C} y_{ic} \le 1 \qquad \forall i\in V \qquad (4)$$

$$\sum_{i\in V} z_{ic}^{st} \ge 1 \qquad \forall\, s,t\in V;\ \forall c\in C \qquad (5)$$

$$z_{ic}^{st} \le \sum_{j\in V\setminus\{s\}} x_{ij}^{st} \qquad \forall\, s,t\in V;\ \forall i\in V\setminus\{t\};\ \forall c\in C \qquad (6)$$

$$z_{tc}^{st} \le \sum_{j\in V\setminus\{t\}} x_{jt}^{st} \qquad \forall\, s,t\in V;\ \forall c\in C \qquad (7)$$

$$z_{ic}^{st} \le y_{ic} \qquad \forall\, s,t,i\in V;\ \forall c\in C \qquad (8)$$

$$x_{ij}^{st} \in \{0,1\} \qquad \forall (i,j)\in E;\ \forall s,t\in V \qquad (9)$$

$$y_{ic} \in \{0,1\} \qquad \forall (i,c)\in V\times C \qquad (10)$$

$$z_{ic}^{st} \in \{0,1\} \qquad \forall\, s,t,i\in V;\ \forall c\in C. \qquad (11)$$

The objective of the model (1) is to minimize the total weight of the traversed edges and the cost of coloring the nodes. Constraints (2) ensure flow conservation, and equations (3) are subtour elimination constraints represented in cutset form, named Generalized Cut-Set (GCS) inequalities. This latter set of constraints ensures that the number of edges with both extremes in S, i.e., |E(S)|, cannot be greater than the number of vertices in S traversed by the s–t path. This type of constraint is necessary due to the coloring constraints (5)–(7), which could generally induce cycles disconnected from the simple path s–t. Constraints (4) ensure that each node is colored with at most one color, and constraints (5) ensure that in each shortest path s–t there is at least one colored vertex for each color c ∈ C. The constraints (6) and (7) ensure that a node i can contribute to the s–t path with color c only if i is effectively traversed as an intermediate node or as the destination node, respectively. The set of constraints (8) establishes that if a node i contributes to at least one s–t path with a specific color c, then i must indeed be colored with c in the solution. Finally, constraints (9)–(11) define the variable domains.
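As an illustration of how the formulation can be instantiated, the sketch below encodes the objective (1), constraints (2) and (4)–(8), and the binary domains on a toy instance with the open-source PuLP library. It is not the authors' implementation; in particular, the GCS inequalities (3) are deliberately left out, since they are meant to be added lazily through the separation procedure discussed next.

```python
"""Toy APSPC instance solved with PuLP (illustrative only; GCS cuts (3) omitted)."""

import itertools
import pulp

V = [0, 1, 2, 3]
E = {(0, 1): 4, (1, 2): 3, (2, 3): 5, (0, 3): 10, (1, 3): 6}   # undirected weighted edges
C = ["c1", "c2"]
p = {"c1": 2, "c2": 3}                                          # coloring costs

A = {}                                                          # directed arcs from E
for (i, j), w in E.items():
    A[(i, j)], A[(j, i)] = w, w

pairs = list(itertools.permutations(V, 2))
prob = pulp.LpProblem("APSPC", pulp.LpMinimize)

x = {(s, t, i, j): pulp.LpVariable(f"x_{s}_{t}_{i}_{j}", cat="Binary")
     for (s, t) in pairs for (i, j) in A if i != t and j != s}
y = {(i, c): pulp.LpVariable(f"y_{i}_{c}", cat="Binary") for i in V for c in C}
z = {(s, t, i, c): pulp.LpVariable(f"z_{s}_{t}_{i}_{c}", cat="Binary")
     for (s, t) in pairs for i in V for c in C}

# (1): traversed edge weights plus coloring costs
prob += (pulp.lpSum(A[(i, j)] * x[(s, t, i, j)] for (s, t, i, j) in x)
         + pulp.lpSum(p[c] * y[(i, c)] for (i, c) in y))

for (s, t) in pairs:
    for i in V:
        out_i = pulp.lpSum(x[(s, t, i, j)] for j in V if (s, t, i, j) in x)
        in_i = pulp.lpSum(x[(s, t, j, i)] for j in V if (s, t, j, i) in x)
        prob += out_i - in_i == (1 if i == s else -1 if i == t else 0)   # (2)
    for c in C:
        prob += pulp.lpSum(z[(s, t, i, c)] for i in V) >= 1              # (5)
        for i in V:
            if i != t:
                reach = pulp.lpSum(x[(s, t, i, j)] for j in V if (s, t, i, j) in x)
            else:
                reach = pulp.lpSum(x[(s, t, j, t)] for j in V if (s, t, j, t) in x)
            prob += z[(s, t, i, c)] <= reach                              # (6)-(7)
            prob += z[(s, t, i, c)] <= y[(i, c)]                          # (8)

for i in V:
    prob += pulp.lpSum(y[(i, c)] for c in C) <= 1                         # (4)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("objective =", pulp.value(prob.objective))
print("colored nodes:", [(i, c) for (i, c) in y if y[(i, c)].value() > 0.5])
```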
Additionally, a separation procedure is developed for the computationally expensive subtour elimination constraints (3). So, initially, the relaxed problem is considered, meaning the subtour elimination constraints are temporarily omitted. During the resolution process, any violated subtours in the current solution are identified. Regarding the separation routine, a
function of the color cost and avgNodeWeight (line 4). This probability assesses the cost-effectiveness of coloring the node. If the ratio is very low, the node is colored with certainty. Intuitively, this means that we color the node if the cost of coloring is relatively small compared to the benefit we gain from coloring it. Phase 2 focuses on other characteristics of the node. The procedure calculates the NodeProbability as a function of the ratio between nodeDegree and avgNodeWeight, and the chromosome gene associated with the node (line 9). This operation allows us to determine how important it is to color a node based on the number of connections and the strength of those connections (average edge weight). If these values indicate that the node is influential in the network, then the probability of coloring it increases. If the gene is too low, the probability is set to one to avoid invalidating the probability calculation. Finally, a random number is generated using a uniform distribution between 0.0 and 1.0, and the node is colored if this random number is less than NodeProbability (line 11). The procedure returns the boolean result, indicating whether the node should be colored or not.
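Since the pseudocode referenced by the line numbers above is not reproduced here, the following sketch only mirrors the textual description of the two phases; the concrete probability expressions and the cheapness threshold are plausible placeholders, not the authors' exact formulas.

```python
"""Sketch of the decoder's node-coloring decision (Phases 1 and 2 described above).
The probability expressions and the threshold values are assumed placeholders."""

import random


def should_color_node(color_cost: float, avg_node_weight: float, node_degree: int,
                      gene: float, rng: random.Random,
                      cheap_threshold: float = 0.1, min_gene: float = 1e-3) -> bool:
    # Phase 1: cost-effectiveness of coloring this node.
    coloring_probability = color_cost / avg_node_weight
    if coloring_probability <= cheap_threshold:
        return True                       # coloring is very cheap: color with certainty

    # Phase 2: structural importance of the node, modulated by its random key (gene).
    if gene < min_gene:
        node_probability = 1.0            # avoid invalidating the probability calculation
    else:
        node_probability = min(1.0, (node_degree / avg_node_weight) * gene)

    # Color the node if a uniform draw in [0, 1) falls below NodeProbability.
    return rng.random() < node_probability


# Example draw for one candidate node of the chromosome being decoded.
print(should_color_node(color_cost=120, avg_node_weight=80, node_degree=6,
                        gene=0.7, rng=random.Random(1)))
```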
The calculateFitness function evaluates the fitness of a solution by calculating the aggregate path cost between all pairs of nodes within the graph, based on their color assignments. Initially, it computes an overall color cost derived from the color assignments. Subsequently, the algorithm iterates over each node pair to determine the shortest path between them, applying a modified Dijkstra algorithm that incorporates the color constraints. If a valid path exists, its cost is added to the aggregate path cost. If no valid path is found, the algorithm designates the solution as infeasible and halts further calculations.
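One concrete way to realize the modified Dijkstra mentioned above is to run it on the expanded state (node, set of colors collected so far); the sketch below follows this idea and aggregates the resulting path costs as described. It is an illustrative choice under that assumption, not necessarily the authors' exact modification, and it allows node revisits (i.e., it computes minimum-cost color-covering walks rather than strictly simple paths).

```python
"""Color-constrained shortest path via Dijkstra over (node, collected-colors) states,
plus the fitness aggregation described above (illustrative sketch)."""

import heapq
from itertools import permutations
from math import inf


def colored_shortest_path(graph, node_color, colors, s, t):
    """Cheapest s-t route whose visited nodes cover every color; None if none exists."""
    bit = {c: 1 << k for k, c in enumerate(colors)}
    full = (1 << len(colors)) - 1

    def mask_of(n):
        return bit.get(node_color.get(n), 0)

    best = {(s, mask_of(s)): 0.0}
    heap = [(0.0, s, mask_of(s))]
    while heap:
        dist, u, mask = heapq.heappop(heap)
        if u == t and mask == full:
            return dist
        if dist > best.get((u, mask), inf):
            continue
        for v, w in graph[u].items():
            state = (v, mask | mask_of(v))
            if dist + w < best.get(state, inf):
                best[state] = dist + w
                heapq.heappush(heap, (dist + w, v, state[1]))
    return None


def calculate_fitness(graph, node_color, colors, color_cost):
    """Total coloring cost plus the aggregate constrained path cost over all pairs."""
    fitness = sum(color_cost[c] for c in node_color.values() if c is not None)
    for s, t in permutations(graph, 2):
        cost = colored_shortest_path(graph, node_color, colors, s, t)
        if cost is None:
            return inf                    # infeasible coloring: halt further evaluation
        fitness += cost
    return fitness


# Toy check: a triangle where nodes 1 and 2 host the two WL-VNF colors.
g = {0: {1: 1, 2: 4}, 1: {0: 1, 2: 2}, 2: {0: 4, 1: 2}}
print(calculate_fitness(g, {0: None, 1: "c1", 2: "c2"}, ("c1", "c2"), {"c1": 5, "c2": 7}))
```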
C. Strong Learner Splitting and #colors Selection

The number of colors (i.e., the number of different WLs that compose the entire SL) available to color the nodes of a given graph is chosen using the function defined below, denoted as $cd : \mathbb{R} \rightarrow 2\mathbb{Z}+1$. Given a real number x, this function returns the largest odd integer less than x, or returns 3 if the largest integer less than x is 2. Formally:

$$cd(x) := \begin{cases} 3 & \text{if } \lfloor x \rfloor = 2\\ \lfloor x \rfloor - 1 & \text{if } \lfloor x \rfloor \in 2\mathbb{Z} \setminus \{2\}\\ \lfloor x \rfloor & \text{if } \lfloor x \rfloor \in 2\mathbb{Z}+1. \end{cases}$$

The exact number of colors, #colors, available for the graph G = (V, E) is given by evaluating the function cd on the average number of nodes present in all classical shortest paths, i.e., computed without the coloring constraint:

$$\#colors = cd\!\left(\frac{2}{|V| \cdot (|V|-1)} \sum_{(i,j)\in V\times V:\; i<j} d(i,j)\right), \qquad (12)$$

where d(i, j) is the number of nodes present in the classical shortest path between i and j calculated using the Dijkstra algorithm.
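A direct transcription of the cd function and of Eq. (12) is sketched below, following the textual description above (the average is taken over the classical shortest paths between all node pairs).

```python
"""Sketch of the cd function and of the #colors computation of Eq. (12)."""

import heapq
import math
from itertools import combinations


def cd(x: float) -> int:
    """Largest odd integer below x; returns 3 when floor(x) is 2 (or smaller)."""
    fx = math.floor(x)
    if fx <= 2:
        return 3
    return fx if fx % 2 == 1 else fx - 1


def dijkstra_nodes_on_path(graph, s, t) -> int:
    """Number of nodes on a classical (uncolored) shortest s-t path."""
    best = {s: (0.0, 1)}                      # node -> (distance, nodes on path)
    heap = [(0.0, 1, s)]
    while heap:
        dist, hops, u = heapq.heappop(heap)
        if u == t:
            return hops
        if (dist, hops) > best.get(u, (math.inf, 0)):
            continue
        for v, w in graph[u].items():
            cand = (dist + w, hops + 1)
            if cand[0] < best.get(v, (math.inf, 0))[0]:
                best[v] = cand
                heapq.heappush(heap, (cand[0], cand[1], v))
    raise ValueError("graph is not connected")


def num_colors(graph) -> int:
    """#colors = cd of the average node count over all classical shortest paths."""
    nodes = list(graph)
    total = sum(dijkstra_nodes_on_path(graph, i, j) for i, j in combinations(nodes, 2))
    return cd(2 * total / (len(nodes) * (len(nodes) - 1)))


g = {0: {1: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1}}   # a path of 4 nodes
print(num_colors(g))   # average of 2.67 nodes per path -> cd returns 3
```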
V. PERFORMANCE EVALUATION

This section illustrates an in-depth performance evaluation campaign conducted to assess the benefits of the proposal in terms of both optimization and network-relevant aspects. Two experimental campaigns will be described in order to accomplish this task: (i) Model Evaluation Campaigns; and (ii) Network Evaluation Campaigns.

A. Model Evaluation Campaigns

In this section, we summarize the results of our computational experiments on the meta-heuristic defined. In particular, we conduct an in-depth analysis of the impact of various network characteristics on the effectiveness and efficiency of the entire defined system.

The BRKGA has been implemented in C++ using clang version 14.0.3. For the compilation, the C++17 standard was set using the CMAKE_CXX_STANDARD 17 specification in the CMake configuration file. All the optimization computational tests were conducted on an Apple M2 Max processor with a 12-core CPU, a 38-core GPU, and 96 GB of LPDDR5 RAM, running macOS Ventura 13.3.

1) Instances and Parameter Setting: In order to evaluate the performance of the proposed approach, a set of instances was generated as described below. The set is composed of random topology networks, each of which is identified by a unique combination of the following parameters: number of nodes (n), edge density (d), and color cost ranges (cr). In particular, we considered: four values for the number of nodes, i.e., n ∈ {10, 15, 25, 30}; four values for the edge density, which determines the number of edges #e = d · n(n − 1)/2, with d ∈ {0.25, 0.35, 0.45, 0.55}; and four ranges of values for the color cost, i.e., cr1 = [1, 125], cr2 = [50, 150], cr3 = [75, 175], cr4 = [100, 200]. For each instance, the number of colors is uniquely determined by the function (12). More in detail, given a certain number of nodes, we start by generating the minimum spanning tree G = (V, E) first to ensure connectivity; then we randomly add edges to E until the needed number of edges, determined by the edge density parameter, is reached. The costs of the edges are determined as a sample from a uniform distribution in the interval [1, 200]. The color costs are determined as a sample from a uniform distribution in the color cost value range parameter. For each scenario, identified by a given combination of values of n, d, and cr, we generated six different random instances, for a total of 384 instances, by varying the seed used to initialize the random number generator. We organized each set into four classes, based on edge density, named $\{ED_i\}_{i=1}^{4}$.
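A sketch of the instance generator described above is given below. It follows the stated parameters (n, d, cr, edge costs in [1, 200]); the spanning-tree construction is simplified to a uniformly random spanning tree, whereas the paper builds a minimum spanning tree, and the remaining sampling details are assumptions.

```python
"""Sketch of the random-instance generator described above (simplified assumptions)."""

import random


def generate_instance(n: int, d: float, cr: tuple, num_colors: int, seed: int):
    rng = random.Random(seed)
    nodes = list(range(n))
    target_edges = round(d * n * (n - 1) / 2)

    # 1) spanning tree to guarantee connectivity (random tree used here for brevity)
    rng.shuffle(nodes)
    edges = {}
    for k in range(1, n):
        u, v = nodes[k], rng.choice(nodes[:k])
        edges[(min(u, v), max(u, v))] = rng.randint(1, 200)

    # 2) add random edges until the density-driven edge count #e is reached
    while len(edges) < target_edges:
        u, v = rng.sample(range(n), 2)
        edges.setdefault((min(u, v), max(u, v)), rng.randint(1, 200))

    # 3) color costs drawn uniformly from the chosen range cr
    color_costs = [rng.randint(cr[0], cr[1]) for _ in range(num_colors)]
    return edges, color_costs


# Example: one instance for the smallest scenario (n=10, d=0.25, cr1=[1, 125]).
edges, color_costs = generate_instance(n=10, d=0.25, cr=(1, 125), num_colors=3, seed=0)
print(len(edges), "edges;", color_costs)
```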
For the metaheuristic parameters, we carried out a preliminary tuning phase using irace, a tool that performs an automatic configuration to optimize parameter values (refer to [26] for details). This tuning was done using four random instances of each of the ADi sets. Table I summarizes the tuned parameters of the BRKGA, grouping them into three sets: Operator, IPR-Per and Others.

TABLE I
TUNED BRKGA PARAMETERS

pcte   pctm   πt   πe   ϕ      sel     md     pctp   α    m
0.1    0.6    3    1    1/r²   randS   0.15   0.85   20   2

2) Experimental Results: The summary table will be presented by grouping instances according to their density class EDi and the number of nodes. Each row in the tables refers to a subset of instances from a given set that share the same edge density and, where specified, the same number of nodes. These are indicated by the descriptor in the Set column, where the acronym "ED" stands for edge density and "N" stands for nodes. Furthermore, all time values are measured in seconds. Table II provides detailed information on the results obtained by applying BRKGA to the set of all instances. Each row reports the average values for the following parameters: the number of available colors in the instances (#colors), calculated using the cd function; the time taken by the metaheuristic to identify the obtained solution (BestTime (s)); the total execution time (Time (s)); the number of deployed nodes (#NDy); the total solution cost (Cost); color-related costs (Cost_c); and path cost (Cost_p). The number of referred instances is 24 for each row aggregating on both the edge density and the number of nodes, and 96 for the AVG rows aggregating only on the edge density. For all the experiments, we set the time limit equal to 900 seconds and the maximum number of consecutive iterations without improvement, wi, to 10.

Analyzing the behavior of the average best time, it increases as expected as both the number of nodes and the density increase. However, the effect of the number of nodes is more significant compared to the density, while still remaining below 1 minute. In particular, as shown in Table II, we observe that with 10 nodes, the BestTime consistently stays within the 0.08–0.32 second range, regardless of density. With 15 nodes, it increases significantly compared to 10 nodes, but remains manageable, ranging between 1.22 and 2.70 seconds. With 25 nodes, there is an increase, but still limited; in fact, it rises to 16.99 seconds for ED1 and 21.62 seconds for ED3. With 30 nodes, the highest recorded BestTime is observed, with values ranging from 34.21 seconds for ED1 to 55.65 seconds for ED4. In general, it is observed that as the density increases, the BestTime increases linearly for each number of nodes. This increase becomes greater as the number of nodes increases. In addition, for each density class, it is noted that as the number of nodes increases, the BestTime increases in a non-linear manner. Similarly, the total runtime of the BRKGA follows a linear trend as the density increases for each number of nodes and a non-linear trend as the network size increases for each density class.

Regarding the average number of colors identified by the cd function, it is observed that, on average, the number of colors increases as the density decreases. Specifically, in all instances with 10 and 15 nodes, #colors is always equal to the minimum available, which is 3. With 25 nodes, the average ranges from 3.08 in the ED3 class to 3.17 in ED1, while in the ED4 class, all instances have #colors equal to 3. Overall, 5 instances with 5 colors were recorded. With 30 nodes, the highest #colors values are recorded, ranging from 3.08 in the ED4 class to 3.42 in ED1. In total, 4 instances with 7 colors and 3 instances with 5 colors were recorded. Therefore, for each density class, as the number of nodes increases, #colors also increases. These trends can be explained by the fact that, in fully random topologies with a greater number of nodes and/or relatively low density, it is more likely to find, on average, shortest paths with a higher length, which requires the use of more colors, as expected from the definition of the cd function.

As expected, #NDy increases with the total number of nodes in the network. For example, in the case of 10 nodes and density class ED1, the average number of deployed nodes is 4.21, while with 30 nodes in the same class, it increases to 14.79. This trend is consistent across all classes, confirming that as the graph size increases, more nodes are involved in the deployment of the learning models and VNFs necessary to ensure network security coverage. With the same number of nodes, it is observed that as density increases, the number of deployed nodes tends to increase. For instance, for N = 15, #NDy increases from 6.92 in density class ED1 to 9.00 in class ED2, and 8.08 in class ED4. The scalability of the proposed model is evident from the way it adapts to networks of varying sizes and densities. The increase in #NDy with the growth in both the number of nodes and density shows that the model can handle larger and more complex network topologies. This scalability is crucial for next-generation networks, where the number of nodes and connections will continuously increase, requiring an efficient distribution of learning functions across the network.

The increase in the number of nodes has a significant impact on the total costs for each density class. For example, observing the results in the table, for 10 nodes and ED1, the Cost is around 2 · 10^4, while for 30 nodes in the same density class, the cost rises to approximately 6.5 · 10^4. This increase is attributable to the rise in both deployment costs (Cost_c) and shortest path costs (Cost_p), as larger networks require the distribution of VNFs across more nodes and covering longer distances. Density, however, follows a different trend. As density increases, Cost_p decreases because the paths between nodes become shorter. Nevertheless, Cost_c tends to rise slightly with the increase in density, as more nodes are needed to manage the more connected network. Therefore, since Cost_p constitutes the vast majority of the total cost for each set of instances (over 90%), the average total cost decreases, as can be seen from the AVG rows.

B. Network Evaluation Campaigns

During a further experimental campaign, we compared the performance of the data plane devices when dealing with an entire ML model and when, instead, the model is decomposed following our deployment approach. We measured the time to obtain the classification outcome – namely classification time – and the throughput guaranteed by the networking devices that execute the additional and AI-related task. In addition, to
Average DDoS Packet Size Analysis (average packet size [bytes]).

At 500 pkts/s the gap starts to be prominent, with an average classification time of ∼360 ms for the SL-VNF against 20 ms for the split configuration. When the attack rate is around
2) Classification Time Analysis: We also analyzed the average classification time of the networking devices within the proposed distributed approach under increasing traffic loads. In Fig. 4a, the achievable average classification time under a varying attack rate is shown. With the first small topology (10 switches, 50 hosts of which 25 are attackers) – which requires an SL composed of three WLs to guarantee the security coverage – it can be observed that while the amount of handled packets is around 200–400 pkts/s the SL-VNF configuration performs better, showing an average classification time that is about 60% less than the WL-VNF one (an average of 0.62 ms for the SL-VNF against 1.7 ms for the WL-VNF). This is due to the additional intermediate communication that happens between the PDPs to get the final classification. However, as the attack rate intensifies and the switches become overwhelmed with network packets to analyze, this advantage diminishes, allowing the WL-VNFs configuration to demonstrate its strengths in handling critical attack situations. The differences can be appreciated when the attack rate is in the range of 600–800 pkts/s, with the classification time more than halved. Under heavy attack load, 900–1000 pkts/s, the SL-VNF configuration is not able to timely handle the classification tasks, reaching a maximum time to complete classification of more than 1000 ms, against the ∼200 ms achieved through the adoption of the proposed model splitting and distribution paradigm.

In Fig. 4b, the results with the medium network topology (25 network switches and 125 hosts, 75 of which are attackers) and an SL composed of five WL-VNFs are shown. In this case, due to the lesser model complexity, the benefits of the proposal can be appreciated starting from 300–400 pkts/s, and its effectiveness is shown around 500–600 pkts/s, reducing the time to complete the classification by more than 90%. Even under the highest attack rate (1000 pkts/s), the reduction achieved by the proposal is more than 50% (∼1500 ms with the proposal against ∼3600 ms with the SL-VNF configuration).

This trend is confirmed by the experiments carried out with the largest topology (see Fig. 4c), in which the optimization problem suggested an SL with seven WLs to cope with network security coverage. In this case, the highest complexity of the SL-VNF leads the network to be unable to timely handle classification tasks starting from an attack rate of 300 pkts/s.

3) Throughput Analysis: In a further test campaign, the average throughput of the PDPs in both configurations, i.e., SL-VNF and WL-VNF, is measured by varying the network topologies and the related value of #colors. This is to demonstrate that the proposed approach of optimizing the distribution of active IDS features is scalable in terms of network devices' capacity in managing network traffic.

It is observed in Fig. 5 that the WL-VNFs deployment setting shows the best gain for the network, both in terms of throughput and delays, with the increase in the amount of traffic generated by the distributed malicious hosts. When considering the SL-VNF configuration, the throughput experienced by the network devices decreases as the SL complexity increases (from three to seven WLs), mainly due to the increasing number of WLs that need to be queried on a single PDP. With the simplest SL, the average network throughput starts to drop below 5 Mbps when the attack rate is 700 pkt/s, quickly approaching 0 Mbps at 800 pkt/s. This trend worsens when considering more complex SLs. The #colors = 5 scenario shows that the network throughput drops to zero when approaching an attack rate of 600–700 pkt/s. Even worse is the case of the most complex SL (#colors = 7), whose overhead causes the average network throughput to approach zero starting from an attack rate in the range of 400–500 pkt/s. In such situations, data plane devices experience substantial degradation in their forwarding capabilities.

However, when the SL is split and distributed across the network, the computational load imposed on the PDP devices is alleviated, making it possible to consider the integration of even complex AI models within the network without affecting the normal network operation too much. In fact, when considering the #colors = 3 scenario and the split configuration, the average network throughput starts to drop below 5 Mbps with an attack rate of 900–1000 pkt/s. Considering an attack rate in the range of 100–500 pkt/s, we saw a 20% increase in throughput on average. With a higher attack rate, this advantage improved further (∼50–55%), up to the point where the advantages of the distributed approach ensure that the network is still able to guarantee a minimum throughput while with the SL-VNF the network is completely down again.
Fig. 4. Time to complete traffic classification [ms] vs. attack rate [pkts/s] for #colors = 3, 5, and 7 (Not Split vs. Split configurations).

Fig. 5. Measured average throughput [Mbps] vs. attack rate [pkts/s] for #colors = 3, 5, and 7 (Not Split vs. Split configurations).
The advantages of the proposed approach become more evident as the complexity of the SL increases and the size of the network expands. When it is necessary to split an SL into five WLs to cover the network, the resulting reduction of the computational burden in each device preserves the average network throughput even more. Indeed, the average network throughput is in the range [∼6, ∼15] Mbps even under attack rates of 700–1000 pkts/s, where instead the non-split configuration causes the average throughput measured on the PDPs to be zero. Finally, in the #colors = 7 network topology, the complexity of the SL causes significant performance degradation starting from attack rates of 400 packets per second (leading to a rapid zeroing of the average throughput), while the proposed distributed approach improves scalability. This method effectively manages the computational overhead, allowing the network to handle large attack volumes while maintaining a satisfactory level of throughput.

Nonetheless, a truly zero-cost solution does not exist yet. The execution of the models still imposes a measurable impact on network throughput, compared with an observed average value of approximately 35 Mbps when no SL/WL-VNFs are active within the switch. This limitation stems from the technological constraints of current networking devices, which are not yet inherently designed to fully support the seamless integration of networking and AI workflows. However, it is expected that these issues will be resolved in future 6G networks, which will likely incorporate advanced, high-performance chips capable of significantly increasing computational power. Having said that, the advantages of the proposed distributed and split AI approach are clear, making it a viable solution for supporting AI-relevant tasks within current as well as future PDP devices. Finally, it is important to highlight a key feature of the proposed approach: it can effectively operate (without any modification) with both encrypted and unencrypted network traffic, as it relies exclusively on header information, which is always transmitted in plaintext.

4) Shortest Path Detouring Analysis: To evaluate the impact of the coloring constraints on network performance, we also conducted an analysis of the AWDelay metric introduced previously. Specifically, we assessed the impact of network density and size on path detours by analyzing the average weighted delay.
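The formal definition of AWDelay is given in a portion of the evaluation not reproduced above. Assuming it measures the average relative extra weight of the color-constrained paths with respect to the classical shortest paths (an interpretation consistent with the percentage values reported below), a sketch of the metric is the following; the two callables can be any classical Dijkstra and the color-constrained variant sketched in Section IV.

```python
"""Assumed interpretation of AWDelay: average relative detour (in %) of the
color-constrained paths with respect to the classical shortest paths."""

from itertools import permutations


def awdelay(graph, node_color, colors, shortest_path_cost, colored_path_cost):
    ratios = []
    for s, t in permutations(graph, 2):
        base = shortest_path_cost(graph, s, t)                      # classical Dijkstra
        detoured = colored_path_cost(graph, node_color, colors, s, t)
        ratios.append((detoured - base) / base)                     # relative extra weight
    return 100.0 * sum(ratios) / len(ratios)
```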
Table III presents the average AWDelay values grouped by the number of nodes N and the density class ED. The AWDelay values shown for each combination of N and ED represent the average computed across all instances discussed in Section V-A2. The AVG row reports the average calculated based on the nodes, while the AVG column shows the average relative to the density. Additionally, the row labeled VAR indicates the variance of all AWDelay values for each number of nodes N, providing a measure of data dispersion and allowing us to assess the variability with the number of nodes.

The plots shown in Fig. 6 represent the cumulative distribution of AWDelay for the density classes, for each node class. Thus, each curve shows the cumulative percentage of recorded results that exhibit an AWDelay less than or equal to a specific value indicated on the x-axis.
Fig. 6. Cumulative distribution of AWDelay for the density classes ED1–ED4, for each node class (N = 10, 15, 25, 30).
For N = 10, a clear upward trend in the curves is observed, where a high percentage of the observed values (around 60%) is concentrated within the lower AWDelay range (0–5.5%), especially for the first three density classes. On the other hand, the results for ED4 show generally higher delays, more spread out over a wider interval. Specifically, the curves associated with the first three density classes show a rapid accumulation around 5% AWDelay, while the ED4 curve shows a slower accumulation, suggesting a more dispersed distribution of delays, with the presence of paths experiencing higher delays. In this class of nodes, the minimum and maximum AWDelay values are 0.52% and 29.30%, respectively, with an overall average of 6.58%.

For N = 15, the graph in Fig. 6(b) shows a behavior similar to what was previously observed, but with some significant differences. First of all, for all density classes, 80% of the delays are below about 5%. A slight difference is seen in the ED3 class, where about 95% of the values are concentrated in the lower AWDelay range (0–5%). Another difference is that in this class, the trends of the four curves are quite similar. The minimum and maximum AWDelay values are 0.34% and 13.45%, respectively, with an overall average of 3.54%.

Compared to the previous plots, the graph with 25 nodes (Fig. 6(c)) shows a more concentrated distribution of AWDelay values. All the plots reach 90% of the cumulative distribution at lower AWDelay values compared to the previous plots. This indicates that most of the paths in networks with 25 nodes experience lower delays, concentrating below around 2.5% AWDelay. Specifically, the curves for the first three density classes show almost identical behavior, with very rapid accumulation (90%) for delays below about 1.5%. The ED4 curve shows a similar trend, although it has a slightly more gradual increase, suggesting greater variability in delays compared to the other density classes, but still well contained compared to the cases with fewer nodes. The minimum and maximum AWDelay values are 0.04% and 3.19%, respectively, with an overall average of 0.88%.

Similarly, in the plots of Fig. 6(d), as previously observed for the instances with 25 nodes, the AWDelay values are concentrated within a very narrow range (up to 3.5%). Similar to the previous case, all the curves reach 90% of the cumulative distribution at AWDelay values below about 2%. Specifically, the curves representing ED1, ED3, and ED4 show almost identical behavior, with a high percentage of observed values (90%) having delays below about 1%. The ED2 curve shows a similar trend but with a slightly more gradual increase, indicating greater variability in delays compared to the other density classes. The minimum and maximum AWDelay values are 0.01% and 3.30%, respectively, with an overall average of 0.57%. The curves associated with 30 and 25 nodes converge much more quickly compared to those for 10 and 15 nodes. This suggests that, as the number of nodes increases, the effect of network density becomes less pronounced, leading to more similar delay distributions.

The results of the experiments, as shown in the table, indicate that the average weighted delay behaves consistently as the network grows in size. Specifically, AWDelay significantly decreases with an increasing number of nodes. For example, in networks with 10 nodes in the density class ED1, the average delay reaches around 5%, while for networks with 30 nodes, the delay drops to approximately 0.5%. This trend can also be observed in the average delay, which decreases from 6.58% with 10 nodes to 0.57% with 30 nodes. This indicates that the overhead introduced by the coloring constraints becomes less significant in larger networks, making the approach more scalable and efficient as the network grows.

Interestingly, when varying the density for a fixed number of nodes, except for the case with 10 nodes, the average AWDelay remains almost constant. The variance of all AWDelay values decreases from 3 · 10^-3 for N = 10 to 3 · 10^-5 for N = 30. This behavior is attributed to the fact that as density increases, and consequently the number of available paths increases, the probability of significant detours from the classic shortest path decreases, thus mitigating any further delay reduction. For example, networks with N = 30 and higher density classes (such as ED4) consistently show lower AWDelay values, supporting the hypothesis that denser networks provide more direct alternative paths even with coloring constraints. The stability of AWDelay across different density classes reinforces the robustness of our approach, as the method maintains a consistent balance between security and efficiency without significantly compromising network performance, even in denser topologies.

This trend is further supported by the variability observed in Fig. 7, where the box plots illustrate the distribution of AWDelay across different densities.
In particular, the interquartile ranges expand in sparser networks, showing greater variability in path efficiency due to the limited number of feasible paths that meet the coloring constraints. The box plots also highlight that in more connected networks, such as those with ED4, the AWDelay distribution is more compact, suggesting a more uniform detour behavior.

Fig. 7. Box plots of the AWDelay distribution across density classes for N = 10, 15, 25, and 30.

REFERENCES

[2] C. Kim, "Programming the network dataplane," ACM SIGCOMM: Florianopolis, Brazil, 2016.
[3] G. Siracusano and R. Bifulco, "In-network Neural Networks," arXiv preprint arXiv:1801.05731, 2018.
[4] D. Sanvito, G. Siracusano, and R. Bifulco, "Can the network be the AI accelerator?" in Proceedings of the 2018 Morning Workshop on In-Network Computing, 2018, pp. 20–25.
[5] T. Swamy, A. Rucker, M. Shahbaz, I. Gaur, and K. Olukotun, "Taurus: a data plane architecture for per-packet ML," in Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2022, pp. 1099–1114.
[6] T. Swamy et al., "Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks," in Proceedings of the 28th ACM