JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. X, XXX 2024

Optimal In-Network Distribution of Learning Functions for a Secure-by-Design Programmable Data Plane of Next-Generation Networks
Mattia Giovanni Spina, Edoardo Scalzo, Floriano De Rango, Francesca Guerriero, Antonio Iera

arXiv:2411.18384v1 [cs.NI] 27 Nov 2024

The authors are with the University of Calabria, Italy. M. G. Spina, F. De Rango, and A. Iera are also with CNIT, Italy.
This work was partially supported by the European Union under the Italian National Recovery and Resilience Plan (NRRP) of NextGenerationEU, partnership on “Telecommunications of the Future” (PE00000001 - program “RESTART”).
This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Abstract—The rise of the programmable data plane (PDP) and in-network computing (INC) paradigms paves the way for the development of network devices (switches, network interface cards, etc.) capable of performing advanced computing tasks. This allows algorithms of various kinds, including machine learning ones, to be executed within the network itself to support user and network services. In particular, this paper delves into the issue of implementing in-network learning models to support distributed intrusion detection systems (IDS). It proposes a model that optimally distributes the IDS workload among data plane devices. This workload results from the subdivision of a “Strong Learner” (SL) model into lighter distributed “Weak Learner” (WL) models. The objective is to ensure complete network security without excessively burdening the devices' normal operations. Furthermore, a meta-heuristic approach is proposed to reduce the long computational time required by the exact solution provided by the mathematical model, and its performance is evaluated. The analysis conducted and the results obtained demonstrate the enormous potential of the proposed approach for creating intelligent data planes that effectively act as a first line of defense against cyber attacks, with minimal additional workload on network devices.

Index Terms—In-Network Computing, Distributed AI, IDS, Programmable Data Plane, Security by Design.

I. INTRODUCTION

The evolving cyber threat landscape requires increasingly agile and adaptable cyber-security solutions. The emerging paradigms of in-network computing (INC) and in-network distributed learning (INDS), coupled with the concept of distributed Intrusion Detection Systems (IDS), emerge as key components to address this challenge. The integration of these concepts has the potential to revolutionize network security by offering a robust, scalable, and resilient defense against ever-evolving threats.

INC exploits the idea of distributing computational tasks across the network infrastructure, rather than relying solely on edge or cloud computing resources. To this end, it leverages the capabilities of network devices, such as switches, routers, and network interface cards (NICs), to perform data processing or caching. An interesting subfield of In-Network Computing focuses on the use of distributed artificial intelligence (AI) techniques. This subfield is the so-called “In-network distributed intelligence”, which aims to enable network devices to collaborate and make intelligent decisions autonomously, without the need for centralized control. This paradigm can make networks more scalable and fault-tolerant, as they become less dependent on centralized control. It can also make them highly adaptable, in real time, to changing conditions and traffic distributions through intelligent decisions about traffic routing, resource management, and network performance optimization.

Recently, interest has emerged in solutions that go beyond the standard uses of distributed intelligence in the network, such as supporting self-optimizing networks, autonomous network management, and context-aware networking. These solutions aim to improve network security by allowing AI-enhanced network devices to autonomously distinguish between legitimate and anomalous traffic flows. This can simultaneously improve the accuracy and increase the speed of intrusion detection.

For their part, traditional IDSs, with their fixed-perimeter nature, are no longer adequate for the highly pervasive and dynamic nature of next-generation networks. Some recent solutions in the literature rely on in-network telemetry and forward traffic data to a centralized SDN controller, which runs the detection module and completes the decision-making process. Even these solutions do not meet the above requirements.

Next-generation networks require Active IDSs (also called Intrusion Prevention Systems, IPS). These leverage the INC and distributed intelligence paradigms to process and analyze network data within Programmable Data Plane (PDP) devices, and they enable the devices themselves to block threats through completely decentralized procedures. This improves the effectiveness and timeliness of intrusion detection and ensures greater scalability, resilience, and fault tolerance.

In this paper we refer to a new paradigm of Active Intrusion Detection Systems that we recently proposed in [1], which leverages AI model splitting to split a Strong Learner (SL) model into its individual Weak Learner (WL) components. The latter are mapped into Virtual Network Functions (VNFs), with both threat detection and response capabilities, that can be distributed among the PDP devices of a next-generation network.

For the aforementioned paradigm to be truly effective, orchestration must always implement an optimal distribution of learning functions that allows the network to (i) continuously improve the accuracy of intrusion detection by adapting to new threats, (ii) reduce the processing load, and
(iii) reduce both the impact on the standard functionality of the involved network devices (e.g., packet forwarding) and the reaction time to threats. The main contributions of this paper can therefore be summarized as follows:
• demonstrate the potential of jointly using PDP devices and in-network distributed learning to enable the network user plane to implement a fully distributed active IDS, and increase the effectiveness of this new functionality;
• propose an optimization model for the efficient deployment of in-network learning models for a distributed Active IDS, which balances security coverage with performance;
• propose a meta-heuristic approach providing a practical and scalable solution to the optimization problem;
• conduct a comprehensive performance analysis aimed at demonstrating the effectiveness of the proposed approach in enhancing the protection of the network against cyber threats while minimizing the impact on overall network performance.

The remainder of the paper is organized as follows. Section II presents the main related works in the key reference areas of this research. Section III introduces an innovative paradigm that exploits distributed in-network learning models to implement a “secure-by-design” data plane. Section IV illustrates a model for optimizing the in-network distribution of learning elements, together with its meta-heuristic solution. The results of a comprehensive performance evaluation campaign are presented in Section V. Finally, Section VI draws conclusions and outlines future work.

II. RELATED WORKS

A. In-Network Security: ML/DL-aided Traffic Classification

With the advent of Programmable Data Plane (PDP) and INC capabilities, recent efforts have focused on the design of in-network IDS solutions (also referred to as in-network classifiers) to address security-related challenges. A significant area of research has investigated the programmable PISA (Protocol Independent Switch Architecture) switch architecture. This architecture is built on Reconfigurable Match Tables (RMT) and was enabled by the introduction of the P4 language [2]. In [3] the authors proposed N2Net, a solution that implements the forwarding pass of a Binary Neural Network (BNN) in a P4-enabled switch. They also outlined the limitations of modern programmable networking devices in accommodating complex ML/DL models characterized by intricate computations and mathematical operations. Following this direction, the authors of BaNaNa Split [4] extended the use of the BNN to SmartNICs to overcome these limitations through the joint work of programmable networking devices and end-host applications. Nevertheless, the proposed solution does not fit well with the concept of ubiquitous and pervasive in-network security, since it does not work without a server that shares the workload with the networking device.

With Taurus [5] and Homunculus [6], Swamy et al. proposed to equip programmable networking devices with dedicated hardware that supports a map-reduce abstraction for complex mathematical operations. The main challenge of this approach is the need to redesign networking devices with custom and expensive hardware to enable them to perform ML/DL-relevant tasks.

Parallel efforts have focused on encoding ML models within programmable networking devices, particularly Random Forests (RFs) and Decision Trees (DTs). In this direction, SwitchTree [7] and Forest [8] stand out as the most notable examples. Both proposals strove to find the best encoding methodology to embed DTs and RFs within constrained, instruction-set-limited PDPs. Following this trend, the works in [9]–[12] focus on designing a framework capable of encoding general RFs/DTs within P4-enabled networking devices. Recent research has demonstrated the remarkable capabilities of eBPF (extended Berkeley Packet Filter), showing nearly equivalent performance to P4 in managing general-purpose tasks offloaded to networking devices [13]. An important contribution in this domain is found in [14], where the authors develop an efficient and effective encoding of a DNN using eBPF technology.

A common thread in the literature is the search for optimal encodings of entire (sometimes complex) ML/DL models to adapt them to network devices with reduced impact on packet forwarding performance. None of these works addresses how to intelligently distribute in-network classification modules to achieve pervasive and ubiquitous security through a fully distributed and collaborative approach. This is the objective of the novel paradigm studied in our paper.

B. In-Network Learning Distribution

The paradigm of distributing AI-related computational functions (both training and inference) first emerged in the context of Edge and Cloud Computing. Many works in the literature addressed the concept of decomposing a deep neural network (DNN) into its layers to distribute the workload between an edge mobile device and the cloud, proposing optimization models for this purpose. Among others, in [15] the best split is determined via regression models that predict the computational and energy consumption of each DNN layer. In [16], instead, the optimal solution is determined by considering device and network resource utilization to minimize end-to-end latency between the edge and the cloud.

Only recently, with the emergence of the potential of the in-network computing paradigm [17], has attention shifted towards a distribution of learning functions that also exploits the network segments connecting Edge and Cloud. Recognizing the close and crucial integration between artificial intelligence and future 6G networks, the authors of [18], [19] and [20] envisaged and analyzed the structural changes needed for future 6G networks to natively accommodate distributed artificial intelligence activities within their Data Plane.

Instead, Saquetti et al. [21] focus on the constrained nature of PDP devices as well as the limitations imposed by the reference PDP programming language (i.e., P4) when dealing with distributed intelligence in the network. Through a simple PoC – a neural network with 3 layers and a total of seven
neurons – they proposed an optimization model to distribute the DNN within the network at single-neuron granularity, with a one-to-one mapping between PDPs and neurons. However, this type of distribution is not feasible when the neural network is complex, which severely limits the applicability of the proposal.

In the wake of the recent effort to deploy intelligence “in-the-network” by leveraging key enablers envisioned for upcoming 6G networks, our research aims to help fill a crucial literature and structural gap regarding network security for future-generation networks. The close integration between AI and networks is a key factor for pursuing the concept of integrated security. Virtualized anomaly detection and response functions can be deployed across the network and enabled to act collaboratively. This creates a security fabric that makes the network the first line of defense against malicious attempts.

We believe that this approach is essential to avoid the mistakes made with previous generations of networks. In those networks, security was not designed in synergy with the network itself but was treated as an “additional” functionality, thus opening the door to more intelligent and malicious attacks.

The potential of the described approach comes with new challenges. One example is finding the optimal positioning within the network of the AI-empowered security capabilities mentioned above, so as to minimize both the delay in completing tasks and the resource consumption of the network devices involved. Our paper aims precisely to contribute to solving this compelling research problem.

III. DEPLOYMENT OF AN ML-ENABLED ACTIVE IDS IN A NETWORK DATAPLANE

The reference for the research reported in this paper is the work presented by the authors in [1]. That work introduces a new paradigm in which anomaly detection capabilities are natively embedded in the devices of a typical data plane of a future programmable network. It reports only a simple proof-of-concept of the resulting ML-enabled Active Intrusion Detection System. In the present paper, instead, we propose an effective method for optimizing the deployment of learning functions in the devices and their related chaining. For the benefit of the reader, we briefly recall the basic concepts and refer to the aforementioned paper for the details of the hypothesized architecture.

A. Projecting the Ensemble Learning over the Network

The reference framework includes all the functionalities needed to implement the proposed paradigm, distributed over three logical levels: the Artificial Intelligence Plane (AIP), the Control & Orchestration Plane (C&OP), and the AI-enhanced Programmable Data Plane (AIePDP) [1].

The proposed paradigm envisages that ad hoc functions included in the first level perform three tasks: they train the model that must be embedded in the PDP, partition it, and create the VNFs that will detect and respond to attacks. An SL appropriately trained for the purpose is then broken down into individual WLs, coded as WL-VNFs, and made available to the orchestration functions in the second level. This process is depicted in Fig. 1. An optimal distribution strategy of the WL-VNFs among the PDP devices is then decided, which allows the selected switches that host the WL-VNFs to operate cooperatively as an active IDS in the network. The activities described in this paper refer to the second level of the mentioned architecture, which was only theorized, not developed, in previous work. Specifically, the goal is to find the set of WL-VNFs and the switches that host them so as to maximize the security coverage of the considered network, i.e., the effectiveness in detecting and reacting to the maximum number of attacks.

[Figure: Strong Learner (SL) → SL Decomposition & mapping into WL-VNFs (WL1, WL2, WL3) → Intelligent Deployment & Distribution of WL-VNFs over the AIePDP (switches S1–S8 with weighted links)]
Fig. 1. Proposed Split-AI In-Network Distribution Strategy.

The functions that then perform this activity are hosted in the lowest level of the architecture (as shown in Fig. 1), i.e., the AI-enhanced programmable data plane. Here, the group of WLs implements a cooperative policy in which all flows are analyzed and suspicious ones are marked by each WL. The mark signals the suspicion to the following WLs that must be executed on the flow to reconstruct the original SL. As a flow passes through the network, it is analyzed by the various WLs that constitute the SL, and each one signals the result of its inference to the others. A WL may realize that it is the last of the set that constitutes an SL and that all the others have already analyzed the flow. In that case, it completes its analysis and takes the final decision through a majority voting algorithm, blocking the flows that the WL chain deems malicious. The algorithm is completely distributed and requires no involvement of humans or of the network controller. WL-VNFs inform each other of the inferences carried out for a network flow through a custom P4-encoded header, together with a procedure executed by the PDP device augmented with the WL-VNFs.

IV. FORMULATION

In this section, we propose a variant of the shortest path problem to optimize the deployment of the WL-VNFs. We represent the network as a graph. The nodes in our model represent the network nodes in which the WL-VNFs can be deployed, while the edges denote the links between network elements. We use node coloring to represent the
implementation of specific WL-VNFs, where each color¹ corresponds to a different type of WL-VNF and the coloring cost corresponds to the associated implementation cost. For instance, an SL composed of three WLs determines three WL-VNFs and therefore three different colors (e.g., red, green, and blue), as shown in Fig. 2.

[Figure: Strong Learner (SL) composed of WL1, WL2, WL3, each mapped to a color in the Color Domain]
Fig. 2. From WL-VNFs to Colors domain.

¹ The terms “color” and “SL/WL-VNF” will be used interchangeably. More specifically, SL-VNF refers to a scenario in which only one color is needed.

The graph edges are weighted to reflect a network connection characteristic, such as latency or bandwidth. Our objective is to find the optimal deployment of WL-VNFs that ensures comprehensive network security coverage. This approach guarantees pervasive and ubiquitous network protection, in line with the need for robust cybersecurity measures in the evolving landscape of next-generation networks. In practice, we modified the shortest path problem by adding coloring constraints, thus introducing a new model named the All-Pairs Shortest Path Coloring problem (APSPC). Its cost to be minimized has two components. The first is the cost of the paths between all pairs of source and target nodes, each of which must pass through at least one colored node for each color. The second is the cost of coloring the nodes themselves. In the remainder of the section, we propose a detailed mathematical model of the problem. We also propose a meta-heuristic approach based on a Biased Random-Key Genetic Algorithm (BRKGA), which provides a practical and scalable solution to the optimization problem.

A. Exact Model

This section formalizes the APSPC problem through an Integer Linear Programming (ILP) model. The problem is formulated on an undirected, connected, loopless graph G = (V, E). The goal is to determine the simple shortest paths between all pairs of nodes (source-target) such that each path includes at least one vertex colored with each color in the set C. Despite the undirected nature of the graph, this model incorporates directed flow constraints, which are necessary for the formal definition of paths from a source node s to a target node t. For this reason, with a slight abuse of terminology, once the nodes s and t have been fixed, any node can have outgoing and incoming edges. Three sets of binary variables are introduced. They indicate whether an edge is traversed, whether a vertex is colored with a specific color, and whether a colored vertex is traversed by a given path. Specifically, let $x^{st}_{ij}$ be a binary variable equal to 1 if and only if the edge (i, j) is visited in the path s–t. Let $y_{ic}$ be a binary variable equal to 1 if and only if the vertex i is colored by c in the graph. The last set of variables keeps track of the coloring of the nodes in each path s–t. In particular, given the color c and a fixed source s and target t, $z^{st}_{ic}$ must be equal to 1 if and only if the vertex i is colored with c and is traversed in the path s–t. In addition, let $w_{ij} \in \mathbb{Z}^+$ be the positive weight associated with each edge (i, j) and $p_c \in \mathbb{Z}^+$ the cost of coloring a node with color c.

The All-Pairs Shortest Path Coloring problem can then be formulated as the following programming model.

$$ \min \sum_{(s,t)\in V\times V} \; \sum_{\substack{(i,j)\in E:\\ i\neq t \,\wedge\, j\neq s}} w_{ij}\, x^{st}_{ij} \;+ \sum_{(i,c)\in V\times C} p_c\, y_{ic} \tag{1} $$

s.t.

$$ \sum_{j\in V\setminus\{s\}} x^{st}_{ij} - \sum_{j\in V\setminus\{t\}} x^{st}_{ji} = \begin{cases} 1 & \text{if } i = s \\ -1 & \text{if } i = t \\ 0 & \text{otherwise} \end{cases} \qquad \forall\, i, s, t \in V \tag{2} $$

$$ \sum_{(i,j)\in E(S)} x^{st}_{ij} \le \sum_{i\in S\setminus\{k\}} \; \sum_{j\in V\setminus\{s\}} x^{st}_{ij} \qquad \forall\, s,t\in V;\ \forall\, k\in S;\ \forall\, S \subsetneq V\setminus\{s,t\}: |S|\ge 2 \tag{3} $$

$$ \sum_{c\in C} y_{ic} \le 1 \qquad \forall\, i \in V \tag{4} $$

$$ \sum_{i\in V} z^{st}_{ic} \ge 1 \qquad \forall\, s,t \in V;\ \forall\, c\in C \tag{5} $$

$$ z^{st}_{ic} \le \sum_{j\in V\setminus\{s\}} x^{st}_{ij} \qquad \forall\, s,t\in V;\ \forall\, i\in V\setminus\{t\};\ \forall\, c\in C \tag{6} $$

$$ z^{st}_{tc} \le \sum_{j\in V\setminus\{t\}} x^{st}_{jt} \qquad \forall\, s,t\in V;\ \forall\, c\in C \tag{7} $$

$$ z^{st}_{ic} \le y_{ic} \qquad \forall\, s,t,i\in V;\ \forall\, c\in C \tag{8} $$

$$ x^{st}_{ij} \in \{0,1\} \qquad \forall\, s,t\in V;\ \forall\, (i,j)\in E \tag{9} $$

$$ y_{ic} \in \{0,1\} \qquad \forall\, (i,c)\in V\times C \tag{10} $$

$$ z^{st}_{ic} \in \{0,1\} \qquad \forall\, s,t,i\in V;\ \forall\, c\in C. \tag{11} $$

The objective of the model (1) is to minimize the total weight of the traversed edges and the cost of coloring the nodes. Constraints (2) ensure flow conservation. Constraints (3) are subtour elimination constraints expressed in cutset form, named Generalized Cut-Set (GCS) inequalities. They ensure that the number of traversed edges with both endpoints in S cannot exceed the number of vertices of S∖{k} traversed by the s–t path. This type of constraint is necessary because the coloring constraints (5)–(7) could, in general, induce cycles disconnected from the simple s–t path. Constraints (4) ensure that each node is colored with at most one color. Constraints (5) ensure that each shortest path s–t contains at least one colored vertex for each color c ∈ C. Constraints (6) and (7) ensure that a node i can contribute to the s–t path with color c only if i is actually traversed. For (6), this means traversed as a non-destination node (the source or an intermediate node); for (7), as the destination node. Constraints (8) establish that if a node i contributes to at least one s–t path with a specific color c, then i must indeed be colored with c in the solution. Finally, constraints (9)–(11) define the variable domains.

Additionally, a separation procedure is developed for the computationally expensive subtour elimination constraints (3). Initially, the relaxed problem is considered, i.e., the subtour elimination constraints are temporarily omitted. During the resolution process, any subtours in the current solution that violate them are identified. Regarding the separation routine, a
method considered in [22] is used, focusing on identifying the strongly connected components in the graph induced by the current solution. Violated GCS constraints are dynamically added to the model using a modified version of Tarjan's algorithm (see [23]), as proposed by [24].

B. Meta-heuristic

The BRKGA is a significant advancement in genetic algorithms, developed to tackle complex and large-scale combinatorial optimization problems. It uses a population of solutions represented as vectors of real numbers between 0 and 1, known as random keys. A key component of the BRKGA is the decoder, a deterministic function that maps the random-key vectors to the solution space of the specific problem. The decoder ensures that each vector is translated into a solution, maintaining the consistency and reproducibility of the results.

In our study, we consider a multi-parent and multi-population BRKGA with bidirectional Permutation-based Implicit Path-Relinking (IPR-Per) (see [25]). Several key operations are used during its evolution process. It starts by creating the first generation of m populations, using a seed to generate all the chromosomes. The size of a single population is $p := \alpha \cdot n$, where $\alpha \ge 1$ is called the population size factor. The elite population size is $p_e := p \cdot pct_e$, where $pct_e \in [0.1, 0.25]$ is the elite percentage parameter. Finally, the size of the mutant population is $p_m := p \cdot pct_m$, where $pct_m \in [0.1, 0.3]$ is the mutant percentage. In the second step, the decoder converts the chromosomes into APSPC solutions and computes their fitness values. If the stopping criteria are not met, a new generation is created and the process is repeated by decoding the new populations. In particular, the population of the current generation is divided into two parts according to fitness. The elite population $p_e$ contains the chromosomes with the best fitness, and the non-elite population $p_{ne}$ contains the rest. The elite individuals are copied directly to the next generation to preserve high-quality solutions. Mutation introduces new random individuals to explore new areas of the solution space. The remaining part of the population, $p(1 - pct_e - pct_m)$, is generated by the multi-parent crossover. This crossover requires three parameters: the total number of parents ($\pi_t$), the number of elite parents ($\pi_e$) to be selected, and the probability that each parent passes its genes on to the child. This probability takes into account the bias of the parent, which is defined by a pre-determined, non-increasing weighting bias function ($\phi$) of the parent's rank r. Multi-parent crossover allows multiple parents to contribute to the new offspring, increasing genetic diversity. Multi-population evolution enables multiple populations to evolve in parallel and exchange their best individuals, reducing the risk of premature convergence. As global stopping criteria, we consider two rules. The procedure is interrupted when either the set time limit or the maximum number of consecutive iterations without improvement (wi) is reached.

Algorithm 1 is designed to transform a chromosome into a solution for the APSPC, evaluating its quality through a fitness function, i.e., it represents the decoder.

Algorithm 1 decode
1: input chromosome, n := number of nodes (dimension of the chromosome)
2: procedure DECODE
3:   Initialize random generator gen with seed chromosome[0]
4:   Reset nodeColors to −1 for all nodes
5:   for i ← 0 to n − 1 do
6:     Select a random color using gen
7:     colorCost ← colorCosts[color]
8:     if SHOULDCOLORNODE(i, chromosome, colorCost, gen) then
9:       nodeColors[i] ← color
10:    end if
11:  end for
12:  fitness ← CALCULATEFITNESS(nodeColors)
13:  return fitness
14: end procedure

The procedure begins by initializing a random number generator gen, using the first value of the chromosome as the seed (line 3). This ensures that the random generation operations are reproducible throughout the entire genetic evolution. In line 4, all nodes are initially uncolored, which is represented by setting nodeColors to −1 for each node. The procedure then iterates over all nodes to determine whether each node should be colored or left uncolored. In particular, for each node, a random color is selected in line 6 using gen, and the cost associated with the selected color is retrieved from the colorCosts vector (line 7). The shouldColorNode function then checks whether the node should be colored (line 8). If so, the color is assigned to the node in line 9. Once all nodes have been processed, the calculateFitness function computes the fitness of the solution in line 12. This function evaluates the quality of the solution based on the assigned colors. Finally, the procedure returns the calculated fitness value.

Algorithm 2 shouldColorNode
1: procedure SHOULDCOLORNODE(node, chromosome, colorCost, gen)
2:   nodeDegree ← GETNODEDEGREE(node)
3:   avgNodeWeight ← GETAVGNODEWEIGHT(node)
     ▷ Phase 1: Probability based on color cost
4:   ColorCostFactor ← colorCost / (avgNodeWeight · (n − 1))
5:   if ColorCostFactor ≤ 0.1 then
6:     return true
7:   end if
     ▷ Phase 2: Probability based on other node characteristics
8:   if chromosome[node] ≥ 0.1 then
9:     NodeProbability ← chromosome[node] · (nodeDegree / avgGraphDegree) · (avgGraphWeight / avgNodeWeight)
10:  else
11:    NodeProbability ← 1
12:  end if
13:  dis ← UNIFORMREALDISTRIBUTION(0.0, 1.0)
14:  return (dis(gen) < NodeProbability)
15: end procedure

The next function to be analyzed is shouldColorNode. Algorithm 2 is designed to determine whether a node in the graph should be colored based on the node's characteristics. The procedure begins by getting the degree of the input node and the average weight of the edges incident to the node (avgNodeWeight). The decision process is divided into two phases to ensure a balanced evaluation; it is sufficient for either phase to succeed for the node to be colored. In Phase 1, the procedure calculates the ColorCostFactor as a
function of the color cost and avgNodeWeight (line 4). This factor assesses the cost-effectiveness of coloring the node: if it is very low, the node is colored with certainty. Intuitively, this means that we color the node if the cost of coloring it is small compared to the benefit gained from coloring it. Phase 2 focuses on other characteristics of the node. The procedure calculates the NodeProbability as a function of the ratio between nodeDegree and avgNodeWeight and of the chromosome gene associated with the node (line 9). This operation determines how important it is to color a node based on the number of its connections and the strength of those connections (average edge weight). If these values indicate that the node is influential in the network, the probability of coloring it increases. If the gene is too low, the probability is set to one to avoid invalidating the probability calculation. Finally, a random number is drawn from a uniform distribution between 0.0 and 1.0, and the node is colored if this number is less than NodeProbability (line 11). The procedure returns a boolean indicating whether the node should be colored.

The calculateFitness function evaluates the fitness of a solution by computing the aggregate path cost between all pairs of nodes in the graph, based on their color assignments. First, it computes an overall color cost derived from the color assignments. Then, the algorithm iterates over each node pair to determine the shortest path between them, applying a modified Dijkstra algorithm that incorporates the color constraints. If a valid path exists, its cost is added to the aggregate path cost. If no valid path is found, the algorithm marks the solution as infeasible and halts further calculations.

C. Strong Learner Splitting and #colors Selection

The number of colors (i.e., the different WLs that compose the entire SL) available to color the nodes of a given graph is chosen using the function defined below, denoted as $cd : \mathbb{R} \to 2\mathbb{Z} + 1$. Given a real number x, this function returns the largest odd integer not greater than x, or 3 if the largest integer not greater than x is 2. Formally:

$$
cd(x) :=
\begin{cases}
3 & \text{if } \lfloor x \rfloor = 2,\\
\lfloor x \rfloor - 1 & \text{if } \lfloor x \rfloor \in 2\mathbb{Z} \setminus \{2\},\\
\lfloor x \rfloor & \text{if } \lfloor x \rfloor \in 2\mathbb{Z} + 1.
\end{cases}
$$

The exact number of colors, #colors, available for the graph G = (V, E) is obtained by evaluating cd at the average number of nodes present in all classical shortest paths, i.e., without the coloring constraint:

$$
\#\mathrm{colors} = cd\left( \frac{2}{|V| \cdot (|V| - 1)} \sum_{(i,j) \in E \,|\, i<j} d(i,j) \right), \tag{12}
$$

where d(i, j) is the number of nodes present in the classical shortest path between i and j, calculated using the Dijkstra algorithm.

V. PERFORMANCE EVALUATION

This section illustrates an in-depth performance evaluation campaign conducted to assess the benefits of the proposal in terms of both optimization and network-relevant aspects. Two experimental campaigns are described for this purpose: (i) Model Evaluation Campaigns; and (ii) Network Evaluation Campaigns.

A. Model Evaluation Campaigns

In this section, we summarize the results of our computational experiments on the defined meta-heuristic. In particular, we conduct an in-depth analysis of the impact of various network characteristics on the effectiveness and efficiency of the entire system.

The BRKGA has been implemented in C++ and compiled with clang version 14.0.3, with the C++17 standard selected through the CMAKE_CXX_STANDARD 17 setting in the CMake configuration file. All the optimization tests were conducted on an Apple M2 Max processor (12-core CPU, 38-core GPU) with 96 GB of LPDDR5 RAM, running macOS Ventura 13.3.

1) Instances and Parameter Setting: To evaluate the performance of the proposed approach, a set of instances was generated as described below. The set is composed of random topology networks, each identified by a unique combination of the following parameters: number of nodes (n), edge density (d), and color cost range (cr). In particular, we considered four values for the number of nodes, i.e., n ∈ {10, 15, 25, 30}; four values for the edge density, which determines the number of edges #e = d · n(n − 1)/2, with d ∈ {0.25, 0.35, 0.45, 0.55}; and four ranges of values for the color cost, i.e., cr1 = [1, 125], cr2 = [50, 150], cr3 = [75, 175], cr4 = [100, 200]. For each instance, the number of colors is uniquely determined by Eq. (12). In more detail, given a number of nodes, we first generate the minimum spanning tree G = (V, E) to ensure connectivity, and then randomly add edges to E until the number of edges determined by the edge density parameter is reached. The edge costs are sampled from a uniform distribution on the interval [1, 200]. The color costs are sampled from a uniform distribution on the color cost range. For each scenario, identified by a given combination of n, d, and cr, we generated six random instances by varying the seed used to initialize the random number generator, for a total of 384 instances (4 × 4 × 4 × 6). We organized the instances into four classes based on edge density, named ED1, ..., ED4.

For the metaheuristic parameters, we carried out a preliminary tuning phase using irace, a tool that performs automatic algorithm configuration to optimize parameter values (refer to [26] for details). This tuning was done using four random instances from each of the EDi sets. Table I summarizes the tuned parameters of the BRKGA, grouping them into three sets: Operator, IPR-Per, and Others.

TABLE I
TUNED BRKGA PARAMETERS.
Parameter groups (left to right): Operator, IPR-Per, Other.

pcte | pctm | πt | πe | ϕ    | sel   | md   | pctp | α  | m
0.1  | 0.6  | 3  | 1  | 1/r² | randS | 0.15 | 0.85 | 20 | 2

2) Experimental Results: The results are presented by grouping instances according to their density class EDi and number of nodes. Each row of the table refers to a subset of instances from a given set that share the same edge density and, where specified, the same number of nodes. These are indicated by the descriptor in the Set column, where "ED" stands for edge density and "N" for nodes. All time values are measured in seconds. Table II provides detailed results obtained by applying the BRKGA to the whole set of instances. Each row reports the average values of the following quantities: the number of available colors in the instances (#colors), calculated using the cd function; the time taken by the metaheuristic to find the returned solution (BestTime (s)); the total execution time (Time (s)); the number of deployed nodes (#NDy); the total solution cost (Cost); the color-related cost (Costc); and the path cost (Costp). Each row aggregating on both edge density and number of nodes refers to 24 instances, and each AVG row, aggregating only on edge density, refers to 96 instances. For all experiments, we set the time limit to 900 seconds and the maximum number of consecutive iterations without improvement wi to 10.

As expected, the average best time increases as both the number of nodes and the density increase. However, the number of nodes has a stronger effect than the density, while BestTime always remains below 1 minute. In particular, as shown in Table II, with 10 nodes the BestTime consistently stays within the 0.08–0.32 second range, regardless of density. With 15 nodes, it increases significantly compared to 10 nodes, but remains manageable, ranging between 1.22 and 2.70 seconds. With 25 nodes, the increase is still limited: BestTime reaches 16.99 seconds for ED1 and 21.62 seconds for ED3. With 30 nodes, the highest BestTime values are recorded, ranging from 34.21 seconds for ED1 to 55.65 seconds for ED4. In general, for each number of nodes, BestTime tends to increase with density, and this increase becomes larger as the number of nodes grows. In addition, for each density class, BestTime grows non-linearly with the number of nodes. Similarly, the total runtime of the BRKGA follows a roughly linear trend as the density increases for each number of nodes, and a non-linear trend as the network size increases for each density class.

Regarding the average number of colors identified by the cd function, the number of colors increases, on average, as the density decreases. Specifically, in all instances with 10 and 15 nodes, #colors is always equal to the minimum available value, 3. With 25 nodes, the average ranges from 3.08 in the ED3 class to 3.17 in ED1, while in the ED4 class all instances have #colors equal to 3. Overall, 5 instances with 5 colors were recorded. With 30 nodes, the highest #colors values are recorded, ranging from 3.08 in the ED4 class to 3.42 in ED1. In total, 4 instances with 7 colors and 3 instances with 5 colors were recorded. Therefore, for each density class, #colors increases with the number of nodes. These trends can be explained by the fact that, in fully random topologies with more nodes and/or relatively low density, shortest paths are on average longer, which requires more colors, as expected from the definition of the cd function.

As expected, #NDy increases with the total number of nodes in the network. For example, with 10 nodes and density class ED1, the average number of deployed nodes is 4.21, while with 30 nodes in the same class it increases to 14.79. This trend is consistent across all classes, confirming that as the graph size increases, more nodes are involved in the deployment of the learning models and VNFs necessary to ensure network security coverage. For a fixed number of nodes, the number of deployed nodes tends to increase with density. For instance, for N = 15, #NDy increases from 6.92 in density class ED1 to 9.00 in class ED2, and is 8.08 in class ED4. The scalability of the proposed model is evident from the way it adapts to networks of varying sizes and densities. The increase in #NDy with both the number of nodes and the density shows that the model can handle larger and more complex network topologies. This scalability is crucial for next-generation networks, where the number of nodes and connections will continuously increase, requiring an efficient distribution of learning functions across the network.

The number of nodes has a significant impact on the total cost in each density class. For example, for 10 nodes and ED1, the Cost is around 2 · 10^4, while for 30 nodes in the same density class it rises to approximately 6.5 · 10^4. This increase is attributable to the rise in both deployment costs (Costc) and shortest path costs (Costp), as larger networks require distributing VNFs across more nodes and covering longer distances. Density, however, follows a different trend. As density increases, Costp decreases because the paths between nodes become shorter, whereas Costc tends to rise slightly, as more nodes are needed to manage the more connected network. Since Costp constitutes the vast majority of the total cost for each set of instances (over 90%), the average total cost decreases with density, as can be seen from the AVG rows.

B. Network Evaluation Campaigns

During a further experimental campaign, we compared the performance of the data plane devices when running an entire ML model and when, instead, the model is decomposed following our deployment approach. We measured the time to obtain the classification outcome, namely the classification time, and the throughput guaranteed by the networking devices that execute the additional AI-related task. In addition, to

evaluate the detouring from the shortest-path nature of the network imposed by the coloring constraint, we introduced the AWDelay metric, detailed in Section V-B1.

TABLE II
DETAILED RESULTS OF THE BRKGA

Set    | #colors | BestTime (s) | Time (s) | #NDy  | Cost    | Costc  | Costp
N10ED1 | 3.00    | 0.08         | 0.16     | 4.21  | 20605.7 | 516.5  | 20089.2
N15ED1 | 3.00    | 1.22         | 2.82     | 6.92  | 30081.7 | 941.7  | 29140.1
N25ED1 | 3.17    | 16.99        | 40.80    | 12.38 | 50684.4 | 1641.1 | 49043.2
N30ED1 | 3.42    | 34.21        | 88.02    | 14.79 | 64995.9 | 1920.0 | 63075.9
AVG    | 3.15    | 13.13        | 32.95    | 9.57  | 41591.9 | 1254.8 | 40337.1
N10ED2 | 3.00    | 0.17         | 0.45     | 4.58  | 11526.4 | 579.4  | 10947.0
N15ED2 | 3.00    | 1.40         | 5.29     | 9.00  | 21894.2 | 1200.5 | 20693.6
N25ED2 | 3.17    | 19.04        | 57.06    | 14.50 | 36370.6 | 1908.2 | 34462.4
N30ED2 | 3.25    | 37.27        | 110.52   | 16.83 | 47228.7 | 2190.7 | 45037.9
AVG    | 3.10    | 14.47        | 43.33    | 11.23 | 29254.9 | 1469.7 | 27785.2
N10ED3 | 3.00    | 0.32         | 0.81     | 4.88  | 9140.1  | 618.0  | 8522.1
N15ED3 | 3.00    | 2.70         | 7.61     | 7.42  | 16526.6 | 865.9  | 15660.1
N25ED3 | 3.08    | 21.62        | 59.56    | 14.21 | 28602.1 | 1900.3 | 26701.7
N30ED3 | 3.17    | 36.21        | 163.72   | 17.75 | 38783.9 | 2277.9 | 36505.9
AVG    | 3.06    | 15.21        | 57.92    | 11.06 | 23263.2 | 1415.5 | 21847.6
N10ED4 | 3.00    | 0.30         | 1.15     | 4.96  | 8436.8  | 681.6  | 7755.2
N15ED4 | 3.00    | 2.09         | 8.26     | 8.08  | 13872.0 | 1076.8 | 12795.2
N25ED4 | 3.00    | 16.94        | 92.46    | 16.79 | 25438.2 | 2081.6 | 23356.5
N30ED4 | 3.08    | 55.65        | 185.47   | 17.25 | 30632.8 | 2004.6 | 28628.2
AVG    | 3.02    | 18.75        | 71.84    | 11.77 | 19594.9 | 1461.2 | 18133.8

The objective is to verify that, under heavy network load (e.g., volumetric Distributed Denial of Service (DDoS) attacks), the reduced workload imposed on each data plane device lets the network scale well in these critical situations while preserving its forwarding activities. We tested the network under different attack intensities, starting with 100 pkt/s generated by each attacker and reaching 1000 pkt/s in increments of 100 pkt/s. To characterize the size of the DoS/DDoS packets, we analyzed the DDoS evaluation dataset CIC-DDoS2019 [27]. The dataset contains real-world data, recorded by the Canadian Institute for Cybersecurity (CIC), representing the most common DDoS attack types (characterized by 80 network features), such as SYN flooding, UDP DDoS, DNS-based DDoS, WebDDoS, and many others. Based on the analysis of the average packet size (Avg Packet Size feature), we chose the attack packet size uniformly in the range [317, 2208] bytes (see Fig. 3). Following the work in [28], to parameterize the attack scenario with respect to the network topology, we set the number of attackers to 50% of the total hosts of the network.

To recreate a realistic experimental scenario, we generated typical benign background traffic based on CIC-IDS 2018 [29]. In particular, we considered the dataset days Wednesday-14-02-2018_TrafficForML_CICFlowMeter, Wednesday-21-02-2018_TrafficForML_CICFlowMeter, and Wednesday-28-02-2018_TrafficForML_CICFlowMeter. Analyzing the probability distribution of the interarrival times registered in the benign flows (more than 1.5 million samples), we found an exponential distribution with λ = 0.4. To generate the benign background traffic, we used the Distributed Internet Traffic Generator (D-ITG) [30], [31] with λ set to 0.4, while the packet size is uniformly distributed in the range [16, 360] bytes.

Finally, since our proposal extends the shortest-path nature of the network for the sake of security, we evaluated how much the traditional shortest path is affected by the security and cooperative behavior constraints. In other words, we evaluated the traffic detouring from the traditional shortest path caused by complying with network security constraints. The network topologies used in the experiments are publicly available at [32].

We evaluated the scalability of the proposal on three topologies of increasing size: the first with 10 nodes and 25 edges, for which the value of #colors computed by Eq. (12) is 3 (i.e., the SL is split into three WLs); the second with 25 nodes and 48 edges and a computed #colors equal to 5; and the third, largest topology with 30 nodes and 51 edges, for which #colors = 7. The 15-node topology previously considered is not used in these experiments because its computed #colors was identical to that of the 10-node topology, so it would add little to the experiment.

The chosen topologies allow us to test the scalability of the proposal while increasing the model complexity and, therefore, the number of WLs that must be deployed to guarantee network security coverage. According to [7], these are appropriate SL complexities for network traffic classification. However, the proposal is general enough to be extended to more complex models, making it adaptable to other AI-relevant tasks.

The proposal has been implemented on P4-enabled virtual PDPs, namely BMv2 [33] switches based on the v1model architecture. Due to the limited instruction set of the P4 language (it does not support basic operations such as division, exponentiation, or logarithm), we extracted 43 features of CIC-IDS 2018. The P4 code that implements the models and the associated feature extractor will be made publicly available.²

² GitHub repository at [32].

1) Evaluating Shortest Path Detouring: AWDelay: Given a pair of source and target nodes (s, t), we denote with SP(s, t) the cost of the classical shortest path between s and t and, analogously, with SP_C(s, t) the cost of the shortest path obtained for the problem with coloring constraints. We define a weighted average of the delays as a function of the lengths of the classical shortest paths. Let delay(s, t) be the relative delay between the constrained shortest path and the classical one between source s and target t, i.e.,

$$
delay(s,t) := \frac{SP(s,t) - SP_C(s,t)}{SP_C(s,t)},
$$

then the weighted average of delays is defined as follows:

$$
AWDelay := \frac{2}{|V| \cdot (|V| - 1)} \sum_{(i,j) \in E \,|\, i<j} w_{ij} \cdot delay(i,j),
$$

where w_ij is the normalization of the following weights, which depend on the length of the classical shortest paths:

$$
w_{ij} := e^{\mathrm{length}(SP_C(i,j))}.
$$
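The AWDelay definition can be sketched in Python as follows, given, for every unordered node pair, the classical shortest-path cost and hop length and the constrained shortest-path cost (computing the constrained paths themselves is outside the scope of this sketch). The sketch rests on stated assumptions rather than on the authors' code: since AWDelay is reported as a non-negative percentage, delay(i, j) is taken as the relative increase (SP_C − SP)/SP of the constrained cost over the classical one; length(·) is read as the hop count of the classical path, as the prose states; "normalization" is read as rescaling the weights so that they average to 1 over all pairs; and the VAR statistic is computed as a population variance. PairPaths, relative_delay, aw_delay, variance, and ecdf are hypothetical names.

```python
import bisect
import math
from dataclasses import dataclass


@dataclass
class PairPaths:
    """Hypothetical per-pair input for one unordered node pair (i, j)."""
    sp_cost: float   # cost of the classical (unconstrained) shortest path
    sp_hops: int     # length of the classical shortest path, in hops (assumption)
    spc_cost: float  # cost of the shortest path under the coloring constraints


def relative_delay(p: PairPaths) -> float:
    # Assumed convention: relative increase of the constrained cost over the classical one.
    return (p.spc_cost - p.sp_cost) / p.sp_cost


def aw_delay(n: int, pairs: list) -> float:
    """AWDelay over all n(n-1)/2 unordered pairs of an n-node graph."""
    if len(pairs) != n * (n - 1) // 2:
        raise ValueError("expected one PairPaths entry per unordered node pair")
    raw = [math.exp(p.sp_hops) for p in pairs]  # w_ij before normalization
    scale = len(pairs) / sum(raw)               # assumed: normalized weights average to 1
    total = sum(w * scale * relative_delay(p) for w, p in zip(raw, pairs))
    return 2 / (n * (n - 1)) * total


def variance(values: list) -> float:
    """Population variance, as might back a VAR row (assumption)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def ecdf(values: list, x: float) -> float:
    """Fraction of observed values less than or equal to x."""
    return bisect.bisect_right(sorted(values), x) / len(values)
```

With weights averaging to 1, the leading factor 2/(|V| · (|V| − 1)) turns the sum into a weighted mean, so a pair whose classical path is one hop longer weighs e ≈ 2.72 times as much. Multiplying the result by 100 gives a percentage, and the ecdf helper evaluates one point of a cumulative-distribution curve of per-instance AWDelay values.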

Fig. 3. Average Packet Size for DDoS attack in CIC-DDoS2019. [Bar chart: average packet size [bytes] per DDoS attack type (DNS, LDAP, MSSQL, NTP, NetBIOS, SNMP, SSDP, SYN Flood, UDP Flood, UDPLag, WebDDoS); the plotted values range from 317 to 2208 bytes.]

2) Classification Time Analysis: We also analyzed the average classification time of the networking devices within the proposed distributed approach under increasing traffic loads. Fig. 4a shows the average classification time achieved under a varying attack rate. With the first, small topology (10 switches, 50 hosts of which 25 are attackers), which requires an SL composed of three WLs to guarantee security coverage, the SL-VNF configuration performs better while the number of handled packets is around 200–400 pkts/s, with an average classification time about 60% lower than that of the WL-VNF configuration (0.62 ms for the SL-VNF against 1.7 ms for the WL-VNF). This is due to the additional intermediate communication between the PDPs needed to obtain the final classification. However, as the attack rate intensifies and the switches become overwhelmed with network packets to analyze, this advantage diminishes, and the WL-VNF configuration shows its strengths in handling critical attack situations. The differences become appreciable when the attack rate is in the range of 600–800 pkts/s, where the classification time is more than halved. Under heavy attack load (900–1000 pkts/s), the SL-VNF configuration cannot handle the classification tasks in a timely manner, reaching a maximum time to complete classification of more than 1000 ms, against the ∼200 ms achieved by adopting the proposed model splitting and distribution paradigm.

Fig. 4b shows the results with the medium network topology (25 network switches and 125 hosts, 75 of which are attackers) and an SL composed of five WL-VNFs. In this case, due to the lesser model complexity, the benefits of the proposal appear starting from 300–400 pkts/s, and around 500–600 pkts/s the proposal reduces the time to complete the classification by more than 90%. Even under the highest attack rate (1000 pkts/s), the reduction achieved by the proposal is more than 50% (∼1500 ms with the proposal against ∼3600 ms with the SL-VNF configuration).

This trend is confirmed by the experiments carried out with the largest topology (see Fig. 4c), for which the optimization problem suggested an SL with seven WLs to ensure network security coverage. In this case, the higher complexity of the SL-VNF leaves the network unable to handle classification tasks in a timely manner starting from an attack rate of 300 pkts/s. At 500 pkts/s the gap becomes prominent, with an average classification time of ∼360 ms for the SL-VNF against 20 ms for the split configuration. When the attack rate is around 1000 pkts/s, the benefits of the proposal are clearly highlighted, allowing the network to adapt to the huge attack rate with a 55% reduction in the average classification time.

In light of these considerations, we conclude that, as the size of the network topology and the load it is subjected to increase, using a split-AI approach to distribute the workload within programmable data planes not only allows for an effective integration of complex AI-relevant tasks within the network, but also provides a scalable solution that adapts to network changes. These results shed light on the importance of split-AI approaches to cope with the upcoming seamless and tight integration of networking and AI in future 6G networks.

3) Throughput Analysis: In a further test campaign, the average throughput of the PDPs in both configurations, i.e., SL-VNF and WL-VNF, was measured while varying the network topology and the related value of #colors. The aim is to demonstrate that the proposed approach of optimizing the distribution of active IDS features is scalable in terms of the network devices' capacity to manage network traffic.

Fig. 5 shows that the WL-VNF deployment yields the best gain for the network, in terms of both throughput and delay, as the amount of traffic generated by the distributed malicious hosts increases. With the SL-VNF configuration, the throughput experienced by the network devices decreases as the SL complexity increases (from three to seven WLs), mainly due to the increasing number of WLs that must be queried on a single PDP. With the simplest SL, the average network throughput drops below 5 Mbps at an attack rate of 700 pkt/s, quickly approaching 0 Mbps at 800 pkt/s. This trend worsens with more complex SLs. In the #colors = 5 scenario, the network throughput drops to zero when approaching an attack rate of 600–700 pkt/s. Even worse is the case of the most complex SL (#colors = 7), whose overhead causes the average network throughput to approach zero starting from an attack rate of 400–500 pkt/s. In such situations, data plane devices experience substantial degradation in their forwarding capabilities.

However, when the SL is split and distributed across the network, the computational load imposed on the PDP devices is alleviated, making it possible to integrate even complex AI models within the network without significantly affecting normal network operation. In fact, in the #colors = 3 scenario with the split configuration, the average network throughput starts to drop below 5 Mbps at an attack rate of 900–1000 pkt/s. For attack rates in the range of 100–500 pkt/s, we observed a 20% average increase in throughput. At higher attack rates, this advantage grew further (∼50–55%), up to the point where the distributed approach still guarantees a minimum throughput while, with the SL-VNF, the network is completely down. The advantages of the proposed approach become more

Fig. 4. Average Classification Time for Experimental Scenarios: (a) #colors = 3, (b) #colors = 5, (c) #colors = 7. [Panels titled "Time to Complete Traffic Classification"; classification time [ms] versus attack rate [pkts/s] (100–1000) for the Not Split and Split configurations.]

Fig. 5. Average Throughput for Experimental Scenarios: (a) #colors = 3, (b) #colors = 5, (c) #colors = 7. [Panels titled "Measured Average Throughput"; throughput [Mbps] (0–25) versus attack rate [pkts/s] (100–1000) for the Not Split and Split configurations.]

evident as the complexity of the SL increases and the size of the network expands. When an SL must be split into five WLs to cover the network, the resulting reduction of the computational burden on each device preserves the average network throughput even more. Indeed, the average network throughput stays in the range [∼6, ∼15] Mbps even under attack rates of 700–1000 pkts/s, whereas the non-split configuration drives the average throughput measured on the PDPs to zero. Finally, in the #colors = 7 network topology, the complexity of the SL causes significant performance degradation starting from attack rates of 400 packets per second (leading to a rapid drop of the average throughput to zero), while the proposed distributed approach improves scalability. This method effectively manages the computational overhead, allowing the network to handle large attack volumes while maintaining a satisfactory level of throughput.

Nonetheless, a truly zero-cost solution does not exist yet. The execution of the models still has a measurable impact on network throughput, compared with an observed average of approximately 35 Mbps when no SL/WL-VNFs are active within the switch. This limitation stems from the technological constraints of current networking devices, which are not yet designed to fully support the seamless integration of networking and AI workflows. However, these issues are expected to be resolved in future 6G networks, which will likely incorporate advanced, high-performance chips capable of significantly increasing computational power. Having said that, the advantages of the proposed distributed and split-AI approach are clear, making it a viable solution for supporting AI-relevant tasks within current as well as future PDP devices. Finally, it is important to highlight a key feature of the proposed approach: it can operate (without any modification) with both encrypted and unencrypted network traffic, as it relies exclusively on header information, which is always transmitted in plaintext.

4) Shortest Path Detouring Analysis: To evaluate the impact of the coloring constraints on network performance, we also analyzed the AWDelay metric introduced previously. Specifically, we assessed the impact of network density and size on path detours by analyzing the average weighted delay.

Table III presents the average AWDelay values grouped by the number of nodes N and the density class ED. The AWDelay value shown for each combination of N and ED is the average computed across all instances discussed in Section V-A2. The AVG row reports the averages computed per number of nodes, while the AVG column reports the averages per density class. Additionally, the row labeled VAR indicates the variance of all AWDelay values for each number of nodes N, providing a measure of data dispersion that allows us to assess how variability changes with the number of nodes.

The plots in Fig. 6 show the cumulative distribution of AWDelay for each density class within each node class. Each curve thus shows the cumulative percentage of recorded results with an AWDelay less than or equal to a specific

Fig. 6. Cumulative distribution of AWDelay for the density classes for each node class. [Panels: (a) N10, (b) N15, (c) N25, (d) N30; cumulative distribution (0.0–1.0) versus AWDelay, one curve per density class ED1–ED4.]

value indicated on the x-axis.

For N = 10, the curves rise clearly, with a high percentage of observed values (around 60%) concentrated in the lower AWDelay range (0–5.5%), especially for the first three density classes. In contrast, the results for ED4 show generally higher delays, spread over a wider interval. Specifically, the curves of the first three density classes accumulate rapidly around 5% AWDelay, while the ED4 curve accumulates more slowly, suggesting a more dispersed distribution of delays, with some paths experiencing higher delays. For this node class, the minimum and maximum AWDelay values are 0.52% and 29.30%, respectively, with an overall average of 6.58%.

For N = 15, the graph in Fig. 6(b) shows a behavior similar to that previously observed, but with some significant differences. First, for all density classes, 80% of the delays are below about 5%. A slight difference is seen in the ED3 class, where about 95% of the values are concentrated in the lower AWDelay range (0–5%). Another difference is that, for this node class, the trends of the four curves are quite similar. The minimum and maximum AWDelay values are 0.34% and 13.45%, respectively, with an overall average of 3.54%.

Compared to the previous plots, the graph for 25 nodes (Fig. 6(c)) shows a more concentrated distribution of AWDelay values. All curves reach 90% of the cumulative distribution at lower AWDelay values than in the previous plots. This indicates that most paths in networks with 25 nodes experience lower delays, concentrated below about 2.5% AWDelay. Specifically, the curves for the first three density classes show almost identical behavior, with very rapid accumulation (90%) for delays below about 1.5%. The ED4 curve shows a similar trend, although with a slightly more gradual increase, suggesting greater variability in delays compared to the other density classes, which nonetheless remains well contained compared to the cases with fewer nodes. The minimum and maximum AWDelay values are 0.04% and 3.19%, respectively, with an overall average of 0.88%.

Similarly, in the plots of Fig. 6(d), as previously observed for the instances with 25 nodes, the AWDelay values are concentrated within a very narrow range (up to 3.5%). As in the previous case, all curves reach 90% of the cumulative distribution at AWDelay values below about 2%. Specifically, the curves for ED1, ED3, and ED4 show almost identical behavior, with a high percentage of observed values (90%) having delays below about 1%. The ED2 curve shows a similar trend but with a slightly more gradual increase, indicating greater variability in delays compared to the other density classes. The minimum and maximum AWDelay values are 0.01% and 3.30%, respectively, with an overall average of 0.57%. The curves for 25 and 30 nodes converge much more quickly than those for 10 and 15 nodes. This suggests that, as the number of nodes increases, the effect of network density becomes less pronounced, leading to more similar delay distributions.

The results in Table III indicate that the average weighted delay behaves consistently as the network grows in size. Specifically, AWDelay decreases significantly as the number of nodes increases. For example, in networks with 10 nodes in density class ED1, the average delay reaches around 5%, while for networks with 30 nodes it drops to approximately 0.5%. The same trend appears in the overall average delay, which decreases from 6.58% with 10 nodes to 0.57% with 30 nodes. This indicates that the overhead introduced by the coloring constraints becomes less significant in larger networks, making the approach more scalable and efficient as the network grows.

Interestingly, when varying the density for a fixed number of nodes, the average AWDelay remains almost constant, except for the case with 10 nodes. The variance of all AWDelay values decreases from 3 · 10^-3 for N = 10 to 3 · 10^-5 for N = 30. We attribute this behavior to the fact that, as density increases and, consequently, the number of available paths grows, the probability of significant detours from the classical shortest path decreases, thus limiting further delay reductions. For example, networks with N = 30 and higher density classes (such as ED4) consistently show lower AWDelay values, supporting the hypothesis that denser networks provide more direct alternative paths even under coloring constraints. The stability of AWDelay across density classes reinforces the robustness of our approach, as the method maintains a consistent balance between security and efficiency without significantly compromising network performance, even in denser topologies.

This trend is further supported by the variability observed in Fig. 7, where box plots illustrate the distribution of AWDelay across different densities. In particular, the interquartile ranges expand in sparser networks, showing greater variability
JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. X, XXX 2024 12

in path efficiency due to the limited number of feasible paths [2] C. Kim, “Programming the network dataplane,” ACM SIGCOMM:
that meet the coloring constraints. The box plots also highlight Florianopolis, Brazil, 2016.
[3] G. Siracusano and R. Bifulco, “In-network Neural Networks,” arXiv
that in more connected networks, such as those with ED4, preprint arXiv:1801.05731, 2018.
the AWDelay distribution is more compact, suggesting a more [4] D. Sanvito, G. Siracusano, and R. Bifulco, “Can the network be the
uniform detour behavior. AI accelerator?” in Proceedings of the 2018 Morning Workshop on In-
Network Computing, 2018, pp. 20–25.
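The readings of the form "reach 90% of the cumulative distribution at AWDelay values below X" are quantiles of the empirical CDF of per-path AWDelay values. The Python sketch below shows one way such a reading, together with the minimum, maximum, average, and variance quoted for each panel of Fig. 6, could be computed. It assumes AWDelay values are stored as fractions (0.01 = 1%); the sample list is hypothetical and not taken from the experiments, and the function names are illustrative only.

```python
import math
from statistics import mean, pvariance

def ecdf_quantile(values, p):
    """Smallest observed value x whose empirical CDF F(x) is at least p."""
    if not values:
        raise ValueError("empty sample")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    ordered = sorted(values)
    # F(ordered[k]) >= (k + 1) / n, so take the first k with (k + 1) / n >= p.
    # round() guards against floating-point noise in p * n.
    k = max(math.ceil(round(p * len(ordered), 9)) - 1, 0)
    return ordered[k]

def summarize(values):
    """Per-panel summary statistics of the kind quoted in the text."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "var": pvariance(values),
        "p90": ecdf_quantile(values, 0.9),  # AWDelay at which the CDF hits 90%
    }

# Hypothetical per-path AWDelay values (fractions), NOT measured data.
sample = [0.004, 0.006, 0.008, 0.010, 0.011, 0.012, 0.013, 0.015, 0.021, 0.032]
stats = summarize(sample)
print({name: round(value, 5) for name, value in stats.items()})
```

Using the smallest observed value with F(x) ≥ 0.9, rather than an interpolated percentile, matches how such a point is read off a step-shaped empirical CDF curve.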
[Figure: box plots of AWDelay (y-axis, 0–30%) versus edge density (x-axis, ED1–ED4), one box per network size N10, N15, N25, N30]
Fig. 7. Box plots of AWDelay for each edge density class.
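Fig. 7 summarizes each distribution by its quartiles, and the discussion above relies on the interquartile range (IQR) as a measure of spread. As a hedged sketch of how these quantities could be obtained, the Python function below computes quartiles, the IQR, and whisker ends for one box. It assumes the common Tukey convention of whiskers at 1.5 × IQR beyond the quartiles; whether Fig. 7 uses this convention is not stated. The two samples are hypothetical, chosen only to mimic a wider spread in a sparse network and a more compact one in a dense network.

```python
from statistics import quantiles

def box_stats(values, whis=1.5):
    """Quartiles, IQR, whisker ends and outliers for one box (Tukey convention)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # 'inclusive' interpolates linearly between order statistics.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers stop at the most extreme values still within the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }

# Hypothetical per-path AWDelay values (fractions), NOT measured data.
sparse = [0.01, 0.02, 0.03, 0.05, 0.08, 0.12]
dense = [0.005, 0.006, 0.007, 0.008, 0.009, 0.02]

for label, sample in (("sparse", sparse), ("dense", dense)):
    b = box_stats(sample)
    print(f"{label}: IQR={b['iqr']:.4f}, outliers={b['outliers']}")
```

With these made-up samples the sparse box has a much larger IQR than the dense one, which is the qualitative pattern the text describes for ED1 versus ED4.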
TABLE III
AWDelay RESULTS FOR DENSITY AND NUMBER OF NODES

        N10      N15      N25      N30      AVG
ED1     5.16%    3.37%    0.94%    0.50%    2.50%
ED2     5.77%    3.92%    0.78%    0.73%    2.80%
ED3     6.34%    3.26%    0.79%    0.58%    2.74%
ED4     9.04%    3.60%    1.02%    0.46%    3.53%
AVG     6.58%    3.54%    0.88%    0.57%
VAR     0.003    0.001    0.0001   0.00003
VI. CONCLUSION AND FUTURE WORKS

In this paper, we explored the benefits of the INC paradigm and of the programmable nature of networks, combined with distributed AI and split-AI techniques, with the aim of improving the security of upcoming 6G networks. We considered a split-AI approach through which complex ensemble (SL) models are broken into lightweight functional blocks to be executed on PDPs as a chain of VNFs (WL-VNFs). The goal of these functions is to detect malicious behaviors that may occur in the network. We formulated an optimization problem, All-Pairs Shortest Path Coloring (APSPC), that intelligently distributes the WL-VNF components on PDPs while taking into account both the shortest-path nature of the network and the constraint that the decomposed SL must be reconstructed by concatenating the distributed WL-VNFs that compose it. To solve the APSPC problem efficiently, we also designed a metaheuristic approach. The results demonstrate that combining INC and distributed AI not only overcomes the limitations of implementing complex AI models on PDP devices but also significantly increases scalability and preserves the forwarding capabilities of AI-enhanced PDPs, especially under heavy-traffic attack conditions.
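To make the coloring constraint summarized above concrete, the following Python sketch evaluates one fixed coloring on a toy graph. It rests on simplifying assumptions that are ours, not the paper's formulation: the graph is unweighted, each PDP node hosts one WL-VNF identified by an integer color, and a flow is served only if its route traverses nodes hosting WL-VNFs 0, 1, ..., k-1 in this order, so that the SL can be rebuilt by concatenation. A breadth-first search over (node, chain stage) states returns the shortest feasible walk (nodes may be revisited), and its relative length increase over the plain shortest path gives a detour figure only loosely analogous to AWDelay, whose exact definition is not reproduced here. The instance and all function names are hypothetical, and the sketch does not attempt the optimization itself, i.e., choosing the coloring, which the paper addresses with a metaheuristic.

```python
from collections import deque

# Hypothetical toy instance: a 5-node ring 0-1-2-3-4-0 of PDP nodes.
ADJ = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
# Hypothetical coloring: node -> index of the WL-VNF it hosts (chain 0 -> 1).
COLORS = {0: 0, 1: 1, 2: 0, 3: 0, 4: 0}

def shortest_chain_path(adj, color, k, src, dst):
    """Hops of the shortest src->dst walk visiting WL-VNFs 0..k-1 in order.

    Returns None if no such walk exists. With k = 0 this is plain BFS.
    """
    def advance(stage, node):
        # Greedily run the next WL-VNF of the chain if this node hosts it;
        # running it early never prevents completing the chain later.
        return stage + 1 if stage < k and color.get(node) == stage else stage

    start = (src, advance(0, src))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node, stage = queue.popleft()
        if node == dst and stage == k:
            return dist[(node, stage)]
        for nxt in adj[node]:
            state = (nxt, advance(stage, nxt))
            if state not in dist:
                dist[state] = dist[(node, stage)] + 1
                queue.append(state)
    return None

def relative_detour(adj, color, k, src, dst):
    """Extra hops of the chain-feasible walk, relative to the plain shortest path."""
    plain = shortest_chain_path(adj, color, 0, src, dst)
    chained = shortest_chain_path(adj, color, k, src, dst)
    if not plain or chained is None:  # src == dst, unreachable, or infeasible
        return None
    return (chained - plain) / plain

def all_pairs_detours(adj, color, k):
    """Relative detour for every ordered pair of distinct nodes."""
    return {(s, t): relative_detour(adj, color, k, s, t)
            for s in adj for t in adj if s != t}

if __name__ == "__main__":
    print(shortest_chain_path(ADJ, COLORS, 0, 0, 3))  # → 2 (via node 4)
    print(shortest_chain_path(ADJ, COLORS, 2, 0, 3))  # → 3 (must pass node 1)
    print(relative_detour(ADJ, COLORS, 2, 0, 3))      # → 0.5
```

In this toy ring the only node hosting WL-VNF 1 is node 1, so the 0 → 3 flow must abandon the 2-hop route through node 4 and take a 3-hop route, a 50% detour; averaging such values over all pairs is one hypothetical way to score a candidate coloring.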
[24] E. Nuutila and E. Soisalon-Soininen, “On finding the strongly connected
R EFERENCES components in a directed graph,” Information processing letters, vol. 49,
no. 1, pp. 9–14, 1994.
[1] M. G. Spina, F. De Rango, E. Scalzo, F. Guerriero, and A. Iera, [25] C. E. Andrade, R. F. Toso, J. F. Gonçalves, and M. G. Resende,
“Distributing Intelligence in 6G Programmable Data Planes for Effective “The multi-parent biased random-key genetic algorithm with implicit
In-Network Deployment of an Active Intrusion Detection System,” path-relinking and its real-world applications,” European Journal of
arXiv, Oct. 2024. Operational Research, vol. 289, no. 1, pp. 17–30, 2021.
JOURNAL OF LATEX CLASS FILES, VOL. XX, NO. X, XXX 2024 13

[26] M. López-Ibáñez, J. Dubois-Lacoste, L. Pérez Cáceres, M. Birattari, and


T. Stützle, “The irace package: Iterated racing for automatic algorithm
configuration,” Operations Research Perspectives, vol. 3, pp. 43–58,
2016. [Online]. Available: https://doi.org/10.1016/j.orp.2016.09.002
[27] I. Sharafaldin, A. H. Lashkari, S. Hakak, and A. A. Ghorbani, “De-
veloping Realistic Distributed Denial of Service (DDoS) Attack Dataset
and Taxonomy,” in 2019 International Carnahan Conference on Security
Technology (ICCST). IEEE, 2019, pp. 01–03.
[28] K. Doshi, Y. Yilmaz, and S. Uludag, “Timely Detection and Mitigation
of Stealthy DDoS Attacks Via IoT Networks,” IEEE Trans. Dependable
Secure Comput., vol. 18, no. 5, pp. 2164–2176, Jan. 2021.
[29] “A realistic cyber defense dataset (cse-cic-ids2018),” accessed: 2023-
04-04. [Online]. Available: https://registry.opendata.aws/cse-cic-ids2018
[30] A. Botta, A. Dainotti, and A. Pescapè, “A tool for the generation
of realistic network workload for emerging networking scenarios,”
Computer Networks, vol. 56, no. 15, pp. 3531–3547, 2012.
[31] M. W. Nadeem, H. G. Goh, Y. Aun, and V. Ponnusamy, “Detecting and
Mitigating Botnet Attacks in Software-Defined Networks Using Deep
Learning Techniques,” IEEE Access, vol. 11, pp. 49 153–49 171, 2023.
[32] “TLC UNICAL In-network-Distributed-IDS,” Nov. 2024, [Online;
accessed 18. Nov. 2024]. [Online]. Available: https://github.com/
mattiagiovanni/TLC UNICAL In-network-Distributed-IDS
[33] “behavioral-model,” Sep. 2024, [Online; accessed 5. Sep. 2024].
[Online]. Available: https://github.com/p4lang/behavioral-model

Mattia Giovanni Spina is a PhD student at the University of Calabria (Italy). His research interests include security in future-generation networks and distributed AI in-network architectures.

Floriano De Rango is an associate professor of Telecommunications at the University of Calabria (Italy). His research interests include security in wireless and IoT networks and networking solutions for V2X systems.

Antonio Iera is a full professor of Telecommunications at the University of Calabria (Italy). His research interests include next-generation mobile and wireless networks and the Internet of Things. He is currently Editor-in-Chief of the Elsevier journal Computer Networks.

Edoardo Scalzo is a junior researcher in Operations Research at the University of Calabria (Italy). His research interests include network optimization, logistics, and combinatorial optimization.

Francesca Guerriero is a full professor of Operations Research at the University of Calabria (Italy). Her primary research interests include network optimization, logistics, combinatorial optimization, and the intersection of optimization and big data.
