2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA) | 978-1-6654-5911-2/22/$31.00 ©2022 IEEE | DOI: 10.1109/CVIDLICCEA56201.2022.9824397

Network Situation Features Extraction Method of Computer Network Based on Knowledge Graph

Li-Qiong DENG, Yi-Fei WU, Jie-Yu Wang, Bo Wang

Department of Command Information System and Network, Air Force Communication NCO Academy, Dalian, China
Abstract—A network situation feature extraction method based on knowledge graph is proposed for network situation assessment. A knowledge graph of network flow is constructed with knowledge graph technology. By combining graph algorithms with complex network feature analysis, different types of node features and overall network features are extracted, yielding an importance ranking of nodes and an overall evaluation value of the network situation. The experimental results show that the algorithm can accurately locate key nodes in a computer network and describe their connection relationships, which provides an effective means for more efficient network management and maintenance.

Keywords-network situation; knowledge graph; feature extraction; computer network

I. INTRODUCTION

Computer network technology plays an increasingly prominent core role in carrying information transmission. With the development of information technology, communication networks are growing in scale and becoming more complex and diverse. Traditional network management technology cannot effectively present the structural features of computer networks, so a more effective method of extracting network structure features, built on intelligent technology, is required.

At present, the computer network is the basis for carrying all information systems and an important support for information transmission[1]. In the face of increasingly complex and dynamic computer networks, the network must be invulnerable and dynamically adaptive in order to ensure the efficient flow of information[2]. The computer network structure is developing in a flat, distributed and self-organizing direction[3]. In actual networking, basic forms such as tree, star and mesh networks are randomly combined to form hybrid forms such as mesh networks and composite networks. Accurately extracting the characteristic parameters that reflect the situation of a computer network therefore has important research value for describing and improving the network structure.

II. RELATED WORKS

Existing algorithms for extracting computer network situation features fall mainly into two categories. The first analyzes the network flow data itself to obtain the corresponding features. Related research includes using deep packet inspection to extract and analyze network data packets[4]; regarding the network as a time-varying signal[5] and extracting time-domain features through time series analysis to model, predict and identify network epidemics; and using the basic features of network traffic data and their high-order statistics to classify network traffic by machine learning[6]. However, as computer networks continue to grow, the data exchanged between network hosts increases massively. For large-scale network data, such analysis methods are too expensive and time-consuming to meet the real-time requirements of network situation analysis.

The second category analyzes the importance of nodes in the network to obtain the network situation features. Related research includes evaluating nodes by their centrality in the network topology, with typical indicators such as degree centrality, betweenness centrality, subgraph centrality and closeness (tightness)[7]; calculating shortest paths to measure the efficiency of the network[8]; and calculating a node's own attributes under the influence of a propagation mechanism[9]. These methods analyze network features from the perspective of a single performance index. They lack an analysis of how the various features affect network operation from the perspective of the whole network, and of the relationship between node features and network situation indicators, so their practicability needs to be enhanced.

In a computer network, the most important performance indicators are robustness and invulnerability, which are the properties the network exhibits when its structure is subjected to random faults. To meet this demand, this paper proposes a network situation feature extraction algorithm suited to large-scale real-time network analysis and analyzes the relationship between each feature and the network indicators, which provides a means of prediction and strengthened management for network security.
Authorized licensed use limited to: NUST School of Electrical Engineering and Computer Science (SEECS). Downloaded on November 28,2024 at 08:46:04 UTC from IEEE Xplore. Restrictions apply.
III. COMPUTER NETWORK SITUATION FEATURE EXTRACTION TECHNOLOGY BASED ON KNOWLEDGE GRAPH

A. Technical Framework

From the perspective of network management, the extraction of computer network situation features can be understood as collecting, from network data such as the operation status, logs, network traffic and user behavior of the various network node devices, the features that reflect the current network state and trend, so that the system becomes more intelligent. A framework for computer network situation feature extraction based on knowledge graph technology is presented, as shown in Figure 1.

[Figure 1: four-layer framework diagram. Layers as labeled: Users (network situation understanding: situation visualization display and situation prediction); Feature Library (features selection and evaluation: topological, state and statistical characteristics); Knowledge Base (knowledge graph construction: ontology and relation extraction, RDF triple model); Data Base (data collection and extraction: information system network parameters such as traffic, log, network and security equipment).]

Figure 1. Network feature extraction technology framework based on knowledge graph

In this framework, knowledge graph technology is used to span the gap from network data to knowledge and features. First, the heterogeneous network data generated at different geographical locations is extracted and fused. Then, the network connection graph is constructed with knowledge graph technology. Based on the knowledge graph of network flow, this paper uses the RDF triple model to establish the node relations and regards the computer network connection graph as a weighted complex network. Each network node is an entity in the graph, and the edges between nodes have corresponding weights. The network is simplified as an undirected or directed connected graph composed of nodes and links. Different feature values are then used to describe the connection graph, and the characteristic parameters obtained describe network structures of different scales accurately, so that the network structure can be analyzed quantitatively. In this paper, the graph algorithms of the knowledge graph and complex network analysis methods are combined to extract the node features and the overall features of the network. The structural information of the communication interaction mode of the network nodes is described, and the structural problems between the network nodes are solved.

Finally, the obtained characteristic parameters are used to support the understanding of the network situation, and the visualization method of the knowledge graph is used to show the situation of the computer network structure intuitively. By mining hidden correlations, abnormal network behavior is found in time and a certain ability of situation prediction is provided, so that network managers can grasp the network operation state in time and improve network security monitoring and management.

B. Node Features Extraction Algorithm Based on Knowledge Graph

In order to describe the features of nodes in computer networks more accurately and comprehensively, this paper describes them quantitatively from the perspectives of node centrality and node transitivity. These features reflect network performance from different perspectives and can be used independently or in combination.

1) Features Extraction of Node Centrality

Node centrality is an important indicator for judging the importance of nodes in the network. It reflects the role of a node and its impact on the network. According to their centrality, the more important nodes can be selected for targeted maintenance. The higher the centrality of a node, the more network events are associated with it, indicating higher node activity[10]. In computer networks, active nodes are generally backbone nodes. The overall node activity of the network can be measured by the proportion of nodes with large centrality. The main methods for extracting node centrality features are as follows:

a) Degree centrality

Definition: the number of edges connected to the node.

Hypothesis: a node with large degree centrality has many connection relationships. The greater the degree centrality, the stronger the influence of the node.

b) Betweenness centrality

Definition: the number of shortest paths that pass through the node.

Hypothesis: if the node lies on multiple shortest paths, it is a core node. The greater the betweenness centrality, the stronger the control ability of the node in dissemination.

c) Closeness centrality

Definition: the average shortest distance from the node to all other nodes; a smaller average distance means higher closeness.

Hypothesis: if the shortest distances from the node to the other nodes in the graph are small, the node is closer to the geometric center, indicating that its propagation ability is stronger.

The above three centrality features can be integrated. If each edge has a weight value (such as the flow or delay of the link), the weight information can also be included when calculating each feature. Finally, the centrality feature of the node is obtained. It comprehensively reflects the importance of the node in the network structure from the three aspects of the

influence, control and propagation of the nodes. The nodes with the highest importance should be the focus of operation and maintenance management. For the nodes with the lowest importance, the structure can be optimized.

2) Node transitivity feature extraction

The centrality features measure the direct impact of nodes on the network, but the importance of a node is often also affected by its adjacent nodes. If a node connects to more important nodes, it has stronger outward transmission ability. At the same time, if a node is connected to by many other nodes, it is more important. Nodes with strong transitive ability play an important role in the dynamic networking of computer networks. If such nodes account for a high proportion, the network structure has strong survivability.

This paper selects the PageRank algorithm to calculate the transmission force of network nodes in the graph. PageRank, also known as Web Ranking, Google Left Ranking and PR[11], is an algorithm used by Google to rank Web pages in its search engine results. It is essentially an algorithm that analyzes the importance of Web pages based on the number and quality of hyperlinks between them. The main calculation principle is to define a random walk model on the network graph, namely a first-order Markov chain. The characteristic of the random walk is that the transfer probability from one node to each of the nodes connected to it is equal. With transfer matrix $M$, the Markov chain has a stationary distribution $R$ that satisfies Equation (1).

$$MR = R \tag{1}$$

The final result is the PageRank distribution of the graph, and each component of $R$ is the PR value of the corresponding node.

C. Network Statistical Features Extraction Based on Knowledge Graph

In order to calculate the overall structure features of a computer network and measure the overall robustness and invulnerability of the network structure, this paper selects the following characteristic parameters for calculation.

1) Graph Density $D$

$$D = \frac{2R}{N(N-1)} \tag{2}$$

where $R$ represents the number of relations in the graph, namely the number of edges, and $N$ represents the number of nodes in the graph. The greater $D$ is, the denser the graph. The more redundant structures there are in the network, the stronger its overall connectivity. When $D$ is 1, the network graph is a complete graph.

2) Average shortest path length

The Dijkstra algorithm is used to calculate the shortest path between all pairs of nodes, and then the average value of the shortest paths is calculated. The smaller the average value, the smaller the overall path length of the network, that is, the shorter the propagation distance between any two points, indicating that the overall transmission efficiency of the network is higher.

3) Clustering coefficient

Subgroups with close ties of differing strength, namely subnets, often form in computer networks. The local clustering coefficient of a node is the number of edges between the nodes in its neighborhood divided by the number of edges that could exist between them. The average of the clustering coefficients of all nodes, taken as the average clustering coefficient of the network, reflects the overall clustering level of the network. For the network as a whole, the clustering coefficient is defined as the proportion of closed paths among all paths of length 2. The clustering coefficient of a node measures the average probability that two neighbors of the node are themselves neighbors, and is used to measure the degree of node aggregation, as shown in Equation (3).

$$C_i = \frac{\text{number of connected node pairs among the neighbors of node } i}{\text{total number of neighbor node pairs of node } i} \tag{3}$$

The clustering coefficient of the overall network can be calculated from the clustering coefficient of each node, and reflects the transmission characteristics of the network. The greater the clustering coefficient, the higher the clustering degree of the network and the stronger its robustness.

D. Network Situation Visualization Based on Knowledge Graph

After the structural features of the computer network are obtained, these feature values can be stored in a graph database based on the knowledge graph. The knowledge graph database can realize real-time dynamic display and calculation of large-scale networks and provide a data support platform for network situation visualization and situation prediction. Figure 2 is an example of network structure visualization based on the Neo4j graph database[12]. The graph database can also use the query language Cypher to realize fast query and reasoning over graph data.

Figure 2. An example of network situation visualization based on Neo4j graph databases

IV. EXPERIMENTAL RESULTS AND ANALYSIS

In this paper, computer network data are used as the experimental object. The experimental platform uses Neo4j as the graph database and Python as the API programming interface of the graph database. The programming is based on the networkx package in Python. The data source of this experiment includes two types: network flow data generated by network data capture software

and random network data generated by simulation algorithms, covering different network scales, as listed in Table I.

TABLE I. THE LIST OF EXPERIMENTAL DATA
Name | Hop number | Edge number
DWA | 38 | 38
DWB | 105 | 147
DWC | 382 | 713
DMA | 100 | 564
DMB | 400 | 800
DMC | 1000 | 49945

Data DWA, DWB and DWC come from three types of computer networks. DMA, DMB and DMC are a BA scale-free network, a WS small-world network and an ER random graph network generated by simulation. Based on these data, the knowledge graph of the computer network can be constructed. Each record contains the main fields: the starting node, the arriving node, the weight and the time information. The entities in the network knowledge graph are the nodes in the network, and the edges connect nodes that have an information transmission relationship. The weight is the traffic value transmitted on each edge.

A. Experiment of Node Feature Extraction

The experiments show that the features of each node in the above data sources follow similar rules. Take the node feature extraction results of the DWA data as an example, as shown in Figure 3, where the horizontal axis is the node serial number and the vertical axis is the feature value.

Figure 3. Network feature extraction technology framework based on knowledge graph

It can be seen from the figure that the four types of node features have similar overall trends. For example, all four features are high for node 17, but they vary differently across nodes. This reflects the different sensitivities of the features, whose impacts on the network performance indicators also differ.

B. Network Feature Extraction Experiment

The overall features of the network differ greatly between network structures. The comparison results of the three types of network feature extraction for the six data sources are shown in Table II.

TABLE II. THE LIST OF NETWORK FEATURE EXTRACTION RESULTS
Data name | Density | Average shortest path | Clustering coefficient
DWA | 0.0667 | 3.1139 | 0.0000
DWB | 0.0823 | 2.9950 | 0.3211
DWC | 0.0996 | 2.9417 | 0.5455
DMA | 0.1139 | 2.1145 | 0.1946
DMB | 0.0100 | 4.8003 | 0.0782
DMC | 0.0999 | 1.9000 | 0.1000

The experiment shows that, across the different network structure types, networks with higher density have a smaller average shortest path. Most network structures with higher density feature multiple connection means and multiple redundancy, and have strong robustness and invulnerability. For example, DWA, DWB and DWC correspond to three types of computer network structures. The C type network structure has the most redundant structure and the highest reliability. Its density and clustering coefficient are the largest among the three networks, and its average shortest path is the shortest.

In addition, a network structure with a high clustering coefficient is characterized by strong aggregation and high robustness. For example, for the DWA network structure, which has no redundant structure, the clustering coefficient is 0, while for DWC, which has the most redundant structure, the aggregation degree is the highest. However, when the network size becomes larger, the clustering coefficient decreases to a certain extent. For example, among the large DMA, DMB and DMC data, the largest clustering coefficient belongs to the scale-free network DMA, in which six edges are added for each new node. The clustering coefficients of the random network and the small-world network are not high, which indicates that improving the aggregation degree of a network requires increasing both the number of connections between nodes and the connection density of the whole network.

C. Network Visualization Experiment

The networkx package in Python can be used to visualize the above network knowledge graph data in real time, and the results are shown in Figure 4. For network data on the order of tens of millions of records, real-time dynamic visualization of large-scale graph data can be realized based on the Neo4j graph database, which provides an effective means for intuitive display of the network situation.

Figure 4. Network knowledge graph visualization
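The three network-level features of Section III-C (graph density from Equation (2), average shortest path length, and the clustering coefficient from Equation (3)) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it uses plain dictionaries instead of networkx or Neo4j, treats every edge as unweighted so that breadth-first search stands in for Dijkstra, and assumes a connected undirected graph. The function names and the toy edge list are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Equation (2): D = 2R / (N (N - 1)), with R edges and N nodes."""
    n = len(adj)
    r = sum(len(nb) for nb in adj.values()) // 2  # each edge is stored twice
    return 2.0 * r / (n * (n - 1))


def average_shortest_path(adj):
    """Mean hop distance over all ordered node pairs (assumes a connected graph)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search; equals Dijkstra when all weights are 1
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def local_clustering(adj, i):
    """Equation (3): connected neighbor pairs of node i / all neighbor pairs of node i."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair can exist
    linked = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return linked / (k * (k - 1) / 2)


def average_clustering(adj):
    """Network clustering coefficient as the mean of the local coefficients."""
    return sum(local_clustering(adj, i) for i in adj) / len(adj)


# Toy graph (hypothetical): a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = build_adjacency([(0, 1), (1, 2), (2, 0), (2, 3)])
print(round(density(toy), 4))                # 0.6667
print(round(average_shortest_path(toy), 4))  # 1.3333
print(round(average_clustering(toy), 4))     # 0.5833
```

A graph without any triangle, such as a star or a tree, gets an average clustering coefficient of 0 under this definition, which is consistent with the value of 0 reported for the DWA structure without redundancy.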

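The PageRank feature of Section III-B can be illustrated with a short power-iteration sketch of the stationary condition in Equation (1), MR = R. This is an assumption-laden example, not the paper's code: the damping factor is not mentioned in the paper and is included only because it is common practice (setting it to 1.0 recovers the pure random walk of Equation (1)), dangling nodes are handled by spreading their rank uniformly, and the function name and toy graph are hypothetical.

```python
def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank on {node: [successor nodes]}.

    With damping=1.0 the update is exactly R <- M R for the random walk in
    which a node passes its rank to each of its successors in equal shares.
    """
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Teleportation share (zero when damping is 1.0).
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            successors = adj[u]
            if successors:
                share = damping * rank[u] / len(successors)
                for v in successors:
                    new_rank[v] += share
            else:
                # Dangling node (assumption): spread its rank over all nodes.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:  # stop once the distribution is numerically stationary
            break
    return rank


# Toy undirected graph (hypothetical), stored with both directions of every edge:
# a triangle 0-1-2 with a pendant node 3 attached to node 2.
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
pr = pagerank(toy)
for node in sorted(pr, key=pr.get, reverse=True):
    print(node, round(pr[node], 4))
```

For a connected, non-bipartite undirected graph and damping 1.0, the stationary distribution is proportional to node degree, so in the toy graph node 2 (degree 3 of 8 edge endpoints) converges to 0.375. That gives a quick sanity check on any implementation of Equation (1).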
D. Experimental Analysis

Through repeated comparison of the above experiments, this paper summarizes the following correspondence between these features and the performance indicators of computer networks.

TABLE III. CORRESPONDENCE BETWEEN COMPUTER NETWORK FEATURES AND PERFORMANCE
Network Feature Name | Performance indicators of corresponding networks | Incidence
Degree centrality | Nodes with the largest number of connections or communication traffic affect network connectivity. | Relatively strong
Betweenness centrality | Important nodes on the information critical path can improve the efficiency and speed of information transmission. | Relatively strong
Closeness centrality | The important nodes of information transmission can affect the extension range of network transmission. | Strong
PageRank | The influential nodes can affect the overall connectivity and propagation efficiency of the network and improve the survivability. | Strong
Density | Reflects the invulnerability of the network structure; the greater the density, the more redundant the network structure. | Strong
Average shortest path | The shorter the average shortest path is, the faster the network connectivity rate is. | Strong
Clustering coefficient | The higher the clustering coefficient, the stronger the robustness of the network. | Relatively strong

In practical use, some or all of the features can be selected according to actual needs. They can not only describe the network situation accurately, but also provide a strong basis for operation and maintenance personnel to formulate network optimization schemes, which greatly improves the practicability.

V. SUMMARY

From the perspective of the actual operation and maintenance of computer networks, this paper proposes a network feature extraction method based on knowledge graph algorithms, which can calculate various features of nodes and networks. The correspondence between these features and network performance indicators is verified by experiments, which provides an effective solution for the storage and visualization of large-scale computer network situations. The method can not only describe the computer network situation accurately and support the understanding of the network situation, but also provide important theoretical support for the formulation and optimization of network management and maintenance schemes.

On this basis, further research can continue to explore the deep application of knowledge graph technology in network situation understanding and network attacks, and realize "intelligent network operation and maintenance" in the real sense.

REFERENCES

[1] Liu Jingxing, Cheng Shaodong, et al. Study on node importance of complex network based military command control networks[A]. Proceedings of International Conference on Machine Learning and Cybernetics[C]. Xian: IEEE Press, 2012: 920-923.
[2] Zhu W, Wang Z, et al. Capability-based context ontology modeling and reasoning for (CISR)-I-4 communication[J]. Journal of Systems Engineering and Electronics, 2016, 27(4): 845-857.
[3] Zhang Jin. Risk Management of command information system's communication network in foundation phase based on the complexity[J]. Fire Control & Command Control, 2014, 39(04): 5-9.
[4] A Boukhtouta, et al. Network malware classification comparison using DPI and flow packet headers[J]. Journal of Computer Virology and Hacking Techniques, 2016, 12(2): 69-100.
[5] Zhou Yingjie, Hu Guangming. GNAED: a data mining framework for network-wide abnormal event detection in backbone networks[C]. Proceedings of IPCCC '11. Washington D.C., USA: IEEE Press, 2011: 1-2.
[6] Nguyen T, Armitage G J. A survey of techniques for internet traffic classification using machine learning[J]. IEEE Communications Surveys & Tutorials, 2008, 10(3): 56-76.
[7] Zhou Xuan, Zhang Fengming, Zhou Weiping, et al. Evaluating Functional Robustness of Complex Networks Using Node Efficiency[J]. Acta Physica Sinica, 2012, 61(19): 1-7.
[8] Liu Jianguo, Ren Zhuoming, Guo Qiang, et al. Research progress of node importance ranking in complex networks[J]. Acta Physica Sinica, 2013, 62(17): 1-10.
[9] Klemm K, et al. A measure of individual role in collective dynamics[J]. Scientific Reports, 2012, 292(02): 1-8.
[10] Zhang Ming zhi, Luo Kai, Wu xi. Study on Key Nodes Analysis Method of Space Information Network[J]. Journal of System Simulation, 2015, 27(06): 1235-1239.
[11] Qingyu Su, Cong Chen, et al. Identification of critical nodes for cascade faults of grids based on electrical PageRank[J]. Global Energy Interconnection, 2021, 4(06): 587-595.
[12] Mark Needham, et al. Graph algorithm for data analysis: based on Spark and Neo4j[M]. Beijing: People's Post and Telecommunication Press, 2021.
