
VividGraph: Learning to Extract and Redesign Network Graphs from Visualization Images

Sicheng Song, Chenhui Li, Yujing Sun, and Changbo Wang

Abstract—Network graphs are common visualization charts. They often appear in the form of bitmaps in papers, web pages,
magazine prints, and designer sketches. People often want to modify graphs because of their poor design, but it is difficult to obtain
their underlying data. In this paper, we present VividGraph, a pipeline for automatically extracting and redesigning graphs from static
images. We propose using convolutional neural networks to solve the problem of graph data extraction. Our method is robust to
hand-drawn graphs, blurred graph images, and large graph images. We also present a graph classification module to make it effective
for directed graphs. We propose two evaluation methods to demonstrate the effectiveness of our approach. It can be used to quickly
transform designer sketches, extract underlying data from existing graphs, and interactively redesign poorly designed graphs.

Index Terms—Information visualization, Network graph, Data extraction, Chart recognition, Semantic segmentation, Redesign

1 INTRODUCTION

NETWORK graphs (hereinafter called graphs) are a popular and primary form of information visualization that can clearly present various types of graph data [1, 2, 3]. The results of graph visualization usually appear as static images in websites, papers, magazines, and other printed matter. We may need to obtain the original graph data in many cases, such as updating the graph data or recoloring or re-laying out the graph. However, due to the lack of the original code and design files, it is difficult to obtain the original network data.

There are some methods for solving this problem, such as information steganography [4], pattern recognition, and statistical analysis. Information steganography requires writing the original data into the image in advance, so it is not effective for the large number of existing images; in addition, its steps are complicated, and the performance of statistical analysis is not satisfactory. Therefore, the common approach is pattern recognition [5, 6, 7, 8]. These data extraction methods focus on simple charts, such as bar charts, line charts, radar charts, and heat maps; none of them can solve the graph data extraction task. Extracting the underlying graph data is challenging. First, it is difficult to label graph datasets, and there is no public corpus. Second, the dimension of graph data is higher than that of bar charts or pie charts, which makes data regression difficult. Third, graph edges are difficult to extract.

We introduce VividGraph, a pipeline for automatically extracting and redesigning graphs from static images. VividGraph integrates the classification of directed and undirected graphs, the extraction of nodes and links, an algorithm to calculate topological relations, and interactive chart redesign. Inspired by Haehn et al. [9], we generate directed and undirected graph datasets with pixel-level labels to train the deep learning modules in our pipeline. We classify a graph as directed or undirected with a classification neural network. Then, we obtain the node and link pixels through a semantic segmentation network. Finally, we reconstruct the topological relations of the nodes through calculation. Following the data assumptions of existing chart extraction work [8, 10], we precisely define the scope of our work: VividGraph assumes that the graph has no text, that the nodes are circular and do not overlap, and that the edges are straight. In our user study, 91.23% of the participants agreed with this scope and believed that the graphs in this paper are common in the real world. Our training data avoid the difficult task of labeling real data and give the model good generalization ability. A simple morphological method [8] cannot handle inconsistent node sizes: small nodes will be eroded. Our semantic segmentation network can robustly detect many types of graphs.

We demonstrate our recognition accuracy in terms of structural similarity and image similarity. Our method is robust enough to recognize a variety of graphs, such as directed graphs, undirected graphs, hand-drawn graphs, printed graphs, and graphs from the D3 library [11] and the E-charts [12] gallery. The extracted data can help designers quickly transform their ideas into interactive graphs. Users can also easily redesign and modify graphs with poor designs or outdated data.

Our work makes the following contributions:
(1) We are the first to propose using convolutional neural networks to solve the problem of graph data extraction.
(2) We propose a pipeline with semantic segmentation to accurately identify the characteristics of graphs. The method is robust to hand-drawn graphs, blurred images, and large images.
(3) We propose two methods, based on structural similarity and image similarity, to evaluate the effectiveness of our approach.

• Sicheng Song, Chenhui Li, Yujing Sun, and Changbo Wang are with the School of Computer Science and Technology, East China Normal University, Shanghai, China. E-mail: [email protected], [email protected].
Manuscript received xx xx, 2020; revised xx xx, 2020.


Fig. 1. VividGraph can be used in many practical applications. The input is a bitmap. Through our semantic segmentation and connection algorithm, we can obtain its underlying data. Using the extracted data, we can reconstruct a vector version of the graph and redesign the chart, such as by recoloring, re-layout, and data modification.

2 RELATED WORK

Our work is related to three technologies: chart extraction, graph perception with CNNs (convolutional neural networks), and chart interaction.

2.1 Chart Extraction
Some inverse visualization studies extract data from charts. Harper et al. [13] proposed a method for extracting the underlying data from a visualization chart built with the D3 library [11]. This method depends on web page code, such as HTML and SVG. The extracted data can be used for chart redesign or color mapping redefinition. WebPlotDigitizer [14] is another web-based graph data extraction tool. It can extract four types of chart data: bar charts, line charts, polar charts, and ternary charts. However, its accuracy is not high, and users generally add information manually to improve it. Another tool, Ycasd [15], requires the user to provide the positions of all points on a line to extract line chart data.

Static bitmaps are encountered more often. Some studies use image processing and machine learning to solve the problem of data extraction from bitmap charts. ReVision [6] is a data extraction framework for bitmap charts that automatically divides charts into ten categories and focuses on data extraction for pie charts and bar charts. FigureSeer [7] focuses on extracting data from line charts. Poco et al. [8, 16] proposed a data extraction method with legends and extended the research to heat maps; they focused on the role of the legend text in the charts and added an OCR module to solve the data extraction problem for charts with legends. Zhou et al. [17] proposed a network with an attention mechanism to detect bar charts, in which deep learning techniques have been well applied.

Some researchers have focused on methods for semiautomatically extracting bitmap chart data. Jung et al. [5] introduced ChartSense, a system that increases data extraction accuracy through manually added information. DataThief [18] is another semiautomatic tool for extracting data from line charts; the user needs to provide information such as the coordinates of the start and end points of the line and the positions of the horizontal and vertical axes. iVoLVER [19] integrates data extraction of bitmap images and SVG objects on the web, providing a semiautomatic data extraction framework. These semiautomatic methods rely on a large quantity of user interaction, such as specifying the data type (e.g., color and shape) and providing the dividing line location. While this improves data extraction accuracy, it also reduces efficiency and requires considerable manual intervention. Moreover, to the best of our knowledge, these frameworks do not support data extraction and redesign for graph bitmaps.

2.2 Graph Perception with CNNs
Haehn et al. [9] reproduced Cleveland and McGill's [20] graphic perception evaluation experiment with CNNs. They compared the recognition capabilities of four networks, MLP, LeNet [21], VGG [22], and Xception [23], on nine basic perception tasks. They reported that VGG19 has the best graphic perception ability among these networks. They also noted that graphs are a high-level graphical encoding, so the task of extracting data from graphs is challenging. Haleem et al. [24] evaluated the readability of force-directed graph layouts with CNNs. Giovannangeli et al. [25] continued this line of work and used CNNs to evaluate the image perception ability of graphs. However, their evaluation indicators were only the number of edges, the number of nodes, and the maximum degree of the graph, not the topological relations, which are the most critical data in a graph.

These pattern recognition methods all regress the graph numerically; graphs are too complex for such regression to be feasible. Therefore, we turned to semantic segmentation, a method commonly used in computer vision for natural images. The fully convolutional network (FCN) [26] was the earliest method to obtain segmentation results equal to the input image size through a series of convolutional layers and deconvolutions. SegNet [27] is similar to FCN; while deepening the network, it uses pooling indices to preserve contour information. PSPNet [28] uses a pyramid pooling module that feeds the feature map through four parallel pooling layers and then obtains and upsamples four outputs of different sizes. These semantic segmentation networks are applied to natural image data, such as the VOC datasets, where deep feature extraction causes the target to lose small features while leaving basically correct segmentation results.


However, for graph images, these small features, such as edges, are important. The network parameters need to be simplified compared with those used for natural images.

2.3 Chart Interaction
Chart redesign can maximize the value of the original data and help readers understand the data quickly and accurately [29]. Good chart visualization is made up of a clear layout, an easily distinguishable color scheme, and interactive application scenarios. First, the layout gives readers their first impression of a chart. Takayuki et al. [30] combined a force-directed layout with a hybrid space-filling method to simultaneously represent both the connectivity and the categories of multiple-category graphs. Arc diagrams [31] were proposed to display complex patterns of string data with overlapping parts. Ka-Ping et al. [32] added animation techniques to radial layouts so that users can interactively explore the dynamic evolution of topological relations. Second, the similarity and comparison of color schemes help readers understand directly. Lujin Wang et al. [33] used a knowledge-based system to learn color design rules and apply them to illustrative visualization. Jorge et al. [8] extracted color encodings from a heatmap and recolored it with different color schemes to make the heatmap more comprehensible. Third, deeper information can be acquired through interactive operations. Thinkbase and Thinkpedia [34] explore the semantic graphs of large knowledge repositories with interactive operations so that web content can be browsed more easily. NR [35] provides an interactive database for web-based visual interactive analytics. Lai et al. [36] combined the chart extraction technology of bar charts and pie charts with natural language processing and proposed a visual interactive automatic annotation method. Kim et al. [37] proposed a pipeline that can answer questions about a chart.

3 METHODS

The extraction of graphs faces three major problems: it is difficult to label the dataset; the number of topological relations is large (if a graph has n nodes, there are on the order of n^2 possible relations); and traditional methods [25] have difficulty extracting edges. Therefore, we establish a graph dataset with automatically generated pixel labels and propose VividGraph to automatically extract the data of graphs. VividGraph is a framework composed of four modules: (1) classification of directed and undirected graphs, (2) a semantic segmentation network, (3) a node reconnection algorithm, and (4) interactive chart redesign. First, we classify the graph as directed or undirected so that the second step can use different parameters for data extraction. Second, VividGraph uses a semantic segmentation network to locate the node and edge pixels. Third, we design an algorithm to reconnect these nodes, which calculates the topological relations. Fourth, VividGraph uses the extracted data to redesign graphs according to user needs.

Fig. 2. The VividGraph pipeline includes five steps: (1) input a graph image, (2) classification of directed and undirected graphs, (3) semantic segmentation network, (4) algorithm to reconnect the nodes, and (5) interactive chart redesign.

3.1 Training Dataset Generation
Considering that our data extraction algorithm uses a semantic segmentation neural network, we need graph datasets with pixel-level labels. We classify pixel labels into three categories: background, nodes, and edges. Once we obtain the category of each pixel, we can calculate all network attributes, including relations, node size, edge width (thickness), color, etc. However, it is difficult to produce pixel labels for the graphs generated by common visualization frameworks.


Inspired by some studies [9, 10, 38] that use image synthesis to build datasets, we provide a graph data generator for the inverse visualization study of graphs, which automatically marks the category of each pixel. This generator can control the number, color, and size of the nodes; the width (thickness), number, and color of the edges; the color and size of the background; and whether the graph is directed or undirected. The pixels in the graph are labeled as background, node, edge, or arrow (the last only for directed graphs).

The generator was built with Python tools and the Skimage module [39]. The image size is 320 × 320. The number of nodes is random between 0 and 49. The node radius is random between 6 and 15 to ensure that the nodes do not overlap. To make our model robust to various layouts (e.g., force-directed or circular), we randomly select node positions so that the model can learn various layouts. We add conditions during node generation: if a generated node overlaps another node, it is re-randomized until it no longer overlaps. The edge width (thickness) is random between 1 and 6. The colors of the background, nodes, and edges are all random between (0, 0, 0) and (255, 255, 255), and these random attributes can vary within a single graph.

We first generate the nodes on the image and then randomly select pairs of nodes to connect with edges. The graphs generated in this way can cover different graph families, but the approach also has limitations. Giovannangeli et al. [25] pointed out a data generation pitfall: randomly selecting the number of nodes causes the graph density distribution and the number of edges in the dataset to follow a power-law distribution. To avoid overfitting, we additionally control the maximum degree and the number of edges during random generation. As shown in Figure 3, the number of edges and the graph density in our dataset are uniformly distributed.

Fig. 3. The data distribution of our training set. The horizontal axis is the number of nodes (a), the number of edges (b), and the maximum degree (c). The vertical axis is the number of images.

We generate 15,000 undirected graph images as the training set and 5,000 undirected graph images as the validation set. Then, we add arrows to all images to generate the directed graph images; the width and the length of each arrow are both random between 1 and 5. We thus obtain 15,000 directed graph images as the training set and 5,000 directed graph images as the validation set. In total, we generate 40,000 visualization images, of which 30,000 are used as the training set and 10,000 as the validation set.
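Since the paper states only that the generator is built with Python and the Skimage module [39], the following minimal sketch of the scheme is ours: the function name, the overlap-retry limit, and the edge-count distribution are assumptions, and edges are drawn one pixel wide for brevity instead of the 1-6 pixel widths described above.

```python
import numpy as np
from skimage.draw import disk, line

H = W = 320
BACK, NODE, EDGE = 0, 1, 2  # pixel-level label categories

def generate_graph_image(rng=None):
    """Render one random undirected graph plus its label map."""
    rng = rng or np.random.default_rng()
    img = np.full((H, W, 3), rng.integers(0, 256, size=3), dtype=np.uint8)
    label = np.full((H, W), BACK, dtype=np.uint8)

    # Sample non-overlapping circular nodes (radius 6-15).
    nodes = []
    for _ in range(int(rng.integers(0, 50))):
        r = int(rng.integers(6, 16))
        for _ in range(100):  # re-randomize until the node fits
            cy, cx = (int(v) for v in rng.integers(r, H - r, size=2))
            if all((cy - y) ** 2 + (cx - x) ** 2 > (r + r2) ** 2
                   for y, x, r2 in nodes):
                nodes.append((cy, cx, r))
                break

    # Connect random node pairs with straight edges.
    if len(nodes) >= 2:
        for _ in range(int(rng.integers(0, 2 * len(nodes)))):
            i, j = rng.choice(len(nodes), size=2, replace=False)
            rr, cc = line(nodes[i][0], nodes[i][1], nodes[j][0], nodes[j][1])
            img[rr, cc] = rng.integers(0, 256, size=3)
            label[rr, cc] = EDGE

    # Paint nodes last so the disks cover the edge endpoints.
    for cy, cx, r in nodes:
        rr, cc = disk((cy, cx), r, shape=(H, W))
        img[rr, cc] = rng.integers(0, 256, size=3)
        label[rr, cc] = NODE

    return img, label
```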
3.2 Graph Classification
Our pipeline can also solve the directed graph data extraction problem. To obtain the extraction results more accurately, we first classify the input images. We use a convolutional neural network (CNN) to build this module. CNNs are among the methods most commonly used for image classification and have been proven to perform well [40] on the famous ImageNet dataset [41]. We tried the VGG19 model, which has been shown to have good graphical perception ability [9], but we found that it did not perform well on the graph classification task. We also tried the shallower network LeNet-1, and it appeared to underfit. We then compared InceptionV3, ResNet-101 [42], and Xception; the classification results of these networks are shown in Table 1. Finally, we chose InceptionV3 [43] among the many CNN models.

The complete neural network of the system is shown in Figure 4. The classification network and the semantic segmentation network can be trained together or separately. We additionally generated 1,000 directed graphs and 1,000 undirected graphs for evaluation; the total accuracy was 99.65%. We normalized the input image to 224 × 224 × 3 dimensions (sRGB space). We chose an efficient stochastic gradient descent (SGD) optimizer [44]. The initial learning rate was set to 0.001, and it decreased dynamically every 5 epochs.

TABLE 1
Performance comparison of different models in graph classification tasks

Model        Parameters   Accuracy (Directed)   Accuracy (Undirected)   Accuracy (Total)
VGG19        144 M        50.3%                 50.1%                   50.2%
Xception     22.8 M       99.3%                 99.5%                   98.9%
InceptionV3  23.6 M       99.5%                 99.8%                   99.65%
ResNet-101   44.7 M       88.4%                 99.2%                   93.8%

Fig. 4. VividGraph contains two neural network models: a classification network and a semantic segmentation network.

After graph classification, the output graph has three types of labels (for undirected graphs) or four types of labels (for directed graphs).
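A minimal Keras sketch of this classification module under the training configuration stated above. The decay factor of 0.5 and the epoch count are assumptions, since the paper says only that the learning rate "decreased dynamically every 5 epochs".

```python
from tensorflow import keras
from tensorflow.keras.applications import InceptionV3

# InceptionV3 trunk with a two-way softmax head
# (directed vs. undirected), fed 224 x 224 x 3 sRGB images.
base = InceptionV3(include_top=False, pooling="avg",
                   weights=None, input_shape=(224, 224, 3))
outputs = keras.layers.Dense(2, activation="softmax")(base.output)
model = keras.Model(base.input, outputs)

model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])

def halve_every_5_epochs(epoch, lr):
    """Assumed schedule: multiply the rate by 0.5 every 5 epochs."""
    return lr * 0.5 if epoch > 0 and epoch % 5 == 0 else lr

# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=30,
#           callbacks=[keras.callbacks.LearningRateScheduler(halve_every_5_epochs)])
```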


Fig. 5. Illustration of the semantic segmentation and reconnection algorithm. In (b), white pixels indicate pixels detected as background, blue pixels are nodes, and orange pixels are edges. There are noise pixels detected as edges near the nodes, but this does not reduce the effectiveness of our algorithm.

3.3 Semantic Segmentation Network
Giovannangeli et al. [25] noted that it is difficult to recognize edges with traditional image processing methods and evaluated the possibility of using CNNs to perceive networks. However, their evaluation indicators were only the number of edges, the number of nodes, and the maximum degree of the graph; therefore, they could not extract the topological relations in the images. Many past inverse visualization works were based on simple visual encodings whose attributes are relatively simple: the attribute of a bar chart is the length of its bars, the attribute of a scatter plot is the coordinates of its points, and the attribute of a point cloud is the number of its points. As a high-level visual encoding, the attributes of a network are not only the size, location, and number of its nodes but also the relations between these nodes. If we used the adjacency matrix of the graph as the training annotation, the data extraction task would become a deep learning regression task; this is unrealistic, as the regression data dimension is too large. Therefore, we split the data extraction task into two parts. The first part uses a semantic segmentation network to locate the nodes and edges, and the second part reconnects the nodes with our algorithm.

In a traditional convolutional neural network, an input image often has only one output label. In a semantic segmentation network, by contrast, each pixel has an output label. In this task, we use the elements of the graph as pixel-level labels (background, node, edge). We choose U-Net [45], a semantic segmentation convolutional network originally applied to medical images, as our segmentation model. We normalize the image to a dimension of 320 × 320 × 3 (sRGB space). We choose VGG16 as the U-Net backbone network. We use the popular deep learning framework Keras [46] to quickly implement our model.
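The paper specifies a U-Net with a VGG16 backbone implemented in Keras but not the exact decoder, so the wiring below is one plausible sketch: the skip connections are taken from the standard VGG16 block outputs, and the decoder filter counts are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

NUM_CLASSES = 3  # background, node, edge (4 with the arrow class)

def build_unet(input_shape=(320, 320, 3)):
    vgg = VGG16(include_top=False, weights=None, input_shape=input_shape)
    # Encoder feature maps reused as U-Net skip connections.
    skips = [vgg.get_layer(name).output for name in
             ("block1_conv2", "block2_conv2", "block3_conv3", "block4_conv3")]
    x = vgg.get_layer("block5_conv3").output  # 20 x 20 bottleneck

    for skip, filters in zip(reversed(skips), (512, 256, 128, 64)):
        x = layers.UpSampling2D()(x)
        x = layers.Concatenate()([x, skip])
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    # Per-pixel softmax over the label categories, at full resolution.
    out = layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(x)
    return keras.Model(vgg.input, out)
```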
3.4 Reconnection Algorithm
Once we obtain the pixel-level labels output by the semantic segmentation network, we apply an algorithm that reconnects the nodes and calculates the topological relations of the graph. First, we extract the connected components labeled as nodes in the image. We perform an erosion operation on these connected components to remove noise, where the kernel size is determined by the area of each connected component. Erosion and dilation are the two fundamental morphological operations of image processing [47]. Then, we use a kernel of the same size to perform a dilation operation on the connected components to maintain the node radius. These connected components are the nodes of the graph. The notation is as follows:
• W, H: image width and height
• C_{x,y}: input image color at (x, y), storing 3D data (R, G, B)
• Label_{x,y} ∈ {0: back, 1: node, 2: edge}: semantic segmentation result of the pixel located at (x, y)
• O_i, R_i: center coordinate and radius of Node i
• C^j_{i1,i2}: color of Edge j connecting Node i1 and Node i2
• CC: connected component
• Area_i: area of the rectangle surrounding CC i
• k: erosion/dilation kernel size
• γ: threshold on the number of edge pixels between two nodes

Algorithm 1 Node Reconnection Algorithm
Input: {C_{x,y} | x ∈ [0, W], y ∈ [0, H]}, {Label_{x,y} | x ∈ [0, W], y ∈ [0, H]}
Output: O_i, R_i, C^j_{i1,i2}
  Extract the CCs with Label_{x,y} = 1
  for all CC do
    k = (1/3) · sqrt(Area_i)
    Use a (k, k) kernel to perform a morphological opening on the CC
  end for
  Extract the CCs again
  for all CC do
    O_i = the coordinates of the center pixel of the CC
    R_i = (1/2) · sqrt(Area_i)
  end for
  for each pair Node i1, Node i2 with i1 ≠ i2 do
    Draw a line connecting Node i1 and Node i2
    Check A = {(x, y) | Label_{x,y} = 2, (x, y) ∈ line}
    Perform a dilation operation
    Set γ ∝ length_line
    if |A| > γ then
      Node i1 and Node i2 are connected
      C^j_{i1,i2} = mean of C_{x,y} over (x, y) ∈ A
    end if
  end for
  return O_i, R_i, C^j_{i1,i2}

We draw a line between every two nodes and check the number of pixels detected as edges on this line. If the number of such pixels exceeds a threshold, the two nodes are considered to be connected. The size of the threshold is proportional to the length of the line.
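A compact Python rendering of Algorithm 1 on top of scikit-image, assuming the label conventions above. The proportionality constant gamma_ratio is an assumption (the paper states only that γ is proportional to the line length), and edge-color extraction is omitted for brevity.

```python
import numpy as np
from skimage import measure, morphology
from skimage.draw import line

BACK, NODE, EDGE = 0, 1, 2

def reconnect(label, gamma_ratio=0.3):
    """Recover node positions/radii and the edge list from a label map."""
    # Morphological opening with an area-dependent kernel removes
    # noise while roughly preserving each node's radius.
    node_mask = label == NODE
    cleaned = np.zeros_like(node_mask)
    for region in measure.regionprops(measure.label(node_mask)):
        r0, c0, r1, c1 = region.bbox
        k = max(1, int(np.sqrt((r1 - r0) * (c1 - c0)) / 3))
        piece = np.zeros_like(node_mask)
        piece[tuple(region.coords.T)] = True
        cleaned |= morphology.opening(piece, morphology.square(k))

    nodes = []  # (center_y, center_x, radius)
    for region in measure.regionprops(measure.label(cleaned)):
        r0, c0, r1, c1 = region.bbox
        cy, cx = (int(v) for v in region.centroid)
        nodes.append((cy, cx, np.sqrt((r1 - r0) * (c1 - c0)) / 2))

    # Two nodes are connected when enough edge pixels lie on the
    # straight line between their centers (gamma grows with length).
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            rr, cc = line(nodes[i][0], nodes[i][1], nodes[j][0], nodes[j][1])
            if np.count_nonzero(label[rr, cc] == EDGE) > gamma_ratio * len(rr):
                edges.append((i, j))
    return nodes, edges
```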


The result of the semantic segmentation and the process of the reconnection algorithm are shown in Figure 5.

Thanks to the deep learning method, our approach can detect edges of various widths. After reconnecting all nodes, we calculate the width of each edge: we compute the average distance from the pixels detected as the edge category near an edge to that edge, and the width of the edge is twice this distance. We also generated a set of data with edges of different widths (thicknesses) to evaluate our edge width extraction. This dataset was generated with the D3 library and contains 100 images with a resolution of 480 × 480; the correct rate for edge width is 70.76%.

The graph colors are also important. The edge color C^j_{i1,i2} is computed in Algorithm 1. We take the color of the center pixel of each node as the node color, and the average color of the pixels recognized as the background category as the background color C_back.

Many real graphs do not have a good colormap, which prevents users from clearly observing the results. We therefore add a contrast comparison between the foreground and background colors, together with a background color recommendation, to the color extraction so that the output complies with the W3C standards [48]. Many real-world graphs have poor designs that leave readers unable to see the nodes or edges clearly. We calculate the average node color and the average edge color, and we then average these two values into the foreground color C_fore, giving them equal importance: the number of edge pixels is generally smaller than the number of node pixels, but for color contrast the edge color matters as much as the node color. We use C_back as the background color. First the brightness of each color is calculated, and then the contrast. If the contrast is lower than 7:1, the chart does not comply with the W3C standard, and we replace the background with the highest-contrast color among the safe colors preset by the W3C. In the sRGB space, the definitions of color brightness and contrast are shown in Equation 1, where C_i is the color of node i, C^j_{i1,i2} is the color of the edge connecting node i1 and node i2, there are n nodes and m edges, C_w is the contrast ratio of the foreground and background, l is the luminance of a color, l_1 is the brighter of the foreground and background colors, and l_2 is the other:

  C_fore = (1/2n) \sum_{i=1}^{n} C_i + (1/2m) \sum_{j=1}^{m} C^j_{i1,i2}
  l = 0.2126R + 0.7152G + 0.0722B                                         (1)
  C_w = (l_1 + 0.05) / (l_2 + 0.05),  l_1 > l_2
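A small Python sketch of this check implementing Equation 1 as written (note that the official W3C formula additionally linearizes the sRGB channels before applying the luminance weights; the helper names here are ours):

```python
import numpy as np

def luminance(rgb):
    """Color brightness l from Equation 1, for 0-255 sRGB values."""
    r, g, b = np.asarray(rgb, dtype=float) / 255.0
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color_a, color_b):
    """Contrast C_w: brighter luminance over darker, both offset by 0.05."""
    l1, l2 = sorted((luminance(color_a), luminance(color_b)), reverse=True)
    return (l1 + 0.05) / (l2 + 0.05)

def foreground_color(node_colors, edge_colors):
    """C_fore: node and edge color averages weighted equally."""
    return 0.5 * np.mean(node_colors, axis=0) + 0.5 * np.mean(edge_colors, axis=0)

# The chart fails the paper's check when the ratio drops below 7:1;
# the background is then swapped for the highest-contrast W3C-safe color.
```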
3.5 Directed Graph
For directed graphs, the label categories increase from three to four (background, node, edge, and arrow). The arrow category is used to determine the direction of each edge. We take the midpoint between the two endpoint nodes and draw a segment from it to each of the two nodes. Then, we count the pixels detected as the arrow category on the two segments; the segment with more arrow pixels is the one touching the target node. This method is suitable for directed graphs in which most arrows are near the target nodes.

3.6 Large Network Graph
VividGraph is also suitable for large graphs with high resolution. When the resolution of a picture is high (more than 4096 × 4096), we can crop the picture and extract data from each part separately. We put each part of the extracted data into a large matrix and then use Algorithm 1 to reconnect the nodes. To improve efficiency, we skip blank image fragments that contain no node-type or edge-type pixels: if the line connecting two nodes passes through such blank fragments, the two nodes are definitely not connected, and there is no need to perform the connection determination. This method effectively removes the errors caused by convolution when the semantic segmentation network, whose input is only 320 × 320, is applied to large graphs with dense nodes. The steps of this method are shown in Figure 6.

Fig. 6. Illustration of large graph data extraction. In (b), we crop the input image into several equal pieces. In (c), we extract data from each piece separately. In (d), we reconnect the nodes in a large matrix.

We generated 100 large graph images with the D3 library; the number of nodes in each graph is around 100, and the resolution is 8000 × 8000. The average time consumption without this algorithm is 6.05 seconds; with it, 2.37 seconds. Time efficiency increased by 60.83%.
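A sketch of the cropping step under the stated assumptions; segment_fn stands for per-tile network inference and is assumed to pad tiles smaller than the 320 × 320 network input.

```python
import numpy as np

TILE = 320  # the segmentation network's input resolution

def segment_large(image, segment_fn):
    """Segment a high-resolution graph image tile by tile and stitch
    the per-tile label maps into one large matrix, on which the node
    reconnection algorithm is then run. Blank tiles (no node or edge
    pixels) can be skipped during connectivity checks: a line that
    crosses only blank tiles cannot link two nodes."""
    h, w = image.shape[:2]
    labels = np.zeros((h, w), dtype=np.uint8)
    for y in range(0, h, TILE):
        for x in range(0, w, TILE):
            tile = image[y:y + TILE, x:x + TILE]
            labels[y:y + TILE, x:x + TILE] = segment_fn(tile)
    return labels
```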
4 APPLICATION

VividGraph can be applied to different kinds of scenarios, such as converting sketches into electronic charts, obtaining original data, and visualization retargeting. The specific examples generated with D3, E-charts, and AntV were produced on a PC with an Intel Core i5-10400F CPU and 32 GB of memory. We used an NVIDIA GTX 1080 Ti with 11 GB of memory to train our model.

4.1 Quick Realization of Sketches
Network graphs are widely used in daily work. Enterprise staff need graphs to clearly present the complex relations of different areas. Teachers of computer science need graphs to vividly explain complicated graph algorithms. Designers usually add graphs to their works to enhance their aesthetics. However, most of these people are not familiar with visualization tools such as E-charts, D3, and AntV, so it is difficult for them to generate electronic graphs. To solve this problem, VividGraph provides a function that converts graph sketches into electronic charts accurately and quickly. Users only need to draw a graph on a piece of paper with colored pens, take a picture of the sketch, and upload the picture to VividGraph. After these steps, VividGraph extracts the sketch accurately and shows the user a graph on the web that can be saved as a clear image.

A company's network administrator, who is responsible for the design, management, and maintenance of computers and networks, often needs to draw a sketch of the company's network connections. He or she can simply draw the network structure connecting the computers on paper and then use VividGraph to turn it into an interactive graph for modification. As shown in Figure 7(a), the designer uses blue nodes to represent the internet, pink nodes for routers, orange nodes for firewalls, red nodes for switches, yellow nodes for servers, and purple nodes for personal computers. Our method is robust enough to deal with various types of hand-drawn graphs, even when noise is accidentally introduced during photography. For example, it is common for a designer to introduce shadows when taking pictures, as shown in Figure 7(c). Shadow noise presents extra challenges for semantic segmentation, but our segmentation model and Algorithm 1 can still recover the correct relations. However, since the node color in the original image is a shaded color, our algorithm also extracts the shaded color instead of the real color of the node. Our evaluation and user study both include this type of sketch.

Fig. 7. The reconstruction results of computer network sketches: (a) sketch, (b) reconstruction result, (c) sketch with shadow, (d) reconstruction result. The reconstruction results have some bias in color, but the topological relations and positions are robust.

4.2 Extraction of Underlying Data
In scientific research, the reuse of previous research results facilitates reproducibility and open science. Graphs published in books or papers often encode complex relations, such as social networks and knowledge maps. To make full use of the data embedded in these existing graphs, VividGraph can identify the topological relations of a graph and generate the corresponding JSON file. Additionally, it shows a new graph on the web and allows users to operate on it. To obtain the underlying data of a bitmap, users take a picture or a screenshot of a graph in a paper and upload the image to VividGraph. They can then either download the generated JSON file directly to obtain the underlying data or obtain a vector version of the graph rendered by E-charts, D3, or AntV. Based on the vector, users can perform operations that change the network, such as adding new nodes and links, deleting useless nodes, and changing their positions.

The Neural Network Zoo [49] uses graphs to clearly and vividly summarize the neural networks of recent years. We noticed that many readers requested graph SVGs on the author's blog to help with their research, but the author only has bitmap data. We input one of the network pictures from this work into VividGraph; the output is shown in Figure 8(b). The output image is almost the same as the input image (SSIM [50] = 0.8414), which shows that our extraction is accurate.

Fig. 8. The result of extracting the underlying data of a graph in the paper [49]: (a) input, (b) output, (c) overlap.
We overlap the input image and the output image to show the differences more clearly. As shown in Figure 8(c), there is a slight offset in the positions of the nodes and edges and a small bias in the edge colors; the other results are completely correct.

Shutterstock is a global picture market open to artists and creators. We selected some design pictures related to graphs and put them into VividGraph, as shown in Figure 9(a). Our system can also restore blurred graphs: Figure 9(b) is the result of data extraction and reconstruction for a chart with a resolution of 160 × 160, an obviously blurred graph from the Google image search system. When a directed graph appears, VividGraph automatically classifies the graph, uses the parameters of the directed graph to perform semantic segmentation, and obtains accurate results, as shown in Figure 9(c).

Fig. 9. The results of extracting the underlying data of graphs from the internet: (a) Shutterstock, (b) blurred, (c) directed. The first row is the input bitmap, and the second row is the output vector reconstructed from the extracted result.

VividGraph can also extract the underlying data of large graphs. Figure 10 shows large graphs from the D3 gallery, which have a large number of nodes and dense edges. In the red box, because the edges are dense, the output has more edges than the input. In the blue box, due to insufficient pixels in the segmentation result, the output has fewer edges than the input. Despite these minor differences, the topological relations of most networks were accurately extracted. We also conducted a user study (Sec. 5), and most of the users indicated that such minor errors are acceptable in large-scale network extraction; users gave the topological relations of the large networks a score of 85.12 (out of 100). The real-world corpus in Figure 14 also includes large networks. In addition, the system provides interactions for adding and deleting nodes and edges, so users can eliminate such errors interactively.

Fig. 10. The reconstruction results of large graphs: (a) input, (b) segmentation result, (c) output. VividGraph can extract complex networks with dense nodes and edges. We show some details in the lower right corner of each picture. We also label some errors in the figure.

4.3 Visualization Redesign
Many existing graphs have poor designs because they were not designed by professional visualization workers. VividGraph provides automatic and interactive chart redesign functions, including recoloring, re-layout, and data modification. Color plays a crucial role in visualization. Some poor color schemes make people feel unpleasant and can even make it impossible for people with color weakness or color blindness to obtain the chart characteristics. We show the recoloring results of low-contrast graphs in Figure 11(b). VividGraph also performs well on graphs whose node, edge, and background colors are similar.

To make it easier for users to access graph information, a graph has many different kinds of representations, such as a tree, force-directed layout, radial layout, circle layout, and grid. The force-directed layout is one of the most basic graph layouts. The layout algorithm exerts a repulsive force between any two
nodes and an attractive force between two linked nodes. All nodes are evenly distributed across the screen to achieve an "aesthetically pleasing" presentation [51].

The adjacency matrix performs better than the node-link layout in weight-variation and connectivity tasks [52, 53]. In particular, compared to the network representation, the adjacency matrix shows the relations between nodes more clearly, especially when the number of links in the graph is large [54], i.e., when the graph is dense. For example, consider a technical department consisting of employees in different positions, such as product managers, software development engineers, and test engineers. As shown in Figure 11(a), each node in the graph represents an employee, and each link represents a cooperation project between employees. The department manager needs to count the projects every employee has participated in to assign new project tasks. In the node-link graph, the manager has to count the links one by one, while in the adjacency matrix he or she only needs to count the nonempty cells row by row to obtain the degree of each node, as the small example below illustrates.

Fig. 11. Redesign of graphs: (a) re-layout, (b) recolor, (c) modification. In (a), we convert a graph into an adjacency matrix. In (b), we recolor a graph with a poor color design. In (c), we modify the connection relations of a neural network graph.
To adapt to changeable application scenarios, VividGraph also provides node and link operations. Users can add a new node with a specified radius and color and place it anywhere on the canvas, or delete any existing node. It likewise allows users to add links of any color between nodes or delete any existing link. Through these operations, a designer or researcher can modify the bitmap of an existing graph instead of drawing the graph from scratch, which improves efficiency. In Figure 11(c), we show the modification of an existing neural network graph into another neural network.

5 USER STUDY

To improve the user experience of VividGraph, we conducted several user studies. We obtained feedback through user interviews and questionnaires after participants used the system in actual scenarios. We interviewed three types of people: professional designers, people who are not proficient with computers, and scientific researchers.

Procedure: Our investigation is divided into five steps: (1) user screening and informed consent, (2) system introduction, (3) questionnaire survey, (4) actual use of the system, and (5) interview. We first obtained informed consent from the participants and classified them by occupation. After introducing the VividGraph functions in detail, we conducted a questionnaire survey to evaluate the effectiveness of our method through user scores. After that, users tried our system to complete some work tasks related to their careers, and we obtained user feedback through interviews.

Recruitment: We recruited 60 participants, including professional designers (more than 2 years in the industry), scientific researchers in the computer field, and ordinary workers who did not major in computer science. We excluded the data of 3 participants because they did not understand the system. We analyzed the data of the remaining 57 participants (µ_age = 28.7 years, 33 computer professionals, 24 noncomputer professionals) and obtained feedback through interviews.

Questionnaire: We designed 20 questions to evaluate the effectiveness of our method. Fourteen questions asked participants to score extraction results; the results were extracted from hand-drawn graphs, images from Shutterstock, images from the D3 library, images from academic papers, and large graphs. Participants scored four aspects with a rating slider: color, node location, node radius, and topological relation. The slider was initially set to 0, and the user could move it up to a maximum of 100, representing the degree of satisfaction with the reconstruction result in that aspect. The scoring results are shown in Figure 12. Our average score was 89.09, indicating that participants were generally satisfied with our extraction performance. Some graphs in bitmap format saved online are blurred by compression or transmission; we presented two sets of blurred graphs extracted and reconstructed by the system, and 80.7% of participants thought the output results were clearer. The next 4 questions showed the results of re-laying out or recoloring poorly designed graphs and asked participants whether the redesigns helped them grasp the characteristics of the graphs more rapidly: 92.99% of the participants thought it was easier to obtain the characteristics of the recolored graphs, and 59.65% felt that the re-layout to an adjacency matrix made the features visually faster to obtain; when the graph was close to fully connected, this ratio increased to 73.68%. The results also indicated that 91.23% of participants thought the images given in this questionnaire were common graphs in the real world, and 96.49% thought VividGraph could help them with their daily study or work.
The questionnaire data are included in the appendix for further study.

Fig. 12. The questionnaire scoring results (maximum score 100), over the aspects color, edge thickness, position, node radius, and topological relation. In (a), our topological relation and location extraction results are the most satisfactory; the extraction of node size, edge thickness, and color is basically accurate but still needs improvement. In (b), the graphs from the D3 library show the most satisfactory performance, while the sketches have room for improvement due to some color bias.

Interview: After the survey, we invited participants to use our system to solve actual problems in their work, and we collected their feedback through interviews to improve the system. We divide the feedback into three categories according to the users' occupations. Ordinary people who are not computer majors find it helpful to convert hand-drawn graphs into images. In the interviews, a middle school teacher of natural sciences and a primary school teacher of mathematics both stated that graphs often appear in elementary education; when they need to make courseware and examination papers, their electronic drawing skills are so weak that they spend considerable time drawing graphs. They proposed enhancing the accuracy and robustness of hand-drawn graph extraction.

Professional designers feel that converting hand-drawn graphs into images is not practical for them: they use drawing software proficiently, and hand drawing takes longer than drawing directly on the computer. However, they find it helpful to convert existing graphs into vectors. They need to use materials found on the internet in their design work, and the quality of these materials is uneven; many are bitmaps. They proposed the need for more professional automatic color schemes and more customized data modification options, such as node size, node color, edge color, and edge width. After adding some customization options to the system, we invited these professional designers to use the system again. They also proposed adding functions to improve the human-computer interaction experience, such as color-picking pens, auxiliary design lines, and canvas changes. Embedding our system into drawing software on iPads or into professional drawing software could also be considered; we will continue to explore this direction in the future.

The last group is scientific researchers. They are as proficient with computer software as the designers, but their demand lies more in quoting and modifying graphs in papers. Most of the existing graphs in papers are bitmaps or sketches. In these interviews, more than half of the studies were related to deep learning; when modifying an existing neural network structure (e.g., changing Inception V1 to Inception V2), researchers can save drawing time by using VividGraph. In addition, many researchers are not specialized in visualization, and our system can also help them generate graphs with a reasonable layout and color matching, making the charts pleasant and their features easy to grasp. This group gave the most positive feedback among the three types of users. They also hoped that some auxiliary interactive functions could be added in the future.

6 PERCEPTUAL EVALUATION

We propose two methods to prove the effectiveness of our approach. The first is NetSimile [55], which has been proven to be a scalable and effective method for measuring the structural similarity between two networks. NetSimile's features integrate the degrees of nodes, the clustering coefficient, two-hop neighborhoods, ego networks, etc. It generates a 35-dimensional signature vector from the average, median, and standard deviation of these features, and it defines the similarity of two networks as the Canberra distance between their signature vectors. Given one graph signature vector x and another graph signature vector y with components x_i and y_i, i = 1, 2, 3, ..., 35, the Canberra distance between x and y is defined as:

  d(x, y) = \sum_{i=1}^{35} |x_i - y_i| / (|x_i| + |y_i|)    (2)

The lower the Canberra distance, the more similar the two networks. This method does not require a one-to-one correspondence between the nodes of the two networks; therefore, even when nodes are missing in our evaluation experiments, we can still output evaluation indicators. NetSimile is an indicator that varies with the number of nodes. To show the range of NetSimile, we traverse the NetSimile between each graph and the graph structures with the same number of nodes and choose the maximum value found as the maximum of NetSimile; when NetSimile reaches this value, the two graphs are significantly different. In our datasets, this value is approximately 24, and we choose it as the maximum value of the Y-axis.
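Computing Equation 2 over two 35-dimensional signature vectors is straightforward; the sketch below treats 0/0 terms as zero, and scipy.spatial.distance.canberra implements the same formula.

```python
import numpy as np

def canberra(x, y):
    """Canberra distance between two NetSimile signature vectors (Eq. 2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.abs(x) + np.abs(y)
    safe = np.where(denom == 0, 1.0, denom)  # 0/0 terms contribute 0
    return float(np.sum(np.abs(x - y) / safe))
```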
The second method is SSIM [50], which is widely used to evaluate the similarity of two images. We use the extracted data (topological relations; node coordinates, colors, and sizes; background color; edge colors) to reconstruct the image of the graph, and we measure the similarity of the two graphs by calculating the SSIM of the reconstructed image and the original image. Given the images x and y of two graphs, where µ_x is the average of x, σ_x^2 is the variance of x, σ_xy is the covariance of x and y, and c_1, c_2 are constants used to maintain stability, SSIM is defined as:

  SSIM(x, y) = ((2µ_x µ_y + c_1)(2σ_xy + c_2)) / ((µ_x^2 + µ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2))    (3)

The higher 1-SSIM is, the more different the two graphs; when 1-SSIM is 0, the two images are exactly the same. In addition, to ensure the accuracy of the similarity evaluation, we turned off the foreground-background contrast calculation and the recoloring module for poorly designed charts during the evaluation experiments.
pus from D3 library. The graphs of Simple I and Simple
II corpora are undirected graphs with random topologi-
6.1 Pattern Corpus cal relations, random colors, and force-directed layouts.
VividGraph is robust to different network structures. We choose three resolutions of 480 × 480, 640 × 640, and
Some studies [56] related to graphs summarize four 800 × 800 and three types of node numbers around 10,
patterns of networks. There are chain type, loop type, 25, and 50. The results of the evaluation experiment are
egocentric type, and clique type. We select these four shown in Figure 15(a) and (b).
patterns that often appear in the network for evaluation The graphs of a complex corpus do not follow any
experiments. layout, so there are many layouts that have more node-
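The sketch uses standard networkx generators. It is an illustration of the abstract patterns only; the evaluation images themselves are rendered chart bitmaps collected for the corpus, not these generated graphs.

```python
# The four network patterns used in the pattern corpus, instantiated
# with networkx generators. Illustration only.
import networkx as nx

patterns = {
    "chain":      nx.path_graph(8),      # nodes connected in a line
    "loop":       nx.cycle_graph(8),     # a single closed ring
    "egocentric": nx.star_graph(7),      # one hub linked to all others
    "clique":     nx.complete_graph(8),  # every pair of nodes connected
}

for name, g in patterns.items():
    print(f"{name:>10}: {g.number_of_nodes()} nodes, {g.number_of_edges()} edges")
```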
For each pattern of the network, we select ten images with a resolution of 960 × 960 for evaluation. Some experimental results are shown in Figure 13; the blue images are reconstructed from the extraction results. The network structures of all four patterns were extracted correctly. The average SSIM was 0.97, and the average NetSimile was 1.43. The high SSIM and low NetSimile show that VividGraph can accurately extract networks of various patterns. This is because our training set covers nodes of various sizes, edges of different lengths, and various distribution relations.

Fig. 13. We evaluate four patterns of graphs: (a) chain (SSIM = 0.98, NetSimile = 3.13), (b) loop (SSIM = 0.98, NetSimile = 2.53), (c) egocentric (SSIM = 0.97, NetSimile = 0.06), and (d) clique (SSIM = 0.93, NetSimile = 0). The results of the correct recognition are filled with blue. In the gray box, we show the missing edge.

6.2 Real-World Corpus

We collect 100 pictures from Google, Shutterstock, the E-charts Gallery, the D3 Gallery, and user hand-drawn sketches as our evaluation dataset. Examples of pictures from each source are given in Figure 14. The average SSIM obtained by extracting the underlying data of each bitmap with our method and reconstructing the image was 0.95, and the average NetSimile was 7.67. The error mainly comes from complex graphs and unclear pictures in the real world. However, VividGraph can still extract most of the network, especially when hand-drawn graphs have background noise, which is not possible for traditional image processing methods.

In addition, we find that the images extracted and reconstructed by our method are clearer than the originals and make chart features easier to obtain. To verify this point of view, we use the attention-based evaluation model of [57], an automatic model that has been proven to predict the importance of different elements in data visualization designs. This method gives an attention score between 0 and 255; the higher the score, the more attractive the data visualization elements in the image are. The average score of the input images was 96.44, and the average score of the output images was 97.99, an increase of 1.55. The small change in score confirms the accuracy of the extraction, while the positive change shows that reconstructing real-world bitmaps with our method can help users better obtain chart features.

6.3 Additional Corpus

To further verify that our method is robust to graphs with different resolutions and numbers of nodes, we build four additional corpora: the Simple I corpus from the D3 library, the Simple II corpus from E-charts, the complex corpus from Skimage, and the directed graph corpus from the D3 library. The graphs of the Simple I and Simple II corpora are undirected graphs with random topological relations, random colors, and force-directed layouts. We choose three resolutions (480 × 480, 640 × 640, and 800 × 800) and three node counts (around 10, 25, and 50). The results of the evaluation experiment are shown in Figure 15(a) and (b). A sketch of this style of corpus generation is given below.

The graphs of the complex corpus do not follow any layout, so many layouts have more node-edge overlap, edge-edge overlap, or nodes that are close but still non-overlapping. Moreover, their colors are all random, their edges are thin, and there are many images in which human eyes cannot distinguish between the background and the edges. The results shown in Figure 15(c) demonstrate that although the indicators decline, our method is still effective.
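The following sketch shows how a Simple-style corpus can be produced: random topology, random node colors, a force-directed layout, and fixed output resolutions. The generator choices, parameters, and file names here are our own assumptions for illustration, not the exact scripts used to build the corpora.

```python
# Sketch of Simple-style corpus generation: random undirected graphs,
# random node colors, force-directed layout, fixed pixel resolutions.
# Parameters and file names are illustrative assumptions.
import random
import matplotlib.pyplot as plt
import networkx as nx

def render_graph(n_nodes, pixels, path, seed):
    g = nx.gnm_random_graph(n_nodes, 2 * n_nodes, seed=seed)  # random topology
    pos = nx.spring_layout(g, seed=seed)                      # force-directed layout
    colors = [[random.random() for _ in range(3)] for _ in g.nodes]
    fig = plt.figure(figsize=(pixels / 100, pixels / 100), dpi=100)
    nx.draw(g, pos, node_color=colors, edge_color="black", node_size=300)
    fig.savefig(path, dpi=100)  # pixels x pixels output image
    plt.close(fig)

for resolution in (480, 640, 800):
    for n in (10, 25, 50):
        render_graph(n, resolution, f"simple_{resolution}_{n}.png", seed=n)
```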


Fig. 14. Our perceptual evaluation dataset examples: hand-drawn, academic paper, E-Charts Gallery, D3 Gallery, and Shutterstock images make up the real-world corpus (100); Simple I, Simple II, directed graph, and complex images make up the additional corpus (3,000).

Fig. 15. Evaluation experiment results, reporting 1-SSIM and NetSimile against the number of nodes V at three image resolutions: (a) Simple I dataset, (b) Simple II dataset, (c) complex dataset, (d) directed graph dataset. In (a), the results demonstrate that VividGraph can deal with graphs of various image resolutions and node scales. In (b), the results are similar to those of Simple I. In (c), the results demonstrate that although 1-SSIM rises, VividGraph is still effective in extreme cases. In (d), the results demonstrate that VividGraph is also suitable for data extraction from directed graphs.

Fig. 16. Time performance comparisons of the evaluation experiments, reporting time in seconds against the number of nodes V at three image resolutions: (a) Simple I dataset, (b) Simple II dataset, (c) complex dataset, (d) directed graph dataset.

We test three sets of directed graphs with a resolution of 640 × 640 and node counts around 10, 20, and 30. The results are shown in Figure 15(d). Since our pipeline has an efficient classification network and model parameters trained separately for directed graphs, our method is also effective for nondense, clear directed graphs.

6.4 Time Performance

We also evaluate the time performance of VividGraph on these datasets in Figure 16. In the figure, the X-axis represents the number of nodes, the Y-axis represents the time used in seconds, and the lines of different colors represent images of different resolutions.

The resolution of the image has little effect on the time efficiency of VividGraph, but the time used increases significantly as the number of nodes increases. This is because the time spent by VividGraph mainly comes from Algorithm 1, whose time complexity is O(n²), where n is the number of nodes; a minimal sketch of this quadratic pair loop is shown below. Therefore, when processing large-scale graphs, the calculation time of VividGraph becomes longer. In general, VividGraph has good time performance for graphs with up to fifty nodes.
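The quadratic term is easy to see from the shape of the reconnection pass: every unordered node pair must be tested once against the segmented edge evidence. The sketch below only illustrates this loop structure; has_edge_evidence is a hypothetical placeholder for the real test on the segmentation mask, not the paper's Algorithm 1.

```python
# Why reconnection is O(n^2): n*(n-1)/2 node-pair tests.
# `has_edge_evidence` is a hypothetical placeholder for the real check
# against the segmented edge mask; this is not the paper's Algorithm 1.
from itertools import combinations

def reconnect(nodes, edge_mask, has_edge_evidence):
    links = []
    for a, b in combinations(nodes, 2):  # quadratic growth seen in Figure 16
        if has_edge_evidence(edge_mask, a, b):
            links.append((a, b))
    return links
```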
7 LIMITATION AND DISCUSSION

The current version of VividGraph also has some limitations in extracting graph data. First, our method is data-driven, so the model is bounded by our training set. We have tried to make our training dataset cover various graph styles, including the scale of the graph, the graph density, the graph layout, the arrow size, the image resolution, etc. However, we cannot cover the space of all visualizations. When the number of nodes in the picture exceeds 50, the accuracy of the model decreases, and we cannot deal with overlapping nodes: when nodes overlap, more errors occur. As shown in Figure 15, the error on the complex dataset is higher than that on the simple datasets because the complex dataset includes images with overlapping nodes; the error also increases when V = 50. We propose an extraction algorithm for large graphs to handle high-resolution images with more than 50 nodes, but in the process of cutting the picture, if a node is cut into a non-circular shape, there may be problems with semantic segmentation or errors in the size of the nodes.


As shown in Figure 10(b), the segmentation result of a node in the detail view is smaller than the ground truth.

Second, the reconnection algorithm can be further improved, because the segmentation results are not completely accurate. When pixels of the edge category are identified as the background category, the edge width becomes smaller than in the original image, as shown in Figure 8(c). When pixels of the background category are identified as the edge category, the color of the edges becomes closer to the background, as shown in Figure 8(b). In addition, these conditions can result in an offset of the edge, as shown in Figure 8(c). For sketches with shadows, as shown in Figure 7(c, d), our algorithm cannot infer the original color of the shadowed nodes. We plan to increase the accuracy of the segmentation model or optimize the heuristic rules to improve this in the future.

Third, when three nodes are collinear, if we have no prior knowledge and cannot distinguish them with our eyes, our algorithm will consider any two of these nodes to be connected. As shown in Figure 11(c), collinear nodes are considered connected with each other. The toy sketch below illustrates why this can happen.
8 CONCLUSION

We proposed a method to extract the underlying data of graph images, together with a pipeline called VividGraph that combines a semantic segmentation model and a node connection algorithm. This framework is suitable for undirected graphs, directed graphs, blurred graph images, hand-drawn graphs, large graph images, and other graphs. VividGraph can be used to quickly transform designer sketches, restore blurred graph pictures, obtain the underlying data of bitmaps to generate vector versions, modify graph data, redesign graphs, etc.

In the future, we plan to improve the time efficiency and accuracy of the pipeline for large-scale networks by optimizing the networks and algorithms. We will combine the model in this paper with OCR technology to improve our system. Cooperating with designers to improve the human-computer interaction experience of the system is also under consideration.

ACKNOWLEDGMENTS

We would like to acknowledge the support from NSFC under Grant No. 61802128 and 62072183.

REFERENCES

[1] J. Yang, Y.-G. Jiang, A. G. Hauptmann, and C.-W. Ngo, "Evaluating bag-of-visual-words representations in scene classification," in Proceedings of the International Workshop on Multimedia Information Retrieval, 2007, pp. 197–206.
[2] Y. Liu, X. Lu, Y. Qin, Z. Tang, and J. Xu, "Review of chart recognition in document images," in Visualization and Data Analysis 2013, vol. 8654. International Society for Optics and Photonics, 2013, pp. 384–391.
[3] E. Brynjolfsson and K. McElheran, "The rapid adoption of data-driven decision-making," American Economic Review, vol. 106, no. 5, pp. 133–139, 2016.
[4] P. Zhang, C. Li, and C. Wang, "Viscode: Embedding information in visualization images using encoder-decoder network," IEEE TVCG, 2020.
[5] D. Jung, W. Kim, H. Song, J.-i. Hwang, B. Lee, B. Kim, and J. Seo, "Chartsense: Interactive data extraction from chart images," in Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2017, pp. 6706–6717.
[6] M. Savva, N. Kong, A. Chhajta, L. Fei-Fei, M. Agrawala, and J. Heer, "Revision: Automated classification, analysis and redesign of chart images," in Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology, 2011, pp. 393–402.
[7] N. Siegel, Z. Horvitz, R. Levin, S. Divvala, and A. Farhadi, "Figureseer: Parsing result-figures in research papers," in European Conference on Computer Vision. Springer, 2016, pp. 664–680.
[8] J. Poco, A. Mayhua, and J. Heer, "Extracting and retargeting color mappings from bitmap images of visualizations," IEEE TVCG, vol. 24, no. 1, pp. 637–646, 2017.
[9] D. Haehn, J. Tompkin, and H. Pfister, "Evaluating 'graphical perception' with cnns," IEEE TVCG, vol. 25, no. 1, pp. 641–650, 2018.
[10] L. Yuan, W. Zeng, S. Fu, Z. Zeng, H. Li, C.-W. Fu, and H. Qu, "Deep colormap extraction from visualizations," IEEE TVCG, 2021.
[11] M. Bostock, V. Ogievetsky, and J. Heer, "D3 data-driven documents," IEEE TVCG, vol. 17, no. 12, pp. 2301–2309, 2011.
[12] D. Li, H. Mei, Y. Shen, S. Su, W. Zhang, J. Wang, M. Zu, and W. Chen, "Echarts: a declarative framework for rapid construction of web-based visualization," Visual Informatics, vol. 2, no. 2, pp. 136–146, 2018.
[13] J. Harper and M. Agrawala, "Deconstructing and restyling d3 visualizations," in Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology, 2014, pp. 253–262.
[14] A. Rohatgi, "Webplotdigitizer," 2017.
[15] A. Gross, S. Schirm, and M. Scholz, "Ycasd - a tool for capturing and scaling data from graphical representations," BMC Bioinformatics, vol. 15, no. 1, p. 219, 2014.
[16] J. Poco and J. Heer, "Reverse-engineering visualizations: Recovering visual encodings from chart images," in Computer Graphics Forum, vol. 36, no. 3. Wiley Online Library, 2017, pp. 353–363.
[17] F. Zhou, Y. Zhao, W. Chen, Y. Tan, Y. Xu, Y. Chen, C. Liu, and Y. Zhao, "Reverse-engineering bar charts using neural networks," Journal of Visualization, pp. 491–435, 2021.
[18] A. Flower, J. W. McKenna, and G. Upreti, "Validity and reliability of graphclick and datathief iii for data extraction," Behavior Modification, vol. 40, no. 3, pp. 396–413, 2016.
[19] G. G. Méndez, M. A. Nacenta, and S. Vandenheste, "ivolver: Interactive visual language for visualization extraction and reconstruction," in Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016, pp. 4073–4085.
[20] W. S. Cleveland and R. McGill, "Graphical perception: Theory, experimentation, and application to the development of graphical methods," Journal of the American Statistical Association, vol. 79, no. 387, pp. 531–554, 1984.
[21] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[22] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," pp. 1–14, 2015.
[23] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proceedings of the IEEE CVPR, 2017, pp. 1251–1258.
[24] H. Haleem, Y. Wang, A. Puri, S. Wadhwa, and H. Qu, "Evaluating the readability of force directed graph layouts: A deep learning approach," IEEE Computer Graphics and Applications, 2019.
[25] L. Giovannangeli, R. Bourqui, R. Giot, and D. Auber, "Toward automatic comparison of visualization techniques: Application to graph visualization," Visual Informatics, 2020.
[26] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE CVPR, 2015, pp. 3431–3440.
[27] V. Badrinarayanan, A. Kendall, and R. Cipolla, "Segnet: A deep convolutional encoder-decoder architecture for image segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481–2495, 2017.


[28] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid scene parsing network," in Proceedings of the IEEE CVPR, 2017, pp. 2881–2890.
[29] N. Kong and M. Agrawala, "Graphical overlays: Using layered elements to aid chart reading," IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 12, pp. 2631–2638, 2012.
[30] T. Itoh, C. Muelder, K.-L. Ma, and J. Sese, "A hybrid space-filling and force-directed layout method for visualizing multiple-category graphs," in 2009 IEEE Pacific Visualization Symposium. IEEE, 2009, pp. 121–128.
[31] M. Wattenberg, "Arc diagrams: Visualizing structure in strings," in IEEE Symposium on Information Visualization, 2002. INFOVIS 2002. IEEE, 2002, pp. 110–116.
[32] K.-P. Yee, D. Fisher, R. Dhamija, and M. Hearst, "Animated exploration of graphs with radial layout," in Proc. IEEE InfoVis 2001, 2001, pp. 43–50.
[33] L. Wang, J. Giesen, K. T. McDonnell, P. Zolliker, and K. Mueller, "Color design for illustrative visualization," IEEE TVCG, vol. 14, no. 6, pp. 1739–1754, 2008.
[34] C. Hirsch, J. Hosking, and J. Grundy, "Interactive visualization tools for exploring the semantic graph of large knowledge spaces," in Workshop on Visual Interfaces to the Social and the Semantic Web (VISSW2009), vol. 443, 2009, pp. 11–16.
[35] R. Rossi and N. Ahmed, "The network data repository with interactive graph analytics and visualization," in Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[36] C. Lai, Z. Lin, R. Jiang, Y. Han, C. Liu, and X. Yuan, "Automatic annotation synchronizing with textual description for visualization," in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13.
[37] D. H. Kim, E. Hoque, and M. Agrawala, "Answering questions about charts and generating visual explanations," in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020, pp. 1–13.
[38] Y. Ma, A. K. Tung, W. Wang, X. Gao, Z. Pan, and W. Chen, "Scatternet: A deep subjective similarity model for visual analysis of scatterplots," IEEE TVCG, vol. 26, no. 3, pp. 1562–1576, 2018.
[39] S. Van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, and T. Yu, "scikit-image: image processing in Python," PeerJ, vol. 2, p. e453, 2014.
[40] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[41] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in 2009 IEEE CVPR. IEEE, 2009, pp. 248–255.
[42] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE CVPR, 2016, pp. 770–778.
[43] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE CVPR, 2016, pp. 2818–2826.
[44] L. Bottou, "Stochastic gradient descent tricks," in Neural Networks: Tricks of the Trade. Springer, 2012, pp. 421–436.
[45] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[46] F. Chollet et al., "Keras," https://github.com/keras-team/keras, 2015.
[47] J. Serra, "Image analysis and mathematical morphology," 1982.
[48] B. Caldwell, M. Cooper, L. G. Reid, G. Vanderheiden, W. Chisholm, J. Slatin, and J. White, "Web content accessibility guidelines (wcag) 2.0," WWW Consortium (W3C), 2008.
[49] S. Leijnen and F. v. Veen, "The neural network zoo," in Multidisciplinary Digital Publishing Institute Proceedings, vol. 47, no. 1, 2020, p. 9.
[50] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[51] S. E. Palmer, K. B. Schloss, and J. Sammartino, "Visual aesthetics and human preference," Annual Review of Psychology, vol. 64, pp. 77–107, 2013.
[52] B. Alper, B. Bach, N. Henry Riche, T. Isenberg, and J.-D. Fekete, "Weighted graph comparison techniques for brain connectivity analysis," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2013, pp. 483–492.
[53] M. Okoe, R. Jianu, and S. Kobourov, "Node-link or adjacency matrices: Old question, new insights," IEEE TVCG, vol. 25, no. 10, pp. 2940–2952, 2018.
[54] M. Ghoniem, J.-D. Fekete, and P. Castagliola, "On the readability of graphs using node-link and matrix-based representations: a controlled experiment and statistical analysis," Information Visualization, vol. 4, no. 2, pp. 114–135, 2005.
[55] M. Berlingerio, D. Koutra, T. Eliassi-Rad, and C. Faloutsos, "Netsimile: A scalable approach to size-independent network similarity," arXiv preprint arXiv:1209.2684, 2012.
[56] Y. Wang, G. Baciu, and C. Li, "Smooth animation of structure evolution in time-varying graphs with pattern matching," in SIGGRAPH Asia 2017 Symposium on Visualization, ser. SA '17, New York, NY, USA, 2017.
[57] Z. Bylinskii, N. W. Kim, P. O'Donovan, S. Alsheikh, S. Madan, H. Pfister, F. Durand, B. Russell, and A. Hertzmann, "Learning visual importance for graphic designs and data visualizations," in Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology, 2017, pp. 57–69.

Sicheng Song received his B.Eng. from Hangzhou Dianzi University, China, in 2019. He is working toward the Ph.D. degree at East China Normal University, Shanghai, China. His main research interests include information visualization and visual analysis.

Chenhui Li received his Ph.D. from the Department of Computing at Hong Kong Polytechnic University in 2018. He is an associate professor with the School of Computer Science and Technology at East China Normal University. He received the ICCI*CC Best Paper Award (2015) and the SIGGRAPH Asia Symposium on Visualization Best Paper Award (2017). He has served as a local chair of VINCI 2019. He works on the research of information visualization and computer graphics.

Yujing Sun received her B.Eng. from East China Normal University in 2020. She is working toward the Master's degree at East China Normal University, Shanghai, China. Her main research interests include information visualization and visual analysis.

Changbo Wang is a professor with the School of Computer Science and Technology, East China Normal University. He received his Ph.D. degree at the State Key Lab of CAD&CG of Zhejiang University in 2006. He was a post-doctor at the State University of New York in 2010. His research interests mainly include computer graphics, information visualization, visual analytics, etc. He is serving as a Young AE of Frontiers of Computer Science and as a PC member for several international conferences.

1077-2626 (c) 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more
Authorized licensed use limited to: ASTAR. Downloaded on February 08,2023 at 04:50:04 UTC from IEEE Xplore. Restrictions apply.
information.
