Very Fast Interactive Visualization of Large Sets of High-dimensional Data - 2015 - Procedia Computer Science
Abstract
The embedding of high-dimensional data into 2D/3D space is the most popular way of data visualization. Despite recent advances in the development of very accurate dimensionality reduction algorithms, such as BH-SNE, Q-SNE and LoCH, their relatively high computational complexity remains an obstacle to interactive visualization of truly large datasets consisting of M ∼ 10^6+ high-dimensional (N ∼ 10^3+) feature vectors. We show that a new clone of multidimensional scaling (MDS) – nr-MDS – can be up to two orders of magnitude faster than the modern dimensionality reduction algorithms. We postulate its linear O(M) computational and memory complexities. Simultaneously, our method preserves high separability of data in the 2D/3D target spaces, similar to that obtained by the state-of-the-art dimensionality reduction algorithms. We present the effects of applying nr-MDS to the visualization of data repositories such as 20 Newsgroups (M = 1.8·10^4), MNIST (M = 7·10^4) and REUTERS (M = 2.67·10^5).
1 Introduction
Visual exploration of high-dimensional data is an essential component of big data analytics [10]. Such data can be represented as a set Y = {y_i = (y_{i1}, ..., y_{iN})}_{i=1,...,M} ⊂ ℝ^N of M N-dimensional feature vectors, with N ∼ 10^3+. Data mapping B : Y → X to a visually perceived space X = {x_i = (x_{i1}, ..., x_{in})}_{i=1,...,M} ⊂ ℝ^n, where n = dim X = 2 (or 3), which preserves significant topological properties of Y in X, is a true algorithmic challenge.
A plethora of dimensionality reduction techniques have been developed over the last decade (see e.g. [19, 2, 5]). However, most of them are not capable of retaining both the local and global properties of data in a single map. One of the most popular embedding techniques – multidimensional scaling (MDS) – computes a low-dimensional map of points whose interpoint Euclidean distances d = {r_ij}_{M×M} approximate the input distance matrix Δ = {D_ij}_{M×M}, where D_ij = D(y_i, y_j) ∈ ℝ is a dissimilarity measure. The type of properties of Y (local or global) which will be preserved in X depends on the error function V : ℝ^{n×M} → ℝ
(called "the stress"). The stress V(Δ − d) is a function of the norm of the difference between the distance matrices of the source space Y and the target space X. In the classical MDS algorithms, a configuration of points X = {x_i}_{i=1,...,M} which minimizes the stress:
E(X) = \min_{X} V(\Delta - d) = \min_{X} \sum_{i<j} w_{ij} \left| D_{ij}^{k} - r_{ij}^{k} \right|^{m} \qquad (1)
represents the result of embedding Y in the 2D/3D target space. The values of k, m are the parameters of V(·). For low-dimensional data (N ∼ 10), MDS mapping is able to properly approximate the structure of the original data in 2D/3D spaces. However, the result of visualization of high-dimensional data with N ∼ 10^2+ is highly unreliable.
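To make the roles of k, m and w_ij concrete, the following minimal Python sketch (our illustration, not the authors' code) evaluates the stress (1) for a candidate embedding X given a precomputed dissimilarity matrix delta:

import numpy as np

def stress(X, delta, w=None, k=1, m=2):
    # Stress (1): sum over i < j of w_ij * |D_ij^k - r_ij^k|^m,
    # where r_ij are the Euclidean distances in the target space.
    M = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise target distances r_ij
    if w is None:
        w = np.ones((M, M))                    # unit weights by default
    i, j = np.triu_indices(M, 1)               # index pairs with i < j
    return np.sum(w[i, j] * np.abs(delta[i, j] ** k - r[i, j] ** k) ** m)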
For N ∼ 10^2+, due to the "curse of dimensionality" principle, MDS produces a low-dimensional map with a characteristic spherical shape and a low density in the sphere center (e.g., see Figure 2b). Thus the structure of the source space becomes strongly distorted and cluster separability can be lost. Second, both the memory and computational complexities of the classic MDS algorithm are at least O(M^2), i.e., prohibitively high for visualization of datasets with M ∼ 10^5+. Moreover, the minimization of criterion (1) requires sophisticated and slow optimization methods to find the global minimum of V(·); gradient-based techniques can get stuck in a local minimum.
The first problem can be solved by using recently developed dimensionality reduction algorithms that preserve small pairwise distances via stochastic neighbor embedding, such as t-SNE and its clones [18]. From a methodological point of view, these algorithms replace V(Δ − d) in criterion (1) with a more general function V(Δ, d). In t-SNE and its clones the low-dimensional map is obtained by minimizing V(Δ, d), which is equal to the Kullback-Leibler divergence between two probability distributions computed in the source and target spaces with respect to the locations of the points in the latter. Despite the impressive success of t-SNE, its memory and computational complexities remain the same as for MDS. Larger data sets (M ∼ 10^4+) can be visualized by using approximated versions of t-SNE, such as the BH-SNE [17], Q-SNE [7] and LoCH [4] algorithms. The computational complexities of these techniques are O(M log M) for BH-SNE and Q-SNE, and O(M√M) for LoCH.
Though these algorithms reconstruct very precisely the local properties of the original data in the low-dimensional map – such as the nearest neighbor lists and the distance distribution – they are still too slow to be used in interactive visualization of big data. For example, the fastest of them, the BH-SNE algorithm, requires as many as 12 minutes (serial code running on an Intel® Core™ i5-4258U, 2.6GHz) to visualize the 7·10^4 samples of the MNIST data set [17]. Up to now, no parallel implementations of BH-SNE, Q-SNE and LoCH have been published that would allow estimating how these algorithms perform on GPU or MIC architectures. Moreover, although the memory complexity of BH-SNE was decreased to O(M), the number of distances which has to be kept in operating memory remains too large: it is proportional to the squared average number of points in the Barnes–Hut cells.
Moreover, the gradient descent algorithm used for minimization of the error function is too stiff. It does not allow for interactive steering during visualization, e.g., manually changing the error function type and its parameters to find the best embedding. Meanwhile, interactive data mining makes it possible to match optimal machine learning tools to the data being explored and allows for deeper penetration of their parameter spaces.
We describe a new clone of MDS – nr-MDS – which outperforms the state-of-the-art dimensionality reduction techniques. We postulate its O(M) memory and computational complexities. Thus it is not only orders of magnitude faster than the state-of-the-art dimensionality reduction techniques but, unlike classical MDS, properly reconstructs the separability of high-dimensional
data sets in 2D/3D spaces. In the following sections we present our idea, its methodological background and the results of experiments. Finally, we present the conclusions and discuss directions for future work.
2 Method
X_0 := set_random_configuration()
do (for timestep n = 0, 1, 2, ...; for every particle i = 1...M)
    F_n,i := − Σ_{j∈D_i} w_ij ∇(D_ij^k − r_n,ij^k)^m    // 1. compute forces acting on each particle i
                                                        //    from particles j, where r_n,ij := |x_n,i − x_n,j|
    p_n+1,i := p_n,i + 1/2 F_n,i · Δt                   // 2. calculate momenta p_i for each particle (mass = 1)
    if |p_n+1,i| > v_max then
        p_n+1,i := v_max · p_n+1,i / |p_n+1,i|          // 3. avoid numerical blow-up
    x_n+1,i := x_n,i + p_n+1,i · Δt                     // 4. move particles, create new configuration X_n+1
    F_n+1,i := calculate_forces()                       // 5. calculate new forces F_n+1,i
    p_n+1,i := ϕ · (p_n+1,i + 1/2 F_n+1,i · Δt)         // 6. use "damping" factor ϕ
until |E(X_n+1) − E(X_n)| < δ
Figure 1: The pseudocode of the particle-based MDS. D_i are the sets of random neighbors of each data sample i.
For minimization of V(Δ − d) we use a very fast and robust particle-based MDS method, whose pseudocode is shown in Figure 1. The same approach can be used for minimization of any error function V(Δ, d), such as that employed in t-SNE. In short, the particle-based MDS represents a data sample y_i as a particle x_i in the 2D/3D target space. The particles interact with each other via semi-harmonic forces f_ij(X_n) = −∇[w_ij (D_ij^k − r_n,ij^k)^m]. Every two particles i and j located at x_n,i and x_n,j in timestep n are separated by the Euclidean distance r_n,ij = |x_n,i − x_n,j|. The total force F_n,i acting on a particle i is equal to the sum of the f_ij(X_n) forces from the particles belonging to its set of random neighbors D_i. We simulate the Newtonian dynamics of an (initially) random configuration of particles X_0 in discrete timesteps n. We assume additionally that the particle system X_n evolves in a dissipative environment represented by a damping factor ϕ (see Figure 1). The particle system X_n converges to a stationary state X with a minimal potential
energy E(X) = V(Δ, d). Due to a more extensive penetration of the stress function domain, the particle-based method allows for obtaining a better minimum (i.e., closer to the global one) than other gradient-based techniques [1, 3]. Moreover, particle-based MDS is really fast. By using the "leap-frog" numerical scheme, which is very resistant to fluctuations and very stable, one can use large timesteps Δt without worrying about unphysical effects and the precision of the time evolution of X_n. At the beginning of the simulation the particle shifts can be very large; thus, initially, the method works like a simulated annealing algorithm at high temperature. The numerical "heating" can be controlled by using a large or variable damping factor ϕ (from [0, 1)) and a small value of v_max. At the end, close to the attractor point, the algorithm behaves like the gradient descent method. The particle-based heuristic is fast and self-adaptive. It requires only two parameters to be set (Δt and ϕ). Furthermore, it fits perfectly to interactive visualization purposes. The time evolution of the particle system X can be visualized and controlled interactively by changing not only the parameters but the stress function type as well.
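As a concrete illustration, a compact Python sketch of this scheme is given below. It is our reading of Figure 1, not the authors' code; force_fn and energy_fn are assumed callbacks returning the total forces and the stress for a given configuration:

import numpy as np

def particle_mds(force_fn, energy_fn, M, n_dim=2, dt=0.05, phi=0.95,
                 v_max=1.0, tol=1e-6, max_steps=10000, seed=0):
    # Leap-frog dynamics of Figure 1 (particle mass = 1).
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, (M, n_dim))       # X_0: random initial configuration
    p = np.zeros_like(X)                         # particle momenta
    F = force_fn(X)
    E_prev = energy_fn(X)
    for n in range(max_steps):
        p = p + 0.5 * F * dt                     # 2. half-step momentum update
        speed = np.linalg.norm(p, axis=1, keepdims=True)
        p *= np.minimum(1.0, v_max / np.maximum(speed, 1e-12))  # 3. cap speeds
        X = X + p * dt                           # 4. move particles: configuration X_{n+1}
        F = force_fn(X)                          # 5. recompute forces
        p = phi * (p + 0.5 * F * dt)             # 6. finish the step with damping phi
        E = energy_fn(X)                         # stop when the stress stalls
        if abs(E - E_prev) < tol:
            break
        E_prev = E
    return X

With the stress sketch above, energy_fn could simply be a closure over a precomputed dissimilarity matrix, e.g. lambda X: stress(X, delta).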
However, as shown in Figure 2b, MDS yields very poor embeddings for high-dimensional and multi-class data [17]. As shown in Figure 2a, even PCA can outperform MDS in the visualization of the 10-class, 784-dimensional MNIST data set. This is a consequence of the "curse of dimensionality" principle, i.e., in high-dimensional spaces all the distances between data samples are practically the same. Moreover, high-dimensional data usually form a low-dimensional manifold Σ embedded in the feature space Y, such that dim Σ ≪ dim Y. Because the manifold can be very complicated and folded, the distances between samples in Y can be completely different from those on the manifold Σ. Therefore, in high-dimensional spaces, only the small pairwise distances between neighboring data vectors are reliable. Isomap-based MDS assumes that only the pairwise distances between the nearest neighboring points are known. To approximate all other distances on the manifold Σ, the Floyd-Warshall algorithm can be employed [16].
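For reference, the geodesic approximation step of Isomap mentioned above can be sketched as follows; this is the standard O(M^3) Floyd-Warshall recurrence (our illustration, not part of nr-MDS itself), where G holds the known neighbor distances and np.inf elsewhere:

import numpy as np

def floyd_warshall(G):
    # All-pairs shortest paths over the neighbor graph: approximates
    # geodesic distances on the manifold from the known local distances.
    D = G.copy()
    for via in range(D.shape[0]):
        # relax every pair (i, j) through the intermediate node `via`
        D = np.minimum(D, D[:, via:via + 1] + D[via:via + 1, :])
    return D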
Figure 2: The results of mapping the MNIST data set into 2D space by using a) PCA (M = 7·10^4), b) MDS (M = 2.5·10^3, k = 1, m = 2), c) nr-MDS (M = 2.5·10^3, k = 1, m = 2).
The main idea of nr-MDS is different from that of [16]. We try to "unfold" the manifold and fit it into a hyperplane of minimal dimensionality by modifying the distances Δ in the original multi-dimensional space Y, prior to its embedding in the X space. Simultaneously, to speed up the calculations, we select the smallest possible set of distances D = {D_ij}, n_d = #D, sufficient to obtain a proper data embedding. To this end, we construct the neighbor graph G(V, E), where V is the set of nodes representing the data samples y_i ∈ Y, and E is the set of edges corresponding to the selected distances D. To define E we compute the sets Knn_i and Krn_i of edges connecting each point y_i ∈ Y to its nn nearest neighbors and rn random neighbors, respectively. The nearest neighbor of a particle y_i cannot simultaneously be its random neighbor. Then, the set of edges E = Knn ∪ Krn, where Knn = ∪_{i=1,...,M} Knn_i and Krn = ∪_{i=1,...,M} Krn_i.
It means that a node i ∈ V has at least n_t = nn + rn outgoing edges to its neighbors (both the nearest and random ones) and, possibly, some additional edges connecting it to other data samples j ∈ V which do not belong to E_i = Knn_i ∪ Krn_i, while i belongs to their nearest or random neighbors. Let us assume that all distances between the nodes of G(V, E) connected by the edges from Knn are set to 0. The number of zeroed distances is n_nd = #Knn > nn, while the number of random non-zero distances is r_nd = #Krn > rn. The total number of distances n_d = #E = #D = n_nd + r_nd should satisfy n_d ≥ L_min.
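A possible construction of these edge sets in Python is sketched below (our illustration under simplifying assumptions: brute-force NN search and directed edges; the paper does not specify its NN search method):

import numpy as np

def build_neighbor_graph(Y, nn=2, rn=1, seed=0):
    # Knn: edges to nearest neighbors (their distances will be zeroed);
    # Krn: edges to random neighbors, kept with their original distances D_ij.
    rng = np.random.default_rng(seed)
    M = Y.shape[0]
    knn_edges, krn_edges = [], []
    for i in range(M):
        d = np.linalg.norm(Y - Y[i], axis=1)
        d[i] = np.inf                              # exclude the sample itself
        nearest = np.argpartition(d, nn)[:nn]      # nn nearest neighbors of i
        knn_edges.extend((i, int(j)) for j in nearest)
        pool = np.setdiff1d(np.arange(M), np.append(nearest, i))
        for j in rng.choice(pool, size=rn, replace=False):
            krn_edges.append((i, int(j), d[j]))    # random neighbor, disjoint from NN
    return knn_edges, krn_edges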
Setting the distances of the nearest data samples to 0 causes the manifold to "shrink", while the non-zero random distances produce a "surface tension" which tries to unfold it. The quality of the final result depends mainly on selecting the proper balance between the numbers of nearest (nn) and random (rn) neighbors set for each data sample. This balance depends on the stress function type and on the structure of the visualized data. The influence of the long random distances, which inscribe the global structure of the visualized data, can be additionally controlled by employing different weights w_nn and w_rn for the forces acting between the nearest neighbors and the random neighbors, respectively (see step 1 in the pseudocode from Figure 1). Hence, in nr-MDS, the forces acting on each particle i are computed as follows:

F_{n,i} := -\sum_{j \in Krn_i} w_{rn} \, \nabla \left( D_{ij}^{k} - r_{n,ij}^{k} \right)^{m} - \sum_{j \in Knn_i} w_{nn} \, \nabla \left( r_{n,ij}^{k} \right)^{m} \qquad (2)

In all the tests we have assumed that w_nn and w_rn are constant and w_nn = 1. As shown in Figure 2c and Figure 3a,b, we obtain very good cluster separation, similar to [18, 9], better than in [6] and much better than in [20]. In the following section we present the preliminary results of visualization of much larger data sets.
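Plugged into the particle scheme of Figure 1, the force (2) on particle i could look as follows for k = 1, m = 2 (our reading: harmonic springs of rest length D_ij to random neighbors and of rest length 0 to nearest neighbors; the w_rn value is an assumption):

import numpy as np

def nr_mds_force(i, X, knn_i, krn_i, w_nn=1.0, w_rn=0.3):
    # knn_i: indices of i's nearest neighbors (target distance zeroed);
    # krn_i: list of (j, D_ij) pairs for i's random neighbors.
    F = np.zeros(X.shape[1])
    for j, D_ij in krn_i:
        diff = X[i] - X[j]
        r = np.linalg.norm(diff) + 1e-12
        F += 2.0 * w_rn * (D_ij - r) * diff / r   # spring of rest length D_ij ("surface tension")
    for j in knn_i:
        F -= 2.0 * w_nn * (X[i] - X[j])           # spring of rest length 0 (manifold "shrinkage")
    return F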
3 Experiments
3.1 Data sets
We performed experiments using three very popular data sets: MNIST, REUTERS and 20-NG. We briefly describe each of them below and in Table 1.
REUTERS. This text corpus is known as "Reuters Corpus, Volume 1" or RCV1. We used a subset of this repository (http://www.jmlr.org/papers/v5/lewis04a.html) containing M = 2.67·10^5 documents and 8 topics. We used two types of data: (1) a data set of 2000-dimensional BoW (bag of words) vectors of the most frequently used words, with "stop words" removed and preprocessed by Porter2 stemming; the vectors are normalized to 1 and, finally, preprocessed by tf.idf ranking; (2) a data set of 30-dimensional vectors obtained by PCA projection of a 6000-dimensional structural-semantic vector representation. The data set is highly imbalanced. For M = 2.67·10^5 the topics and their sizes are as follows: GDIS (8081), C12 (11 944), ECAT (89 131), M131 (25 719), C151 (89 683), E212 (27 257), G154 (1342), M143 (21 774).
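A preprocessing pipeline of this kind can be approximated with off-the-shelf tools, e.g. the sketch below (our illustration; the exact stemming and normalization order used by the authors may differ, `documents` is a placeholder for the raw texts, and a Porter2 stemmer such as NLTK's SnowballStemmer could be added via a custom tokenizer):

from sklearn.feature_extraction.text import TfidfVectorizer

# 2000 most frequent terms, English stop words removed, L2-normalized tf.idf
vectorizer = TfidfVectorizer(max_features=2000, stop_words="english", norm="l2")
X_bow = vectorizer.fit_transform(documents)   # sparse M x 2000 matrix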
For all of these data sets we calculate the value of the classification factor cf. It is defined as the ratio of the number of data samples whose closest neighbor belongs to the same class to the total number of samples M. We compute the cf values for the source data set and for its PCA and nr-MDS embeddings in 3D space. Of course, the cf value alone cannot be a measure of the quality of embedding, but it is an important indicator of data separation [11].
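The cf computation itself is straightforward; a brute-force Python sketch (ours) follows:

import numpy as np

def classification_factor(X, labels):
    # Fraction of samples whose nearest neighbor carries the same class label.
    M = X.shape[0]
    hits = 0
    for i in range(M):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        hits += int(labels[np.argmin(d)] == labels[i])
    return hits / M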
3.2 Experimental setup
The code was run on a notebook computer with an Intel® Core™ i5-3317U CPU running at 1.7GHz (i.e., 60% of the CPU clock used in [17]) and 16GB of memory.
3.3 Results
The 2D mappings of MNIST data sets larger than the one visualized in Figure 2c are shown in Figure 3. They were performed for the original data set with dimensionality N = 784 and the stress function (1) with k = 1 and m = 2. As shown in Figure 3a,b, the results of the embeddings for large M improve significantly. The cluster overlaps are much smaller for the full dataset with M = 7·10^4 samples, which is reflected in the increasing cf value (see Table 1): it grows from 0.68 for M = 2.5·10^3 to 0.88 for the full MNIST data set. The cluster separability increases considerably for the stress function with k = 1 and m = 1. However, as shown in Figure 3c, the clusters then shrink too much, and their local structure becomes invisible. As shown in Table 1, PCA preprocessing of the raw data distinctly improves (by 2-3%) the cf values and the data separability.
Figure 3: The results of mapping MNIST into 2D space by using nr-MDS for the original data set (N = 784): a) M = 10^4, k = 1, m = 2, w = 0; b) M = 7·10^4, k = 1, m = 2, w = 0; c) M = 7·10^4, k = 1, m = 1, w = 0.
For the full MNIST data set the number of distances for nn = 2 and rn = 1 is equal to L = 1.8·10^5, i.e., very close to the minimal configuration L_min = 2M = 1.4·10^5. However, comparing L to L_max = 2.4·10^9, the improvement in memory and computational complexity over the classical MDS is overwhelming. Moreover, because all the distances to the nearest neighbors are set to 0, only the distances to the random neighbors have to be stored. The same "minimal" setup was used for MNIST sets with M from 2.5·10^3 to 7·10^4. This suggests that the computational complexity can be O(M). The measured timings obtained for the visualization of MNIST data sets in 2D are as follows: t = 1 sec. for M = 2.5·10^3 (Figure 2c); t = 3.5 sec. for M = 10^4 (Figure 3a); t = 20 sec. for M = 7·10^4 (Figure 3b,c). As reported in [17], BH-SNE needs 750 sec. to obtain a similar separation of clusters. After normalizing for the computational power of the CPUs used for visualization in the two cases, nr-MDS is more than 60 times faster than BH-SNE on the MNIST dataset. Moreover, in this particular case (2D mapping), we need to store only one random distance per data sample, one index of its random neighbor and two indices of its nearest neighbors; so the memory complexity is also very low.
However, MNIST is a data set of rather well-separated classes. Much more difficult are text repositories such as REUTERS and 20NG. The documents and news items are represented as tf.idf vectors based on the most frequent words. Such a data representation is often very noisy and, in some situations, highly unreliable. Moreover, REUTERS is a very unbalanced data set. Nevertheless, as shown in Figure 3 and Table 1, our algorithm was able to obtain 2D/3D maps of well separated clusters. It is also clearly seen that the 30D data sets of PCA projections of 6000D
Figure 4: The results of mapping the REUTERS data set into 2D space by using nr-MDS. a) Data set with M = 5.3·10^4 samples. b) M = 2.67·10^5 samples were used; however, the two largest clusters, C151 (8.9·10^4) and ECAT (8.9·10^4), were removed for better visualization of the smaller topics (M = 9.6·10^4).
Figure 5: The results of mapping the 20NG data set into 2D space by using nr-MDS. a) The clusters of all 20 topics. b) The scatter-plot showing the 14 most visible clusters.
As shown in Figure 5b, as many as 14 clusters are clearly visible. The cluster separability is inferior to that obtained by van der Maaten (see http://lvdmaaten.github.io/tsne/), which uses Simon's FDA generated features. However, it is much better than in other work using the BoW text representation (see e.g. [15]), where only the six best separated clusters are visualized and there is no room to fit 14 more.
4 Conclusions
In this paper we present the effects of adapting the particle-based MDS to interactive visualization of large high-dimensional data sets. First, to decrease the computational complexity of multidimensional scaling, we compute the smallest possible set of distances between the original high-dimensional data Y that is sufficient to obtain a proper data embedding in X. We show that for each data vector in Y only the indices of a small number nn of nearest neighbors and the distances to rn random neighbors have to be computed, such that nn + rn ≥ n. Second, the main idea of nr-MDS consists in the modification of these distances prior to data embedding into 2D/3D space: all distances between a data sample and its nn nearest neighbors are set to zero. Thus, only the distances to the rn random neighbors are stored. All of these improvements make the algorithm very fast – much faster than the top dimensionality reduction techniques – due to its lower, O(M), computational and memory complexities. The efficiency of nr-MDS can be increased considerably by employing modern multi-thread computer architectures such as GPU and MIC [2, 8, 12, 13, 14]. Furthermore, the results of embeddings obtained by nr-MDS are amazingly good, properly reconstructing in 2D/3D the separated classes of high-dimensional data sets. In general, this concept can be applied to various types of data reduction techniques, because it does not depend on the particular technique used.
Of course, our algorithm also has some important drawbacks. It yields poor results in reconstructing the lists of the nearest neighbors (NN). The "crowding problem", similar to that in SNE [18], causes particles to gather in the centers of clusters (see Figure 3, especially Figure 3c). Furthermore, the method is not fully understood, and more research on its theoretical basis is required. Also, more experiments on various data sets and direct comparisons to other data reduction techniques should be carried out.
However, the major goal in interactive data visualization is the very fast generation of 2D/3D maps which preserve in X the cluster separation from Y. The nr-MDS fits this purpose perfectly. The accurate reconstruction of the nearest neighbor lists for each particle i in X is rather a secondary requirement. First, the NN lists were computed from the original data and are known before mapping, so they can be recalled at any moment of interactive visualization and presented visually, e.g., as spines pointing from a given sample to its nn nearest neighbors. Second, the NN lists are not reliable in the context of both measurement errors and the "curse of dimensionality" principle, so the knowledge they provide is very volatile. Meanwhile, it is the clusters at various data resolutions that represent the "primordial" granules of knowledge. When an accurate reconstruction of the nearest neighbors in 2D/3D is required, nr-MDS can be employed first to generate an initial configuration for more advanced techniques such as BH-SNE (Q-SNE). Summing up, we expect that the method presented in this paper will inspire the development of more efficient and reliable interactive visualization tools for the exploration of really big data sets.
Acknowledgments. This research is supported by the Polish National Center of Science (NCN), grant DEC-2013/09/B/ST6/01549. We thank Dr. Marcin Kurdziel and Dr. Wojciech Czech from the AGH Department of Computer Science for their contribution to this paper.
References
[1] M. Andrecut. Molecular dynamics multidimensional scaling. Physics Letters A, 373(23-24):2001–
2006, 2009.
[2] Jan De Leeuw and Patrick Mair. Multidimensional scaling using majorization: SMACOF in R. Department of Statistics, UCLA, 2011.
[3] Witold Dzwinel and Jan Błasiak. Method of particles in visual clustering of multi-dimensional
and large data sets. Future Generation Computer Systems, 15(3):365–379, 1999.
[4] Samuel G. Fadel, Francisco M. Fatore, Felipe S.L.G. Duarte, and Fernando V. Paulovich. LoCH: A neighborhood-based multidimensional projection technique for high-dimensional sparse spaces. Neurocomputing, 150, Part B(0):546–556, 2015.
[5] S.L. France and J.D. Carroll. Two-way multidimensional scaling: A review. Systems, Man, and
Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 41(5):644–661, Sept 2011.
[6] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks.
Science, 313(5786):504–507, 2006.
[7] Stephen Ingram and Tamara Munzner. Dimensionality reduction for documents with nearest
neighbor queries. Neurocomputing, 150, Part B(0):557–569, 2015.
[8] Stephen Ingram, Tamara Munzner, and Marc Olano. Glimmer: Multilevel MDS on the GPU. Visualization and Computer Graphics, IEEE Transactions on, 15(2):249–261, March 2009.
[9] Tomoharu Iwata, Neil Houlsby, and Zoubin Ghahramani. Active learning for interactive visualiza-
tion. Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics,
pages 342–350, 2013.
[10] Daniel Keim, Huamin Qu, and Kwan-Liu Ma. Big-data visualization. Computer Graphics and
Applications, IEEE, 33(4):20–21, July 2013.
[11] Robson Motta, Rosane Minghim, Alneu de Andrade Lopes, and Maria Cristina F. Oliveira. Graph-
based measures to assist user assessment of multidimensional projections. Neurocomputing, 150,
Part B(0):583–598, 2015.
[12] Piotr Pawliczek and Witold Dzwinel. Interactive data mining by using multidimensional scaling.
Procedia Computer Science, 18(0):40–49, 2013. 2013 International Conference on Computational
Science.
[13] Piotr Pawliczek, Witold Dzwinel, and David A. Yuen. Visual exploration of data by using multidimensional scaling on multicore CPU, GPU, and MPI cluster. Concurrency and Computation: Practice and Experience, 26(3):662–682, 2014.
[14] Piotr Pawliczek, Witold Dzwinel, and David A. Yuen. Interactive data exploration with multi-thread MIC computer architecture. International Conference on Artificial Intelligence and Soft Computing, ICAISC 2015, June 14-18, Zakopane, Poland, 2015 (submitted for publication).
[15] Ruslan Salakhutdinov and Geoffrey Hinton. Semantic hashing. International Journal of Approxi-
mate Reasoning, 50(7):969–978, 2009.
[16] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for
nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[17] Laurens van der Maaten. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res., 15(1):3221–3245, January 2014.
[18] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
[19] Laurens JP van der Maaten, Eric O Postma, and H Jaap van den Herik. Dimensionality reduction:
A comparative review. Journal of Machine Learning Research, 10(1-41):66–71, 2009.
[20] Zhao Zhang, Tommy W.S. Chow, and Mingbo Zhao. Trace ratio optimization-based semi-
supervised nonlinear dimensionality reduction for marginal manifold visualization. Knowledge
and Data Engineering, IEEE Transactions on, 25(5):1148–1161, May 2013.