Very Fast Interactive Visualization of Large Sets of High-dimensional Data - 2015 - Procedia Computer Science
Abstract
The embedding of high-dimensional data into 2D/3D space is the most popular way of data visualization. Despite recent advances in the development of very accurate dimensionality reduction algorithms, such as BH-SNE, Q-SNE and LoCH, their relatively high computational complexity remains an obstacle to interactive visualization of truly large datasets consisting of M ∼ 10^6+ high-dimensional (N ∼ 10^3+) feature vectors. We show that a new clone of multidimensional scaling (MDS) – nr-MDS – can be up to two orders of magnitude faster than the modern dimensionality reduction algorithms. We postulate its linear O(M) computational and memory complexities. Simultaneously, our method preserves high separability of data in the 2D/3D target spaces, similar to that obtained by the state-of-the-art dimensionality reduction algorithms. We present the effects of applying nr-MDS to the visualization of data repositories such as 20 Newsgroups (M = 1.8·10^4), MNIST (M = 7·10^4) and REUTERS (M = 2.67·10^5).
1 Introduction
Visual exploration of high-dimensional data is an essential component of big data analytics [10]. Such data can be represented as a set Y = {y_i = (y_{i1}, ..., y_{iN})}_{i=1,...,M} ⊂ ℝ^N of M N-dimensional feature vectors, with N ∼ 10^3+. Data mapping B : Y → X to a visually perceived space X = {x_i = (x_{i1}, ..., x_{in})}_{i=1,...,M} ⊂ ℝ^n, where n = dim X = 2 (or 3), which preserves significant topological properties of Y in X, is a true algorithmic challenge.
A plethora of dimensionality reduction techniques have been developed over the last decade (see e.g. [19, 2, 5]). However, most of them are not capable of retaining both the local and global properties of data in a single map. One of the most popular embedding techniques – multidimensional scaling (MDS) – computes a low-dimensional map of points whose interpoint Euclidean distances d = {r_ij}_{M×M} approximate the input distance matrix Δ = {D_ij}_{M×M}, where D_ij = D(y_i, y_j) ∈ ℝ is a dissimilarity measure. The type of properties of Y (local or global) which will be preserved in X depends on the error function V : ℝ^{n×M} → ℝ
(called "the stress"). The stress V(Δ − d) is a function of the norm of the difference between the distance matrices of the source space Y and the target space X. In the classical MDS algorithms, a configuration of points X = {x_i}_{i=1,...,M} which minimizes the stress:
E(X) = \min_{X} V(\Delta - d) = \min_{X} \sum_{i<j} w_{ij} \left| D_{ij}^{k} - r_{ij}^{k} \right|^{m} \qquad (1)
represents the result of embedding Y in the 2D/3D target space. The values of k, m are the parameters of V(·). For low-dimensional data (N ∼ 10), MDS mapping is able to properly approximate the structure of the original data in 2D/3D spaces. However, the result of visualization of high-dimensional data with N ∼ 10^2+ is highly unreliable.
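To make the roles of k, m and w_ij concrete, the following minimal Python sketch (our illustration, not the authors' code) evaluates the stress (1) for a candidate embedding X given a precomputed dissimilarity matrix delta:

import numpy as np

def stress(X, delta, w=None, k=1, m=2):
    # Stress (1): sum over i < j of w_ij * |D_ij^k - r_ij^k|^m,
    # where r_ij are the Euclidean distances in the target space.
    M = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise target distances r_ij
    if w is None:
        w = np.ones((M, M))                    # unit weights by default
    i, j = np.triu_indices(M, 1)               # index pairs with i < j
    return np.sum(w[i, j] * np.abs(delta[i, j] ** k - r[i, j] ** k) ** m)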
For N ∼ 10^2+, due to the "curse of dimensionality" principle, MDS produces a low-dimensional map with a characteristic spherical shape and a low density in the sphere center (e.g., see Figure 2b). Thus the structure of the source space becomes strongly distorted and cluster separability can be lost. Second, both the memory and computational complexities of the classic MDS algorithm are at least O(M^2), i.e., prohibitively high for visualization of datasets with M ∼ 10^5+. Moreover, the minimization of criterion (1) requires sophisticated and slow optimization methods to find the global minimum of V(·); gradient-based techniques can get stuck in a local minimum.
The first problem can be solved by using recently developed dimensionality reduction algorithms that preserve small pairwise distances via stochastic neighbor embedding, such as t-SNE and its clones [18]. From a methodological point of view, these algorithms replace V(Δ − d) in criterion (1) with a more general function V(Δ, d). In t-SNE and its clones the low-dimensional map is obtained by minimizing V(Δ, d), which is equal to the Kullback-Leibler divergence between two probability distributions computed in the source and target spaces with respect to the locations of the points in the latter. Despite the impressive success of t-SNE, its memory and computational complexities remain the same as for MDS. Larger data sets (M ∼ 10^4+) can be visualized by using approximated versions of t-SNE, such as the BH-SNE [17], Q-SNE [7] and LoCH [4] algorithms. The computational complexities of these techniques are O(M log M) for BH-SNE and Q-SNE, and O(M√M) for LoCH.
Though these algorithms reconstruct very precisely the local properties of the original data in the low-dimensional map – such as the nearest neighbor lists and the distance distribution – they are still too slow to be used in interactive visualization of big data. For example, the fastest of them, the BH-SNE algorithm, requires as many as 12 minutes (serial code running on an Intel® Core™ i5-4258U, 2.6GHz) to visualize the 7·10^4 samples of the MNIST data set [17]. Up to now, no parallel implementations of BH-SNE, Q-SNE and LoCH have been published that would allow estimating how these algorithms perform on GPU or MIC architectures. Moreover, although the memory complexity of BH-SNE was decreased to O(M), the number of distances which has to be kept in operating memory remains too large: it is proportional to the squared average number of points in the Barnes–Hut cells.
Moreover, the gradient descent algorithm used for minimization of the error function is too stiff. It does not allow for interactive steering during visualization, e.g., manually changing the error function type and its parameters to find the best embedding. Meanwhile, interactive data mining makes it possible to match optimal machine learning tools to the data being explored and allows for deeper penetration of their parameter spaces.
We describe a new clone of MDS – nr-MDS – which outperforms the state-of-the-art dimensionality reduction techniques. We postulate its O(M) memory and computational complexities. Thus it is not only orders of magnitude faster than the state-of-the-art dimensionality reduction techniques but, unlike classical MDS, properly reconstructs the separability of high-dimensional
data sets in 2D/3D spaces. In the following sections we present our idea, its methodological background and the results of experiments. Finally, we present the conclusions and discuss directions for future work.
2 Method
X_0 := set_random_configuration()
do (for timestep n = 0, 1, 2, ...; for every particle i = 1...M)
    F_n,i := − Σ_{j∈D_i} w_ij ∇(D_ij^k − r_n,ij^k)^m    // 1. compute forces acting on each particle i
                                                        //    from particles j, where r_n,ij := |x_n,i − x_n,j|
    p_n+1,i := p_n,i + 1/2 F_n,i · Δt                   // 2. calculate momenta p_i for each particle (mass = 1)
    if |p_n+1,i| > v_max then
        p_n+1,i := v_max · p_n+1,i / |p_n+1,i|          // 3. avoid numerical blow-up
    x_n+1,i := x_n,i + p_n+1,i · Δt                     // 4. move particles, create new configuration X_n+1
    F_n+1,i := calculate_forces()                       // 5. calculate new forces F_n+1,i
    p_n+1,i := ϕ · (p_n+1,i + 1/2 F_n+1,i · Δt)         // 6. use "damping" factor ϕ
until |E(X_n+1) − E(X_n)| < δ
Figure 1: The pseudocode of the particle-based MDS. D_i are the sets of random neighbors of each data sample i.
For minimization of V(Δ − d) we use a very fast and robust particle-based MDS method, whose pseudocode is shown in Figure 1. The same approach can be used for minimization of any error function V(Δ, d), such as that employed in t-SNE. In short, the particle-based MDS represents a data sample y_i as a particle x_i in the 2D/3D target space. The particles interact with each other via semi-harmonic forces f_ij(X_n) = −∇[w_ij (D_ij^k − r_n,ij^k)^m]. Every two particles i and j located at x_n,i and x_n,j in timestep n are separated by the Euclidean distance r_n,ij = |x_n,i − x_n,j|. The total force F_n,i acting on a particle i is equal to the sum of the f_ij(X_n) forces from the particles belonging to its set of random neighbors D_i. We simulate the Newtonian dynamics of an (initially) random configuration of particles X_0 in discrete timesteps n. We assume additionally that the particle system X_n evolves in a dissipative environment represented by a damping factor ϕ (see Figure 1). The particle system X_n converges to a stationary state X with a minimal potential
energy E(X) = V(Δ, d). Due to a more extensive penetration of the stress function domain, the particle-based method allows for obtaining a better minimum (i.e., closer to the global one) than other gradient-based techniques [1, 3]. Moreover, particle-based MDS is really fast. By using the "leap-frog" numerical scheme, which is very resistant to fluctuations and very stable, one can use large timesteps Δt without worrying about unphysical effects and the precision of the time evolution of X_n. At the beginning of the simulation the particle shifts can be very large; thus, initially, the method works like a simulated annealing algorithm at high temperature. The numerical "heating" can be controlled by using a large or variable damping factor ϕ (from [0, 1)) and a small value of v_max. At the end, close to the attractor point, the algorithm behaves like the gradient descent method. The particle-based heuristic is fast and self-adaptive. It requires only two parameters to be set (Δt and ϕ). Furthermore, it fits perfectly to interactive visualization purposes. The time evolution of the particle system X can be visualized and controlled interactively by changing not only the parameters but the stress function type as well.
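As a concrete illustration, a compact Python sketch of this scheme is given below. It is our reading of Figure 1, not the authors' code; force_fn and energy_fn are assumed callbacks returning the total forces and the stress for a given configuration:

import numpy as np

def particle_mds(force_fn, energy_fn, M, n_dim=2, dt=0.05, phi=0.95,
                 v_max=1.0, tol=1e-6, max_steps=10000, seed=0):
    # Leap-frog dynamics of Figure 1 (particle mass = 1).
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, (M, n_dim))       # X_0: random initial configuration
    p = np.zeros_like(X)                         # particle momenta
    F = force_fn(X)
    E_prev = energy_fn(X)
    for n in range(max_steps):
        p = p + 0.5 * F * dt                     # 2. half-step momentum update
        speed = np.linalg.norm(p, axis=1, keepdims=True)
        p *= np.minimum(1.0, v_max / np.maximum(speed, 1e-12))  # 3. cap speeds
        X = X + p * dt                           # 4. move particles: configuration X_{n+1}
        F = force_fn(X)                          # 5. recompute forces
        p = phi * (p + 0.5 * F * dt)             # 6. finish the step with damping phi
        E = energy_fn(X)                         # stop when the stress stalls
        if abs(E - E_prev) < tol:
            break
        E_prev = E
    return X

With the stress sketch above, energy_fn could simply be a closure over a precomputed dissimilarity matrix, e.g. lambda X: stress(X, delta).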
However, as shown in Figure 2b, MDS yields very poor embeddings for high-dimensional and multi-class data [17]. As shown in Figure 2a, even PCA can outperform MDS in the visualization of the 10-class, 784-dimensional MNIST data set. This is a consequence of the "curse of dimensionality" principle, i.e., in high-dimensional spaces all the distances between data samples are practically the same. Moreover, high-dimensional data usually form a low-dimensional manifold Σ embedded in the feature space Y, such that dim Σ ≪ dim Y. Because the manifold can be very complicated and folded, the distances between samples in Y can be completely different from those on the manifold Σ. Therefore, in high-dimensional spaces, only the small pairwise distances between neighboring data vectors are reliable. Isomap-based MDS assumes that only the pairwise distances between the nearest neighboring points are known. To approximate all other distances on the manifold Σ, the Floyd-Warshall algorithm can be employed [16].
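For reference, the geodesic approximation step of Isomap mentioned above can be sketched as follows; this is the standard O(M^3) Floyd-Warshall recurrence (our illustration, not part of nr-MDS itself), where G holds the known neighbor distances and np.inf elsewhere:

import numpy as np

def floyd_warshall(G):
    # All-pairs shortest paths over the neighbor graph: approximates
    # geodesic distances on the manifold from the known local distances.
    D = G.copy()
    for via in range(D.shape[0]):
        # relax every pair (i, j) through the intermediate node `via`
        D = np.minimum(D, D[:, via:via + 1] + D[via:via + 1, :])
    return D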
Figure 2: The results of mapping the MNIST data set into 2D space by using a) PCA (M = 7·10^4), b) MDS (M = 2.5·10^3, k = 1, m = 2), c) nr-MDS (M = 2.5·10^3, k = 1, m = 2).
The main idea of nr-MDS is different from that of [16]. We try to "unfold" the manifold and fit it into a hyperplane of minimal dimensionality by modifying the distances Δ in the original multi-dimensional space Y, prior to its embedding in the X space. Simultaneously, to speed up the calculations, we select the smallest possible set of distances D = {D_ij}, n_d = #D, sufficient to obtain a proper data embedding. To this end, we construct the neighbor graph G(V, E), where V is the set of nodes representing the data samples y_i ∈ Y, and E is the set of edges corresponding to the selected distances D. To define E we compute the sets Knn_i and Krn_i of edges connecting each point y_i ∈ Y to its nn nearest neighbors and rn random neighbors, respectively. The nearest neighbor of a particle y_i cannot simultaneously be its random neighbor. Then, the set of edges E = Knn ∪ Krn, where Knn = ∪_{i=1,...,M} Knn_i and Krn = ∪_{i=1,...,M} Krn_i.
It means that a node i ∈ V has at least n_t = nn + rn outgoing edges to its neighbors (both the nearest and random ones) and, possibly, some additional edges connecting it to other data samples j ∈ V which do not belong to E_i = Knn_i ∪ Krn_i, while i belongs to their nearest or random neighbors. Let us assume that all distances between the nodes of G(V, E) connected by the edges from Knn are set to 0. The number of zeroed distances is n_nd = #Knn > nn, while the number of random non-zero distances is r_nd = #Krn > rn. The total number of distances n_d = #E = #D = n_nd + r_nd should satisfy n_d ≥ L_min.
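A possible construction of these edge sets in Python is sketched below (our illustration under simplifying assumptions: brute-force NN search and directed edges; the paper does not specify its NN search method):

import numpy as np

def build_neighbor_graph(Y, nn=2, rn=1, seed=0):
    # Knn: edges to nearest neighbors (their distances will be zeroed);
    # Krn: edges to random neighbors, kept with their original distances D_ij.
    rng = np.random.default_rng(seed)
    M = Y.shape[0]
    knn_edges, krn_edges = [], []
    for i in range(M):
        d = np.linalg.norm(Y - Y[i], axis=1)
        d[i] = np.inf                              # exclude the sample itself
        nearest = np.argpartition(d, nn)[:nn]      # nn nearest neighbors of i
        knn_edges.extend((i, int(j)) for j in nearest)
        pool = np.setdiff1d(np.arange(M), np.append(nearest, i))
        for j in rng.choice(pool, size=rn, replace=False):
            krn_edges.append((i, int(j), d[j]))    # random neighbor, disjoint from NN
    return knn_edges, krn_edges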
Setting the distances of the nearest data samples to 0 causes the manifold to "shrink", while the non-zero random distances produce a "surface tension" which tries to unfold it. The quality of the final result depends mainly on selecting the proper balance between the numbers of nearest (nn) and random (rn) neighbors set for each data sample. This balance depends on the stress function type and on the structure of the visualized data. The influence of the long random distances, which inscribe the global structure of the visualized data, can be additionally controlled by employing different weights w_nn and w_rn for the forces acting between the nearest neighbors and the random neighbors, respectively (see step 1 in the pseudocode from Figure 1). Hence, in nr-MDS, the forces acting on each particle i are computed as follows:

F_{n,i} := -\sum_{j \in Krn_i} w_{rn} \, \nabla \left( D_{ij}^{k} - r_{n,ij}^{k} \right)^{m} - \sum_{j \in Knn_i} w_{nn} \, \nabla \left( r_{n,ij}^{k} \right)^{m} \qquad (2)

In all the tests we have assumed that w_nn and w_rn are constant and w_nn = 1. As shown in Figure 2c and Figure 3a,b, we obtain very good cluster separation, similar to [18, 9], better than in [6] and much better than in [20]. In the following section we present the preliminary results of visualization of much larger data sets.
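Plugged into the particle scheme of Figure 1, the force (2) on particle i could look as follows for k = 1, m = 2 (our reading: harmonic springs of rest length D_ij to random neighbors and of rest length 0 to nearest neighbors; the w_rn value is an assumption):

import numpy as np

def nr_mds_force(i, X, knn_i, krn_i, w_nn=1.0, w_rn=0.3):
    # knn_i: indices of i's nearest neighbors (target distance zeroed);
    # krn_i: list of (j, D_ij) pairs for i's random neighbors.
    F = np.zeros(X.shape[1])
    for j, D_ij in krn_i:
        diff = X[i] - X[j]
        r = np.linalg.norm(diff) + 1e-12
        F += 2.0 * w_rn * (D_ij - r) * diff / r   # spring of rest length D_ij ("surface tension")
    for j in knn_i:
        F -= 2.0 * w_nn * (X[i] - X[j])           # spring of rest length 0 (manifold "shrinkage")
    return F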
3 Experiments
3.1 Data sets
We performed experiments using three very popular data sets: MNIST, REUTERS and 20-NG. We briefly describe each of them below and in Table 1.
REUTERS. This text corpus is known as "Reuters Corpus, Volume 1" or RCV1. We used a subset of this repository (http://www.jmlr.org/papers/v5/lewis04a.html) containing M = 2.67·10^5 documents and 8 topics. We used two types of data: (1) a data set of 2000-dimensional BoW (bag of words) vectors of the most frequently used words, with "stop words" removed and preprocessed by Porter2 stemming; the vectors are normalized to 1 and, finally, preprocessed by tf.idf ranking; (2) a data set of 30-dimensional vectors obtained by PCA projection of a 6000-dimensional structural-semantic vector representation. The data set is highly imbalanced. For M = 2.67·10^5 the topics and their sizes are as follows: GDIS (8081), C12 (11 944), ECAT (89 131), M131 (25 719), C151 (89 683), E212 (27 257), G154 (1342), M143 (21 774).
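A preprocessing pipeline of this kind can be approximated with off-the-shelf tools, e.g. the sketch below (our illustration; the exact stemming and normalization order used by the authors may differ, `documents` is a placeholder for the raw texts, and a Porter2 stemmer such as NLTK's SnowballStemmer could be added via a custom tokenizer):

from sklearn.feature_extraction.text import TfidfVectorizer

# 2000 most frequent terms, English stop words removed, L2-normalized tf.idf
vectorizer = TfidfVectorizer(max_features=2000, stop_words="english", norm="l2")
X_bow = vectorizer.fit_transform(documents)   # sparse M x 2000 matrix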
For all of these data sets we calculate the value of the classification factor cf. It is defined as the ratio of the number of data samples whose closest neighbor belongs to the same class to the total number of samples M. We compute the cf values for the source data set and for its PCA and nr-MDS embeddings in 3D space. Of course, the cf value alone cannot be a measure of the quality of embedding, but it is an important indicator of data separation [11].
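The cf computation itself is straightforward; a brute-force Python sketch (ours) follows:

import numpy as np

def classification_factor(X, labels):
    # Fraction of samples whose nearest neighbor carries the same class label.
    M = X.shape[0]
    hits = 0
    for i in range(M):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        hits += int(labels[np.argmin(d)] == labels[i])
    return hits / M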
3.2 Experimental setup
The code was run on a notebook computer with an Intel® Core™ i5-3317U CPU running at 1.7GHz (i.e., 60% of the CPU clock used in [17]) and 16GB of memory.
3.3 Results
The 2D mappings of MNIST data sets larger than the one visualized in Figure 2c are shown in Figure 3. They were performed for the original data set with dimensionality N = 784 and the stress function (1) with k = 1 and m = 2. As shown in Figure 3a,b, the results of the embeddings for large M improve significantly. The cluster overlaps are much smaller for the full dataset with M = 7·10^4 samples, which is reflected in the increasing cf value (see Table 1): it grows from 0.68 for M = 2.5·10^3 to 0.88 for the full MNIST data set. The cluster separability increases considerably for the stress function with k = 1 and m = 1. However, as shown in Figure 3c, the clusters then shrink too much, and their local structure becomes invisible. As shown in Table 1, PCA preprocessing of the raw data distinctly improves (by 2-3%) the cf values and the data separability.
Figure 3: The results of mapping MNIST into 2D space by using nr-MDS for the original data set (N = 784): a) M = 10^4, k = 1, m = 2, w = 0; b) M = 7·10^4, k = 1, m = 2, w = 0; c) M = 7·10^4, k = 1, m = 1, w = 0.
For the full MNIST data set the number of distances for nn = 2 and rn = 1 is equal to L = 1.8·10^5, i.e., very close to the minimal configuration L_min = 2M = 1.4·10^5. However, comparing L to L_max = 2.4·10^9, the improvement in memory and computational complexity over the classical MDS is overwhelming. Moreover, because all the distances to the nearest neighbors are set to 0, only the distances to the random neighbors have to be stored. The same "minimal" setup was used for MNIST sets with M from 2.5·10^3 to 7·10^4. This suggests that the computational complexity can be O(M). The measured timings obtained for the visualization of MNIST data sets in 2D are as follows: t = 1 sec. for M = 2.5·10^3 (Figure 2c); t = 3.5 sec. for M = 10^4 (Figure 3a); t = 20 sec. for M = 7·10^4 (Figure 3b,c). As reported in [17], BH-SNE needs 750 sec. to obtain a similar separation of clusters. After normalizing for the computational power of the CPUs used for visualization in the two cases, nr-MDS is more than 60 times faster than BH-SNE on the MNIST dataset. Moreover, in this particular case (2D mapping), we need to store only one random distance per data sample, one index of its random neighbor and two indices of its nearest neighbors; so the memory complexity is also very low.
However, MNIST is a data set of rather well-separated classes. Much more difficult are text repositories such as REUTERS and 20NG. The documents and news items are represented as tf.idf vectors based on the most frequent words. Such a data representation is often very noisy and, in some situations, highly unreliable. Moreover, REUTERS is a very unbalanced data set. Nevertheless, as shown in Figure 3 and Table 1, our algorithm was able to obtain 2D/3D maps of well separated clusters. It is also clearly seen that the 30D data sets of PCA projections of 6000D
Figure 4: The results of mapping the REUTERS data set into 2D space by using nr-MDS. a) Data set with M = 5.3·10^4 samples. b) M = 2.67·10^5 samples were used; however, the two largest clusters, C151 (8.9·10^4) and ECAT (8.9·10^4), were removed for better visualization of the smaller topics (M = 9.6·10^4).
Figure 5: The results of mapping the 20NG data set into 2D space by using nr-MDS. a) The clusters of all 20 topics. b) The scatter-plot showing the 14 most visible clusters.
As shown in Figure 5b, as many as 14 clusters are clearly visible. The cluster separability is inferior to that obtained by van der Maaten (see http://lvdmaaten.github.io/tsne/), which uses Simon's FDA generated features. However, it is much better than in other work using the BoW text representation (see e.g. [15]), where only the six best separated clusters are visualized and there is no room to fit 14 more.
4 Conclusions
In this paper we present the effects of adapting the particle-based MDS to interactive visualization of large high-dimensional data sets. First, to decrease the computational complexity of multidimensional scaling, we compute the smallest possible set of distances between the original high-dimensional data Y that is sufficient to obtain a proper data embedding in X. We show that for each data vector in Y only the indices of a small number nn of nearest neighbors and the distances to rn random neighbors have to be computed, such that nn + rn ≥ n. Second, the main idea of nr-MDS consists in the modification of these distances prior to data embedding into 2D/3D space: all distances between a data sample and its nn nearest neighbors are set to zero. Thus, only the distances to the rn random neighbors are stored. All of these improvements make the algorithm very fast – much faster than the top dimensionality reduction techniques – due to its lower, O(M), computational and memory complexities. The efficiency of nr-MDS can be increased considerably by employing modern multi-thread computer architectures such as GPU and MIC [2, 8, 12, 13, 14]. Furthermore, the results of embeddings obtained by nr-MDS are amazingly good, properly reconstructing in 2D/3D the separated classes of high-dimensional data sets. In general, this concept can be applied to various types of data reduction techniques, because it does not depend on the particular technique used.
Of course, our algorithm also has some important drawbacks. It yields poor results in reconstructing the lists of the nearest neighbors (NN). The "crowding problem", similar to that in SNE [18], causes particles to gather in the centers of clusters (see Figure 3, especially Figure 3c). Furthermore, the method is not fully understood, and more research on its theoretical basis is required. Also, more experiments on various data sets and direct comparisons to other data reduction techniques should be carried out.
However, the major goal in interactive data visualization is the very fast generation of 2D/3D maps which preserve in X the cluster separation from Y. The nr-MDS fits this purpose perfectly. The accurate reconstruction of the nearest neighbor lists for each particle i in X is rather a secondary requirement. First, the NN lists were computed from the original data and are known before mapping, so they can be recalled at any moment of interactive visualization and presented visually, e.g., as spines pointing from a given sample to its nn nearest neighbors. Second, the NN lists are not reliable in the context of both measurement errors and the "curse of dimensionality" principle, so the knowledge they provide is very volatile. Meanwhile, it is the clusters at various data resolutions that represent the "primordial" granules of knowledge. When an accurate reconstruction of the nearest neighbors in 2D/3D is required, nr-MDS can be employed first to generate an initial configuration for more advanced techniques such as BH-SNE (Q-SNE). Summing up, we expect that the method presented in this paper will inspire the development of more efficient and reliable interactive visualization tools for the exploration of really big data sets.
Acknowledgments. This research is supported by the Polish National Center of Science (NCN), grant DEC-2013/09/B/ST6/01549. We thank Dr. Marcin Kurdziel and Dr. Wojciech Czech from the AGH Department of Computer Science for their contribution to this paper.
References
[1] M. Andrecut. Molecular dynamics multidimensional scaling. Physics Letters A, 373(23-24):2001–
2006, 2009.
[2] Jan De Leeuw and Patrick Mair. Multidimensional scaling using majorization: SMACOF in R. Department of Statistics, UCLA, 2011.
[3] Witold Dzwinel and Jan Błasiak. Method of particles in visual clustering of multi-dimensional
and large data sets. Future Generation Computer Systems, 15(3):365–379, 1999.
[4] Samuel G. Fadel, Francisco M. Fatore, Felipe S.L.G. Duarte, and Fernando V. Paulovich. LoCH: A neighborhood-based multidimensional projection technique for high-dimensional sparse spaces. Neurocomputing, 150, Part B(0):546–556, 2015.
[5] S.L. France and J.D. Carroll. Two-way multidimensional scaling: A review. Systems, Man, and
Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 41(5):644–661, Sept 2011.
[6] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks.
Science, 313(5786):504–507, 2006.
[7] Stephen Ingram and Tamara Munzner. Dimensionality reduction for documents with nearest
neighbor queries. Neurocomputing, 150, Part B(0):557–569, 2015.
[8] Stephen Ingram, Tamara Munzner, and Marc Olano. Glimmer: Multilevel MDS on the GPU. Visualization and Computer Graphics, IEEE Transactions on, 15(2):249–261, March 2009.
[9] Tomoharu Iwata, Neil Houlsby, and Zoubin Ghahramani. Active learning for interactive visualiza-
tion. Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics,
pages 342–350, 2013.
[10] Daniel Keim, Huamin Qu, and Kwan-Liu Ma. Big-data visualization. Computer Graphics and
Applications, IEEE, 33(4):20–21, July 2013.
[11] Robson Motta, Rosane Minghim, Alneu de Andrade Lopes, and Maria Cristina F. Oliveira. Graph-
based measures to assist user assessment of multidimensional projections. Neurocomputing, 150,
Part B(0):583–598, 2015.
[12] Piotr Pawliczek and Witold Dzwinel. Interactive data mining by using multidimensional scaling.
Procedia Computer Science, 18(0):40–49, 2013. 2013 International Conference on Computational
Science.
[13] Piotr Pawliczek, Witold Dzwinel, and David A. Yuen. Visual exploration of data by using multidimensional scaling on multicore CPU, GPU, and MPI cluster. Concurrency and Computation: Practice and Experience, 26(3):662–682, 2014.
[14] Piotr Pawliczek, Witold Dzwinel, and David A. Yuen. Interactive data exploration with multi-thread MIC computer architecture. International Conference on Artificial Intelligence and Soft Computing, ICAISC 2015, June 14-18, Zakopane, Poland, 2015 (submitted for publication).
[15] Ruslan Salakhutdinov and Geoffrey Hinton. Semantic hashing. International Journal of Approxi-
mate Reasoning, 50(7):969–978, 2009.
[16] Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for
nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
[17] Laurens van der Maaten. Accelerating t-SNE using tree-based algorithms. J. Mach. Learn. Res., 15(1):3221–3245, January 2014.
[18] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579–2605, 2008.
[19] Laurens JP van der Maaten, Eric O Postma, and H Jaap van den Herik. Dimensionality reduction:
A comparative review. Journal of Machine Learning Research, 10(1-41):66–71, 2009.
[20] Zhao Zhang, Tommy W.S. Chow, and Mingbo Zhao. Trace ratio optimization-based semi-
supervised nonlinear dimensionality reduction for marginal manifold visualization. Knowledge
and Data Engineering, IEEE Transactions on, 25(5):1148–1161, May 2013.