Manifold Datamining
Abstract: Effective data mining techniques are important to both academia and industry for discovering knowledge from power system simulation results. This paper proposes a method based on manifold learning for visualizing the transient dynamics of power systems, with the aim of clustering stable and unstable simulation cases concisely. The results of one simulation case, regarded as a collection of time series data, are represented approximately as a state point in a visualization space. Different dimension reduction methods from the manifold learning family are applied to enable such visualizations. Stable and unstable simulation cases are then clustered by the Fuzzy C-means method according to their Euclidean distances in the visualization space. The proposed method is tested and validated on the IEEE 39 bus system, and comparisons with other dimension reduction methods are presented. The results show that the Isomap manifold learning method helps to reveal stable and unstable case clusters effectively and accurately.
Key Words: Data Mining, Manifold Learning, Visualization
In the following part, manifold learning is reviewed as a dimension reduction method. Based on dimension reduction, this work visualizes the simulation results in two- and three-dimensional spaces, and the stable and unstable regions are drawn by a clustering method. Because a stable case may be assigned to the unstable cluster, two indexes are introduced to measure the correctness of the clustering results. Comparing these indexes across dimension reduction methods indicates how effective each method is. Simulation results verify that the manifold learning method Isomap performs better than the traditional PCA method.

2 Preliminaries

2.1 Basics of Manifold Learning

Manifold learning was introduced to address the dimension reduction problem. Visualization reshapes a high-dimensional vector into a two- or three-dimensional vector. The most commonly used dimension reduction method is PCA, which tries to find Euclidean coordinates in the lower dimension that best reflect the features of the data before the reduction. It is a linear method, because it can only find linear coordinates onto which the high-dimensional points are projected. However, much multivariate data is not generated linearly, and a linear reduction method then destroys the structure of the data, which may lead to errors when the analysis is based on the dimension-reduced results. If the data in the high-dimensional space are sampled from a low-dimensional manifold, manifold learning can recover the low-dimensional manifold structure from the high-dimensional samples, as illustrated in [14]-[15]. Taking the Swiss roll as an example, the traditional PCA method cannot depict it in the two-dimensional space as the manifold learning method does. The latter part of this work shows the difference between the nonlinear and the linear dimension reduction methods, and the correct rate of the clustering results indicates the better performance of manifold learning on nonlinear dimension reduction problems.

Many manifold learning methods have been proposed, such as Isomap, LLE and the Laplacian method. Isomap is the most classic one. The Isomap algorithm is based on multidimensional scaling (MDS) and mainly aims to keep the geometric distances between the data points and the intrinsic geometric properties. The complete isometric feature mapping (Isomap) algorithm has three steps. First, the neighbor points on the manifold $M$ are determined based on the distances $d_X(i, j)$ between each pair of points. Second, the geodesic distances $d_M(i, j)$ between each pair of points on the manifold $M$ are estimated by the shortest-path distances $d_G(i, j)$ in the neighborhood graph. Finally, an embedding of the data in a $d$-dimensional Euclidean space is constructed by applying classical MDS to the matrix of graph distances $D_G = \{d_G(i, j)\}$.

More detailed calculation methods and further information about manifold learning can be found in [14]-[15].

2.2 Result Analysis of Transient Dynamic Simulations

The power system can be described by a set of differential algebraic equations (DAE). The network power flow equations are algebraic, and the transient dynamics of the components are represented by differential equations:

$$f(x, y) = 0, \qquad \frac{d}{dt}x = g(x, y) \tag{1}$$

In this formulation, $x$ denotes the state variables, $y$ the algebraic variables, and $f$ and $g$ the algebraic and differential equations of the power system.

A transient simulation solves three sets of DAEs: before the fault, during the fault and after the fault. Transient processes differ in the parameters of these three sets of DAEs, since a fault is reflected by a change in the equations. The transient simulation results are the time series of the algebraic and state variables. Differences between simulation results are caused by the simulation conditions, such as the fault location, type and duration. Therefore, the high-dimensional simulation results can be expected to lie on a low-dimensional manifold.

3 Visualization of Power System Dynamics

3.1 Data Preparation for Dimension Reduction

The simulation results of a power system during a time window include the time series of every algebraic and state variable on the network, which can be represented by a matrix $S$. Each column of $S$ is the time series of one variable, such as a voltage amplitude or phase angle, and each row of $S$ holds the state of all variables at one snapshot. Normally the matrix includes information on the generators, the network and the excitation systems. Besides the matrix representation $S$, the results can also be reshaped into a high-dimensional vector $S_i$, so that the simulation result of one case is represented by a point in the high-dimensional space. The power system dynamic process is determined by three main factors: fault location, fault type and fault duration. Different combinations of these three factors lead to different simulation results. By changing the simulated fault condition, we obtain the simulation results of different cases, each of which is represented by a high-dimensional vector. Piling up those vectors to form a matrix, as shown in Figure 1, completes the data preparation for the dimension reduction. The next part visualizes the results in two-dimensional and three-dimensional figures.
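The data preparation of Section 3.1 (one matrix per simulation case, flattened into a vector and piled up with the other cases) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `stack_cases`, the array sizes and the random placeholder data are assumptions made for the example. In practice each matrix would hold the time series exported from the simulator.

```python
import numpy as np

def stack_cases(case_matrices):
    """Flatten each (time steps x variables) matrix S into one row vector
    and pile the rows up into a (cases x features) data matrix."""
    rows = [np.asarray(S, dtype=float).reshape(-1) for S in case_matrices]
    if len({r.size for r in rows}) != 1:
        raise ValueError("all cases must share the same time window and variables")
    return np.vstack(rows)

# Hypothetical sizes: 171 snapshots (0 s to 1.7 s at 0.01 s steps),
# 20 recorded variables and 15 simulated cases. The values are random
# placeholders, not simulation output.
rng = np.random.default_rng(0)
cases = [rng.normal(size=(171, 20)) for _ in range(15)]
X = stack_cases(cases)
print(X.shape)  # (15, 3420)
```

Each row of `X` is then one state point in the high-dimensional space, which is the input expected by a dimension reduction routine.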
Figure 1. Data Preparation of Dimension Reduction (flow: historical simulation results collection; transforming each case matrix into a vector $S_i$; piling up the simulation results of different cases. GI: Generator Information, NI: Network Information, EI: Excitation Information)

3.2 Visualization Space and State Point Clustering

We can use the historical simulation data to form a data set $S_1, S_2, \ldots, S_n$, whose points, each standing for one simulation result, fill the high-dimensional space. According to whether the cases are stable or unstable, the space may be clustered into several areas. However, it is inconvenient to compute the distances between points in the high-dimensional space. The key to solving this problem is dimension reduction. If the dimension of the data is reduced to two or three and the points are plotted in a figure, the dynamic process of the power system is visualized. Based on the visualized results, we can cluster the points in the low-dimensional space to study the characteristics of those results. The choice of dimension reduction method plays a significant role in the clustering results.

Figure 2. The visualization of power system dynamics (flow: constructing the historical simulation results set; transforming the simulation results into a vector and constructing the simulation set; dimension reduction of the dynamic process vector; visualizing and clustering the results)

Based on the visualization results, many interesting studies can be carried out, such as clustering the points into several clusters and analyzing the characteristics of each group, or paying special attention to the points on the boundary between two different clusters. Obtaining reliable clustering results is therefore very important.

A stable case may nevertheless be clustered into the unstable cluster. The correct rate of clustering can therefore be used to assess the effect of the dimension reduction. The correct rates are computed as follows:

$$\mathrm{Index}_{cros} = \frac{\mathrm{Num}_{ccos}}{\mathrm{Num}_{stable}} \tag{2}$$

$$\mathrm{Index}_{crou} = \frac{\mathrm{Num}_{ccou}}{\mathrm{Num}_{unstable}} \tag{3}$$

In the above formulas, $\mathrm{Index}_{cros}$ and $\mathrm{Index}_{crou}$ are the correct rates of the stable cluster and the unstable cluster. $\mathrm{Num}_{ccos}$ and $\mathrm{Num}_{ccou}$ are the numbers of correctly clustered stable cases and unstable cases. $\mathrm{Num}_{stable}$ and $\mathrm{Num}_{unstable}$ are the numbers of stable cases and unstable cases.

With these two indexes, the effect of the dimension reduction method can be assessed. To verify the usefulness of the algorithm, simulations are performed on the 10 machine 39 bus system.

4 Simulation

4.1 Case Study 1

A comparison of three simulations is made. N-1 test simulations are performed with five different fault durations and three different fault types using PSAT. The fault is applied to the system at 1.02 s, and the five fault clearing times are evenly spaced from 1.03 s to 1.07 s. All dynamic data, including the generator state variables, network variables and excitation system variables, are collected from 0 s to 1.7 s at a simulation time step of 0.01 s to describe the system dynamics. The dimension reduction uses the toolbox developed by a researcher at Facebook AI Research [16]. Following the process described above, the simulation results of each case are transformed into a vector, the historical simulation results are piled up to form a new matrix, and the toolbox is used to perform the dimension reduction. Based on the dimension reduction results, the stable and unstable regions are clustered using the Fuzzy C-means method. The dimension reduction results obtained by PCA and Isomap are compared in the following figures, in which blue points are stable cases and red points are unstable cases.
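The clustering and scoring steps can be sketched in a few lines: Fuzzy C-means on the low-dimensional points, followed by the correct-rate indexes of equations (2) and (3). This is a hedged illustration, not the authors' implementation. The function names `fuzzy_c_means` and `correct_rates`, the fuzzifier `m = 2`, the synthetic 2-D points, and the rule that the cluster containing most known-stable cases is labeled "stable" are all assumptions made for the example.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means with Euclidean distances.
    Returns (centers, U), where U[i, k] is the membership of point i in cluster k."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # memberships of a point sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                 # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def correct_rates(true_stable, pred_stable):
    """Index_cros and Index_crou as defined in equations (2) and (3)."""
    true_stable = np.asarray(true_stable, dtype=bool)
    pred_stable = np.asarray(pred_stable, dtype=bool)
    index_cros = (true_stable & pred_stable).sum() / true_stable.sum()
    index_crou = (~true_stable & ~pred_stable).sum() / (~true_stable).sum()
    return index_cros, index_crou

# Synthetic stand-in for a 2-D visualization space (not data from the paper).
rng = np.random.default_rng(1)
stable_pts = rng.normal(loc=0.0, scale=0.3, size=(20, 2))
unstable_pts = rng.normal(loc=5.0, scale=0.3, size=(20, 2))
points = np.vstack([stable_pts, unstable_pts])
true_stable = np.array([True] * 20 + [False] * 20)

centers, U = fuzzy_c_means(points)
labels = U.argmax(axis=1)                        # harden the fuzzy memberships
# Assumption: the cluster holding most known-stable cases is called "stable".
stable_cluster = np.bincount(labels[true_stable], minlength=2).argmax()
index_cros, index_crou = correct_rates(true_stable, labels == stable_cluster)
print(index_cros, index_crou)
```

Both indexes lie between 0 and 1. A dimension reduction method that separates the two groups well in the visualization space yields values close to 1 for both.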
Figure 3. PCA method for 2-D visualization
Figure 6. Isomap method for 3-D visualization
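The Isomap embedding follows the three steps outlined in Section 2.1: neighborhood graph, shortest-path (geodesic) distances, and classical MDS. The NumPy sketch below is only an illustration under stated assumptions. It is not the drtoolbox routine used in the paper. The function name `isomap`, the default of five neighbors, the Floyd-Warshall shortest-path step and the spiral test data (a cross-section of the Swiss roll) are choices made for this example. PCA is computed through an SVD for comparison.

```python
import numpy as np

def isomap(X, n_neighbors=5, n_components=2):
    n = X.shape[0]
    # Step 1: Euclidean distances d_X(i, j) and a k-nearest-neighbor graph.
    d_x = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    graph = np.full((n, n), np.inf)
    np.fill_diagonal(graph, 0.0)
    nearest = np.argsort(d_x, axis=1)[:, 1:n_neighbors + 1]
    for i in range(n):
        graph[i, nearest[i]] = d_x[i, nearest[i]]
        graph[nearest[i], i] = d_x[i, nearest[i]]   # keep the graph symmetric
    # Step 2: graph distances d_G(i, j) by Floyd-Warshall shortest paths.
    for k in range(n):
        graph = np.minimum(graph, graph[:, [k]] + graph[[k], :])
    if np.isinf(graph).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Step 3: classical MDS on the matrix of graph distances D_G.
    J = np.eye(n) - np.ones((n, n)) / n             # centering matrix
    B = -0.5 * J @ (graph ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))

# Test data: a spiral, i.e. a 1-D manifold curled up in 2-D.
t = np.linspace(1.5 * np.pi, 4.5 * np.pi, 150)
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
# Position of each point along the curve (cumulative chord length).
arc = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(X, axis=0), axis=1))])

Y = isomap(X, n_neighbors=5, n_components=1)

# Linear baseline: first principal component from an SVD of the centered data.
Xc = X - X.mean(axis=0)
pca_1d = Xc @ np.linalg.svd(Xc, full_matrices=False)[2][0]

iso_corr = abs(np.corrcoef(Y[:, 0], arc)[0, 1])
pca_corr = abs(np.corrcoef(pca_1d, arc)[0, 1])
print(iso_corr, pca_corr)
```

The absolute correlation with the position along the curve measures how well each 1-D embedding preserves the ordering of points on the manifold. A linear projection folds the spiral onto itself, while the geodesic distances used by Isomap unroll it.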
If WAMS data are used in place of the simulation data, the method can provide a way to analyze the stability of the real power system. This paper therefore presents a way to visualize the power system dynamic process.

References

[1] China Computer Federation Big Data Experts Committee, "China Big Data Technology and Industry Development White Paper (2013)," China Computer Federation, Beijing, China, 2013.
[2] Chinese Society for Electrical Engineering Information Committee, "Chinese Electric Power Big Data Development White Paper (2013)," Chinese Society for Electrical Engineering, Beijing, China, 2013.
[3] Song Yaqi, Zhou Guoliang, Zhu Yongli. Present status and challenges of big data processing in smart grid[J]. Power System Technology, 2013, 37(4): 927-935 (in Chinese).
[4] K. Nishiya, J. Hasegawa. Dynamic State Estimation Including Anomaly Detection and Identification for Power Systems. IEE Proceedings C - Generation, Transmission and Distribution, vol. 129, no. 5, pp. 192-198, September 1982.
[5] J. C. S. Souza, A. M. Leite da Silva and A. P. Alves da Silva. Data Visualization and Identification of Anomalies in Power System State Estimation Using Artificial Neural Networks. IEE Proceedings - Generation, Transmission and Distribution, vol. 144, no. 5, pp. 445-455, Sep 1997.
[6] Thorp J S, Phadke A G, Karimi K J. Real time voltage-phasor measurement for static state estimation. IEEE Trans on Power Apparatus and Systems, 1985, PAS-104(11): 3098-3106.
[7] S. Dasgupta, M. Paramasivam, U. Vaidya, V. Ajjarapu. Real-Time Monitoring of Short-Term Voltage Stability Using PMU Data. IEEE Transactions on Power Systems, vol 28, pp.3702-3711, November 2013.
[8] Fahd Hashiesh, Hossam E. Mostafa, Abdel-Rahman Khatib, Ibrahim Helal, Mohamed M. Mansour. An Intelligent Wide Area Synchrophasor Based System for Predicting and Mitigating Transient Instabilities. IEEE Transactions on Smart Grid, vol 3, pp.645-652, June 2012.
[9] Innocent Kamwa, S. R. Samantaray, Geza Joos. Compliance Analysis of PMU Algorithms and Devices for Wide-Area Stabilizing Control of Large Power Systems. IEEE Transactions on Power Systems, vol 28, pp.1766-1778, May 2013.
[10] Yuri V. Makarov, Pengwei Du, Shuai Lu, Tony B. Nguyen, Xinxin Guo, J. W. Burns, Jim F. Gronquist, M. A. Pai. PMU-Based Wide-Area Security Assessment: Concept, Method, and Implementation. IEEE Transactions on Smart Grid, vol 3, pp.1325-1332, September 2012.
[11] Jie Yan, Chen-Ching Liu, Umesh Vaidya. PMU-Based Monitoring of Rotor Angle Dynamics. IEEE Transactions on Power Systems, vol 26, pp.2125-2133, November 2011.
[12] Anamitra Pal, Gerardo A. Sanchez-Ayala, Virgilio A. Centeno, James S. Thorp. A PMU Placement Scheme Ensuring Real-Time Monitoring of Critical Buses of the Network. IEEE Transactions on Power Delivery, vol. 29, pp. 510-517, April 2014.
[13] T. Nagao, K. Tanaka, K. Takenaka. Development of Static and Simulation Programs for Voltage Stability Studies of Bulk Power System. IEEE Transactions on Power Systems, vol. 12, pp.273-281, February 1997.
[14] Joshua B. Tenenbaum, Vin de Silva, John C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, vol 290, pp.2319-2323, December 2000.
[15] Sam T. Roweis and Lawrence K. Saul. Nonlinear Dimensionality Reduction by Locally Linear Embedding. Science, vol 290, pp.2323-2326, December 2000.
[16] Laurens van der Maaten. Drtoolbox. https://lvdmaaten.github.io/drtoolbox/