fault-2

This paper introduces a novel method for detecting and classifying faults in power transmission lines using a convolutional sparse autoencoder (CSAE) that automatically learns features from voltage and current signals. The method employs unsupervised feature learning to enhance generalizability and reduce the time-consuming process of feature design, making it practical for online transmission line protection. Testing results demonstrate the method's speed, accuracy, and robustness against noise and measurement errors.


1748 IEEE TRANSACTIONS ON SMART GRID, VOL. 9, NO. 3, MAY 2018

Detection and Classification of Transmission Line Faults Based on Unsupervised Feature Learning and Convolutional Sparse Autoencoder
Kunjin Chen, Jun Hu, and Jinliang He, Fellow, IEEE

Abstract: We present in this paper a novel method for fault detection and classification in power transmission lines based on convolutional sparse autoencoder. Contrary to conventional methods, the proposed method automatically learns features from a dataset of voltage and current signals, on the basis of which a framework for fault detection and classification is created. Convolutional feature mapping and mean pooling are implemented in order to generate feature vectors with local translation-invariance for half-cycle multi-channel signal segments. Fault detection and classification are achieved by a softmax classifier using the feature vectors. Further, the proposed method is tested under different sampling frequencies and signal types. The generalizability of the proposed method is also verified by adding noise and measurement errors to the data. Results show that the proposed method is fast and accurate in detecting and classifying faults, and is practical for online transmission line protection for its high robustness and generalizability.

Index Terms: Convolutional sparse autoencoder (CSAE), fault detection, fault classification, transmission lines, unsupervised learning.

Manuscript received March 14, 2016; revised June 21, 2016; accepted August 7, 2016. Date of publication August 10, 2016; date of current version April 19, 2018. This work was supported in part by the State Key Development Program (973 Program) for Basic Research of China under Grant 2013CB228206, and in part by the National Natural Science Foundation of China under Grant 51429701. Paper no. TSG-00329-2016. (Corresponding author: Jinliang He.)
The authors are with the State Key Laboratory of Power Systems, Department of Electrical Engineering, Tsinghua University, Beijing 100084, China (e-mail: [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TSG.2016.2598881
1949-3053 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

Fault detection and classification are two important aspects of power transmission line protection. Over the years, researchers have been seeking to realize fast and accurate detection and classification of faults in transmission lines using various methods, so that the faulted system can be protected from possible destructive effects caused by the fault. Further, the information provided by fault detection and classification can greatly facilitate the location of the fault, thus reducing the fault clearing time.

The extraction of features from the voltage and current signals, which is implemented purposefully, helps researchers better understand the nature and characteristics of the fault detection and classification tasks, and they can thus fulfill these tasks in a more consistent and effective manner [1]. Transforming signals from time domain to frequency domain is frequently used for feature extraction. Discrete Fourier transform (DFT), a widely adopted tool for signal analysis, is used in the forms of full cycle discrete Fourier transform (FCDFT) [2], [3] and half cycle discrete Fourier transform (HCDFT) [4], [5]. Discrete wavelet transform (DWT) is used by researchers to obtain information in certain frequency ranges [6]–[8]. The DWT coefficients of different frequency ranges (decomposition levels) are usually used to generate features. S-transform (ST) is used as it reveals local spectral characteristics [9], [10]. A number of features can be extracted out of the S-matrix produced by ST. In addition to frequency domain-based methods, researchers also adopt modal transformations such as Clarke transformation to extract useful features [11], [12]. Further, dimensionality reduction methods such as principal component analysis (PCA) can be used to produce more suitable inputs for certain fault classification methods [13], [14]. Though the above-mentioned feature extraction techniques have been applied to different types of transmission line systems with different configurations, much prior knowledge of the specific system configuration is required and the process of determining the implementation details oftentimes needs repeated modification and adjustment. Thus, implementing these techniques can be quite time-consuming and lacks generalizability. Taking DWT as an example, researchers first need to determine which mother wavelet and which decomposition levels to use before they can extract the features. As there are a large number of mother wavelets (e.g., Coiflets, Meyer, Daubechies and Symlets) to choose from and the number of decomposition levels is affected by the sampling frequency, it is hard to tell which combination of the options shall be chosen, not to mention the fact that some mother wavelet families have a series of wavelets and different features can be extracted out of the coefficients of different decomposition levels (e.g., energy and maximum of coefficients at one decomposition level) [1].

On the basis of feature extraction, the task of fault detection can be fulfilled by setting thresholds for the extracted features. Similar to fault detection, the task of fault classification can also be done by setting a series of if-then conditions with preset thresholds [15], [16]. Other methods mainly adopt artificial intelligence-based models. Artificial neural networks (ANNs) including feedforward neural network (FNN), radial basis function network (RBFN) and probabilistic neural network (PNN) are extensively used to classify different types of faults [7], [13], [17], [18]. Support vector machines (SVMs) are used for their structural risk-minimizing nature [19], [20]. Fuzzy logic based methods such as fuzzy-neuro approaches and adaptive-network-based fuzzy inference systems (ANFISs) are also adopted by some researchers [4], [21]. Other methods used for fault classification include decision trees (DTs) and random forests (RFs) [22], [23]. Although the aforementioned classification models are well-developed and have been proven effective, the combination of a certain feature extraction technique and a certain classification model is almost arbitrary, thus taking researchers much time to evaluate the performance of different classification models.

As pointed out previously, the above-mentioned methods require hand-designed features specific for system configurations and parameters. Though favorable performance may be achieved, the process of feature design and feature selection is often time-consuming and lacks generalizability. Thus, it is desirable to implement a feature extraction method that does not require much prior knowledge, so that the method can be generalized to different cases without making significant modifications. Recent progress in the development of machine learning has been receiving ever-growing attention in that the models and algorithms are increasingly able to automatically extract features with multiple levels of abstraction from large amounts of data [24]. In fields such as computer vision, speech recognition, and natural language processing, researchers have been able to build end-to-end models with much better performances than traditional models which require hand-designed features [25]–[27]. One of the key elements in the implementation of the models is that unlabeled data can be used by one-layer structures such as sparse autoencoder (SAE) to help extract features and pre-train the models used for classification tasks [28]. Further, the features extracted by SAE can be used by convolutional neural networks (CNNs) which far outperform classical methods (e.g., methods based on scale-invariant feature transform (SIFT) and SVM) in the image classification task [25]. In summary, adopting the unsupervised feature learning process has a threefold advantage: (i) the time-consuming process of feature design can be replaced by automatic feature learning with increased generalizability, (ii) large amounts of unlabeled data collected by online monitoring devices can be fully utilized, and (iii) the feature extracting approach is fully compatible with powerful machine learning models including CNN.

This paper presents a novel method for fault detection and classification inspired by the above-mentioned studies and concepts. The unsupervised feature learning by SAE is used to automatically extract features from voltage and current signals, as introduced in [29]. Instead of utilizing the signals individually, we combine three phase voltage and current signals as a multi-channel signal. A related work using CNN to analyze multi-channel sequence is proposed in [30]. Rather than directly using the CNN model, we propose a framework based on convolutional sparse autoencoder (CSAE) [31] and softmax classifier to fulfill the tasks of fault detection and classification in power transmission lines (adding a softmax classifier as the last layer of the network is standard practice for implementing CNN [25]). The training strategy using data from different time ranges and the computationally efficient testing strategy with a filtering operation are also proposed. The performance of the proposed method is tested with different sampling frequencies and signal types. Discussion of the proposed method in the presence of noise and measurement errors is then provided with some slight modifications to the original implementation. Further, we compare the proposed method with existing methods. The application of the proposed method in smart grids is also presented.

II. UNSUPERVISED LEARNING OF SPARSE MULTI-CHANNEL FEATURES FROM VOLTAGE AND CURRENT SIGNALS

A. System Studied and Data Acquisition

A simple three-phase power system is studied in this paper as shown in Fig. 1. The length of the 220 kV transmission line is 200 km and the system frequency is 50 Hz. The transmission line connects two sources and has positive sequence impedance $Z_1 = 4.76 + j59.75\ \Omega$ and zero sequence impedance $Z_0 = 77.70 + j204.26\ \Omega$. The system is modeled in MATLAB/Simulink, with which the data used in this paper is simulated. The three phase voltage and current signals are collected by the relay employed at source 1 at the sampling frequency of 20 kHz.

Fig. 1. The studied system with sources at both ends.

By varying the tunable system parameters, a dataset of voltage and current signals is generated. The system parameters used for simulation are listed in Table I. Concretely, the fault distance is the distance between fault point and the relay, and the pre-fault power angle is the phase difference between source 1 and source 2 when the fault occurs. As we try all combinations of the parameters, a total of 24948 data samples are collected in the dataset. Moreover, as the sampling frequency is 20 kHz, we are able to test the effect of sampling frequencies that are less than or equal to 20 kHz.

TABLE I
SYSTEM PARAMETERS USED FOR THE SIMULATION IN MATLAB/SIMULINK


B. Unsupervised Feature Learning by Sparse Autoencoder

In this paper, we use the SAE introduced in [32] to achieve unsupervised feature learning. Concretely, an SAE has a visible layer, a hidden layer and a reconstruction layer, and the training process ensures that the output vector corresponding to the reconstruction layer restores the input vector as much as possible for each unlabeled data sample $x \in \mathbb{R}^n$ in the training dataset. Thus, when the training process is properly completed, the hidden nodes within the hidden layer are expected to give effective feature representations of the data in the training dataset. Given an input vector $x$, the output vector $h(x)$ of an SAE is calculated as

$$h(x) = W_2 f(W_1 x + b_1) + b_2 \tag{1}$$

where $f(z) = \frac{1}{1+\exp(-z)}$ is the nonlinear sigmoid activation function, $W_1$ is the $n_h \times n$ weight matrix associating the visible layer and the hidden layer, $b_1$ is the bias vector for the visible layer, $W_2$ is the $n \times n_h$ weight matrix associating the hidden layer and the reconstruction layer, and $b_2$ is the bias vector for the hidden layer. As our goal is to minimize the difference between $x$ and $h(x)$, a cost function capable of measuring this difference for the entire training dataset is needed. Concretely, the cost function $J$ consists of three terms, namely the squared error term, the weight decay term and the sparsity penalty term [32]. For a training dataset with $m$ data samples, the detailed definition of the cost function is

$$J = \frac{1}{2m}\sum_{i=1}^{m}\left\|h\left(x^{(i)}\right) - x^{(i)}\right\|^2 + \frac{\lambda}{2}\sum_{i=1}^{2}\left\|W_i\right\|_F^2 + \beta\sum_{j=1}^{n_h} D\left(\rho\,\|\,\hat{\rho}_j\right) \tag{2}$$

where the first term measures the total squared error between the input and output data and the second term is the weight decay term used to limit the magnitude of the weights so that the autoencoder is not prone to overfitting. The third term is the sparsity penalty term, in which $\beta$ controls the weight of this term and $D(\rho\,\|\,\hat{\rho}_j)$ is the Kullback-Leibler divergence between $\hat{\rho}_j$ and $\rho$. More specifically, $\hat{\rho}_j$ is the average activation of hidden node $j$ with regard to all input data in the training dataset, $\rho$ is the sparsity parameter and $D(\rho\,\|\,\hat{\rho}_j)$ is calculated as [33]:

$$D\left(\rho\,\|\,\hat{\rho}_j\right) = \rho\log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j} \tag{3}$$

By setting a very small sparsity parameter $\rho$ (usually not more than 0.1), we can make sure that for a given input vector $x$, the activation level of the majority of the hidden nodes is close to zero, while a small proportion of the hidden nodes are highly activated. This indicates that we can easily find some highly relevant feature representations of the input vector $x$ by looking at the activations of the hidden nodes in the hidden layer.

To train the SAE, we optimize the cost function $J$ by iteratively updating the weight and bias values using the back-propagation algorithm [34]. After a certain number of iterations, $J$ is expected to converge to a satisfactory local optimum, and the unsupervised feature learning by SAE is achieved.

The schematic diagram of the unsupervised feature learning procedure is illustrated in Fig. 2. In this paper, the voltage and current signals are displayed using grayscale images, such that the correlations across the channels can be clearly observed. First of all, we randomly cut out a large number of patches from the training dataset. A zero-phase component analysis (ZCA) whitening transform is then applied to the patches, the theoretical foundations of which can be found in [35] and [36]. Concretely, for a given $d \times m$ matrix $X$ containing $m$ $d$-dimensional data samples, we use $U = (XX^T)^{-1/2} = PD^{-1/2}P^T$ ($XX^T$ can always be represented as $PDP^T$ using some orthogonal matrix $P$ and diagonal matrix $D$) to transform $X$ to $X_Z$:

$$X_Z = UX \tag{4}$$

We then replace $X$ with $X_Z$, so that the dimensions are uncorrelated with one another and the dimensions all have the same variance [36]. After applying ZCA to the patches cut out from the training dataset, the pixels within the patches become uncorrelated and have the same variance, namely 1.

Fig. 2. Schematic diagram of learning multi-channel features of voltage and current signals by SAE.

For each channel, the brightest pixel reaches the positive maximum (crest), whereas the darkest pixel reaches the negative maximum (trough). As we use all six channels of voltage and current signals simultaneously to extract the features, the size of each patch is $6 \times l_P$, $l_P$ being the length of the patches. Thus, both $x$ and $h(x)$ in Fig. 2 are $6l_P \times 1$ vectors. Specifically, at the sampling frequency of 20 kHz, if $l_P$ is set to 30, then a patch covers a time span of 1.5 ms. After obtaining the patches from the signals, an SAE is trained in accordance with the above-mentioned method, and the hidden nodes can, therefore, learn the features which, when combined, represent the intrinsic local characteristics of the multi-channel signals within the dataset. As introduced in [32], the element corresponding to the $j$th hidden node of the $i$th feature visualization vector is calculated as

$$f_i^j = \frac{W_1^{i,j}}{\sqrt{\sum_{j=1}^{n}\left(W_1^{i,j}\right)^2}} \tag{5}$$

where $f_i^j$ is the $j$th element of the $i$th feature visualization vector $f_i$, and $W_1^{i,j}$ is the element at the $i$th row and $j$th column of $W_1$. We can then reshape the feature visualization vectors into $6 \times l_P$ matrices for visualization. Examples of 100 extracted $6 \times 30$ features ($l_P = 30$) corresponding to 100 hidden nodes are displayed in Fig. 3. The features are extracted from 250000 patches cut from the training dataset. Specifically, the channels correspond to three phase voltage and three phase current signals, respectively (top to bottom). For instance, the second half of feature 18 indicates a severe fluctuation in the voltage and current signals. Similar fluctuation patterns can also be seen in some other features, such as features 7, 8, 19 and 29. Features 2, 28, 35 and 56 are comparatively more moderate, in which we can see gradual changes in certain channels. In Section III, these features are used to facilitate the implementation of convolutional sparse autoencoder (CSAE).

Fig. 3. Examples of extracted 6 × 30 features.

III. DETECTION AND CLASSIFICATION OF FAULTS BASED ON CONVOLUTIONAL SPARSE AUTOENCODER

A. The Framework for Fault Detection and Classification

In this paper, we propose and implement a framework based on CSAE to complete both the fault detection and classification tasks. A fault diagnosis system is built on the basis of this framework. Concretely, the system output is expected to be “non-faulty” when no fault occurs. A fault is detected when the output of the system changes to a specific fault type.

The framework for fault detection and classification based on CSAE is demonstrated in Fig. 4. Given a multi-channel signal, a $6 \times l_W$ window moves along the signal and an output is given for each windowed signal segment. Concretely, when the moving window arrives at the $i$th column of the multi-channel signal, pixels within the windowed signal segment form a $6 \times l_W$ matrix, whose first column and last column are denoted as $p_{i-l_W+1}$ and $p_i$, respectively. Correspondingly, the system output (the classified fault type) of this matrix is denoted as $t^{(i)}$. After the output $t^{(i)}$ is given, the window moves one column forward, so that $p_{i-l_W+1}$ is excluded from the matrix and $p_{i+1}$ is included as the last column. In the case of online monitoring, such a procedure is uninterruptedly repeated.

For each windowed multi-channel signal segment ($6 \times l_W$ matrix), we first use the features extracted by the SAE to map the $6 \times l_W$ matrix into convolved feature vectors. Each feature $F_r$ ($r = 1, 2, \ldots, k$) is a $6 \times l_P$ matrix, and all the features move forward one column at a time through the window while calculating dot products with all the patches they encounter. We restrain the features within the two ends of the window as they move along, so the size of each convolved feature vector is, therefore, $1 \times (l_W - l_P + 1)$. Also note that the features have been ZCA whitened previously so that the same whitening process is also applied to the patches prior to calculating the dot products. Thus, with $k$ features available, we can get $k$ convolved feature vectors in this feature mapping process, namely $m_1$ to $m_k$. Despite the fact that we need to obtain $k$ convolved feature vectors with the size $1 \times (l_W - l_P + 1)$ for each window, which involves many calculations when completed alone, the computational burden can be greatly reduced when we compute feature mapping for multiple successive windows. Simply put, for the current window, all but the last elements of the convolved feature vectors can be directly obtained from the previous window. Thus, we only need to calculate the last element of the convolved feature vectors, which takes only $k$ convolutional operations. This significant reduction in computational burden undoubtedly facilitates the online implementation of the proposed method.

After feature mapping, the convolved feature vectors then go through the pooling stage to generate shortened feature representations. With the help of this pooling operation, the model is less prone to overfitting and becomes more translation-invariant [37]. In this paper, we implement the simple mean pooling by calculating the mean values of the $1 \times s_p$ disjoint segments within the convolved feature vectors, $s_p$ being the number of adjacent elements to be pooled together. It should be noted that it is acceptable if the length of the convolved feature vectors is not divisible by $s_p$, in which case the last few elements of the vectors are abandoned. After pooling all the $k$ convolved feature vectors, we get $k$ pooled convolved feature vectors, namely $d_1$ to $d_k$. The length of the pooled convolved feature vectors, $n_p$, is determined by rounding down $(l_W - l_P + 1)/s_p$, that is,

$$n_p = \left\lfloor \frac{l_W - l_P + 1}{s_p} \right\rfloor \tag{6}$$
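The feature mapping and mean pooling steps of this subsection can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `convolved_features` and `mean_pool`, the optional `whiten` argument, and the random placeholder data standing in for a real signal window and for the learned SAE features are all hypothetical. The sizes ($l_W = 200$, $l_P = 30$, $s_p = 5$, $k = 100$) are the ones reported in the text for 20 kHz.

```python
import numpy as np


def convolved_features(window, features, whiten=None):
    """Map a 6 x l_W window to k convolved feature vectors of length l_W - l_P + 1.

    window:   array of shape (6, l_W)
    features: array of shape (k, 6, l_P), e.g. reshaped rows of the SAE matrix W1
    whiten:   optional (6*l_P, 6*l_P) whitening matrix applied to each patch first
    """
    k, _, l_p = features.shape
    n_pos = window.shape[1] - l_p + 1
    # Every 6 x l_P patch of the window, flattened to one row: (n_pos, 6*l_P).
    patches = np.stack([window[:, c:c + l_p].ravel() for c in range(n_pos)])
    if whiten is not None:
        patches = patches @ whiten.T
    # Dot product of every feature with every patch: (k, n_pos).
    return features.reshape(k, -1) @ patches.T


def mean_pool(conv, s_p):
    """Average disjoint groups of s_p adjacent elements, dropping any remainder."""
    k, n = conv.shape
    n_p = n // s_p  # equation (6)
    return conv[:, :n_p * s_p].reshape(k, n_p, s_p).mean(axis=2)


rng = np.random.default_rng(0)
l_w, l_p, s_p, k = 200, 30, 5, 100
window = rng.standard_normal((6, l_w))        # placeholder signal segment
features = rng.standard_normal((k, 6, l_p))   # placeholder for learned features

conv = convolved_features(window, features)   # k vectors of length 171
pooled = mean_pool(conv, s_p)                 # k vectors of length 34
stacked = pooled.ravel()                      # 3400 values fed to the classifier
print(conv.shape, pooled.shape, stacked.shape)
```

For successive windows only the newest patch position has to be evaluated, since the remaining columns of `conv` can be shifted over from the previous window, which is the saving the text describes.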


Fig. 4. The framework for fault detection and classification using CSAE and softmax classifier.

Further, the pooled convolved feature vectors are stacked into a long feature vector $s^{(i)}$, which is used as the input vector of a softmax classifier. The length of $s^{(i)}$, $n_s$, is calculated as

$$n_s = k n_p + 1 \tag{7}$$

where the additional dimension corresponds to the bias term used in the softmax classifier model. Concretely, softmax classifiers are based on the softmax regression model, which is an extension of the logistic regression model and is able to solve multi-class problems [38]. For the softmax classifier, the probability of the $i$th stacked input vector $s^{(i)}$ belonging to class $j$, $P(Y = j \mid s^{(i)})$, is calculated as

$$P\left(Y = j \mid s^{(i)}\right) = \frac{e^{\theta_j^T s^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^T s^{(i)}}} \tag{8}$$

where $Y$ is the stochastic variable of the output class corresponding to $s^{(i)}$ and $\theta_j \in \mathbb{R}^{n_s}$ is the parameter vector for class $j$, $j = 1, 2, \ldots, K$. Consequently, for the fault classification problem with 11 fault types, an 11-dimensional vector containing all 11 probabilities is given as the output of the softmax classifier. We then assign $x^{(i)}$ to the fault type with the highest probability:

$$t^{(i)} = \arg\max_{j} P\left(Y = j \mid s^{(i)}\right) \tag{9}$$

Likewise, the softmax classifier is trained using the training dataset by iteratively optimizing the cost function

$$J_s(\theta) = -\left[\sum_{i=1}^{m}\sum_{j=1}^{K} 1\left\{y^{(i)} = j\right\}\log\frac{e^{\theta_j^T s^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^T s^{(i)}}}\right] + \lambda_s\sum_{i=1}^{K}\left\|\theta_i\right\|^2 \tag{10}$$

where $y^{(i)}$ is the actual class label of $s^{(i)}$ and $1\{y^{(i)} = j\}$ is defined as

$$1\left\{y^{(i)} = j\right\} = \begin{cases} 1, & \text{if } y^{(i)} = j \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

A weight decay term is also added to the cost function, whose weight decay parameter is denoted as $\lambda_s$.

B. Training and Testing Strategy

In previous studies, the signal segments used to train the features and the classifier generally correspond to the same time range. However, as we depend only on the output of the softmax classifier to decide whether a fault has occurred, using training data corresponding to the same time range is insufficient. In this light, we use data corresponding to several different time ranges to form the training dataset. Further, considering the dynamic process during which the post-fault signal starts to appear at the end of the window and gradually stretches into the window, it is difficult for the classifier to distinguish among different fault types at the early stage due to the lack of information. Thus, we put some data with post-fault signal appearing at the latter half of the window into the training dataset and label them as “non-faulty”, so that the classifier can intentionally ignore the data with insufficient fault-related information and only start to classify the faults when enough information is available.

Concretely, we cut off multi-channel signal segments corresponding to 11 different time ranges to form the training dataset and the test dataset. The length of the time windows is 200 with a sampling frequency of 20 kHz (i.e., half cycle). For each multi-channel signal, we denote the column of fault inception as $p_j$ and the time range starting with this column as $[j, j + 199]$. Five time ranges containing pre-fault information are similarly denoted as $[j - 120, j + 79]$, $[j - 80, j + 119]$, $[j - 60, j + 139]$, $[j - 40, j + 159]$ and $[j - 20, j + 179]$. Further, the rest of the time ranges with only post-fault information are denoted as $[j + 20, j + 219]$, $[j + 40, j + 239]$, $[j + 60, j + 259]$, $[j + 100, j + 299]$ and $[j + 200, j + 399]$. To create the dataset, we use all the 24948 simulated multi-channel signals. A total of 274428 multi-channel signal segments are cut off from the simulated signals. Then, 70% of the segments are randomly assigned to the training dataset (194199 segments), and the remaining 30% are assigned to the test dataset (83229 segments). Each segment is given a label indicating its fault type (for simplicity, “non-faulty” is also considered as a fault type). The segments of the time range $[j - 120, j + 79]$ are all labeled as “non-faulty”.

An online test dataset with 200 full-length multi-channel signals for each type is also prepared to validate the real-time performance of the proposed system. With this online test dataset, we can test the robustness of the system and assess its fault detection speed. Despite the fact that the training dataset only provides information up to 1 cycle after fault inception, the system does not stop giving fault classification results until 2 cycles after fault inception.

In addition, a filtering operation is applied to the output of the online fault diagnosis system. Because the multi-channel signal segments of the time range $[j - 120, j + 79]$ are labeled as “non-faulty”, the boundary between “faulty” and “non-faulty” states becomes indistinct to some extent. Thus, for the sampling frequency of 20 kHz, the system output is filtered in such a way that any change in the output is confirmed only when the changed output remains the same for 25 consecutive sample points (1.25 ms). Therefore, any change in the output that lasts fewer than 25 consecutive sample points is filtered. This filtering operation would certainly delay the detection of faults, but it can greatly improve the online classification performance of the fault diagnosis system. The detailed performance of the proposed method and fault diagnosis system is presented in Section IV.

C. Selection of Parameters Used in the Models

1) Sparse Autoencoder: The SAE is used to implement the unsupervised feature learning. Concretely, a total of 250000 patches are cut from the 6 × 200 signal segments from the training dataset for the learning process. The hidden layer of the SAE has 100 hidden nodes. The sparsity parameter $\rho$, weight decay parameter $\lambda$ and sparsity penalty parameter $\beta$ are 0.1, 0.003 and 5, respectively.

2) Feature Mapping and Pooling: The window length $l_W$ and patch length $l_P$ are 200 and 30, hence the length of each convolved feature vector is 171. Moreover, we set the pooling size $s_p$ to 5. Thus, the length of the pooled convolved feature vectors is 34.

3) Softmax Classifier: As we have 100 features and the length of each pooled convolved feature vector is 34, the input size of the softmax classifier is 3400. The weight decay parameter $\lambda_s$ is 0.0001. Considering the limited computational ability of the machine used for the experiments, 45000 signal segments are randomly taken out from the complete training dataset with 194199 signal segments.

IV. RESULTS AND DISCUSSION

A. Performance of the Proposed Method

The performance of the proposed method for online fault detection and classification is shown in Fig. 5. We randomly select 50 multi-channel signal samples for each fault type and plot the system output for each signal sample within the time range between 0 ms and 50 ms. For clarity, the signals of four different fault categories are separately plotted in Fig. 5(a), Fig. 5(b), Fig. 5(c), and Fig. 5(d). The “non-faulty” signals are also plotted in Fig. 5(d). The fault inception time is 10 ms, and the system output for each signal sample is expected to be “non” before the fault is detected. To clearly display the system output for each signal sample, we shift all the outputs slightly up or down. The constants added to the outputs obey the normal distribution $N(0, 0.01)$ given that the difference between any two neighboring fault types on the vertical axis is 1. As the filtering operation is applied to the system outputs, we can see that all signal samples are correctly classified into the 11 fault types once fault detection is securely done. No mistake is observed even in the time range between 30 ms and 50 ms, for which no training data is provided. Further, as is seen in Fig. 5, the fault detection time for all fault types is basically between 5 and 10 ms. Considering the fact that we employ the strategy to classify signal segments whose post-fault signal proportion is lower than 40% as “non-faulty”, such response speed is quite satisfactory.

The average time used for fault detection of the online test dataset (200 signals for each fault type) is listed in Table II. As is depicted in the table, the average time used for fault detection is between 6 ms and 7 ms. Generally speaking, the average detection time for ab-g, ac-g, bc-g and abc-g faults is longer than for other fault types. The “non-faulty” type is not listed in this table, as the system output of this type is expected to remain unchanged.

To validate the performance of the softmax classifier, the classification accuracy for different fault types is also calculated and depicted in Table III. As mentioned previously, the training dataset contains 45000 multi-channel signal segments. The test dataset used here is also a subset of the complete test dataset with 83229 multi-channel signal segments, which is generated by randomly taking out 20000 segments from the complete dataset. The overall classification accuracy is 99.74%, and the classification accuracies for all 11 types are higher than 99.29%. This result shows that the proposed method is capable of classifying faults with quite high accuracies.

B. The Effect of Sampling Frequency and Signal Type

The previous results are obtained when the sampling frequency of the multi-channel signals is 20 kHz. Under the restriction of the data acquisition equipment, the maximum sampling frequency can be much lower than 20 kHz in practice. Further, in cases where voltage and current signals are not available at the same time, we may only use the voltage signals


TABLE II
AVERAGE TIME OF FAULT DETECTION FOR DIFFERENT FAULT TYPES

TABLE III
CLASSIFICATION ACCURACY OF THE PROPOSED
METHOD FOR DIFFERENT FAULT TYPES

of the classification accuracies are then calculated. As different
sampling frequencies are used, the window length lW
and patch length lP are modified accordingly so that the time
durations corresponding to lW and lP remain as nearly constant
as possible (as these parameters have to be integers, we
round them up or down where necessary). The values of
window length lW for the sampling frequencies (1.25 kHz to
20 kHz) are 13, 25, 50, 100, and 200, whilst the values of
patch length lP are 2, 4, 8, 15, and 30, respectively. Moreover,
the values for the pooling size sp for different frequencies are
set to 1, 2, 3, 4, and 5 so that the pooled convolved feature
vectors are comparable in length (note that the pooling
size also has to be an integer). As a result, the input sizes for
the softmax classifier are 1200, 1100, 1400, 2100, and 3400,
given that the number of hidden nodes is 100.

Fig. 5. Examples of output signals of the fault detection and classification system for different types of faults. Each fault type has fifty examples whose original outputs are added with different small biases for the purpose of illustration.
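The softmax input sizes quoted in the text follow from the window length, patch length and pooling size. The sketch below reproduces that arithmetic; the function name is hypothetical, and the use of non-overlapping pooling that drops any remainder is an assumption inferred from the quoted lengths (171 convolved values pooled by 5 give 34), not something the paper states explicitly.

```python
def softmax_input_size(window_len, patch_len, pool_size, n_features=100):
    """Return (convolved_len, pooled_len, input_size) for one configuration."""
    # A "valid" 1-D convolution of a patch over a window.
    convolved_len = window_len - patch_len + 1
    # Assumption: non-overlapping pooling, incomplete trailing group dropped.
    pooled_len = convolved_len // pool_size
    return convolved_len, pooled_len, n_features * pooled_len


# Sampling frequency in kHz -> (l_W, l_P, s_p), values taken from the text.
CONFIGS = {
    1.25: (13, 2, 1),
    2.5: (25, 4, 2),
    5: (50, 8, 3),
    10: (100, 15, 4),
    20: (200, 30, 5),
}

if __name__ == "__main__":
    for fs_khz, (l_w, l_p, s_p) in CONFIGS.items():
        print(fs_khz, softmax_input_size(l_w, l_p, s_p))
```

Under these assumptions the five configurations give the input sizes listed in the text, which is a quick check that the rounded parameter values are mutually consistent.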
The results of classification accuracy with different sam-
pling frequencies and signal types are shown in Fig. 6. First
of all, the classification accuracy increases as sampling fre-
or the current signals. Thus, we use the same training dataset quency increases for all 3 schemes. This is in line with the
and test dataset to examine how the proposed method performs expectation, as a higher sampling frequency provides more
with different sampling frequencies and signal types. In addi- information with respect to the specific fault type. What also
tion to the scheme in which both voltage and current signals meets our expectation is that the scheme using both voltage
are used, two other schemes using only the voltage signals or and current signals (Scheme I) has the highest accuracies for
the current signals are implemented. The sampling frequen- all sampling frequencies. Specifically, at the lower sampling
cies used are 1.25 kHz, 2.5 kHz, 5 kHz, 10 kHz, and 20 kHz, frequencies such as 1.25 kHz and 2.5 kHz, the advantage of
respectively. Each classification accuracy result is obtained by scheme I over scheme II and scheme III is more significant
repeating the implementation 5 times, and the mean values compared to higher sampling frequencies. Further, it is noticed

TABLE IV
PARAMETERS USED FOR THE SIMULATION IN PSCAD/EMTDC
TO GENERATE DATASET II

Zs = 9.19 + j74.76 Ω, and the transmission line has positive
sequence impedance Z1 = 5.38 + j84.55 Ω and zero
sequence impedance Z0 = 64.82 + j209.52 Ω. The parameters
Fig. 6. The classification accuracies with different sampling frequencies for scheme I (voltage and current signals), scheme II (current signals only) and scheme III (voltage signals only).

used for simulation are listed in Table IV. Each fault type has
540 simulation results, thus there are 5400 simulation results
(the “non-faulty” type is not considered here). Further, 59400
waveform segments are generated (11 different time ranges)
and are divided into a training set (41580 segments, 70%) and
that the performance of scheme II is better than scheme III a test set (17820 segments, 30%). We now refer to the previ-
at lower sampling frequencies, whereas at higher frequencies ous dataset as dataset I and refer to the dataset introduced in
scheme III has higher accuracies. At 5 kHz, both schemes this section as dataset II. In the following implementations of
have roughly the same performance. One explanation for this the proposed method, we use dataset I to extract the features
is that the current signals contain more low-frequency infor- and train the softmax classifier with dataset II. It should be
mation about the specific fault type than the voltage signals, noted that the feature extraction stage is unsupervised as the
while the voltage signals contain more fault-induced transients emphasis of this stage is to learn useful and universal feature
that are helpful to reveal the fault type. When both signals are representations for the convolutional operations. Thus, with
used, high classification accuracy can be achieved across the more data available, the effectiveness of this stage is expected
frequency range we consider, as both aspects of fault type- to be improved. As the dataset used for training the softmax
specific information can be fully used. It is worth noticing classifier has to be labeled, it is acceptable if the size of the
that as we try to keep some of the parameters consistent for dataset is relatively small, as has been discussed in [29].
all 3 schemes to ensure that the results are comparable, the Denoising Sparse Autoencoder (DSAE) is implemented to
performance of scheme II and scheme III may not represent enhance the performance of the proposed method by extract-
their optimal capability. Nevertheless, we can see that the clas- ing noise-tolerant features. In [40], additive Gaussian noise is
sification accuracy of scheme I is more than 98% even at lower used to extract useful robust feature representations and the
sampling frequencies, which validates the effectiveness of the denoising mechanism is discussed extensively. In this paper,
proposed method. we corrupt the input data of DSAE with white Gaussian
noise (WGN) to produce data with a specific signal to noise
C. Performance of the Proposed Method With Noise and ratio (SNR) and train the DSAE to reconstruct
Measurement Errors the uncorrupted data. More specifically, we corrupt the
It is of great significance to ensure that the method used to input data x with WGN and produce $\tilde{x}$. The reconstruction
detect and classify faults can withstand noise and measurement of $\tilde{x}$, $h(\tilde{x})$, is then calculated using (1). Consequently, the
errors. In addition, the generalizability of the feature extracting squared error term $\sum_{i=1}^{m} \| h(x^{(i)}) - x^{(i)} \|^2$ in (2) is replaced by
process needs verification. In this light, a new dataset of $\sum_{i=1}^{m} \| h(\tilde{x}^{(i)}) - x^{(i)} \|^2$. We then use the features extracted by
voltage and current signals corresponding to different fault DSAE for the convolutional operations. Comparison of classification
types is simulated using PSCAD/EMTDC. Noise and measurement accuracies of features extracted by SAE and DSAE
errors are separately considered in the first place, is presented in Fig. 7. We can see that the classification
and we then compare the performance of the proposed method accuracies of DSAE-based implementations are above 98%,
with some existing methods in the presence of both noise and while those of SAE-based implementations drop much faster
measurement errors. as SNR decreases. This result indicates that feature extraction
For the simulation setup in PSCAD/EMTDC, a transmission by DSAE greatly reduces the impact of WGN, given that we
line model similar to the model in Fig. 1 is used. The transmis- have prior knowledge of the SNR of the signals collected.
sion line adopts the frequency dependent phase-domain model, Further, two types of representative measurement errors are
which is theoretically the most accurate model as the fre- considered in this paper. The first type of error (type I error) is
quency dependence of internal transformation matrices can be “consecutive zero”, i.e., a section of the signal becomes zero.
represented [39]. In addition, the three phases of the transmis- The second type of error (type II error) is “consecutive high
sion line are untransposed, as opposed to the ideally transposed value”, which refers to the phenomenon in which the signal
line used for the model simulated in MATLAB/Simulink. rises to a high value (either positive or negative) and keeps
Concretely, the impedance of the sources at both ends is the value for a little while. To cope with measurement errors,
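The corruption step used for the DSAE input (adding white Gaussian noise so that the result has a chosen SNR) can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, the SNR is assumed to be expressed in dB (the text gives the value without a unit), and the 50 Hz test waveform is an assumption chosen so that 200 samples at 20 kHz span half a cycle.

```python
import math
import random


def add_wgn(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise at the given SNR.

    Assumption: SNR is defined as 10*log10(P_signal / P_noise), in dB.
    """
    rng = rng or random.Random()
    signal_power = sum(v * v for v in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [v + rng.gauss(0.0, sigma) for v in signal]


if __name__ == "__main__":
    # One 200-sample channel: half a cycle of an assumed 50 Hz sine at 20 kHz.
    clean = [math.sin(2 * math.pi * 50 * n / 20000) for n in range(200)]
    noisy = add_wgn(clean, snr_db=14, rng=random.Random(0))
    print(len(noisy))
```

During DSAE training the corrupted segment would serve as the input and the clean segment as the reconstruction target, in line with the modified squared error term described in the text.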


Fig. 7. The classification accuracies using features extracted by SAE and DSAE under different SNR values.

Fig. 8. The classification accuracies using features extracted by SAE with and without dropout under different error rates.

dropout is added during the training process of SAE. Dropout fault classification. Concretely, we decompose the waveforms
has been proven useful to prevent neural networks from over- into eight detail levels and one approximate level (with the
fitting [41], which means that models trained with dropout are sampling frequency of 20 kHz, the approximate level covers
more tolerant to errors within the data. Concretely, dropout is 0-78.125 Hz) using Db4 mother wavelet. Energy (summation
implemented by randomly setting activations of hidden nodes of squared coefficients) [42] and maximum (the maximum of
to 0 at a given probability. In this paper, we apply dropout to absolute values of the coefficients) [43] features are calculated
the connections between the input layer and the hidden layer, from the coefficients in each decomposition level. For classi-
that is, we replace (1) with fication, SVM and ANN are used with both types of features
as inputs. The implementation of SVM with RBF kernel is
$h(x) = W_2 r^{T} f(W_1 x + b_1) + b_2$ (12) introduced in [20]. Appropriate values of parameters γ and C
where r is the dropout masking vector whose jth element are determined using 10-fold cross-validation and grid search.
$r_j$ satisfies $r_j \sim \mathrm{Bernoulli}(1 - p_d)$, $p_d$ being the probability The structure of the ANNs has one hidden layer with fully
for a given connection to be masked by dropout. Comparison connected neurons. With regard to energy features, γ = 0.05
of classification accuracies of training SAE with and without and C = 20 are used for the SVM and λ = 0.001 is used
dropout is shown in Fig. 8. The value of pd is set to 0.1 after for the 54-200-11 ANN (λ is the weight decay parameter of
trying several different values. Each waveform segment has the ANN model for regularization). As for maximum features,
exactly one error randomly added to one of its six channels, γ = 0.01 and C = 20 are used for the SVM and λ = 0.001
and both type I and II errors account for 50% of the errors. is used for the ANN with the same structure.
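The two DWT-based baseline features can be sketched as below. The wavelet decomposition itself is assumed to have been computed already (for example by a wavelet library), so the functions only operate on lists of coefficients; all names are hypothetical. With six channels and nine decomposition levels per channel (eight detail levels plus one approximation level), each feature type yields 54 values, which matches the input size of the 54-200-11 ANN.

```python
def energy(coeffs):
    """Energy feature: summation of squared coefficients."""
    return sum(c * c for c in coeffs)


def max_abs(coeffs):
    """Maximum feature: largest absolute value among the coefficients."""
    return max(abs(c) for c in coeffs)


def level_features(levels_per_channel, stat):
    """Apply `stat` to every decomposition level of every channel.

    `levels_per_channel[c][k]` is the coefficient list of level k of channel c.
    """
    return [stat(level) for channel in levels_per_channel for level in channel]


if __name__ == "__main__":
    # Placeholder coefficients: 6 channels x 9 levels, not real wavelet output.
    dummy = [[[0.5, -1.0, 0.25] for _ in range(9)] for _ in range(6)]
    print(len(level_features(dummy, energy)), len(level_features(dummy, max_abs)))
```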
For type II error, the high value is set to twice the rated The results of classification accuracies are listed in Table V.
amplitude of the signal (errors with positive and negative high It is demonstrated in the table that the proposed CDSAE
values have the same proportion). The range of error rate (the method outperforms the other methods. With the implementa-
proportion of error in a single channel) is 0.5% to 3%. It is tion of DSAE with dropout, the performance of the proposed
clear from Fig. 8 that a dropout rate of 10% (pd = 0.1) greatly method is satisfactory in the presence of noise and measure-
improves the classification accuracies of the proposed method ment errors. Also note that the features used for convolutional
when measurement errors are taken into consideration. operations for dataset II are extracted from dataset I, which
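A minimal sketch of the dropout masking in the autoencoder forward pass is given below. It rests on two assumptions that the text does not settle: the activation f is taken to be a sigmoid, and the mask r is applied elementwise to the hidden activations (following the description of dropout as setting activations of hidden nodes to 0). Function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dropout_mask(n_hidden, p_d, rng):
    """Draw r with r_j ~ Bernoulli(1 - p_d): each node is kept with prob. 1 - p_d."""
    return [1 if rng.random() >= p_d else 0 for _ in range(n_hidden)]


def forward_with_dropout(x, W1, b1, W2, b2, r):
    """Reconstruction h(x) with the hidden activations masked by r (assumed form)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    masked = [rj * a for rj, a in zip(r, hidden)]
    return [sum(w * a for w, a in zip(row, masked)) + b
            for row, b in zip(W2, b2)]


if __name__ == "__main__":
    rng = random.Random(0)
    r = dropout_mask(n_hidden=100, p_d=0.1, rng=rng)
    print(sum(r), "of 100 hidden nodes kept")
```

A fresh mask would be drawn for each training example, and no mask is applied when the learned features are later used for the convolutional operations.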
We now combine the implementation of both DSAE and confirms the generalizability of the proposed method for trans-
dropout and compare the proposed method with existing meth- mission line systems that are similar in configuration but have
ods that have been proven effective in fault classification. For different parameters and system dynamics. Though the fea-
clarity, the proposed method with the implementation of DSAE tures extracted by DSAE with dropout are inevitably different
and dropout is referred to as CDSAE (convolutional DSAE), from the ones extracted by SAE without dropout, the gener-
which is essentially a slightly modified version of the origi- alizability is still verified, as both implementations only use
nal CSAE method. The SNR of signals with additive WGN dataset I during the feature extraction stage.
is set to 14, and the error rate is 1.5% (3 consecutive sam-
pling points with the sampling frequency of 20 kHz). The
features used for convolutional operations are extracted by V. APPLICATION OF THE PROPOSED
a DSAE with 10% dropout rate using dataset I, while the METHOD IN SMART GRIDS
softmax classifier is trained and tested using dataset II. All An illustrative diagram of implementing the proposed
signals in the datasets are corrupted with WGN, resulting in method in power systems is shown in Fig. 9. With intelli-
an SNR of 14. Other parameters remain the same as intro- gent electronic devices such as remote terminal units installed
duced in Section III. We also use DWT to extract features for at the terminals of substations in the monitored region [44],
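The two measurement error types can be injected into a six-channel segment as sketched below. The helper name and data layout (a list of channels, each a list of samples) are hypothetical; the choices of a random channel, a random start position and an equal chance of positive or negative high value follow the description in the text, while rounding the run length to the nearest sample is an assumption.

```python
import random


def inject_error(segment, error_type, error_rate, rated_amplitude, rng):
    """Return a copy of `segment` with one measurement error in one channel.

    error_type 1: "consecutive zero"; error_type 2: "consecutive high value"
    (twice the rated amplitude, sign chosen at random).
    """
    corrupted = [list(channel) for channel in segment]
    n = len(corrupted[0])
    run = max(1, round(error_rate * n))      # e.g. 1.5% of 200 samples -> 3
    channel = rng.randrange(len(corrupted))  # exactly one affected channel
    start = rng.randrange(n - run + 1)
    if error_type == 1:
        value = 0.0
    else:
        value = rng.choice((1.0, -1.0)) * 2.0 * rated_amplitude
    corrupted[channel][start:start + run] = [value] * run
    return corrupted


if __name__ == "__main__":
    rng = random.Random(0)
    segment = [[1.0] * 200 for _ in range(6)]   # placeholder data, not a real waveform
    noisy = inject_error(segment, error_type=rng.choice((1, 2)),
                         error_rate=0.015, rated_amplitude=1.0, rng=rng)
    print(sum(v != 1.0 for ch in noisy for v in ch), "samples altered")
```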

TABLE V
COMPARISON OF CLASSIFICATION ACCURACIES ON DATASET II OF DIFFERENT METHODS

current signals guarantees favorable performance across the
considered frequency range. The proposed method is further
modified to ensure that the performance is favorable in the
presence of noise and measurement errors. Comparison of the
proposed method with existing methods shows that the proposed
method is robust to noise and measurement errors with
high generalizability.
As we use simulated data for training and testing, future
implementation of the method may consider using real data
collected by various devices deployed in the power grid.
Moreover, in cases where only voltage or current signals are
available, the parameters such as window length and patch
length need to be tuned so that the classification accuracy can
be improved. In order to build a more comprehensive fault
diagnosis system capable of detecting and classifying faults
and power quality disturbances, the framework needs further
modification in window and patch lengths as well as the overall
structure.

REFERENCES
[1] K. Chen, C. Huang, and J. L. He, “Fault detection, classification and
location for transmission lines and distribution systems: A review on the
methods,” High Voltage, vol. 1, no. 1, pp. 25–33, 2016.
[2] S.-L. Yu and J.-C. Gu, “Removal of decaying DC in current and voltage
signals using a modified Fourier filter algorithm,” IEEE Trans. Power
Del., vol. 16, no. 3, pp. 372–379, Jul. 2001.
Fig. 9. Application of the proposed method in smart grids.
[3] M. T. Hagh, K. Razi, and H. Taghizadeh, “Fault classification and loca-
tion of power transmission lines using artificial neural network,” in Proc.
Int. Power Eng. Conf. (IPEC), Singapore, 2007, pp. 1109–1114.
a fault monitoring and diagnosing system based on the pro- [4] B. Das and J. V. Reddy, “Fuzzy-logic-based fault classification scheme
for digital distance protection,” IEEE Trans. Power Del., vol. 20, no. 2,
posed method can be established. The system is able to give pp. 609–616, Apr. 2005.
real-time alerts at the moment of fault occurrence, and protec- [5] A. Jamehbozorg and S. M. Shahrtash, “A decision-tree-based method
tive actions can be taken if possible. The high generalizability for fault classification in single-circuit transmission lines,” IEEE Trans.
Power Del., vol. 25, no. 4, pp. 2190–2196, Oct. 2010.
of the proposed method means that it can be widely adopted [6] A. I. Megahed, A. M. Moussa, and A. E. Bayoumy, “Usage of wavelet
by power transmission systems as well as power distribution transform in the protection of series-compensated transmission lines,”
systems. IEEE Trans. Power Del., vol. 21, no. 3, pp. 1213–1221, Jul. 2006.
In addition, data recording and preliminary data analysis [7] K. M. Silva, B. A. Souza, and N. S. D. Brito, “Fault detection and clas-
sification in transmission lines based on wavelet transform and ANN,”
run continuously and store the data and analysis results in the IEEE Trans. Power Del., vol. 21, no. 4, pp. 2058–2063, Oct. 2006.
database. Reports on a daily, weekly, monthly, seasonal and [8] S. Ekici, S. Yildirim, and M. Poyraz, “Energy and entropy-based fea-
yearly basis can be generated via further analysis using stored ture extraction for locating fault on transmission lines by using neural
network and wavelet packet decomposition,” Expert Syst. Appl., vol. 34,
data, which may help grid operators assess the reliability of no. 4, pp. 2937–2944, 2008.
the grid in the monitored region and evaluate the necessity [9] S. R. Samantaray and P. K. Dash, “Pattern recognition based digital
of transforming or upgrading the grid infrastructure. Further relaying for advanced series compensated line,” Int. J. Elect. Power
Energy Syst., vol. 30, no. 2, pp. 102–112, 2008.
analysis can be done with the integration of fault locating [10] K. R. Krishnanand and P. K. Dash, “A new real-time fast dis-
methods. crete S-transform for cross-differential protection of shunt-compensated
power systems,” IEEE Trans. Power Del., vol. 28, no. 1, pp. 402–410,
Jan. 2013.
VI. CONCLUSION [11] J.-A. Jiang, C.-S. Chen, and C.-W. Liu, “A new protection scheme for
fault detection, direction discrimination, classification, and location in
This paper presents a new method for detection and classifi- transmission lines,” IEEE Trans. Power Del., vol. 18, no. 1, pp. 34–42,
cation of power transmission line faults. Three-phase voltage Jan. 2003.
and current signals are combined as a multi-channel signal, [12] X. Dong, W. Kong, and T. Cui, “Fault classification and faulted-phase
and unsupervised feature learning from a dataset of signal selection based on the initial current traveling wave,” IEEE Trans. Power
Del., vol. 24, no. 2, pp. 552–559, Apr. 2009.
segments is achieved by SAE. A CSAE based framework [13] D. Thukaram, H. P. Khincha, and H. P. Vijaynarasimha, “Artificial neu-
capable of detecting and classifying faults is proposed with ral network and support vector machine approach for locating faults in
novel training and testing strategies, which greatly reduces the radial distribution systems,” IEEE Trans. Power Del., vol. 20, no. 2,
pp. 710–721, Apr. 2005.
computational burden and improves the performance. Results [14] J.-A. Jiang et al., “A hybrid framework for fault detection, classification,
show that the proposed method detects faults within 7 ms after and location—Part I: Concept, structure, and methodology,” IEEE Trans.
fault inception, and classifies faults with accuracies close to Power Del., vol. 26, no. 3, pp. 1988–1998, Jul. 2011.
[15] A. A. Girgis, A. A. Sallam, and A. K. El-Din, “An adaptive protec-
100% for all fault types. Tests with different sampling fre- tion scheme for advanced series compensated (ASC) transmission lines,”
quencies and signal types show that using both voltage and IEEE Trans. Power Del., vol. 13, no. 2, pp. 414–420, Apr. 1998.


[16] O. A. S. Youssef, “Fault classification based on wavelet transforms,” [40] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol,
in Proc. IEEE/PES Transm. Distrib. Conf. Expo., vol. 1. Atlanta, GA, “Stacked denoising autoencoders: Learning useful representations in a
USA, 2001, pp. 531–536. deep network with a local denoising criterion,” J. Mach. Learn. Res.,
[17] R. N. Mahanty and P. B. D. Gupta, “Application of RBF neural net- vol. 11, pp. 3371–3408, Dec. 2010.
work to fault classification and location in transmission lines,” IEE Proc. [41] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and
Gener. Transm. Distrib., vol. 151, no. 2, pp. 201–212, Mar. 2004. R. Salakhutdinov, “Dropout: A simple way to prevent neural net-
[18] J. Upendar, C. P. Gupta, and G. K. Singh, “Discrete wavelet trans- works from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1,
form and probabilistic neural network based algorithm for classifica- pp. 1929–1958, Jun. 2014.
tion of fault on transmission systems,” in Proc. Annu. IEEE India [42] Z. Q. Bo, R. K. Aggarwal, A. T. Johns, H. Y. Li, and Y. H. Song,
Conf. (INDICON), vol. 1. Kanpur, India, 2008, pp. 206–211. “A new approach to phase selection using fault generated high frequency
[19] P. K. Dash, S. R. Samantaray, and G. Panda, “Fault classification and noise and neural networks,” IEEE Trans. Power Del., vol. 12, no. 1,
section identification of an advanced series-compensated transmission pp. 106–115, Jan. 1997.
line using support vector machine,” IEEE Trans. Power Del., vol. 22, [43] A. K. Pradhan, A. Routray, S. Pati, and D. K. Pradhan, “Wavelet fuzzy
no. 1, pp. 67–73, Jan. 2007. combined approach for fault classification of a series-compensated trans-
[20] U. B. Parikh, B. Das, and R. Maheshwari, “Fault classification tech- mission line,” IEEE Trans. Power Del., vol. 19, no. 4, pp. 1612–1618,
nique for series compensated transmission line using support vector Oct. 2004.
machine,” Int. J. Elect. Power Energy Syst., vol. 32, no. 6, pp. 629–636, [44] M. Kezunovic, “Smart fault location for smart grids,” IEEE Trans. Smart
2010. Grid, vol. 2, no. 1, pp. 11–22, Mar. 2011.
[21] M. J. Reddy and D. K. Mohanta, “Adaptive-neuro-fuzzy inference sys-
tem approach for transmission line fault classification and location
incorporating effects of power swings,” IET Gener. Transm. Distrib.,
vol. 2, no. 2, pp. 235–244, Mar. 2008.
[22] A. Jamehbozorg and S. M. Shahrtash, “A decision tree-based method
for fault classification in double-circuit transmission lines,” IEEE Trans. Kunjin Chen was born in Changsha, China, in
Power Del., vol. 25, no. 4, pp. 2184–2189, Oct. 2010. 1993. He received the B.Sc. degree in electrical
[23] M. K. Jena, L. N. Tripathy, and S. R. Samantray, “Intelligent relaying engineering from Tsinghua University, Beijing,
of UPFC based transmission lines using decision tree,” in Proc. 1st Int. China, in 2015, where he is currently pursuing
Conf. Emerg. Trends Appl. Comput. Sci. (ICETACS), Shillong, India, the M.Sc. degree with the Department of Electrical
2013, pp. 224–229. Engineering. His research interests include pattern
[24] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, recognition and data mining in power systems.
no. 7553, pp. 436–444, 2015.
[25] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classifica-
tion with deep convolutional neural networks,” in Proc. Adv. Neural Inf.
Process. Syst., 2012, pp. 1106–1114.
[26] A. Hannun et al., “Deep speech: Scaling up end-to-end speech recog-
nition,” arXiv preprint arXiv:1412.5567, 2014.
[27] T. Mikolov, K. Chen, G. Corrado, and J. Dean, “Efficient estimation
of word representations in vector space,” in Proc. Int. Conf. Learn.
Represent. Workshop, Scottsdale, AZ, USA, 2013, pp. 1301–3781.
[28] Q. V. Le, “Building high-level features using large scale unsuper- Jun Hu was born in Ningbo, China, in 1976. He
vised learning,” in Proc. IEEE Int. Conf. Acoust. Speech Signal received the B.Sc., M.Sc., and Ph.D. degrees in elec-
Process. (ICASSP), Vancouver, BC, Canada, 2013, pp. 8595–8598. trical engineering from the Department of Electrical
Engineering, Tsinghua University, Beijing, China, in
[29] K. Chen, J. Hu, and J. L. He, “A framework for automatically extracting
1998, 2000, and 2008.
overvoltage features based on sparse autoencoder,” IEEE Trans. Smart
He is currently an Associate Professor with
Grid, to be published, doi: 10.1109/TSG.2016.2558200.
the Department of Electrical Engineering, Tsinghua
[30] R. Zhang, C. Li, and D. Jia, “A new multi-channels sequence recognition University. His research fields include overvoltage
framework using deep convolutional neural network,” in Proc. INNS analysis in power system, sensors and big data,
Conf. Big Data Program, vol. 53. San Francisco, CA, USA, Aug. 2015, dielectric materials, and surge arrester technology.
pp. 383–390.
[31] M. Baccouche, F. Mamalet, C. Wolf, C. Garcia, and A. Baskurt, “Spatio-
temporal convolutional sparse auto-encoder for sequence classification,”
in Proc. BMVC, Surrey, U.K., 2012, pp. 1–12.
[32] A. Ng, “Sparse autoencoder,” CS294A Lecture Notes, Stanford Univ.,
Stanford, CA, USA, pp. 1–19, 2011.
Jinliang He (M’02–SM’02–F’08) was born in
[33] J. Ngiam et al., “On optimization methods for deep learning,” in Proc.
Changsha, China, in 1966. He received the B.Sc.
28th Int. Conf. Mach. Learn. (ICML), Bellevue, WA, USA, 2011,
degree in electrical engineering from the Wuhan
pp. 265–272.
University of Hydraulic and Electrical Engineering,
[34] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning represen- Wuhan, China, in 1988, the M.Sc. degree in
tations by back-propagating errors,” Cogn. Model., vol. 5, no. 3, p. 1, electrical engineering from Chongqing University,
1988. Chongqing, China, in 1991, and the Ph.D. degree
[35] A. J. Bell and T. J. Sejnowski, “The ‘independent components’ of natural in electrical engineering from Tsinghua University,
scenes are edge filters,” Vis. Res., vol. 37, no. 23, pp. 3327–3338, 1997. Beijing, China, in 1994.
[36] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from He became a Lecturer in 1994, and an Associate
tiny images,” Univ. Toronto, Toronto, ON, Canada, Tech. Rep., 2009. Professor in 1996, with the Department of Electrical
[37] S. Lawrence, C. L. Giles, A. C. Tsoi, and A. D. Back, “Face recognition: Engineering, Tsinghua University. From 1997 to 1998, he was a Visiting
A convolutional neural-network approach,” IEEE Trans. Neural Netw., Scientist with Korea Electrotechnology Research Institute, Changwon,
vol. 8, no. 1, pp. 98–113, Jan. 1997. South Korea, involved in research on metal oxide varistors and high voltage
[38] C. Hung, J. Nieto, Z. Taylor, J. Underwood, and S. Sukkarieh, polymeric metal oxide surge arresters. From 2014 to 2015, he was a Visiting
“Orchard fruit segmentation using multi-spectral feature learning,” in Professor with the Department of Electrical Engineering, Stanford University,
Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Tokyo, Japan, 2013, Palo Alto, CA, USA. In 2001, he was promoted to a Professor with Tsinghua
pp. 5314–5320. University. He is currently the Chair with High Voltage Research Institute,
[39] M. Z. Daud, P. Ciufo, and S. Perera, “Investigation on the suitabil- Tsinghua University. He has authored five books and 400 technical papers. His
ity of PSCAD/EMTDC models to study energisation transients of research interests include overvoltages and EMC in power systems and elec-
132 kV underground cables,” in Proc. Aust. Universities Power Eng. tronic systems, lightning protection, grounding technology, power apparatus,
Conf. (AUPEC), Sydney, NSW, Australia, 2008, pp. 1–6. and dielectric material.
