
IEEE TRANSACTIONS ON POWER ELECTRONICS, VOL. 28, NO. 1, JANUARY 2013 591

Survey on Reliability of Power Electronic Systems


Yantao Song and Bingsen Wang, Senior Member, IEEE

Abstract—With the widespread application of power electronic systems across many different industries, their reliability is being studied extensively. This paper presents a comprehensive review of reliability assessment and improvement of power electronic systems from three levels: 1) metrics and methodologies of reliability assessment of existing systems; 2) reliability improvement of existing systems by means of algorithmic solutions without changing the hardware; and 3) reliability-oriented design solutions that are based on fault-tolerant operation of the overall systems. The intent of this review is to provide a clear picture of the landscape of reliability research in power electronics. The limitations of the current research are identified, and directions for future research are suggested.

Index Terms—Fault diagnosis, fault-tolerant operation, power electronic systems, reliability.

I. INTRODUCTION

Power electronic systems play an increasingly important role in adjustable-speed drives, unified power quality correction, utility interfaces with renewable energy resources, energy storage systems, and electric or hybrid electric vehicles (HEVs). Power electronic techniques provide compact and highly efficient solutions to power conversion. However, introducing power electronic techniques into these application fields challenges the reliability of the overall systems. One of the concerns related to reliability lies in the power semiconductor devices and electrolytic capacitors, which are the most vulnerable links. Most power electronic converters are not equipped with redundancy. Therefore, any fault that occurs in the components or subsystems will lead to shutdown of the system. These unscheduled interruptions not only raise significant safety concerns, but also increase system operation cost and partially offset the benefits of introducing power electronic systems. For instance, in HEVs, faults of electric propulsion systems will impair fuel economy and lengthen the cost recovery period [1]. For a photovoltaic (PV) generation system, the cost of failure is equal to the value of the energy that would have been generated while the system is down, plus the cost of repairing and replacing parts [2].

Over the past several decades, much attention has been directed to the reliability of power electronic systems. In [3]–[6], various metrics for evaluating system reliability are defined and analyzed. In order to analyze the reliability of power electronic systems, mathematical estimation of reliability is necessary. Component-level failure models are studied extensively [3], [7]–[13], and several quantitative methodologies are presented to build system-level reliability models; the two combine to give an accurate reliability prediction [5], [14], [15], [16], [17]. In many cases, the classic design cannot meet the reliability requirements of the specifications, and numerous solutions have been proposed to improve reliability. Active online monitoring, fault management, and extending fault-tolerant operation by reconfiguring control strategies are among the commonly adopted methods to enhance reliability [18]–[29]. Redundant design is an effective way to maintain postfault operation and thus to reduce the number of unexpected system breakdowns. For this reason, various power converter topologies equipped with redundant capability have been proposed [30]–[48]. Given the importance of reliability and the amount of research carried out on it, this is a timely attempt to present a systematic perspective on the status of power electronic reliability for engineering design and future research.

This paper presents a comprehensive overview of the reliability of power electronic systems. The review is organized around three different scenarios. First, for any given system, reliability assessment or benchmarking is necessary before any reliability improvement effort is attempted. Second, if reliability improvement of the system is deemed necessary, algorithmic changes may be preferred over significant hardware alteration. Third, reliability assurance can be implemented in the design stage if the system is yet to be built. Based on these three scenarios, the rest of the paper is organized as follows. Section II introduces the fundamental theory of reliability that is relevant to this study. Several common reliability models are presented and compared in Section III. Section IV summarizes the existing methods that are commonly employed to enhance the reliability of systems without fundamental change to the systems’ architectures. Such methods include active thermal and fault management and degraded operation under faulted conditions. Section V introduces concepts of redundancy and modified power electronic systems that are equipped with redundant functionalities. Concluding remarks and discussion are summarized in Section VI.

Manuscript received February 15, 2012; revised March 21, 2012; accepted March 21, 2012. Date of current version September 11, 2012. Recommended for publication by Associate Editor A. Lindemann.
The authors are with the Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824 USA (e-mail: [email protected]; [email protected]).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TPEL.2012.2192503
0885-8993/$31.00 © 2012 IEEE

II. RELIABILITY PREDICTION METRICS

The first step in evaluating and improving system reliability is to determine which metrics to analyze. Metrics always reflect the design goals. Therefore, any information used to determine them shall be based on customer requirements and careful consideration of the intended applications. The commonly adopted metrics for the evaluation of power electronic systems encompass reliability, failure rate, mean
time to failure (MTTF), mean time to repair (MTTR), and availability.

A. Reliability

Reliability is defined as the probability that an item (component, subsystem, or system) performs required functions for an intended period of time under given environmental and operational conditions [2]. The reliability function R(t) represents the probability that the system will operate without failures over the time interval [0, t].

The reliability of a system depends on the time under consideration, and it typically decreases as that time increases. For commercial products, the time considered should cover the warranty period.

B. Failure Rate

The failure rate of an item indicates its “proneness to failure” after time t has elapsed. Fig. 1 shows a typical failure rate curve as a function of time, commonly known as the bathtub curve. The shape of the curve suggests that the life cycle of an item can be divided into three periods: the burn-in period, the useful life period, and the wear-out period. Items are subjected to extensive test procedures, and much of the infant mortality is removed before they are put into use. Even so, defects left undiscovered during design or production lead to the high failure rate in the burn-in period. Once an item survives the initial burn-in period, the failure rate tends to stabilize at a relatively constant level for a certain period of time before the item begins to wear out. By the wear-out period, systems have typically completed their required missions. Therefore, the failure rate during the useful life is the one that matters for reliability analysis.

Fig. 1. Typical failure rate curve as a function of time.

The failure rate λ(t) is related to the reliability function R(t) by

$$\lambda(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{R(t)\,\Delta t} = -\frac{1}{R(t)}\frac{dR(t)}{dt} \qquad (1)$$

where Δt > 0 is a time interval. The reliability R(t) is determined from the failure rate λ(t) under the initial condition R(0) = 1, i.e., the item is fully functional at the initial state:

$$R(t) = e^{-\int_0^t \lambda(\tau)\,d\tau}. \qquad (2)$$

In many reliability models, the failure rates of components and subsystems are assumed to be independent of time, although this assumption has limitations [4], [49]. With the assumption λ(t) = λ, (2) simplifies to

$$R(t) = e^{-\lambda t}. \qquad (3)$$

The failure rate is then estimated from the mean number of failures per unit time, which is expressed in failures in time (FIT):

$$1\ \text{FIT} = 10^{-9}\ \text{failures/hour}. \qquad (4)$$

C. Mean Time to Failure

The MTTF is the expected time before a failure occurs. Unlike reliability, MTTF does not depend on a particular period of time; it gives the average time for which an item operates without failing. MTTF is a widely quoted performance metric for comparing system designs, and it reflects the life distribution of an item. Nonetheless, an MTTF longer than the mission time does not by itself mean that the system is highly reliable within the mission time. For example, with a constant failure rate, (3) and (6) give $R(\text{MTTF}) = e^{-1} \approx 0.37$.

The relationship between MTTF and the reliability function is described by

$$\text{MTTF} = \int_0^{+\infty} R(t)\,dt \qquad (5)$$

where R(t) is the reliability function. When the failure rate λ(t) is a constant λ, the expression for MTTF simplifies to

$$\text{MTTF} = \frac{1}{\lambda}. \qquad (6)$$

D. Mean Time to Repair

The MTTR is the mean time it takes to eliminate a failure and restore the system to a specified state. The repair time depends on maintainability factors, such as effective diagnosis of faults and the availability of replaceable components.

E. Availability and Average Availability

The availability is the probability that a system will be functioning at a given time. The average availability denotes the mean fraction of time the system is operating over a given period. Consider a repairable system that is restored to an “as good as new” condition every time it fails. Its average availability is

$$A_{\text{avg}} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}. \qquad (7)$$

Therefore, improving availability entails increasing MTTF and decreasing MTTR. The main limitation of average availability as a metric is that it cannot reflect the frequency of failures or of required maintenance. Hence, it is only used to assess repairable systems where the primary concern is availability rather than reliability.
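The following Python sketch shows how the metrics in (2)–(7) fit together by evaluating reliability, MTTF, and average availability numerically. The 500-FIT failure rate and 24-h MTTR in the usage lines are hypothetical values chosen only for illustration; they are not data from the surveyed references. The trapezoidal rule is simply one convenient quadrature choice, not a method prescribed by the text.

```python
import math
from typing import Callable


def trapezoid(f: Callable[[float], float], a: float, b: float, steps: int) -> float:
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for k in range(1, steps):
        total += f(a + k * h)
    return total * h


def fit_to_per_hour(fit: float) -> float:
    """Eq. (4): 1 FIT = 1e-9 failures/hour."""
    return fit * 1e-9


def reliability_from_rate(rate: Callable[[float], float], t: float, steps: int = 10_000) -> float:
    """Eq. (2): R(t) = exp(-integral of lambda(tau) over [0, t]), with R(0) = 1."""
    if t <= 0.0:
        return 1.0
    return math.exp(-trapezoid(rate, 0.0, t, steps))


def reliability_constant(lam: float, t: float) -> float:
    """Eq. (3): R(t) = exp(-lambda * t) for a constant failure rate."""
    return math.exp(-lam * t)


def mttf_numeric(reliability: Callable[[float], float], t_max: float, steps: int = 100_000) -> float:
    """Eq. (5) truncated at t_max; t_max must be long enough that R(t_max) is negligible."""
    return trapezoid(reliability, 0.0, t_max, steps)


def average_availability(mttf: float, mttr: float) -> float:
    """Eq. (7): A_avg = MTTF / (MTTF + MTTR), assuming as-good-as-new repair."""
    return mttf / (mttf + mttr)


if __name__ == "__main__":
    lam = fit_to_per_hour(500.0)   # hypothetical failure rate, failures/hour
    mttf_closed = 1.0 / lam        # eq. (6)
    mttf_num = mttf_numeric(lambda t: reliability_constant(lam, t), 20.0 / lam)
    print(f"MTTF closed form: {mttf_closed:.0f} h")  # MTTF closed form: 2000000 h
    print(f"MTTF numeric:     {mttf_num:.0f} h")
    print(f"R(MTTF) = {reliability_constant(lam, mttf_closed):.4f}")  # R(MTTF) = 0.3679
    print(f"A_avg   = {average_availability(mttf_closed, 24.0):.6f}")  # A_avg   = 0.999988
```

The numerical path through (2) and (5) is only needed when λ(t) is not constant, for example in the burn-in or wear-out regions of Fig. 1. For a constant rate, it reproduces (3) and (6).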

III. RELIABILITY ASSESSMENT OF POWER ELECTRONIC SYSTEMS

Reliability evaluation is important for the design and operation management of systems. Quantitative assessment of reliability for power electronic converters is essential in determining whether a particular design meets certain specifications. It also serves as a criterion to compare different topologies, control strategies, and components. Moreover, accurate reliability prediction gives valuable guidance for the management of system operation and maintenance. All reliability analysis involves some form of model, either at the component level or at the system level.

A. Component-Level Reliability Models

For power electronic systems, reliability research at the component level has mainly focused on failure rate models for the key components in power circuits, such as power semiconductors, capacitors, and magnetic devices [1], [5], [14], [15], [16], [50]. Field experience has demonstrated that electrolytic capacitors and power switching devices are the most vulnerable components. These switching devices include insulated gate bipolar transistors (IGBTs) and metal–oxide–semiconductor field-effect transistors (MOSFETs). Magnetic components are much more reliable, with failure rates more than one order of magnitude lower than those of other power devices [2], [51]. There are numerous reliability models available for these electronic components. Empirical-based models are the most widely employed to analyze the reliability of components; they typically rely on observed failure data to quantify model variables. The premise is that valid failure-rate data are readily available either from field applications or from laboratory tests.

There are many empirical-based reliability models of electronic devices. The best known is the military handbook for the reliability prediction of electronic equipment (MIL-HDBK-217, hereafter MIL-217), which is widely accepted in both military and industrial applications [7]. MIL-217 provides an extensive database for many different types of parts. It is intended to provide a uniform database for reliability prediction without substantial reliability experience of a particular component. However, the handbook has been criticized for several limitations [8]. One is that the models in MIL-217 assume a constant failure rate for components over their lifetime [3]. Another is that the reliability results derived from these models are often pessimistic and lead to costly conservative designs. Furthermore, MIL-217 contains neither data to determine the influence of dormant modes on components nor data that reflect the effects of thermal cycles. Both are of significant importance in practical power electronics applications. The failure rate models of some commonly used components, such as the IGBT, are not covered by the handbook. Therefore, the reference values of MOSFETs are often chosen for the failure rate analysis of IGBTs.

Another important data source for empirical failure rate models is RDF 2000, which considers dormant modes and the effects of temperature cycles, and includes data for IGBTs [52]. RDF 2000 is a preferred reference in complex analyses since it takes all types of stress into account. In [5], the failure rates of IGBTs, diodes, and capacitors are estimated and compared using the two data sources as well as the Coffin–Manson and Arrhenius equations. Each approach turns out to have its disadvantages.

Empirical models of electronic devices are based on previously observed data. As a result, their reliability predictions are inaccurate for applications with different designs and different operational and environmental conditions. The physics-of-failure model has been researched extensively for analyzing the reliability of electronic devices, specifically power semiconductor devices and electrolytic capacitors [9]–[12]. Thermal failure mechanisms of IGBTs have become a focal area in component-level reliability research in power electronics. The methodology considers electrical and mechanical stresses, temperature changes, and spatial temperature gradients, and tries to identify the root cause of each component failure. The physics-of-failure method can model potential failure mechanisms, predict wear-out conditions, and integrate reliability into the design process. However, building this type of model is complex and costly, and requires substantial knowledge about materials, processes, and failure mechanisms [13].

B. System or Subsystem-Level Reliability Models for Nonfault-Tolerant or Fault-Tolerant Systems

A system-level reliability model presents a clear picture of functional interdependences. It also provides a framework for developing quantitative reliability estimates of systems to guide the design tradeoff process. Several methodologies to quantify the reliability metrics of power electronic converters have been introduced. They can be categorized into three types of reliability models: part-count models, combinatorial models, and state-space models.

1) Part-Count Models: The part-count model assumes the following:
1) any fault that occurs in any of the components or subsystems will cause the overall system to fail;
2) at the component level, the failure rates of individual components are constant during useful life;
3) the system is treated as a series structure of all components or subsystems.

Consider a series structure with n subsystems, as shown in Fig. 2, where the ith subsystem has failure rate $\lambda_i$. The failure rate λ of the overall system is

$$\lambda = \sum_{i=1}^{n} \lambda_i. \qquad (8)$$

Fig. 2. Illustration of a series configuration with n subsystems.
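Under the part-count assumptions, (8) reduces to summing constant failure rates. The sketch below applies it to a parts list shaped like the inverter example of [5]: one dc-link capacitor, six IGBTs, and six diodes. It then converts the total to MTTF with (6). The per-part FIT values are hypothetical placeholders, not handbook or field data; in practice they would come from a source such as MIL-217 or RDF 2000 for the actual operating conditions. The `Part` type and function names are illustrative only.

```python
from dataclasses import dataclass

FIT_PER_HOUR = 1e-9  # eq. (4): 1 FIT = 1e-9 failures/hour


@dataclass(frozen=True)
class Part:
    name: str
    rate_fit: float  # constant failure rate of one part, in FIT
    count: int = 1   # number of identical parts, all in series


def series_failure_rate(parts: list[Part]) -> float:
    """Eq. (8): total failure rate of a series (part-count) structure, failures/hour."""
    return sum(p.rate_fit * p.count for p in parts) * FIT_PER_HOUR


def series_mttf(parts: list[Part]) -> float:
    """Eq. (6) applied to the series total: MTTF = 1 / lambda, in hours."""
    lam = series_failure_rate(parts)
    if lam <= 0.0:
        raise ValueError("total failure rate must be positive")
    return 1.0 / lam


if __name__ == "__main__":
    # Hypothetical FIT values, for illustration only.
    inverter = [
        Part("dc-link capacitor", 300.0),
        Part("IGBT", 100.0, count=6),
        Part("diode", 50.0, count=6),
    ]
    print(f"lambda = {series_failure_rate(inverter):.3e} failures/h")  # lambda = 1.200e-06 failures/h
    print(f"MTTF   = {series_mttf(inverter):.0f} h")                   # MTTF   = 833333 h
```

Every part sits in a single series chain, so the model gives no credit for redundancy. This is why the part-count estimate is pessimistic for fault-tolerant converters.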
The main advantage of the part-count method lies in its simplicity. A part-count model can provide an adequate reliability estimate for small systems. It is also an effective way to compare the reliability of different power electronic system architectures at the beginning of the design stage. However, for systems that can tolerate some failures or that can be repaired, the approach leads to overly conservative results. For power electronic converters that are not equipped with fault-tolerant capability, the part-count method is often adopted [1], [4], [5], [14]. For example, in [5] the part-count method is employed to estimate the reliability of a three-phase voltage-source inverter for HEVs, as shown in Fig. 3. Any fault of the capacitor, IGBTs, or diodes will cause the system to fail. The MTTF of the inverter is estimated from this model as

$$\text{MTTF} = \frac{1}{\lambda} = \frac{1}{\lambda_C + 6\lambda_T + 6\lambda_D} \qquad (9)$$

where $\lambda_C$, $\lambda_T$, and $\lambda_D$ are the failure rates of the capacitor, the IGBT, and the diode, respectively.

Fig. 3. Schematic of a typical three-phase voltage-source inverter for HEV.

2) Combinatorial Models: Combinatorial models are extensions of part-count models and include fault trees, success trees, and reliability block diagrams. These methods can be used to analyze the reliability of simple redundant systems with perfect coverage.

A fault tree has been used to analyze the reliability of electric drive systems, as illustrated by Fig. 4 [17]. Unfortunately, combinatorial models cannot reflect the details of fault-tolerant systems, such as the repair process, imperfect coverage, state-dependent failure rates, order of component failures, and reconfiguration.

Fig. 4. Fault tree of a typical three-phase voltage-source inverter for HEV.

3) Markov Model: The Markov model is based on a graphical representation of system states and of the transitions among them. Each state corresponds to a system configuration that is reached after a unique sequence of component failures. The system is said to be in the failure-free state when all components are nonfaulted. The system can evolve from the failure-free state to other states when faults occur in the components. There are two types of states in Markov models: 1) absorbing states, which are associated with failed system configurations; and 2) nonabsorbing states, which correspond to configurations in which the system can deliver full or partial functionality.

In [15], the Markov reliability model is used to analyze a two-phase interleaved boost converter for PV applications. The fault-tolerant system in [15] can operate with reduced phases and a depleted output capacitor bank. Each phase is divided into two subsystems: an input unit consisting of the diode, switch, and inductor, and an output unit including the output capacitor. The whole system fails only when all input units, all output units, or both in the two phases are faulted. Since inductors’ failure rates are much lower than those of semiconductors and electrolytic capacitors, their failures are not considered. The system is assumed to be nonrepairable, and the controller is assumed capable of fault detection, isolation, and reconfiguration, i.e., the system has perfect coverage.

The schematic, functional block diagram, and state-transition diagram of the converter are shown in Figs. 5, 6, and 7, respectively. The nodes of the state-transition diagram in Fig. 7 represent the states of each system configuration, and the edges represent transitions between configurations triggered by component failures. The state $k^{mn}$ (k = 1, 2, ..., 7; m = 1, 2; n = 1, 2) denotes the system state with m failed input stages and n failed output stages. $\lambda_T^m$ and $\lambda_D^m$ denote the failure rates of the IGBT and the diode, respectively, under the condition of m failed input stages. $\lambda_C^{mn}$ represents the failure rate of capacitors with m failed input stages and n failed output stages.

Fig. 5. Schematic of a two-phase boost converter.
Fig. 6. Functional block diagram.

Evaluating the model through simulation or mathematical algorithms yields the probability of the system being in each of the states. The Chapman–Kolmogorov equations are used to analyze the Markov reliability model. For instance, the Chapman–Kolmogorov equations for states $0^{00}$ and $3^{11}$ are
given by

$$\frac{dp_0(t)}{dt} = -\left[2(\lambda_T^0 + \lambda_D^0) + 2\lambda_C^{00}\right] p_0(t)$$
$$\frac{dp_3(t)}{dt} = 2\lambda_C^{10}\, p_1(t) + 2(\lambda_T^0 + \lambda_D^0)\, p_4(t) - \left[(\lambda_T^1 + \lambda_D^1) + \lambda_C^{11}\right] p_3(t) \qquad (10)$$

where $p_k^{mn}(t)$ is the probability of the system being in state $k^{mn}$ at time t. Since the system has four nonabsorbing states, the system reliability at time t can be expressed as

$$R(t) = p_0(t) + p_1(t) + p_2(t) + p_3(t). \qquad (11)$$

Once the reliability R(t) is obtained, MTTF can be readily estimated from (5).

Fig. 7. State-transition diagram.

The Markov chain is an effective approach to quantifying the reliability of fault-tolerant systems. It can cover many features of fault-tolerant systems, such as the sequence of failures, failure coverage, and state-dependent failure rates. Different reliability metrics, such as MTTF, reliability, and availability, can be estimated from the Markov model.

There are some limitations associated with the Markov model. One important property of a Markov process is that the transition probability from one state to another depends only on the present state, not on the previous states. Hence, the Markov model cannot be used to evaluate system reliability when components have time-varying failure rates. Another shortcoming is that the state space grows exponentially with the number of components. For a large system, it is difficult to generate the Markov model from the system functional description and component failure analysis.

The challenge of applying Markov models to increasingly complicated systems is clear in a high-power multilevel converter, which may have hundreds of components and a correspondingly large number of failure mode transitions. In a centralized PV generation plant, many individual inverter systems supply loads together and interact with each other. Building reliability models for such complex large systems is difficult or laborious. To tackle this problem, it has been proposed to decompose a large system into several subsystems. One of these reliability models, or a combination of them, is then used to analyze the reliability of each subsystem and of the overall system. The decomposition of a large system depends on the specific system and the failure modes of its components. The PV inverter for generation is decomposed into storage capacitor bank, power semiconductor device, and cooling subsystems, based on similar failure rate models [2]. In [17], a motor drive system is broken down into three functional blocks: stator, bearings, and electric drive. Further decomposition can be carried out for subsystems in order to simplify the analysis.

IV. IMPROVEMENT OF THE SYSTEM RELIABILITY

When the reliability of a designed system cannot meet requirements, it must be improved. Many solutions have been proposed to enhance the reliability of power electronic systems, from the perspectives of design and of active management of operation. The former is effective at the beginning phase of system design and results in higher cost, as explained in the next section. The latter is based on the existing hardware and realized by modified or augmented control. This section is devoted to management of operation. The solutions found in the literature can be classified into three groups: thermal management, diagnostics, and prognostics.

A. Active Thermal Management

In power electronic systems, key components such as electrolytic capacitors and power semiconductor devices are sensitive to temperature and/or temperature variations. Temperature is the stress factor that contributes most to the failure rates of MOSFETs and capacitors [53]. The most common failures of IGBTs are overtemperature- or thermal-cycling-induced failures [18].

Active thermal management techniques have been proposed to regulate steady-state and transient thermal-mechanical stress in power electronic modules [18], [53]. Central to the concept of active thermal control is that the junction temperature of devices depends on power loss. It can therefore be controlled by regulating the power loss of the devices. In [53], the junction temperatures of IGBTs and diodes are estimated based on the instantaneous temperature of the heat sink and a dynamic thermal model of the inverter for motor drives. The switching frequency and load current are then regulated according to the maximum junction temperature to keep the junction temperatures of all devices below a critical value. In [18], the maximum junction temperature and temperature changes are monitored, as shown in Fig. 8. When junction temperatures or temperature changes exceed the safety threshold, the switching frequency and current limit are decreased. This regulates power loss and thus prevents overtemperature and power-cycling failures in IGBT modules.

However, there is a delay between changes in power loss and the resulting changes in junction temperature. This delay makes it challenging to balance the effectiveness of thermal management against the utilization of the power devices’ thermal capacity.

B. Fault Diagnosis

Fault diagnosis consists of two stages, i.e., fault detection and identification, and is another effective approach to improving system reliability. Accurate and timely detection and protection can prevent fault propagation and catastrophic results. Fault-tolerant operation also requires effective
diagnosis of faults. Moreover, diagnosis reduces MTTR and in turn improves the average availability of the system.

Fig. 8. Regions for an active thermal controller.

Many methods of fault diagnosis in power electronic systems have been reported in the literature. They fall mainly into two categories: 1) methods based on the input or output current or voltage at the converter terminals; and 2) methods based on current or voltage information of the devices.

1) Diagnostic Techniques Based on Converter Terminal Quantities: The basic idea is that the characteristics of the input or output voltage or current of a converter under normal conditions differ from those under faulted conditions. These electrical variables are sensed and compared with predefined performance metrics. The comparison determines whether a fault has occurred and identifies the faulted components and the types of faults.

Lezana et al. propose a method to detect a faulted cell in a cascaded multilevel converter, as shown in Fig. 9, based on output-voltage frequency analysis [22]. Due to the phase shift among the PWM carrier signals, the output-voltage phasor $v_s$ at the switching frequency is ideally zero. If a fault occurs in a cell, the switching-frequency phasor $v_s$ of the output phase voltage becomes nonzero, and its phase angle indicates the location of the faulted cell. The magnitude and phase angle of $v_s$ can be obtained through the discrete Fourier transform (DFT) of sampled output phase voltages. The same method is applied to the diagnosis of faults in flying capacitor multilevel converters (FCMC) [23], [24]. The method is simple and incurs minimal additional cost, since only one sensor per phase is necessary regardless of the number of converter cells. However, one difficulty is finding a proper phasor amplitude threshold to assert a fault, since in practice $v_s$ is nonzero even under normal conditions. A second challenge is distinguishing normal transients from actual faults.

Fig. 9. Diagram of output-voltage phasors at switching frequency for the 11-level H-bridge cascaded inverter in each phase leg. (a) Under the normal condition, the output-voltage phasor $v_s$ at the switching frequency is approximately zero due to the phase shift of the carriers. (b) Under the faulted condition, the switching-frequency phasor $v_s$ cannot be nullified, and the resultant phasor’s phase angle is correlated with the particular faulted cell.

Smith et al. propose a method to detect intermittent misfiring faults of switches in the three-phase H-bridge inverter for motor drives [25]. The method is based on the time-domain response of the motor stator current. The principle is as follows. When a misfiring fault occurs on one of the switching devices in the inverter, the voltage disturbance causes an increment in the stator current space vector. The incremental current points in a direction that is determined distinctively by the failed device. As the inverter recovers from the disturbance, the length of the incremental current vector reduces to zero. The incremental system model can be used to compensate the measured current response. With this compensation, the modified incremental current signal decays in the direction opposite to the initial offset caused by the voltage disturbance. Therefore, the trajectory of the current response can be used to detect the occurrence of the misfiring fault and pinpoint the problematic switching device.

A method to detect faults of switches in a three-wire three-phase voltage-source inverter is proposed in [26]. The solution is based on analysis of the current vector trajectory in the Concordia frame. Under faulted conditions, the current trajectory spans only a half cycle rather than a full cycle, as shown in Fig. 10. Therefore, the faulty leg of the inverter can be located from the current trajectory, and the faulty transistor is isolated by determining the current polarity in the faulty phase. This method needs only two current sensors. However, it can detect only open-switch faults, and it is applicable only to a three-phase sinusoidal inverter with no neutral current.

Fig. 10. Trajectories of current vectors under different fault scenarios.

In [27], the authors propose a method of fault diagnosis for a three-phase voltage-source inverter in motor drives. The method is based on Concordia stator mean current patterns. Under healthy and ideal conditions, the average stator current pattern is a point. Considering offset current, the average current pattern should be a circle. When a fault occurs, the stator currents are no longer symmetric, and a dc component exists in the stator current vector. Correspondingly, the average stator current pattern is biased in a direction that depends on which switch is out of order. Seven patterns are formed with the mean stator
current vectors. One of these patterns is dedicated to the healthy state, while the other six correspond to open-circuit faults of the six switches, as shown in Fig. 11. This method needs only two current sensors. However, the design of the boundary of the pattern that corresponds to the healthy state has a significant influence on detection accuracy.

Fig. 11. Concordia average current vectors.

Artificial intelligence (AI) algorithms have been proposed to detect faults in a cascaded multilevel converter [28]. The proposed scheme is illustrated in Fig. 12. The first step is sampling of the output-voltage signals. Then, mathematical algorithms such as the fast Fourier transform and correlation are applied to the measured data. This stage, called the feature extraction system (FES), reduces the size of the neural network (NN) and its training time. When feature extraction is completed, the NN analyzes the data to detect faults. The behavior of the NN depends on the selection of the FES process, its own structure, and the training process. The detection process is computationally intensive and time consuming.

Fig. 12. Fault-detection scheme based on the AI algorithm.

Various specific fault diagnosis methods have been proposed that use input or output voltage and current information rather than the states of the power devices. Therefore, they belong to indirect detection schemes and have relatively low cost. However, these methods need performance criteria built in advance. The accuracy and reliability of diagnosis depend closely on whether these criteria can distinguish all possible healthy states from faulted conditions.

2) Diagnostic Techniques Based on Voltages or Currents of Devices: These methods are based on direct detection of device faults. The principle is that the current through and the voltage across a power switch cannot track the command signals (gate driver signals) from the controller when the switch is open-circuited or short-circuited. The controller therefore needs to monitor these voltage and current parameters.

A diagnosis method for open-circuit faults in three-phase voltage-source inverters without sensors is proposed in [29]. During normal operation, the collector–emitter voltage drop of an IGBT follows the level changes of the gate driving signal. When an open-circuit fault occurs in the IGBT, its collector–emitter voltage remains constant at a high level. Therefore, the controller can detect a fault by comparing the gate command signal with the collector–emitter voltage drop of the IGBT. Hence, open-circuit faults of all six switches in the three-phase inverter can be detected by monitoring the six gating signals and the three collector–emitter voltages of the lower switches in the three phase legs.

The techniques based on device voltage and current information feature fast response, accuracy, and reliability, and can even realize cycle-by-cycle diagnosis. However, large numbers of sensors are needed, since the controller needs almost all the current and voltage signals of each switch. Therefore, complex hardware and the associated high cost are the main drawbacks, although the development of integrated smart driver techniques is expected to mitigate this problem. In addition, for large systems with hundreds of power switches, real-time monitoring constitutes a daunting task for the controllers.

In summary, these fault diagnosis techniques focus mainly on power switches. Most of them amount to indirect detection of switch faults based on the load currents of the power converter. Their salient characteristics include simple hardware and low cost. However, they typically take more than one fundamental period to detect faults. Such delay may be unacceptable for applications that require real-time detection. They also need complex algorithms to process the measured data, and it is difficult for them to distinguish normal transients from actual faults. Some studies propose methods to improve detection accuracy. On the other hand, direct monitoring of device voltage and current signals provides a reliable solution to fault diagnosis at the cost of substantial sensing requirements. Smart drivers, which are being studied extensively, could alleviate this problem as further progress is made.

C. Prognosis of Failures in Power Devices

Prognosis is the ability to accurately and precisely predict the remaining useful life of a failing component or subsystem [54]. Prognosis can detect potential faults and notify controllers or personnel to take preventive or remedial actions [6]. Compared with fault diagnosis, prognosis poses significant challenges, since predicting fault evolution involves substantial uncertainty. Several prognosis methods reported in the literature focus mainly on power semiconductor devices.

A prognostic system based on the trace of the saturated collector–emitter voltage $V_{CE,sat}$ of the IGBT module has been developed for HEV applications [19]. As shown in Fig. 13, $V_{CE,sat}$ of IGBT modules exhibits a significant degradation trace. $V_{CE,sat}$ remains unchanged until approximately $5 \times 10^5$ cycles. Then, it starts decreasing gradually before a sudden drop (more than 17%) at about $6 \times 10^5$ cycles, followed by a quick increase. The prognosis method is based on the idea that an IGBT will be considered as being seriously degraded if its measured
$V_{CE,sat}$ deviates by more than ±15% from its normal reference value. A direct comparison is impossible because operating current and temperature vary widely and quickly during actual vehicle operation, so an adaptive approach was developed. A prognostic subroutine is inserted into the vehicle control system after key-on and/or key-off of each vehicle period. The prognostic algorithm is responsible for building an adaptive reference table and for comparing the measured $V_{CE,sat}$ with the reference value at the same temperature in the lookup table.

Fig. 13. Variation of $V_{CE,sat}$ of the IGBT module at 400 A as a function of power cycles.

Evaluating the increase in the gate leakage current of power IGBTs or MOSFETs enables early prognosis of damage [20]. The principle is that the initial gate current of 100 nA rises to the order of microamperes under stressed conditions. When the stressed conditions are removed, the gate current remains at values several times higher than the initial gate current.

In switching power supplies, the output-voltage ripple component at the switching frequency is monitored to predict faults of electrolytic capacitors and their remaining lifetime [21]. An increase in the voltage ripple across the capacitors indicates an increased equivalent series resistance (ESR), which is one of the best fault signatures for capacitors.

These fault prognosis methods are all based on the failure mechanisms of components and need to monitor slight changes in electrical parameters. It is difficult and expensive to extract, from large signals, the small signatures that reflect incipient component faults.

V. FAULT-TOLERANT OPERATION OF POWER ELECTRONIC SYSTEMS BASED ON REDUNDANT DESIGN

Fault-tolerant operation means that a fault in a component or subsystem does not cause the overall system to malfunction [55]. Fault tolerance protects the system from significant losses or unexpected interruptions and improves availability. Research in fault tolerance involves four different aspects: redundancy, fault diagnosis, fault isolation, and online repair. Redundancy can be realized by extra systems or extra components. Here, only the latter is considered, and online repair is assumed to be unavailable. Fault diagnosis is covered in the previous section, and this section focuses on redundant design and fault partitioning.

A. Necessity of Fault Tolerance

Although reliability metrics such as MTTF or availability can be enhanced by many solutions and failure rates can be minimized, failure is inevitable during the mission time of a system. In some critical applications, malfunction is unacceptable or causes serious losses. Therefore, fault tolerance is necessary in many power electronic systems. For electric drives used in EVs and HEVs, faults can be critical, since an uncontrolled output torque may adversely affect vehicle stability and ultimately put passenger safety at risk. Hence, a limping-home function is desirable [30]. A wind turbine should not stop because of a breakdown of one or more power devices, so that electrical energy can still be delivered to the network and costly disconnection time is avoided [45]. In high-power applications, such as high-power motor drives for pipeline pumps in the petrochemical industry, fans in the cement industry, pumps in water pumping stations, and steel rolling mills in the metal industry, as well as reactive power compensation and grid interfaces of renewable energy resources, an unexpected shutdown would cause significant production loss [31].

According to their postfault performance, there are two types of fault-tolerant operation: degraded operation and quasi-normal operation, which differ in terms of system cost, performance, and feasibility.

B. Degraded Operation

Degraded operation under postfault conditions means that a system can tolerate faults and continue to perform some key functions with reduced output power or voltage, worsened power quality, or other suboptimal performance metrics. Generally, degraded operation is realized by reconfiguring control strategies to exploit the inherent redundancy of the converter with no or few additional devices. Degraded operation of multilevel converters and three-phase voltage-source inverters has been investigated extensively in the literature.

Degraded operation has been studied for the cascaded H-bridge multilevel (CHBM) inverter with space-vector modulation applied to motor drives, shown in Fig. 14 [32], [33]. In [32], the faulted converter cells are isolated from the system, and redundant switching states are used to generate a neutral voltage shift. Thus, a balanced line-to-line ac output voltage with minimal harmonic distortion results. In comparison to normal operation, the magnitude of the output voltage decreases and harmonic distortion increases. Due to the unbalance, the load neutral point cannot be directly connected to the neutral point of the inverter output. The main difference of the control strategy proposed in [33] is that the faulted cells also participate in operation and contribute two output-voltage levels, depending on the specific faulted switches.

Similar to CHBM converters, degraded operation of neutral point clamped (NPC) converters is realized by the use of redundant switching states [34]–[37]. Fig. 15 shows the schematic of a conventional three-phase NPC inverter. Li et al. present a control scheme to maintain continuous operation of the three-level inverter for a flywheel energy storage system by utilizing the redundancy of voltage vectors [34]. Without the need for extra power devices, this method covers a single short-circuit
failure of switches or clamping diodes. However, the switches then have to withstand the full dc-link voltage, so they must be overrated.

Fig. 14. CHBM inverter as a motor drive.

Fig. 15. Schematic of a conventional three-phase neutral point clamped inverter.

Fig. 16 shows two solutions for achieving fault tolerance without device oversizing. The solution shown in Fig. 16(a) is studied in [35]–[37]. Three pairs of thyristors are added to the conventional NPC inverter structure to provide balanced three-phase output voltages under short-circuit or open-circuit failures of switches or clamping diodes. Under normal conditions, the SCRs are in the off-state. When $S_{a2}$ or $S_{a3}$ fails to turn on, the SCRs are activated to connect the load to the neutral point and thus maintain continuous operation. The solution shown in Fig. 16(b) uses the same control scheme as the previous method to realize degraded operation of NPC converters [38]. The advantage of these two solutions is that it is not necessary to oversize the voltage ratings of the power semiconductors. Common to both solutions is that the maximum modulation index is reduced because some critical switching states are lost, as shown in Fig. 17. Therefore, the attainable magnitudes of the output voltages decrease.

Fig. 16. Schematic of a phase leg of a neutral point-clamped inverter realized with two different fault-tolerant designs.

Fig. 17. Vector diagram of an NPC converter when (a) $S_{a1}$ fails in short circuit and (b) $S_{a2}$ fails in short circuit.

Fig. 18 shows a three-phase voltage-source inverter for a motor drive system with open-phase fault-tolerant capability [39]. The principle of the proposed strategy is that the motor can work normally as a three-phase or two-phase machine by proper regulation of the phase angles and magnitudes of the stator currents to keep the electromechanical torque unchanged as the system transitions from a three-phase to a two-phase input power supply. When one phase fails in open circuit and the other two phase legs of the inverter function properly, the output currents increase to $\sqrt{3}$ times their original value and the phase angle between them is regulated to $60^\circ$. Under this condition, the torque generated by the motor remains constant. It is worth mentioning that there are several limitations associated with this strategy: 1) it is only applicable to motor drives; 2) only open-circuit faults of diodes and switches can be handled; and 3) oversized dc-link capacitors are needed to handle the substantially increased ripple current under the faulted condition.

Fig. 18. Three-phase voltage-source inverter for motor drive.
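The current relation described above for the open-phase strategy can be checked with a few lines of arithmetic. The Python sketch below is an illustration under stated assumptions, not the controller of [39]. It uses the unscaled stator current space vector $i_a + a\,i_b + a^2 i_c$ with $a = e^{j2\pi/3}$ and takes phase a as the open phase. It assumes postfault currents $i_b = \sqrt{3}I\cos(\theta - 5\pi/6)$ and $i_c = \sqrt{3}I\cos(\theta + 5\pi/6)$, which were derived here so that the space vector, and hence the rotating stator MMF, equals the healthy one. All helper names are hypothetical.

```python
import cmath
import math

# Space-vector operator a = e^{j*2*pi/3}
A = cmath.exp(2j * math.pi / 3)


def space_vector(i_a: float, i_b: float, i_c: float) -> complex:
    """Unscaled stator current space vector i_a + a*i_b + a^2*i_c."""
    return i_a + A * i_b + A * A * i_c


def healthy_currents(peak: float, theta: float) -> tuple[float, float, float]:
    """Balanced three-phase currents of amplitude `peak` at electrical angle theta."""
    return (
        peak * math.cos(theta),
        peak * math.cos(theta - 2 * math.pi / 3),
        peak * math.cos(theta + 2 * math.pi / 3),
    )


def open_phase_a_currents(peak: float, theta: float) -> tuple[float, float, float]:
    """Assumed postfault currents with phase a open (hypothetical helper).

    Phases b and c carry sqrt(3) times the original amplitude, shifted to
    -150 and +150 degrees, i.e. 60 degrees apart from each other.
    """
    scaled = math.sqrt(3) * peak
    return (
        0.0,
        scaled * math.cos(theta - 5 * math.pi / 6),
        scaled * math.cos(theta + 5 * math.pi / 6),
    )


def max_vector_error(peak: float = 1.0, steps: int = 360) -> float:
    """Largest |healthy - postfault| space-vector difference over one cycle."""
    worst = 0.0
    for k in range(steps):
        theta = 2 * math.pi * k / steps
        healthy = space_vector(*healthy_currents(peak, theta))
        faulted = space_vector(*open_phase_a_currents(peak, theta))
        worst = max(worst, abs(healthy - faulted))
    return worst


if __name__ == "__main__":
    print(f"max space-vector error over one cycle: {max_vector_error():.1e}")
    # Sum i_b + i_c at theta = 0; it is nonzero, so a neutral return path is needed
    print(f"i_b + i_c at theta = 0: {sum(open_phase_a_currents(1.0, 0.0)):.3f}")
```

The printed error is at the level of floating-point round-off. The sum $i_b + i_c = -3I\cos\theta$ does not vanish, so the sketch implicitly assumes a neutral return path. If that path is tied to the dc link (an assumption here), its current would add to the capacitor ripple, which is consistent with limitation 3) above.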

The same principle is applied to the matrix converter for motor control shown in Fig. 19. Likewise, only open-circuit faults can be handled.

Fig. 19. Three-phase matrix converter for motor drives with fault-tolerant capability.

Degraded operation of power electronic systems can be realized mainly by making use of the systems' natural redundancies. Since a minimum of additional components is needed, simple and low-cost solutions result. However, there are some significant limitations.
1) The range of applications is narrow. Because of degraded performance, such as reduced output voltage, reduced power, and increased harmonic distortion, degraded operation is feasible only for applications that can tolerate it. For some applications, the degraded performance may be unacceptable. For instance, power converters with reduced output voltages under faulty conditions are not well suited to utility applications. The converters in Figs. 18 and 19 are fault tolerant only for motor drives.
2) The faulted components and fault types that can be covered are limited.
3) Degraded operation is used mainly for multilevel converters, because their complex structures provide many redundant switching-state combinations.

C. Quasi-Normal Postfault Operation

Because relying on the inherent fault-tolerant ability of systems has many limitations, redundant design has been studied and reported extensively to provide quasi-normal operation of power converters under fault conditions.

Due to their high modularity, cascaded multilevel converters such as the one shown in Fig. 14 can provide performance approximately similar to normal operation with a simple redundant design. Song et al. present a fault-tolerant control for a static synchronous compensator (STATCOM) based on a cascaded H-bridge inverter [40]. During normal operation, the inverter operates with $N + 1$ cascaded converter cells in each phase and $2N + 3$ output-voltage levels. When a fault occurs in a switch, the faulted cell is bypassed, and the number of output-voltage levels decreases to $2N + 1$. The phase-shift angles are regulated to generate a balanced output voltage. At the same time, the dc-link voltages of the cells in the faulted phases also need to increase in order to keep the voltage magnitudes unchanged. The same redundant design is proposed for a direct-drive wind turbine application [41]. The main disadvantage is the large number of extra components used for bypass switches and backup converter cells. Power semiconductor devices and dc-link capacitors need to be oversized to withstand the elevated dc-link voltage under faulty conditions.

Maharjan et al. propose an improved control scheme for a star-configured cascaded inverter applied to a battery energy storage system [56]. Instead of letting only the cells in the faulted phase carry the increased voltage burden, the method makes all $3N - 1$ healthy converter cells in the three phases share it equally by introducing a neutral shift. As a result, the modulation indices of all remaining cells increase only slightly. The solution can be used for other applications and reduces the need to overdesign power devices. However, the neutral shift is realized by injecting a fundamental-frequency zero-sequence voltage into each cell. Therefore, the neutral point of the load cannot be directly connected to the neutral point of the inverter output.

Fig. 20. One phase of (a) a conventional three-cell FCMC and (b) a fault-tolerant three-cell FCMC.

FCMCs such as the one shown in Fig. 20(a) provide inherent series redundancy if the device ratings can withstand the increased voltage stress. An open-switch or short-switch fault-tolerant design for a three-cell four-level FCMC, shown in Fig. 20(b), and its control strategy are presented in [42]. According to the modulation rules, there are $2^m$ switching-state combinations in each phase leg of an $m$-cell FCMC. However, given the clamped capacitor voltages, only $m + 1$ states are effective and are utilized to generate $m + 1$ output-voltage levels, while the other states are
redundant. The method presented in [42] optimizes the switching states by maximizing the number of output-voltage levels. For the three-cell converter, the output phase voltage $V_{xg}$ has four levels under normal operation. When a single-switch fault occurs in one phase, the faulted switch and its counterpart are bypassed. The corresponding capacitor is isolated from the system or connected in parallel with the one in a healthy cell. The remaining capacitor voltages are regulated appropriately, and accordingly the inverter can still provide four line-to-ground levels. The disadvantages mainly lie in the following: 1) under faulty conditions, some switches need to withstand the full dc-link voltage, which necessitates an oversized design; and 2) a large number of extra devices are required.

Fig. 21 shows a modified general multilevel converter proposed by Chen et al. This converter can tolerate short-switch or open-switch faults without loss of any output-voltage level [43]. At least one redundant switching-state combination is obtained for any output-voltage level by changing the circuit configuration. The modified topology achieves redundancy at the price of higher power loss and higher cost compared with the original five-level architecture. Even during normal operation, some output-voltage levels have to be realized by five conducting semiconductor devices, whereas in the conventional topology only four devices are in the conduction path for any voltage level. Under open-switch or short-switch fault conditions, six devices have to conduct to provide a current path. The increased number of conducting devices leads to higher conduction losses. On the other hand, the original architecture features good symmetry, where only eight main switches conduct the load current and the other clamping switches only balance the capacitor voltages. Therefore, the clamping switches need only a low current rating.

Fig. 21. Modified general multilevel converter with fault-tolerant capability.

Two solutions to maintain short-switch fault-tolerant operation of three-phase NPC converters are presented in [44]. For solution I, shown in Fig. 22(a), the upper or lower fast fuse is cleared by turning on the SCR when $S_{a2}$ or $S_{a3}$ fails to open. The faulted phase leg then operates with two voltage levels, the high level and the low level, without the middle level. Likewise, for short-circuit faults of $S_{a1}$ and $S_{a4}$, the high and low voltage levels remain while the fuses and transistors are not actuated. In view of the loss of the middle voltage level, solution II, shown in Fig. 22(b), is proposed. For any single short-circuit fault in one phase, the converter still provides three output-voltage levels, similar to normal operation. These two solutions can maintain normal output voltage as long as there is no more than one faulted switch in each phase leg. However, some of the switches have to withstand the total dc-link voltage under faulty states. Therefore, oversizing of the component ratings is necessary.

Fig. 22. One leg of NPC three-level converters with two different redundant realizations.

Although the two aforementioned solutions maintain near-normal operation of NPC converters under faulted conditions, the necessary oversizing of semiconductor voltage ratings forfeits the most significant advantage of multilevel converters, namely that the voltage stress on each device is half of the dc-link voltage. The oversized design results in high cost and low efficiency even under normal conditions. A redundant phase leg is added to the original topology to overcome these limitations [45], [46]. Fig. 23 depicts the overall structure. When a fault occurs in phase leg $x$, with $x \in \{a, b, c\}$, fuses $F_x$ and $F_n$ are cleared. Thus, the faulted leg is isolated from the system. In addition, $T_x$ (a bidirectional switch), $S_5$, and $S_6$ are activated such that the redundant phase leg replaces the faulted one. The system is thus reconfigured into a standard NPC converter. The main disadvantage is the large number of additional components and therefore higher cost.

Kwak et al. present a matrix converter with a redundant phase leg for heavy electric vehicles, as shown in Fig. 24 [47]. The main idea is that the fourth leg replaces the faulted leg through a bidirectional switch, and thus normal operation of the system is maintained. The limitation is that this topology can cover only open-circuit faults of switches or freewheeling diodes.

There are some other modified topologies with quasi-normal operation capability. The aforementioned solutions include the power electronic converters for which fault-tolerant operation
is of importance because of their high part count and their applications with high reliability requirements.

Fig. 23. Three-level converters with a redundant phase leg.

Fig. 24. Matrix converter for motor drives with a redundant phase leg.

In comparison to degraded fault-tolerant operation, systems with full redundant capability require a larger number of additional auxiliary components, which increases the initial cost of the system. Furthermore, some redundant designs compromise performance, such as efficiency in the fault-free state, which is undesirable or even unacceptable for applications with very tight efficiency or thermal requirements.

VI. CONCLUSIONS AND DISCUSSIONS

A comprehensive review of the reliability of power electronic converters has been carried out with the intention of providing a clear picture of the current status of this research field. The review is organized into three levels. Methods of reliability assessment are first analyzed and compared to help designers select an appropriate reliability model. For existing systems or cost-sensitive systems, several solutions to improve reliability based on active management of operation are introduced, and their advantages and disadvantages are analyzed. For mission-critical applications, fault-tolerant design of power electronic systems serves as a suitable design option. The analysis of the main modified topologies with fault-tolerant capability, covering multilevel converters, matrix converters, and conventional three-phase half-bridge inverters, shows that these new topologies with redundancy increase the complexity and cost of systems and may even degrade some aspects of performance. More effective component-level redundant design for power electronic systems should be studied further.

The status of current research and the identified limitations are summarized as follows:
1) Studies of fault-tolerant operation focus mainly on multilevel converters, which have more components and natural redundant switching-state combinations. Some studies reported in the literature involve three-phase matrix converters and voltage-source inverters for motor drives, where the inverters can operate with two-phase output.
2) More components are added to the standard power converters to realize fault tolerance, especially for redundant design. In a true redundant design covering all components (switches, diodes, and capacitors) and all types of faults (short circuit and open circuit), the total number of power components can reach four times that of the standard topology [48]. As a result, the cost could even exceed that of system-level redundancy.
3) It appears to be taken for granted that fault tolerance improves the reliability of power electronic systems. Very few quantitative estimates of the increase in MTTF or average availability due to redundant design have been reported, except that Lezana et al. compare failure rates of semiconductor devices in standard multilevel converters and in ones with fault-tolerant design [31].
4) In contrast to the research effort devoted to fault-tolerant capability against switch (IGBT) faults, very little attention has been directed to faults of diodes and capacitors.
5) Since short-circuit faults and open-circuit faults need different isolation and postfault operation strategies, some continuous-operation schemes are feasible only for certain types of fault. If systems are designed to handle both types of faults, the number of extra components increases substantially.
6) Because multilevel converters have many redundant switching-state combinations, effective utilization of these redundant states can optimize and simplify redundant design. Further study of redundant states and modulation strategies would be beneficial.
7) Successful detection of faults and transition from the faulty state to the postfault state are prerequisites. Low-cost and reliable detection techniques, as well as the transient processes from the occurrence of faults to the postfault steady state, should be studied further.

ACKNOWLEDGMENT

The authors would like to thank Dr. E. Strangas for his valuable inputs during the preparation of this paper.

REFERENCES

[1] M. A. Masrur, “Penalty for fuel economy—System level perspectives on the reliability of hybrid electric vehicles during normal and graceful degradation operation,” IEEE Syst. J., vol. 2, no. 4, pp. 476–483, Dec. 2008.
[2] A. Ristow, M. Begovic, A. Pregelj, and A. Rohatgi, “Development of a methodology for improving photovoltaic inverter reliability,” IEEE Trans. Ind. Electron., vol. 55, no. 7, pp. 2581–2592, Jul. 2008.
[3] J. Jones and J. Hayes, “Estimation of system reliability using a ‘non-constant failure rate’ model,” IEEE Trans. Rel., vol. 50, no. 3, pp. 286–288, Sep. 2001.
[4] A. Hoyland and M. Rausand, System Reliability Theory. New York: Wiley, 1994.
[5] P. Wikstrom, L. A. Terens, and H. Kobi, “Reliability, availability, and maintainability of high-power variable-speed drive systems,” IEEE Trans. Ind. Appl., vol. 36, no. 1, pp. 231–241, Jan./Feb. 2000.
[6] E. Strangas, S. Aviyente, J. Neely, and S. Zaidi, “Improving the reliability of electrical drives through failure prognosis,” in Proc. IEEE Int. Symp. Diagn. Electron. Mach., Power Electron. Drives, Sep. 2011, pp. 172–178.
[7] “Reliability prediction of electronic equipment,” Department of Defense, Washington DC, Tech. Rep. MIL-HDBK-217F, Dec. 1991.
[8] M. Pecht and K. Wen-Chang, “A critique of MIL-HDBK-217E reliability prediction methods,” IEEE Trans. Rel., vol. 37, no. 5, pp. 453–457, Dec. 1988.
[9] A. T. Bryant, P. A. Mawby, P. R. Palmer, E. Santi, and J. L. Hudgins, “Exploration of power device reliability using compact device models and fast electrothermal simulation,” IEEE Trans. Ind. Appl., vol. 44, no. 3, pp. 894–903, May/Jun. 2008.
[10] H. D. Lambilly and H. O. Keser, “Failure analysis of power modules: a look at the packaging and reliability of large IGBTs,” IEEE Trans. Compon., Hybrids Manuf. Technol., vol. 16, no. 4, pp. 412–417, Jun. 1993.
[11] M. Ciappa, F. Carbognani, and W. Fichtner, “Lifetime prediction and design of reliability tests for high-power devices in automotive applications,” IEEE Trans. Dev. Mater. Rel., vol. 3, no. 4, pp. 191–196, Dec. 2003.
[12] C. Bailey, T. Tilford, and H. Lu, “Reliability analysis for power electronics modules,” in Proc. 30th Int. Spr. Sem. Electron. Techn., May 2007, pp. 12–17.
[13] M. J. Cushing, D. E. Mortin, T. J. Stadterman, and A. Malhotra, “Comparison of electronics-reliability assessment approaches,” IEEE Trans. Rel., vol. 42, no. 4, pp. 542–546, Dec. 1993.
[14] D. Hirschmann, D. Tissen, S. Schroder, and R. W. De Doncker, “Reliability prediction for inverters in hybrid electrical vehicles,” IEEE Trans. Power Electron., vol. 22, no. 6, pp. 2511–2517, Nov. 2007.
[15] S. S. Smater and A. D. Dominguez-Garcia, “A unified framework for reliability assessment of wind energy conversion systems,” in Proc. Power Energy Soc. Gen. Meet., 2010, pp. 1–4.
[16] A. D. Dominguez-Garcia and P. T. Krein, “Integrating reliability into the design of fault-tolerant power electronics systems,” in Proc. Power Electron. Spec. Conf., 2008, pp. 2665–2671.
[17] Y. Wu, J. Kang, Y. Zhang, S. Jing, and D. Hu, “Study of reliability and accelerated life test of electric drive system,” in Proc. IEEE Int. Power Electron. Motion Control Conf., 2009, pp. 1060–1064.
[18] D. A. Murdock, J. E. R. Torres, J. J. Connors, and R. D. Lorenz, “Active thermal control of power electronic modules,” IEEE Trans. Ind. Appl., vol. 42, no. 2, pp. 552–558, Mar./Apr. 2006.
[25] K. S. Smith, R. Li, and J. Penman, “Real-time detection of intermittent misfiring in a voltage-fed PWM inverter induction-motor drive,” IEEE Trans. Ind. Electron., vol. 44, no. 4, pp. 468–476, Aug. 1997.
[26] R. Peuget, S. Courtine, and J. P. Rognon, “Fault detection and isolation on a PWM inverter by knowledge-based model,” IEEE Trans. Ind. Appl., vol. 34, no. 6, pp. 1318–1326, Nov./Dec. 1998.
[27] D. Diallo, M. E. H. Benbouzid, D. Hamad, and X. Pierre, “Fault detection and diagnosis in an induction machine drive: A pattern recognition approach based on Concordia stator mean current vector,” IEEE Trans. Energy Conv., vol. 20, no. 3, pp. 512–519, Sep. 2005.
[28] S. Khomfoi and L. M. Tolbert, “Fault diagnosis and reconfiguration for multilevel inverter drive using AI-based techniques,” IEEE Trans. Ind. Electron., vol. 54, no. 6, pp. 2954–2968, Dec. 2007.
[29] Q.-T. An, L.-Z. Sun, K. Zhao, and L. Sun, “Switching function model-based fast-diagnostic method of open-switch faults in inverters without sensors,” IEEE Trans. Power Electron., vol. 26, no. 1, pp. 119–126, Jan. 2011.
[30] O. Wallmark, L. Harnefors, and O. Carlson, “Control algorithms for a fault-tolerant PMSM drive,” IEEE Trans. Ind. Electron., vol. 54, no. 4, pp. 1973–1980, Aug. 2007.
[31] P. Lezana, J. Pou, T. A. Meynard, J. Rodriguez, S. Ceballos, and F. Richardeau, “Survey on fault operation on multilevel inverters,” IEEE Trans. Ind. Electron., vol. 57, no. 7, pp. 2207–2218, Jul. 2010.
[32] S. Wei, B. Wu, F. Li, and X. Sun, “Control method for cascaded H-bridge multilevel inverter with faulty power cells,” in Proc. Appl. Power Electron. Conf. Exp., 2003, pp. 261–267.
[33] Y. Zang, X. Wang, B. Xu, and J. Liu, “Control method for cascaded H-bridge multilevel inverter failures,” in Proc. Cong. Int. Control Autom., 2006, vol. 2, pp. 8462–8466.
[34] S. Li and L. Xu, “Strategies of fault tolerant operation for three-level PWM inverters,” IEEE Trans. Power Electron., vol. 21, no. 4, pp. 933–940, Jul. 2006.
[35] G.-T. Park, T.-J. Kim, D.-W. Kang, and D.-S. Hyun, “Control method of NPC inverter for continuous operation under one phase fault condition,” in Proc. Rec. IEEE Annu. Ind. Appl. Conf., 2004, pp. 2188–2193.
[36] J.-J. Park, T.-J. Kim, and D.-S. Hyun, “Study of neutral point potential variation for three-level NPC inverter under fault condition,” in Proc. Annu. Conf. IEEE Ind. Electron., 2008, pp. 983–988.
[37] J.-C. Lee, T.-J. Kim, D.-W. Kang, and D.-S. Hyun, “A control method for improvement of reliability in fault tolerant NPC inverter system,” in Proc. IEEE Power Electron. Spec. Conf., 2006, pp. 1–5.
[38] J. Li, A. Q. Huang, S. Bhattacharya, and G. Tan, “Three-level active neutral-point-clamped (ANPC) converter with fault tolerant ability,” in Proc. Appl. Power Electron. Conf. Expos., 2009, pp. 840–845.
[39] T.-H. Liu, J.-R. Fu, and T. A. Lipo, “A strategy for improving reliability of field-oriented controlled induction motor drives,” IEEE Trans. Ind. Appl., vol. 29, no. 5, pp. 910–918, Sep./Oct. 1993.
[40] W. Song and A. Q. Huang, “Fault-tolerant design and control strategy for cascaded H-bridge multilevel converter-based STATCOM,” IEEE Trans. Ind. Electron., vol. 57, no. 8, pp. 2700–2708, Aug. 2010.
[41] M. A. Parker, N. Chong, and R. Li, “Fault-tolerant control for a modular generator-converter scheme for direct-drive wind turbines,” IEEE Trans. Ind. Electron., vol. 58, no. 1, pp. 305–315, Jan. 2011.
[42] X. Kou, K. A. Corzine, and Y. L. Familiant, “A unique fault-tolerant design for flying capacitor multilevel inverter,” IEEE Trans. Power Electron., vol. 19, no. 4, pp. 979–987, Jul. 2004.
[43] A. Chen, L. Hu, L. Chen, Y. Deng, and X. He, “A multilevel converter
[19] Y. Xiong, C. Xu, Z. J. Shen, C. Mi, H. Wu, and V. K. Garg, “Prognostic topology with fault-tolerant ability,” IEEE Trans. Power Electron., vol. 20,
and warning system for power-electronic modules in electric, hybrid elec- no. 2, pp. 405–415, Mar. 2005.
tric, and fuel-cell vehicles,” IEEE Trans. Ind. Electron., vol. 55, no. 6, [44] S. Ceballos, J. Pou, E. Robles, J. Zaragoza, and J. Marti, “Performance
pp. 2268–2276, Jun. 2008. evaluation of fault-tolerant neutral-point-clamped converters,” IEEE
[20] A. Ginart, I. Barlas, J. L. Dorrity, P. Kalgren, and M. J. Roemer, “Self- Trans. Ind. Electron., vol. 57, no. 8, pp. 2709–2718, Aug. 2010.
healing from a PHM perspective,” in Proc. IEEE Aut. Conf., 2006, pp. 697– [45] S. Ceballos, J. Pou, E. Robles, I. Gabiola, J. Zaragoza, J. L. Villate, and
703. D. Boroyevich, “Three-level converter topologies with switch breakdown
[21] A. Lahyani, P. Venet, G. Grellet, and P. J. Viverge, “Failure prediction of fault-tolerance capability,” IEEE Trans. Ind. Electron., vol. 55, no. 3,
electrolytic capacitors during operation of a switchmode power supply,” pp. 982–995, Mar. 2008.
IEEE Trans. Power Electron., vol. 13, no. 6, pp. 1199–1207, Nov. 1998. [46] S. Ceballos, J. Pou, J. Zaragoza, J. L. Martin, E. Robles, I. Gabiola,
[22] P. Lezana, R. Aguilera, and J. Rodriguez, “Fault detection on multicell and P. Ibanez, “Efficient modulation technique for a four-leg fault-tolerant
converter based on output voltage frequency analysis,” IEEE Trans. Ind. neutral-point-clamped inverter,” IEEE Trans. Ind. Electron., vol. 55, no. 3,
Electron., vol. 56, no. 6, pp. 2275–2283, Jun. 2009. pp. 1067–1074, Mar. 2008.
[23] F. Richardeau, P. Baudesson, and T. A. Meynard, “Failures-tolerance and [47] S. Kwak, T. Kim, and G. Park, “Phase-redundant-based reliable direct
remedial strategies of a PWM multicell inverter,” IEEE Trans. Power ac/ac converter drive for series hybrid off-highway heavy electric vehi-
Electron., vol. 17, no. 6, pp. 905–912, Aug. 2002. cles,” IEEE Trans. Veh. Techn., vol. 59, no. 6, pp. 2674–2688, Jul. 2010.
[24] C. Turpin, P. Baudesson, F. Richardeau, F. Forest, and T. A. Meynard, [48] K. A. Ambusaidi, V. Pickert, and B. Zahawi, “Computer aided analysis
“Fault management of multicell converters,” IEEE Trans. Ind. Electron., of fault tolerant multilevel dc/dc converters,” in Proc. Int. Conf. Power
vol. 49, no. 5, pp. 988–997, Oct. 2002. Electron., Drives Energy Syst., 2006, pp. 1–6.
[49] A. L. Reibman and M. Veeraraghavan, “Reliability modeling: An overview for system designers,” Computer, vol. 24, no. 4, pp. 49–57, Apr. 1991.
[50] G. Petrone, G. Spagnuolo, R. Teodorescu, M. Veerachary, and M. Vitelli, “Reliability issues in photovoltaic power processing systems,” IEEE Trans. Ind. Electron., vol. 55, no. 7, pp. 2569–2580, Jul. 2008.
[51] F. Chan and H. Calleja, “Design strategy to optimize the reliability of grid-connected PV systems,” IEEE Trans. Ind. Electron., vol. 56, no. 11, pp. 4465–4472, Nov. 2009.
[52] “RDF 2000: Reliability data handbook,” Union Technique de l'Electricité, Tech. Rep. UTE C 20-810, France, 2000.
[53] V. Blasko, R. Lukaszewski, and R. Sladky, “On line thermal model and thermal management strategy of a three phase voltage source inverter,” in Proc. Rec. IEEE Annu. Ind. Appl. Conf., 1999, pp. 1423–1431.
[54] G. Vachtsevanos, F. Lewis, M. Roemer, A. Hess, and B. Wu, Intelligent Fault Diagnosis and Prognosis for Engineering Systems. Hoboken, NJ: Wiley, 2006.
[55] R. V. White and F. M. Miles, “Principles of fault tolerance,” in Proc. Appl. Power Electron. Conf. Exp., 1996, pp. 18–25.
[56] L. Maharjan, T. Yamagishi, H. Akagi, and J. Asakura, “Fault-tolerant operation of a battery-energy-storage system based on a multilevel cascade PWM converter with star configuration,” IEEE Trans. Power Electron., vol. 25, no. 9, pp. 2386–2396, Sep. 2010.

Bingsen Wang (S’01–M’06–SM’08) was born in China. He received M.S. degrees from Shanghai Jiao Tong University, Shanghai, China, and the University of Kentucky, Lexington, KY, USA, in 1997 and 2002, respectively, and the Ph.D. degree from the University of Wisconsin-Madison, Madison, in 2006, all in electrical engineering.
From 1997 to 2000, he was an Electrical Engineer with Carrier Air Conditioning Equipment Company, Shanghai. After receiving the Ph.D. degree, he joined the General Electric (GE) Global Research Center, New York, as a Power Electronics Engineer. While at GE, he was involved in various research activities in power electronics, mainly focused on the high-power area. From 2008 to 2009, he was with the Department of Electrical Engineering, Arizona State University. Since 2010, he has been an Assistant Professor in the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. His current research interests include power conversion topologies, in particular multilevel converters and matrix converters, dynamic modeling and control of power electronic systems, application of power electronics to renewable energy systems, power conditioning, flexible ac transmission systems (FACTS), and electric drives. He has authored or coauthored more than 20 technical articles in refereed journals and peer-reviewed conference proceedings. He holds one Chinese patent.
Dr. Wang received the Prize Paper Award from the Industrial Power Converter Committee of the IEEE Industry Applications Society in 2005. He is a member of Sigma Xi.

Yantao Song received the B.S. degree from Zhengzhou University, Zhengzhou, China, in 2004, and the M.S. degree in electrical engineering from Zhejiang University, Hangzhou, China, in 2006. He is currently working toward the Ph.D. degree at Michigan State University, East Lansing.
From 2006 to 2008, he was an Electrical Engineer with Emerson Network Power. He then joined FSP-Powerland as a Senior Design Engineer. His research interests include power factor correction and LLC resonant converters for power supplies, power conversion for renewable energy generation, powertrains for hybrid electric vehicles, and reliability of power electronic systems.
