Tpel 2012 2192503
Abstract—With the widespread application of power electronic systems across many different industries, their reliability is being studied extensively. This paper presents a comprehensive review of reliability assessment and improvement of power electronic systems at three levels: 1) metrics and methodologies for reliability assessment of existing systems; 2) reliability improvement of existing systems by means of algorithmic solutions without changing the hardware; and 3) reliability-oriented design solutions based on fault-tolerant operation of the overall systems. The intent of this review is to provide a clear picture of the landscape of reliability research in power electronics. The limitations of current research are identified, and directions for future research are suggested.

Index Terms—Fault diagnosis, fault-tolerant operation, power electronic systems, reliability.

I. INTRODUCTION

Power electronic systems play an increasingly important role in adjustable-speed drives, unified power quality correction, utility interfaces with renewable energy resources, energy storage systems, and electric or hybrid electric vehicles (HEVs). Power electronic techniques provide compact and highly efficient solutions to power conversion. However, introducing power electronic techniques into these application fields challenges the reliability of the overall systems. One reliability concern lies in the power semiconductor devices and electrolytic capacitors, which are the most vulnerable links. Most power electronic converters are not equipped with redundancy. Therefore, any fault that occurs in a component or subsystem will shut down the whole system. These unscheduled interruptions raise significant safety concerns. They also increase system operating cost and partially offset the benefits of introducing power electronic systems. For instance, in HEVs, faults in the electric propulsion system impair fuel economy and lengthen the cost recovery period [1]. For a photovoltaic (PV) generation system, the cost of failure equals the value of the energy that would have been generated while the system is down plus the cost of repairing and replacing parts [2].

Over the past several decades, much attention has been directed to the reliability of power electronic systems. In [3]–[6], various metrics for evaluating system reliability are defined and analyzed. Analyzing the reliability of power electronic systems requires mathematical estimation of reliability. Component-level failure models have been studied extensively [3], [7]–[13]. Several quantitative methodologies have also been presented to build system-level reliability models. Together, the two give an accurate reliability prediction [5], [14]–[17]. In many cases, the classic design cannot meet the reliability requirements of the specifications, and numerous solutions have been proposed to improve reliability. Commonly adopted methods to enhance reliability include active online monitoring, fault management, and extended fault-tolerant operation through reconfigured control strategies [18]–[29]. Redundant design is an effective way to maintain postfault operation and thus reduce the number of unexpected system breakdowns. Accordingly, various power converter topologies with redundant capability have been proposed [30]–[48]. Given the importance of reliability and the amount of research carried out on it, a systematic perspective on the status of power electronic reliability for engineering design and future research is timely.

This paper presents a comprehensive overview of the reliability of power electronic systems. The review is organized around three scenarios. First, for any given system, reliability assessment or benchmarking is necessary before any reliability improvement effort is attempted. Second, if reliability improvement is deemed necessary, algorithmic changes may be preferred over significant hardware alterations. Third, reliability can be assured at the design stage if the system is yet to be built. Based on these three scenarios, the rest of the paper is organized as follows. Section II introduces the fundamental theory of reliability relevant to this study. Section III presents and compares several common reliability models. Section IV summarizes existing methods commonly employed to enhance system reliability without fundamental changes to the system architecture. Such methods include active thermal and fault management and degraded operation under faulted conditions. Section V introduces the concepts of redundancy and modified power electronic systems equipped with redundant functionalities. Section VI summarizes the concluding remarks and discussion.
III. RELIABILITY ASSESSMENT OF POWER ELECTRONIC SYSTEMS

Reliability evaluation is important for the design and operation management of systems. Quantitative assessment of the reliability of power electronic converters is essential in determining whether a particular design meets its specifications. It also serves as a criterion to compare different topologies, control strategies, and components. Moreover, an accurate reliability prediction gives valuable guidance for managing system operation and maintenance. All reliability analysis involves some form of model, either at the component level or at the system level.

A. Component-Level Reliability Models

For power electronic systems, component-level reliability research has mainly focused on failure rate models for the key components in power circuits. These include power semiconductors, capacitors, and magnetic devices [1], [5], [14]–[16], [50]. Field experience has demonstrated that the most vulnerable components are electrolytic capacitors and power switching devices. Such devices include insulated gate bipolar transistors (IGBTs) and metal–oxide–semiconductor field-effect transistors (MOSFETs). Magnetic components are much more reliable, with failure rates more than one order of magnitude lower than those of other power devices [2], [51]. Numerous reliability models are available for these electronic components. Empirical-based models are the most widely employed to analyze component reliability; they typically rely on observed failure data to quantify model variables. The premise is that valid failure-rate data are readily available, either from field applications or from laboratory tests.

There are many empirical-based reliability models of electronic devices. The best known is the military handbook for the reliability prediction of electronic equipment (Military-Handbook-217), which is widely accepted in both military and industrial applications [7]. MIL-217 provides an extensive database for many different types of parts. It is intended to provide a uniform basis for reliability prediction when there is little reliability experience with a particular component. However, the handbook has been criticized for several limitations [8]. One is that the models in MIL-217 assume a constant failure rate for components over their lifetime [3]. Another is that the reliability results derived from these models are often pessimistic and lead to costly conservative designs. Furthermore, MIL-217 lacks data on the influence of dormant modes on components and data reflecting the effects of thermal cycles. Both are of significant importance for practical applications of power electronics. The handbook also does not cover failure rate models for some commonly used components, such as the IGBT. Therefore, the reference values for MOSFETs are often used to analyze the failure rate of IGBTs.

Another important data source for empirical-based failure rate models is RDF 2000, which considers dormant modes and the effects of temperature cycles, and includes data for IGBTs [52]. RDF 2000 is a preferred reference for complex analysis since it takes all types of stress into account. In [5], the failure rates of IGBTs, diodes, and capacitors are estimated and compared using the two data sources as well as the Coffin–Manson and Arrhenius equations. It turns out that each approach has its disadvantages.

Since the empirical models of electronic devices are based on previously observed data, their reliability predictions are inaccurate for applications with different designs and different operational and environmental conditions. The physics-of-failure model has been researched extensively for analyzing the reliability of electronic devices, specifically including power semiconductor devices and electrolytic capacitors [9]–[12]. Thermal failure mechanisms of IGBTs have become a focal area in component-level reliability research in power electronics. The methodology considers electrical and mechanical stresses, temperature changes, and spatial temperature gradients, and tries to explore each root cause of component failure. The physics-of-failure method can model potential failure mechanisms, predict wear-out conditions, and integrate reliability into the design process. However, building this type of model is complex and costly. It also requires substantial knowledge of materials, processes, and failure mechanisms [13].

B. System or Subsystem-Level Reliability Models for Nonfault-Tolerant or Fault-Tolerant Systems

A system-level reliability model presents a clear picture of functional interdependences. It also provides a framework for developing quantitative reliability estimates of systems to guide the design tradeoff process. Several methodologies to quantify the reliability metrics of power electronic converters have been introduced. They fall into three types of reliability models: part-count models, combinatorial models, and state-space models.

1) Part-Count Models: The part-count model rests on the following assumptions:
1) any fault that occurs in any component or subsystem causes the overall system to fail;
2) at the component level, the failure rates of individual components are constant during their useful lifetime;
3) the system is treated as a series structure of all components or subsystems.

Fig. 2. Illustration of a series configuration with n subsystems.

Consider a series structure with n subsystems, as shown in Fig. 2, where the ith subsystem has failure rate λi. The failure rate λ of the overall system is

$$\lambda = \sum_{i=1}^{n} \lambda_i \tag{8}$$
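Under the part-count assumptions, the system failure rate is simply the sum in (8). With a constant failure rate (assumption 2), reliability follows the standard exponential law R(t) = exp(−λt), and the mean time to failure is MTTF = 1/λ. The Python sketch below illustrates this under stated assumptions. Failure rates are given in FIT (failures per 10^9 device-hours). The component names and FIT values are hypothetical placeholders chosen for illustration; they are not data from MIL-217 or RDF 2000.

```python
import math

FIT = 1e-9  # 1 FIT = one failure per 1e9 device-hours


def system_failure_rate(rates_per_hour):
    """Part-count (series) model, eq. (8): the system rate is the sum of subsystem rates."""
    return sum(rates_per_hour)


def reliability(lam, t_hours):
    """With a constant failure rate lam (1/h), R(t) = exp(-lam * t)."""
    return math.exp(-lam * t_hours)


def mttf(lam):
    """Mean time to failure for a constant failure rate: 1 / lam (hours)."""
    return 1.0 / lam


# Hypothetical per-component failure rates in FIT, for illustration only
# (not taken from MIL-217, RDF 2000, or any cited reference).
components_fit = {
    "igbt_module": 100.0,
    "dc_link_capacitor": 200.0,
    "gate_driver": 50.0,
    "inductor": 10.0,
}

lam = system_failure_rate(f * FIT for f in components_fit.values())
print(f"system failure rate: {lam / FIT:.0f} FIT")
print(f"MTTF: {mttf(lam):.3g} h")
print(f"R(1 year): {reliability(lam, 8760):.4f}")
```

Because every component sits in the series chain, each one adds its failure rate to λ. Removing or derating the worst contributor therefore has the largest effect on MTTF. This is one reason the part-count model is useful for quick design tradeoffs, although it ignores any redundancy.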
594 IEEE TRANSACTIONS ON POWER ELECTRONICS, VOL. 28, NO. 1, JANUARY 2013
Fig. 17. Vector diagram of an NPC converter when (a) Sa1 fails in short circuit and (b) Sa2 fails in short circuit.
Fig. 19. Three-phase matrix converter for motor drives with fault-tolerant capability.
Fig. 20. One phase of (a) a conventional three-cell FCMC and (b) a fault-tolerant three-cell FCMC.
A similar principle is applied to the matrix converter for motor control shown in Fig. 19. Likewise, only open-circuited faults can be handled.

Degraded operation of power electronic systems is realized mainly by making use of the systems' natural redundancies. Since few additional components are needed, the resulting solutions are simple and low cost. However, they have some significant limitations.
1) The range of applications is narrow. Degraded operation brings reduced performance, such as lower output voltage, lower power, and increased harmonic distortion. It is therefore feasible only for applications that can tolerate this reduced performance, and for some applications it may be unacceptable. For instance, power converters with reduced output voltages under faulty conditions are not well suited to utility applications. The converters in Figs. 18 and 19 are fault tolerant only for motor drives.
2) The faulted components and fault types that can be covered are limited.
3) Degraded operation is largely confined to multilevel converters, whose complex structures provide many redundant switching-state combinations.

C. Quasi-Normal Postfault Operation

Relying on the inherent fault-tolerant ability of systems has many limitations. Redundant design has therefore been studied and reported extensively to provide quasi-normal operation of power converters under fault conditions.

Owing to their high modularity, cascaded multilevel converters such as the one shown in Fig. 14 can provide performance close to normal operation with a simple redundant design. Song et al. present a fault-tolerant control for a static synchronous compensator (STATCOM) based on a cascaded H-bridge inverter [40]. In the normal state, the inverter operates with N + 1 cascaded converter cells in each phase and 2N + 3 output-voltage levels. When a switch fails, the faulted cell is bypassed, and the number of output-voltage levels decreases to 2N + 1. The phase-shift angles are regulated to generate a balanced output voltage. At the same time, the dc-link voltages of the remaining cells in the faulted phase must be increased to keep the voltage magnitude unchanged. The same redundant design is proposed for a direct-drive wind turbine application [41]. The main disadvantage is the large number of extra components required for the bypass switches and backup converter cells. In addition, power semiconductor devices and dc-link capacitors must be oversized to withstand the elevated dc-link voltage under faulty conditions.

Maharjan et al. propose an improved control scheme for a star-configured cascaded inverter applied to a battery-energy-storage system [56]. The method introduces a neutral shift so that all 3N − 1 healthy converter cells in the three phases share the increased voltage burden equally, rather than only the cells in the faulted phase. As a result, the modulation indices of all remaining cells increase only slightly. The solution can be used in other applications and reduces the need to oversize power devices. However, the neutral shift is realized by injecting a fundamental-frequency zero-sequence voltage into each cell. Consequently, the neutral point of the load cannot be connected directly to the neutral point of the inverter output.

The FCMC shown in Fig. 20(a) provides inherent series redundancy if the device ratings can withstand the increased voltage stress. An open-switch and short-switch fault-tolerant design for the three-cell four-level FCMC shown in Fig. 20(b) is proposed in [42], together with its control strategy. According to the modulation rules, each phase leg of an m-cell converter has 2^m switching-state combinations. However, given the clamped capacitor voltages, only m + 1 states are effective and are used to generate m + 1 output-voltage levels, while the other states are
SONG AND WANG: SURVEY ON RELIABILITY OF POWER ELECTRONIC SYSTEMS 601
Fig. 22. One leg of NPC three-level converters with two different redundant
realizations.
redundant. The method presented in [42] optimizes the switching states by maximizing the number of output-voltage levels. For the three-cell converter, the output phase voltage Vxg has four levels under normal operation. When a single-switch fault occurs in one phase, the faulted switch and its counterpart are bypassed. The corresponding capacitor is either isolated from the system or connected in parallel with the one in a healthy cell. The remaining capacitor voltages are regulated appropriately, so the inverter can still provide four line-to-ground levels. The disadvantages are mainly twofold: 1) under faulty conditions, some switches must withstand the full dc-link voltage, which necessitates an oversized design; and 2) a large number of extra devices is required.

Fig. 21 shows a modified general multilevel converter proposed by Chen et al. This converter can tolerate short-switch or open-switch faults without losing any output-voltage level [43]. Changing the circuit configuration yields at least one redundant switching-state combination for every output-voltage level. The modified topology achieves redundancy at the price of higher power loss and higher cost compared with the original five-level architecture. Even during normal operation, some output-voltage levels must be realized by five conducting semiconductor devices. In the conventional topology, by contrast, only four devices are in the conduction path for any voltage level. Under open-switch or short-switch fault conditions, six devices must conduct to provide a current path. The increased number of conducting devices leads to higher conduction losses. The original architecture, on the other hand, features good symmetry: only eight main switches conduct the load current, and the other clamping switches only balance the capacitor voltages. Therefore, the clamping switches need only a low current rating.

Two solutions to maintain short-switch fault-tolerant operation of three-phase NPC converters are presented in [44]. In solution I, shown in Fig. 22(a), the SCR is turned on to clear the upper or lower fast fuse when Sa2 or Sa3 fails in short circuit. The faulted phase leg then operates with only two voltage levels, high and low, and loses the middle level. Likewise, for short-circuit faults of Sa1 and Sa4, the high and low voltage levels remain without actuating the fuses and transistors. Solution II, shown in Fig. 22(b), is proposed to avoid losing the middle voltage level. For any single short-circuit fault in one phase, the converter still provides three output-voltage levels, as in normal operation. Both solutions maintain normal output voltage as long as no more than one switch per phase leg is faulted. However, some of the switches must withstand the total dc-link voltage under faulty states. Therefore, the component ratings must be oversized.

Although these two solutions maintain near-normal operation of NPC converters under faulted conditions, the required oversized voltage ratings of the semiconductors forfeit the most significant advantage of multilevel converters. That advantage is a device voltage stress of only half the dc-link voltage. The oversized design results in high cost and low efficiency even under normal conditions. To overcome these limitations, a redundant phase leg is added to the original topology [45], [46]. Fig. 23 depicts the overall structure. When a fault occurs in phase leg x, with x ∈ {a, b, c}, fuses Fx and Fn are cleared, isolating the faulted leg from the system. In addition, the bidirectional switch Tx and switches S5 and S6 are activated so that the redundant phase leg replaces the faulted one. The system is thus reconfigured into a standard NPC converter. The main disadvantage is the large number of additional components and therefore higher cost.

Kwak et al. present a matrix converter with a redundant phase leg for heavy electric vehicles, as shown in Fig. 24 [47]. The main idea is that a bidirectional switch lets the fourth leg replace the faulted leg, so normal operation of the system is maintained. The limitation is that this redundant topology covers only open-circuit faults of switches or freewheeling diodes.

There are some other modified topologies with quasi-normal operation capability. The aforementioned solutions include the power electronic converters for which fault-tolerant operation
[49] A. L. Reibman and M. Veeraraghavan, "Reliability modeling: An overview for system designers," Computer, vol. 24, no. 4, pp. 49–57, Apr. 1991.
[50] G. Petrone, G. Spagnuolo, R. Teodorescu, M. Veerachary, and M. Vitelli, "Reliability issues in photovoltaic power processing systems," IEEE Trans. Ind. Electron., vol. 55, no. 7, pp. 2569–2580, Jul. 2008.
[51] F. Chan and H. Calleja, "Design strategy to optimize the reliability of grid-connected PV systems," IEEE Trans. Ind. Electron., vol. 56, no. 11, pp. 4465–4472, Nov. 2009.
[52] "RDF 2000: Reliability data handbook," Union Technique de l'Electricité, Tech. Rep. UTE C 20-810, France, 2000.
[53] V. Blasko, R. Lukaszewski, and R. Sladky, "On line thermal model and thermal management strategy of a three phase voltage source inverter," in Proc. Rec. IEEE Annu. Ind. Appl. Conf., 1999, pp. 1423–1431.
[54] G. Vachtsevanos, F. Lewis, M. Roemer, A. Hess, and B. Wu, Intelligent Fault Diagnosis and Prognosis for Engineering Systems. Hoboken, NJ: Wiley, 2006.
[55] R. V. White and F. M. Miles, "Principles of fault tolerance," in Proc. Appl. Power Electron. Conf. Exp., 1996, pp. 18–25.
[56] L. Maharjan, T. Yamagishi, H. Akagi, and J. Asakura, "Fault-tolerant operation of a battery-energy-storage system based on a multilevel cascade PWM converter with star configuration," IEEE Trans. Power Electron., vol. 25, no. 9, pp. 2386–2396, Sep. 2010.

Bingsen Wang (S'01–M'06–SM'08) was born in China. He received the M.S. degrees from Shanghai Jiao Tong University, Shanghai, China, and the University of Kentucky, Lexington, KY, USA, in 1997 and 2002, respectively, and the Ph.D. degree from the University of Wisconsin-Madison, Madison, in 2006, all in electrical engineering.
From 1997 to 2000, he was an Electrical Engineer with Carrier Air Conditioning Equipment Company, Shanghai. After receiving the Ph.D. degree, he joined the General Electric (GE) Global Research Center, New York, as a Power Electronics Engineer. At GE, he was involved in various research activities in power electronics, mainly focused on the high-power area. From 2008 to 2009, he was with the Department of Electrical Engineering at Arizona State University. Since 2010, he has been an Assistant Professor in the Department of Electrical and Computer Engineering, Michigan State University, East Lansing. His current research interests include power conversion topologies (in particular, multilevel converters and matrix converters), dynamic modeling and control of power electronic systems, and application of power electronics to renewable energy systems. They also include power conditioning, flexible ac transmission systems (FACTS), and electric drives. He has authored or coauthored more than 20 technical articles in refereed journals and peer-reviewed conference proceedings. He holds one Chinese patent.
Dr. Wang received the Prize Paper Award from the Industrial Power Con-
verter Committee of the IEEE Industry Application Society in 2005. He is a
member of Sigma Xi.