
Machine Learning for the New York City Power Grid

The MIT Faculty has made this article openly available.

Citation: Rudin, Cynthia et al. "Machine Learning for the New York City Power Grid." IEEE Transactions on Pattern Analysis and Machine Intelligence 34.2 (2012): 328-345.
As Published: http://dx.doi.org/10.1109/tpami.2011.108
Publisher: Institute of Electrical and Electronics Engineers
Version: Author's final manuscript
Accessed: Sun Jul 22 22:58:46 EDT 2018
Citable Link: http://hdl.handle.net/1721.1/68634
Terms of Use: Creative Commons Attribution-Noncommercial-Share Alike 3.0
Detailed Terms: http://creativecommons.org/licenses/by-nc-sa/3.0/

Machine Learning for the New York City Power Grid

Cynthia Rudin†∗, David Waltz∗, Roger N. Anderson∗, Albert Boulanger∗, Ansaf Salleb-Aouissi∗, Maggie Chow‡, Haimonti Dutta∗, Philip Gross∗o, Bert Huang∗, Steve Ierome‡, Delfina Isaac‡, Arthur Kressner‡, Rebecca J. Passonneau∗, Axinia Radeva∗, Leon Wu∗

Abstract—Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive
maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk
of failures for components and systems. These models can be used directly by power companies to assist with prioritization of
maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint,
terminator and transformer rankings, 3) feeder MTBF (Mean Time Between Failure) estimates and 4) manhole events vulnerability
rankings. The process in its most general form can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time,
incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of
results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces
that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several
important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the
processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the
challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data
contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to
assist in maintaining New York City’s electrical grid.

Index Terms—applications of machine learning, electrical grid, smart grid, knowledge discovery, supervised ranking, computational
sustainability, reliability

• Center for Computational Learning Systems, Columbia University, 475 Riverside Drive MC 7717 (850 Interchurch Center), New York, NY 10115, U.S.A.
† MIT Sloan School of Management, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge MA 02139, U.S.A. E-mail: [email protected]
‡ Consolidated Edison Company of New York, 4 Irving Place, New York, NY, 10003, U.S.A.
o Now at: Google, Inc., 76 Ninth Avenue, New York, NY 10011
Manuscript received ?; revised ?.

1 INTRODUCTION

One of the major findings of the U.S. Department of Energy's "Grid 2030" strategy document [1] is that "America's electric system, 'the supreme engineering achievement of the 20th century,' is aging, inefficient, congested, incapable of meeting the future energy needs [. . .]." Reliability will be a key issue as electrical grids transform throughout the next several decades, and grid maintenance will become even more critical than it is currently. A 2007 survey by the NERC [2] states that "aging infrastructure and limited new construction" is the largest challenge to electrical grid reliability out of all challenges considered by the survey (also see [3]). The smart grid will bring operations and maintenance more online – moving the industry from reactive to proactive operations. Power companies keep historical data records regarding equipment and past failures, but those records are generally not being used to their full extent for predicting grid reliability and assisting with maintenance. This is starting to change. This paper presents steps towards proactive maintenance programs for electrical grid reliability based on the application of knowledge discovery and machine learning methods.

Most power grids in U.S. cities have been built gradually over the last 120 years. This means that the electrical equipment (transformers, cables, joints, terminators, and associated switches, network protectors, relays, etc.) varies in age; for instance, at least 5% of the low voltage cables in Manhattan were installed before 1930, and a few of the original high voltage distribution feeder sections installed during the Thomas Edison era are still in active use in New York City. In NYC there are over 94,000 miles of high voltage underground distribution cable, enough to wrap around the earth three and a half times. Florida Power and Light Company (FPL) has 24,000 miles of underground cable (see http://www.fpl.com/faqs/underground.shtml) and many other utilities manage similarly large underground electric systems. Maintaining a large grid that is a mix of new and old components is more difficult than managing a new grid (for instance, as is being laid in some parts of China). The U.S. grid is generally older than many European grids that were replaced after WWII, and older than grids in

places where infrastructure must be continually replenished due to natural disasters (for instance, Japan has earthquakes that force power systems to be replenished).

The smart grid will not be implemented overnight; to create the smart grid of the future, we must work with the electrical grid that is there now. For instance, according to the Brattle Group [4], the cost of updating the grid by 2030 could be as much as $1.5 trillion. The major components of the smart grid will (for an extended period) be the same as the major components of the current grid, and new intelligent meters must work with the existing equipment. Converting to a smart grid can be compared to replacing worn parts of an airplane while it is in the air. As grid parts are replaced gradually and as smart components are added, the old components, including cables, switches, sensors, etc., will still need to be maintained. Further, the state of the old components should inform the priorities for the addition of new smart switches and sensors.

The key to making smart grid components effective is to analyze where upgrades would be most useful, given the current system. Consider the analogy to human patients in the medical profession, a discipline for which many of the machine learning algorithms and techniques used here for the smart grid were originally developed and tested. While each patient (a feeder, transformer, manhole, or joint) is made up of the same kinds of components, they wear and age differently, with variable historic stresses and hereditary factors (analogous to different vintages, loads, manufacturers), so that each patient must be treated as a unique individual. Nonetheless, individuals group into families, neighborhoods, and populations (analogous to networks, boroughs) with relatively similar properties. The smart grid must be built upon a foundation of helping the equipment (patients) improve their health, so that the networks (neighborhoods) improve their life expectancy, and the population (boroughs) lives more sustainably.

In the late 1990's, NYC's power company, Con Edison, hypothesized that historical power grid data records could be used to predict, and thus prevent, grid failures and possible associated blackouts, fires and explosions. A collaboration was formed with Columbia University, beginning in 2004, in order to extensively test this hypothesis. This paper discusses the tools being developed through this collaboration for predicting different types of electrical grid failures. The tools were created for the NYC electrical grid; however, the technology is general and is transferrable to electrical grids across the world.

In this work, we present new methodologies for maintaining the smart grid, in the form of a general process for failure prediction that can be specialized for individual applications. Important steps in the process include data processing (cleaning, pattern matching, statistics, integration), formation of a database, machine learning (time aggregation, formation of features and labels, ranking methods), and evaluation (blind tests, visualization). Specialized versions of the process have been developed for: 1) feeder failure ranking for distribution feeders, 2) cable section, joint, terminator and transformer ranking for distribution feeders, 3) feeder MTBF (Mean Time Between Failure) estimates for distribution feeders, and 4) manhole vulnerability ranking. Each specialized process was designed to handle data with particular characteristics. In its most general form, the process can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time; the process incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of results via cross-validation on past data, and by blind evaluation. The blind evaluation is performed on data generated as events unfold, giving a true barrier to information in the future. The data used by the machine learning algorithms include past events (failures, replacements, repairs, tests, loading, power quality events, etc.) and asset features (type of equipment, environmental conditions, manufacturer, specifications, components connected to it, borough and network where it is installed, date of installation, etc.).

Beyond the ranked lists and MTBF estimates, we have designed graphical user interfaces that can be used by managers and engineers for planning and decision support. Successful NYC grid decision support applications based on our models are used to assist with prioritizing repairs, prioritizing inspections, correcting overtreatment, generating plans for equipment replacement, and prioritizing protective actions for the electrical distribution system. How useful these interfaces are depends on how accurate the underlying predictive models are, and also on the interpretation of model results. It is an important property of our general approach that machine learning features are meaningful to domain experts, in that the data processing and the way causal factors are designed is transparent. The transparent use of data serves several purposes: it allows domain experts to troubleshoot the model or suggest extensions, it allows users to find the factors underlying the root causes of failures, and it allows managers to understand, and thus trust, the (non-black-box) model in order to make decisions.

We implicitly assume that data for the modeling tasks will have similar characteristics when collected by any power company. This assumption is broadly sound, but there can be exceptions; for instance, feeders will have similar patterns of failure across cities, and data are probably collected in a similar way across many cities. However, the levels of noise within the data and the particular conditions of the city (maintenance history, maintenance policies, network topologies, weather, etc.) are specific to the city and to the methods by which data are collected and stored by the power company.

Our goals for this paper are to demonstrate that data collected by electrical utilities can be used to create statistical models for proactive maintenance programs, to show how this can be accomplished through knowledge discovery and machine learning, and to encourage companies across the world to reconsider the way data are being collected and stored in order to be most effective for prediction and decision-support applications.

In Section 2, we discuss the electrical grid maintenance tasks. Section 3 contains the general process by which data can be used to accomplish these tasks. In Section 4 we discuss the specific machine learning methods used for the knowledge discovery process. Section 5 presents the specialized versions of the general process for the four prediction tasks. In Section 6 we give sample results for the NYC power grid. Section 7 discusses the prototype tools for management we have developed in order to make the results useable, and to assist in knowledge discovery. Section 8 presents related work. Section 9 presents lessons learned from the implementation of these systems on the NYC grid.

2 PROACTIVE MAINTENANCE TASKS

Power companies are beginning to switch from reactive maintenance plans (fix when something goes wrong) to proactive maintenance plans (fix potential problems before they happen). There are advantages to this: reactive plans, which allow failures to happen, can lead to dangerous situations, for instance fires and cascading failures, and to costly emergency repairs. However, it is not a simple task to determine where limited resources should be allocated in order to most effectively repair potentially vulnerable components.

In large power systems, electricity flows from source to consumer through transmission lines to substations, then to primary feeder cables ("feeders"), and associated cable sections, joints, and terminators, through transformers, and to the secondary (low-voltage) electrical distribution grid (see Figure 1). There are two types of feeders, "distribution feeders" and "transmission feeders." Our work has mainly focused on distribution feeders (the term "feeder" will indicate distribution feeders), which are large medium to high-voltage cables that form a tree-like structure, with transformers at the leaves. In some cities, these transformers serve buildings or a few customers, and a feeder failure leads to service interruptions for all of these downstream customers. In other cities, including NYC, the secondary cables form a mesh or grid-like structure that is fed by redundant high-voltage feeders, with the goal of continuing service even if one or more feeders fail. There can be possible weaknesses in any of these components: a feeder may go out of service, the cables, joints and terminators can fail, transformers can fail, and insulation breakdown of cables in the secondary electrical grid can cause failures. In what follows, we discuss how data-driven preemptive maintenance policies can assist with preventing these failures.

Fig. 1. Typical Electrical Infrastructure in Cities. Source: Con Edison.

2.1 Feeder Rankings

Primary distribution feeder cables are large cables; in NYC they operate at 13,600 or 27,000 volts. They generally lie along main streets or avenues and distribute power from substations to the secondary grid.

A feeder may experience an outage due to a fault somewhere along the feeder, or due to deliberate de-energizing (so maintenance can be performed). If one component, such as a feeder, fails or is taken out of service, this failure is called a "first contingency," and if two components in the same network fail, it is called a "second contingency," and so forth. Loss of a small number of feeders generally does not result in any interruption in customers' electricity service, due to extensive redundancy in the system. (For instance, Con Edison's underground system is designed to operate under second contingency.) However, once one or more feeders in a network are out of service, the remaining feeders and their associated transformers have to pick up the load of the feeders that were disconnected. This added load elevates the risk of failure for the remaining feeders and transformers, and past a certain point, the network will experience a cascading failure, where the remaining components are unable to carry the network's load, and the entire network must be shut down until the system can be repaired.

Each feeder consists of many cable sections (called "sections" in what follows); for instance, the average number of sections per feeder in NYC is approximately 150. Each section runs between two manholes, and has "joints" at each end. Sections are often made up of three bundled cables, one for each voltage phase. Joints can attach two single cable sections, or can branch two or more ways. Ultimately, feeder sections end at transformers that step down the voltage to 120 or 240 Volts needed for the secondary system. Feeder sections are connected to transformers by "hammerheads," which are terminators that are named for their distinctive shape. Feeder failures generally occur at the joints or within a cable section. In this subsection, we discuss the problem of predicting whether a given feeder will have a failure (including failures on any of its subcomponents), and in the following subsection, we discuss the prediction of failures on individual feeder components, specifically on the individual cable sections, joints and hammerheads.

We use the results from the individual component failure predictions as input to the feeder failure prediction model.

One kind of joint, the "stop joint," is the source of a disproportionate number of failures. Stop joints connect old "PILC" to modern cables with solid dielectrics. PILC stands for Paper-Insulated Lead-sheathed Cable, an older technology used in most urban centers from 1906 through the 1960's. PILC sections are filled with oil, so stop joints must not only have good electrical connections and insulation (like all joints) but must also cap off the oil to prevent it from leaking. Even though all utilities are aggressively removing lead cable from their systems, it is going to be a long time before the work is complete (for more details, see the article about replacement of PILC in NYC: http://www.epa.gov/waste/partnerships/npep/success/coned.htm). For instance, in NYC, the Public Service Commission has mandated that all ∼30,000 remaining PILC sections be replaced by 2020. Note, however, that some PILC sections have been in operation for a very long time without problems, and it is important to make the best use of the limited maintenance budget by replacing the most unreliable sections first.

As can be seen in Figure 2, a small number of feeder failures occur daily in NYC throughout the year. The rate of failures noticeably increases during warm weather; air conditioning causes electricity usage to increase by roughly 50% during the summer. It is during these times that the system is most at risk.

Fig. 2. Number of feeder outages in NYC per day during 2006-2007, lower curve with axis at left, and system-wide peak system load, upper curve at right.

The feeder failure ranking application, described in Section 5.1, orders feeders from most at-risk to least at-risk. Data for this task include: physical characteristics of the feeder, including characteristics of the underlying components that compose the feeder (e.g., percent of PILC sections); date put into service; records of previous "open autos" (feeder failures), previous power quality events (disturbances), scheduled work, and testing; electrical characteristics, obtained from electric load flow simulations (e.g., how much current a feeder is expected to carry under various network conditions); and dynamic data, from real-time telemetry attached to the feeder. Approximately 300 summary features are computed from the raw data, for example, the total number of open autos per feeder over the period of data collection. For Con Edison, these features are reasonably complete and not too noisy. The feeder failure rank lists are used to provide guidance for Con Edison's contingency analysis and winter/spring replacement programs. In the early spring of each year, a number of feeders are improved by removing PILC sections, changing the topology of the feeders to better balance loading, or to support changing power requirements for new buildings. Loading is light in spring, so feeders can be taken out of service for upgrading with low risk. Prioritizing feeders is important: scheduled replacement of each section costs about $18,000, while feeder failures require even more expensive emergency replacements and also carry a risk of cascading failures.

2.2 Cable Sections, Joints, Terminators and Transformers Ranking

In Section 2.1 we discussed the task of predicting whether a failure would happen to any component of a (multi-component) feeder. We now discuss the task of modeling failures on individual feeder components; modeling how individual components fail brings an extra level to the understanding of feeder failure. Features of the components can be more directly related to localized failures and kept in a non-aggregated form; for instance, a feature for the component modeling task might encode that a PILC section was made by Okonite in 1950, whereas a feature for the feeder modeling task might instead be a count of PILC sections greater than 40 years old for the feeder. The component rankings can also be used to support decisions about which components to prioritize after a potentially susceptible feeder is chosen (guided by the results of the feeder ranking task). In that way, if budget constraints prohibit replacement of all the bad components of a feeder, the components that are most likely to fail can be replaced. For Con Edison, the data used for ranking sections, joints and hammerheads was diverse and fairly noisy, though in much better shape than the data used for the manhole events prediction project we describe next.

2.3 Manhole Ranking

A small number of serious "manhole events" occur each year in many cities, including fires and explosions. These events are usually caused by insulation breakdown of the low-voltage cable in the secondary network. Since the insulation can break down over a long period of

time, it is reasonable to try to predict future serious events from the characteristics of past serious and non-serious events. We consider events within two somewhat simplified categories: serious events (fires, explosions, serious smoking manholes) and potential precursor events (burnouts, flickering lights, etc.). Potential precursor events can be indicators of an area-wide network problem, or they can indicate that there is a local problem affecting only 1-2 manholes.

Many power companies keep records of all past events in the form of trouble tickets, which are the shorthand notes taken by dispatchers. An example ticket for an NYC smoking manhole event appears in Figure 3. Any prediction algorithm must consider how to effectively process these tickets.

Fig. 3. Excerpt from Sample Smoking Manhole (SMH) Trouble Ticket:
MORINO (SPLICER) CLAIMS CONDITION YELLOW F/O 411 W.95 ST.
ALSO————(LOW VOLTAGE TO PARKING GARAGE ———JEC
01/26/00 08:57 MDE.VETHI DISPATCHED BY 71122
01/26/00 09:21 MDE.VETHI ARRIVED BY 23349
01/26/00 11:30 VETHI REPORTS: FOUND COVER ON NOT SMOKING..
SB-110623 F/O 413 W.95 ST..1 AC LEG COPPERED..CUT CLEARED
AND REJOINED....MADE REPAIRS TO DC CRABS...ALL B/O CLEARED
CO = 0PPM —> SB-110623 F/O 413 W.95 ST
01/26/00 11:34 MDE.VETHI COMPLETE BY 23349
**************ELIN REPORT MADE OUT**************MC

2.4 MTBF (Mean Time Between Failures) Modeling

A common and historical metric for reliability performance is mean time between failures (MTBF) for components or systems that can be repaired, and mean time to failure (MTTF) for components that cannot (see Wikipedia's MTBF page: http://en.wikipedia.org/wiki/Mean_time_between_failures). Once MTBF or MTTF is estimated, a cost versus benefit analysis can be performed, and replacement policies, inspection policies, and reliability improvement programs can be planned. Feeders are made up of multiple components that can fail, and these components can be replaced separately, so MTBF (rather than MTTF) is applicable for feeder failures. When an individual joint (or other component of a feeder) fails, it is replaced with a new one, so MTTF is applicable instead for individual component failures.

In general, the failure rate of a component or a composite system like a feeder will have a varying MTBF over its lifetime. A system that is new or has just had maintenance may have early failures, known as "infant mortality." Then, systems settle down into their mid-life with a lower failure rate, and finally the failure rate increases at the end of their lifetimes (see Figure 4). PILC can have very long lifetimes, and it is hard to determine an end-of-life signature for them. Transformers do show aging with an increase in failure rate.

Fig. 4. Bathtub curve. Source: Wikipedia, http://en.wikipedia.org/wiki/Bathtub_curve

3 A PROCESS FOR FAILURE PREDICTION IN POWER GRIDS

Our general goal is "knowledge discovery," that is, finding information in data that is implicit, novel, and potentially extremely useful [5]. Harding et al. [6] provide an overview of knowledge discovery in manufacturing. The general CRISP-DM framework [7] captures the data processing for (potentially) extremely raw data; however, the traditional knowledge discovery in databases (KDD) outline [8] does not encompass this. The general process presented here can be considered a special case of CRISP-DM, but it is outside the realm of KDD.

The general knowledge discovery process for power grid data is shown in Figure 5. The data can be structured text or categorical data, numerical data, or unstructured text documents. The data are first cleaned and integrated into a single database that can be accurately queried. Then one or more machine learning problems are formulated over an appropriate timescale. Ideally, the features used in the machine learning models are meaningful to the domain experts. The parameters in the machine learning algorithm are tuned or tested by cross-validation, and evaluated for prediction accuracy by blind prediction tests on data that are not in the database. Domain experts also evaluate the model using business management tools and suggest improvements (usually in the initial handling and cleaning of data).

The data processing/cleaning is the key piece that ensures the integrity of the resulting model. This view agrees with that of Hsu et al. [9], who state that ". . . the often neglected pre-processing and post-processing steps in knowledge discovery are the most critical elements in determining the success of a real-life data mining application." Data cleaning issues have been extensively discussed in the literature, for instance in e-commerce [10]. Often, the application of machine learning techniques directly (without the data cleaning step) does not lead to useful or meaningful models. In electrical utility applications, these data can be extremely raw: data can come from diverse sources throughout the company, with different schemes for recording times for events or identities of components; they may be incomplete or extremely noisy, or they may contain large numbers of free-text documents (for example, trouble tickets).
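As a toy illustration of what "raw" means in practice, even deciding that two records refer to the same structure can require normalizing inconsistently spelled identifiers before any join is attempted. The identifier formats and helper functions below are invented for illustration; they are not Con Edison's actual conventions:

```python
import re

def normalize_id(raw: str) -> str:
    """Collapse case, punctuation, and spacing so that variant spellings
    of one structure identifier map to a single canonical key."""
    s = raw.upper()
    s = re.sub(r"[^A-Z0-9]+", "", s)    # drop spaces, dashes, slashes
    s = re.sub(r"^(MH|SB)", r"\1-", s)  # re-insert a canonical type prefix
    return s

def match_rate(cable_end_ids, manhole_ids):
    """Fraction of cable-end identifiers that join to a known manhole
    after normalization; statistics like this help judge join quality."""
    known = {normalize_id(m) for m in manhole_ids}
    hits = sum(normalize_id(c) in known for c in cable_end_ids)
    return hits / len(cable_end_ids)
```

Under this sketch, the variant spellings `"MH 1234/3B-07"` and `"mh1234 / 3b07"` normalize to the same key, so a join on the normalized field succeeds where a raw string match would fail.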

Fig. 5. Process Diagram

The data processing step fully defines the interpretation of the data that will be used by the machine learning model. This processing turns historical data from diverse sources into useable features and labels for learning. Data cleaning can include many steps such as pattern matching (for instance, finding regular expressions in structured or unstructured data), information extraction, text normalization, using overlapping data to find inconsistencies, and inferring related or duplicated records. Statistics can be used to assess whether data are missing, and for sanity checks on inferential joins.

An inferential join is the process by which multiple raw data tables are united into one database. Inferential joins are a key piece of data processing. An example to illustrate the logic behind using basic pattern matching and statistics for inferential joining is the uniting of the main cable records to the raw manhole location data for the manhole event process in NYC, to determine which cables enter into which manholes. Main cables connect two manholes (as opposed to service or streetlight cables that enter only one manhole). The cable data comes from Con Edison's accounting department, which is different from the source of the manhole location data. A raw join of these two tables, based on a unique manhole identifier that is the union of three fields – manhole type, number, and local 3-block code – provided a match to only about half of the cable records. We then made a first round of corrections to the data, where we unified the spelling of the manhole identifiers within both tables, and found matches to neighboring 3-block codes (the neighboring 3-block code is often mistakenly entered for manholes on a border of the 3 blocks). The next round of corrections used the fact that main cables have limited length: if only one of the two ends of the cable was uniquely matched to a manhole, with several possible manholes for the other end, then the closest of these manholes was selected (the shortest possible cable length). This processing gave a match to about three quarters of the cable records. A histogram of the cable lengths then indicated that about 5% of these joined records represented cables that were too long to be real. Those cables were used to troubleshoot the join again. Statistics can generally assist in finding pockets of data that are not joined properly to other relevant data.

Data can be either: static (representing the topology of the network, such as number of cables, connectivity), semi-dynamic (e.g., changing only when a section is removed or replaced, or when a feeder is split into two), or dynamic (real-time, with timestamps). The dynamic data can be measured electronically (e.g., feeder loading measurements), or it can be measured as failures occur (e.g., trouble tickets). For the semi-dynamic and dynamic data, a timescale of aggregation needs to be chosen for the features and labels for machine learning.

For all four applications, machine learning models are formed, trained, and cross-validated on past data, and evaluated via "blind test" on more recent data, as discussed further in Section 4.

For ranking algorithms, the evaluation measure is usually a statistic of a ranked list (a rank statistic), and ranked lists are visualized as ROC (Receiver Operator Characteristic) curves. Evaluation measures include:
• Percent of successes in the top k%: the percent of components that failed within the top k% of the ranked list (similar to "precision" in information retrieval).
• AUC or weighted AUC: area under the ROC curve [11], or the Wilcoxon Mann Whitney U statistic, as formulated in Section 4 below. The AUC is related to the number of times a failure is ranked below a non-failure in the list. Weighted AUC metrics (for instance, as used in the P-Norm Push algorithm [12] derived in Section 4) are more useful when the top of the list is the most important.
For MTBF/MTTF estimation, the evaluation measure is the sum of squared differences between estimated MTBF/MTTF and true MTBF/MTTF.

The evaluation stage often produces changes to the initial processing. These corrections are especially important for ranking problems.
7

is a possibility that top of the list will be populated


completely by outliers that are caused by incorrect or in-
complete data processing, and thus the list is essentially
useless. This happens particularly when the inferential
joins are noisy; if a feeder is incorrectly linked to a
few extra failure events, it will seem as if this feeder
is particularly vulnerable. It is possible to troubleshoot
this kind of outlier by performing case studies of the
components on the top of the ranked lists.
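The ranked-list evaluation measures above can be sketched in a few lines. This is an illustrative computation on invented scores and labels, not the evaluation code used in the project:

```python
# Sketch of the two ranked-list evaluation measures described above:
# percent of failures captured in the top k% of the list, and the AUC
# as a pairwise rank statistic (strict inequality, so ties count
# against the ranker, as in the empirical risk of Section 4).

def percent_in_top_k(scores, labels, k_frac):
    """Fraction of all failures (label 1) found in the top k_frac of the list."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    top = order[:max(1, int(round(k_frac * len(scores))))]
    n_fail = sum(labels)
    return sum(labels[i] for i in top) / n_fail

def auc(scores, labels):
    """P(failure scored above non-failure) over all failure/non-failure pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1 for p in pos for n in neg if p > n)
    return correct / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   1,   0,   0,   0,   0,   0,   0]   # 1 = failed
top20 = percent_in_top_k(scores, labels, 0.20)
area = auc(scores, labels)
```

`percent_in_top_k` corresponds to “percent of successes in the top k%,” and `auc` is the Wilcoxon–Mann–Whitney form of the AUC described above.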
Fig. 6. Sample timeline for rare event prediction
4 MACHINE LEARNING METHODS: RANKING FOR RARE EVENT PREDICTION
The subfield of ranking in machine learning has expanded rapidly over the past few years as the information retrieval (IR) community has started developing and using these methods extensively (see the LETOR website4 and references therein). Ranking algorithms can be used for applications beyond information retrieval; our interest is in developing and applying ranking algorithms to rank electrical grid components according to the probability of failure. In IR, the goal is to rank a set of documents in order of relevance to a given query. For both electrical component ranking and IR, the top of the list is considered to be the most important.

The ranking problems considered here fall under the general category of supervised learning problems, and specifically supervised bipartite ranking. In supervised bipartite ranking tasks, the goal is to rank a set of randomly drawn examples (the “test set”) according to the probability of possessing a particular attribute. To do this, we are given a “training set” that consists of examples with labels:

{(x_i, y_i)}_{i=1}^m,  x_i ∈ X,  y_i ∈ {−1, +1}.

In this case, the examples are electrical components, and the label we want to predict is whether a failure will occur within a given time interval. It is assumed that the training and test examples are both drawn randomly from the same unknown distribution. The examples are characterized by features:

{h_j}_{j=1}^n,  h_j : X → R.

The features should encode all information that is relevant for predicting the vulnerability of the components, for instance, characteristics of past performance, equipment manufacturer, and type of equipment. To demonstrate, we can have: h_1(x) = the age of component x, h_2(x) = the number of past failures involving component x, h_3(x) = 1 if x was made by a particular manufacturer. These features can be either correlated or uncorrelated with failure prediction; the machine learning algorithm will use the training set to choose which features to use and determine the importance of each feature for predicting future failures.

Failure prediction is performed in a rare event prediction framework, meaning the goal is to predict events within a given “prediction interval” using data prior to that interval. There is a separate prediction interval for training and testing. The choice of prediction intervals determines the labels y for the machine learning problem and the features h_j. Specifically, for training, y_i is +1 if component i failed during the training prediction interval and −1 otherwise. The features are derived from the time period prior to the prediction interval. For instance, as shown in Figure 6, if the goal is to rank components for vulnerability with respect to 2010, the model is trained on features derived from prior to 2009 and labels derived from 2009. The features for testing are derived from pre-2010 data. The choice of the prediction interval’s length is application dependent; if the interval is too small, there may be no way to accurately characterize failures. If the length is too large, the predictions may be too coarse to be useful. For manhole event prediction in NYC, this time period was chosen to be one year, and time aggregation was performed using the method of Figure 6. A more elaborate time aggregation scheme is discussed in Section 5.1 for feeder failure ranking, where “time shifted” features were used.

The ranking algorithm uses the training set to construct a scoring function, which is a linear combination of the features:

f_λ(x) = Σ_{j=1}^n λ_j h_j(x),

and the examples are rank-ordered by their scores. The ranking algorithm constructs f_λ by minimizing, with respect to the vector of coefficients λ := [λ_1, . . . , λ_n], a quality measure (a statistic) of the ranked list, denoted R(f_λ). The procedure for optimizing R(f_λ) is “empirical risk minimization,” where the statistic is optimized on the training set, and the hope is that the solution generalizes to the full unknown distribution. In particular, it is hoped that the scoring function will rank the test examples accurately, so that the positive examples are at the top of the list. Probabilistic generalization bounds are used to theoretically justify this type of approach (e.g., [13, 14]).

4. http://research.microsoft.com/en-us/um/beijing/projects/letor/paper.aspx
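As a concrete illustration of the setup above, the following minimal sketch fits the coefficients λ of the linear scoring function f_λ(x) = Σ_j λ_j h_j(x) by gradient descent on a convex pairwise surrogate (the exponential loss used by RankBoost, one of the algorithms discussed later in this section). The four examples and two features are invented, and this is not the production training code:

```python
# Minimal sketch of ranking via empirical risk minimization: fit the
# coefficients of f_lambda(x) = sum_j lambda_j * h_j(x) by gradient
# descent on the convex pairwise surrogate
#   sum_{i: y_i=+1} sum_{k: y_k=-1} exp(-(f(x_i) - f(x_k))),
# a smooth stand-in for the misranked-pair count. Toy data only.
import math

H = [[3.0, 1.0], [2.5, 0.0], [1.0, 1.0], [0.5, 0.0]]  # rows: examples x; cols: features h_j(x)
y = [+1, +1, -1, -1]                                  # +1 = failed in prediction interval

def score(lam, x):
    return sum(l * h for l, h in zip(lam, x))

def grad(lam):
    # Gradient of the exponential pairwise loss w.r.t. lambda.
    g = [0.0] * len(lam)
    for xi, yi in zip(H, y):
        if yi != +1:
            continue
        for xk, yk in zip(H, y):
            if yk != -1:
                continue
            w = math.exp(-(score(lam, xi) - score(lam, xk)))
            for j in range(len(lam)):
                g[j] -= w * (xi[j] - xk[j])
    return g

lam = [0.0, 0.0]
for _ in range(200):  # plain gradient descent, fixed step size
    g = grad(lam)
    lam = [l - 0.1 * gj for l, gj in zip(lam, g)]

# After training, every positive example should outscore every negative.
misranks = sum(1 for xi, yi in zip(H, y) if yi == +1
                 for xk, yk in zip(H, y) if yk == -1
                 if score(lam, xi) <= score(lam, xk))
```

On this separable toy problem the learned list has zero misranked pairs; real training adds regularization and a price function, as described next.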
A common quality measure in supervised ranking is the probability that a new pair of randomly chosen examples is misranked (see [14]), which should be minimized:

P_D{misrank(f_λ)} := P_D{f_λ(x_+) ≤ f_λ(x_−) | y_+ = 1, y_− = −1}.   (1)

The notation P_D indicates the probability with respect to a random draw of (x_+, y_+) and (x_−, y_−) from distribution D on X × {−1, +1}. The empirical risk corresponding to (1) is the number of misranked pairs in the training set:

R(f_λ) := Σ_{k: y_k = −1} Σ_{i: y_i = 1} 1[f_λ(x_i) ≤ f_λ(x_k)] = #misranks(f_λ).   (2)

The pairwise misranking error is directly related to the (negative of the) area under the ROC curve; the only difference is that ties are counted as misranks in (2). Thus, a natural ranking algorithm is to choose a minimizer of R(f_λ) with respect to λ:

λ* ∈ argmin_{λ ∈ R^n} R(f_λ)

and to rank the components in the test set in descending order of f_{λ*}(x) := Σ_j λ*_j h_j(x).

There are three shortcomings to this algorithm: first, it is computationally hard to minimize R(f_λ) directly. Second, the misranking error R(f_λ) considers all misranks equally, in the sense that misranks at the top of the list are counted equally with misranks towards the bottom, even though in failure prediction problems it is clear that misranks at the top of the list should be considered more important. A third shortcoming is the lack of regularization usually imposed to enable generalization (prediction ability) in high dimensions. A remedy for all of these problems is to use special cases of the following ranking objective that do not fall into any of the traps listed above:

R_{ℓ,g}(f_λ) := Σ_{k: y_k = −1} g( Σ_{i: y_i = 1} ℓ(f_λ(x_i) − f_λ(x_k)) ) + C‖λ‖²,   (3)

where g is called the price function and ℓ is called the loss function. R(f_λ) given in (2) is a special case of R_{ℓ,g}(f_λ) with ℓ(z) = 1_{z≤0} and g(z) = z. The objective is convex in λ when the exponential loss ℓ(z) = e^{−z} is used [14], or the SVM (support vector machine) hinge loss ℓ(z) = (1 − z)_+ [15]; several other convex loss functions are also commonly used. The norm used in the regularization term is generally either a norm in a Reproducing Kernel Hilbert space (for SVMs), which in the simplest case is the ℓ2 norm ‖λ‖₂² = Σ_j λ_j², or an ℓ1 norm ‖λ‖₁ = Σ_j |λ_j|. The constant C can be set by cross-validation.

Special cases of the objective (3) are: SVM Rank [15], which uses the hinge loss, g(z) = z as the price function, and Reproducing Kernel Hilbert space regularization; RankBoost [14], which uses the exponential loss and no regularization; and the P-Norm Push [12]. The P-Norm Push uses price function g(z) = z^p, which forces the value of the objective to be determined mainly by the highest ranked negative examples when p is large; the power p acts as a soft max. Since most of the value of the objective is determined by the top portion of the list, the algorithm concentrates more on the top. The full P-Norm Push algorithm is:

λ* ∈ arg inf_λ R_p(λ), where

R_p(λ) := Σ_{k: y_k = −1} ( Σ_{i: y_i = 1} exp(−[f_λ(x_i) − f_λ(x_k)]) )^p.

Vector λ* is not difficult to compute, for instance by gradient descent. The P-Norm Push is used currently in the manhole event prediction tool. An SVM algorithm with ℓ2 regularization is used currently in the feeder failure tool.

Algorithms designed via empirical risk minimization are not designed to be able to produce density estimates, that is, estimates of P(y = 1|x), though in some cases it is possible, particularly when the loss function is smooth. These algorithms are instead designed specifically to produce an accurate ranking of examples according to these probabilities.

It is important to note that the specific choice of machine learning algorithm is not the major component of success in this domain; rather, the key to success is the data cleaning and processing as discussed in Section 3. If the machine learning features and labels are well constructed, any reasonable algorithm will perform well; the inverse holds too, in that badly constructed features and labels will not yield a useful model regardless of the choice of algorithm.

For our MTBF application, MTBF is estimated indirectly through failure rates; the predicted failure rate is converted to MTBF by taking the reciprocal of the rate. Failure rate is estimated rather than MTBF for numerical reasons: good feeders with no failures have an infinite MTBF. The failure rate is estimated by regression algorithms, for instance SVM-R (support vector machine regression) [16], CART (Classification and Regression Trees) [17], ensemble based techniques such as Random Forests [18], and statistical methods, e.g., Cox Proportional Hazards [19].

5 SPECIFIC PROCESSES AND CHALLENGES

In this section, we discuss how the general process needs to be adapted in order to handle data processing and machine learning challenges specific to each of our electrical reliability tasks in NYC. Con Edison currently operates the world’s largest underground electric system, which delivers up to a current peak record of about 14,000 MW of electricity to over 3 million customers. A customer can be an entire office building or apartment complex in NYC so that up to 15 million people are served with
electricity. Con Edison is unusual among utilities in that
it started keeping data records on the manufacturer, age,
and maintenance history of components over a century
ago, with an increased level of Supervisory Control and
Data Acquisition (SCADA) added over the last 15 years.
While real-time data are collected from all transformers
for loading and power quality information, that is much
less than will be needed for a truly smart grid.
We first discuss the challenges of feeder ranking and specifics of the feeder failure ranking process developed for Con Edison (also called “Outage Derived Data Sets - ODDS”) in Section 5.1. We discuss the data processing challenges for cables, joints, terminators and transformers in Section 5.2. The manhole event prediction process is discussed in Section 5.3, and the MTBF estimation process is discussed in Section 5.4.

Fig. 7. Example illustrating the training and test time windows in ODDS. The current time is 8/13/2008, and failure data for training was derived from the prediction period of 7/30/2007 - 8/27/2007 and 7/30/2008 - 8/13/2008.

5.1 Feeder Ranking in NYC

Con Edison data regarding the physical composition of feeders are challenging to work with; variations in the database entry and rewiring of components from one feeder to another make it difficult to get a perfect snapshot of the current state of the system. It is even more difficult to get snapshots of past states of the system; the past state needs to be known at the time of each past failure because it is used in training the machine learning algorithm. A typical feeder is composed of over a hundred cable sections, connected by a similar number of joints, and terminating in a few tens of transformers. For a single feeder, these subcomponents are a hodgepodge of types and ages; for example, a brand-new cable section may be connected to one that is many decades old. This makes it challenging to “roll up” the feeder into a set of features for learning. The features we currently use are statistics of the ages, numbers, and types of components within the feeder; for instance, we have considered maxima, averages, and 90th percentiles (robust versions of the maxima).

Dynamic data presents a similar problem to physical data, but here the challenge is aggregation in time instead of space. Telemetry data are collected at rates varying from hundreds of times per second (for power quality data) to only a few measurements per day (weather data). These can be aggregated over time, again using functions such as max or average, using different time windows (as we describe shortly). Some of the time windows are relatively simple (e.g., aggregating over 15 or 45 days), while others take advantage of the system’s periodicity, and aggregate over the most recent data plus data from the same time of year in previous years.

One of the challenges of the feeder ranking application is that of imbalanced data, or scarcity of data characterizing the failure class, which causes problems with generalization. Specifically, primary distribution feeders are susceptible to different kinds of failures, and we have very few training examples for each kind, making it difficult to reliably extract statistical regularities or determine the features that affect reliability. For instance, failure can be due to: concurrent or prior outages that stress the feeder and other feeders in the network; aging; power quality events (e.g., voltage spikes); overloads (that have seasonal variation, like summer heat waves); known weak components (e.g., joints connecting PILC to other sections); at-risk topologies (where cascading failures could occur); the stress of “HiPot” (high potential) testing; and de-energizing/re-energizing of feeders that can result in multiple failures within a short time span due to “infant mortality.” Other data scarcity problems are caused by the range in MTBF of the feeders; while some feeders are relatively new and last for a long time between failures (for example, more than five years), others can have failures within a few tens of days of each other. In addition, rare seasonal effects (such as particularly high summer temperatures) can affect failure rates of feeders.

We have focused on the most serious failure type for distribution feeders, where the entire feeder is automatically taken offline by emergency substation relays, due to some type of fault being detected by sensors. Our current system for generating data sets attempts to address the challenge of learning with rare positive examples (feeder failures). An actual feeder failure incident is instantaneous, so a snapshot of the system at that moment will have only one failure example. To better balance the number of positive and negative examples in the data, we tried the rare event prediction setup shown in Figure 6, labeling any example that had experienced a failure over some time window as positive. However, the dynamic features for these examples are constructed from the timeframe before the prediction period, and thus do not represent the precise conditions at the time of failure. This was problematic, as the domain experts believed that some of the dynamic data might only have predictive value in the period right before the failure. To solve this problem, we decided to switch to “time-shifted” positive examples, where the positive examples are still created from the past outages within the prediction period, but the dynamic features are derived only from the time period shortly before the failure happened.
This allows our model to capture short-term precur-
sors to failures. The features of non-failures (negative
examples) are characteristics of the current snapshot of
all feeders in the system. Not only does this approach,
which we call “ODDS” for Outage Derived Data Sets,
capture the dynamic data from right before the failure,
it helps to reduce the imbalance between positive and
negative examples. Figure 7 shows an example of the
periods used to train and test the model.
Another challenge raised by our feeder failure ranking
application is pervasive “concept drift,” meaning that
patterns of failure change fairly rapidly over time, so
that a machine learning model generated on data from
the past may not be completely representative of future
failure patterns. Features can become inactive or change
in quality. Causes of this include: repairs being made
on components, causing the nature of future failures to
change; new equipment having different failure proper-
ties than current equipment; and seasonal variation in
failure modes (e.g., a greater likelihood of feeder failure
in the summer). To address this challenge, ODDS creates
a new model every 4 hours on the current dataset. (See
also [20, 21, 22].)
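The time-shifted construction described above can be sketched as follows. All record layouts, feeder IDs, and the placeholder feature function are hypothetical; the real ODDS features aggregate telemetry feeds:

```python
# Sketch of the "time-shifted" ODDS training-set construction: each
# positive example takes its dynamic features from a window just before
# that feeder's outage, while negatives use the current snapshot.
# Records and the feature function are invented placeholders.
from datetime import datetime, timedelta

outages = [  # (feeder_id, outage time) -- hypothetical records
    ("F101", datetime(2008, 7, 30, 14, 0)),
    ("F207", datetime(2008, 8, 5, 3, 30)),
]
all_feeders = ["F101", "F207", "F301", "F302"]
now = datetime(2008, 8, 13)
WINDOW = timedelta(days=15)

def dynamic_features(feeder_id, start, end):
    # Placeholder: the real system would aggregate telemetry
    # (loading, power quality, ...) over [start, end) for this feeder.
    return [hash((feeder_id, start.toordinal())) % 100 / 100.0]

training_set = []
for fid, t_fail in outages:  # positives: window ending at the failure
    training_set.append((dynamic_features(fid, t_fail - WINDOW, t_fail), +1))
failed = {fid for fid, _ in outages}
for fid in all_feeders:      # negatives: current snapshot of the system
    if fid not in failed:
        training_set.append((dynamic_features(fid, now - WINDOW, now), -1))

n_pos = sum(1 for _, label in training_set if label == +1)
n_neg = sum(1 for _, label in training_set if label == -1)
```

In the real system this data set would be regenerated on the current data every few hours, which is one way to cope with the concept drift described above.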
An outline of the overall process is shown in Figure 8. A business management application called the Contingency Analysis Program (CAP), discussed in Section 7, uses the machine learning results to highlight areas of risk through graphical displays and map overlays.

Fig. 8. Process diagram for feeder ranking, using ODDS

As in many real-life applications, our application suffers from the problem of missing data. Techniques such as mean-imputation are used to fill in missing values.

5.2 Cables, Joints, Terminators, and Transformers Ranking in NYC

The main challenges to constructing rankings of feeder components overlap somewhat with those faced in constructing rankings for feeders: the use of historical data, and the data imbalance problem.

Ideally, we should be able to construct a consistent and complete set of features for each component and also its connectivity, environmental, and operational contexts at the time of failure. At Con Edison, the cable data used for cable, joint, and terminator rankings reside in the “Vision Mapping” system and are designed to represent only the current layout of cables in the system, and not to provide the layout at particular times in the past. We began to archive cable data starting in 2005 and also relied on other snapshots of cable data that Con Edison made, for example, cable data captured for Con Edison’s “Network Reliability Indicator” program that allowed us to go back as far as 2002 configurations.

Generating training data for joints is especially challenging. Joints are the weakest link in feeders, with certain heat-sensitive joint types having accelerated failure rates during heat waves. Con Edison keeps a database of feeder component failures called CAJAC. It captures failure data of joints in detail. Con Edison autopsies failed components, and the failure reasons they discover are captured in this database. Though the joint failure data are recorded in detail, it is challenging to construct a complete list of the set of installed joints within the grid; the set of installed joints is imputed from the features of the cables being connected. In addition, short lengths of cable, called “inserts,” that are sometimes used to make connections in manholes, are not yet captured in the Vision Mapping system, so the number of joints in any manhole can only be estimated in general. Also, the nature of the joint (type of joint, manufacturer, etc.) has had to be inferred from the date of installation. We do this by assuming that the policy in force at the installation date was used for that joint, which allows us to infer the manufacturers and techniques used.

To create the transformer database, several data sources were merged using inferential joins, including data from Con Edison’s accounting department, the inspection record database, and the dissolved gas database. Transformer ranking has several challenges. We are working with a transformer population that is actively monitored and aggressively replaced by Con Edison at any sign of impending trouble, meaning that vulnerable transformers that had not failed have been replaced, leading to right censoring (meaning missing information after a certain time in the life of the transformer). Further, for a transformer that was replaced, it is
always a challenge to determine whether a failure would
have occurred if the transformer had not been replaced,
causing label bias for the machine learning.
As demonstrated for several of the projects discussed
here, components that have multiple roles or that act as
interfaces between multiple types of components present
the challenge of bringing together multiple databases to
capture the full context for the component. In order to
rank hammerheads, we built a database that joined splice
ticket data, cable data, and transformer data, where
transformer data itself came from an earlier join of large
databases described above.
While working with various data sets involving date-
time information, we had to be careful about the mean-
ing of the date and time. In some cases the date entry
represents a date when work was done or an event
occurred, in other cases, the date is when data was
entered into the database. In some instances there was
confusion as to whether time was provided in GMT, EST
or EDT, leading to some cases where our machine learn-
ing systems made perfect predictions, but for the wrong
reasons: they learned to detect inevitable outcomes of
failures, but where these outcomes apparently predated
the outages because of data timing skew.
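A minimal sketch of the timestamp hygiene this paragraph argues for: normalize mixed local-time records to UTC (so EST vs. EDT is handled explicitly) and discard any measurement whose true time is at or after the outage being predicted. The record layout and values are invented:

```python
# Sketch of defensive timestamp handling: convert local-time records to
# UTC, then drop any "feature" measurement whose true time is at or
# after the outage it is supposed to predict -- otherwise the model can
# learn outcomes of the failure instead of precursors.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")  # resolves EST vs. EDT automatically

def to_utc(naive_local):
    """Interpret a naive timestamp as NYC local time and convert to UTC."""
    return naive_local.replace(tzinfo=NY).astimezone(timezone.utc)

outage_utc = to_utc(datetime(2008, 7, 30, 14, 0))  # 14:00 EDT -> 18:00 UTC

measurements = [  # (naive NYC-local time, value) -- hypothetical telemetry
    (datetime(2008, 7, 30, 9, 0), 0.7),
    (datetime(2008, 7, 30, 13, 59), 0.9),
    (datetime(2008, 7, 30, 15, 30), 1.0),  # post-outage: must be excluded
]
usable = [(to_utc(t), v) for t, v in measurements if to_utc(t) < outage_utc]
```

This does not recover records whose stored date is a data-entry date rather than an event date; those, as noted above, have to be identified by auditing each source.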
Fig. 9. Process diagram for manhole event ranking
5.3 Manhole Ranking in NYC
One major challenge for manhole event prediction was to determine which of many data sources, and which fields within these sources, to trust; it only made sense to put a lot of effort into cleaning data that had a higher chance of assisting with prediction. The data used for the manhole event prediction process is described in detail in [23], and includes: information about the infrastructure, namely a table of manhole locations and types, and a snapshot of recent cable data from Con Edison’s accounting department (type of cable, manholes at either end of cable, installation dates); five years of inspection reports filled out by inspectors; and most importantly, event data. The event data came from several different sources including: ECS (Emergency Control Systems) trouble tickets which included both structured fields and unstructured text, a table of structured data containing additional details about manhole events (called ELIN – ELectrical INcidents), and a table regarding electrical shock and energized equipment events (called ESR ENE). These data were the input for the manhole event prediction process outlined in Figure 9.

The trouble tickets are unstructured text documents, so a representation of the ticket had to be defined for the learning problem. This representation encodes information about the time, location, and nature (degree of seriousness) of the event. The timestamps on the ticket are directly used, but the location and seriousness must be inferred (and/or learned). The locations of events were inferred using several sources of location information present in the trouble tickets, including a street address (possibly misspelled or abbreviated, e.g., 325 GREENWHICH ST), structure names typed within the text of the ticket (S/B 153267) and structure names sometimes included in the structured fields of three tables (ECS, ELIN, or ESR ENE). All location information was typed by hand, and these data are very noisy – for instance, the term “service box” was written in at least 38 different ways – and no one source of information is complete. The redundancy in the data was used in order to obtain reliable location information: structure numbers were extracted from the ticket text using information extraction techniques (see Figure 10), then tickets were geocoded to determine the approximate location of the event. If the geocoded address was not within a short distance (200m) of the structure named within the ticket, the information was discarded. The remaining (twice verified) matches were used, so that the ticket was identified correctly with the manholes that were involved in the event.

It was necessary also to determine the seriousness of events; however ECS trouble tickets were not designed to contain a description of the event itself, and there is no structured field to encode the seriousness directly. On the other hand, the tickets do have a “trouble type” field, which is designed to encode the nature of the event (e.g., an underground AC event is “UAC,” flickering
lights is “FLT”). Originally, we used the trouble type to characterize the seriousness of the event: the codes “MHX” (manhole explosion), “MHF” (manhole fire), and “SMH” (smoking manhole) were used to identify serious events. However, we later performed a study [24] that showed that the trouble type did not agree with experts’ labeling of tickets, and is not a good measure of seriousness. In order to better estimate the seriousness of events, we created a representation of each ticket based on information extracted from the ticket text, including the length of the ticket, the presence of serious metadata (for instance, the term “SMOKING LIGHTLY”), and whether cable sizes appeared in the text (indicating the replacement of a cable). This information extraction was performed semi-automatically using text-processing tools, including the Generalized Architecture for Text Engineering “GATE” [25].

Fig. 10. Ticket processing

The ticket representation was used to classify the tickets into the categories: serious events, possible precursor events, and non-events. This classification was performed with either a manual, rule-based method or general machine learning clustering methods (k-means clustering). So there are two machine learning steps in the manhole event ranking process: a ticket classification step, and a manhole ranking step.

One challenge faced early on was in choosing the timeframes for the rare event prediction framework. We started originally trying to predict manhole events on a short timescale (on the order of 60 days) based on the domain experts’ intuition that such a timescale would yield useful predictions. However, it became clear that manhole events could not easily be predicted over such a short time; for instance, if it is known that a manhole event will occur within 60 days after a prior event, it is almost impossible to predict when within those 60 days it will happen. In fact, insulation breakdown, which causes manhole events, can be a slow process, taking place over months or years. A prediction period of one year was chosen for the machine learning ranking task, as illustrated in Figure 6.

The cable data, which is a snapshot at one (recent) point in time, was unified with the other data to construct “static” features and labels for the ranking task. This assumes implicitly that the snapshot approximately represents the number and type of cables over the time period of prediction; this assumption is necessary since the exact state of cables in the manhole at a given time in the past may not be available. However, this assumption is not universally true; for instance, it is not true for neutral (non-current carrying, ground) cables at Con Edison, and neutral cable data thus cannot be used for failure prediction, as discussed in [23]. Often, manholes that have had serious events also have had cables replaced, and more neutrals put in; a higher percentage of neutral cables indicates an event in the past, not necessarily an event in the future.

The P-Norm Push (see Section 4) was used as the main ranking algorithm for manhole ranking.

5.4 MTBF Modeling in NYC

It became apparent that to really make our feeder prediction models valuable for proactive maintenance, we had to also produce estimates that allow for an absolute measure of vulnerability, rather than a relative (ranking) measure; many asset replacement decisions are made by assessing how much reliability in days is gained if a particular choice is made (for instance, to replace a PILC section versus another replacement at the same cost). Machine learning techniques can be used to estimate MTBF. Figure 11 shows the application of one of these techniques [26] to predicting survival times of PILC sections in Queens. This technique can accommodate censored data through inequality constraints in SVM regression. Each row of the table represents one feeder, and each column indicates a time interval (in years). The color in a particular bin gives the count of cable sections within the feeder that are predicted to survive that time interval. That is, each row is a histogram of the predicted MTBF for the feeder’s cable sections. The histogram for one feeder (one row) is not necessarily smooth in time. This is because the different cable sections within the feeder were installed at different times (installation not being a smooth function of time), and these installation dates influence the predicted survival interval.

6 EVALUATION IN NYC

We describe the results of our specific processes as applied to the NYC power grid through the Columbia/Con Edison collaboration. We have generated machine learning models for ranking the reliability of all 1,000+ high voltage (13-27 KV) feeders that form the backbone of NYC’s power distribution system; for each of the ∼150,000 cable sections and ∼150,000 joints that connect them; for the ∼50,000 transformers
Fig. 11. Predictions from a support vector censored regression algorithm on PILC sections of 33 feeders in Queens.

Fig. 12. ROC-like curves from tests of the machine learning ranking of specific components.
and ∼50,000 terminators that join the transformers
to the feeders; and for ∼150,000 secondary structures
(manholes and service boxes) through which low
voltage (120-240 V) power from the transformers is
distributed to buildings in NYC.
Feeder and Component Ranking Evaluation
Our machine learning system for computing feeder sus-
ceptibility based on the ODDS system has been on-
line since July 2008, generating a new model every 4
hours. ODDS is driven by the feeds from three dynamic
real time systems: load pocket weight,5 power quality,
and outage history. We found that separate training
in Brooklyn and Queens, with their 27KV networks,
and Manhattan and Bronx, with their 13KV networks,
produced better results.
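The ROC-like curves and rank statistics used in this section to track ranking performance can be sketched as follows (the feeder IDs and failure set here are invented):

```python
def roc_like_curve(ranked, failed):
    """For each position x in the ranked list (most susceptible first),
    compute the fraction of failed items ranked at or above x."""
    failed = set(failed)
    curve, hits = [], 0
    for item in ranked:
        hits += item in failed
        curve.append(hits / len(failed))
    return curve

def auc(curve):
    """Area under the ROC-like curve: mean height over list positions."""
    return sum(curve) / len(curve)

ranked = ["F3", "F1", "F7", "F2", "F5", "F4", "F6", "F8"]  # hypothetical ranking
failures = {"F3", "F7", "F5"}                              # feeders that failed

curve = roc_like_curve(ranked, failures)
print(curve)       # fraction of failures captured at each list depth
print(auc(curve))
```

A perfect ranking places every eventual failure at the top of the list, so the curve rises to 1.0 immediately and the AUC approaches 1.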
We track the performance of our machine learning models by checking the rank of the failed feeder and the ranks of its components whenever a failure happens. We also compile ROC-like curves showing the components that failed and the feeder that automatically opened its circuit breaker when the failure occurred. These blind tests provide validation that the algorithms are working sufficiently well to assist with operations decisions for Con Edison's maintenance programs.

Figure 13 shows the results of a blind test for predicting feeder failures in Crown Heights, Brooklyn, with a prediction period from May 2008 to January 2009; Figure 12 shows results of various tests on the individual components. At each point (x, y) on these plots, x gives a position on the ranked list, and y is the percent of failures that are ranked at or above x in the list.

We use rank statistics for each network to continually measure the performance of the ODDS system. For instance, AUC is reported in Figures 12 and 13. The machine learning system has improved to the point where 60% of failures occur in the 15% of feeders that are ranked as most susceptible to failure. As importantly, fewer than 1% of failures occur on feeders in the best 25% of ODDS feeder susceptibility rankings (Figure 14).

To determine which features are most important, we create "tornado" diagrams like Figure 15. This figure illustrates the influence of different categories of features under different weather conditions. For each weather condition, the influence of each category (the sum of coefficients λ_j for that category divided by the total sum of coefficients Σ_j λ_j) is displayed as a horizontal bar. Only the top few categories are shown. For both snowy and hot weather, features describing power quality events have been the most influential predictors of failure according to our model.

Fig. 13. ROC-like curve for blind test of Crown Heights feeders.

5. Load Pocket Weight (LPW) is an expert-derived measure of trouble in delivering power to the secondary network in localized areas. It is a weighted score of the number of open (not in service) network protector switches, open secondary mains, open fuses, non-reporting transformers, and other signs of service outage.

The categories of features in Figure 15 are: power quality, which are features that count power quality events
Fig. 14. Percent of feeder outages in which the feeder that failed was within the worst 15% (left) of the ranked list, or best 25% (right), where the predictions being evaluated are those just before the time of failure. The system improved from less than 30% of the failures in the worst 15% in 2005 to greater than 60% in 2008, for example.

Fig. 15. Influence of different categories of features under different weather conditions. Red: hot weather of August 2010; Blue: snowy January 2011; Yellow: rainy February 2011; Turquoise: typical fall weather in October 2010.

(disturbances) preceding the outage over various time windows; system load in megawatts; outage history, which includes features that count and characterize the prior outage history (failure outages, scheduled outages, test failures, immediate failures after re-energization, and urgent planned outages); load pocket weight, which measures the difficulty in delivering power to the end user; transformers, particularly features encoding the types and ages of transformers (e.g., percent of transformers made by a particular manufacturer); stop joints and paper joints, which include features that count joint types, configurations, and age, where these features are associated with joining PILC to other PILC and to more modern cable; cable rank, which encodes the results of the cable section ranking model; the count of a specific type of cable (XP and EPR) in various age categories; HiPot index features, which are derived by Con Edison to estimate how vulnerable the feeders are to heat-sensitive component failures; the number of shunts on the feeder, where these shunts equalize the capacitance and also condition the feeder to power quality events; an indicator for non-network customers, where a non-network customer is a customer that gets electricity from a radial overhead connection to the grid; the count of PILC sections along the feeder; the percent of joints that are solid joints, which takes into account the fact that joining modern cable is simpler and less failure-prone than joining PILC; and shifted load features that characterize how well a feeder transfers load to other feeders if it were to go out of service.

MTBF Modeling Evaluation

We have tracked the improvement in MTBF for each network as preventive maintenance work has been done by Con Edison to improve performance since 2002. To test whether this improvement is significant, we use a nonparametric statistical test, called the logrank test, that compares the survival distributions of two samples. In this case, we wished to determine whether the 2009 summer MTBF values are statistically larger than the 2002 summer MTBF values. The performance of the system showed significant improvement, in that there is a less than one-in-a-billion chance that the treatment population in 2009 did not improve over the control population from 2002. In 2009, for example, 1,468 out of 4,590 network-days were failure-free, or one out of every three summer days; in the 2002 control group, only 908 network-days were failure-free, or one out of five summer days. The larger the percentage of network-days that are failure-free, the lower the likelihood of multiple outages happening at the same time.

Figure 16 shows the MTBF predicted by our model for each underground network in the Con Edison system on both January 1, 2002 (purple) and December 31, 2008 (yellow). The yellow bars are generally larger than the purple bars, indicating an increase in MTBF.

We have performed various studies to predict the MTBF of feeders. Figure 17 shows the accuracy of our outage rate predictions for all classes of unplanned outages over a three-year period, using a support vector machine regression model that predicts feeder MTBF. While the results are quite strong, there are two sources of inaccuracy in this study. First, the study did not model "infant mortality," the increased likelihood of failure after a repaired system is returned to service. This led to an underestimation of failures for the more at-risk feeders (visible particularly in the upper right of the graph). Empirically, we observed an increased likelihood of infant mortality for about six weeks following an outage.

Second, the study has difficulties handling censored data. If events are very infrequent, it is not possible for the algorithm to accurately predict their frequency. This right-censoring effect for the low-outage-rate feeders, due to the lack of failures in the three-year observation window, is visible in the lower left of the plot.

Fig. 16. Linear regression used to determine the Mean Time Between Failures for January 1, 2002 (purple), and December 31, 2008 (yellow) in each underground network in the Con Edison system. Networks are arranged along the horizontal axis from worst (left) to best (right), according to Con Edison's "Network Reliability Index".

Fig. 17. Scatter plot of SVM predicted outage rate versus actual rate for all classes of unplanned outages. The diagonal line depicts a perfect model.

Manhole Ranking Evaluation

The most recent evaluation of the manhole rankings was a blind test for predicting 2009 events in the Bronx. The Columbia database has data through 2007, incomplete 2008 data, and no data from 2009 or after. There are 27,212 manholes in the Bronx. The blind test showed:

• the most at-risk 10% (2,721/27,212) of the ranked list contained 44% (8/18) of the manholes that experienced a serious event;
• the most at-risk 20% (5,442/27,212) of the ranked list contained 55% (10/18) of the trouble holes for serious events.

Figure 18 contains the ROC-like curve for the full ranked list.

Fig. 18. ROC-like curve for 2009 Bronx blind test of the machine learning ranking for vulnerability of manholes to serious events (fires and explosions).

Before the start of the project, it was not clear whether manhole events could be predicted at all from the secondary data. These results show that manhole events are indeed worthwhile to model for prediction.

7 MANAGEMENT SOFTWARE

Prototype interfaces were developed jointly with Con Edison in order to make the results useful, and to assist in knowledge discovery.

CAP – Contingency Analysis Program

CAP is a tool designed by Con Edison and used at their main control centers. It brings together information relevant to the outage of a primary feeder cable. When a contingency occurs, Con Edison already has applications in use (integrated into the CAP tool) that preemptively model the network for the possibility of additional feeders failing. These applications determine the failures that could have the worst consequences for the system. Columbia's key contribution to the CAP tool is a feeder susceptibility indicator (described in Section 5.1) that gives the operators an important new piece of information: an indicator of which feeders are most likely to fail next. Operators can use this information to help determine the allocation of effort and resources toward preventing a cascade. The "worst consequences" feeder may not be the same as the "most likely to fail" feeder, so the operator can choose to allocate resources to feeders that are both likely to fail and for which a failure could lead to more serious consequences. Figure 19 shows a snapshot of the CAP tool interface.

CAPT – Capital Asset Prioritization Tool

CAPT is a prototype application designed by Columbia and Con Edison that offers an advanced mechanism for helping engineers and managers plan upgrades to the feeder systems of NYC.

Fig. 19. Screen capture of the Contingency Analysis Program tool during a 4th contingency event in the summer of 2008, with the feeders at most risk of failing next highlighted in red. The feeder ranking at the time of failure is shown in a blow-up ROC-like plot in the center.

Fig. 20. A screen capture of the Con Edison CAPT evaluation, showing an improvement in MTBF from 140 to 192 days if 34 of the most at-risk PILC sections were to be replaced on a feeder in Brooklyn at an estimated cost of $650,000.

Using a graphic interface, shown in Figure 20, users first enter constraints on the work they would hypothetically like to do. For instance, users can specify a borough or network, one or more specific feeder sections or types of feeder section, the dollar amount to be allocated, etc. CAPT then produces benefit versus cost curves of various replacement strategies, with the objective of optimizing "bang for the buck": the greatest increase in system MTBF for the dollars spent. Such a tool, if proven robust in production tests, could become a valuable contributor to capital asset allocations in the future. Typical maintenance plans might attempt to target replacement of at-risk sections, joints, or secondary components. The key components of CAPT include: 1) the model (currently an ODDS model along with a regression between SVM scores and observed MTBF) used to estimate MTBF for feeders both before and after any hypothetical changes; 2) the ranked lists for cable sections and joints, based on component rankings, allowing CAPT to recommend good candidates for replacement; and 3) a system that displays, in chart form for the user, tradeoff (Pareto) curves of benefit vs. cost for various replacement strategies (Figure 21).

Fig. 21. Example of cost-benefit analysis of possible replacement strategies for specific at-risk components analyzed by the machine learning system. The solid line approximates the "efficient frontier" in portfolio management theory.
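The Pareto (tradeoff) curves that CAPT displays rest on a standard dominance filter over candidate replacement strategies. A minimal sketch, with invented strategies, costs, and MTBF gains:

```python
# Hypothetical candidate strategies: (name, cost in dollars, MTBF gain in days).
strategies = [
    ("replace 10 PILC sections", 200_000, 15),
    ("replace 34 PILC sections", 650_000, 52),
    ("replace 5 stop joints",    120_000, 18),
    ("replace 20 joints",        500_000, 30),
]

def pareto_frontier(options):
    """Keep only strategies not dominated by a cheaper-or-equal strategy
    with a greater-or-equal MTBF gain."""
    frontier = []
    for name, cost, gain in sorted(options, key=lambda s: (s[1], -s[2])):
        if not frontier or gain > frontier[-1][2]:
            frontier.append((name, cost, gain))
    return frontier

for name, cost, gain in pareto_frontier(strategies):
    print(f"{name}: +{gain} days MTBF for ${cost:,}")
```

Each strategy kept on the frontier strictly increases both cost and benefit, which is exactly the shape of the "efficient frontier" curve in Figure 21.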

Manhole Event Structure Profiling Tool and Visualization Tool

We developed several tools that allow a qualitative evaluation of results and methods by secondary system engineers. The most useful of these tools is the "structure profiling tool" (also called the "report card" tool at Con Edison), which produces a full report of raw and processed data concerning any given individual manhole [27]. Before this tool was implemented, an individual case study of a manhole took days and resulted in an incomplete study. This tool gives the reasons why a particular manhole was assigned a particular rank by the model, and allows the vulnerability of a manhole to be roughly estimated at a glance by domain experts. We also developed a visualization tool (discussed in [28]) that uses Google Earth6 as a backdrop to display the locations of events, manholes, and cables. Figure 22 displays two screenshots from the visualization tool.

6. earth.google.com
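A Google Earth backdrop of the kind the visualization tool uses consumes standard KML. A minimal sketch that exports manhole vulnerability rankings as placemarks (the IDs, coordinates, and scores below are invented):

```python
# Hypothetical manhole records: (id, longitude, latitude, vulnerability in [0, 1]).
manholes = [
    ("MH-001", -73.9857, 40.7484, 0.91),
    ("MH-002", -73.9680, 40.7851, 0.12),
]

def to_kml(records):
    """Render one KML Placemark per manhole, carrying the predicted
    vulnerability score in the description."""
    marks = "\n".join(
        f"  <Placemark><name>{mid}</name>"
        f"<description>vulnerability={score:.2f}</description>"
        f"<Point><coordinates>{lon},{lat},0</coordinates></Point></Placemark>"
        for mid, lon, lat, score in records
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<kml xmlns="http://www.opengis.net/kml/2.2">\n<Document>\n'
            f"{marks}\n</Document>\n</kml>")

print(to_kml(manholes))
```

Writing the returned string to a `.kml` file makes it loadable as an overlay, with one clickable marker per manhole.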

Fig. 22. Images from the manhole events visualization tool, where labels were enlarged for clarity. Top: Geocoded ticket addresses, colored by trouble type. Yellow indicates a serious event type, purple indicates a potential precursor. If the user clicks on a ticket, the full ticket text is displayed. Bottom: Manholes and main cables within the same location, where manholes are colored by predicted vulnerability. Note that a ticket within the top figure does not necessarily correspond to the nearest manhole on the bottom figure.

8 RELATED WORK

Machine learning has been used for applications in power engineering since the early days of artificial intelligence, with a surge of interest in the last decade. Venues for these works include the 1999 ACAI workshop on Machine Learning Applications to Power Systems (summarized by Hatziargyriou [29]), the proceedings of the yearly International Conference on Intelligent System Applications to Power Systems,7 and the 2009 Special Session on Machine Learning in Energy Applications at the International Conference on Machine Learning and Applications (ICMLA '09). There are also several books summarizing work on machine learning in power engineering (e.g., [30, 31]). Applications include the prediction of power security breaches, forecasting, power system operation and control, and classification of power system disturbances. The power engineering work bears little similarity to the current work for two reasons. First, much of the power engineering work focuses on specific machine learning techniques, yet for our application the specific machine learning techniques are not the primary reason for success, as discussed earlier. In our applications, the predictive accuracy gained by using a different technique is often small compared to the accuracy gained through other steps in the discovery process, or by formulating the problem differently. The data in power engineering problems are generally assumed to be amenable to learning in their raw form, in contrast with our treatment of the data. The second reason our work is distinct from the power engineering literature is that the machine learning techniques that have been developed by the power engineering community are often "black-box" methods such as neural networks and genetic algorithms (e.g., [32, 33]). Neural networks and genetic algorithms can be viewed as heuristic, non-convex optimization procedures for objectives that have multiple local minima; the algorithms' output can be extremely sensitive to the initial conditions. Our work uses mainly convex optimization procedures to avoid this problem. Further, "black-box" algorithms do not generally produce interpretable/meaningful solutions (for instance, the input-output relationship of a multilayer neural network is not generally interpretable), whereas we use mainly simple linear combinations of features.

We are not aware of any other work that addresses the challenges in mining historical power grid data of the same level of complexity as those discussed here. Our work contrasts with a subset of work in power engineering where data come entirely from Monte Carlo (MC) simulations [34, 35], and the MC simulated failures are predicted using machine learning algorithms. In a sense, our work is closer to data mining challenges in other fields such as e-commerce [10], criminal investigation [36], or medical patient processing [9] that encompass the full discovery process. For instance, it is interesting to contrast our work on manhole events with the study of Cornélusse et al. [37], who used domain experts to label "frequency incidents" at generators, and constructed a machine learning model from the frequency signals and labels. The manhole event prediction task discussed here also used domain experts to label trouble tickets as to whether they represent serious events; however, the level of processing required to clean and represent the tickets, along with the geocoding and information extraction required to pinpoint event locations, coupled with the integration of the ticket labeling machine learning task with the machine learning ranking task, makes the latter a much more substantial undertaking.

7. http://www.isap-power.org/

9 LESSONS LEARNED

There are several "take-away" messages from the development of our knowledge discovery processes on the NYC grid:

Prediction is Possible

We have shown successes in predicting failures of electrical components based on data collected by a major power utility company.

It was not clear at the outset that knowledge discovery and data mining approaches would be able to predict electrical component failures, let alone assist domain engineers with proactive maintenance programs. We are now involved in a Smart Grid Demonstration Project to verify that these techniques can be scaled to robust system use. For example, prior to our successes on the manhole event project, many Con Edison engineers did not view manhole event prediction as a realistic goal. The Con Edison trouble ticket data could easily have become what Fayyad et al. [8] consider a "data tomb." In this case, the remedy created by Columbia and Con Edison involved a careful problem formulation, the use of sophisticated text processing tools, and state-of-the-art machine learning techniques.

Data Are the Key

Power companies already collect a great deal of data; however, if these data are going to be used for prediction of failures, they should ideally have certain properties. First, the data should be as clean as possible, meaning, for instance, that unique identifiers should be used for each component. Second, if a component is replaced, it is important to record the properties of the old component (and its surrounding context if it is used to derive features) before the replacement; otherwise it cannot be determined what properties are common to those being replaced.

For trouble tickets, unstructured text fields should not be eliminated. It is true that structured data are easier to analyze; on the other hand, free text can be much more reliable. This was also discussed by Dalal et al. [38] in dealing with trouble tickets from web transaction data; in their case, a 40-character free-text field contained more information than any other field in the database. In the case of Con Edison trouble tickets, our representation based on the free text can much more reliably determine the seriousness of events than the (structured) trouble type code. Further, the type of information that is generally recorded in trouble tickets cannot easily fit into a limited number of categories, and asking operators to choose the category under time pressure is not practical. We have demonstrated that analysis of unstructured text is possible, and even practical.

Machine Learning Ranking Methods Are Useful for Prioritization

Machine learning methods for ranking are relatively new, and currently they are not used in many application domains besides information retrieval. So far, we have found that in the domain of electrical grid maintenance, the key to success is in the interpretation and processing of data, rather than in the exact machine learning method used; however, these new ranking methods are designed exactly for prioritization problems, and it is possible that they can offer an edge over older methods in many applications. Furthermore, as data collection becomes more automated, it is possible that the dependence on processing will lessen, and there will be a substantial advantage in using algorithms designed precisely for the task of prioritization.

Fig. 23. Overtreatment in the High Potential (HiPot) Preventive Maintenance program was identified by comparing to control group performance. Modified and AC HiPot tests are now used by Con Edison instead of DC HiPot tests.

Reactive Maintenance Can Lead to Overtreatment

We have demonstrated with a statistical method called "propensity" [39] that the High Potential (HiPot) testing program at Con Edison was overtreating the "patient," i.e., the feeders. HiPot is, by definition, preventive maintenance in that incipient faults are driven to failure by intentionally stressing the feeder. We found, however, that the DC (direct current) HiPot testing, in particular, was not outperforming a "placebo" control group which was scored by Con Edison to be equally "sick" but on which no work was done (Figure 23). When a new AC (alternating current) test was added by Con Edison to avoid some of the overtreatment, we were able to demonstrate that as the test was being perfected on the system, the performance level increased and has now surpassed that of the control group. Indeed, operations and distribution engineering at Con Edison has since added a modified AC test that also improved on the performance of the control group. This interaction among machine learning, statistics, preventive maintenance programs, and domain experts will likely identify overtreatment in most utilities that are predominantly reactive to failures now. That has been the experience in other industries, including those for which these techniques have been developed, such as automotive and aerospace, the military, and healthcare.

10 CONCLUSIONS

Over the next several decades we will depend more on an aging and overtaxed electrical infrastructure.

The reliability of the future smart grid will depend heavily on the new preemptive maintenance policies that are currently being implemented around the world. Our work provides a fundamental means for constructing intelligent automated policies: machine learning and knowledge discovery for prediction of vulnerable components. Our main scientific contribution is a general process that can be used by power utilities for failure prediction and preemptive maintenance. We showed specialized versions of this process for feeder ranking, feeder component ranking (cables, joints, hammerheads, and transformers), MTBF/MTTF estimation, and manhole vulnerability ranking. We have demonstrated, through direct application to the New York City power grid, that data already collected by power companies can be harnessed to predict, and thus to assist in preventing, grid failures.

Cynthia Rudin is an assistant professor in the Operations Research and Statistics group at the MIT Sloan School of Management, and she is an adjunct research scientist at the Center for Computational Learning Systems, Columbia University. She received a Ph.D. in applied and computational mathematics from Princeton University and B.S. and B.A. degrees from the University at Buffalo.

David Waltz (Senior Member, IEEE) is the director of the Center for Computational Learning Systems at Columbia University, with prior positions as president of NEC Research Institute, director of Advanced Information Systems at Thinking Machines Corp., and faculty positions at Brandeis University and the University of Illinois at Urbana-Champaign. He received all his degrees from MIT. He is a fellow and past president of AAAI (Association for the Advancement of AI), and Fellow of the ACM.

Roger N. Anderson (Member, IEEE) is a senior scholar at the Center for Computational Learning Systems, Columbia University. Roger received his Ph.D. from the Scripps Institution of Oceanography, University of California at San Diego.

Albert Boulanger received a B.S. in physics at the University of Florida, Gainesville, in 1979 and an M.S. in computer science at the University of Illinois, Urbana-Champaign, in 1984. Albert is a senior staff associate at Columbia University's Center for Computational Learning Systems.

Ansaf Salleb-Aouissi joined Columbia University's Center for Computational Learning Systems as an associate research scientist after a postdoctoral fellowship at INRIA Rennes (France). She received M.S. and Ph.D. degrees from the University of Orleans (France) and an engineer degree in computer science from the University of Science and Technology Houari Boumediene (USTHB), Algeria.

Maggie Chow is a section manager at Consolidated Edison of New York. Her responsibilities focus on lean management and system reliability. Maggie received her B.E. from City College of New York and her master's degree from NYU-Poly.

Haimonti Dutta is an associate research scientist at the Center for Computational Learning Systems, Columbia University. She received her Ph.D. degree in computer science and electrical engineering (CSEE) from the University of Maryland.

Philip Gross received his B.S. from Columbia University in 1999 and his M.S. from Columbia University in 2001. Philip is a software engineer at Google.

Bert Huang is a Ph.D. candidate in the Department of Computer Science, Columbia University. He received M.S. and M.Phil. degrees from Columbia University and B.S. and B.A. degrees from Brandeis University.

Steve Ierome received a B.S. in electrical engineering from the City College of New York in 1975. He has 40 years of experience in Distribution Engineering Design and Planning at Con Edison, and 3 years of experience in power quality and testing of overhead radial equipment.

Delfina F. Isaac is a quality assurance manager and was previously a senior statistical analyst in the Engineering and Planning organization at Con Edison. She received both an M.S. in statistics in 2000 and a B.S. in applied mathematics and statistics in 1998 from the State University of New York at Stony Brook.

Arthur Kressner is the president of Grid Connections, LLC. He recently retired from the Consolidated Edison Company in New York City with over 40 years of experience, most recently as the director of Research and Development.

Rebecca J. Passonneau is a senior research scientist at the Center for Computational Learning Systems, Columbia University, where she works on knowledge extraction from noisy textual data, spoken dialogue systems, and other applications of computational linguistics. She received her doctorate from the University of Chicago Department of Linguistics.

Axinia Radeva obtained an M.S. degree in electrical engineering from the Technical University at Sofia, Bulgaria, and a second M.S. degree in computer science from Eastern Michigan University. Axinia is a staff associate at Columbia University's Center for Computational Learning Systems.

Leon Wu (Member, IEEE) is a Ph.D. candidate at the Department of Computer Science and a senior research associate at the Center for Computational Learning Systems, Columbia University. He received his M.S. and M.Phil. in computer science from Columbia University and B.Sc. in physics from Sun Yat-sen University.

REFERENCES

[1] Office of Electric Transmission and Distribution, United States Department of Energy. "Grid 2030": A national vision for electricity's second 100 years, July 2003.
[2] North American Electric Reliability Corporation (NERC). Results of the 2007 survey of reliability issues, revision 1, October 2007.
[3] S. Massoud Amin. U.S. electrical grid gets less reliable. IEEE Spectrum Magazine, January 2011.
[4] M. Chupka, R. Earle, P. Fox-Penner, and R. Hledik. Transforming America's power industry: The investment challenge 2010-2030. Technical report, The Brattle Group, Prepared for The Edison Foundation, Washington, D.C., 2008.
[5] William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus. Knowledge discovery in databases: an overview. AI Magazine, 13(3):57–70, 1992.
[6] J. A. Harding, M. Shahbaz, Srinivas, and A. Kusiak. Data mining in manufacturing: A review. Journal of Manufacturing Science and Engineering, 128(4):969–976, 2006.
[7] Ana Azevedo and Manuel Filipe Santos. KDD, SEMMA and CRISP-DM: a parallel overview. In Proceedings of the IADIS European Conf. Data Mining, pages 182–185, 2008.
[8] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17:37–54, 1996.
[9] Wynne Hsu, Mong Li Lee, Bing Liu, and Tok Wang Ling. Exploration mining in diabetic patients databases: findings and conclusions. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 430–436. ACM, 2000.
[10] Ron Kohavi, Llew Mason, Rajesh Parekh, and Zijian Zheng. Lessons and challenges from mining retail e-commerce data. Machine Learning, Special Issue on Data Mining Lessons Learned, 57:83–113, 2004.

[11] Andrew P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159, July 1997.
[12] Cynthia Rudin. The P-Norm Push: A simple convex ranking algorithm that concentrates at the top of the list. Journal of Machine Learning Research, 10:2233–2271, October 2009.
[13] Cynthia Rudin and Robert E. Schapire. Margin-based ranking and an equivalence between AdaBoost and RankBoost. Journal of Machine Learning Research, 10:2193–2232, October 2009.
[14] Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.
[15] Thorsten Joachims. A support vector method for multivariate performance measures. In Proceedings of the International Conference on Machine Learning (ICML), 2005.
[16] Harris Drucker, Chris J.C. Burges, Linda Kaufman, Alex Smola, and Vladimir Vapnik. Support vector regression machines. In Advances in Neural Information Processing Systems, volume 9, pages 155–161. MIT Press, 1996.
[17] Leo Breiman, Jerome Friedman, Charles J. Stone, and R.A. Olshen. CART: Classification and Regression Trees. Wadsworth Press, 1983.
[18] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, October 2001.
[19] David R. Cox. Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological), 34(2):187–220, 1972.
[20] Phil Gross, Ansaf Salleb-Aouissi, Haimonti Dutta, and Albert Boulanger. Ranking electrical feeders of the New York power grid. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA), pages 725–730, 2009.
[21] Philip Gross, Albert Boulanger, Marta Arias, David L. Waltz, Philip M. Long, Charles Lawson, Roger Anderson, Matthew Koenig, Mark Mastrocinque, William Fairechio, John A. Johnson, Serena Lee, Frank Doherty, and Arthur Kressner. Predicting electricity distribution feeder failures using machine learning susceptibility analysis. In Proceedings of the Eighteenth Conference on Innovative Applications

feedback for a machine learning task. In Proceedings of the International Conference on Machine Learning and Applications, 2009.
[28] Haimonti Dutta, Cynthia Rudin, Rebecca Passonneau, Fred Seibel, Nandini Bhardwaj, Axinia Radeva, Zhi An Liu, Steve Ierome, and Delfina Isaac. Visualization of manhole and precursor-type events for the Manhattan electrical distribution system. In Proceedings of the Workshop on Geo-Visualization of Dynamics, Movement and Change, 11th AGILE International Conference on Geographic Information Science, Girona, Spain, May 2008.
[29] Nikos D. Hatziargyriou. Machine learning applications to power systems. In Machine Learning and Its Applications, pages 308–317, New York, NY, USA, 2001. Springer-Verlag New York, Inc.
[30] Abhisek Ukil. Intelligent Systems and Signal Processing in Power Engineering. Springer, 2007.
[31] Louis A. Wehenkel. Automatic Learning Techniques in Power Systems. Springer, 1998.
[32] A. Saramourtsis, J. Damousis, A. Bakirtzis, and P. Dokopoulos. Genetic algorithm solution to the economic dispatch problem - application to the electrical power grid of Crete island. In Proceedings of the Workshop on Machine Learning Applications to Power Systems (ACAI), pages 308–317, 2001.
[33] Yiannis A. Katsigiannis, Antonis G. Tsikalakis, Pavlos S. Georgilakis, and Nikos D. Hatziargyriou. Improved wind power forecasting using a combined neuro-fuzzy and artificial neural network model. In Proceedings of the 4th Hellenic Conference on Artificial Intelligence (SETN), pages 105–115, 2006.
[34] P. Geurts and L. Wehenkel. Early prediction of electric power system blackouts by temporal machine learning. In Proceedings of the ICML98/AAAI98 Workshop on Predicting the Future: AI Approaches to Time Series Analysis, pages 21–28, 1998.
[35] Louis Wehenkel, Mevludin Glavic, Pierre Geurts, and Damien Ernst. Automatic learning for advanced sensing, monitoring and control of electric power systems. In Proceedings of the Second Carnegie Mellon Conference in Electric Power Systems, 2006.
[36] Hsinchun Chen, Wingyan Chung, Jennifer Jie Xu, Gang
of Artificial Intelligence (IAAI), 2006. Wang, Yi Qin, and Michael Chau. Crime data mining: a
[22] Hila Becker and Marta Arias. Real-time ranking with general framework and some examples. IEEE Computer,
concept drift using expert advice. In Proceedings of the 37(4):50–56, 2004.
13th ACM SIGKDD International Conference on Knowledge [37] Bertrand Cornélusse, Claude Wera, and Louis Wehenkel.
Discovery and Data Mining (KDD), pages 86–94, 2007. Automatic learning for the classification of primary fre-
[23] Cynthia Rudin, Rebecca Passonneau, Axinia Radeva, Hai- quency control behaviour. In Proceedings of the IEEE Power
monti Dutta, Steve Ierome, and Delfina Isaac. A process Tech Conference, Lausanne, 2007.
for predicting manhole events in Manhattan. Machine [38] S. R. Dalal, D. Egan, and M. Rosenstein Y. Ho. The
Learning, 80:1–31, 2010. promise and challenge of mining web transaction data.
[24] Rebecca Passonneau, Cynthia Rudin, Axinia Radeva, and In R. Khatree and C. R. Rao, editors, Statistics in Industry
Zhi An Liu. Reducing noise in labels and features for a (Handbook of Statistics), volume 22. Elsevier, 2003.
real world dataset: Application of NLP corpus annotation [39] Paul R. Rosenbaum and Donald B. Rubin. The central role
methods. In Proceedings of the 10th International Conference of the propensity score in observational studies for causal
on Computational Linguistics and Intelligent Text Processing effects. Biometrika, 70(1):45–55, 1983.
(CICLing), 2009.
[25] Hamish Cunningham, Diana Maynard, Kalina Bontcheva,
and Valentin Tablan. GATE: A framework and graphical
development environment for robust NLP tools and ap-
plications. In Proceedings of the 40th Anniversary Meeting
of the Association for Computational Linguistics (ACL), July
2002.
[26] Pannaga Shivaswamy, Wei Chu, and Martin Jansche. A
support vector approach to censored targets. In Proceed-
ings of the International Conference on Data Mining (ICDM),
2007.
[27] Axinia Radeva, Cynthia Rudin, Rebecca Passonneau, and
Delfina Isaac. Report cards for manholes: Eliciting expert
