Machine Learning for the New York City Power Grid
Citation: Rudin, Cynthia et al. “Machine Learning for the New York City Power Grid.” IEEE Transactions on Pattern Analysis and Machine Intelligence 34.2 (2012): 328-345.
As Published: http://dx.doi.org/10.1109/tpami.2011.108
Publisher: Institute of Electrical and Electronics Engineers
Abstract—Power companies can benefit from the use of knowledge discovery methods and statistical machine learning for preventive
maintenance. We introduce a general process for transforming historical electrical grid data into models that aim to predict the risk
of failures for components and systems. These models can be used directly by power companies to assist with prioritization of
maintenance and repair work. Specialized versions of this process are used to produce 1) feeder failure rankings, 2) cable, joint,
terminator and transformer rankings, 3) feeder MTBF (Mean Time Between Failure) estimates and 4) manhole events vulnerability
rankings. The process in its most general form can handle diverse, noisy data sources that are historical (static), semi-real-time, or real-time,
incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of
results via cross-validation and blind test. Above and beyond the ranked lists and MTBF estimates are business management interfaces
that allow the prediction capability to be integrated directly into corporate planning and decision support; such interfaces rely on several
important properties of our general modeling approach: that machine learning features are meaningful to domain experts, that the
processing of data is transparent, and that prediction results are accurate enough to support sound decision making. We discuss the
challenges in working with historical electrical grid data that were not designed for predictive purposes. The “rawness” of these data
contrasts with the accuracy of the statistical models that can be obtained from the process; these models are sufficiently accurate to
assist in maintaining New York City’s electrical grid.
Index Terms—applications of machine learning, electrical grid, smart grid, knowledge discovery, supervised ranking, computational
sustainability, reliability
places where infrastructure must be continually replenished due to natural disasters (for instance, Japan has earthquakes that force power systems to be replenished).

The smart grid will not be implemented overnight; to create the smart grid of the future, we must work with the electrical grid that is there now. For instance, according to the Brattle Group [4], the cost of updating the grid by 2030 could be as much as $1.5 trillion. The major components of the smart grid will (for an extended period) be the same as the major components of the current grid, and new intelligent meters must work with the existing equipment. Converting to a smart grid can be compared to replacing worn parts of an airplane while it is in the air. As grid parts are replaced gradually and as smart components are added, the old components, including cables, switches, sensors, etc., will still need to be maintained. Further, the state of the old components should inform the priorities for the addition of new smart switches and sensors.

The key to making smart grid components effective is to analyze where upgrades would be most useful, given the current system. Consider the analogy to human patients in the medical profession, a discipline for which many of the machine learning algorithms and techniques used here for the smart grid were originally developed and tested. While each patient (a feeder, transformer, manhole, or joint) is made up of the same kinds of components, they wear and age differently, with variable historic stresses and hereditary factors (analogous to different vintages, loads, manufacturers), so that each patient must be treated as a unique individual. Nonetheless, individuals group into families, neighborhoods, and populations (analogous to networks, boroughs) with relatively similar properties. The smart grid must be built upon a foundation of helping the equipment (patients) improve their health, so that the networks (neighborhoods) improve their life expectancy, and the population (boroughs) lives more sustainably.

In the late 1990s, NYC’s power company, Con Edison, hypothesized that historical power grid data records could be used to predict, and thus prevent, grid failures and possible associated blackouts, fires, and explosions. A collaboration was formed with Columbia University, beginning in 2004, in order to extensively test this hypothesis. This paper discusses the tools being developed through this collaboration for predicting different types of electrical grid failures. The tools were created for the NYC electrical grid; however, the technology is general and is transferable to electrical grids across the world.

In this work, we present new methodologies for maintaining the smart grid, in the form of a general process for failure prediction that can be specialized for individual applications. Important steps in the process include data processing (cleaning, pattern matching, statistics, integration), formation of a database, machine learning (time aggregation, formation of features and labels, ranking methods), and evaluation (blind tests, visualization). Specialized versions of the process have been developed for: 1) feeder failure ranking for distribution feeders, 2) cable section, joint, terminator, and transformer ranking for distribution feeders, 3) feeder MTBF (Mean Time Between Failure) estimates for distribution feeders, and 4) manhole vulnerability ranking. Each specialized process was designed to handle data with particular characteristics. In its most general form, the process can handle diverse, noisy sources that are historical (static), semi-real-time, or real-time; the process incorporates state-of-the-art machine learning algorithms for prioritization (supervised ranking or MTBF), and includes an evaluation of results via cross-validation on past data, and by blind evaluation. The blind evaluation is performed on data generated as events unfold, giving a true barrier to information in the future. The data used by the machine learning algorithms include past events (failures, replacements, repairs, tests, loading, power quality events, etc.) and asset features (type of equipment, environmental conditions, manufacturer, specifications, components connected to it, borough and network where it is installed, date of installation, etc.).

Beyond the ranked lists and MTBF estimates, we have designed graphical user interfaces that can be used by managers and engineers for planning and decision support. Successful NYC grid decision support applications based on our models are used to assist with prioritizing repairs, prioritizing inspections, correcting overtreatment, generating plans for equipment replacement, and prioritizing protective actions for the electrical distribution system. How useful these interfaces are depends on how accurate the underlying predictive models are, and also on the interpretation of model results. It is an important property of our general approach that machine learning features are meaningful to domain experts, in that the data processing and the way causal factors are designed is transparent. The transparent use of data serves several purposes: it allows domain experts to troubleshoot the model or suggest extensions, it allows users to find the factors underlying the root causes of failures, and it allows managers to understand, and thus trust, the (non-black-box) model in order to make decisions.

We implicitly assume that data for the modeling tasks will have similar characteristics when collected by any power company. This assumption is broadly sound but there can be exceptions; for instance, feeders will have similar patterns of failure across cities, and data are probably collected in a similar way across many cities. However, the levels of noise within the data and the particular conditions of the city (maintenance history, maintenance policies, network topologies, weather, etc.) are specific to the city and to the methods by which data are collected and stored by the power company.

Our goals for this paper are to demonstrate that data collected by electrical utilities can be used to create statistical models for proactive maintenance programs, to show how this can be accomplished through knowledge discovery and machine learning, and to encourage com-
A common quality measure in supervised ranking is the probability that a new pair of randomly chosen examples is misranked (see [14]), which should be minimized:

  P_D{misrank(f_λ)} := P_D{ f_λ(x_+) ≤ f_λ(x_−) | y_+ = 1, y_− = −1 }.   (1)

The notation P_D indicates the probability with respect to a random draw of (x_+, y_+) and (x_−, y_−) from distribution D on X × {−1, +1}. The empirical risk corresponding to (1) is the number of misranked pairs in the training set:

  R(f_λ) := Σ_{k: y_k = −1} Σ_{i: y_i = 1} 1[f_λ(x_i) ≤ f_λ(x_k)] = #misranks(f_λ).   (2)

The pairwise misranking error is directly related to the (negative of the) area under the ROC curve; the only difference is that ties are counted as misranks in (2). Thus, a natural ranking algorithm is to choose a minimizer of R(f_λ) with respect to λ:

  λ* ∈ argmin_{λ ∈ R^n} R(f_λ),

and to rank the components in the test set in descending order of f_{λ*}(x) := Σ_j λ*_j h_j(x).

There are three shortcomings to this algorithm: first, it is computationally hard to minimize R(f_λ) directly. Second, the misranking error R(f_λ) considers all misranks equally, in the sense that misranks at the top of the list are counted equally with misranks towards the bottom, even though in failure prediction problems it is clear that misranks at the top of the list should be considered more important. A third shortcoming is the lack of regularization usually imposed to enable generalization (prediction ability) in high dimensions. A remedy for all of these problems is to use special cases of the following ranking objective that do not fall into any of the traps listed above:

  R^ℓ_g(f_λ) := Σ_{k: y_k = −1} g( Σ_{i: y_i = 1} ℓ( f_λ(x_i) − f_λ(x_k) ) ) + C‖λ‖²,   (3)

where g is called the price function and ℓ is called the loss function. R(f_λ) given in (2) is a special case of R^ℓ_g(f_λ) with ℓ(z) = 1_{z≤0} and g(z) = z. The objective is convex in λ when the exponential loss ℓ(z) = e^{−z} is used [14], or the SVM (support vector machine) hinge loss ℓ(z) = (1 − z)_+ [15]; several other convex loss functions are also commonly used. The norm used in the regularization term is generally either a norm in a Reproducing Kernel Hilbert space (for SVMs), which in the simplest case is the ℓ_2 norm ‖λ‖²_2 = Σ_j λ_j², or an ℓ_1 norm ‖λ‖_1 = Σ_j |λ_j|. The constant C can be set by cross-validation.

Special cases of the objective (3) are: SVM Rank [15], which uses the hinge loss, g(z) = z as the price function, and Reproducing Kernel Hilbert space regularization; RankBoost [14], which uses the exponential loss and no regularization; and the P-Norm Push [12]. The P-Norm Push uses price function g(z) = z^p, which forces the value of the objective to be determined mainly by the highest ranked negative examples when p is large; the power p acts as a soft max. Since most of the value of the objective is determined by the top portion of the list, the algorithm concentrates more on the top. The full P-Norm Push algorithm is:

  λ* ∈ arginf_λ R_p(λ), where R_p(λ) := Σ_{k: y_k = −1} ( Σ_{i: y_i = 1} exp( −[f_λ(x_i) − f_λ(x_k)] ) )^p.

Vector λ* is not difficult to compute, for instance by gradient descent. The P-Norm Push is used currently in the manhole event prediction tool. An SVM algorithm with ℓ_2 regularization is used currently in the feeder failure tool.

Algorithms designed via empirical risk minimization are not designed to produce density estimates, that is, estimates of P(y = 1 | x), though in some cases this is possible, particularly when the loss function is smooth. These algorithms are instead designed specifically to produce an accurate ranking of examples according to these probabilities.

It is important to note that the specific choice of machine learning algorithm is not the major component of success in this domain; rather, the key to success is the data cleaning and processing as discussed in Section 3. If the machine learning features and labels are well constructed, any reasonable algorithm will perform well; the inverse holds too, in that badly constructed features and labels will not yield a useful model regardless of the choice of algorithm.

For our MTBF application, MTBF is estimated indirectly through failure rates; the predicted failure rate is converted to MTBF by taking the reciprocal of the rate. Failure rate is estimated rather than MTBF for numerical reasons: good feeders with no failures have an infinite MTBF. The failure rate is estimated by regression algorithms, for instance SVM-R (support vector machine regression) [16], CART (Classification and Regression Trees) [17], ensemble-based techniques such as Random Forests [18], and statistical methods, e.g., Cox Proportional Hazards [19].

5 SPECIFIC PROCESSES AND CHALLENGES

In this section, we discuss how the general process needs to be adapted in order to handle data processing and machine learning challenges specific to each of our electrical reliability tasks in NYC. Con Edison currently operates the world’s largest underground electric system, which delivers up to a current peak record of about 14,000 MW of electricity to over 3 million customers. A customer can be an entire office building or apartment complex in NYC, so that up to 15 million people are served with electricity.
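To make the objectives above concrete, the following minimal sketch (plain Python, with made-up scores rather than the paper's data) evaluates the empirical misranking risk of (2), the P-Norm Push objective R_p, and the failure-rate-to-MTBF conversion:

```python
import math

# Illustrative scores only (not the paper's data): f_lambda(x) for
# three positive and two negative examples.
scores_pos = [2.1, 0.3, 1.5]   # y = +1
scores_neg = [0.9, -0.4]       # y = -1

# Empirical risk (2): count positive/negative pairs in which the
# positive is scored at or below the negative (ties are misranks).
misranks = sum(1 for s_pos in scores_pos for s_neg in scores_neg
               if s_pos <= s_neg)
pairs = len(scores_pos) * len(scores_neg)
misrank_rate = misranks / pairs   # fraction of misranked pairs, as in (1)

# P-Norm Push objective R_p: for each negative example, sum the
# exponential losses over the positives, raise to the power p, then sum.
p = 4
R_p = sum(sum(math.exp(-(s_pos - s_neg)) for s_pos in scores_pos) ** p
          for s_neg in scores_neg)

# MTBF is estimated indirectly: predict a failure rate, then invert it.
failure_rate = 0.004            # failures per day (made-up value)
mtbf_days = 1.0 / failure_rate  # 250 days
```

In practice the score function f_λ is a weighted sum of features, and λ is chosen by minimizing R_p (for instance by gradient descent) rather than fixed as here.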
5.1 Feeder Ranking in NYC

Con Edison data regarding the physical composition of feeders are challenging to work with; variations in the database entry and rewiring of components from one feeder to another make it difficult to get a perfect snapshot of the current state of the system. It is even more difficult to get snapshots of past states of the system; the past state needs to be known at the time of each past failure because it is used in training the machine learning algorithm. A typical feeder is composed of over a hundred cable sections, connected by a similar number of joints, and terminating in a few tens of transformers. For a single feeder, these subcomponents are a hodgepodge of types and ages; for example, a brand-new cable section may be connected to one that is many decades old. This makes it challenging to “roll up” the feeder into a set of features for learning. The features we currently use are statistics of the ages, numbers, and types of components within the feeder; for instance, we have considered maxima, averages, and 90th percentiles (robust versions of the maxima).

Dynamic data present a similar problem to physical data, but here the challenge is aggregation in time instead of space. Telemetry data are collected at rates varying from hundreds of times per second (for power quality data) to only a few measurements per day (weather data). These can be aggregated over time, again using functions such as max or average, using different time windows (as we describe shortly). Some of the time windows are relatively simple (e.g., aggregating over 15 or 45 days), while others take advantage of the system’s periodicity, and aggregate over the most recent data plus data from the same time of year in previous years.

One of the challenges of the feeder ranking application is that of imbalanced data, or scarcity of data characterizing the failure class, which causes problems with generalization. Specifically, primary distribution feeders are susceptible to different kinds of failures, and we have very few training examples for each kind, making it difficult to reliably extract statistical regularities or determine the features that affect reliability. For instance, failure can be due to: concurrent or prior outages that stress the feeder and other feeders in the network; aging; power quality events (e.g., voltage spikes); overloads (that have seasonal variation, like summer heat waves); known weak components (e.g., joints connecting PILC to other sections); at-risk topologies (where cascading failures could occur); the stress of “HiPot” (high potential) testing; and de-energizing/re-energizing of feeders that can result in multiple failures within a short time span due to “infant mortality.” Other data scarcity problems are caused by the range in MTBF of the feeders; while some feeders are relatively new and last for a long time between failures (for example, more than five years), others can have failures within a few tens of days of each other. In addition, rare seasonal effects (such as particularly high summer temperatures) can affect failure rates of feeders.

We have focused on the most serious failure type for distribution feeders, where the entire feeder is automatically taken offline by emergency substation relays, due to some type of fault being detected by sensors. Our current system for generating data sets attempts to address the challenge of learning with rare positive examples (feeder failures). An actual feeder failure incident is instantaneous, so a snapshot of the system at that moment will have only one failure example. To better balance the number of positive and negative examples in the data, we tried the rare event prediction setup shown in Figure 6, labeling any example that had experienced a failure over some time window as positive. However, the dynamic features for these examples are constructed from the timeframe before the prediction period, and thus do not represent the precise conditions at the time of failure. This was problematic, as the domain experts believed that some of the dynamic data might only have predictive value in the period right before the failure. To solve this problem, we decided to switch to “time-shifted” positive examples, where the positive examples are still created from the past outages within the predic-
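As an illustration of the feature roll-up described in this section, the following sketch (all values, names, and window lengths are made up for illustration) computes feeder-level features from component-level and telemetry data:

```python
# Hypothetical ages (in years) of the cable sections of one feeder.
section_ages = sorted([42.0, 3.5, 38.0, 41.0, 12.0])

def percentile(sorted_vals, q):
    """Linear-interpolation percentile (0 <= q <= 100) of sorted data."""
    rank = (q / 100.0) * (len(sorted_vals) - 1)
    lo = int(rank)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (rank - lo) * (sorted_vals[hi] - sorted_vals[lo])

# Roll the component list up into feeder-level features: maximum,
# average, and 90th percentile (a robust version of the maximum).
features = {
    "age_max": max(section_ages),
    "age_mean": sum(section_ages) / len(section_ages),
    "age_p90": percentile(section_ages, 90),
}

# Aggregate dynamic (telemetry) data over time windows, e.g. the most
# recent 15 and 45 days, again with max/average summaries.
daily_load = [0.8 + 0.4 * day / 44 for day in range(45)]  # one reading/day
features["load_max_15d"] = max(daily_load[-15:])
features["load_avg_45d"] = sum(daily_load) / len(daily_load)
```

Seasonal windows (most recent data plus the same time of year in previous years) can be built the same way by selecting the relevant slices before aggregating.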
7 MANAGEMENT SOFTWARE
Prototype interfaces were developed jointly with Con
Edison in order to make the results useful, and to assist
in knowledge discovery.
Fig. 19. Screen capture of the Contingency Analysis Program tool during a 4th contingency event in the summer of 2008, with the feeders at most risk of failing next highlighted in red. The feeder ranking at the time of failure is shown in a blow-up ROC-like plot in the center.

Fig. 20. A screen capture of the Con Edison CAPT evaluation, showing an improvement in MTBF from 140 to 192 days if 34 of the most at-risk PILC sections were to be replaced on a feeder in Brooklyn at an estimated cost of $650,000.
reliability of the future smart grid will depend heavily on the new preemptive maintenance policies that are currently being implemented around the world. Our work provides a fundamental means for constructing intelligent automated policies: machine learning and knowledge discovery for prediction of vulnerable components. Our main scientific contribution is a general process that can be used by power utilities for failure prediction and preemptive maintenance. We showed how specialized versions of this process apply to feeder ranking, feeder component ranking (cables, joints, hammerheads, and transformers), MTBF/MTTF estimation, and manhole vulnerability ranking. We have demonstrated, through direct application to the New York City power grid, that data already collected by power companies can be harnessed to predict, and to thus assist in preventing, grid failures.

Cynthia Rudin is an assistant professor in the Operations Research and Statistics group at the MIT Sloan School of Management, and she is an adjunct research scientist at the Center for Computational Learning Systems, Columbia University. She received a Ph.D. in applied and computational mathematics from Princeton University and B.S. and B.A. degrees from the University at Buffalo.

David Waltz (Senior Member, IEEE) is the director of the Center for Computational Learning Systems at Columbia University, with prior positions as president of NEC Research Institute, director of Advanced Information Systems at Thinking Machines Corp., and faculty positions at Brandeis University and the University of Illinois at Urbana-Champaign. He received all his degrees from MIT. He is a fellow and past president of AAAI (Association for the Advancement of AI), and Fellow of the ACM.

Roger N. Anderson (Member, IEEE) is a senior scholar at the Center for Computational Learning Systems, Columbia University. Roger received his Ph.D. from the Scripps Institution of Oceanography, University of California at San Diego.

Albert Boulanger received a B.S. in physics at the University of Florida, Gainesville, in 1979 and an M.S. in computer science at the University of Illinois, Urbana-Champaign, in 1984. Albert is a senior staff associate at Columbia University’s Center for Computational Learning Systems.

Ansaf Salleb-Aouissi joined Columbia University’s Center for Computational Learning Systems as an associate research scientist after a postdoctoral fellowship at INRIA Rennes (France). She received M.S. and Ph.D. degrees from the University of Orleans (France) and an engineer degree in computer science from the University of Science and Technology Houari Boumediene (USTHB), Algeria.

Maggie Chow is a section manager at Consolidated Edison of New York. Her responsibilities focus on lean management and system reliability. Maggie received her B.E. from City College of New York and her master’s degree from NYU-Poly.

Haimonti Dutta is an associate research scientist at the Center for Computational Learning Systems, Columbia University. She received her Ph.D. degree in computer science and electrical engineering (CSEE) from the University of Maryland.

Philip Gross received his B.S. from Columbia University in 1999 and his M.S. from Columbia University in 2001. Philip is a software engineer at Google.

Bert Huang is a Ph.D. candidate in the Department of Computer Science, Columbia University. He received M.S. and M.Phil. degrees from Columbia University and B.S. and B.A. degrees from Brandeis University.

Steve Ierome received a B.S. in electrical engineering from the City College of New York in 1975. He has 40 years of experience in Distribution Engineering Design and Planning at Con Edison, and 3 years of experience in power quality and testing of overhead radial equipment.

Delfina F. Isaac is a quality assurance manager and was previously a senior statistical analyst in the Engineering and Planning organization at Con Edison. She received both an M.S. in statistics in 2000 and a B.S. in applied mathematics and statistics in 1998 from the State University of New York at Stony Brook.

Arthur Kressner is the president of Grid Connections, LLC. He recently retired from the Consolidated Edison Company in New York City with over 40 years of experience, most recently as the director of Research and Development.

Rebecca J. Passonneau is a senior research scientist at the Center for Computational Learning Systems, Columbia University, where she works on knowledge extraction from noisy textual data, spoken dialogue systems, and other applications of computational linguistics. She received her doctorate from the University of Chicago Department of Linguistics.

Axinia Radeva obtained an M.S. degree in electrical engineering from the Technical University at Sofia, Bulgaria, and a second M.S. degree in computer science from Eastern Michigan University. Axinia is a staff associate at Columbia University’s Center for Computational Learning Systems.

Leon Wu (Member, IEEE) is a Ph.D. candidate at the Department of Computer Science and a senior research associate at the Center for Computational Learning Systems, Columbia University. He received his M.S. and M.Phil. in computer science from Columbia University and B.Sc. in physics from Sun Yat-sen University.

REFERENCES

[1] Office of Electric Transmission and Distribution, United States Department of Energy. “Grid 2030”: A national vision for electricity’s second 100 years, July 2003.
[2] North American Electric Reliability Corporation (NERC). Results of the 2007 survey of reliability issues, revision 1, October 2007.
[3] S. Massoud Amin. U.S. electrical grid gets less reliable. IEEE Spectrum Magazine, January 2011.
[4] M. Chupka, R. Earle, P. Fox-Penner, and R. Hledik. Transforming America’s power industry: The investment challenge 2010-2030. Technical report, The Brattle Group, prepared for The Edison Foundation, Washington, D.C., 2008.
[5] William J. Frawley, Gregory Piatetsky-Shapiro, and Christopher J. Matheus. Knowledge discovery in databases: An overview. AI Magazine, 13(3):57–70, 1992.
[6] J. A. Harding, M. Shahbaz, Srinivas, and A. Kusiak. Data mining in manufacturing: A review. Journal of Manufacturing Science and Engineering, 128(4):969–976, 2006.
[7] Ana Azevedo and Manuel Filipe Santos. KDD, SEMMA and CRISP-DM: A parallel overview. In Proceedings of the IADIS European Conf. Data Mining, pages 182–185, 2008.
[8] Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17:37–54, 1996.
[9] Wynne Hsu, Mong Li Lee, Bing Liu, and Tok Wang Ling. Exploration mining in diabetic patients databases: Findings and conclusions. In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 430–436. ACM, 2000.
[10] Ron Kohavi, Llew Mason, Rajesh Parekh, and Zijian Zheng. Lessons and challenges from mining retail e-commerce data. Machine Learning, Special Issue on Data Mining Lessons Learned, 57:83–113, 2004.
[11] Andrew P. Bradley. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7):1145–1159, July 1997.
[12] Cynthia Rudin. The P-Norm Push: A simple convex ranking algorithm that concentrates at the top of the list. Journal of Machine Learning Research, 10:2233–2271, October 2009.
[13] Cynthia Rudin and Robert E. Schapire. Margin-based ranking and an equivalence between AdaBoost and RankBoost. Journal of Machine Learning Research, 10:2193–2232, October 2009.
[14] Yoav Freund, Raj Iyer, Robert E. Schapire, and Yoram Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.
[15] Thorsten Joachims. A support vector method for multivariate performance measures. In Proceedings of the International Conference on Machine Learning (ICML), 2005.
[16] Harris Drucker, Chris J.C. Burges, Linda Kaufman, Alex Smola, and Vladimir Vapnik. Support vector regression machines. In Advances in Neural Information Processing Systems, volume 9, pages 155–161. MIT Press, 1996.
[17] Leo Breiman, Jerome Friedman, Charles J. Stone, and R.A. Olshen. CART: Classification and Regression Trees. Wadsworth Press, 1983.
[18] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, October 2001.
[19] David R. Cox. Regression models and life-tables. Journal of the Royal Statistical Society, Series B (Methodological), 34(2):187–220, 1972.
[20] Phil Gross, Ansaf Salleb-Aouissi, Haimonti Dutta, and Albert Boulanger. Ranking electrical feeders of the New York power grid. In Proceedings of the International Conference on Machine Learning and Applications (ICMLA), pages 725–730, 2009.
[21] Philip Gross, Albert Boulanger, Marta Arias, David L. Waltz, Philip M. Long, Charles Lawson, Roger Anderson, Matthew Koenig, Mark Mastrocinque, William Fairechio, John A. Johnson, Serena Lee, Frank Doherty, and Arthur Kressner. Predicting electricity distribution feeder failures using machine learning susceptibility analysis. In Proceedings of the Eighteenth Conference on Innovative Applications of Artificial Intelligence (IAAI), 2006.
[22] Hila Becker and Marta Arias. Real-time ranking with concept drift using expert advice. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pages 86–94, 2007.
[23] Cynthia Rudin, Rebecca Passonneau, Axinia Radeva, Haimonti Dutta, Steve Ierome, and Delfina Isaac. A process for predicting manhole events in Manhattan. Machine Learning, 80:1–31, 2010.
[24] Rebecca Passonneau, Cynthia Rudin, Axinia Radeva, and Zhi An Liu. Reducing noise in labels and features for a real world dataset: Application of NLP corpus annotation methods. In Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2009.
[25] Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL), July 2002.
[26] Pannaga Shivaswamy, Wei Chu, and Martin Jansche. A support vector approach to censored targets. In Proceedings of the International Conference on Data Mining (ICDM), 2007.
[27] Axinia Radeva, Cynthia Rudin, Rebecca Passonneau, and Delfina Isaac. Report cards for manholes: Eliciting expert feedback for a machine learning task. In Proceedings of the International Conference on Machine Learning and Applications, 2009.
[28] Haimonti Dutta, Cynthia Rudin, Rebecca Passonneau, Fred Seibel, Nandini Bhardwaj, Axinia Radeva, Zhi An Liu, Steve Ierome, and Delfina Isaac. Visualization of manhole and precursor-type events for the Manhattan electrical distribution system. In Proceedings of the Workshop on Geo-Visualization of Dynamics, Movement and Change, 11th AGILE International Conference on Geographic Information Science, Girona, Spain, May 2008.
[29] Nikos D. Hatziargyriou. Machine learning applications to power systems. In Machine Learning and Its Applications, pages 308–317, New York, NY, USA, 2001. Springer-Verlag New York, Inc.
[30] Abhisek Ukil. Intelligent Systems and Signal Processing in Power Engineering. Power Engineering. Springer, 2007.
[31] Louis A. Wehenkel. Automatic learning techniques in power systems. Springer, 1998.
[32] A. Saramourtsis, J. Damousis, A. Bakirtzis, and P. Dokopoulos. Genetic algorithm solution to the economic dispatch problem: Application to the electrical power grid of Crete island. In Proceedings of the Workshop on Machine Learning Applications to Power Systems (ACAI), pages 308–317, 2001.
[33] Yiannis A. Katsigiannis, Antonis G. Tsikalakis, Pavlos S. Georgilakis, and Nikos D. Hatziargyriou. Improved wind power forecasting using a combined neuro-fuzzy and artificial neural network model. In Proceedings of the 4th Hellenic Conference on Artificial Intelligence (SETN), pages 105–115, 2006.
[34] P. Geurts and L. Wehenkel. Early prediction of electric power system blackouts by temporal machine learning. In Proceedings of the ICML98/AAAI98 Workshop on Predicting the Future: AI Approaches to Time Series Analysis, pages 21–28, 1998.
[35] Louis Wehenkel, Mevludin Glavic, Pierre Geurts, and Damien Ernst. Automatic learning for advanced sensing, monitoring and control of electric power systems. In Proceedings of the Second Carnegie Mellon Conference in Electric Power Systems, 2006.
[36] Hsinchun Chen, Wingyan Chung, Jennifer Jie Xu, Gang Wang, Yi Qin, and Michael Chau. Crime data mining: A general framework and some examples. IEEE Computer, 37(4):50–56, 2004.
[37] Bertrand Cornélusse, Claude Wera, and Louis Wehenkel. Automatic learning for the classification of primary frequency control behaviour. In Proceedings of the IEEE Power Tech Conference, Lausanne, 2007.
[38] S. R. Dalal, D. Egan, M. Rosenstein, and Y. Ho. The promise and challenge of mining web transaction data. In R. Khatree and C. R. Rao, editors, Statistics in Industry (Handbook of Statistics), volume 22. Elsevier, 2003.
[39] Paul R. Rosenbaum and Donald B. Rubin. The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):45–55, 1983.