Optimal Power Flow Using Graph Neural Networks
ABSTRACT

Optimal power flow (OPF) is one of the most important optimization problems in the energy industry. In its simplest form, OPF attempts to find the optimal power that the generators within the grid have to produce to satisfy a given demand. Optimality is measured with respect to the cost that each generator incurs in producing this power. The OPF problem is non-convex due to the sinusoidal nature of electrical generation and thus is difficult to solve. Using small angle approximations leads to a convex problem known as DC OPF, but this approximation is no longer valid when power grids are heavily loaded. Many approximate solutions have since been put forward, but these do not scale to large power networks. In this paper, we propose using graph neural networks (which are localized, scalable parametrizations of network data) trained under the imitation learning framework to approximate a given optimal solution. While the optimal solution is costly to compute, it is only needed for the network states in the training set; at test time, the trained GNN computes an adequate approximation of the OPF solution. Numerical experiments are run on the IEEE-30 and IEEE-118 test cases.

Index Terms— graph neural networks, smart grids, optimal power flow, imitation learning

Supported by NSF CCF 1717120, ARO W911NF1710438, ARL DCIST CRA W911NF-17-2-0181, ISTC-WAS and Intel DevCloud.

1. INTRODUCTION

Optimal power flow (OPF) is one of the most important optimization problems for the energy industry [1]. It is used for system planning, for establishing prices on day-ahead markets, and for allocating generation capacity efficiently throughout the day. Even though the problem was formulated over half a century ago, we still do not have a fast and robust technique for solving it, even though such a technique would save tens of billions of dollars annually [1].

In its simplest form, OPF attempts to find the optimal power that the generators within the grid have to produce to satisfy a given demand. Optimality is measured with respect to the cost that each generator incurs in producing the required power. Even though this problem is at the heart of daily electricity grid operation, the sinusoidal nature of electrical generation makes it difficult to solve [1–3]. One alternative is to linearly approximate the problem by assuming constant bus voltage and taking small angle approximations for the trigonometric terms of the voltage angle differences. This approach has limitations: the small angle approximations fail for heavily loaded networks, since the differences in voltage angles become large [4]. Nevertheless, this approximation is commonly used in industry [1]. The exact formulation of the problem is commonly referred to as ACOPF and the approximation as DCOPF.

Incorporating AC dynamics results in a non-convex problem due to nonlinearities in the power flow equations [2, 5], and the problem has been proven to be NP-hard [1, 3]. One approach is to apply convex relaxation techniques resulting in semi-definite programs [5]. There have been many attempts to apply machine intelligence to the problem. The work in [6] offers a comprehensive review of such approaches, which include applying evolutionary programming, evolutionary strategies, genetic algorithms, artificial neural networks, simulated annealing, fuzzy set theory, ant colony optimization, and particle swarm optimization. However, none of these approaches has been shown to work on networks larger than the 30-node IEEE test case [7–9].

Recently, motivated by the possibility of accurately generating large amounts of data, machine learning methods have been considered as solutions to this problem. More specifically, [10] proposes a system that uses multiple stepwise regression [11] to imitate the output of ACOPF. Each node gathers information from a subset of nodes, although not necessarily neighboring nodes, and multiple stepwise regression is then used to predict the optimal amount of power generated at each node, imitating the ACOPF solution. However, this solution is not local, since it uses information from nodes that are not adjacent in the network. The work in [12] uses a fully connected network (MLP) to imitate the output of ACOPF. Yet, MLPs are not local either, tend to overfit, and have trouble scaling up. Instead, exploiting the structure of the problem is necessary for a scalable solution. An exact, albeit costly, solution can be obtained by using interior point methods to solve the ACOPF problem, which, in practice, converge to the optimal solution, though not always and without guarantee of optimality [1, 3].

In this work, we use imitation learning and graph neural networks [13–15] to find a local and scalable solution to the OPF problem. More specifically, we adopt a parametrized GNN model, which is local and scalable by design, and we train it to imitate the optimal solution obtained using an interior point solver [16], which is centralized and does not converge for large networks. Once trained, the GNN offers an efficient computation of ACOPF. The paper is structured as follows. In Sec. 2 we describe the OPF problem and in Sec. 3 we introduce GNNs. In Sec. 4 we test the imitation learning framework on the IEEE-30 and IEEE-118 power system test cases [17], and in Sec. 5 we draw conclusions.

2. OPTIMAL POWER FLOW

Let G = (V, E, W) be a graph with a set of N nodes V, a set of edges E ⊆ V × V and an edge weight function W : E → R+. We use this graph to model the power grid, where the nodes are the buses and the edge weights W(i, j) = w_ij model the lines, which depend on the impedance z_ij (in ohms) between bus i and bus j. More specifically, we use the Gaussian kernel w_ij = exp(−k|z_ij|^2), where k is a scaling factor. We ignore links whose weight is less than a threshold ω, so that E = {(i, j) ∈ V × V : w_ij > ω}. Additionally, denote by W ∈ R^{N×N} the adjacency matrix of the network, with [W]_ij = w_ij if (i, j) ∈ E and 0 otherwise. Since the graph is undirected, the matrix W is symmetric.
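To make the construction above concrete, the following minimal sketch (our own illustration, not code from the paper; the function name build_adjacency, the default values of k and ω, and the convention of marking absent lines with np.inf are assumptions) builds the thresholded Gaussian-kernel adjacency matrix W from a table of line impedance magnitudes.

```python
import numpy as np

def build_adjacency(z_abs, k=1.0, omega=1e-3):
    """Weighted adjacency of the grid graph described above.

    z_abs : (N, N) array of impedance magnitudes |z_ij| in ohms,
            with np.inf where buses i and j are not connected (assumed convention).
    """
    W = np.exp(-k * np.asarray(z_abs, dtype=float) ** 2)  # w_ij = exp(-k |z_ij|^2)
    W[W <= omega] = 0.0                                    # keep only edges with w_ij > omega
    np.fill_diagonal(W, 0.0)                               # no self-loops
    return np.maximum(W, W.T)                              # symmetric: undirected grid graph
```

Any entry that survives the threshold corresponds to an edge in E, and the resulting matrix W is the one used by the graph convolutions in Sec. 3.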
To describe the state of the power grid, we assign to each node n a vector x_n = [v_n, δ_n, p_n, q_n] ∈ R^4, where v_n and δ_n are the voltage magnitude and angle, respectively, and where p_n and q_n are the active and reactive powers, respectively. We note that the state of each node can be accurately measured locally. We can collect these measurements across all nodes and denote them as v ∈ R^N, δ ∈ R^N, p ∈ R^N and q ∈ R^N. Thus, the collection of the states at all nodes becomes a matrix X = [v, δ, p, q] ∈ R^{N×4}.

The objective of this paper is to learn the optimal solution in a decentralized and scalable manner by exploiting a framework known as imitation learning. Denote by p* the optimal solution obtained by IPOPT, by Φ(X) a (likely, nonlinear) map of the data, and by L some given loss function. Then, in imitation learning we want to solve

    min_Φ  E[ L(p*, Φ(X)) ]        (6)
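As a rough illustration of (6) and of the per-node state introduced above (not the authors' implementation; the mean-squared-error choice of L and all names here are assumptions), the state matrix and an empirical version of the imitation loss could be assembled as follows.

```python
import numpy as np

def state_matrix(v, delta, p, q):
    """Stack the locally measured quantities into X, whose n-th row is x_n = [v_n, delta_n, p_n, q_n]."""
    return np.stack([v, delta, p, q], axis=1)  # shape (N, 4)

def empirical_imitation_loss(Phi, dataset):
    """Empirical counterpart of (6): average L(p*, Phi(X)) over training pairs (X, p*),
    here with L taken to be the mean squared error (an illustrative choice)."""
    losses = [np.mean((p_star - Phi(X)) ** 2) for X, p_star in dataset]
    return float(np.mean(losses))
```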
Fig. 1. Graph neural networks. Every node takes its data value X_{ℓ−1} and weighs it by H_{ℓ0} (first graph). Then, all the nodes exchange information with their one-hop neighbors to build WX_{ℓ−1}, and weigh the result by H_{ℓ1} (second graph). Next, they exchange their values of WX_{ℓ−1} again to build W^2 X_{ℓ−1} and weigh it by H_{ℓ2} (third graph). This procedure continues for K steps until all W^k X_{ℓ−1} H_{ℓk} have been computed for k = 0, . . . , K − 1, and added up to obtain the output of the graph convolution operation (10). Then, the nonlinearity σ_ℓ is applied to compute X_ℓ. To avoid cluttering, this operation is illustrated on only 5 nodes. In each case, the corresponding neighbors accessed by successive relays of information are indicated by the colored disks.
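The caption above spells out the local computation step by step; the short numpy sketch below (our own, assuming dense matrices and a tanh nonlinearity) mirrors it: repeated one-hop exchanges build W^k X_{ℓ−1}, each term is weighed by its own coefficient matrix, the terms are summed as in the graph convolution, and a pointwise nonlinearity produces the next layer, cf. (11) below.

```python
import numpy as np

def graph_convolution(W, X, H):
    """sum_{k=0}^{K-1} W^k X H_k, with H a list of K coefficient matrices H[k] of shape (F_in, F_out)."""
    Z = X.copy()            # k = 0 term: W^0 X = X
    Y = Z @ H[0]
    for k in range(1, len(H)):
        Z = W @ Z           # one more hop of local exchanges: Z = W^k X
        Y = Y + Z @ H[k]
    return Y

def gnn_layer(W, X_prev, H_layer, sigma=np.tanh):
    """One GNN layer: X_l = sigma( sum_k W^k X_{l-1} H_{lk} )."""
    return sigma(graph_convolution(W, X_prev, H_layer))
```

Stacking L such layers and taking the state of the last layer as the output gives the map Φ(X; H, W) that is trained to imitate p*.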
The graph convolution takes an input graph signal X ∈ R^{N×F} with F features per node and produces

    Y = sum_{k=0}^{K−1} W^k X H_k,        (10)

with H_k ∈ R^{F×G} the matrix of coefficients for k = 0, . . . , K − 1. The output Y ∈ R^{N×G} is another graph signal with G features on each node. As we analyzed in (9), the operation WX computes a linear combination of neighboring values. Likewise, repeated application of W computes a linear combination of values located farther away, i.e., W^k X collects the feature values at nodes in the k-hop neighborhood. The value of W^k X = W(W^{k−1} X) can be computed locally by k repeated exchanges with the one-hop neighborhood. We note that multiplication by H_k on the right does not affect the locality of the graph convolution (10), since H_k acts by mixing the features local to each node. That is, it takes the F input features and mixes them linearly to obtain G new features. To draw further analogies with filtering, we observe that the output Y of the graph convolution (10) is the result of applying a bank of FG linear shift-invariant graph filters [27].

A graph neural network (GNN) is a nonlinear map Φ(X; H, W) that is applied to the input X and takes into account the underlying graph W. It consists of a cascade of L layers, each of them applying a graph convolution (10) followed by a pointwise nonlinearity σ_ℓ (see Fig. 1 for an illustration),

    X_ℓ = σ_ℓ [ sum_{k=0}^{K_ℓ−1} W^k X_{ℓ−1} H_{ℓk} ]        (11)

for ℓ = 1, . . . , L, where X_0 = X is the input signal and H_{ℓk} ∈ R^{F_{ℓ−1}×F_ℓ} are the coefficients of the graph convolution (10) at layer ℓ [13–15]. The state X_ℓ ∈ R^{N×F_ℓ} at each layer ℓ is a graph signal with F_ℓ features, and we consider the state at the last layer X_L to be the output of the GNN, Φ(X; H, W) = X_L.

The computation of the state X_ℓ in each of the ℓ layers can be carried out entirely in a local fashion, by means of repeated exchanges with one-hop neighbors. Also, the number of filter taps in H_{ℓk} is F_{ℓ−1}F_ℓ, independent of the size of the network, and thus the GNN (11) is a scalable architecture [15]. This justifies the consideration of the GNN (11) as the model of choice in (7), Φ(X; H) = Φ(X; H, W), with parameters H = {H_{ℓk}, k = 0, . . . , K_ℓ−1, ℓ = 1, . . . , L} totaling |H| = sum_{ℓ=1}^{L} F_{ℓ−1} F_ℓ K_ℓ, independent of the size N of the network. Furthermore, GNNs exhibit the properties of permutation equivariance and stability to graph perturbations [21]. The first one allows the GNN to learn from fewer datapoints by exploiting the topological symmetries of the network, while the second one allows the network to have good performance when used on networks different from the one it was trained on, as long as these networks are similar.

In summary, in this paper we propose using the GNN (11) as a local and scalable model Φ(X; H, W) that we use to imitate the optimal solution p*. We find the best model parameters H by solving (7) over a given dataset T.

4. NUMERICAL EXPERIMENTS

We construct two datasets based on the IEEE-30 and IEEE-118 power system test cases [17]. Each dataset sample consists of a given load, where p^L ∈ R^N and q^L ∈ R^N are the load components at each node, a given sub-optimal actual state of the power grid X, and the optimal power generated at each node, p* (computed by means of IPOPT). The total power at each node is the difference between the generated power, p^G ∈ R^N and q^G ∈ R^N, and the load at that node, p^L, q^L. Naturally, only generators are capable of generating power, while all other buses just consume their power load. These quantities are related to the OPF equations (3) and (4) as follows:

    p_m = [p^G]_m − [p^L]_m,    m ∈ V_G        (12)
    q_m = [q^G]_m − [q^L]_m,    m ∈ V_G        (13)
    p_n = −[p^L]_n,             n ∈ V \ V_G    (14)
    q_n = −[q^L]_n,             n ∈ V \ V_G    (15)

The test cases provide a reference load, p^L_ref and q^L_ref [17]. Each given load is obtained synthetically as a sample from a uniform distribution around the reference, following the same methodology used in [12]:

    p^L ∼ Uniform(0.9 p^L_ref, 1.1 p^L_ref)        (16)
    q^L ∼ Uniform(0.9 q^L_ref, 1.1 q^L_ref)        (17)
Table 1: The RMSE for each architecture and dataset.
6. REFERENCES

[1] M. B. Cain, R. P. O'Neill, and A. Castillo, "History of optimal power flow and formulations," Federal Energy Regulatory Commission, Increasing Efficiency through Improved Software, pp. 1–31, Dec. 2012.

[2] B. C. Lesieutre and I. A. Hiskens, "Convexity of the set of feasible injections and revenue adequacy in FTR markets," IEEE Trans. Power Syst., vol. 20, no. 4, pp. 1790–1798, Nov. 2005.

[3] D. Bienstock and A. Verma, "Strong NP-hardness of AC power flows feasibility," Operations Research Letters, vol. 47, no. 6, pp. 494–501, Nov. 2019.

[4] S. Chatzivasileiadis, "Lecture notes on optimal power flow OPF," arXiv:1811.00943v1 [cs.SY], 2 Nov. 2018.

[5] D. K. Molzahn and I. A. Hiskens, "Convex relaxations of optimal power flow problems: An illustrative example," IEEE Trans. Circuits Syst. I, vol. 63, no. 5, pp. 650–660, May 2016.

[6] M. R. AlRashidi and M. E. El-Hawary, "Applications of computational intelligence techniques for solving the revived optimal power flow problem," Electric Power Syst. Research, vol. 79, no. 4, pp. 694–702, Apr. 2009.

[7] A. G. Bakirtzis, P. N. Biskas, C. E. Zoumas, and V. Petridis, "Optimal power flow by enhanced genetic algorithm," IEEE Trans. Power Syst., vol. 17, no. 2, pp. 229–236, May 2002.

[8] M. Todorovski and D. Rajicic, "A power flow method suitable for solving OPF problems using genetic algorithms," in The IEEE Region 8 EUROCON 2003, Ljubljana, Slovenia, 22–24 Sep. 2003, pp. 215–219, IEEE.

[9] C.-R. Wang, H.-J. Yan, Z.-Q. Huang, J.-W. Zhang, and C.-J. Sun, "A modified particle swarm optimization algorithm and its application in optimal power flow problem," in 4th Int. Conf. Machine Learning Cybernetics, Guangzhou, China, 7 Nov. 2005, pp. 2885–2889, IEEE.

[10] R. Dobbe, O. Sondermeijer, D. Fridovich-Keil, D. Arnold, D. Callaway, and C. Tomlin, "Towards distributed energy services: Decentralizing optimal power flow with machine learning," arXiv:1806.06790v3, 13 Aug. 2019.

[11] O. Sondermeijer, R. Dobbe, D. Arnold, C. Tomlin, and T. Keviczky, "Regression-based inverter control for decentralized optimal power flow and voltage regulation," arXiv:1902.08594v1 [cs.SY], 20 Feb. 2019.

[12] N. Guha, Z. Wang, and A. Majumdar, "Machine learning for AC optimal power flow," in 36th Int. Conf. Mach. Learning, Long Beach, CA, 9–15 June 2019.

[13] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun, "Spectral networks and deep locally connected networks on graphs," in 2nd Int. Conf. Learning Representations, Banff, AB, 14–16 Apr. 2014, pp. 1–14.

[14] M. Defferrard, X. Bresson, and P. Vandergheynst, "Convolutional neural networks on graphs with fast localized spectral filtering," in 30th Conf. Neural Inform. Process. Syst., Barcelona, Spain, 5–10 Dec. 2016, pp. 3844–3858, Neural Inform. Process. Foundation.

[15] F. Gama, A. G. Marques, G. Leus, and A. Ribeiro, "Convolutional neural network architectures for signals supported on graphs," IEEE Trans. Signal Process., vol. 67, no. 4, pp. 1034–1049, Feb. 2019.

[16] L. Thurner, A. Scheidler, F. Schäfer, J. Menke, J. Dollichon, F. Meier, S. Meinecke, and M. Braun, "Pandapower: An open-source Python tool for convenient modeling, analysis, and optimization of electric power systems," IEEE Trans. Power Syst., vol. 33, no. 6, pp. 6510–6521, Nov. 2018.

[17] R. D. Zimmerman, C. E. Murillo-Sánchez, and R. J. Thomas, "MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education," IEEE Trans. Power Syst., vol. 26, no. 1, pp. 12–19, Feb. 2011.

[18] A. Castillo and R. P. O'Neill, "Computational performance of solution techniques applied to the ACOPF," Federal Energy Regulatory Commission, Increasing Efficiency through Improved Software, pp. 1–34, Feb. 2013.

[19] A. Wächter and L. T. Biegler, "On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming," Mathematical Programming, vol. 106, no. 1, pp. 25–57, Mar. 2006.

[20] G. Lan and Z. Zhou, "Algorithms for stochastic optimization with functional or expectation constraints," arXiv:1604.03887v7 [math.OC], 8 Aug. 2019.

[21] F. Gama, J. Bruna, and A. Ribeiro, "Stability properties of graph neural networks," arXiv:1905.04497v2 [cs.LG], 4 Sep. 2019.

[22] M. Eisen and A. Ribeiro, "Optimal wireless resource allocation with random edge graph neural networks," arXiv:1909.01865v2 [eess.SP], 3 Oct. 2019.

[23] E. Tolstaya, F. Gama, J. Paulos, G. Pappas, V. Kumar, and A. Ribeiro, "Learning decentralized controllers for robot swarms with graph neural networks," in Conf. Robot Learning 2019, Osaka, Japan, 30 Oct.–1 Nov. 2019, Int. Found. Robotics Res.

[24] A. Sandryhaila and J. M. F. Moura, "Discrete signal processing on graphs," IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, Apr. 2013.

[25] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst, "The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains," IEEE Signal Process. Mag., vol. 30, no. 3, pp. 83–98, May 2013.

[26] A. Ortega, P. Frossard, J. Kovačević, J. M. F. Moura, and P. Vandergheynst, "Graph signal processing: Overview, challenges and applications," Proc. IEEE, vol. 106, no. 5, pp. 808–828, May 2018.

[27] S. Segarra, A. G. Marques, and A. Ribeiro, "Optimal graph-filter design and applications to distributed linear network operators," IEEE Trans. Signal Process., vol. 65, no. 15, pp. 4117–4131, Aug. 2017.