00 Anderson 2017 Very Good Idea For Profile Comparison
Article info

Article history:
Received 16 October 2015
Received in revised form 19 April 2016
Accepted 15 June 2016
Available online 1 July 2016

Keywords:
Census
Smart meter
Transactional data
Big data
Households

Abstract

This paper assesses the feasibility of determining key household characteristics based on temporal load profiles of household electricity demand. It is known that household characteristics, behaviours and routines drive a number of features of household electricity loads in ways which are currently not fully understood. The roll out of domestic smart meters in the UK and elsewhere could enable better understanding through the collection of high temporal resolution electricity monitoring data at the household level. Such data affords tremendous potential to invert the established relationship between household characteristics and temporal load profiles. Rather than use household characteristics as a predictor of loads, observed electricity load profiles, or indicators based on them, could instead be used to impute household characteristics. These micro level imputed characteristics could then be aggregated at the small area level to produce ‘census-like’ small area indicators. This work briefly reviews the nature of current and future census taking in the UK before outlining the household characteristics that are to be found in the UK census and which are also known to influence electricity load profiles. It then presents descriptive analysis of a large scale smart meter-like dataset of half-hourly domestic electricity consumption before reviewing the correlation between household attributes and electricity load profiles. The paper then reports the results of multilevel model-based analysis of these relationships. The work concludes that a number of household characteristics of the kind to be found in UK census-derived small area statistics may be predicted from particular load profile indicators. A discussion of the steps required to test and validate this approach and the wider implications for census taking is also provided.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
⁎ Corresponding author.
E-mail addresses: [email protected] (B. Anderson), [email protected] (S. Lin), [email protected] (A. Newing), [email protected] (A. Bahaj), [email protected] (P. James).
http://dx.doi.org/10.1016/j.compenvurbsys.2016.06.003
0198-9715/© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Energy monitoring for a ‘Smart Census’

Area based population statistics in the United Kingdom (UK) have historically been derived from the decadal census of housing and population. In addition to basic demographic statistics, the socio-economic information collected is used to produce robust small area estimates of a range of characteristics for every neighbourhood. Representing ‘a definitive snapshot of the nation’ (Calder & Teague, 2013) this data provides a backbone for commercial, academic and social research as well as policy analysis, a decadal ‘re-grouping’ and ‘re-basing’ of all small area population projections statistics (Norman, 2013) and, crucially, national and local resource allocation (Eurostat, 2011; Norman, 2013). Nonetheless, the UK census has also faced criticism as a costly and frequently outdated source of population statistics, with a time lag of at least two years between data collection and reporting (Dugmore, Furness, Leventhal, & Moy, 2011b).

Currently considered approaches for the future provision of population statistics include decennial census-taking, more frequent social surveys or administrative (Government held) data linkage and aggregation (ONS, 2013). In contrast, this work explores the possibility of deriving small area estimates of traditional socio-economic indicators from ‘digital trace’ or transactional data collected by utility (or other) services as part of normal service provision. As a number of recent authors have noted, large-scale geo-coded transactional datasets, such as those collected in the retail, telecommunications, finance and utilities sectors, could offer opportunities to supplement census based small area statistics by supporting the delivery of area-based population statistics, and generating novel indicators at a neighbourhood level (Deville et al., 2014; Dugmore et al., 2011b; Struijs, Braaksma, & Daas, 2014). For the United Kingdom Statistics Authority, via its executive office the Office for National Statistics (ONS) in England and Wales, the use of commercial data to support census taking may therefore help address census users' requests for more frequent and timely reporting of census-type statistics in the intercensal periods.
B. Anderson et al. / Computers, Environment and Urban Systems 63 (2017) 58–67 59
Recent related work suggests that commercial ‘big data’ could both support near real time census taking and also provide unique insights into household or individual behaviours (Carroll, Lyons, & Denny, 2014; Claxton, Reades, & Anderson, 2012; Deville et al., 2014; Douglass, Meyer, Ram, Rideout, & Song, 2015; Dugmore et al., 2011b; Pucci, Manfredini, & Tagliolato, 2015). In this work we consider household level data held by a range of utility companies before focusing in particular on smart meter derived electricity consumption data. Compared to a number of other forms of potentially useful ‘big data’, a grid-connected electricity supply is almost universally available in the UK, almost universally connected to domestic dwellings and metering of consumption is mandatory. Furthermore the planned universal roll-out of electricity smart meters collecting at least half-hourly consumption data (DECC, 2013) means that consideration of the value of suitably anonymised and aggregated smart meter data in the production of official statistics is now timely.

The use of this kind of data for market segmentation and other electricity related services has been noted in the literature (McKenna, Richardson, & Thomson, 2012) and was noted by Dugmore et al. (2011b) in the context of future census data collection. However, as far as we are aware only one published study has investigated its potential in the development of official and/or small area statistics (Carroll et al., 2014). A growing literature suggests that household level electricity load data, collected via smart metering, could provide considerable opportunities to infer household characteristics (Beckel, Sadamori, & Santini, 2012; Newing et al., 2015; Struijs et al., 2014). The link between household characteristics and household energy consumption is long established and the literature recognises that household characteristics will give rise to different load profiles and subsequent demand on the electricity supply network (e.g. see McLoughlin, Duffy, & Conlon, 2013 for a summary). Consequently, the energy sector uses household or area based indicators of household composition and characteristics to predict electricity ‘demand’ in order to manage networks and target interventions designed to reduce or time-shift peak loads (e.g. see Elexon, 2013; Hamidi, Li, & Robinson, 2009; Wright & Firth, 2007).

The purpose of this work is to explore the value of inverting this approach to assess the feasibility of using observed high temporal resolution electricity consumption data to infer household characteristics as a first step in the aggregation of household characteristics to form ‘normal’ area level population statistics. It should be emphasised therefore that the overall objective is not to characterise or ‘profile’ individual households; rather we seek to aggregate inferred household characteristics to develop area based ‘neighbourhood’ indicators similar to or in combination with Census estimates or other appropriate datasets.

This work briefly reviews the future provision of area based statistics in the UK, recognising the opportunities to enhance or supplement the census taking process with digital trace data. It then considers the extent to which digital trace data from the commercial sector could represent a novel tool to generate census type small-area statistics, before focusing on the use of high resolution electricity consumption monitoring data collected via smart metering. Based on preliminary analyses of a ‘smart meter-like’ dataset the research highlights the potential value of the approach and then discusses significant challenges and concludes by setting out a research programme which could systematically test the value of the approach.

2. Future provision of area based population statistics in the UK

As a consistent and robust source of small area population statistics, the United Kingdom census is used to allocate billions of pounds of government and commercial investment at the local level. It represents a fundamental tool for market research, policy making, commercial decision making, resource allocation and for academic research (ONS, 2013; Watson, 2009). Estimates of population counts by age and sex are a key census output, yet the detailed attribute information related to households and their usual residents – determining characteristics such as ethnic composition, education, socio-economic status, religion and employment – offers greatest value to the academic and commercial sector.

Census data are not made available at the individual household level but are published as non-disclosive aggregated counts within a hierarchy of ‘output zones’ or areas. These are built from unit postcodes, designed for the release of aggregate population statistics and represent small areas ranging from Output Areas (OAs – typically containing around 125 households) through to local authority districts (LADs) or Unitary Authorities (UAs). The former represents an important analytical unit for resource allocation and policy making at the local level, especially within the commercial sector (Dugmore, 2013; ONS, 2014a). It is this combination of universal geographic coverage at the small area level coupled with detailed attribute data that represents a major strength of the census (House of Commons Treasury Committee, 2008).

However, inevitably increasing costs, difficulties of ensuring full response, concerns over the decadal reporting cycle and the two year time-lag between census-taking and the delivery of initial outputs have given rise to a search for alternatives (Dugmore et al., 2011b). This work has been conducted by the ONS ‘Beyond 2011’ programme (ONS, 2014a) and, together with subsequent reviews of international census taking practice (see for example Dugmore, Furness, Leventhal, and Moy (2011a); and Martin (2006)), has highlighted a variety of approaches to collecting area based statistics including the use of governmental administrative sources (e.g. Netherlands and Denmark) or a rolling census (France). However the work also showed that a number of options under consideration by ‘Beyond 2011’, particularly those driven by administrative data, were unable to provide the level of socio-economic attribute data that many census users rely upon for commercial analysis, policy making and resource allocation (Calder & Teague, 2013; ONS, 2014a). Additionally, concerns have been raised over the likely success and practicalities of a census based on an administrative or register based system given the lack of a population register within the UK (Skinner, Hollis, & Murphy, 2013).

Based on the recommendations of the Beyond 2011 programme (ONS, 2014a), on extensive user consultation (ONS, 2014b) and an independent review (Skinner et al., 2013), the UK Statistics Authority recommended to parliament that a ‘traditional’ decadal census should be carried out in 2021 (Dilnot, 2014). They also noted that this should be primarily carried out online and that the considerable potential of utilising administrative data and larger scale household surveys as a supplement to census based statistics should be developed further (Dilnot, 2014).

Whilst recognising that data held by commercial organisations may offer more cost effective or timely reporting (Dugmore et al., 2011b), this avenue has received far less attention and discussion has tended to refer only to ‘customer information’ recorded in customer service databases and/or retail transaction data. As far as we are aware, commercial data does not currently feature within the national statistical census taking or population statistics of any nation. As Struijs et al. (2014) note, such data could be used to provide substantial additional data over and above basic address listings.

3. Smart meters for a Smart Census

The nascent roll-out of domestic electricity smart meters in a number of major markets including the US, China, Brazil, India and Japan (Deloitte, 2011) and the UK (DECC, 2012) provides an opportunity for the exploration of precisely the scenario described above.

In the UK, smart meters incorporate communication infrastructure allowing them to transmit near real-time energy usage data to in-home display units (IHDs), to energy demand service operators selected by the customer and to a centralised data retrieval service to extract half-hourly data from all smart meters for use by energy suppliers (billing and fraud prevention); network operators (network management) or other authorised third parties (e.g. switching agencies). Unlike other household level transactional data sources, the universal coverage and

Table 1
Irish CER Smart Meter Trial Household samples.

Table 2
Descriptive statistics for mid-week electricity consumption in kWh for the household sample (mid-week days) – October 2009.
Lomas, Wright, & Wall, 2008; Owen, 2012; Wright, 2008). Temporal periods where the greatest inter-household variability in load profiles may be evident, such as the evening peak period, could offer greatest value in identifying household characteristics, on the assumption that it is differences in those characteristics which are likely to drive differences in household behaviour and routines, and thus loads at these times of the day. Recent findings (Newing et al., 2015) suggest that the load profiles for our study households exhibit a number of key features which may assist in differentiating between households based on their characteristics.

4.2. Linked household survey data

The CER electricity consumption dataset is linked to survey data that incorporates a number of household characteristics of potential interest in the production of small area population statistics. These include householder and dwelling characteristics directly comparable with existing area based population statistics (such as dwelling type, number of residents and employment status) alongside indicators not currently part of small area data collection but of considerable relevance to policy makers (such as income), as Table 4 makes clear.

For policy applications, dwelling size (and in particular the number of bedrooms) is an important indicator of household level overcrowding, used primarily by local authorities to tackle housing issues. Following the 2011 census, this information has been reported as an ‘occupancy rating’ (relating the number of bedrooms to the number of usually resident occupants) and notions of household overcrowding and under-occupancy have also become policy relevant in the wake of the welfare reforms in the UK, whereby available benefits are cut if claimants have a spare bedroom within a council or housing association provided home (Ramsden, 2014). The existing literature provides evidence that dwelling floor area is linked to electricity consumption (Table 4) and we assess the extent to which, based on our sample, load profiles can be used to infer household floor area. This could in turn be used to estimate the number of bedrooms in order to generate a more policy-relevant indicator.

Household income does not form part of small area population statistics, yet represents an indicator of considerable value to policy makers and the commercial sector, with its link to electricity loads well established (Table 4). In spite of frequent calls for its inclusion, plans to collect this information within the 2011 UK census were dropped amidst concerns of under-response driven by the perceived intrusion posed by an income question. Income is an important indicator of economic well-being and the lack of information on income available via population statistics is frequently cited as a weakness (See e.g. Dugmore et al., 2011b). The CER survey recorded household response person's self-reported net annual income via income bands.

Employment status is reported via population statistics in relation to economic activity, and forms an important tool at the local, regional and national level for policy making and intervention, enabling household classification and acting as a predictor of behaviours and routines. The literature clearly identifies that employment status impacts upon the timing of electricity loads (Table 4). Household response person (HRP) employment status formed part of the CER survey with over 59% of study HRPs in employment (incorporating full time, part time, freelance and self-employment), almost 30% were retired, with the remaining 11% representing HRPs not in active employment through unemployment, study or full time care duties. The latter categories have been combined with retired households for subsequent analysis, giving two groups; ‘Employed’ and ‘Not in active employment’. Householder employment status should, however, be treated with some caution. Self-reported employment status must be treated as an indicator only as response categories provided by the survey did not account for the full range of nuanced employment patterns that may exist, such as homeworking and flexible working arrangements which would impact considerably on behaviours, routines and domestic electricity loads.

Table 3
Descriptive statistics for half hourly mid-week electricity consumption in kWh for the household sample (mid-week days) by number of residents – October 2009.
N households N half-hours Mean total consumption per household Mean (half-hours) SD Median Skew Kurtosis

Fig. 1. Mean half-hourly electricity consumption per half hour (Tuesday – Thursday) by self-reported employment status of household response person. Source: Authors' calculation using Irish CER Smart Meter Trial data October 2009 (n = 3488).

It would, however, be an oversimplification to suggest that characteristics such as these could be predicted solely on the basis of household electricity consumption. The literature clearly identifies that load profiles are also a function of the number of household residents, and the household composition, the latter referring to the age structure and presence of children which may drive routines associated with education, for example (Druckman & Jackson, 2008; Firth et al., 2008; Owen, 2012; Wright, 2008; Zimmerman et al., 2012). The survey dataset collected information on household composition, noting the presence of children and presence of seniors, plus a count of the number of household residents. Information of this nature is commonly collected via the Census, household social surveys and a range of administrative datasets. ONS recommendations to parliament following the ‘Beyond2011’ programme noted the important role of administrative data as a future source of information on household composition with potential to provide population counts and basic household composition at the small-area or address level (ONS, 2013, 2014a). Thus within this analysis we do not attempt to predict these characteristics; rather they represent predictors that we assume would be available at the small area level from administrative data. In our analysis, we therefore
incorporate basic household composition alongside energy monitoring data to infer additional household characteristics of interest (income, floor area and employment status). The following section outlines a series of indicators that can be used to summarise electricity loads for use in subsequent analysis.

5. Profile indicators and household characteristics

Smart-meter like datasets such as the CER study present a number of challenges related to data storage, manipulation and analysis (Graham & Shelton, 2013). Manipulating and processing smart-meter derived datasets often requires specialist high performance computing equipment and tools (See e.g. Thumim, Wilcox, & Roberts, 2013) or aggregation and summary of time series data prior to analysis (Carroll et al., 2014; McLoughlin, 2013). Even for just 3488 households over a four-week period, the thirty minute resolution measurement generated over 4.6 million records. To simplify the analysis we used a series of parameters or ‘profile indicators’ to summarise some of the temporal and magnitudinal features of household load profiles, facilitating comparison between households whilst also reducing the volume of data to be processed.

The literature provides a number of examples of indicators derived from load profiles as listed in Table 5. These indicators consider characteristics including load magnitude (base load, peak load), summary statistics (e.g. mean load), temporal properties such as the timing and duration of key features (e.g. time of use [max]) and ratios of, for example, peak to off-peak loads. Thus profile indicators provided a series of summary measures for each household whilst also helping preserve household privacy and removing redundant data. The process also considerably smoothed data on a household-by-household basis, reducing the impact of very rare or atypical high load events (Williams, 2013). Nevertheless, the literature suggests that profile indicators maintain the ability to differentiate between households based on key features of their loads, such as the magnitude or timing of their peak load (McLoughlin, 2013). The use of profile indicators could thus offer considerable advantages if this form of analysis were up-scaled to incorporate far larger samples of households and time series of the order of months rather than weeks, with a commensurate increase in the volume of data to be stored, manipulated and handled.

As noted above we calculated the profile indicators listed in Table 5 over the midweek day (Tuesday – Thursday) periods based on the assumption that habits and routines associated with employment or study, which could reveal important household characteristics, will be more evident on weekdays. We have excluded Mondays and Fridays as these represent transition points with the weekend and households may thus exhibit atypical weekday behaviours.

Since all indicators summarise characteristics of the same load profiles, there may be a tendency for indicators to be strongly associated with each other, especially where they represent similar measures of magnitude. The ‘Morning Maximum’, ‘Total Power Consumed’ and ‘97.5th percentile load’ are likely to be strongly correlated and therefore care is used when applying these indicators in subsequent analysis, ensuring that highly correlated indicators are not incorporated together within regressions or classifications. However, no indicators have been discounted as both the literature and prior exploratory analyses suggest that these indicators may reveal different household

Table 4
Selected household characteristics collected or potentially collected by the Census together with evidence of their relationship to load profiles.

Category | Census 2011 household level variables* | Existing evidence for links to load profiles
Household | Number of persons | (Beckel et al., 2013)
Household | Presence of person with limiting long term illness |
Household | Number of children | (Yohanis, Mondol, Wright, & Norton, 2008)
Household | Age distributions of all persons |
Dwelling | Household dwelling type | (Firth et al., 2008; McLoughlin et al., 2012)
Dwelling | Household tenure | (Druckman & Jackson, 2008)
Dwelling | Number of (bed)rooms | dwelling floor area as a proxy
Dwelling | Number of cars/vans |
Dwelling | Presence of and fuel used for heating | (McLoughlin et al., 2013)
Householder | Ethnic group/country of birth of HRP/main language |
Householder | Age of HRP | (McLoughlin et al., 2013)
Householder | NS-SEC of household reference person (HRP) | (Druckman & Jackson, 2008; Hughes & Moreno, 2013; McLoughlin et al., 2013)
Householder | Economic activity of HRP/h worked | (Yohanis et al., 2008; McLoughlin et al., 2013)
Householder | HRP Education level |
Householder | Marital Status |
Other | Dwelling floor area | (Beckel et al., 2013; Craig, Gary Polhill, Dent, Galan-Diaz, & Heslop, 2014; McLoughlin et al., 2013)
Other | Household Income | (Beckel et al., 2013; Craig et al., 2014; McLoughlin et al., 2013)
Other | Daily consumption profile shape | (Haben, Ward, Greetham, Singleton, & Grindrod, 2014)

* ONS. “Census 2001: Definitions.” London, 2004
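The caution about strongly associated indicators can be illustrated with a simple pairwise screen. The sketch below computes Pearson correlations between indicator columns and lists the pairs whose absolute correlation exceeds a cut-off. The 0.8 threshold, the function names and the toy values are assumptions made for illustration only; the paper does not report the threshold or procedure it used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined for a constant column; treated as 0 here.
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlated_pairs(indicators, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| above the threshold.

    `indicators` maps an indicator name to one value per household.
    The default threshold is an illustrative assumption.
    """
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(indicators.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Toy values for four hypothetical households (not data from the CER trial).
toy_indicators = {
    "morning_max": [1.0, 2.0, 3.0, 4.0],
    "total_consumed": [10.0, 19.0, 31.0, 42.0],
    "load_factor": [0.5, 0.2, 0.6, 0.3],
}
flagged = correlated_pairs(toy_indicators)
```

Pairs flagged in this way would be candidates for entering a regression or classification separately rather than together, in line with the approach described in the text.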
Table 5
Parameters or ‘profile indicators’ to describe magnitude and temporal characteristics of load profiles.

Base load | Mean load 2 am–5 am | (Yohanis et al., 2008) | Number of residents, size of dwelling
97.5th percentile load | 97.5th percentile of ranked load – used rather than peak load which often represents an extreme peak value, driven by very short-term use of high power equipment | (Price, 2010) | Income, employment status
Load factor | Ratio of mean daily load to maximum daily load | (Carroll et al., 2014; McLoughlin et al., 2012) | Employment status, presence of children
Lunchtime load | Mean load between midday and 2 pm | (Chicco, Napoli, Postolache, Scutariu, & Toader, 2001) | Presence of seniors
Mean load | Mean load across all timestamps | (Beckel et al., 2012; Yohanis et al., 2008) | Number of residents
Morning maximum | Maximum load between 6 am and 10.30 am | (Carroll et al., 2014) | Presence of children
Evening consumption factor (ECF) | Mean load during the evening peak (4 pm–8 pm) relative to the mean load at all other times of the day | (Powells, Bulkeley, Bell, & Judson, 2014) | Employment status
Total power consumed | Total power consumption (kWh) during the study period | (McLoughlin, 2013) | Number of residents, income
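As a rough illustration of how the definitions in Table 5 translate into calculations, the sketch below derives the indicators for a single day of 48 half-hourly readings. The function name, the slot indexing (slot 0 covers 00:00–00:30), the treatment of each reading as kWh per half hour and the nearest-rank percentile are all assumptions made for this example. It is not the procedure used in the paper, which summarised mid-week days across the whole study period.

```python
import math
from statistics import mean

SLOTS_PER_DAY = 48  # half-hourly readings; slot 0 covers 00:00-00:30


def profile_indicators(day):
    """Summarise one day of half-hourly consumption (list of 48 floats)."""
    if len(day) != SLOTS_PER_DAY:
        raise ValueError("expected 48 half-hourly readings")

    evening = day[32:40]               # 16:00-20:00
    rest_of_day = day[:32] + day[40:]  # all other half hours
    ranked = sorted(day)
    # Nearest-rank percentile: an assumed convention for this sketch.
    p975_index = math.ceil(0.975 * len(ranked)) - 1
    peak = max(day)
    rest_mean = mean(rest_of_day)

    return {
        "base_load": mean(day[4:10]),        # 02:00-05:00
        "lunchtime_load": mean(day[24:28]),  # 12:00-14:00
        "morning_max": max(day[12:21]),      # 06:00-10:30
        "mean_load": mean(day),
        "p97_5_load": ranked[p975_index],
        "total_consumed": sum(day),
        # Ratios are undefined for an all-zero day, so None is returned.
        "load_factor": mean(day) / peak if peak > 0 else None,
        "ecf": mean(evening) / rest_mean if rest_mean > 0 else None,
    }


# Toy day: flat 0.2 kWh per half hour with a 0.6 kWh evening peak.
toy_day = [0.2] * 32 + [0.6] * 8 + [0.2] * 8
indicators = profile_indicators(toy_day)
```

For the toy day the evening consumption factor is approximately 3, since the evening mean is three times the mean of the remaining half hours. Each household is reduced to a handful of numbers in place of its full time series, which is the data-reduction benefit the text describes.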
$u_{0i} \sim N(0, \sigma^{2}_{0u})$

Table 6
Coding of explanatory variables for multilevel models.

Explanatory variable | Coding scheme
Income | 6 income bands: <7,500 euro; <22,500; <40,000; <62,500; <92,500; >92,500
Self-reported employment status of HRP | 0 = in paid work, 1 = not in paid work (unemployed, retired or caring role)
Presence of children | 0 = no children, 1 = 1+ child
Number of residents | 0 = 1 or 2 residents of any age, 1 = 3+ residents

The second step was then to reverse the modelling process and test the ability of the load profile indicators to correctly predict household attributes. As noted above it was assumed that the number of residents and the number of children were already known through potentially available administrative data sources and the work reported here focuses only on the household response person's employment status as an exemplar.

A logistic regression approach was therefore used to estimate the probability that a Household Response Person (HRP) was not in paid work on the basis of number of residents, the number of children and the profile indicators selected as being most likely to be of value in Table 7 based on their ability to predict the HRP work status in the absence of other factors (ECF and LF). By applying a success threshold of 50%, an estimate of the percentage of correctly classified HRPs could then be calculated as a simple within-sample validation test.

The results of this initial model (model 1) are shown in Table 8 and they suggest that whilst the evening consumption factor and load factor
both had statistically significant predictive effects, the model was only able to correctly predict HRP unemployment status in around 65% of cases.

Table 7
Effectiveness of household characteristics in predicting electricity consumption ‘profile indicators’ (values in bold are significant at the 95% level).

Daily peak time | Daily peak 06:00 to 10.30 | Daily mean baseload (02:00–05:00) | Daily mean

 | Daily sum: beta | Z | Daily 97.5th percentile: beta | Z | Evening consumption factor: beta | Z | Load factor: beta | Z
Constant | −21.69 | −2.05 | −0.03 | −0.05 | 1.72 | 4.24 | 0.08 | 1.10
Number of residents | 11.64 | 6.69 | 0.83 | 6.77 | 0.06 | 0.90 | 0.00 | −0.09
Income band | 3.17 | 3.12 | 0.10 | 1.42 | −0.02 | −0.49 | 0.01 | 1.69
Number of children | 5.49 | 4.22 | 0.44 | 4.81 | 0.08 | 1.59 | 0.00 | 0.21
Employment status of HRP | 5.01 | 2.22 | 0.06 | 0.36 | −0.18 | −2.14 | 0.04 | 2.14
Marginal R2 | 20% | | 16% | | 1% | | 1% |
Conditional R2 | 81% | | 66% | | 36% | | 53% |
Residual R2 | 19% | | 34% | | 64% | | 47% |

Table 8
Logistic regression modelling results for HRP unemployment status (model 1).

 | Beta | t | p value
Number of residents < 3 | 0.42 | 2.30 | 0.02
Number of residents ≥ 3 | −0.70 | −7.90 | 0.00
Children present | −1.87 | −16.82 | 0.00
Evening consumption factor (ECF) | −0.15 | −2.59 | 0.01
Load factor (LF) | 1.93 | 3.38 | 0.00

In order to improve the performance and to test the relative value of the profile indicators and the ‘known’ demographic variables (number of residents and number of children), we estimated a series of increasingly more complex models. Thus base model 2.1 (see Table 9) included just the evening consumption factor (ECF) and the Load Factor (LF) as was the case for model 1 but did not include the number of residents or presence of children. Despite this the results suggest that nearly 60% of HRPs were correctly classified.

In an attempt to improve classification performance we drew on McLoughlin et al. (2013) to develop clusters of households with similar consumption profiles. Cluster membership was calculated via a weighted least squares and k-means clustering process using only the half-hour consumption profiles. This produced six clusters of households of which two captured the majority (33% and 28% respectively) with the remainder distributed roughly evenly across the remaining four. As Table 9 shows the inclusion of these clusters increased the performance of the model by just under 5 percentage points (model 2.2) with only membership of cluster 3 proving not to be a statistically significant predictor.

In order to improve the model still further (model 2.3) we then included an indicator of ‘habitual behaviour’ by calculating an autocorrelation coefficient for the 24 h lag of each half hourly consumption for each household on mid-week days after the hours of sleep (00:00–06:00) were removed to avoid artificially increasing the lag correlation. This coefficient is therefore an indicator of the degree to which mid-week consumption between 06:00 and 00:00 is replicated at the same time on subsequent days for each household and, based on exploratory analysis (not shown) we expected lower autocorrelation (less ‘regular habits’) for those not in paid work. In general as Fig. 2 shows the coefficients followed an expected 24 h profile with the highest being the immediately following half hours before a gentle decline and then rise to a higher correlation at the 24 h lag (i.e. the same time the next day) which in this case is represented by the 36th lag due to the removal of sleep hours. The model included the coefficients at lags 36 (the same time tomorrow) and 72 (the day after tomorrow); however, as Table 9 shows, the inclusion of this habituality indicator produced a marginal improvement in performance with only the lag 36 coefficient proving to be statistically significant.

Finally (model 2.4) we re-introduced the presence of children and household size variables to understand the additional value of this potentially administratively sourced data. As Table 9 shows, the increasingly complex models were significantly different from the simpler versions (LR test results) while the adjusted pseudo r-squared scores (McFadden) also increased as the additional variables were added reflecting increased improvement over the intercept model in each case. Re-adding the presence of children and dummy for larger households increased the pseudo r-squared score substantially and the classification success by 6.2 percentage points and, as might be expected, reduced the predictive power of the ECF as well as most cluster membership.

Overall, these results suggest that model 2.4 is the most comprehensive among the list of models tested, with significant predictors approximating household energy usages, energy usage behaviours and administrative variables. Our simplest model (model 2.1) indicates that although the absence of administrative data reduced the ability of electricity consumption profile indicators to predict HRP employment status, the success rate was still close to 60%. This, together with the relatively unchanging regression coefficients for the profile indicators in each model suggests that most of the differentiation captured by the profile clusters and all of that captured by the ‘habitual behaviour’ indicator may already be embodied in the profile indicators used in model 2.1.

7. Conclusions and next steps

This paper has started the process of assessing the feasibility of using household electricity load profiles as a tool to infer key household characteristics. Using a smart meter-like dataset, we generated a series of load profile indicators that summarise key features of household load profiles, enabling differentiation between households. These indicators, coupled with household composition, offered a degree of predictive potential for the characteristic tested and, when compositional data was excluded but other consumption indicators included, this potential
Correct prediction: 65.42%
was still substantial especially when membership of twenty four hour
B. Anderson et al. / Computers, Environment and Urban Systems 63 (2017) 58–67 65
Table 9
Logistic regression modelling results for HRP employment status.
Columns: Model 2.1 (base model); Model 2.2 (with cluster membership); Model 2.3 (with 24 h autocorrelation coefficient); Model 2.4 (with presence of children and 3+ persons).
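The model comparison reported for Table 9 rests on two quantities: a likelihood ratio (LR) test between nested models and the McFadden pseudo r-squared relative to the intercept-only model. The following is a minimal sketch of how these could be computed from model log-likelihoods, not the authors' code. All function names are hypothetical, the adjusted form shown (subtracting the number of parameters) is one common definition that the paper may or may not have used, and the toy outcomes and probabilities are invented purely for illustration.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Log-likelihood of binary outcomes y given predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def null_log_likelihood(y):
    """Log-likelihood of the intercept-only model (one constant probability)."""
    p0 = sum(y) / len(y)
    return bernoulli_log_likelihood(y, [p0] * len(y))


def mcfadden_r2(ll_model, ll_null, n_params=0):
    """McFadden pseudo r-squared; n_params > 0 gives the adjusted variant."""
    return 1.0 - (ll_model - n_params) / ll_null


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form start values, then the recurrence
    # Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / gamma(a + 1).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(ll_reduced, ll_full, extra_params):
    """LR statistic and p-value for a full model nested over a reduced one."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf(stat, extra_params)


# Toy illustration only: outcomes and fitted probabilities are made up.
y = [1, 1, 1, 0, 0, 1, 0, 1]
p_reduced = [0.6, 0.7, 0.6, 0.5, 0.4, 0.6, 0.5, 0.7]
p_full = [0.8, 0.8, 0.7, 0.3, 0.2, 0.7, 0.4, 0.9]

ll_null = null_log_likelihood(y)
ll_reduced = bernoulli_log_likelihood(y, p_reduced)
ll_full = bernoulli_log_likelihood(y, p_full)
print("pseudo r-squared (reduced):", mcfadden_r2(ll_reduced, ll_null))
print("pseudo r-squared (full):", mcfadden_r2(ll_full, ll_null))
print("LR test (statistic, p):", lr_test(ll_reduced, ll_full, extra_params=2))
```

In practice the log-likelihoods would come from the fitted logistic regression models rather than from hand-written probabilities; the sketch only shows how the reported comparison statistics relate to them.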
demand profile clusters was included. This suggests that electricity consumption data of this kind could be used to independently estimate the employment status of HRPs as a means of validating (or contributing to) Census estimates, and it could also be used to produce slightly less robust estimates in the absence of administrative data. This may be especially pertinent to mid-census period estimates or to situations where only smart meter data is available, which may well be the case for stakeholders who do not have access to Government held administrative data sources.

However, it must be recognised that the size of the sample used in this work has precluded the testing of the differential performance of the process in different sub-populations or in different regions. In both cases we would expect reduced heterogeneity and thus increased performance of the estimation process. Whilst it would in principle be possible to develop the work to test larger sub-groups of this sample, we concluded that the consequential reduction in statistical power would mean that such work requires a much larger representative population sample. Further, the lack of any geo-coding in the data also precludes the analysis of regional differences within the sample itself.

The use of load profile indicators may also be over-simplifying the nuanced detail within the load profiles. Given that the range of household types available within the dataset is relatively narrow, detailed temporal electricity consumption behaviour not captured by the profile indicators may be useful in order to discriminate between households. Whilst evidence from the literature suggests that profile indicators are frequently used to extract meaningful information from load profiles, it may be beneficial to work with more of the time-series data in order to ensure that the profile indicators do not mask habits which may prove useful in differentiating between households. Such an approach may provide opportunities to build regression models or generate clusters for different days of the week, recognising that load profiles may be very different on weekdays and at weekends, and that the difference between weekday and weekend profiles may, for example, allow inferences to be made about household characteristics whilst
Fig. 2. Mean and standard deviation of AR coefficient by lag for mid-week days in October 2009.
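The ‘habitual behaviour’ indicator summarised in Fig. 2 is a lag autocorrelation of half-hourly consumption with the hours of sleep (00:00–06:00) removed. The sketch below is one possible reading of that description, not the authors' implementation: it assumes 48 readings per day, consecutive days, and a Pearson correlation between each retained reading and the reading `lag` half-hours later. The names `lag_autocorrelation`, `SLOTS_PER_DAY` and `SLEEP_SLOTS` are hypothetical.

```python
import math

SLOTS_PER_DAY = 48          # assumed: half-hourly readings, 00:00 first
SLEEP_SLOTS = range(0, 12)  # 00:00-06:00, excluded from the correlation


def pearson(xs, ys):
    """Pearson correlation; NaN when either side has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def lag_autocorrelation(days, lag=SLOTS_PER_DAY):
    """Correlate waking-hour readings with those `lag` half-hours later.

    `days` is a list of consecutive days, each a list of 48 readings.
    A pair is used only if both readings fall outside the sleep slots.
    """
    if any(len(day) != SLOTS_PER_DAY for day in days):
        raise ValueError("each day must contain 48 half-hourly readings")
    series = [value for day in days for value in day]
    xs, ys = [], []
    for t in range(len(series) - lag):
        if (t % SLOTS_PER_DAY in SLEEP_SLOTS
                or (t + lag) % SLOTS_PER_DAY in SLEEP_SLOTS):
            continue
        xs.append(series[t])
        ys.append(series[t + lag])
    return pearson(xs, ys)


# Toy illustration only: an invented profile repeated on three days.
toy_day = [0.2] * 12 + [0.5 + 0.1 * i for i in range(36)]
print(lag_autocorrelation([toy_day, toy_day, toy_day]))
```

With the default `lag` of 48 half-hours the function compares each waking-hour reading with the same time on the following day; passing other values of `lag` would give a profile of coefficients by lag of the kind plotted in Fig. 2.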
differences between seasons and school vs non-school holiday periods may also be instructive.

Overall the analysis confirms existing literature suggesting that profile indicators are potentially useful summaries of key features of household electricity load profiles. The findings suggest that analytic approaches such as regression and classification could offer potential in inferring key small area household characteristics from load profile indicators and basic household composition. This approach could add considerable additional ‘value’ to domestic smart metering, enabling remote and quasi real-time estimation of small area population statistics.

However, this approach has the potential for impact beyond the delivery of enhanced population statistics. Unlike existing small area statistics or periodic household sample surveys, quasi real-time observation of energy use and behaviours could be used to both target and assess the impact of neighbourhood or household level (energy) policy or market interventions. Thus these datasets offer the potential to target area based (energy) policy and to remotely evaluate impacts in near-real time without the need for household sample surveys.

The next steps in this work must be to explore additional statistical methods to more robustly estimate attributes from consumption profiles and to test the performance of these models using not only out-of-sample validation of household level estimates but also by aggregating and validating against real Census data.

In the case of statistical methods there may be considerable scope to develop multi-level hierarchical models of the kind proposed for use in the estimation of historical climate characteristics from sparse proxy observations (Hughes & Ammann, 2009; Tingley et al., 2012). Such work should include the development of appropriate uncertainty measures for both point (household) and aggregated area level estimates. Future work should also consider the potential value of including area level and temporal co-variates to enable the function linking the consumption profiles to the household attributes to vary spatially and over time.

In terms of household level validation, it is possible that the Irish CER data set used in this paper may be sufficiently large to support out-of-sample validation, but doing so is likely to considerably increase uncertainty. It is likely that such validation will only become possible when substantially larger samples of suitably linked consumption and household attribute data become available. Such data would also support the kind of sub-population and sub-regional analysis discussed above.

Finally, area level validation of aggregated estimates would require access to anonymised large-scale smart meter data extracts from either all households or a representative sample of them in known small area geographies, which could be used as the basis for model-based estimation of household characteristics. These estimates could then be aggregated to current Census geographies and validated against recently observed Census-derived population statistics. Unfortunately, as far as we are aware, such large scale geo-coded datasets do not currently exist in the UK.

References

Beckel, C., Sadamori, L., & Santini, S. (2012). Towards automatic classification of private households using electricity consumption data. Proceedings of the Fourth ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Buildings (pp. 169–176). ACM (http://dl.acm.org/citation.cfm?id=2422562).
Beckel, C., Sadamori, L., & Santini, S. (2013). Automatic socio-economic classification of households using electricity consumption data. Proceedings of the Fourth International Conference on Future Energy Systems (pp. 75–86). ACM (http://dl.acm.org/citation.cfm?id=2487175).
Calder, A., & Teague, A. (2013). The Census and Future Provision of Population Statistics in England and Wales: Presentation Delivered at RGS “The Future of Small Area Population Statistics” 21st October 2013. Newport: Office for National Statistics.
Carroll, J., Lyons, S., & Denny, E. (2014). Reducing household electricity demand through smart metering: the role of improved information about energy saving. Energy Economics, 45, 234–243.
Chicco, G., Napoli, R., Postolache, P., Scutariu, M., & Toader, C. (2001). Electric Energy Customer Characterisation for Developing Dedicated Market Strategies. (In).
Claxton, R., Reades, J., & Anderson, B. (2012). On the value of digital traces for commercial strategy and public policy: telecommunications data as a case study. In S. Dutta, & B. Bilbao-Osorio (Eds.), The Global Information Technology Report 2012. Geneva: World Economic Forum.
Craig, T., Gary Polhill, J., Dent, I., Galan-Diaz, C., & Heslop, S. (2014, June). The north east scotland energy monitoring project: exploring relationships between household occupants and energy usage. Energy and Buildings, 75, 493–503. http://dx.doi.org/10.1016/j.enbuild.2014.02.038.
DECC (2012). Smart Metering Implementation Programme - Programme Update April 2012. London: Department of Energy and Climate Change.
DECC (2013). Smart Metering Equipment Technical Specifications Version 2. London: Department of Energy and Climate Change.
Deloitte (2011). Empowering Ideas 2011: A Look at Ten of the Emerging Issues in the Power and Utilities Sector. Cleveland, Ohio: Deloitte Center for Energy Solutions.
Deville, P., Linard, C., Martin, S., Gilbert, M., Stevens, F. R., Gaughan, A. E., ... Tatem, A. J. (2014). Dynamic population mapping using mobile phone data. Proceedings of the National Academy of Sciences, 111(45), 15888–15893.
Dilnot, A. (2014). The Census and Future Provision of Population Statistics in England and Wales. Letter to Rt. Hon. Francis Maude MP from the Chair of the UK Statistics Authority, Sir Andrew Dilnot CBE, 27th March 2014. London: UK Statistics Authority.
Douglass, R. W., Meyer, D. A., Ram, M., Rideout, D., & Song, D. (2015). High resolution population estimates from telecommunications data. EPJ Data Science, 4(1), 1–13.
Druckman, A., & Jackson, T. (2008). Household energy consumption in the UK: a highly geographically and socio-economically disaggregated model. Energy Policy, 36(8), 3177–3192.
Dugmore, K. (2013). The Census and Future Provision of Population Statistics in England and Wales - Public Consultation September 2013. Response by the Demographics User Group (DUG). London: Demographics User Group.
Dugmore, K., Furness, P., Leventhal, B., & Moy, C. (2011a). Beyond the 2011 census in the United Kingdom: with an international perspective. International Journal of Market Research, 53, 619. http://dx.doi.org/10.2501/ijmr-53-5-619-650.
Dugmore, K., Furness, P., Leventhal, B., & Moy, C. (2011b). Information collected by commercial companies: what might be of value to ONS? International Journal of Market Research, 53(5), 619–650.
Elexon (2013). Load profiles and their use in electricity settlement. (London) http://www.elexon.co.uk/wp-content/uploads/2013/11/load_profiles_v2.0_cgi.pdf
Energy UK (2013). About Smart Meters.
Eurostat (2011). EU Legislation on the 2011 Population and Housing Censuses: Explanatory Notes. European Commission (Eurostat): Luxembourg.
Firth, S., Lomas, K., Wright, A., & Wall, R. (2008). Identifying trends in the use of domestic appliances from household electricity consumption measurements. Energy and Buildings, 40(5), 926–936.
Graham, M., & Shelton, T. (2013, November). Geography and the future of big data, big data and the future of geography. Dialogues in Human Geography, 3, 255–261. http://dx.doi.org/10.1177/2043820613513121.
Haben, S., Ward, J., Greetham, D. V., Singleton, C., & Grindrod, P. (2014). A new error measure for forecasts of household-level, high resolution electrical energy consumption. International Journal of Forecasting, 30(2), 246–256. http://dx.doi.org/10.1016/j.ijforecast.2013.08.002.
Hamidi, V., Li, F., & Robinson, F. (2009). Demand response in the UK's domestic sector. Electric Power Systems Research, 79, 1722–1726. http://dx.doi.org/10.1016/j.epsr.2009.07.013.
House of Commons Treasury Committee (2008). Counting the Population: Eleventh Report of Session 2007-08. Volume 1, Report, Together with Formal Minutes, Oral and Written Evidence. London: The Stationery Office Limited.
Hughes, M. K., & Ammann, C. M. (2009). The future of the past—an earth system framework for high resolution paleoclimatology: editorial essay. Climatic Change, 94(3–4), 247–259.
Hughes, M., & Moreno, G. (2013). Further Analysis of Data from the Household Electricity Usage Study: Consumer Archetypes. Cambridge: Element Energy Ltd.
Martin, D. (2006). Last of the censuses? The future of small area population data. Transactions of the Institute of British Geographers, 31, 6–18.
McKenna, E., Richardson, I., & Thomson, M. (2012, February). Smart meter data: balancing consumer privacy concerns with legitimate applications. Energy Policy, 41, 807–814. http://dx.doi.org/10.1016/j.enpol.2011.11.049.
McLoughlin, F. (2013). Characterising Domestic Electricity Demand for Customer Load Profile Segmentation [THESIS]. Dublin: Dublin Institute of Technology.
McLoughlin, F., Duffy, A., & Conlon, M. (2012). Characterising domestic electricity consumption patterns by dwelling and occupant socio-economic variables: an Irish case study. Energy and Buildings (http://www.sciencedirect.com/science/article/pii/S0378778812000680).
McLoughlin, F., Duffy, A., & Conlon, M. (2013). Evaluation of time series techniques to characterise domestic electricity demand. Energy, 50(1), 120–130.
Newing, A., Anderson, B., Bahaj, A. B., & James, P. (2015, July). The role of digital trace data in supporting the collection of population statistics - the case for smart metered electricity consumption data. Population, Space and Place. http://dx.doi.org/10.1002/psp.1972 (EarlyView).
Ning, Z., & Kirschen, D. (2010). Preliminary Analysis of High Resolution Domestic Load Data. Manchester: School of Electrical & Electronic Engineering, University of Manchester.
Norman, P. (2013). The Case for Small Area Data. Presentation Delivered at the Beyond 2011 Research Conference, University of Southampton, 30th April-1st May 2013. Leeds: University of Leeds.
ONS (2013). Beyond 2011: Options Report. (London).
ONS (2014a). Beyond 2011: Final Options Report. (London).
ONS (2014b). The Census and Future Provision of Population Statistics in England and Wales: Report on the Public Consultation. Newport: Office for National Statistics.
Owen, P. (2012). Powering the Nation: Household Electricity Habits Revealed. London: Energy Saving Trust.
Powells, G., Bulkeley, H., Bell, S., & Judson, E. (2014). Peak electricity demand and the flexibility of everyday life. Geoforum, 55, 43–52. http://dx.doi.org/10.1016/j.geoforum.2014.04.014.
Price, P. (2010). Methods for Analyzing Electric Load Shape and Its Variability. Sacramento: Lawrence Berkeley National Laboratory & California Energy Commission.
Pucci, P., Manfredini, F., & Tagliolato, P. (2015). Mapping Urban Practices Through Mobile Phone Data. SpringerBriefs in Applied Sciences and Technology. Cham: Springer International Publishing (http://link.springer.com/10.1007/978-3-319-14833-5).
Ramsden, S. (2014). Briefing: Size Criteria (‘Bedroom Tax’). London: National Housing Federation.
Skinner, C., Hollis, J., & Murphy, M. (2013). Beyond 2011: Independent Review of Methodology. London: Independent review for the UK Statistics Authority.
Struijs, P., Braaksma, B., & Daas, P. J. (2014). Official statistics and big data. Big Data & Society, 1(1), 2053951714538417–2053951714538417. http://dx.doi.org/10.1177/2053951714538417.
Thumim, J., Wilcox, T., & Roberts, S. (2013). Managing and Mining Smart Meter Data - at Scale. Presentation Delivered at the CSE Project Showcase, 9th July 2013. Bristol: Centre for Sustainable Energy.
Tingley, M. P., Craigmile, P. F., Haran, M., Li, B., Mannshardt, E., & Rajaratnam, B. (2012, March). Piecing together the past: statistical insights into Paleoclimatic reconstructions. Quaternary Science Reviews, 35, 1–22. http://dx.doi.org/10.1016/j.quascirev.2012.01.012.
Watson, G. (2009). Making the Case for the 2011 Census. Presentation Delivered by at the DUG Annual Conference, 8th October 2009. Newport: Office for National Statistics.
Williams, J. (2013). Clustering household electricity use profiles. Proceedings of Workshop on Machine Learning for Sensory Data Analysis - MLSDA ‘13, 19–26. ACM Press. http://dx.doi.org/10.1145/2542652.2542656.
Wright, A. (2008). What is the relationship between built form and energy use in dwellings? Energy Policy, 36(12), 4544–4547. http://dx.doi.org/10.1016/j.enpol.2008.09.014.
Wright, A., & Firth, S. (2007). The nature of domestic electricity-loads and effects of time averaging on statistics and on-site generation calculations. Applied Energy, 84(4), 389–403.
Yohanis, Y. G., Mondol, J. D., Wright, A., & Norton, B. (2008). Real-life energy use in the UK: how occupancy and dwelling characteristics affect domestic electricity use. Energy and Buildings, 40(6), 1053–1059.
Zimmerman, J.-P., Evans, M., Griggs, J., King, N., Harding, L., Roberts, P., & Evans, C. (2012). Household Electricity Survey: A Study of Domestic Electrical Product Usage. Milton Keynes: Enertech.
Zoha, A., Gluhak, A., Imran, M. A., & Rajasegarar, S. (2012). Non-intrusive load monitoring approaches for disaggregated energy sensing: a survey. Sensors, 12(12), 16838–16866.