Identifying residential consumption patterns using data-mining techniques
gies offers an opportunity to understand residential demand from new angles. Although
there exists a large body of research on demand response in short- and long-term forecast-
scenarios has not been conducted. The study’s novelty lies in its use of unsupervised ma-
chine learning tools to explore residential customers’ demand patterns and response without
the assistance of traditional survey tools. We investigate behavioural response in three dif-
3) extreme weather situations. The analysis is based on the smart metering data of 2,000
households in Chengdu, China over three years from 2014 to 2016. Workday/weekend pro-
files indicate that there are two distinct groups of households that appear to be white-collar
or relatively affluent families. Demand patterns at the major festivals in China, especially
the Spring Festival, reveal various types of lifestyle and households. In terms of extreme
weather response, the most striking finding was that in summer, at night-time, over 72% of
households doubled (or more) their electricity usage, while consumption changes in win-
ter do not seem to be significant. Our research offers more detailed insight into Chinese
1 Introduction
Driven by continued rapid growth in urbanisation and the improvement of living standards in
China, residential electricity demand has been increasing for the last two decades. Despite
slower energy consumption growth from 2014, the growth rate of the residential consumption
recovered in 2018 and spiked to a seven-year high of 10.4% (People’s Daily, 2019). In addition,
China Bureau of Statistics (2020) reported that around 60.6% of the population lived in cities in
2019. Residential energy consumption is likely to continue its rapid growth. To achieve the
national low-carbon plan, the Chinese government has been devoted to delivering smart energy
systems, including smart meters, efficient grid operation, and an improved electricity transport
network. State Grid Corporation of China (SGCC) has been deploying smart meters along with
data acquisition systems since 2009. By the end of 2018, more than 457 million smart meters
had been installed by SGCC covering 99.57% of the customers it serves (CPNN, 2018).
In the context of promoting energy efficiency and energy conservation, it is essential for
policymakers and the utilities to understand any differences in residential consumption patterns.
Before the widespread rollout of smart meters, door-to-door questionnaires were the traditional
method that had been widely used to identify household electricity behavioural patterns (Fu
et al., 2018). However, the limitations of the method are non-negligible: To avoid statistical
biases, large-scale data collection is essential, but the financial and time costs are generally
very high. Meanwhile, the reliability of findings highly depends on the quality of the survey
design, sampling method and response rate. Another popular method is the case study, which
is a more specific but also time-consuming approach. It often involves detailed investigation
and modelling of typical dwellings; however, it is not suitable for comprehensive studies of the
With greater availability of digital information and communication technologies (ICTs), es-
pecially the deployment of smart meters, obtaining high-resolution energy consumption data
becomes realistic. These meters are equipped with real-time or short-interval communication
capabilities that enable them to transmit fine-grained consumption information to the utilities or
other data aggregators. Establishing the smart grid has fundamentally changed the communication
between customers and utilities and has created opportunities for researchers and companies to
understand customer demands and offer new services, such as customised tariff structures and
energy-efficient demand response programmes (Todd-Blick et al., 2020; Friis and
Christensen, 2016).
The sheer volume of data available has naturally led numerous researchers to use data
mining techniques in an effort to characterise the consumption behaviour of residential
customers (Stankovic et al., 2016; Rashid et al., 2019; Ushakova and Mikhaylov,
2020). Clustering is one of the most popular techniques for identifying similar patterns and
grouping them into a set of clusters. The most discussed topic in this field is how to improve
load forecasting accuracy by analysing load profiles and understanding demand patterns, as
well as to design demand response programmes. Both short-term and long-term load forecast-
ing can benefit from studies using high-resolution smart meter data (Mirasgedis et al., 2006;
Chaturvedi, Sinha and Malik, 2015; Atalla and Hunt, 2016). Clustering also assists in stud-
ies of tariff design, since it can divide different customers based on similar load profiles. Its
objective is to separate customers by the shape of daily load profiles over time. For example,
those customers with peak loads at the same time can be grouped together and offered a
customised tariff (D’hulst et al., 2015; Fu et al., 2018; Torriti and Yunusov, 2020). Some
studies combine clustering and questionnaire results. The characteristics of households and of
mation of the house (ownership of appliances, size of house, number of bedrooms, etc.) (Firth
et al., 2008; Beckel et al., 2016; Satre-Meloy, 2019). The mixed-method approach often offers
a deeper and more comprehensive understanding of customer behaviour, although question-
naires are often not available due to privacy and research cost considerations. There is a large
family of algorithms applied to electricity demand cluster analysis. K-means, Hierarchical and
self-organising map (SOM) clustering are among those most frequently used, although there
workday/weekend profiles, seasonal profiles and/or even longer periods, such as monthly or
yearly profiles (McLoughlin, Duffy and Conlon, 2012; Afzalan and Jazizadeh, 2019). Ap-
proaching consumption profiles from different angles can offer a more complete picture of how
households behave under different situations and the similarities as well as discrepancies among
clusters.
There has been a wide range of studies investigating the residential electricity demand
profiles around the world, including in the U.S., Spain, Italy, the U.K., and Denmark (Blázquez,
Boogen and Filippini, 2013; Rhodes et al., 2014; Alberini et al., 2019; Andersen et al., 2019;
Satre-Meloy, Diakonova and Grünewald, 2020). The characteristics of the clustered household
profiles differ markedly between countries and are country-specific. For example, the
demand of U.K. households was almost insensitive to summer weather but responded to winter
weather, while the opposite was found for Spanish households. These differences are
understandable since the profile curves reflect not only the socio-economic differences between
the countries, but also their cultural and geographical disparities. However, the vast majority of studies using
eira, 2016) have been in a more general field investigating aggregated residential consumption,
which may be due to the lower penetration rates of smart meters in less developed countries.
Despite being the leading country in terms of building area, electricity consumption and
smart meter deployment, there has been a notable lack of literature that examines household
electricity demand profiles in China. The majority of research on China’s residential electric-
ity consumption focuses on microeconomic analysis using survey data (Zhou and Teng, 2013;
Zheng et al., 2014; Du et al., 2015). The limitation of those studies is their reliance
on aggregate consumption data. The electricity data involved are mainly monthly billing data
or are even based on the interviewee’s recollection rather than records from the utility. Because
of its low resolution, this type of information clearly cannot reflect households’ daily
consumption profiles. Although a few studies have investigated residential profiles using
data-mining tools, these profiles have been based on either monthly or yearly load curves (Zhou,
Yang and Shen, 2017; Guo et al., 2018). A few papers use high-resolution one-minute data to
form clustered load curves, but unfortunately, these samples have included electricity data from
all sectors and the types of customers are not distinguishable from the data. A key contribution
of this paper is to investigate the residential consumption patterns using higher resolution data
taken directly from smart meters of three intra-day periods (super off-peak (23:00-7:00), peak
(7:00-11:00 plus 19:00-23:00), and off-peak (11:00-19:00)). We also examine the differences
between seasonal workday/weekend profiles, festival profiles, and extreme weather profiles to
provide a comprehensive overview of residential demand.
The paper is organised as follows: Section 2 provides a review of the application of clus-
tering to residential electricity as well as past studies of China’s residential sector. The datasets
and the methodology including data preparation and clustering techniques used are presented
2 Literature review
This section firstly surveys the application of data mining to the study of electricity demand.
Then, a more specific review of the studies of electricity consumption of the residential sector
in China follows. In that part, the characteristics of household consumption reported in the
literature are presented, as well as the important features that affect Chinese electricity
consumption behaviour.
Finally, limited research based on the clustering technique is discussed.
The large-scale deployment of smart meters around the world has opened up possibilities
(minute or hour) level, uncertainties in those models are large. The fluctuations are often caused
by differences in customer behaviour. However, more detailed consumption information for
individual households was not available. Now equipped with advanced metering technologies,
the scope of residential electricity demand research has broadened (Yildiz et al., 2017;
Razavi and Gharipour, 2018). The technique that has attracted the most attention is
residential consumption pattern clustering (Räsänen et al., 2010; Ramos et al., 2015; Gouveia
and Seixas, 2016) since it allows for more detailed analysis of customer demand. There are
two key technical steps needed for consumption pattern clustering: 1) Algorithm selection and
2) Cluster number decision by cluster validity index. Researchers agree that there is no single
standard optimal algorithm or cluster validity index for all scenarios. K-means and its algorithm
family (K-medoids, K-medians, etc.) are the most popular methods (McLoughlin, Duffy and
Possible choices of cluster validity indexes include Davies–Bouldin (DB) validity index, which
measures the average similarity of each cluster with its most similar cluster (McLoughlin, Duffy
and Conlon, 2012; Ozawa, Furusato and Yoshida, 2016; Viegas et al., 2016), and Silhouette
scores, which define how similar an object is to the objects in its own cluster compared to
those in other clusters (Yilmaz et al., 2019).
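As an illustration of how these two validity indexes behave, the following minimal sketch computes both for a given partition using only the Python standard library. The function names and the toy data are assumptions made for illustration and are not taken from any of the cited studies. The sketch also assumes at least two clusters with distinct centroids.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def davies_bouldin(points, labels):
    """Average, over clusters, of the largest ratio (s_i + s_j) / d(c_i, c_j).

    s_i is the mean distance of cluster i's members to its centroid c_i.
    Lower values indicate more compact, better separated clusters.
    """
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    cents = {lab: centroid(g) for lab, g in groups.items()}
    scatter = {lab: sum(math.dist(p, cents[lab]) for p in g) / len(g)
               for lab, g in groups.items()}
    worst = [max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                 for j in groups if j != i)
             for i in groups]
    return sum(worst) / len(worst)


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all objects.

    a is the mean distance to the other members of the object's own cluster,
    b is the smallest mean distance to the members of any other cluster.
    Values close to 1 indicate well matched objects.
    """
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i])
        if not own:  # singleton cluster: score set to 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups that are far apart.
toy_points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good_partition = [0, 0, 1, 1]   # respects the two groups
poor_partition = [0, 1, 0, 1]   # mixes the two groups

for name, part in (("good", good_partition), ("poor", poor_partition)):
    print(name, davies_bouldin(toy_points, part), silhouette(toy_points, part))
```

On the toy data, the partition that respects the two groups yields the lower DB value and the higher Silhouette score, which is the direction in which both indexes are read when choosing the number of clusters.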
The applications using smart meter data and data mining techniques include both short-
term and long-term forecasting model improvement, tariff structure design, consumption pattern
modelling through electric appliance use detection, and classification of new customers. For
instance, for clustering-based load forecasting, Fu et al. (2018) applied a fuzzy C-means algorithm
to cluster the daily household-level data of 533 households from Quanzhou city over the period
April 2014 to February 2015 under increasing-block pricing, which achieved high load forecasting
accuracy through better classification of customer consumption profiles than the traditional
K-means method. Chaturvedi et al. (2018) adopted hybrid clustering methods, combining an
Artificial Neural Network (ANN), a wavelet transform and a fuzzy system, on 1-hour level data
from India to optimise short-term load forecast performance. There have been a number of other
load forecast studies based on clustering. Kavousian et al. (2015) used a hierarchical
algorithm on smart meter data to identify appliance energy efficiency based on a 30-min interval
dataset of 4231 households in Ireland. Mahmoudi-Kohan et al. (2010) optimised the selling price
for each cluster of customers to maximise the annual profits for utilities based on a profit function
using a weighted fuzzy K-means algorithm. Flath et al. (2012) applied a K-means algorithm
to customers in Germany and proposed a segment-specific rate design with different prices for
each group of customers. A study by Viegas et al. (2016) combined smart meter data and survey data
to classify new customers using a K-means clustering algorithm on the representative curves.
of US households since 1978. In China, by contrast, surveys of this kind are rare, and
the most comprehensive survey of residential energy consumption to date was carried out by
Zheng et al. (2014) in 2012, involving 1450 total observations from 26 provinces.
They found that space heating and cooking were the most energy-intensive activities for the
Chinese families, accounting for 54% and 23% of total energy consumption respectively. They
also compared international differences in energy consumption by end-use activity. One
extraordinary difference is that the share of cooking in China is far bigger, compared with nearly 0% to
6% in other developed countries, such as the US and EU-27. Chinese households mainly use
gas for cooking purposes and only families in Southern China use electricity for space heating.
Since the survey covered all energy types used, not electricity exclusively, the findings
may be only partially applicable to residential electricity usage in China. Zhou and Teng
(2013) also used survey data to estimate urban residential electricity demand in Sichuan Province.
In the study, they found that electricity demand was inelastic with respect to both price and
income. The results also show that on a per capita basis, smaller households seem to consume more electricity.
Another finding was that the households that included those aged 50 years or more consumed
more electricity, because older people generally tend to stay at home longer. In terms of exam-
ined appliances (refrigerator, computer, TV, washing machine, and air conditioner), although
refrigerators currently are the largest consumers of electricity due to the highest ownership rate,
demand from air conditioners and computers will increase substantially as their penetration rates
and utilisation grow.
all year (Class V). They confirmed that variation in demand for space cooling and hot-water
supply leads to the differences in electricity consumption across the classes. For example,
households in Class IV cities usually have more than one room air-conditioner and use them more
frequently, while the consumption for space cooling in Class V households is very low. In addition, Class
IV cities have much higher unit consumption of electricity for water heaters than any other
class. Another stream of research could also help to assess electricity
demand behaviour from a different angle: findings on residential building occupancy rates
can indicate the likely presence of family members at home and their
activities at different times of the day. Hu et al. (2019) investigated the occupancy schedules
of different room types in residential buildings in three cities in China (Beijing, Chengdu, and
Yinchuan) representing different climate zones. The authors conducted a survey of
half-hourly occupancy rates for living rooms, bedrooms, studies and kitchens. The results are a
useful guide to how end-users of electricity spend their time at home. It can be seen that the
daytime occupancy rate of around 50% in Beijing is higher than in Chengdu or Yinchuan, which
are in the range of 20-40%. Each room, nevertheless, has a similar shape of the occupancy
schedules regardless of location. The living room is mainly used from 6:00 to 23:00, and most
intensively during the evening period from 18:00 to 22:00. Bedroom occupancy is usually
during the night time, from 22:00 to 6:00. Meal times can be surmised based on the kitchen
occupancy schedule: 6:00 to 8:00 (breakfast), 11:00 to 13:00 (lunchtime), and 17:00 to 19:30
(dinner). Another regional difference is that kitchen occupancy or meal time schedule is later
true in terms of intra-daily residential load profiles. In a case study, Pan et al.
(2017) collected 15-minute residential consumption data from 138 households in Shanghai
between May and December 2013. Shanghai is in the hot summer and cold winter zone (Class III)
and no large central heating system (normally run jointly by State companies and local
governments) is operated there. They noted that four families of clustering algorithms are often
used in the field of residential consumption: K-means, fuzzy K-means, hierarchical
clustering, and SOM. K-means is one of the most popular clustering algorithms
and the one that they adopted. They divided customers into 10 clusters where different profiles
indicate differences in lifestyle. For instance, three of the 10 groups with double peaks and low
morning and longer evening consumption levels are categorised as mostly white-collar workers.
Apart from the routine analysis of the hourly consumption, their study also reports the results
based on seasonal load and differences between weekdays and weekends. They categorised the
ten consumption patterns into four sub-groups: (i) dominated by the heating period; (ii) dominated
by the cooling period; (iii) dominated by transitional seasons; and (iv) no distinguishing features.
The limitation of the research is that the sample of 138 families is small and so the clustering
results may be biased and not stable, especially given that the cluster number is large.
Additional clustering research on the Chinese residential sector is based on lower-resolution
data, e.g. daily usage. Guo et al. (2018) collected daily household electricity demand data from
January 1, 2014 to December 31, 2014 for 3,000 households in Nanjing and 1,399 households
in Yancheng city, which are both in Jiangsu Province, China. They employed the K-means
algorithm and attempted to depict the clustered household profiles at two levels: 1) The daily
electricity consumption patterns during three Chinese major festivals, the Spring Festival, the
National Day holiday and the Labour Day; 2) Daily residential profiles of a month for each
different life-styles of the households. While the patterns in the National Holiday and the Spring
Festival are diverse, the load curves during the Labour Day holiday are relatively flat and less
diverse. The seasonal curves reveal that the fluctuation in winter is much higher than in spring.
Similar research, focused only on general residential consumption profiles, was
conducted by Zhou et al. (2017) using a fuzzy K-means algorithm based on daily consumption
data from 1,312 households in Jiangsu Province during the month of December 2014. They
found that 6 and 9 were the appropriate cluster numbers for two different scenarios. However, one
issue with these studies is that a monthly profile covering every day
of a month may offer little insight into differences in customers’ activities, since
household activities do not normally follow a cyclic pattern based on the day of the month, whereas
a weekly profile could reveal more about consumption patterns. Besides, the number of
clusters may be too large, which may lead to biased and less meaningful results, given that
the samples are not large enough. Another problem is that the authors clustered raw
consumption data, so the cluster results reflect differences in the magnitude of
consumption rather than fluctuations in consumption behaviour. In similar studies conducted
in other countries, the number of pattern groups is around 3 to 6, with larger sample
sizes (Viegas et al., 2016; Ramos et al., 2007).
As seen in the review above, studies on residential consumption patterns in China are scarce.
Research into higher-resolution data normally involves a limited number of households partici-
pating, whereas studies based on over 1,000 households unfortunately have lower time resolu-
sumption patterns using three intra-day period usage data (peak, off-peak and super off-peak): 1)
The daily consumption patterns of a week in summer and winter for each intra-day period; 2)
holiday load profiles; and 3) load profiles for extremes of hot and cold weather.
3.2 Datasets
The electricity data was provided by the Electric Power Company of Sichuan Province, SGCC.
The daily electricity consumption data collected contains three points representing three intra-
day periods (super off-peak (23:00-7:00), peak (7:00-11:00 plus 19:00-23:00), off-peak (11:00-
19:00)). The analysis period is from January 2014 to January 2017 and includes 2,000 randomly
selected households from Chengdu. Chengdu City is a sub-provincial city and the capital of
Sichuan, a southwestern province of China, which is known for being a major agricultural
heartland. Chengdu is the fifth-most populous agglomeration in China. As of 2018, the resident
population of Chengdu was over 14.76 million, and the city’s total number of households was
over 5.63 million (Chengdu, 2019). The average size of the resident household was 2.76 people
per household in 2011 (Chengdu Bureau of Statistics, 2011). In 2018, Chengdu was ranked the
best-performing city in China in terms of economic growth, with great potential of electricity
demand growth (Bloomberg, 2018).
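The three intra-day periods that make up each daily record can be made concrete with a small lookup. The sketch below is illustrative only: the dataset described here is already aggregated into three values per day, so the hourly readings, the function names and the treatment of each boundary as belonging to the period that starts at that hour are all assumptions.

```python
def intraday_period(hour):
    """Map an hour of the day (0-23) to its intra-day period.

    Assumption: each interval includes its start hour and excludes its end hour.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour >= 23 or hour < 7:
        return "super off-peak"   # 23:00-7:00
    if 11 <= hour < 19:
        return "off-peak"         # 11:00-19:00
    return "peak"                 # 7:00-11:00 plus 19:00-23:00


def daily_record(hourly_kwh):
    """Collapse 24 hypothetical hourly readings into the three daily values."""
    if len(hourly_kwh) != 24:
        raise ValueError("expected 24 hourly readings")
    totals = {"super off-peak": 0.0, "peak": 0.0, "off-peak": 0.0}
    for hour, kwh in enumerate(hourly_kwh):
        totals[intraday_period(hour)] += kwh
    return totals


# Each period spans eight hours, so a flat load gives three equal totals.
print(daily_record([1.0] * 24))
```

Because each of the three periods covers eight hours, the three daily values are directly comparable in duration, although the peak period is split into a morning and an evening block.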
Chengdu is located in the southern monsoon climate zone and has a humid subtropical
climate. Both Murata et al. (2008) and Hu et al. (2019) categorised Chengdu as
being in Class III (hot summer and cold winter). The weather is generally warm with high
relative humidity all year and it has four distinct seasons. Due to the high humidity, summer
can be extremely uncomfortable and hot. However, no centralised heating supply is operated
in Chengdu. The space heating in winter, if needed, is normally done by electrical appliances.
the air-conditioner related consumption in summer could be very high, given around 1.5 air
conditioners per household.
The weather dataset for the extreme weather analysis includes six variables: minimum daily
The data we requested from the SGCC was for residential customers as defined in their system.
However, a small number of households in the sample, which report unusually and extremely high
consumption, may be small businesses run out of the family home, for example online stores.
Using boxplots, outlier households with average monthly consumption above 350 kWh
were removed from the dataset. The households who have more than
ternatively, profiles can be created to distinguish seasons or workdays from weekends. In our
case, average load profiles were calculated separately at three levels:
1. Seasonal weekend/workday profiles: A Monday-to-Sunday profile was created for every household
in each season. For each season, it consisted of three load profiles,
one per intra-day period. In total, each household had 3*4 load curves.
To better reveal the differences between workdays and weekends, we standardised the
where Ui is the consumption on the ith day of a week for the family j, represents the av-
erage usage of weekdays. The clustering was based on the profile Dj for each household
(D1,j , D2,j , . . . , D7,j ) which reflects the change percentage of consumption between each
day of a week with the average usage.
2. Holiday profiles. We chose the Spring Festival and National Day holidays since these
are the two longest and most important holidays in China. The potential changes in
behaviour or electricity consumption during those holidays would be very different from
other holidays. To identify the consumption change, we extend the clustering period to
three weeks – 7 days before the holidays, 7 days during the holidays, and 7 days after the
holidays.
• The Spring Festival period: the holiday dates are not fixed, since the date of the
As with the seasonal profile standardisation, we wanted to compare the usage changes
between a normal day and a festival day. In this scenario, we used the average daily
consumption in January and in September as the baselines for the Spring Festival and
the National Day holidays respectively. The calculation was as follows:
$$F_{i,j} = \frac{F_i - \bar{F}_h}{\bar{F}_h} \times 100\% \quad \text{and} \quad \bar{F}_h = \sum_{i=1}^{21} F_i \qquad (2)$$
where $F_i$ is the consumption on the $i$th day of the 21-day observation period for the family
$j$, and $\bar{F}_h$ represents the average daily usage before the holidays. The clustering input for each
3. Extreme weather event profiles: Since there is no universally agreed definition of extreme
heatwaves or cold-waves in Chengdu, we used the following rule for this paper:
• For heatwaves: the consumption data for days on which the maximum or average
temperature is above the 95th percentile in July and August.
• For cold-waves: the consumption data for days on which the maximum or average
temperature is below the 5th percentile in January and February.
• For the baselines: the days on which the average temperature falls between the 45th and
55th percentiles.
The standardisation process is similar to that of the 1st and 2nd scenarios, and the final input is
a daily profile of consumption changes for the three periods (off-peak, super off-peak,
and peak) between extreme weather days and average summer/winter days.
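The percentile rule above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study: the linear-interpolation percentile, the use of a single temperature series per season, the ordering of the three periods in each record and all function names are assumptions.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (q in 0..100)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def heatwave_days(temps):
    """Indices of days whose temperature is above the 95th percentile."""
    threshold = percentile(temps, 95)
    return [i for i, t in enumerate(temps) if t > threshold]


def coldwave_days(temps):
    """Indices of days whose temperature is below the 5th percentile."""
    threshold = percentile(temps, 5)
    return [i for i, t in enumerate(temps) if t < threshold]


def baseline_days(temps):
    """Indices of days whose temperature lies between the 45th and 55th percentiles."""
    low, high = percentile(temps, 45), percentile(temps, 55)
    return [i for i, t in enumerate(temps) if low <= t <= high]


def change_profile(usage, extreme_idx, base_idx):
    """Percentage change per intra-day period between extreme and baseline days.

    usage[i] is a hypothetical (super off-peak, peak, off-peak) record for day i.
    """
    profile = []
    for period in range(3):
        base = sum(usage[i][period] for i in base_idx) / len(base_idx)
        extreme = sum(usage[i][period] for i in extreme_idx) / len(extreme_idx)
        profile.append((extreme - base) / base * 100)
    return profile


# Hypothetical data: 101 days with temperatures 0..100 and a flat load,
# except that night-time and peak usage rise on the hottest days.
temps = [float(t) for t in range(101)]
hot = heatwave_days(temps)
usage = [(3.0, 1.5, 1.0) if i in hot else (1.0, 1.0, 1.0) for i in range(101)]
print(change_profile(usage, hot, baseline_days(temps)))
```

In this toy example, a value of 100 in the resulting profile would correspond to a doubling of consumption in that period relative to a day of average temperature.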
Although the only papers on Chinese residential consumption pattern clustering by Zhou
et al. (2014) and Guo et al. (2018) used raw data, normalisation is necessary as a standard
step (Panapakidis et al., 2012; Rhodes et al., 2014a), especially given the samples were not
large enough to eliminate biases and outliers. In addition, the primary focus of this research is
to examine the households with similar behavioural change in electricity use (i.e. variation in
profile shape). Clusters obtained directly from raw data, as in Guo et al. (2018),
only reflect load magnitudes and cannot reflect the consumption variations within a day. The
raw data method would also be much more sensitive to outliers.
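The difference between clustering on raw data and on standardised profiles can be seen in a small example. The sketch below assumes a percentage-change standardisation against the Monday-to-Friday average, in the spirit of the weekly profiles described above. The function name and the two hypothetical households are illustrative assumptions.

```python
import math


def weekly_change_profile(week_kwh):
    """Percentage change of each day (Monday..Sunday) against the weekday average.

    Assumption: the baseline is the mean of the first five values (Monday-Friday).
    """
    if len(week_kwh) != 7:
        raise ValueError("expected seven daily values, Monday to Sunday")
    baseline = sum(week_kwh[:5]) / 5
    return [(u - baseline) / baseline * 100 for u in week_kwh]


# Two hypothetical households with the same weekly shape but a fourfold
# difference in magnitude.
small_home = [10, 10, 10, 10, 10, 15, 15]
large_home = [40, 40, 40, 40, 40, 60, 60]

raw_distance = math.dist(small_home, large_home)
shape_distance = math.dist(weekly_change_profile(small_home),
                           weekly_change_profile(large_home))

print(raw_distance)    # large: raw data separates the two households
print(shape_distance)  # zero: the standardised profiles coincide
```

A distance-based algorithm applied to the raw values would place these two households far apart, whereas the standardised profiles are identical, which is the behaviour wanted when the interest lies in the shape of the profile rather than its level.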
There is no consensus on the most suitable clustering approaches to residential metering data.
The selection of algorithms should be based on the objectives of the study and the data structure.
Yildiz et al. (2017) and Zhang et al. (2012) provide a detailed discussion of clustering methods.
Among the reviewed methods of K-means, fuzzy c-means, and SOM, K-means is considered
the most consistent clustering method based on their analysis. The K-means clustering method
is one of the most widely used algorithms in the residential sector, due to its fast computation
time and applicability to large datasets. The algorithm starts with the desired number of clusters
K and randomly selects K data patterns as the initial centroids of the clusters (Hernández
et al., 2012). It then iterates until a local minimum of the Euclidean distance between
each pattern x_i and its closest cluster centroid is reached. The obvious disadvantage is that the results
are affected by the initial set-up of each cluster.
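The procedure described above can be sketched in a few lines. The following is a minimal standard-library illustration and not the implementation used in this study. The repeated random restarts (`n_init`), which keep the run with the smallest within-cluster sum of squared distances, are an assumption added here as one common way to reduce sensitivity to the initial centroids.

```python
import math
import random


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Basic K-means with Euclidean distance and random restarts.

    Returns (labels, centroids) of the restart with the lowest
    within-cluster sum of squared distances.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # Pick K data patterns at random as the initial centroids.
        cents = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(max_iter):
            # Assignment step: each pattern joins its closest centroid.
            new_labels = [min(range(k), key=lambda c: math.dist(p, cents[c]))
                          for p in points]
            if new_labels == labels:
                break  # assignments are stable: a local minimum is reached
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:  # an empty cluster keeps its previous centroid
                    cents[c] = [sum(col) / len(members) for col in zip(*members)]
        inertia = sum(math.dist(p, cents[lab]) ** 2
                      for p, lab in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels, cents)
    return best[1], best[2]


# Hypothetical toy profiles forming two well separated groups.
profiles = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

The fixed `seed` only makes the toy run reproducible. With real load profiles, each point would be a household's standardised profile vector.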
Hierarchical clustering has also been explored in many clustering studies of electricity data
(Chicco, Napoli and Piglione, 2006; Gouveia and Seixas, 2016). The method starts with each
object as a separate cluster and in each successive iteration it merges the clusters with minimum
distances in the distance matrix, until no cluster can be merged, or a termination condition is
triggered. The advantage compared to K-means is that hierarchical algorithms do not need to
pre-set the number of the clusters. Although the number of clusters is not necessary for hierar-
chical clustering preparation, a distance metric and a linkage criterion need to be decided before
running the algorithms. Different distance metrics calculate the distances between each pair of
data points through various formulations. The distance metrics we tested included: Euclidean,
To obtain the optimal performance of the clustering results, both K-means and Hierarchical ap-
validity indexes. It is important to note that none of them uniformly prevails over the others,
as shown in the comparison by Chicco (2012). For cross-validation we used two indexes in this study:
DB and the Silhouette score. Experience from similar research shows that the appropriate
number of clusters of residential customers is usually between 3 and 10. Although more
clusters can distinguish small groups with unusual consumption patterns, over-clustering should be
avoided since clusters with few observations may be biased and less meaningful.
In this section, we explore the results from the clustering analysis describing the household
electricity consumption profiles from the following perspectives: 1) Workday/weekend
consumption patterns in different seasons; 2) Major festival demand patterns, including the
National Day holidays and the Spring Festival; 3) Patterns of consumption changes associated
with extreme weather.
In comparing the K-means and Hierarchical algorithms, we found that K-means was more
suitable in this case based on the clustering validity indexes. The hierarchical algorithm
using Ward linkage with distances defined by the squared Euclidean (sqeuclidean) and
Chebyshev metrics performed similarly to K-means, but this approach was slower and
less robust. Therefore, we adopted K-means in our study. In order to focus on the
clustering results, we do not show the comparison results between K-means and Hierarchical
The profiles shown in Figure 1 describe residential consumption trends over the course of a
week. One distinct seasonal difference in the patterns is that almost all groups have
increasing demand on weekends in summer, apart from Cluster 2. However, in winter around half of the
consumers (C0 and C4) use less electricity on weekends than on an average workday. In
addition, it can be seen that in both summer and winter, the majority of people fall into the
group with the least fluctuation. In summer, for the majority there is a slight increase in total
consumption on weekends, while the opposite is found in winter. The differences between the
seasons could be explained by the use of space cooling appliances. In summer, the longer
households are at home, the more electricity they may use on these appliances. The slightly
lower consumption on winter weekends could be driven by outdoor activities, whereas summer
in Chengdu is muggy and uncomfortable and people tend to stay indoors when
possible. It should be highlighted that despite the stuffy, hot weather, there was still a small
share of households (Cluster 2) that appeared to go out during the summer. To understand the
reasons and the specific differences in demand, the patterns broken down by intra-day period can
be helpful.
Off-peak (11:00-19:00) usage patterns are shown in Figure 2. Of the three periods, behaviour
patterns during the off-peak (daytime) period are the least divided and the most similar among
all households. For both summer and winter, there is a group with a significant drop in demand
on weekends (C4 in both seasons). This decline may indicate the nature of the property or the
household: those people are more likely to be relatively affluent local white-collar workers
with more than one property in the city, rather than immigrant workers, because it appears
that they only or mainly live in those properties during workdays and probably go back to
their real home or another property at weekends. On the other hand, C3 in winter and summer
shows the exact opposite pattern, with the lowest demand on workdays and the highest on weekends.
These are possibly “holiday” properties of people who work (and live) at another location
during the workweek and only return home at weekends. Those clusters may also include
wealthier households, since they can afford the cost of maintaining two properties.
To continue the analysis of intra-day patterns, evidence from the super off-peak (night-time)
period (23:00-7:00) shows even more clearly that there are residents (C3 or C4) who may be living
in the property only on weekends or only on workdays (Figure 3). For Clusters 3 and 4, the
differences in bedtime demand between workdays and weekends are greater than in other clusters,
which indicates possible non-occupancy on some days of the week: when nobody sleeps at the
property, very little electricity is needed. One interesting seasonal difference
is on Monday night. The weekend effect of delayed bedtime appears to extend to Monday
in summer, when half of the households (C1, C2, C4) still have significantly higher demand.
However, a similar trend is not found in winter. This could be explained by a summer effect
whereby people tend to go to sleep later at night, especially on weekends. Furthermore, the
increase in night-time demand on weekends exists in other groups as well. It may be due to the
delayed bedtime and more activities after 23:00 during weekends, including watching TV, playing
computer games, etc. The households with the larger differences, for example C1 in summer, could
be younger adults, in contrast to older people, who tend not to stay up late even at weekends (C0).
The peak-time profiles resemble the total consumption patterns (Figure 4). In both seasons,
the majority of households have relatively flat consumption on weekdays and
higher demand on weekends. However, C1 (summer) and C2 (winter) show the opposite trend.
In winter, households who prefer to go out during weekend peak times (C2) tend to stay
outdoors longer than their counterparts in summer (C1), since the fall in demand in winter
is much bigger than in summer. One reason people may tend to stay indoors in summer rather
than enjoy outdoor activities during their spare time could be the uncomfortable humid summer
weather in Chengdu. Nevertheless, it should be noted that C1 in summer still includes more
households than C1 in winter, which suggests that, despite the hot weather, a larger fraction
of households prefers to go out at weekend peak times, even if they stay out for a shorter
time than those in winter.
The Spring Festival is the most important family gathering holiday for the Chinese and it is
also one of only two continuous 7-day public holidays in China. Electricity consumption fluctuates
dramatically between the start and end of the Festival, which reflects the different holiday
patterns during that period. On the first evening of the Spring Festival holidays, it has become
customary to sit in front of the television and watch the Spring Festival Gala and TV programmes
with family. From the consumption patterns in peak and super off-peak times shown in Figure 5
and Figure 6, we can identify four distinct types of households:
Type I: Cluster 0 (in both plots) is likely to be local families whose children live with or
close to them. It is highly likely that the peak of Cluster 0 on the first day comes from watching
the gala with their family members. However, due to no extra people added in the house, the
older adult households but have guests and relatives come to visit, which adds to the household
demand. The spike is caused by more people being at home as well as greater use of electrical
equipment, for example for snack preparation, cooking and lighting.
Type III: Cluster 3 (in both plots) represents another classic holiday pattern in China and
could be the younger white-collar workers, while Cluster 1 could be the older or senior workers.
The drop in electricity consumption at night on the days before the Spring Festival may largely
be explained by workers leaving their residences for their hometown or to travel. The Spring
Festival travel season can be extremely hectic and many migrants will choose to leave days
before the public holidays start to avoid the heavy traffic. The differences between Cluster
1 and Cluster 3 lie mainly in when they leave the residence (where the drop starts) and
when they return (when consumption returns to normal). It can easily be seen that Cluster
3 has a later leaving date, around three days before the Festival, while Cluster 1 leaves
town earlier, about five days before. The difference could largely be explained by the fact
that it would be almost impossible for younger or junior workers in China to leave work
many days before the Spring Festival, while senior employees would be more likely to be
approved to leave earlier before the holiday. Another piece of evidence is that Cluster 3 starts to
return to their residences days before the Festival ends and the demand rapidly returns to the
normal level after the Spring Festival holidays. Meanwhile Cluster 1 returns to the normal day
groups, the cluster has a much earlier departure date and a longer period to get back to normal
after the Festival. Compared to Type III, they seem to be able to leave their residences much
earlier and are in no rush to return to work. Thus, two plausible hypotheses are that this cluster
reflects either 1) retiree households leaving to visit their adult migrant children; or 2) labourers
who normally leave their work one or even two weeks before the Spring Festival and do not
return until the Lantern Festival (14 days after the Spring Festival).
The National Day holiday is the other public holiday that lasts for seven consecutive days. However,
its behaviour patterns are completely different from those of the Spring Festival. In general, the
behaviour patterns among households are relatively similar and notably less diverse compared
to the Spring Festival. One of the reasons behind the lower dissimilarity is that fewer migrants
choose to return to their hometown during the National Day holiday. Most Chinese
treat the National Day holiday as an opportunity for relaxation, whereas the Spring Festival is the
most important time for family gatherings. It would be expected therefore that residential de-
like the Spring Festival holiday, the majority of households fall into one group (C0), which
accounts for over 65% of the sample. This demonstrates that consumption patterns are
much less diverse and more concentrated over the National Day holiday. The fact that over
95% of households (i.e., all clusters apart from Cluster 3) do not show a dramatic drop in total
consumption confirms that travel during the holiday is relatively limited. The modest declines
may reflect shorter trips, whereas the much larger decrease in Cluster 3 may reflect relatively
longer-distance journeys.
In order to identify the daytime activities during the holidays, we examined consumption
patterns in the off-peak period (Figure 8). The trends in demand across the groups are similar
to the total consumption curves: Three clusters are similar while Cluster 3 may indicate long-
distance and/or longer-duration travel. One interesting finding is that a temporary rise in demand
Figure 9 and Figure 10 describe the patterns of change in residential electricity demand under
extreme weather in summer and winter. The consumption changes in winter and summer
can be easily distinguished from each other. The differences in the patterns are largely
driven by the widespread use of air conditioners and the limited ownership of space-heating
appliances, which reflect local climate conditions (in January, the typical daily high in Chengdu
is 9 °C and the typical low is 2 °C).
In general, all customers increase their consumption rapidly in summer, although to
different degrees (Figure 9). During extremely hot weather, the most affected period is the
night-time, when around 65% of households doubled their usage, apart from Cluster 0. The cluster
(C0) with the smallest rise in the super off-peak period may comprise those who are most concerned
with energy saving, or relatively poorer families that do not own air conditioners. While the other
clusters have their highest increase in super off-peak time, Cluster 3 has a slightly different
pattern, with a higher increment in the off-peak (daytime) period. This group is likely to include
those who are retired or self-employed and financially independent, which leads to higher
electricity use during the daytime. Cluster 4, on the other hand, could be those who are more affluent
than other groups. Its surge in consumption is the highest, with demand almost tripling, and
could be due to larger homes and/or more air conditioners.
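To make the notion of households "doubling" their night-time usage concrete, the following is a minimal sketch of how such a share could be computed. It rests on several assumptions that are not stated in this section: extreme days are taken as the hottest 5% of days (mirroring the 5% threshold mentioned for cold days), the baseline is the mean night-time usage on all remaining days, and all function names and sample values are hypothetical.

```python
from statistics import mean


def hottest_days(temperatures, share=0.05):
    """Return the indices of the hottest `share` of days (at least one)."""
    n_extreme = max(1, int(len(temperatures) * share))
    ranked = sorted(
        range(len(temperatures)), key=lambda i: temperatures[i], reverse=True
    )
    return set(ranked[:n_extreme])


def night_response(night_usage, extreme_idx):
    """Ratio of mean night-time usage on extreme days to that on other days."""
    extreme = [u for i, u in enumerate(night_usage) if i in extreme_idx]
    normal = [u for i, u in enumerate(night_usage) if i not in extreme_idx]
    return mean(extreme) / mean(normal)


def share_doubling(households, temperatures, factor=2.0):
    """Fraction of households whose response ratio is at least `factor`."""
    idx = hottest_days(temperatures)
    ratios = [night_response(usage, idx) for usage in households]
    return sum(r >= factor for r in ratios) / len(ratios)


# Hypothetical data: 20 summer days, the last one being the hottest.
temps = [30.0] * 19 + [38.0]
households = [
    [2.0] * 19 + [5.0],  # ratio 2.5: more than doubles
    [2.0] * 19 + [4.0],  # ratio 2.0: exactly doubles
    [2.0] * 19 + [2.5],  # ratio 1.25: modest rise
    [2.0] * 19 + [2.0],  # ratio 1.0: no change
]
print(share_doubling(households, temps))
```

With a different baseline (for example, the seasonal median) the resulting share would change, so the baseline definition should be reported alongside any such percentage.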
The consumption patterns in winter differ significantly from the summer profiles (Figure
10). The categorisation is heavily concentrated in two clusters (C0 and C1), which account for
over 80% of households. In other words, the majority of families share similar patterns of
consumption change during the top 5% coldest days. C0, the largest group, even has slightly
lower than usual winter consumption at bedtime, which could be caused by an earlier
bedtime. It should be noted that ownership of space-heating appliances is not common in
Chengdu. Although portable heaters have become increasingly popular in recent years, possibly
because of rising household income, heaters are unlikely to be left on for the whole night
because of safety concerns. In addition, although the perceived temperature
can be very low because of the high relative humidity, the 5th percentile of winter temperatures in
Chengdu is still much higher than in major cities in Northern China: an average night-time minimum
of around 3 °C versus -15 °C. Therefore, Chengdu has no large central heating system of the kind
run in the northern cities. People in Chengdu are generally used
to the winter cold, and heating appliances are not seen as necessities by most Chengdu
residents.
5 Conclusion
This paper presented a data-mining based approach to explore and structure a group of elec-
tricity demand profiles for 2,000 households in Chengdu, China. The clustering analysis was
applied to average household electricity profiles in three different contexts (weekday/weekend;
holidays; and extreme weather). Our innovative approach allows us to unravel or infer the
lifestyle and household characteristics from residential electricity demand profiles, without the as-
on monthly profiles.
First, the results of the weekend/workday profiles show that there are two groups of households
that appear to follow a pattern of moving between properties within the week. We
surmise that those clusters are white-collar or relatively affluent families. In terms of the seasonal
differences between weekends and workdays, summer weekend consumption for most
households rises to some degree, while its winter counterpart remains unchanged or even
drops slightly. Furthermore, the demand patterns during the major festivals in China reveal various
types of lifestyle and behaviour, especially during the Spring Festival. For example, one group, likely
older adults living with or close to their offspring and close family, sees only a limited increase in
electricity use on Spring Festival Eve. Compared to the Spring Festival, the patterns found
during the National Day holiday are less diverse and more similar to each other. In terms of
the demand changes resulting from extreme weather, the most striking finding is that, at
night-time in summer, over 72% of households doubled their electricity usage. We expect that this
large increase is driven by air conditioners, given the high penetration rate of space-cooling
appliances. The consumption changes on cold days, however, do not seem to be significant, which
might be explained by the limited popularity of space-heating appliances in Chengdu and winter
weather that is less harsh than in colder regions such as Northern China.
This paper extends the current knowledge of Chinese residential behaviour patterns. Fur-
ther research on customer classifications and load forecasting could benefit from the study. For
example, a better understanding of customer consumption patterns during festivals and under
extreme weather conditions would assist in load management on special occasions or under
special circumstances. Implementation could be extremely helpful because the critical demand
peak in Chengdu is in summer driven by air-conditioner spikes. In addition, the clustering algo-
be directly generalised to other areas due to different climatic and cultural backgrounds, the
methodology proposed in the study can be applied to any region to build
location-specific knowledge of consumption behaviour for further studies. Meanwhile,
the clustered results can be used as the basis for future customer classifications. The different
clustering methods offer a unique approach to classifying (new) customers and could help
build better, more specific tariffs. For instance, classification based on temperature sensitivity in
summer could inform a new tariff scheme that aims to shift the critical peak and results in
better load management.
There are, of course, some limitations to the current study, and further investigation is
needed. First, hourly (or at least higher-resolution) consumption data would undoubtedly
offer more detailed information on consumption behaviour. Data that are not continuous in time
can conceal important information. For example, peak-time consumption includes usage in both
the 7:00-9:00 and 19:00-23:00 time slots, although consumption in the early morning should be far
lower than in the evening period. Second, it would be very useful to have customer-related
attributes, which could assist in classifying new customers with greater accuracy. Not having
been a longstanding problem. We encourage more studies aimed at identifying residential
consumption patterns without the need for such data.
To compare K-means with hierarchical clustering, linkages and distance matrixes need to be
selected for hierarchical algorithms before performing clustering. The next sections present
how the selection process was done. The hierarchical algorithms with the most suitable linkages
and distance matrixes were then picked to compare with K-means.
This appendix contains the linkage selection process. As a first step, we compared the clustering
results produced by different linkages. The standardised total demands of families on the
selected periods, Saturday and Monday, are shown in Figure 1. It can be seen that only the Ward and
Average linkages are not sensitive to outliers and can divide households better.
A comparison of these two linkages follows (see Figure 2 and Figure 3). The
linkages with higher Silhouette scores and/or lower DBI scores were selected for each scenario
separately.
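A linkage comparison of this kind could be scripted roughly as follows. This is a minimal sketch using scikit-learn on synthetic data, not the authors' implementation: the three-blob data set, the choice of three clusters and all variable names are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Synthetic stand-in for standardised household profiles:
# three well-separated groups of 50 two-dimensional points each.
rng = np.random.default_rng(0)
centres = ((0.0, 0.0), (3.0, 3.0), (0.0, 3.0))
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in centres])

scores = {}
for linkage in ("ward", "average"):
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)
    # Higher Silhouette and lower DBI both indicate better-separated clusters.
    scores[linkage] = (
        silhouette_score(X, labels),
        davies_bouldin_score(X, labels),
    )

for linkage, (sil, dbi) in scores.items():
    print(f"{linkage}: silhouette={sil:.3f}, DBI={dbi:.3f}")
```

On real profiles the two indexes may disagree, which is presumably why the text allows selection on "higher Silhouette scores and/or lower DBI scores".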
In this section, we used two methods to decide on the distance matrixes for hierarchical clustering.
First, the Cophenet score was adopted (Figure 4); a distance matrix with a higher Cophenet
score is regarded as a better-performing choice. To cross-validate the results, we also employed
DBI scores to help the decision process (Figure 5). Similarly, a lower DBI score would suggest
Based on the selection process shown above, we compared the selected hierarchical algorithms
with K-means using two clustering validity indexes, the Silhouette score and the DBI score. Figures
6 and 7 show that, regardless of the number of clusters, K-means outperformed hierarchical
clustering (a higher Silhouette score and/or a lower DBI score means that clusters are well
separated from each other and clearly distinguished). Therefore, K-means was chosen for the analysis.
Here we only present the clustering results for total demand profiles, because the conclusions
and findings are the same for the intra-day period profiles.
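To illustrate why a lower DBI score indicates better-separated clusters, the sketch below computes the Davies-Bouldin index from its standard definition: for each cluster, take the worst-case ratio of combined within-cluster scatter to the distance between centroids, then average over clusters. Euclidean distance is assumed, and the function names and toy points are hypothetical rather than drawn from the study.

```python
from math import dist
from statistics import mean


def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    return tuple(mean(axis) for axis in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a list of points)."""
    centres = [centroid(c) for c in clusters]
    # Scatter: average distance from each point to its own cluster centre.
    scatter = [
        mean(dist(p, ctr) for p in c) for c, ctr in zip(clusters, centres)
    ]
    worst = []
    for i in range(len(clusters)):
        ratios = [
            (scatter[i] + scatter[j]) / dist(centres[i], centres[j])
            for j in range(len(clusters))
            if j != i
        ]
        worst.append(max(ratios))  # most similar neighbouring cluster
    return mean(worst)


# Same within-cluster spread, different separation between the two groups.
far_apart = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
close = [[(0.0, 0.0), (0.0, 2.0)], [(4.0, 0.0), (4.0, 2.0)]]

print(davies_bouldin(far_apart))  # smaller value: clusters well separated
print(davies_bouldin(close))      # larger value: clusters closer together
```

Because the scatter is identical in both toy cases, the difference in the index comes entirely from the centroid distance, which is the sense in which a lower score means the clusters are further apart relative to their spread.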
Afzalan, M. and Jazizadeh, F. (2019) ‘Residential loads flexibility potential for demand re-
sponse using energy consumption patterns and user segments’, Applied Energy. 254, p.
113693. doi: 10.1016/J.APENERGY.2019.113693.
Al-Wakeel, A., Wu, J. and Jenkins, N. (2016) ‘State estimation of medium voltage distribution
networks using smart meter measurements’, Applied Energy. 184, pp. 207–218.
doi: 10.1016/J.APENERGY.2016.10.010.
Alberini, A. et al. (2019) ‘Hot weather and residential hourly electricity demand in Italy’,
Energy. 177, pp. 44–56. doi: 10.1016/J.ENERGY.2019.04.051.
An, J., Yan, D. and Hong, T. (2018) ‘Clustering and statistical analyses of air-conditioning in-
tensity and use patterns in residential buildings’, Energy and Buildings. 174, pp. 214–227.
doi: 10.1016/J.ENBUILD.2018.06.035.
Andersen, F. M. et al. (2019) ‘Long-term projections of the hourly electricity consumption in
Danish municipalities’, Energy. 186, p. 115890.
doi: 10.1016/J.ENERGY.2019.115890.
Atalla, T. N. and Hunt, L. C. (2016) ‘Modelling residential electricity demand in the GCC
countries’, Energy Economics, 59, pp. 149–158. doi: 10.1016/j.eneco.2016.07.027.
Beckel, C. et al. (2016) ‘Automated customer segmentation based on smart meter data with
temperature and daylight sensitivity’, 2015 IEEE International Conference on Smart Grid
Communications, SmartGridComm 2015, pp. 653–658.
doi: 10.1109/SmartGridComm.2015.7436375.
Blázquez, L., Boogen, N. and Filippini, M. (2013) ‘Residential electricity demand in Spain:
New empirical evidence using aggregate data’, Energy Economics. 36, pp. 648–657. doi:
10.1016/J.ENECO.2012.11.010.
Bloomberg. 2019. China Best-Performing Cities Ranked: Chengdu Overtakes Shenzhen –
Bloomberg News. 23 October 2019.
Access from: https://www.bloomberg.com/news/articles/2019-10-23/chengdu-overtakes-shenzhen-as-china-s-best-performing-city
Chaturvedi, D. K., Sinha, A. P. and Malik, O. P. (2015) ‘Short term load forecast using fuzzy