Lecture 3: Data-driven maintenance

Inverse modelling
Building Energy Management and Optimization
CIVE 5708

Araz Ashouri PhD


Adjunct Research Professor
Civil & Environmental Engineering
Taken from Intelligent Buildings and Big Data CABA Report
CopperTree – BEMS
Delta Controls - BAS
Ecotracker - Benchmarking
Data-driven methods to guide maintenance
• Data-driven models
  • Use data records of potential predictor variables (e.g., outdoor temperature, humidity, solar radiation)
  • To explain a variable of interest (e.g., indoor temperature, energy use intensity)
  • Greybox models – simplified physical models (e.g., RC network models)
  • Blackbox models – nonphysical models (e.g., ANNs and SVMs)
• Forward models
  • To make predictions of a system's response
  • e.g., when should I start heating to meet the setpoints at 6 pm?
• Inverse models
  • To characterize, monitor, and benchmark a system's performance
  • e.g., why does my home perform worse than my neighbour's home? Is it due to high air-leakage rates?
Agenda
Conceptual background
• Mathematical models
• Types of problems in mathematical modelling
• Types of applied data analysis and modeling methods
• Multiple linear regression models
Applied theory
Matlab example
Conceptual background
Mathematical models
• Categorical data (categories)
• male/female, pass/fail
• Ordinal data (rank)
• hot/mild/cold day
• Count
• number of items in certain classes
• Metric
• measurements of time, weight, height

Dimensionality of data:
• univariate: y = f(x1)
• bi-variate: y = f(x1, x2)
• multivariate: y = f(x1, x2, … , xk)
Conceptual background
Mathematical models
Discrete vs. continuous value
Conceptual background
Mathematical models
• A system is the object under study which could be as simple
or as complex as one may wish to consider. It is any ordered,
inter‐related set of things, and their attributes.
• A signal is the fundamental quantity for representing some
information. It could be a description of a parameter, or an
input to or output from a system.
• A model is a construct which allows one to represent the
real‐life system or signal. A system model can be used to
predict the behavior of the system under various inputs or
scenarios. A signal model can be used to predict values of
the signal at an arbitrary point in time or space.
• input variables (x) are either controllable by the
experimenter, uncontrollable, or exogenous such as climatic
variables;
• system structure f(x) and parameters/properties provide
the necessary physical description of the systems in terms
of physical and material constants; for example, thermal
mass, overall heat transfer coefficients, mechanical
properties of the elements;
• output variables (y) describe response of the system to the
input variables.
Many Types of Mathematical Models Available to Define a System
[Figure: block diagram of a heat transfer model of a thermal zone; the diagram also carries a "Feedback" label]
• Controlled inputs: heat input from radiator and VAV terminal unit
• Uncontrolled inputs/disturbances: heat gains from the Sun, people, lighting, electric equipment
• Output: indoor temperature response
Conceptual background
Mathematical models

• Distributed vs lumped parameter


• Dynamic vs static or steady-state
• Deterministic vs stochastic
• Continuous vs discrete
• Linear vs non-linear in the functional model
• Linear vs non-linear in the model parameters
• Time invariant vs time variant
• Physics based (white box) vs data based (black box) and
mix of both (grey box)
• Forward vs inverse
Distributed vs lumped parameter
Linear vs non-linear

Non-linear model examples:

y = a sin(x) + b x
(non-linear in the functional model, but linear in the parameters a and b)

y = a exp(b x)
(non-linear in both the functional model and the parameters)
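Because y = a sin(x) + b x is linear in a and b, ordinary least squares still applies even though the model is non-linear in x. The sketch below illustrates this with made-up, noise-free data; the values a = 2 and b = 0.5 are assumptions chosen for illustration only.

```matlab
% Illustrative data generated from assumed parameters a = 2, b = 0.5
x = (0:0.5:10)';
y = 2*sin(x) + 0.5*x;

% Each column of the regressor matrix is a known function of x,
% so the unknowns a and b enter the problem linearly
A = [sin(x), x];
theta = A \ y;          % least-squares estimate of [a; b]
a = theta(1);
b = theta(2);
fprintf('a = %.3f, b = %.3f\n', a, b);
```

By contrast, y = a exp(b x) cannot be written as a matrix times a parameter vector, so it would need a non-linear solver or a transformation such as fitting log(y) = log(a) + b x, which is only valid when y > 0.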
Representation of Physics
• White-box models are based on the laws of physics
• Black-box models use little or no knowledge of the physical
behavior of the system and rely on the available data to identify
the model structure
• Gray-box models fall in-between the two above categories
Forward problems
• The objective is to predict the response or state
variables of a specified model with known structure
and known parameters when subject to specified
input or forcing variables.

Unique solutions: Well‐defined or ‐specified problems

• This is the type of model that is implicitly studied in
classical mathematics and also in system simulation design
courses.
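As a hedged illustration of a forward problem, the sketch below simulates a single thermal zone with one resistance and one capacitance (a 1R1C network). The model structure, the explicit Euler discretisation, and every parameter value (R, C, heat input, outdoor temperature) are assumptions chosen for illustration; none of them come from the lecture.

```matlab
% Known (assumed) parameters of a 1R1C zone model
R  = 0.005;      % thermal resistance to outdoors (K/W)
C  = 2e7;        % thermal capacitance of the zone (J/K)
dt = 3600;       % time step (s)
n  = 48;         % number of hourly steps

tOut = 5*ones(n,1);          % specified outdoor temperature (degC)
qHtg = 3000*ones(n,1);       % specified heat input (W)

% Forward simulation of C*dT/dt = (tOut - T)/R + qHtg with explicit Euler
tIn = zeros(n+1,1);
tIn(1) = 18;                 % initial indoor temperature (degC)
for k = 1:n
    tIn(k+1) = tIn(k) + dt/C*((tOut(k) - tIn(k))/R + qHtg(k));
end

% Steady-state temperature that the response should approach
tSteady = tOut(1) + R*qHtg(1);
fprintf('T after %d h: %.2f degC (steady state %.2f degC)\n', n, tIn(end), tSteady);
```

With the structure and parameters fixed, the response is uniquely determined by the inputs, which is what makes the problem well-defined.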
Inverse problems

• Inverse problems involve identification of model structure
and/or estimates of model parameters where the system under
study already exists, and one uses measured or observed inputs
and outputs to aid in the model building.
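A minimal sketch of an inverse problem, under stated assumptions: synthetic "measurements" are first generated from an assumed 1R1C zone model (R = 0.005 K/W, C = 2e7 J/K, explicit Euler, hourly steps), and the parameters are then recovered from the input and output records alone. The model, the data, and the variable names are hypothetical and serve only to illustrate the idea.

```matlab
% Synthetic "measurements" from an assumed 1R1C model
dt = 3600; n = 200; k = (1:n)';
tOut = 5 + 5*sin(2*pi*k/24);             % outdoor temperature (degC)
qHtg = 3000*(mod(k,24) < 12);            % heater on for half of each day (W)
tIn = zeros(n+1,1); tIn(1) = 18;
for i = 1:n
    tIn(i+1) = tIn(i) + dt/2e7*((tOut(i) - tIn(i))/0.005 + qHtg(i));
end

% Inverse step: regress the temperature change on the two driving terms
%   tIn(i+1) - tIn(i) = th1*(tOut(i) - tIn(i)) + th2*qHtg(i)
% where th1 = dt/(R*C) and th2 = dt/C
A = [tOut - tIn(1:n), qHtg];
b = diff(tIn);
th = A \ b;

Cest = dt/th(2);         % estimated capacitance (J/K)
Rest = th(2)/th(1);      % estimated resistance (K/W)
fprintf('R = %.4f K/W, C = %.3g J/K\n', Rest, Cest);
```

Because the data here are noise-free, the estimates match the assumed values; with real measurements the same regression would return estimates with uncertainty.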
Inverse problems
• Two types of inverse problem which both require some
sort of identification or estimation:
1. Calibrated forward models where one uses a mechanistic
model originally developed for the purpose of system
simulation and modifies or “tunes” the numerous model
parameters so that model predictions match observed system
behavior as closely as possible.

Calibrated BPS Model

Reddy, T. Agami, Itzhak Maor, and Chanin Panjapornpon. "Calibrating


detailed building energy simulation programs with measured data—
Part I: General methodology (RP-1051)." HVAC&R Research 13.2
(2007): 221-241.
2. Model selection and parameter estimation
where a suite of plausible model structures is formulated
from basic scientific and engineering principles involving
known influential and physically relevant regressors, and
experiments are performed (or system performance data are
identified) that allow these competing models to be evaluated
and the “best” model identified.
Applied theory
Multiple linear regression models

y: also known as regressand, output, and dependent variable

x: also known as predictor, feature, input, and independent variable


Example for the least squares method:
You took three heating load measurements of 6 MW, 0 MW,
and 0 MW at 0 °C, 10 °C, and 20°C, respectively. Find the
best fit line through these data points.
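Least-squares solution of the overdetermined system:

\[ A x = b \quad\Rightarrow\quad x = (A^T A)^{-1} A^T b \]

[Figure: heating load (MW) versus outdoor temperature]

Writing the line as a_1 + a_2 x = b, the three measurements give:

\[ a_1 + a_2 \cdot 0\,^{\circ}\mathrm{C} = 6 \]
\[ a_1 + a_2 \cdot 10\,^{\circ}\mathrm{C} = 0 \]
\[ a_1 + a_2 \cdot 20\,^{\circ}\mathrm{C} = 0 \]

\[ A = \begin{bmatrix} 1 & 0 \\ 1 & 10 \\ 1 & 20 \end{bmatrix}, \quad b = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} \]

\[ x = (A^T A)^{-1} A^T b = \left( \begin{bmatrix} 1 & 1 & 1 \\ 0 & 10 & 20 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 1 & 10 \\ 1 & 20 \end{bmatrix} \right)^{-1} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 10 & 20 \end{bmatrix} \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ -0.3 \end{bmatrix} \]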
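The worked example above can be checked numerically. The sketch below solves the same three-point problem twice: once with the normal equations as written on the slide, and once with the backslash operator, which solves the same least-squares problem without forming the inverse explicitly. The variable names are illustrative.

```matlab
A = [1 0; 1 10; 1 20];   % column of ones (intercept) and outdoor temperature (degC)
b = [6; 0; 0];           % measured heating loads (MW)

xNormal = (A'*A) \ (A'*b);   % normal equations, as on the slide
xQR     = A \ b;             % backslash solves the same least-squares problem

disp(xNormal')               % intercept and slope: 5 and -0.3
```

The best-fit line is therefore a heating load of 5 MW at 0 °C that drops by 0.3 MW per °C of outdoor temperature.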
Error metrics
Applied theory
Multiple linear regression models
Model evaluation

Coefficient of determination R²

\[ R^2 = \frac{\text{explained variation of } y}{\text{total variation of } y} = \frac{SSR}{SST} \]

where SSR is the Sum of Squares Regression and SST is the Sum of Squares Total.

For a perfect fit, R² = 1, while R² = 0 indicates that either the model is useless
or that no relationship exists.
R² is a misleading statistic if models with different numbers of regressor variables
are to be compared. Because R² does not account for the number of degrees of
freedom, it can only increase (or stay the same) as additional variables are
included in the model, even if these variables have very little explanatory power.
Root mean squared error

\[ RMSE = \left( \frac{SSE}{n} \right)^{1/2} \]

where SSE is the Sum of Squares Error.

The RMSE is an absolute measure and its range is 0 ≤ RMSE ≤ ∞.
Its units are the same as those of the y variable. It is also referred
to as “standard error of the estimate”.
Note that:

SST = SSR + SSE


Coefficient of Variation of the Root Mean Square Error

\[ CV = \frac{RMSE}{\bar{y}} \]

Mean Absolute Error

\[ MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \]
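To make the error metrics concrete, the sketch below evaluates them for the three heating load measurements from the least-squares example (6, 0, and 0 MW at 0, 10, and 20 °C). MAE is computed with its standard definition, the mean of the absolute residuals; the variable names are illustrative.

```matlab
A = [1 0; 1 10; 1 20];
y = [6; 0; 0];
yHat = A*(A\y);                  % fitted values: 5, 2, -1
n = numel(y);

SSE = sum((y - yHat).^2);        % sum of squares error
SSR = sum((yHat - mean(y)).^2);  % sum of squares regression
SST = sum((y - mean(y)).^2);     % sum of squares total

R2   = SSR/SST;                  % coefficient of determination
RMSE = sqrt(SSE/n);              % root mean squared error (MW)
CV   = RMSE/mean(y);             % coefficient of variation of the RMSE
MAE  = mean(abs(y - yHat));      % mean absolute error (MW)
fprintf('R2 = %.3f, RMSE = %.3f MW, CV = %.3f, MAE = %.3f MW\n', R2, RMSE, CV, MAE);
```

For this example SSE = 6, SSR = 18, and SST = 24, so SST = SSR + SSE holds and R² = 0.75.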
Assuring model parsimony*

*Explaining data with minimum number of parameters


[Figure panels: 1st degree polynomial, 2nd degree polynomial, 3rd degree polynomial, 4th degree polynomial]

y = a_0 + a_1 x + a_2 x^2 + ⋯ + a_n x^n
Which one is better?

Credit: Andrew Moore
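The question of which polynomial is better cannot be answered from the fit to the training data alone, because a higher degree never fits the same data worse. The sketch below illustrates this with made-up data (a quadratic trend plus a small deterministic disturbance, both assumptions for illustration).

```matlab
% Illustrative data: a quadratic trend plus a small deterministic disturbance
x = linspace(-1, 1, 21)';
y = 1 + 2*x - 3*x.^2 + 0.2*sin(25*x);

rmseTrain = zeros(1,4);
for d = 1:4
    coeffs = polyfit(x, y, d);           % least-squares polynomial of degree d
    rmseTrain(d) = sqrt(mean((y - polyval(coeffs, x)).^2));
end
disp(rmseTrain)                          % training RMSE for degrees 1 to 4
```

The training RMSE does not increase with the degree, so on its own it would always favour the most complex model; this is why parsimony needs a separate check.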


Cross Validation

1. Randomly choose 30% of the data to be in a test set
2. The remainder is a training set
3. Perform your regression on the training set
4. Estimate the performance of the trained model in predicting the test set

[Figures: candidate fits annotated with RMSE values (1.5 and 0.6 on one slide, 1.6 and 0.6 on the next)]

Credit: Andrew Moore
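The four cross-validation steps can be sketched as follows. The data are made up (a quadratic trend plus a small disturbance), and the polynomial candidates and variable names are assumptions for illustration. The split is random, so the exact RMSE values change from run to run.

```matlab
% Illustrative data (assumed): quadratic trend plus a small disturbance
n = 40;
x = linspace(-1, 1, n)';
y = 1 + 2*x - 3*x.^2 + 0.2*sin(25*x);

% Steps 1-2: random 30 % test set, the remainder is the training set
idx = randperm(n);
nTest = round(0.3*n);
testIdx  = idx(1:nTest);
trainIdx = idx(nTest+1:end);

% Steps 3-4: fit on the training set, score on the test set
degrees = 1:4;
rmseTest = zeros(size(degrees));
for d = degrees
    coeffs = polyfit(x(trainIdx), y(trainIdx), d);
    rmseTest(d) = sqrt(mean((y(testIdx) - polyval(coeffs, x(testIdx))).^2));
end
[~, bestDegree] = min(rmseTest);
fprintf('Lowest test RMSE at degree %d\n', bestDegree);
```

Unlike the training RMSE, the test RMSE can rise again once added terms start fitting the disturbance rather than the trend.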
• It is useful to study the significance of individual regressors on the
overall statistical fit in the presence of all other regressors. Student's
t-statistic is widely used for this purpose. Simply put, it answers the
following question: would the fit become poorer if the regressor in
question were not used in the model at all?
• Your regression software compares the t-statistic of your variable with
values in the Student's t distribution to determine the P-value. Make the
decision by looking at the P-value.
• The standard error is an estimate of the standard deviation of the coefficient
• The t-statistic is the coefficient divided by its standard error
• A P-value of 5% or less is the generally accepted threshold for treating a regressor as statistically significant

Source: https://dss.princeton.edu/online_help/analysis/interpreting_regression.htm
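A minimal sketch of how the standard error, t-statistic, and P-value could be computed by hand, using the five (x, y) pairs from the residual-analysis table that appears later in this lecture. It uses only base Matlab; the P-value is written with the regularized incomplete beta function, which is one standard way of evaluating the two-sided Student's t probability. The variable names are illustrative.

```matlab
x = [60; 70; 80; 85; 95];
y = [70; 65; 70; 95; 85];
n = numel(y);
A = [ones(n,1), x];
p = size(A,2);                      % number of estimated coefficients

coef = A \ y;                       % least-squares coefficients
res  = y - A*coef;                  % residuals
s2   = sum(res.^2)/(n - p);         % residual variance estimate
se   = sqrt(diag(s2*inv(A'*A)));    % standard errors of the coefficients
tStat = coef ./ se;                 % coefficient divided by its standard error

% Two-sided P-value from Student's t distribution with n-p degrees of freedom
dof  = n - p;
pVal = betainc(dof./(dof + tStat.^2), dof/2, 0.5);
disp([coef, se, tStat, pVal])
```

Here the slope has a t-statistic of about 1.67 with 3 degrees of freedom, so its P-value is well above 5% and five points are not enough to call the regressor significant. A model fitted with fitlm reports the same quantities in mdl.Coefficients.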
AIC– alternative measure for parsimony
• Akaike Information Criterion
• A measure of the badness of fit of a statistical model whose
parameters are determined by the method of maximum
likelihood
• Given a collection of models for the data, AIC estimates the
quality of each model relative to each of the other models
• For a given dataset, you can build multiple models and select
the one that gives the smallest AIC value

\[ AIC = n \cdot \log\left( \frac{SSE}{n} \right) + 2 \cdot p \]

In Matlab:

mdl = fitlm(x,y);
AIC = mdl.ModelCriterion.AIC
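As a hedged illustration, the sketch below evaluates the AIC expression AIC = n·log(SSE/n) + 2·p directly for polynomial models of increasing degree. The data are made up, and counting the intercept in p is an assumption. The value reported by fitlm is computed from the log-likelihood, so it is expected to differ from this expression by a constant for a fixed dataset; that should leave the ranking of models unchanged, but it is worth verifying on your own data.

```matlab
% Illustrative data (assumed): quadratic trend plus a small disturbance
x = linspace(-1, 1, 30)';
y = 1 + 2*x - 3*x.^2 + 0.2*sin(25*x);
n = numel(y);

aic = zeros(1,5);
for d = 1:5
    coeffs = polyfit(x, y, d);
    SSE = sum((y - polyval(coeffs, x)).^2);
    p = d + 1;                          % fitted parameters, intercept included
    aic(d) = n*log(SSE/n) + 2*p;        % AIC expression from the slide
end
[~, bestDegree] = min(aic);
disp(aic)
```

The 2·p term penalises each extra parameter, so a higher degree is only selected when it reduces SSE enough to pay for itself.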
Handling multiple predictors
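[Figure: predictors mapped to a response variable]
• Predictors, e.g., outdoor temperature, solar radiation, occupancy, wind speed → response, e.g., heating energy use
• Predictors, e.g., indoor temperature, outdoor temperature, relative humidity → response, e.g., frequency at which complaints are generated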
Multi-collinearity*

*Situation where predictors in a multiple regression problem are highly linearly related
• Let’s assume that we have three input candidates
to predict the frequency of “too hot” complaints:
i. Outdoor temperature
ii. Indoor temperature
iii. Relative humidity
• Each variable seems to exhibit some correlation to
the response variable
• But, they all seem to correlate with each other as
well: e.g., indoor temperature is affected by the
outdoor temperature; relative humidity is affected
by the indoor and outdoor temperatures
• Do you need all three variables to build the model
or only a subset? If so, which ones?
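One simple way to see multi-collinearity is to look at the pairwise correlations between the predictors before fitting anything. The sketch below does this with corrcoef on synthetic hourly data; the signals and their coefficients are assumptions built so that indoor temperature follows outdoor temperature and relative humidity follows indoor temperature.

```matlab
k = (1:168)';                                       % one week of hourly samples
tOut = 25 + 6*sin(2*pi*k/24);                       % outdoor temperature (degC)
tIn  = 23 + 0.3*(tOut - 25) + 0.2*cos(2*pi*k/7);    % indoor temperature tracks outdoor
rh   = 50 - 1.5*(tIn - 23) + sin(2*pi*k/50);        % relative humidity (%)

X = [tOut, tIn, rh];
Rxx = corrcoef(X);          % pairwise correlation between the predictors
disp(Rxx)
```

Off-diagonal entries close to +1 or -1 indicate that two predictors carry largely the same information, which is the situation that motivates selecting a subset of them.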
Exhaustive search approach
Fit a model for every possible combination of input candidates and select the best one.
a) Tin
b) Tout
c) RH
d) Tin + Tout
e) Tin + RH
f) Tout + RH
g) Tin + Tout + RH

Exhaustive search works well with a few input candidates, but not so well with many input candidates.

Number of models for p input candidates:

\[ \sum_{i=1}^{p} \binom{p}{i} \]

p     number of models
3     7
4     15
6     63
10    1023
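The candidate models of an exhaustive search can be enumerated with bit masks, and the counts listed above follow from summing the binomial coefficients, which equals 2^p − 1. A minimal sketch, with illustrative variable names:

```matlab
names = {'Tin', 'Tout', 'RH'};
p = numel(names);

nModels = 2^p - 1;                 % equals the sum of nchoosek(p,i) for i = 1..p
for m = 1:nModels
    mask = bitget(m, 1:p) == 1;    % which candidates are in model m
    fprintf('%d: %s\n', m, strjoin(names(mask), ' + '));
end

% Check the counts for p = 3, 4, 6, and 10
pList = [3 4 6 10];
counts = arrayfun(@(q) sum(arrayfun(@(i) nchoosek(q,i), 1:q)), pList);
disp([pList; counts])
```

The count doubles with every added candidate, which is why the approach stops being practical with many inputs.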
Forward stepwise regression
Fit a model with each input candidate individually, then in each
round add one new input to the best set of inputs from the
previous round.

Round 1: a) Tin   b) Tout   c) RH
Round 2: a) Tin + Tout   b) Tin + RH
Round 3: a) Tin + Tout + RH

Select the best among the top models of each round (bolded on the original slide).
That's your selected model!

p     number of models
3     6
4     10
6     21
10    55
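A minimal sketch of forward stepwise regression in base Matlab, under stated assumptions: the three candidate inputs and the response are synthetic, the response depends only on the first and third candidates, and models are ranked with the expression AIC = n·log(SSE/n) + 2·p. All names are illustrative.

```matlab
% Illustrative data (assumed): y depends on the 1st and 3rd candidates
n = 200; t = (1:n)';
X = [sin(t/7), cos(t/11), mod(t,5)/5];
y = 2 + 3*X(:,1) - 1.5*X(:,3) + 0.1*sin(1.3*t);

% AIC of a linear model with an intercept and the columns listed in cols
design = @(cols) [ones(n,1), X(:,cols)];
sse    = @(cols) sum((y - design(cols)*(design(cols)\y)).^2);
aicOf  = @(cols) n*log(sse(cols)/n) + 2*(numel(cols) + 1);

selected = []; remaining = 1:size(X,2);
roundBest = {}; roundAIC = []; nFits = 0;
while ~isempty(remaining)
    % try adding each remaining candidate to the inputs selected so far
    aics = arrayfun(@(c) aicOf([selected, c]), remaining);
    nFits = nFits + numel(remaining);
    [bestA, j] = min(aics);                % top model of this round
    selected = [selected, remaining(j)];
    remaining(j) = [];
    roundBest{end+1} = selected;
    roundAIC(end+1) = bestA;
end

[~, r] = min(roundAIC);                    % best of the round winners
finalModel = roundBest{r};
fprintf('Fitted %d models; selected inputs: %s\n', nFits, mat2str(finalModel));
```

With three candidates the loop fits 3 + 2 + 1 = 6 models, matching the p = 3 row of the table.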
Backward stepwise regression
• Start with all input candidates in the model and remove the
worst input one at a time.

Round 1: a) Tin + Tout + RH
Round 2: a) Tin + Tout   b) Tin + RH   c) Tout + RH
Round 3: a) Tin   b) Tout

Select the best among the top models of each round (bolded on the original slide).
That's your selected model!
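Backward stepwise regression can be sketched in the same spirit. The synthetic data, the AIC expression n·log(SSE/n) + 2·p used for ranking, and all variable names are assumptions for illustration.

```matlab
% Illustrative data (assumed): y depends on the 1st and 3rd candidates
n = 200; t = (1:n)';
X = [sin(t/7), cos(t/11), mod(t,5)/5];
y = 2 + 3*X(:,1) - 1.5*X(:,3) + 0.1*sin(1.3*t);

% AIC of a linear model with an intercept and the columns listed in cols
design = @(cols) [ones(n,1), X(:,cols)];
sse    = @(cols) sum((y - design(cols)*(design(cols)\y)).^2);
aicOf  = @(cols) n*log(sse(cols)/n) + 2*(numel(cols) + 1);

current = 1:size(X,2);                 % round 1: all candidates
roundBest = {current};
roundAIC = aicOf(current);
while numel(current) > 1
    % try removing each remaining input in turn
    aics = arrayfun(@(j) aicOf(current([1:j-1, j+1:end])), 1:numel(current));
    [bestA, j] = min(aics);            % removal that hurts the fit the least
    current(j) = [];
    roundBest{end+1} = current;
    roundAIC(end+1) = bestA;
end

[~, r] = min(roundAIC);                % best of the round winners
finalModel = roundBest{r};
fprintf('Selected inputs: %s\n', mat2str(finalModel));
```

In this example the second candidate is removed first because dropping it costs the least, leaving the first and third candidates in the round-2 winner.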
Assuring model appropriateness
Residual analysis
• The difference between the observed value of the
dependent variable (y) and the predicted value (ŷ) is called
the residual (e). Each data point has one residual.

x y ŷ e
60 70 65.411 4.589
70 65 71.849 -6.849
80 70 78.288 -8.288
85 95 81.507 13.493
95 85 87.945 -2.945
Unbiased residuals

Side note: The standard deviation of the residuals is equal to the RMSE
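The residual table above can be reproduced with a straight-line least-squares fit to the five (x, y) pairs. The sketch also checks the side note; the equality holds when the standard deviation is normalised by n, i.e. std(e,1), and the residuals have zero mean.

```matlab
x = [60; 70; 80; 85; 95];
y = [70; 65; 70; 95; 85];
A = [ones(size(x)), x];
yHat = A*(A\y);                       % predicted values
e = y - yHat;                         % residuals, one per data point

disp([x, y, yHat, e])                 % columns: x, y, predicted y, residual
rmse = sqrt(mean(e.^2));
fprintf('mean(e) = %.2e, std(e,1) = %.3f, RMSE = %.3f\n', mean(e), std(e,1), rmse);
```

A mean residual of essentially zero is what "unbiased" means here: the model does not systematically over- or under-predict.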
When dealing with time series…

The residuals should be uncorrelated with their past.

When you shift the residuals by a certain lag, the resultant
autocorrelation should be near zero.

Read about the Matlab function autocorr(y, numLags)
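For readers without access to autocorr, the sample autocorrelation can be sketched in base Matlab. The residual series below is made up: it contains a leftover daily cycle in hourly data, which is exactly the kind of structure the check is meant to reveal.

```matlab
% Illustrative residuals (assumed): a leftover daily cycle in hourly data
n = 240; t = (1:n)';
e = sin(2*pi*t/24);

% Sample autocorrelation at lag k, normalised by the lag-0 value
e0 = e - mean(e);
acf = @(k) sum(e0(1:n-k).*e0(1+k:n)) / sum(e0.^2);

lags = 0:48;
r = arrayfun(acf, lags);
fprintf('r(12) = %.2f, r(24) = %.2f\n', r(13), r(25));
```

Here the autocorrelation is -0.95 at a lag of 12 hours and 0.90 at a lag of 24 hours, far from zero, which signals that the model has missed a daily pattern.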
Ensure that your model works at all regions of the solution space (when it is
cold / hot, windy / still, sunny / overcast, etc.)
Residuals should be unbiased and relatively similar at different parts of the
solution space
Summary
• Least squares and multiple linear regression models
• Error metrics
  • RMSE, MAE, R2, CV, p-value, t-statistic
• Model selection
  • Cross validation
  • AIC
• Model development
  • Multi-collinearity
  • Forward stepwise regression
  • Backward stepwise regression
  • Exhaustive search
• Model appropriateness
  • Residual analysis
Matlab Example
clc; clear; close all;

% read the 'Cooling Load' sheet (numeric data, text, and raw cells)
[num,txt,raw] = xlsread('SpaceHeatingCoolingLoads.xlsx','Cooling Load');

time = datenum(txt(2:end,1)); % timestamps as serial date numbers

tOut = num(:,1); % outdoor temperature (degC)
sWind = num(:,2); % wind speed (m/s)
qSol = num(:,3); % solar irradiance (W/m2)
qClg = num(:,end); % cooling energy use intensity (W/m2)
%% 1. Forward stepwise regression

% 1.1. first round: one model per input candidate
mdl_1 = fitlm(tOut,qClg);
mdl_2 = fitlm(sWind,qClg);
mdl_3 = fitlm(qSol,qClg);

% plot models for visual inspection
plot(time,qClg,'k'); % measured
hold on
plot(time,mdl_1.Fitted,'b');
plot(time,mdl_2.Fitted,'r');
plot(time,mdl_3.Fitted,'g');
dateFormat = 3; % https://www.mathworks.com/help/matlab/ref/datetick.html#btpmlwj-1-dateFormat
datetick('x',dateFormat,'keepticks')
close all % close the figure after inspection

% index of the model with the smallest AIC
[~,ind] = min([mdl_1.ModelCriterion.AIC,...
    mdl_2.ModelCriterion.AIC,...
    mdl_3.ModelCriterion.AIC]);
% 1.2. second round: add one more input to the first-round winner
% (tOut is hard-coded here; check ind from the first round)
mdl_4 = fitlm([tOut,sWind],qClg);
mdl_5 = fitlm([tOut,qSol],qClg);

% plot models for visual inspection
plot(time,qClg,'k'); % measured
hold on
plot(time,mdl_4.Fitted,'b');
plot(time,mdl_5.Fitted,'r');
dateFormat = 3; % https://www.mathworks.com/help/matlab/ref/datetick.html#btpmlwj-1-dateFormat
datetick('x',dateFormat,'keepticks')
close all % close the figure after inspection

% index of the model with the smallest AIC so far
[~,ind] = min([mdl_1.ModelCriterion.AIC,...
    mdl_2.ModelCriterion.AIC,...
    mdl_3.ModelCriterion.AIC,...
    mdl_4.ModelCriterion.AIC,...
    mdl_5.ModelCriterion.AIC]);
% 1.3. third round: all three input candidates
mdl_6 = fitlm([tOut,qSol,sWind],qClg);

% plot models for visual inspection
plot(time,qClg,'k'); % measured
hold on
plot(time,mdl_6.Fitted,'b');
dateFormat = 3; % https://www.mathworks.com/help/matlab/ref/datetick.html#btpmlwj-1-dateFormat
datetick('x',dateFormat,'keepticks')
close all % close the figure after inspection

% index of the model with the smallest AIC among all six models
[~,ind] = min([mdl_1.ModelCriterion.AIC,...
    mdl_2.ModelCriterion.AIC,...
    mdl_3.ModelCriterion.AIC,...
    mdl_4.ModelCriterion.AIC,...
    mdl_5.ModelCriterion.AIC,...
    mdl_6.ModelCriterion.AIC]);
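%% 2. model assessment

% coefficient estimates with standard errors, t-statistics, and p-values
mdl_6.Coefficients

% 2.1 residual analysis

% distribution of the residuals (should be centred on zero)
histogram(mdl_6.Residuals.Raw,'Normalization','pdf')
xlabel('Residuals (W/m^{2})')
ylabel('Probability density')
close all

% autocorrelation of the residuals for lags up to 24*7 samples
autocorr(mdl_6.Residuals.Raw,24*7)
close all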
Bibliography
Kissock, J. K., Reddy, T. A., & Claridge, D. E. (1998). Ambient-temperature regression analysis
for estimating retrofit savings in commercial buildings. Journal of Solar Energy
Engineering, 120(3), 168-176.
Moore, A. W. (2001). Cross-validation for detecting and preventing overfitting. School of
Computer Science, Carnegie Mellon University.
Gunay, H. B., O'Brien, W., Beausoleil-Morrison, I., & Bursill, J. (2018). Development and
implementation of a thermostat learning algorithm. Science and Technology for the Built
Environment, 24(1), 43-56.
Gunay, H. B., Shen, W., & Yang, C. (2017). Blackbox modeling of central heating and cooling
plant equipment performance. Science and Technology for the Built Environment, 1-14.
Gunay, B., Shen, W., & Newsham, G. (2017). Inverse blackbox modeling of the heating and
cooling load in office buildings. Energy and Buildings, 142, 200-210.
