Lecture 3
Inverse modelling
Building Energy Management and Optimization
CIVE 5708
$y = a\sin x + bx$ (non-linear in the model, i.e., in x, but linear in the parameters a and b)
$y = a\exp(bx)$ (non-linear in both the model and the parameters)
Representation of Physics
• White-box models are based on the laws of physics
• Black-box models rely on little or no knowledge of the physical behavior
of the system and use the available data to identify the
model structure
• Gray-box models fall in between these two categories
Forward problems
• The objective is to predict the response or state
variables of a specified model with known structure
and known parameters when subject to specified
input or forcing variables.
• Response variables are also known as regressands, outputs, or dependent variables
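Least-squares estimate of the parameter vector x for the linear system Ax ≈ b (normal equations):
$x = (A^{T}A)^{-1}A^{T}b$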
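As a minimal sketch in plain Python (standard library only), the normal equations can be solved by forming AᵀA and Aᵀb and applying Gaussian elimination with partial pivoting. The helper names `solve` and `least_squares` are hypothetical, and the five (x, y) points are the ones in the residual table of this lecture; the recovered intercept and slope reproduce that table's ŷ column.

```python
def solve(M, v):
    """Solve the square system M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented matrix [M | v], copied so the inputs are not modified
    aug = [list(map(float, row)) + [float(v[i])] for i, row in enumerate(M)]
    for col in range(n):
        # Move the row with the largest |entry| in this column onto the diagonal
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    # Back substitution on the upper-triangular system
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def least_squares(A, b):
    """Return x = (A^T A)^-1 A^T b by solving the normal equations (A^T A) x = A^T b."""
    p = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(p)] for i in range(p)]
    Atb = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(p)]
    return solve(AtA, Atb)


# Data from the residual table: a column of ones (intercept) and one regressor x
x = [60, 70, 80, 85, 95]
y = [70, 65, 70, 95, 85]
A = [[1.0, xi] for xi in x]
intercept, slope = least_squares(A, y)
y_hat = [intercept + slope * xi for xi in x]
print(round(intercept, 3), round(slope, 4))  # 26.781 0.6438
print([round(v, 3) for v in y_hat])          # [65.411, 71.849, 78.288, 81.507, 87.945]
```

Forming AᵀA squares the condition number of A, so for ill-conditioned problems a QR-based solver (for example, MATLAB's backslash operator applied to a rectangular A) is usually preferred.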
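Coefficient of determination R²
$R^2 = \frac{\text{explained variation of } y}{\text{total variation of } y} = \frac{SSR}{SST}$
where SSR is the sum of squares regression, $SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$, and SST is the sum of squares total, $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$.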
For a perfect fit, R² = 1, while R² = 0 indicates either that the model is useless
or that no relationship exists.
R² is misleading when comparing models with different numbers of regressor variables,
because it does not account for the number of degrees of freedom: it cannot decrease
as additional variables are included in the model, even if these variables have very
little explanatory power.
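A short sketch computing R² = SSR/SST for the least-squares straight line through the same five points as the residual table (the function name `r_squared` is hypothetical). For a least-squares fit that includes an intercept, SSR/SST equals 1 − SSE/SST, which the last line checks.

```python
def r_squared(y, y_hat):
    """Coefficient of determination R^2 = SSR / SST."""
    y_bar = sum(y) / len(y)
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained variation of y
    sst = sum((yi - y_bar) ** 2 for yi in y)      # total variation of y
    return ssr / sst


x = [60, 70, 80, 85, 95]
y = [70, 65, 70, 95, 85]
# Closed-form least-squares straight line: y_hat = intercept + slope * x
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
sst = sum((yi - y_bar) ** 2 for yi in y)
print(round(r_squared(y, y_hat), 4), round(1 - sse / sst, 4))  # 0.4803 0.4803
```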
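Root mean squared error
$RMSE = \left(\frac{SSE}{n}\right)^{1/2}$, where SSE is the sum of squares error, $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Coefficient of variation (of the RMSE)
$CV = \frac{RMSE}{\bar{y}}$
Mean absolute error
$MAE = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$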
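The three error metrics can be sketched directly from their definitions (function names hypothetical), here applied to the observed and fitted values listed in the residual table of this lecture.

```python
import math


def rmse(y, y_hat):
    """Root mean squared error, (SSE / n)^(1/2)."""
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / len(y))


def cv_rmse(y, y_hat):
    """Coefficient of variation of the RMSE: RMSE divided by the mean of y."""
    return rmse(y, y_hat) / (sum(y) / len(y))


def mae(y, y_hat):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(yi - yh) for yi, yh in zip(y, y_hat)) / len(y)


# Observed and fitted values from the residual table
y = [70, 65, 70, 95, 85]
y_hat = [65.411, 71.849, 78.288, 81.507, 87.945]
print(round(rmse(y, y_hat), 3), round(cv_rmse(y, y_hat), 4), round(mae(y, y_hat), 3))
# 8.092 0.1051 7.233
```

Because CV divides by the mean of y, it is dimensionless, which makes it easier than RMSE to compare across buildings or loads of different magnitude.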
Ensuring model parsimony
$y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$
Which polynomial order n is better?
Source: https://dss.princeton.edu/online_help/analysis/interpreting_regression.htm
AIC: an alternative measure of parsimony
• Akaike Information Criterion: $\mathrm{AIC} = 2k - 2\ln\hat{L}$, where k is the number of
estimated parameters and $\hat{L}$ is the maximized value of the likelihood function
• A measure of the badness of a statistical model whose parameters are determined by
the method of maximum likelihood; the 2k term penalizes additional parameters
• Given a collection of models for the data, AIC estimates the
quality of each model relative to each of the other models
• For a given dataset, you can build multiple models and select
the one that gives the smallest AIC value
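For least-squares models with independent Gaussian errors, −2 ln L̂ equals n ln(SSE/n) plus a constant that is the same for every model fitted to the same n points, so models can be ranked with n ln(SSE/n) + 2k. The sketch below (plain Python; helper names hypothetical) applies this to polynomials of order 0 to 2 fitted to the five points of the residual table. With only five points it is purely illustrative.

```python
import math


def solve(M, v):
    """Solve the square system M z = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    aug = [list(map(float, row)) + [float(v[i])] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def poly_sse(x, y, order):
    """SSE of the least-squares polynomial a0 + a1*u + ... + an*u^n, with u = x - mean(x)."""
    x_bar = sum(x) / len(x)
    # Centring x keeps the normal equations better conditioned
    A = [[(xi - x_bar) ** k for k in range(order + 1)] for xi in x]
    p = order + 1
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(p)] for i in range(p)]
    Aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(p)]
    coef = solve(AtA, Aty)
    y_hat = [sum(c * a for c, a in zip(coef, row)) for row in A]
    return sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))


def aic_ls(sse, n, k):
    """Least-squares AIC n*ln(SSE/n) + 2k (Gaussian errors; model-independent constant dropped)."""
    return n * math.log(sse / n) + 2 * k


x = [60, 70, 80, 85, 95]
y = [70, 65, 70, 95, 85]
aic = {order: aic_ls(poly_sse(x, y, order), len(y), order + 1) for order in range(3)}
best_order = min(aic, key=aic.get)
print(best_order)  # 1
```

Here k counts only the polynomial coefficients; counting the error variance as an extra parameter adds 2 to every model and leaves the ranking unchanged. The quadratic has the smallest SSE, yet the straight line has the smallest AIC, which is the parsimony trade-off in action.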
Multicollinearity: a situation where predictors in a multiple regression problem are highly linearly related
• Let’s assume that we have three input candidates
to predict the frequency of “too hot” complaints:
i. Outdoor temperature
ii. Indoor temperature
iii. Relative humidity
• Each variable seems to exhibit some correlation with
the response variable
• But they all seem to correlate with each other as
well: e.g., indoor temperature is affected by the
outdoor temperature; relative humidity is affected
by the indoor and outdoor temperatures
• Do you need all three variables to build the model,
or only a subset? If only a subset, which ones?
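A quick first check for multicollinearity is the pairwise Pearson correlation between the candidate inputs. The sketch below flags pairs whose |r| exceeds a threshold; the data are invented for illustration only, and the 0.8 threshold is an assumed rule of thumb, not a value from this lecture.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


def collinear_pairs(columns, threshold=0.8):
    """Pairs of candidate inputs whose |r| exceeds the (assumed) threshold."""
    return [(p, q, round(pearson(columns[p], columns[q]), 3))
            for p, q in combinations(columns, 2)
            if abs(pearson(columns[p], columns[q])) > threshold]


# Hypothetical observations, invented for illustration only
candidates = {
    "Tout": [-5, 0, 5, 10, 15, 20],
    "Tin": [20, 20.5, 21, 21.5, 22, 22.5],  # rises linearly with Tout in this toy data
    "RH": [30, 45, 35, 50, 40, 55],
}
print(collinear_pairs(candidates))  # [('Tout', 'Tin', 1.0)]
```

Pairwise correlations only catch two-variable relationships; an input that is a linear combination of two or more others may not show up in any single pair.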
Exhaustive search approach
Fit a model for every possible combination of the input candidates and select the best one
a) Tin
b) Tout
c) RH
d) Tin + Tout
e) Tin + RH
f) Tout + RH
g) Tin + Tout + RH
Exhaustive search works well with a few input candidates, but not so well with many input candidates
Number of models for p input candidates: $\sum_{i=1}^{p} \binom{p}{i} = 2^{p} - 1$
p     Number of models
3     7
4     15
6     63
10    1023
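The candidate models of an exhaustive search are all non-empty subsets of the inputs, which `itertools.combinations` can enumerate. This sketch (function name hypothetical) only lists the subsets; in practice each one would be fitted and scored, e.g. by AIC. It reproduces the list a) to g) and the model counts in the table.

```python
from itertools import combinations


def all_subsets(candidates):
    """Every non-empty subset of the candidate inputs, i.e. every model an exhaustive search fits."""
    cands = list(candidates)
    return [combo for r in range(1, len(cands) + 1) for combo in combinations(cands, r)]


for model in all_subsets(["Tin", "Tout", "RH"]):
    print(" + ".join(model))  # Tin, Tout, RH, Tin + Tout, Tin + RH, Tout + RH, Tin + Tout + RH

for p in (3, 4, 6, 10):
    print(p, len(all_subsets(range(p))))  # 7, 15, 63, 1023
```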
Forward stepwise regression
Fit a model with each input candidate individually, then keep adding one new input
to the best model from the previous round.
Round 1: a) Tin   b) Tout   c) RH
Round 2: a) Tin + Tout   b) Tin + RH
Round 3: a) Tin + Tout + RH
Select the best among the top models of all rounds. That’s your selected model!
Number of models for p input candidates: $p + (p-1) + \dots + 1 = \frac{p(p+1)}{2}$
p     Number of models
3     6
4     10
6     21
10    55
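A sketch of forward stepwise selection with a pluggable criterion (lower is better, e.g. AIC). The scores in `TOY_AIC` are made-up numbers chosen to mirror the rounds above, not results from any dataset; the function name is hypothetical. The second return value counts the fitted models, p(p + 1)/2 when every round is run.

```python
def forward_stepwise(candidates, score):
    """Forward stepwise selection.

    score(model) returns a criterion for a tuple of inputs (lower is better).
    Returns ((best_score, best_model), number_of_models_fitted).
    """
    selected, remaining = [], list(candidates)
    round_tops, n_fitted = [], 0
    while remaining:
        # Add each remaining candidate, one at a time, to the current best model
        trials = [(score(tuple(selected + [c])), tuple(selected + [c])) for c in remaining]
        n_fitted += len(trials)
        top_score, top_model = min(trials)
        round_tops.append((top_score, top_model))
        selected = list(top_model)
        remaining.remove(top_model[-1])
    # The selected model is the best among the top models of all rounds
    return min(round_tops), n_fitted


# Hypothetical AIC values, made up for illustration only
TOY_AIC = {
    frozenset({"Tin"}): 8.0, frozenset({"Tout"}): 10.0, frozenset({"RH"}): 12.0,
    frozenset({"Tin", "Tout"}): 6.0, frozenset({"Tin", "RH"}): 7.0,
    frozenset({"Tout", "RH"}): 9.0, frozenset({"Tin", "Tout", "RH"}): 6.5,
}
best, n_fitted = forward_stepwise(["Tin", "Tout", "RH"], lambda m: TOY_AIC[frozenset(m)])
print(best, n_fitted)  # (6.0, ('Tin', 'Tout')) 6
```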
Backward stepwise regression
• Start with all input candidates, then remove the worst input one at a time
Round 1: a) Tin + Tout + RH
Round 2: a) Tin + Tout   b) Tin + RH   c) Tout + RH
Round 3: a) Tin   b) Tout
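The backward variant can be sketched the same way: start from the full model, drop each input in turn, keep the best reduced model, and repeat. As before, `TOY_AIC` holds made-up scores (not results) and the function name is hypothetical; picking the best of the round tops mirrors the forward procedure and is an assumption here.

```python
def backward_stepwise(candidates, score):
    """Backward stepwise elimination; score(model) is a criterion to minimise (e.g. AIC).

    Returns ((best_score, best_model), number_of_models_fitted).
    """
    current = tuple(candidates)
    round_tops = [(score(current), current)]  # round 1: all candidates
    n_fitted = 1
    while len(current) > 1:
        trials = []
        for c in current:
            # Drop input c from the current model
            reduced = tuple(v for v in current if v != c)
            trials.append((score(reduced), reduced))
        n_fitted += len(trials)
        top = min(trials)
        round_tops.append(top)
        current = top[1]
    return min(round_tops), n_fitted


# Hypothetical AIC values, made up for illustration only
TOY_AIC = {
    frozenset({"Tin"}): 8.0, frozenset({"Tout"}): 10.0, frozenset({"RH"}): 12.0,
    frozenset({"Tin", "Tout"}): 6.0, frozenset({"Tin", "RH"}): 7.0,
    frozenset({"Tout", "RH"}): 9.0, frozenset({"Tin", "Tout", "RH"}): 6.5,
}
best, n_fitted = backward_stepwise(["Tin", "Tout", "RH"], lambda m: TOY_AIC[frozenset(m)])
print(best, n_fitted)  # (6.0, ('Tin', 'Tout')) 6
```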
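Unbiased residuals
x     y     ŷ         e = y - ŷ
60    70    65.411     4.589
70    65    71.849    -6.849
80    70    78.288    -8.288
85    95    81.507    13.493
95    85    87.945    -2.945
ŷ is the least-squares straight-line fit to the five (x, y) points; the residuals sum to zero.
Side note: The standard deviation of the residuals is equal to the RMSE (the residuals have zero mean, and the standard deviation is computed with n in the denominator, as in the RMSE)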
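A quick numerical check of the side note using the table values: the residual mean is (numerically) zero, so the population standard deviation (`statistics.pstdev`, which divides by n) coincides with the RMSE.

```python
import math
import statistics

y = [70, 65, 70, 95, 85]
y_hat = [65.411, 71.849, 78.288, 81.507, 87.945]  # fitted values from the table
e = [yi - yh for yi, yh in zip(y, y_hat)]         # residuals e = y - y_hat

mean_e = sum(e) / len(e)                             # ~0: unbiased residuals
rmse = math.sqrt(sum(ei ** 2 for ei in e) / len(e))  # (SSE / n)^(1/2)
sd_e = statistics.pstdev(e)                          # standard deviation with n in the denominator
print(round(rmse, 3), round(sd_e, 3))  # 8.092 8.092
```

Using `statistics.stdev` instead (divisor n − 1) would give a slightly larger value, so the equality holds only for the divide-by-n convention.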
When dealing with time series…
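[num,txt,raw] = xlsread('SpaceHeatingCoolingLoads.xlsx','Cooling Load');
time = datenum(txt(2:end,1)); % timestamps from the first text column (header row skipped)
% tOut, sWind, qSol and qClg must be extracted from num at this point;
% the column layout of the workbook is not shown in this excerpt.
%% 1. forward stepwise regression, ranked by AIC (lower is better)
% 1.1. first round: one input at a time
% (assumed: these lines are missing from the excerpt; the order of mdl_1 to mdl_3 is illustrative)
mdl_1 = fitlm(tOut,qClg);
mdl_2 = fitlm(sWind,qClg);
mdl_3 = fitlm(qSol,qClg);
% ind = index of the model with the smallest AIC so far
[~,ind] = min([mdl_1.ModelCriterion.AIC,...
    mdl_2.ModelCriterion.AIC,...
    mdl_3.ModelCriterion.AIC]);
% 1.2. second round: add each remaining input to the best first-round input (tOut)
mdl_4 = fitlm([tOut,sWind],qClg);
mdl_5 = fitlm([tOut,qSol],qClg);
[~,ind] = min([mdl_1.ModelCriterion.AIC,...
    mdl_2.ModelCriterion.AIC,...
    mdl_3.ModelCriterion.AIC,...
    mdl_4.ModelCriterion.AIC,...
    mdl_5.ModelCriterion.AIC]);
% 1.3. third round: all three inputs
mdl_6 = fitlm([tOut,qSol,sWind],qClg);
[~,ind] = min([mdl_1.ModelCriterion.AIC,...
    mdl_2.ModelCriterion.AIC,...
    mdl_3.ModelCriterion.AIC,...
    mdl_4.ModelCriterion.AIC,...
    mdl_5.ModelCriterion.AIC,...
    mdl_6.ModelCriterion.AIC]);
%% 2. model assessment
mdl_6.Coefficients % estimates, standard errors, t-statistics and p-values
histogram(mdl_6.Residuals.Raw,'Normalization','pdf') % residual distribution
xlabel('Residuals (W/m^{2})')
ylabel('Probability density') % 'pdf' normalization plots a density, not counts
close all
autocorr(mdl_6.Residuals.Raw,24*7) % lags up to one week, assuming hourly data
close all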
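For time series, the residuals of a regression model should also be checked for autocorrelation, since strongly autocorrelated residuals suggest the model misses time-dependent structure. The sketch below implements the usual sample autocorrelation estimator (lagged sums normalised by the lag-0 sum); the function name is hypothetical, and for hourly residuals one would call it with `max_lag = 24 * 7`.

```python
def acf(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag, normalised by the lag-0 sum of squares."""
    n = len(x)
    m = sum(x) / n
    d = [xi - m for xi in x]  # deviations from the mean
    c0 = sum(di * di for di in d)
    return [sum(d[t] * d[t + k] for t in range(n - k)) / c0 for k in range(max_lag + 1)]


# Toy residual series that alternates in sign: strong negative lag-1 autocorrelation
print(acf([1, -1, 1, -1], 3))  # [1.0, -0.75, 0.5, -0.25]
```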