Statistical Analysis & Data Interpretation Part: Question No. 1 Part (A)
ID:0700732
Statistical Analysis & Data Interpretation Part
Question No. 1
Part (a)
Step 1: Data entry
The thermocouple readings are entered into the Excel sheet as shown. Column A is used for thermocouple TC1 and column D for TC2.
Step 2: Data analysis
From the Tools drop-down menu, select Data Analysis, then select Descriptive Statistics.
ME5534 Research Methodology and Innovation
In the Descriptive Statistics window, choose the input range for TC1 from cell A2 to cell A33 (32 readings) and select Grouped By: Columns. For the output range, select cell A35, tick Summary statistics and Confidence Level for Mean, and set the confidence level to 90%, as shown in the figure below. Repeat this for TC2 with the input range from cell D2 to D32 (31 readings) and the output range at cell D35. Press Enter to confirm each step.
The results take the following form.
(Figure: Descriptive Statistics output for TC1)
(Figure: Descriptive Statistics output for TC2)
The results are listed in the following table (values in mV).
Statistic            TC1        TC2
Mean                 0.2994     0.3315
Median               0.2998     0.3006
Mode                 0.3001     #N/A
Range                0.0109     0.3547
Standard Deviation   0.00234    0.09937
Note that the mode for TC2 is not available: Excel returns #N/A when no value in the data set occurs more than once.
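To cross-check the Descriptive Statistics entries outside Excel, the following Python sketch computes the same summary values with the standard library. The function name `excel_style_summary` and the readings are hypothetical (the measured TC1 and TC2 values are not listed in this report), and the rule for breaking ties between equally frequent values is an assumption of the sketch.

```python
import statistics

def excel_style_summary(readings):
    """Compute the main entries of a Descriptive Statistics table.

    Mode is None (shown as #N/A in Excel) when no value repeats; ties between
    equally frequent values go to the first one in data order (an assumption
    of this sketch). Standard deviation is the sample (n - 1) estimate.
    """
    counts = {}
    for r in readings:
        counts[r] = counts.get(r, 0) + 1
    top = max(counts.values())
    mode = next(r for r in readings if counts[r] == top) if top > 1 else None
    return {
        "mean": statistics.mean(readings),
        "median": statistics.median(readings),
        "mode": mode,
        "range": max(readings) - min(readings),
        "std_dev": statistics.stdev(readings),
        "count": len(readings),
    }

# Hypothetical readings in mV; the measured TC1/TC2 values are not listed in this report
sample = [0.2990, 0.3001, 0.2998, 0.3001, 0.2985]
summary = excel_style_summary(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

A data set with no repeated value, such as TC2, would give `None` for the mode, matching the #N/A entry in the table.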
Part (b)
The confidence interval depends on the mean value ($\bar{x}$) and the uncertainty ($\delta$). The uncertainty depends on the confidence level, which is 90%.
The confidence level is $1 - \alpha = 0.90$, where $\alpha$ is the level of significance, so $\alpha = 1 - 0.90 = 0.10$. Therefore, the critical value $z_c$ satisfies
$$P(z \le z_c) = 1 - \frac{\alpha}{2} = 0.95$$
Use the Excel function NORMSINV to evaluate $z_c$, i.e. =NORMSINV(0.95), as shown. The value of $z_c$ is 1.645.
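As a quick sketch, the same critical value can be reproduced with Python's standard library (3.8 or later), where `NormalDist().inv_cdf` returns the standard normal quantile, the quantity NORMSINV returns in Excel.

```python
from statistics import NormalDist

confidence = 0.90
alpha = 1 - confidence            # level of significance
p = 1 - alpha / 2                 # P(z <= z_c) for a two-sided interval
z_c = NormalDist().inv_cdf(p)     # plays the role of Excel's =NORMSINV(0.95)
print(round(z_c, 3))              # → 1.645
```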
For TC1, the standard deviation $\sigma$ is 0.00234 mV and the number of readings $n$ is 32. Therefore, the uncertainty is
$$\delta = z_c \frac{\sigma}{\sqrt{n}} = 1.645 \times \frac{0.00234}{\sqrt{32}} = 0.00068 \text{ mV}$$
The confidence interval is expressed as $\bar{x} \pm \delta$. The 90% confidence interval for TC1 is therefore $0.2994 \pm 0.00068$ mV.
For TC2, the standard deviation $\sigma$ is 0.09937 mV and the number of readings $n$ is 31. Therefore, the uncertainty is
$$\delta = z_c \frac{\sigma}{\sqrt{n}} = 1.645 \times \frac{0.09937}{\sqrt{31}} = 0.0294 \text{ mV}$$
The 90% confidence interval for TC2 is therefore $0.3315 \pm 0.0294$ mV.
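The interval calculation for both thermocouples can be sketched in Python from the summary statistics in the table of Part (a). The helper name `z_interval` is hypothetical; it follows the z-based formula $\bar{x} \pm z_c \sigma / \sqrt{n}$ used above, with the exact quantile rather than the rounded 1.645, so the last digit may differ slightly from a hand calculation.

```python
import math
from statistics import NormalDist

def z_interval(mean, std_dev, n, confidence=0.90):
    """Return (lower, upper, delta) for mean ± z_c * std_dev / sqrt(n)."""
    z_c = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    delta = z_c * std_dev / math.sqrt(n)   # standard error uses sqrt(n)
    return mean - delta, mean + delta, delta

# Summary statistics from the table in Part (a), in mV
tc1 = z_interval(0.2994, 0.00234, 32)
tc2 = z_interval(0.3315, 0.09937, 31)
print(f"TC1: {tc1[0]:.5f} to {tc1[1]:.5f} mV (delta = {tc1[2]:.5f})")
print(f"TC2: {tc2[0]:.4f} to {tc2[1]:.4f} mV (delta = {tc2[2]:.4f})")
```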
Part (c)
To construct the histogram for TC1, the bins must first be decided. The bins span 0.290 to 0.310 mV and are divided into 5 classes. In any cell (say B5), type the lower limit, 0.290. In cell B6, type the formula =B5+(0.310-0.290)/5 and press Enter, so that B6 becomes 0.294. Use AutoFill to fill cells B7 through B10; their values will be 0.298, 0.302, 0.306 and 0.310. Omit the first value in B5 and use the values in cells B6 to B10 as the bins. From the Tools menu, select Data Analysis, then Histogram.
In the Histogram window, for the Input Range, select the entire range of TC1 measurements. For the Bin Range, select the bin values in cells B6 to B10, as shown. For the output range, select cell A35 and tick Chart Output, as shown.
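The counting rule of the Histogram tool (each bin counts readings greater than the previous bin value and less than or equal to its own value, with anything above the last bin counted under "More") can be sketched in Python as follows. The name `excel_histogram` and the readings are hypothetical; the bin limits are built the same way as the B6:B10 cells.

```python
from bisect import bisect_left

def excel_histogram(values, lower, upper, classes):
    """Count values per bin: each bin holds values greater than the previous
    bin value and less than or equal to its own; the last slot is 'More'."""
    width = (upper - lower) / classes
    # Bin upper limits (e.g. 0.294 ... 0.310); the lower limit itself is omitted
    bins = [round(lower + width * (k + 1), 6) for k in range(classes)]
    counts = [0] * (classes + 1)
    for v in values:
        # bisect_left gives the first bin whose value is >= v
        counts[bisect_left(bins, v)] += 1
    return bins, counts

# Hypothetical readings in mV, not the measured data set
readings = [0.2930, 0.2962, 0.2999, 0.3001, 0.3004, 0.3050, 0.3120]
bins, counts = excel_histogram(readings, 0.290, 0.310, 5)
for label, count in zip(bins + ["More"], counts):
    print(label, count)
```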
The same procedure is repeated for the other thermocouple. The results are as follows.
The histogram results for TC1 are:
Bin (mV)   Frequency
0.294      2
0.298      3
0.302      25
0.306      2
0.310      0
More       0
(Figure: TC1 histogram; x-axis: Thermocouple Measurement (mV), y-axis: Frequency)
The histogram results for TC2, using 5 classes between 0.150 and 0.520 mV (class width 0.074 mV), are:
Bin (mV)   Frequency
0.224      5
0.298      9
0.372      8
0.446      4
0.520      5
More       0
(Figure: TC2 histogram; x-axis: Thermocouple Measurement (mV), y-axis: Frequency)
Part (d)
The method for testing whether the mean measurement of thermocouple 1 differs significantly from the mean measurement of thermocouple 2 at 98% confidence depends on the sample size. Since both samples contain more than 30 data points (32 and 31), they are large and their means can be considered normally distributed. Accordingly, the significance test can be conducted as a z-test in Excel. First, the variances must be evaluated.
The null hypothesis is that the two mean measurements are equal:
$$H_0: \mu_1 = \mu_2$$
In any empty cell, select the Insert menu, then Function, and choose VAR from the Statistical category. Select the data range as shown and press Enter to obtain the variance. Do this for both measurement sets, TC1 and TC2.
The variance for TC1 is 5.385E-06 mV² and for TC2 is 0.009875162 mV².
From the Tools menu, select Data Analysis and then z-Test: Two Sample for Means.
In the z-Test window, set the data ranges for the two samples, set the hypothesized mean difference to 0 and enter the variance of each sample. Use alpha = 0.02 (as 98% confidence is required) and set the output to any empty cell.
The results are as follows.
The absolute value of z is less than $z_c$ for both the one-tail and the two-tail tests. Since $|z| \le z_c$, the null hypothesis $H_0$ is accepted. Therefore, there is no significant difference between the means of the two thermocouple measurements at 98% confidence.
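Because the Excel output table is not reproduced here, the following Python sketch recomputes the two-sample z statistic from the rounded means in Part (a) and the VAR results, assuming zero hypothesized difference and treating the sample variances as known, as the z-Test tool does. The function name `two_sample_z` is hypothetical, and small differences from Excel's output are expected because of rounding.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, var1, n1, mean2, var2, n2, alpha=0.02):
    """Two-sample z-test for means with known variances and zero hypothesized difference."""
    z = (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)
    z_one_tail = NormalDist().inv_cdf(1 - alpha)       # critical value, one tail
    z_two_tail = NormalDist().inv_cdf(1 - alpha / 2)   # critical value, two tail
    return z, z_one_tail, z_two_tail

# Means (mV) from Part (a) and variances (mV^2) from the VAR results
z, zc1, zc2 = two_sample_z(0.2994, 5.385e-06, 32, 0.3315, 0.009875162, 31)
print(f"z = {z:.3f}, one-tail z_c = {zc1:.3f}, two-tail z_c = {zc2:.3f}")
print("H0 rejected" if abs(z) > zc2 else "H0 not rejected at alpha = 0.02")
```

With these inputs, |z| is about 1.80, below both critical values, which agrees with the conclusion above.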
Part (e)
The population mean $\mu_0$ is 0.3 mV and the standard deviation $\sigma$ is 0.01 mV. The confidence level is 95%, therefore the significance level $\alpha$ is 0.05.
For TC1:
Set the null hypothesis $H_0: \mu_1 = \mu_0$.
The mean $\bar{x}_1$ is 0.2994 mV, the standard deviation $\sigma_1$ is 0.00234 mV and the number of readings $n_1$ is 32. Using the population standard deviation $\sigma = 0.01$ mV, calculate z from
$$z = \frac{\bar{x}_1 - \mu_0}{\sigma/\sqrt{n_1}} = \frac{0.2994 - 0.3}{0.01/\sqrt{32}} = -0.34$$
The critical value satisfies
$$P(z \le z_c) = 1 - \frac{\alpha}{2} = 1 - \frac{0.05}{2} = 0.975$$
Use NORMSINV in Excel to find $z_c = 1.96$.
The hypothesis is accepted since $|z| < z_c$. Therefore, the thermocouple TC1 is reliable with 95% confidence, i.e. its mean is not significantly different from 0.30 mV.
For TC2:
Set the null hypothesis $H_0: \mu_2 = \mu_0$.
The mean $\bar{x}_2$ is 0.3315 mV, the standard deviation $\sigma_2$ is 0.09937 mV and the number of readings $n_2$ is 31. Using the population standard deviation $\sigma = 0.01$ mV, calculate z from
$$z = \frac{\bar{x}_2 - \mu_0}{\sigma/\sqrt{n_2}} = \frac{0.3315 - 0.3}{0.01/\sqrt{31}} = 17.54$$
$$P(z \le z_c) = 1 - \frac{0.05}{2} = 0.975$$
Use NORMSINV in Excel to find $z_c = 1.96$.
The hypothesis is rejected since $|z| > z_c$. Therefore, the thermocouple TC2 is not reliable with 95% confidence, i.e. its mean is significantly different from 0.30 mV.
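Both one-sample tests can be sketched in Python as below, assuming the known population values $\mu_0 = 0.3$ mV and $\sigma = 0.01$ mV from the problem statement. The helper `one_sample_z` is a hypothetical name; it uses the same two-sided critical value that NORMSINV(0.975) gives in Excel.

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, n, mu0=0.3, sigma=0.01, alpha=0.05):
    """Two-sided one-sample z-test against a known population mean and sigma."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    z_c = NormalDist().inv_cdf(1 - alpha / 2)   # equivalent to =NORMSINV(0.975)
    return z, z_c, abs(z) > z_c                 # True means H0 is rejected

for name, mean, n in [("TC1", 0.2994, 32), ("TC2", 0.3315, 31)]:
    z, z_c, reject = one_sample_z(mean, n)
    print(f"{name}: z = {z:.2f}, z_c = {z_c:.2f}, reject H0: {reject}")
```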
Question No. 2
Part (1)
The vapour pressure ($p_v$) follows an exponential relationship with the inverse of the temperature ($T$) according to the equation:
$$p_v = D \exp\left(-\frac{C}{T}\right) \tag{1}$$
Taking the natural logarithm of each side gives:
$$\ln(p_v) = \ln(D) - \frac{C}{T} \tag{2}$$
Define:
$$y = \ln(p_v) \tag{3}$$
$$x = \frac{1}{T} \tag{4}$$
$$a = -C \tag{5}$$
$$b = \ln(D) \tag{6}$$
Therefore, equation 2 takes the linear form $y = ax + b$. Enter the values into the Excel sheet as shown in the figure.
Evaluate the following sums over the $n$ data points:
$\sum x = 0.0343878$, $\sum y = 47.838213$, $\sum xy = 0.1440682$, $\sum x^2 = 0.0001086$, $(\sum x)^2 = 0.0011825$
Evaluate the constants as follows:
$$a = \frac{(\sum x)(\sum y) - n(\sum xy)}{(\sum x)^2 - n(\sum x^2)} = -5200.762$$
$$b = \frac{(\sum x)(\sum xy) - (\sum x^2)(\sum y)}{(\sum x)^2 - n(\sum x^2)} = 20.607$$
From equation 5, the parameter $C = -a = 5200.762$.
From equation 6, the parameter $D = \exp(b) = \exp(20.607) = 890{,}238{,}534 \approx 8.9024 \times 10^8$.
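The summation formulas above can be sketched in Python as follows. The function name `fit_vapour_pressure` and the data are assumptions for illustration: the pressures are generated from the fitted constants, so recovering $C$ and $D$ checks the algebra of equations (5) and (6) rather than reproducing the measured data.

```python
import math

def fit_vapour_pressure(temps_k, pressures):
    """Fit p_v = D * exp(-C / T) by linear least squares on y = ln(p_v), x = 1/T."""
    n = len(temps_k)
    x = [1.0 / t for t in temps_k]
    y = [math.log(p) for p in pressures]
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    denom = sx ** 2 - n * sx2
    a = (sx * sy - n * sxy) / denom      # slope
    b = (sx * sxy - sx2 * sy) / denom    # intercept
    return -a, math.exp(b)               # C = -a (eq. 5), D = exp(b) (eq. 6)

# Hypothetical check: synthetic pressures generated from C = 5200.762, D = 8.9024e8
temps = [280.0, 300.0, 320.0, 340.0, 360.0]
pressures = [8.9024e8 * math.exp(-5200.762 / t) for t in temps]
C, D = fit_vapour_pressure(temps, pressures)
print(f"C = {C:.3f} K, D = {D:.4e}")
```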
Then equation 1 can be rewritten as:
$$p_v = 8.9024 \times 10^8 \exp\left(-\frac{5200.762}{T}\right)$$
with $p_v$ in mmHg and $T$ in K.
Part (2)
(Figure: plot of the data, Pressure (mmHg) against Temperature (K))
The fitted line is plotted below, with the equation of the linear best fit displayed on the chart. The constants obtained from the manual calculation and from Excel agree closely: Excel gives a slope of -5200.8 and an intercept of 20.607, compared with -5200.762 and 20.607 by hand.
(Figure: $\ln(p_v)$ against $1/T$ with the linear trendline y = -5200.8x + 20.607)
Question No. 3
Part (1)
The data are entered into the Excel sheet as follows.
The model is given as:
$$y = a_0 + a_1 x_1 + a_2 x_2 + a_3 x_3 + a_{11} x_1^2 + a_{22} x_2^2 + a_{33} x_3^2 + a_{12} x_1 x_2 + a_{13} x_1 x_3 + a_{23} x_2 x_3$$
Define
$X_1 = x_1$, $X_2 = x_2$, $X_3 = x_3$, $X_4 = x_1^2$, $X_5 = x_2^2$, $X_6 = x_3^2$, $X_7 = x_1 x_2$, $X_8 = x_1 x_3$, $X_9 = x_2 x_3$
Therefore, the model becomes:
$$y = a_0 + a_1 X_1 + a_2 X_2 + a_3 X_3 + a_{11} X_4 + a_{22} X_5 + a_{33} X_6 + a_{12} X_7 + a_{13} X_8 + a_{23} X_9$$
The values of $X_1$ through $X_9$ are entered into Excel along with y, as shown. From the Tools menu, select Data Analysis, then Regression.
In the Regression window, select the data range for y as the Input Y Range and the whole data range of all the X values as the Input X Range. Tick Confidence Level at 95% and select the cell where the output is to be displayed. Tick the remaining options as shown in the following figure.
ME5534 Research Methodology and Innovation
ID:0700732
15
The results are as follows:
The parameters in the model equation are:
Term           Parameter    Coefficient
Intercept      $a_0$        2122.125
X Variable 1   $a_1$        -379.625
X Variable 2   $a_2$        109.375
X Variable 3   $a_3$        -23.183
X Variable 4   $a_{11}$     16.5
X Variable 5   $a_{22}$     -4.5
X Variable 6   $a_{33}$     0.0678
X Variable 7   $a_{12}$     -4
X Variable 8   $a_{13}$     2.0167
X Variable 9   $a_{23}$     -0.683
From the results obtained, the variable $x_2$ should be removed from the model, mainly because all the terms that include this variable ($x_2$, $x_2^2$, $x_1 x_2$ and $x_2 x_3$) show considerably higher P-values, which indicates that these parameters are not significant.
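As a cross-check on what the Regression tool computes, the following Python sketch fits the same ten-term model by ordinary least squares (normal equations solved by Gaussian elimination) using only the standard library. The design (a three-level full factorial in coded units) and the coefficients used to generate the responses are hypothetical, chosen so the recovered coefficients can be checked; the sketch does not reproduce the standard errors or P-values that Excel also reports.

```python
def quadratic_terms(x1, x2, x3):
    """Regressors X1..X9 as defined in the text."""
    return [x1, x2, x3, x1 * x1, x2 * x2, x3 * x3, x1 * x2, x1 * x3, x2 * x3]

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z

def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X^T X) coef = X^T y.
    A leading column of ones provides the intercept a0."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve_linear(xtx, xty)

# Hypothetical check: 3-level full factorial in coded units, noise-free responses
# built from assumed coefficients a0, a1, a2, a3, a11, a22, a33, a12, a13, a23
levels = (-1.0, 0.0, 1.0)
rows = [quadratic_terms(u, v, w) for u in levels for v in levels for w in levels]
true_coef = [5.0, 1.0, -2.0, 0.5, 0.3, -0.4, 0.2, 0.1, -0.6, 0.05]
y = [true_coef[0] + sum(c * x for c, x in zip(true_coef[1:], r)) for r in rows]
coef = least_squares(rows, y)
print([round(c, 6) for c in coef])
```

Dropping $x_2$, as done in Part (2), amounts to passing only the columns for $x_1$, $x_3$, $x_1^2$, $x_3^2$ and $x_1 x_3$.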
Part (2)
The model is given as:
$$y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_{11} x_1^2 + \beta_{33} x_3^2 + \beta_{13} x_1 x_3$$
Define
$Y = y$, $X_1 = x_1$, $X_2 = x_3$, $X_3 = x_1^2$, $X_4 = x_3^2$, $X_5 = x_1 x_3$
Therefore, the model becomes:
$$Y = \beta_0 + \beta_1 X_1 + \beta_3 X_2 + \beta_{11} X_3 + \beta_{33} X_4 + \beta_{13} X_5$$
The regression results are as follows.
The parameters in the model equation are:
Term           Parameter      Coefficient
Intercept      $\beta_0$      167.56
X Variable 1   $\beta_1$      -27.75
X Variable 2   $\beta_3$      -1.66
X Variable 3   $\beta_{11}$   1.23
X Variable 4   $\beta_{33}$   0.0044
X Variable 5   $\beta_{13}$   0.1356
Part (3)
The model prediction errors for all experimental settings with both models are as follows:
       First Model                Second Model
y      Predicted y   Residual     Predicted y   Residual
24     13.750        10.250       24.654        -0.654
28     20.625        7.375        20.656        7.344
40     56.875        -16.875      49.958        -9.958
42     28.000        14.000       24.654        17.346
11     4.375         6.625        20.933        -9.933
16     21.000        -5.000       20.305        -4.305
126    109.125       16.875       109.003       16.997
34     31.125        2.875        20.933        13.067
32     46.000        -14.000      42.373        -10.373
32     38.625        -6.625       41.003        -9.003
34     44.250        -10.250      42.373        -8.373
17     24.375        -7.375       23.317        -6.317
30     21.000        9.000        20.305        9.695
17     21.000        -4.000       20.305        -3.305
50     52.875        -2.875       41.003        8.997
From the prediction errors, the first model is better: the mean of its residuals is 0.00, whereas the mean for the second model is 0.748. If the error term for each observation is drawn from a distribution that has a mean of zero, then the sum of squared errors criterion generates estimates that are unbiased and consistent. That is, we can imagine that for each observation in the sample, nature draws an error term from a different probability distribution. As long as each of these distributions has a mean of zero, the minimum sum of squared errors (SSE) criterion is unbiased and consistent. This assumption is logically sufficient to ensure that one other condition holds, namely that each of the explanatory variables in the model is uncorrelated with the expected value of the error term.
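The residual mean quoted above, together with the sum of squared errors, can be computed directly from the prediction table with a short Python sketch. The function name `residual_summary` is illustrative; the numbers are copied from the table.

```python
# Observed responses and predictions copied from the prediction table in Part (3)
observed = [24, 28, 40, 42, 11, 16, 126, 34, 32, 32, 34, 17, 30, 17, 50]
first_pred = [13.750, 20.625, 56.875, 28.000, 4.375, 21.000, 109.125, 31.125,
              46.000, 38.625, 44.250, 24.375, 21.000, 21.000, 52.875]
second_pred = [24.654, 20.656, 49.958, 24.654, 20.933, 20.305, 109.003, 20.933,
               42.373, 41.003, 42.373, 23.317, 20.305, 20.305, 41.003]

def residual_summary(y, y_hat):
    """Return (mean residual, sum of squared errors) for observed y and predictions y_hat."""
    residuals = [yi - fi for yi, fi in zip(y, y_hat)]
    mean_res = sum(residuals) / len(residuals)
    sse = sum(r * r for r in residuals)
    return mean_res, sse

mean1, sse1 = residual_summary(observed, first_pred)
mean2, sse2 = residual_summary(observed, second_pred)
print(f"First model:  mean residual = {mean1:.3f}, SSE = {sse1:.2f}")
print(f"Second model: mean residual = {mean2:.3f}, SSE = {sse2:.2f}")
```

With the tabulated values, the sketch gives an SSE of 1506.75 for the first model and about 1515.78 for the second.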