Results
Group 5
2023-12-18
3 Results
Contingency Table
The first column gives the socio-economic status of the student’s family, and the first row gives the
type of program. The cells in the middle of the table hold the observed counts for each combination.
The last column and the last row give the row and column totals.
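A table with this layout can be built in R roughly as follows. This is a minimal sketch on simulated data: the data frame `toy` and its values are hypothetical stand-ins, and only the variable names `ses` and `prog` follow the report.

```r
# Hypothetical stand-in data: 100 students with a socio-economic status
# and a program type (simulated values, not the report's data).
set.seed(1)
toy <- data.frame(
  ses  = factor(sample(c("low", "middle", "high"), 100, replace = TRUE),
                levels = c("low", "middle", "high")),
  prog = factor(sample(c("academic", "general", "vocational"), 100, replace = TRUE),
                levels = c("academic", "general", "vocational"))
)

# Rows: socio-economic status; columns: program type; cells: counts.
tab <- table(ses = toy$ses, prog = toy$prog)

# addmargins() appends a "Sum" row and column holding the totals.
tab_with_totals <- addmargins(tab)
print(tab_with_totals)
```

The bottom-right cell of `tab_with_totals` is the total number of students in the table.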
Chi-square test
Interpretation: Since the p-value (0.0482) is less than 0.05, the null hypothesis of independence is rejected,
suggesting that there is an association between the socio-economic status of the student’s family and the
type of program. The two variables are not independent of each other.
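The decision rule used here can be sketched in R with `chisq.test()`. The data below are simulated and hypothetical, so the resulting statistic and p-value will not match the report; the point is only the mechanics of comparing the p-value with 0.05.

```r
# Hypothetical stand-in data (simulated, not the report's data).
set.seed(2)
ses  <- factor(sample(c("low", "middle", "high"), 100, replace = TRUE),
               levels = c("low", "middle", "high"))
prog <- factor(sample(c("academic", "general", "vocational"), 100, replace = TRUE),
               levels = c("academic", "general", "vocational"))

# Pearson chi-square test of independence on the 3 x 3 table.
test <- chisq.test(table(ses, prog))

# Decision rule at the 5% level.
alpha <- 0.05
decision <- if (test$p.value < alpha) {
  "reject H0: the variables appear associated"
} else {
  "do not reject H0: no evidence of association"
}

cat(sprintf("X-squared = %.3f, df = %d, p = %.4f -> %s\n",
            test$statistic, as.integer(test$parameter), test$p.value, decision))
```

For a 3 x 3 table the test has (3 - 1) x (3 - 1) = 4 degrees of freedom.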
Summary Statistics
x
nbr.val 100
nbr.null 0
nbr.na 0
min 29
max 69
range 40
sum 5242
median 53
mean 52
SE.mean 1
CI.mean.0.95 2
var 85
std.dev 9
coef.var 0
skewness 0
skew.2SE -1
kurtosis -1
kurt.2SE -1
normtest.W 1
normtest.p 0
The summary statistics give a mean science score of 52. The range between the highest score (69) and the
lowest score (29) is 40, with a median of 53. The skewness is approximately 0, indicating that the data are
fairly symmetrical. The kurtosis is -1, indicating a light-tailed (platykurtic) distribution.
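The quantities discussed here can be computed in base R as sketched below. The helper functions `skewness()` and `excess_kurtosis()` are hypothetical names using simple moment-based definitions, which may differ slightly from the formulas that produced the table above; the scores are simulated.

```r
# Moment-based skewness: third central moment over variance^(3/2).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment over variance^2, minus 3
# (so a normal distribution scores about 0, and negative values mean light tails).
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Hypothetical scores (simulated, not the report's data).
set.seed(3)
scores <- round(rnorm(100, mean = 52, sd = 9))

stats <- c(
  mean     = mean(scores),
  median   = median(scores),
  range    = max(scores) - min(scores),  # range = max - min, e.g. 69 - 29 = 40
  skewness = skewness(scores),
  kurtosis = excess_kurtosis(scores)
)
print(round(stats, 2))
```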
Histogram
[Figure: histogram of science scores with a density curve; x-axis: Science, y-axis: density]
Interpretation: The histogram shows a fairly symmetrical distribution of science scores around a mean of
52. The overlaid density curve helps in reading the shape of the distribution.
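A histogram with an overlaid density curve can be drawn in base R as sketched below. The scores are simulated and hypothetical, and the plotting choices (titles, line styles) are assumptions rather than the settings used for the figure in this report.

```r
# Hypothetical science scores (simulated, not the report's data).
set.seed(4)
science <- rnorm(100, mean = 52, sd = 9)

# freq = FALSE puts the y-axis on the density scale so the curve is comparable.
h <- hist(science, freq = FALSE,
          main = "Histogram of science scores", xlab = "Science")

# Kernel density estimate drawn over the bars.
lines(density(science), lwd = 2)

# Dashed vertical line at the sample mean.
abline(v = mean(science), lty = 2)
```

On the density scale the bar areas sum to 1, which is why the curve and the bars share one axis.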
Interpretation: Since the p-value (0.5) is greater than 0.05, we do not reject the null hypothesis. Therefore,
there is insufficient evidence to conclude that the true mean is greater than 52.
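Assuming this conclusion comes from a one-sided, one-sample t-test (the report does not show the call), the test can be sketched in R as follows. The scores are simulated and hypothetical.

```r
# Hypothetical science scores (simulated, not the report's data).
set.seed(5)
science <- rnorm(100, mean = 52, sd = 9)

# H0: true mean = 52   versus   H1: true mean > 52.
tt <- t.test(science, mu = 52, alternative = "greater")

alpha <- 0.05
decision <- if (tt$p.value < alpha) "reject H0" else "do not reject H0"

cat(sprintf("t = %.3f, df = %d, p = %.4f -> %s\n",
            tt$statistic, as.integer(tt$parameter), tt$p.value, decision))
```

A p-value of 0.5 in such a test arises when the sample mean is essentially equal to the hypothesized value of 52.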
One-way ANOVA
Interpretation: Since the p-value (0.02703) is less than 0.05, the null hypothesis is rejected, suggesting
that the mean science score differs for at least one program.
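A one-way ANOVA of this kind can be sketched with `aov()`. The data frame `toy`, the group sizes, and the scores are hypothetical; only the variable names `science` and `prog` follow the report.

```r
# Hypothetical stand-in data (simulated, not the report's data).
set.seed(6)
toy <- data.frame(
  prog    = factor(rep(c("academic", "general", "vocational"),
                       times = c(50, 25, 25))),
  science = rnorm(100, mean = 52, sd = 9)
)

# One-way ANOVA: does the mean science score differ by program?
fit <- aov(science ~ prog, data = toy)
tab <- summary(fit)[[1]]   # ANOVA table as a data frame
print(tab)

# First row is the program effect; its p-value drives the decision.
p_value <- tab[["Pr(>F)"]][1]
decision <- if (p_value < 0.05) "reject H0" else "do not reject H0"
```

Rejecting the null hypothesis only says that at least one program mean differs; it does not say which one.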
Regression Analysis
Call:
lm(formula = science ~ ., data = hsb)
Residuals:
Min 1Q Median 3Q Max
-13.3612 -2.5502 -0.0756 3.0082 11.3224
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.21537 5.94419 -0.204 0.838474
sexmale 2.55652 1.28261 1.993 0.049409 *
raceasian -0.43339 3.10646 -0.140 0.889372
racehispanic -2.53456 2.47415 -1.024 0.308512
racewhite 2.34978 1.80717 1.300 0.196988
seslow -0.69219 2.01909 -0.343 0.732570
sesmiddle -1.13038 1.59743 -0.708 0.481092
schtyppublic 1.25630 1.56326 0.804 0.423823
proggeneral 4.02147 1.63033 2.467 0.015622 *
progvocational 4.10986 1.89177 2.172 0.032570 *
read 0.22523 0.08334 2.703 0.008290 **
write 0.37379 0.09783 3.821 0.000251 ***
math 0.31383 0.09408 3.336 0.001257 **
socst 0.01062 0.07893 0.135 0.893269
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Response: science
Df Sum Sq Mean Sq F value Pr(>F)
sex 1 1.80 1.80 0.0546 0.815864
race 3 1497.15 499.05 15.1574 5.309e-08 ***
ses 2 382.06 191.03 5.8020 0.004328 **
schtyp 1 6.26 6.26 0.1902 0.663826
prog 2 327.43 163.72 4.9724 0.009047 **
read 1 2074.58 2074.58 63.0101 7.063e-12 ***
write 1 956.71 956.71 29.0578 6.064e-07 ***
math 1 366.25 366.25 11.1240 0.001259 **
socst 1 0.60 0.60 0.0181 0.893269
Residuals 86 2831.51 32.92
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: The multiple R-squared is 0.5558, so the model explains 55.58% of the variance in science
scores. The model is statistically significant overall, since its p-value (1.533e-10) is less than 0.05. In the
coefficient table, read, write, and math are significant predictors of science scores (p < 0.01); sexmale,
proggeneral, and progvocational are also significant at the 0.05 level. In the ANOVA table, sex, schtyp, and
socst have p-values greater than 0.05.
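The quantities quoted in this interpretation can be pulled out of a fitted `lm` object as sketched below. The data are simulated and hypothetical (three numeric predictors only), so the numbers will not match the report.

```r
# Hypothetical stand-in data (simulated, not the report's data).
set.seed(7)
n <- 100
toy <- data.frame(read  = rnorm(n, 52, 10),
                  write = rnorm(n, 53, 9),
                  math  = rnorm(n, 52, 9))
toy$science <- 5 + 0.3 * toy$read + 0.3 * toy$write + 0.3 * toy$math +
  rnorm(n, sd = 6)

fit <- lm(science ~ ., data = toy)
s   <- summary(fit)

# Multiple R-squared.
r2 <- s$r.squared

# Overall model p-value: upper tail of the F distribution.
f <- s$fstatistic                      # named vector: value, numdf, dendf
p_model <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Terms whose t-test p-value is below 0.05.
coefs <- coef(s)
significant <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.05]

print(c(r.squared = r2, model.p = p_model))
print(significant)
print(anova(fit))                      # sequential (Type I) ANOVA table
```

Note that `anova()` on an `lm` object tests terms sequentially, in the order they enter the formula, so its p-values can differ from the t-tests in the coefficient table.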
Best Subsets Regression
-------------------------------------------------------------
Model Index Predictors
-------------------------------------------------------------
1 read
2 read math
3 race write math
4 prog read write math
5 race prog read write math
6 sex race prog read write math
7 sex race schtyp prog read write math
8 sex race ses schtyp prog read write math
9 sex race ses schtyp prog read write math socst
-------------------------------------------------------------
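The idea behind a best-subsets table can be illustrated with a small exhaustive search in base R. This is a sketch under assumptions, not the procedure used to produce the table above: it uses simulated data with four numeric predictors, and picks the subset with the highest R-squared for each model size.

```r
# Hypothetical stand-in data (simulated, not the report's data).
set.seed(8)
n <- 100
toy <- data.frame(read  = rnorm(n, 52, 10),
                  write = rnorm(n, 53, 9),
                  math  = rnorm(n, 52, 9),
                  socst = rnorm(n, 52, 10))
toy$science <- 5 + 0.3 * toy$read + 0.4 * toy$write + 0.3 * toy$math +
  rnorm(n, sd = 6)

predictors <- setdiff(names(toy), "science")

# For each model size k, fit every subset of k predictors and keep the best.
best_by_size <- lapply(seq_along(predictors), function(k) {
  combos <- combn(predictors, k, simplify = FALSE)
  r2 <- vapply(combos, function(vars) {
    summary(lm(reformulate(vars, response = "science"), data = toy))$r.squared
  }, numeric(1))
  data.frame(size       = k,
             predictors = paste(combos[[which.max(r2)]], collapse = " "),
             r.squared  = max(r2))
})

best <- do.call(rbind, best_by_size)
print(best)
```

Because R-squared never decreases when a predictor is added, a criterion such as adjusted R-squared or AIC is usually needed to choose among the sizes.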
Diagnostic Checking
Normality
Interpretation: Since the p-value (0.7723) is greater than 0.05, we do not reject the null hypothesis of
normality. Therefore, the residuals are consistent with a normal distribution.
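The report does not name the normality test it used. Assuming a Shapiro-Wilk test on the model residuals, the check can be sketched as follows, on simulated, hypothetical data.

```r
# Hypothetical stand-in data (simulated, not the report's data).
set.seed(9)
toy <- data.frame(x = rnorm(100))
toy$y <- 1 + 2 * toy$x + rnorm(100)

fit <- lm(y ~ x, data = toy)

# H0: the residuals come from a normal distribution.
sw <- shapiro.test(residuals(fit))

decision <- if (sw$p.value > 0.05) {
  "do not reject H0: residuals are consistent with normality"
} else {
  "reject H0: residuals deviate from normality"
}
cat(sprintf("W = %.4f, p = %.4f -> %s\n", sw$statistic, sw$p.value, decision))
```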
Homoscedasticity
studentized Breusch-Pagan test
data: mmodel
BP = 23.364, df = 13, p-value = 0.0375
Goldfeld-Quandt test
data: mmodel
GQ = 0.53089, df1 = 36, df2 = 36, p-value = 0.9693
alternative hypothesis: variance increases from segment 1 to 2
Interpretation: The studentized Breusch-Pagan test gives a p-value of 0.0375, which is less than 0.05, so
the null hypothesis of constant error variance is rejected; this test indicates that heteroscedasticity is
present. The Goldfeld-Quandt test gives a p-value of 0.9693, which is greater than 0.05, so there is no
evidence that the variance increases from the first segment to the second. The two tests disagree, so
homoscedasticity is not clearly supported for this model.
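The p-values printed in the two test outputs above follow directly from the reported statistics and degrees of freedom, and the studentized Breusch-Pagan statistic itself can be computed by hand as n times the R-squared of an auxiliary regression. The sketch below checks the reported values and then reproduces the mechanics on simulated, hypothetical data.

```r
# p-values implied by the statistics printed in the test output above.
bp_p_reported <- pchisq(23.364, df = 13, lower.tail = FALSE)   # Breusch-Pagan
gq_p_reported <- pf(0.53089, df1 = 36, df2 = 36, lower.tail = FALSE)  # Goldfeld-Quandt
print(round(c(BP = bp_p_reported, GQ = gq_p_reported), 4))

# Hand-computed studentized (Koenker) Breusch-Pagan test on hypothetical data.
set.seed(10)
n <- 100
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 1 + 2 * toy$x1 - toy$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = toy)

# Auxiliary regression: squared residuals on the same predictors.
toy$e2 <- residuals(fit)^2
aux <- lm(e2 ~ x1 + x2, data = toy)

bp_stat <- n * summary(aux)$r.squared   # LM statistic
bp_df   <- length(coef(aux)) - 1        # number of predictors
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
print(c(BP = bp_stat, df = bp_df, p.value = bp_p))
```

A small Breusch-Pagan p-value means the squared residuals are predictable from the regressors, which is what non-constant variance looks like.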
Multicollinearity
GVIF Df GVIF^(1/(2*Df))
sex 1.241147 1 1.114068
race 1.425362 3 1.060851
ses 1.829725 2 1.163045
schtyp 1.142305 1 1.068787
prog 1.991833 2 1.187991
read 2.206004 1 1.485262
write 2.817441 1 1.678523
math 2.102655 1 1.450053
socst 2.214519 1 1.488126
Interpretation: None of the GVIF values in the table is greater than 5. Therefore, we conclude that there
is no multicollinearity issue in this model.
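The last column of the table is the GVIF raised to the power 1/(2 x Df), which adjusts for terms with more than one degree of freedom. The sketch below checks that relationship for four rows of the table and then shows how an ordinary VIF can be computed by hand; the second part uses simulated, hypothetical variables `x1`, `x2`, and `x3`.

```r
# Values copied from the table above.
gvif <- c(sex = 1.241147, race = 1.425362, ses = 1.829725, read = 2.206004)
df   <- c(sex = 1,        race = 3,        ses = 2,        read = 1)

# Adjusted value: GVIF^(1 / (2 * Df)); for Df = 1 this is sqrt(VIF).
adjusted <- gvif^(1 / (2 * df))
print(round(adjusted, 6))

# Hand-computed VIF for one numeric predictor on hypothetical data:
# regress it on the other predictors and use 1 / (1 - R^2).
set.seed(11)
x1 <- rnorm(100)
x2 <- 0.6 * x1 + rnorm(100)   # deliberately correlated with x1
x3 <- rnorm(100)
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2 + x3))$r.squared)
print(vif_x1)
```

Since the adjusted column is on the square-root scale, it is the GVIF column (or the square of the adjusted column) that is comparable with the usual VIF cut-off of 5.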
Independence
Durbin-Watson test
data: mmodel
DW = 1.9923, p-value = 0.5011
alternative hypothesis: true autocorrelation is greater than 0
Interpretation: Since the p-value (0.5011) is greater than 0.05, we do not reject the null hypothesis.
Therefore, there is no evidence of autocorrelation, and the errors appear to be uncorrelated.
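The Durbin-Watson statistic itself is simple to compute from the residuals: it is the sum of squared successive differences divided by the sum of squared residuals. The sketch below uses simulated, hypothetical data and computes only the statistic, not the p-value.

```r
# Hypothetical stand-in data (simulated, not the report's data).
set.seed(12)
toy <- data.frame(x = rnorm(100))
toy$y <- 1 + 2 * toy$x + rnorm(100)

fit <- lm(y ~ x, data = toy)
e   <- residuals(fit)

# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); values near 2 suggest
# no first-order autocorrelation, values near 0 positive autocorrelation.
dw <- sum(diff(e)^2) / sum(e^2)
print(dw)
```

The reported value of 1.9923 is very close to 2, which agrees with the conclusion that the errors are uncorrelated.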
[Figure: regression diagnostic plots in four panels: Residuals vs Fitted, Q-Q Residuals, Scale-Location, and Residuals vs Leverage (with Cook's distance contours); observations 63, 108, and 113 are labeled]
Logistic Regression
chr [1:100] "female" "male" "female" "male" "male" "female" "male" "male" ...
Call:
glm(formula = sex ~ ., family = binomial, data = hsb)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.217629 2.415980 1.332 0.18292
raceasian -0.009584 1.396595 -0.007 0.99452
racehispanic -0.421459 1.033880 -0.408 0.68353
racewhite 0.043967 0.753661 0.058 0.95348
seslow -1.408884 0.902394 -1.561 0.11846
sesmiddle -0.080766 0.645067 -0.125 0.90036
schtyppublic -0.168951 0.604326 -0.280 0.77981
proggeneral 0.966928 0.690848 1.400 0.16163
progvocational 0.032704 0.757363 0.043 0.96556
read -0.017867 0.034153 -0.523 0.60087
write -0.157941 0.048976 -3.225 0.00126 **
math 0.031223 0.042956 0.727 0.46732
socst -0.011069 0.034275 -0.323 0.74673
science 0.099075 0.046535 2.129 0.03325 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interpretation: The output shows a logistic regression fitted to the sample data, with sex as the response
variable and the remaining variables as predictors. Only write (p = 0.00126) and science (p = 0.03325) have
p-values less than 0.05, so these are the significant predictors in the model; seslow (p = 0.11846) and
proggeneral (p = 0.16163) are not significant at the 0.05 level.
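The 0.05 rule can be applied mechanically to the coefficient table. The sketch below copies four rows from the output above into a small data frame (the object name `coefs` is hypothetical), flags the terms with p-values below 0.05, and converts the estimates to odds ratios.

```r
# Estimates and p-values copied from the logistic regression output above.
coefs <- data.frame(
  term     = c("seslow", "proggeneral", "write", "science"),
  estimate = c(-1.408884, 0.966928, -0.157941, 0.099075),
  p_value  = c(0.11846, 0.16163, 0.00126, 0.03325)
)

# Flag terms that are significant at the 5% level.
coefs$significant <- coefs$p_value < 0.05

# exp(estimate) is the multiplicative change in the odds per one-unit increase.
coefs$odds_ratio <- exp(coefs$estimate)

print(coefs)
print(coefs$term[coefs$significant])
```

An odds ratio below 1 (negative estimate) lowers the odds of the modeled outcome and one above 1 raises them; which sex counts as the modeled outcome depends on how the response was coded, which the output above does not show.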
Diagnostic Checking for logistic regression
[Figure: smoothed scatter plots of predictor values against the logit; x-axis: logit, y-axis: predictor.value]
Interpretation: The smoothed scatter plots show that not all predictors are linearly related to the logit.
Thus, some predictors might need to be transformed.
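A linearity check of this kind can be sketched in base R by plotting each numeric predictor against the fitted logit with a smoother. The data, the two predictors, and the use of `scatter.smooth()` are assumptions for illustration, not the code behind the figure in this report.

```r
# Hypothetical stand-in data (simulated, not the report's data).
set.seed(13)
n <- 100
toy <- data.frame(write = rnorm(n, 53, 9), science = rnorm(n, 52, 9))
eta <- -0.15 * (toy$write - 53) + 0.10 * (toy$science - 52)
toy$sex <- factor(ifelse(runif(n) < plogis(eta), "male", "female"))

fit <- glm(sex ~ write + science, family = binomial, data = toy)

# Logit of the fitted probabilities: log(p / (1 - p)).
p     <- fitted(fit)
logit <- log(p / (1 - p))

# One panel per predictor: predictor value against the logit, with a loess curve.
op <- par(mfrow = c(1, 2))
for (v in c("write", "science")) {
  scatter.smooth(logit, toy[[v]], xlab = "logit", ylab = v)
}
par(op)
```

A roughly straight smoothed curve supports the linearity assumption for that predictor, while a clearly bent one suggests trying a transformation.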
[Figure: Cook's distance by observation number for glm(sex ~ .); x-axis: Obs. number, y-axis: Cook's distance]
Interpretation: This plot shows the Cook’s distance for each observation in the model. Observations 163,
82, 51, 88, and 174 stand out as potentially influential points.
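Cook's distances can be extracted from a fitted model and screened as sketched below. The data are simulated and hypothetical, and the 4/n cut-off is a common rule of thumb assumed here for illustration; the report does not state which threshold it used.

```r
# Hypothetical stand-in data (simulated, not the report's data).
set.seed(14)
n <- 100
toy <- data.frame(write = rnorm(n, 53, 9), science = rnorm(n, 52, 9))
eta <- -0.15 * (toy$write - 53) + 0.10 * (toy$science - 52)
toy$sex <- factor(ifelse(runif(n) < plogis(eta), "male", "female"))

fit <- glm(sex ~ write + science, family = binomial, data = toy)

# Cook's distance for every observation.
cd <- cooks.distance(fit)

# Assumed rule of thumb: flag observations with distance above 4 / n.
threshold <- 4 / n
flagged   <- which(cd > threshold)

# Five largest distances, labeled by row name.
print(head(sort(cd, decreasing = TRUE), 5))
print(flagged)

# Cook's distance plot (panel 4 of the standard diagnostic plots).
plot(fit, which = 4)
```

Flagged observations are candidates for a closer look rather than automatic removal.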