
Group 5

2023-12-18

3 Results

Contingency Table

Table 1: Socio-economic status of student’s family vs Type of Program

         academic   general   vocational   Sum
high           22         2            2    26
low            10        11            5    26
middle         22        11           15    48
Sum            54        24           22   100

The first column of this contingency table lists the socio-economic status of the student's family, while the first row lists the type of program. The body of the table shows the observed counts for each combination, and the last row and column give the marginal totals.

Chi-square test

            TestStatistic   DegreesOfFreedom      PValue
X-squared        17.18055                  4   0.0017829

Interpretation: Since the p-value (0.0018) is less than 0.05, the null hypothesis is rejected, suggesting that there is an association between the socio-economic status of the student's family and the type of program. The two variables are dependent on each other.
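For reference, the contingency table and chi-square test above could be reproduced in R along the following lines (a minimal sketch, assuming the data frame is named hsb with factor columns ses and prog, as in the regression output later in the report):

tab <- table(hsb$ses, hsb$prog)   # cross-tabulate SES against program
addmargins(tab)                   # add the row and column sums shown in Table 1
chisq.test(tab)                   # Pearson chi-square test of independence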

Summary Statistics

Table 2: Summary Statistics for Science

x
nbr.val 100
nbr.null 0
nbr.na 0
min 29
max 69
range 40
sum 5242
median 53
mean 52
SE.mean 1
CI.mean.0.95 2
var 85
std.dev 9
coef.var 0
skewness 0
skew.2SE -1
kurtosis -1
kurt.2SE -1
normtest.W 1
normtest.p 0

The summary statistics give a mean science score of 52. The scores run from a minimum of 29 to a maximum of 69, a range of 40, with a median of 53. The skewness of 0 indicates that the data are fairly symmetrical, and the kurtosis of -1 indicates light tails, i.e. a platykurtic distribution.
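The layout of Table 2 matches the output of stat.desc() from the pastecs package; a sketch of how it could have been produced (assuming hsb$science holds the science scores):

library(pastecs)                                 # provides stat.desc()
round(stat.desc(hsb$science, norm = TRUE), 2)    # descriptive statistics plus normality measures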

Histogram

Figure 1: Histogram and density curve of science scores (x-axis: Science, y-axis: Density).

Interpretation: The histogram shows a fairly symmetrical distribution of science scores with a mean of 52. The overlaid density curve aids the interpretation of the distribution.
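A histogram with an overlaid density curve like Figure 1 can be drawn in base R, for example (a sketch, assuming the same data frame):

hist(hsb$science, probability = TRUE,
     main = "Histogram and Density curve for science", xlab = "Science")
lines(density(hsb$science), lwd = 2)   # overlay the kernel density estimate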

Test and boxplot

One Sample z-test

data: User input summarized values for x


z = 0, p-value = 0.5
alternative hypothesis: true mean is greater than 52
95 percent confidence interval:
51.47985 Inf
sample estimates:
mean of x
52

Figure: Boxplot of science scores (scale 30 to 70).

Interpretation: Since the p-value (0.5) is greater than 0.05, we do not reject the null hypothesis. Therefore, there is not sufficient evidence to conclude that the true mean is greater than 52.
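The header "User input summarized values for x" is consistent with a z-test run on summary statistics, such as zsum.test() from the BSDA package; a hedged sketch (the exact call used in the report is not shown):

library(BSDA)                                  # provides zsum.test() for summarized data
zsum.test(mean.x = mean(hsb$science),          # sample mean
          sigma.x = sd(hsb$science),           # standard deviation treated as known
          n.x = length(hsb$science),           # sample size
          mu = 52, alternative = "greater")    # H0: mean = 52 vs H1: mean > 52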

One-way ANOVA

One-way analysis of means

data: science and prog


F = 2.3554, num df = 2, denom df = 97, p-value = 0.1003

Interpretation: Since the p-value (0.1003) is greater than 0.05, the null hypothesis is not rejected, suggesting that there is no significant difference in mean science scores across the programs.
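The header "One-way analysis of means" matches base R's oneway.test() with equal variances assumed; a sketch:

oneway.test(science ~ prog, data = hsb, var.equal = TRUE)   # compare mean science scores across programs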

Regression Analysis

Call:
lm(formula = science ~ ., data = hsb)

Residuals:
Min 1Q Median 3Q Max
-13.3612 -2.5502 -0.0756 3.0082 11.3224

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.21537 5.94419 -0.204 0.838474
sexmale 2.55652 1.28261 1.993 0.049409 *
raceasian -0.43339 3.10646 -0.140 0.889372
racehispanic -2.53456 2.47415 -1.024 0.308512
racewhite 2.34978 1.80717 1.300 0.196988
seslow -0.69219 2.01909 -0.343 0.732570
sesmiddle -1.13038 1.59743 -0.708 0.481092
schtyppublic 1.25630 1.56326 0.804 0.423823
proggeneral 4.02147 1.63033 2.467 0.015622 *
progvocational 4.10986 1.89177 2.172 0.032570 *
read 0.22523 0.08334 2.703 0.008290 **
write 0.37379 0.09783 3.821 0.000251 ***
math 0.31383 0.09408 3.336 0.001257 **
socst 0.01062 0.07893 0.135 0.893269
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 5.738 on 86 degrees of freedom


Multiple R-squared: 0.6647, Adjusted R-squared: 0.614
F-statistic: 13.11 on 13 and 86 DF, p-value: 2.209e-15

Analysis of Variance Table

Response: science
Df Sum Sq Mean Sq F value Pr(>F)
sex 1 1.80 1.80 0.0546 0.815864
race 3 1497.15 499.05 15.1574 5.309e-08 ***
ses 2 382.06 191.03 5.8020 0.004328 **
schtyp 1 6.26 6.26 0.1902 0.663826
prog 2 327.43 163.72 4.9724 0.009047 **
read 1 2074.58 2074.58 63.0101 7.063e-12 ***
write 1 956.71 956.71 29.0578 6.064e-07 ***
math 1 366.25 366.25 11.1240 0.001259 **
socst 1 0.60 0.60 0.0181 0.893269
Residuals 86 2831.51 32.92
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Interpretation: The multiple R-squared is 0.6647, so the model explains about 66.47% of the variation in science scores. The model as a whole is statistically significant since the overall p-value (2.209e-15) is less than the alpha of 0.05. Read, write, and math are significant predictors of science scores; their low p-values indicate that these predictors contribute strongly to the model. The ANOVA table shows that sex, schtyp, and socst have p-values greater than 0.05 and do not contribute significantly.
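The model summarized above can be fitted and inspected with the calls below (the fitted object is referred to as mmodel in the diagnostic output that follows):

mmodel <- lm(science ~ ., data = hsb)   # regress science on all remaining variables
summary(mmodel)                         # coefficient table, R-squared, overall F-test
anova(mmodel)                           # sequential (Type I) analysis of variance table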

Best Subsets Regression
-------------------------------------------------------------
Model Index Predictors
-------------------------------------------------------------
1 read
2 read math
3 race write math
4 prog read write math
5 race prog read write math
6 sex race prog read write math
7 sex race schtyp prog read write math
8 sex race ses schtyp prog read write math
9 sex race ses schtyp prog read write math socst
-------------------------------------------------------------

Subsets Regression Summary


--------------------------------------------------------------------------------------------------------
Adj. Pred
Model R-Square R-Square R-Square C(p) AIC SBIC SBC MSEP
--------------------------------------------------------------------------------------------------------
1 0.4233 0.4174 0.4003 51.9057 678.3502 393.0571 686.1657 4969.1385
2 0.5189 0.5090 0.486 29.3781 662.2180 377.2216 672.6387 4188.2686
3 0.5689 0.5460 0.5082 18.5566 657.2454 368.6756 675.4816 3792.5282
4 0.6134 0.5928 0.5627 9.1628 646.3690 360.6656 664.6052 3437.8664
5 0.6436 0.6123 0.5706 3.4087 644.2267 355.5492 670.2784 3203.1142
6 0.6591 0.6250 0.5827 1.4346 641.7817 353.9647 670.4386 3097.1569
7 0.6623 0.6244 0.5825 2.6052 642.8287 355.4678 674.0907 3101.4908
8 0.6646 0.6184 0.5647 4.0181 646.1485 357.2337 682.6208 3114.6929
9 0.6647 0.6140 0.5568 6.0000 648.1274 359.5425 687.2050 3149.0264
--------------------------------------------------------------------------------------------------------
AIC: Akaike Information Criteria
SBIC: Sawa’s Bayesian Information Criteria
SBC: Schwarz Bayesian Criteria
MSEP: Estimated error of prediction, assuming multivariate normality
FPE: Final Prediction Error
HSP: Hocking’s Sp
APC: Amemiya Prediction Criteria
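The layout of this output (Model Index, AIC, SBIC, SBC, MSEP, FPE, HSP, APC) matches ols_step_best_subset() from the olsrr package; a sketch, assuming that function was used:

library(olsrr)                  # best-subsets and other OLS model-selection tools
ols_step_best_subset(mmodel)    # evaluate the best model of each subset size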

Diagnostic Checking

Normality

Exact two-sample Kolmogorov-Smirnov test

data: residuals(mmodel) and pnorm(mean = 0, sd = 1, 79)


D = 0.58, p-value = 0.8515
alternative hypothesis: two-sided

Interpretation Our p-value (0.8515) is greater than 0.05, so we do not reject the null hypothesis. Therefore, we conclude that the residuals/errors are normally distributed.
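A one-sample Kolmogorov-Smirnov test of the residuals against a normal distribution can be run as follows (a sketch; the report's output suggests a slightly different two-sample form was actually used):

ks.test(residuals(mmodel), "pnorm",
        mean = 0, sd = sd(residuals(mmodel)))   # compare residuals with a normal CDF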
Homoscedasticity

studentized Breusch-Pagan test

data: mmodel
BP = 23.364, df = 13, p-value = 0.0375

Goldfeld-Quandt test

data: mmodel
GQ = 0.53089, df1 = 36, df2 = 36, p-value = 0.9693
alternative hypothesis: variance increases from segment 1 to 2

Interpretation The studentized Breusch-Pagan test gives a p-value of 0.0375, which is less than 0.05 and therefore suggests some evidence that the error variance is not constant across the independent variables. The Goldfeld-Quandt test, however, gives a p-value of 0.9693, which is greater than 0.05, so it finds no evidence that the variance increases from the first segment to the second. Taken together, the two tests do not provide strong evidence against homoscedasticity in this model.
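Both homoscedasticity tests are available in the lmtest package; a sketch:

library(lmtest)    # provides bptest() and gqtest()
bptest(mmodel)     # studentized Breusch-Pagan test of constant error variance
gqtest(mmodel)     # Goldfeld-Quandt test: does variance increase across the ordering?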
Multicollinearity

GVIF Df GVIF^(1/(2*Df))
sex 1.241147 1 1.114068
race 1.425362 3 1.060851
ses 1.829725 2 1.163045
schtyp 1.142305 1 1.068787
prog 1.991833 2 1.187991
read 2.206004 1 1.485262
write 2.817441 1 1.678523
math 2.102655 1 1.450053
socst 2.214519 1 1.488126

Interpretation The generalized variance inflation factors (GVIF) show that no value is greater than 5. Therefore, we conclude that there is no multicollinearity issue in this model.
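The generalized VIFs shown above can be computed with vif() from the car package; a sketch:

library(car)    # provides vif() for linear models
vif(mmodel)     # GVIF and GVIF^(1/(2*Df)) for each term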
Independence

Durbin-Watson test

data: mmodel
DW = 1.9923, p-value = 0.5011
alternative hypothesis: true autocorrelation is greater than 0

Interpretation Since the p-value (0.5011) is greater than 0.05, we do not reject the null hypothesis. Therefore, the errors are uncorrelated.
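The Durbin-Watson test is also provided by the lmtest package; a sketch:

library(lmtest)
dwtest(mmodel)   # tests H0: no positive autocorrelation in the residuals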

Diagnostic Plots

Figure: Standard regression diagnostic plots for the model: Residuals vs Fitted, Q-Q Residuals (Theoretical Quantiles vs Standardized residuals), Scale-Location, and Residuals vs Leverage with Cook's distance contours. Observations 63, 108, and 113 are labelled as the most extreme points.
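These four panels can be reproduced with base R's plot method for fitted lm objects:

par(mfrow = c(2, 2))   # arrange the four panels in a 2 x 2 grid
plot(mmodel)           # Residuals vs Fitted, Q-Q, Scale-Location, Residuals vs Leverage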

Logistic Regression

chr [1:100] "female" "male" "female" "male" "male" "female" "male" "male" ...

Call:
glm(formula = sex ~ ., family = binomial, data = hsb)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.217629 2.415980 1.332 0.18292
raceasian -0.009584 1.396595 -0.007 0.99452
racehispanic -0.421459 1.033880 -0.408 0.68353
racewhite 0.043967 0.753661 0.058 0.95348
seslow -1.408884 0.902394 -1.561 0.11846
sesmiddle -0.080766 0.645067 -0.125 0.90036
schtyppublic -0.168951 0.604326 -0.280 0.77981
proggeneral 0.966928 0.690848 1.400 0.16163
progvocational 0.032704 0.757363 0.043 0.96556
read -0.017867 0.034153 -0.523 0.60087
write -0.157941 0.048976 -3.225 0.00126 **
math 0.031223 0.042956 0.727 0.46732
socst -0.011069 0.034275 -0.323 0.74673
science 0.099075 0.046535 2.129 0.03325 *
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 137.99 on 99 degrees of freedom


Residual deviance: 110.24 on 86 degrees of freedom
AIC: 138.24

Number of Fisher Scoring iterations: 5

Interpretation The model is a logistic regression on the sample data with sex as the response variable and the remaining variables as predictors. Only write (p = 0.00126) and science (p = 0.03325) have p-values less than 0.05, which makes these two variables significant in the model.
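A sketch of how this model could be fitted (the response is converted to a factor first, since the str() output above shows sex stored as character; the object name logitmodel is assumed here for later use):

hsb$sex <- factor(hsb$sex)                                  # female/male as a two-level factor
logitmodel <- glm(sex ~ ., family = binomial, data = hsb)   # logistic regression on all other variables
summary(logitmodel)                                         # coefficient table shown above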

Diagnostic Checking for logistic regression

Figure: Smoothed scatter plots of each numeric predictor (math, read, science, socst, write) against the logit of the fitted probabilities (x-axis: logit, y-axis: predictor.value).

Interpretation The smoothed scatter plots show that not all predictors are linearly related to the logit of the outcome, so some transformations may be needed.
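One common way to produce these panels is to plot each numeric predictor against the logit of the fitted probabilities; a tidyverse sketch, reusing the logitmodel object assumed above:

library(dplyr); library(tidyr); library(ggplot2)

probs <- predict(logitmodel, type = "response")    # fitted probabilities from the logistic model

hsb %>%
  select(where(is.numeric)) %>%                    # keep only the numeric predictors
  mutate(logit = log(probs / (1 - probs))) %>%     # logit of the fitted probabilities
  pivot_longer(-logit, names_to = "predictor",
               values_to = "predictor.value") %>%
  ggplot(aes(logit, predictor.value)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "loess") +                  # smoothed trend for each predictor
  facet_wrap(~ predictor, scales = "free_y")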

Figure: Cook's distance plotted against observation number for the logistic regression glm(sex ~ .); observations labelled 188, 51, 84, and 63 have the largest values.
Interpretation This plot shows the Cook's distance value for each observation in the model. The observations labelled in the plot (188, 51, 84, and 63) stand out as potentially influential points.
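The Cook's distance plot and the underlying values can be obtained directly from the fitted model; a sketch:

plot(logitmodel, which = 4)                 # Cook's distance vs observation number
head(sort(cooks.distance(logitmodel),
          decreasing = TRUE))               # the largest few Cook's distance values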
