
Regression hw3

1.(a) The best model contains PAPER and MACHINE as the predictors, since it gives the smallest AIC (210.8477).
(b) In the first step, we compare COST ~ MACHINE, COST ~ PAPER, COST ~ OVERHEAD and COST ~ LABOR; COST ~ MACHINE is the best.
In the second step, we compare COST ~ MACHINE + PAPER, COST ~ MACHINE + OVERHEAD and COST ~ MACHINE + LABOR; the best is COST ~ MACHINE + PAPER.
In the third step, we compare COST ~ MACHINE + PAPER, COST ~ MACHINE + PAPER + OVERHEAD and COST ~ MACHINE + PAPER + LABOR. The best model remains COST ~ MACHINE + PAPER, so the procedure stops. (An R sketch of this procedure is given after part (e).)
(c) COST = 59.432 + 0.949(PAPER) + 2.386(MACHINE)

(d) R2 = 0.9987, adjusted R2 = 0.9986, residual standard error = 10.98


(e) The variables chosen are the same as those included in the final regression models from parts (a) and (b).
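A minimal R sketch of the forward selection from parts (a) and (b), assuming the data are read into a data frame named paper_data with columns COST, PAPER, MACHINE, OVERHEAD and LABOR (the file name below is hypothetical):

paper_data <- read.csv("hwk3q1.csv")   # hypothetical file name

# Forward stepwise selection by AIC, starting from the intercept-only model
null_model <- lm(COST ~ 1, data = paper_data)
forward_fit <- step(null_model,
                    scope = ~ PAPER + MACHINE + OVERHEAD + LABOR,
                    direction = "forward")

# Refit and summarize the selected model
final_model <- lm(COST ~ PAPER + MACHINE, data = paper_data)
summary(final_model)      # coefficients, R-squared, residual standard error
extractAIC(final_model)   # the AIC criterion reported by step()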

2.(a) With the all-possible-regressions technique, when there is a large number of candidate X variables the approach may not be practically feasible because of the computational time, so we would choose stepwise regression instead (a brief sketch follows part (c)).
(b) PROD, FOV and HOUSE are included in the final model because they are significantly associated with SALES.
(c)
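A minimal sketch contrasting the two approaches from part (a), assuming the data are in a data frame named sales_data (hypothetical) with response SALES and candidate predictors including PROD, FOV and HOUSE:

# All-possible regressions: every subset of the candidate X variables is fitted (2^k models)
library(leaps)
all_subsets <- regsubsets(SALES ~ ., data = sales_data)
summary(all_subsets)

# Stepwise regression: far fewer models are examined, so it scales to many candidates
step_fit <- step(lm(SALES ~ 1, data = sales_data),
                 scope = ~ PROD + FOV + HOUSE,   # extend with the remaining candidate X variables
                 direction = "both")
summary(step_fit)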

3.(a) SALARY is expected to increase by 579.76 units for each one-unit increase in GENDER, holding YEARS, POSITION and EDUCAT constant.
(b) The residual degrees of freedom are d.f. = 47 - 5 - 1 = 41.
(d)

3
a
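# Read the data and fit the full model; POSITION, EDUCAT and GENDER enter as factors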
dataset <- read.csv("hwk3q3.csv")
model <- lm(SALARY ~ YEARS + as.factor(POSITION) + as.factor(EDUCAT) +
              as.factor(GENDER), data = dataset)
summary(model)

##
## Call:
## lm(formula = SALARY ~ YEARS + as.factor(POSITION) + as.factor(EDUCAT) +
##     as.factor(GENDER), data = dataset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1410.3 -204.5 -103.4 230.3 752.1
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1320.86 411.76 3.208 0.00272 **
## YEARS 20.38 41.65 0.489 0.62736
## as.factor(POSITION)2 186.91 479.54 0.390 0.69889
## as.factor(POSITION)3 -223.54 409.34 -0.546 0.58820
## as.factor(POSITION)4 1437.47 521.08 2.759 0.00888 **
## as.factor(POSITION)5 2301.07 518.38 4.439 7.52e-05 ***
## as.factor(EDUCAT)2 133.16 321.02 0.415 0.68063
## as.factor(EDUCAT)3 -685.85 477.76 -1.436 0.15932
## as.factor(EDUCAT)4 NA NA NA NA
## as.factor(GENDER)1 231.36 338.49 0.684 0.49842
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 495 on 38 degrees of freedom
## Multiple R-squared: 0.7504, Adjusted R-squared: 0.6979
## F-statistic: 14.28 on 8 and 38 DF, p-value: 2.407e-09

The estimated coefficient of GENDER indicates that the expected monthly salary of men is, on average, 231.36 units higher than that of women with the same YEARS, POSITION and EDUCAT.

b
Counting the parameters, the residual degrees of freedom should be 47 - 10 = 37 (the intercept plus nine predictor coefficients), which does not match the 38 reported by R. The mismatch arises because one coefficient, as.factor(EDUCAT)4, is not estimated due to the singularity flagged in the output, so R fits only 9 coefficients and reports 47 - 9 = 38 residual degrees of freedom.
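A quick check in R, using the model object fitted above:

df.residual(model)   # 38 = 47 observations minus the 9 coefficients actually estimated
alias(model)         # shows that as.factor(EDUCAT)4 is aliased and therefore not estimated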

c
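# Rows with POSITION 4 or 5, to be compared with the rows with EDUCAT 3 or 4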
which(dataset$POSITION == 4 | dataset$POSITION == 5)

## [1] 4 7 8 10 15 16 20 21 24 26 30 33 34 35 41 42 43 45 46 47

which(dataset$EDUCAT == 3 | dataset$EDUCAT == 4)

## [1] 4 7 8 10 15 16 20 21 24 26 30 33 34 35 41 42 43 45 46 47

Data points with “POSITION” values of 4 or 5 and data points with “EDUCAT” values of 3 or 4 are exactly the same rows, suggesting that these higher positions and education levels correspond to the same group of individuals in the data set.
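One way to confirm this overlap directly in R, using the same dataset:

rows_position <- which(dataset$POSITION %in% c(4, 5))
rows_educat   <- which(dataset$EDUCAT %in% c(3, 4))
identical(rows_position, rows_educat)     # TRUE: the same 20 rows
table(dataset$POSITION, dataset$EDUCAT)   # the cross-tabulation shows the empty combinations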

d
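# Partial F test for GENDER: compare the model without GENDER to the full model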
full_model <- lm(SALARY ~ YEARS + as.factor(POSITION) +
                   as.factor(EDUCAT) + as.factor(GENDER), data = dataset)
reduced_model <- lm(SALARY ~ YEARS + as.factor(POSITION) +
                      as.factor(EDUCAT), data = dataset)
anova(reduced_model, full_model)

## Analysis of Variance Table
##
## Model 1: SALARY ~ YEARS + as.factor(POSITION) + as.factor(EDUCAT)
## Model 2: SALARY ~ YEARS + as.factor(POSITION) + as.factor(EDUCAT) +
##     as.factor(GENDER)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 39 9424448
## 2 38 9309984 1 114464 0.4672 0.4984

The partial F test comparing the reduced model (excluding GENDER) with the full model (including GENDER) gives a p-value of 0.4984. Since this is much larger than the usual 0.05 significance level, adding GENDER does not significantly improve the model's ability to explain salary differences among BigTex Services employees; that is, once YEARS, POSITION and EDUCAT are accounted for, GENDER does not statistically significantly explain salary differences in this data set.
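As a check, the partial F statistic and its p-value can be reproduced directly from the RSS values in the ANOVA table:

F_stat <- ((9424448 - 9309984) / 1) / (9309984 / 38)   # change in RSS over its df, divided by the full-model MSE
F_stat                                                 # about 0.467
pf(F_stat, df1 = 1, df2 = 38, lower.tail = FALSE)      # about 0.498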

4.(a)
For smokers (Smokeyes = 1): LungCap = 1.05157 + (0.55823 - 0.0597)(Age) + 0.22601
For non-smokers (Smokeyes = 0): LungCap = 1.05157 + 0.55823(Age)
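These two equations come from an interaction model between Age and Smoke. A minimal sketch of such a fit, assuming a data frame named lungcap_data (hypothetical) with columns LungCap, Age and Smoke coded no/yes:

# Interaction model: smokers and non-smokers get different intercepts and Age slopes
fit <- lm(LungCap ~ Age * Smoke, data = lungcap_data)
summary(fit)
# Non-smokers (Smokeyes = 0): LungCap = (Intercept) + Age coefficient * Age
# Smokers (Smokeyes = 1): add the Smokeyes estimate to the intercept and
#                         the Age:Smokeyes estimate to the Age slope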
