Ba Note Bshsrhhsr
1. Frequencies
+ 1 categorical variable (run the frequency table on one categorical variable; do not run it on a continuous variable)
ethnicity
Conclusion: In the sample, White respondents make up the highest percentage (42.9%) and Native respondents the lowest (4.8%).
The number of White respondents is about nine times that of Native respondents and nearly twice that of Black respondents.
2. CROSSTABULATION
Analyze => Descriptive Statistics => Crosstabs => Cells => tick only Row (percentages sum to 100% across each row)
Conclusion: 3.1% of females fail the exam, while 9.8% of males fail => the failure rate of males is roughly three times that of females.
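A minimal sketch of what ticking only Row does: each row of counts is rescaled so that it sums to 100%. The cell counts below are hypothetical (the notes give only percentages); they were chosen so that the group sizes are 64 and 41 and the fail percentages round to 3.1% and 9.8%.

```python
# Hypothetical cell counts (NOT from the notes), chosen to reproduce the
# reported row percentages for group sizes of 64 females and 41 males.
counts = {
    "Female": {"fail": 2, "pass": 62},
    "Male": {"fail": 4, "pass": 37},
}

def row_percentages(row):
    """Convert one row of counts into percentages that sum to 100."""
    total = sum(row.values())
    return {cell: 100 * n / total for cell, n in row.items()}

row_pct = {group: row_percentages(row) for group, row in counts.items()}

# How many times larger the male fail rate is than the female fail rate.
fail_ratio = row_pct["Male"]["fail"] / row_pct["Female"]["fail"]

for group, pct in row_pct.items():
    print(group, {cell: round(p, 1) for cell, p in pct.items()})
print("male/female fail ratio:", round(fail_ratio, 2))
```

With these assumed counts the ratio comes out slightly above 3, which is where "roughly three times" comes from.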
3. CHART
3.1. SIMPLE
Graphs => Chart Builder => Simple Bar => place gender on the horizontal axis
To show frequencies on the chart: double-click the chart => Chart Editor => Show Data Labels
Conclusion: Overall, the price changed dramatically. The most distinctive feature of this line graph is that the price fell below $60 on January 1, 2006. However, by June 2008 it had skyrocketed, doubling to more than $120 => a significant increase in the oil price.
Graphs => Chart Builder => pick Scatter with Fit Line
To remove the trailing zeros: double-click the chart => double-click the numbers => Number Format => Decimal Places = 0
Conclusion: => Negative relationship between price and demand: when the price is high, demand is low.
4. DESCRIPTIVE STATISTICS
Descriptive Statistics
                     N     Minimum   Maximum   Mean     Std. Deviation
gender               105   1         2         1.39     .490
ethnicity            105   1         5         3.35     1.056
gpa                  105   1.14      4.00      2.7789   .76380
total                105   51        124       100.57   15.299
Valid N (listwise)   105
Gender and ethnicity are coded with arbitrarily assigned numbers => their means have no meaning.
1.39 = (64*1 + 41*2)/105 is a weighted average (also meaningless).
Means cannot be compared across variables because the variables are measured in different units.
The mean would change if we coded female as 2 and male as 3 => the mean depends on the coding => changing the coding changes the mean, so the means of gender, ethnicity, … have no meaning.
2.7789 and 100.57 are meaningful because gpa and total are continuous variables.
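The point about coding can be checked with a few lines of arithmetic. This sketch uses the group sizes from the notes' own calculation (64 and 41); the helper name `coded_mean` is just illustrative.

```python
def coded_mean(counts, codes):
    """Mean of a categorical variable after replacing each category by its numeric code."""
    n = sum(counts.values())
    return sum(counts[cat] * codes[cat] for cat in counts) / n

counts = {"female": 64, "male": 41}  # group sizes used in the notes

# Coding implied by 1.39 = (64*1 + 41*2)/105
mean_original = coded_mean(counts, {"female": 1, "male": 2})
# Alternative coding mentioned in the notes: female = 2, male = 3
mean_recoded = coded_mean(counts, {"female": 2, "male": 3})

print(round(mean_original, 2), round(mean_recoded, 2))
```

The data are identical in both runs; only the labels changed, yet the "mean" moves by exactly 1. That is why it carries no information.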
Purpose: used to compare the mean values of two different groups (compare means)
Analyze => Compare Means => Independent-Samples T Test
Test variable: a continuous variable (total points / final grade)
Grouping variable: a categorical variable (gender/ethnicity…); exactly two levels of the variable are compared
Group Statistics
        gender   N    Mean     Std. Deviation   Std. Error Mean
total   Female   64   102.03   13.896           1.737
        Male     41   98.29    17.196           2.686
The larger the standard deviation, the more the data are dispersed around the mean.
Then:
0.224 > 0.05 => Conclusion: there is no significant difference between the means of the two populations. In other words, there is no evidence that mean total scores differ between males and females.
If Sig. < 0.05 => there is a significant difference between the means of the two populations in terms of the test variable (e.g., GPA score).
ANOTHER EXAMPLE:
Run a t-test on gpa between the two gender groups
=> run an independent-samples t-test
gpa: test variable
gender: grouping variable
Group Statistics
      gender   N    Mean     Std. Deviation   Std. Error Mean
gpa   Female   64   2.8967   .74622           .09328
      Male     41   2.5949   .76346           .11923
0.566 > 0.05 => the variances of the two populations are not significantly different, so use the t-test results in the "Equal variances assumed" row.
0.048 < 0.05 => there is a significant difference between the means of the two populations in terms of GPA score.
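As a rough cross-check, the "Equal variances assumed" t-test can be recomputed from the two Group Statistics tables alone. This is a sketch in plain Python, not SPSS's own procedure: the p-value is obtained by numerically integrating the t density (Simpson's rule), and the function names are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus twice the area under the t density on [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    s = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (s * h / 3)

def pooled_t_test(n1, m1, s1, n2, m2, s2):
    """Equal-variances t-test from group sizes, means and standard deviations."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df, two_sided_p(t, df)

# Female vs. male, values copied from the Group Statistics tables
t_total, df_total, p_total = pooled_t_test(64, 102.03, 13.896, 41, 98.29, 17.196)
t_gpa, df_gpa, p_gpa = pooled_t_test(64, 2.8967, 0.74622, 41, 2.5949, 0.76346)

print(f"total: t={t_total:.3f}, df={df_total}, p={p_total:.3f}")
print(f"gpa:   t={t_gpa:.3f}, df={df_gpa}, p={p_gpa:.3f}")
```

The p-values should land close to the Sig. values quoted in the notes (about 0.22 for total, just under 0.05 for gpa); small differences come from the rounded inputs.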
Purpose: used to compare the means of one group measured under the same conditions (paired-samples t-test).
=> 105 students take quizzes 1, 2, 3, 4 and 5 => compare the means of quiz 1 and quiz 2.
Conclusion: Sig. < 0.05 (0.005 < 0.05) => there is a significant difference between the means of the two quizzes => the mean score on the second quiz (M = 7.98) was significantly greater than the mean score on the first quiz (M = 7.47). (Same as step 2 of the independent-samples t-test.)
Change overall_sat from "Ordinal" to "Scale": only after this change can the means be compared (if it is left as ordinal, the means are not meaningful).
Conclusion: This bar chart compares the mean satisfaction across the three statuses of Oddjob Airways' customers.
Among the three statuses, Blue has the highest mean satisfaction score and Gold has the lowest.
Descriptives
Overall, I am satisfied with the price performance ratio of Oddjob Airways.
                                                   95% Confidence Interval for Mean
         N      Mean   Std. Deviation   Std. Error   Lower Bound   Upper Bound   Minimum   Maximum
Blue     677    4.47   1.641            .063         4.35          4.60          1         7
Silver   245    4.03   1.560            .100         3.84          4.23          1         7
Gold     143    3.99   1.556            .130         3.73          4.24          1         7
Total    1065   4.31   1.625            .050         4.21          4.40          1         7
677: number of observations (N) in the Blue group
Options: tick boxes 1, 3 and 5
ANOVA
Overall, I am satisfied with the price performance ratio of Oddjob Airways.
                 Sum of Squares   df     Mean Square   F       Sig.
Between Groups   51.755           2      25.878        9.963   .000
Within Groups    2758.455         1062   2.597
Total            2810.210         1064
Conclusion: The ANOVA yields a p-value reported as .000 (i.e., p < 0.001, which is less than 0.05), suggesting that at least two of the three customer groups differ significantly in mean overall price/performance satisfaction.
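The ANOVA table is internally linked: each Mean Square is a Sum of Squares divided by its df, and F is the ratio of the two Mean Squares. A short sketch that recomputes these from the table (k = 3 groups and N = 1065 are read off the Descriptives output):

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F) for a one-way ANOVA table."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

k, n = 3, 1065          # number of groups and total observations
df_between = k - 1      # 2
df_within = n - k       # 1062

ms_b, ms_w, f_stat = anova_f(51.755, df_between, 2758.455, df_within)
print(f"MS between = {ms_b:.3f}, MS within = {ms_w:.3f}, F = {f_stat:.3f}")
```

The same helper applies to any one-way ANOVA table, provided the sums of squares and degrees of freedom are copied correctly.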
Analyze => Compare Means => One-Way ANOVA => Post Hoc => tick Tukey
Multiple Comparisons
Dependent Variable: Overall, I am satisfied with the price performance ratio of Oddjob Airways.
Tukey HSD
                                                                                      95% Confidence Interval
(I) Traveler Status   (J) Traveler Status   Mean Difference (I-J)   Std. Error   Sig.   Lower Bound   Upper Bound
Blue                  Silver                .440*                   .120         .001   .16           .72
                      Gold                  .487*                   .148         .003   .14           .83
Silver                Blue                  -.440*                  .120         .001   -.72          -.16
                      Gold                  .047                    .170         .959   -.35          .44
Gold                  Blue                  -.487*                  .148         .003   -.83          -.14
                      Silver                -.047                   .170         .959   -.44          .35
*. The mean difference is significant at the 0.05 level.
When Sig. < 0.05 => Conclusion: there is a significant difference between the two groups in the mean of …
For example, with 3 groups (Gold, Silver, Blue) there are 3 pairs; compare each pair with the other.
⇨ Conclusion: The Silver and Blue groups differ significantly in mean satisfaction score (0.001 < 0.05).
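To see where the 3 pairs come from and how each is read, here is a small sketch that enumerates the pairs and applies the 0.05 rule to the Sig. values from the Tukey HSD table. The group means are the rounded values from the Descriptives table, so the differences can deviate slightly from the table in the last digit.

```python
from itertools import combinations

group_means = {"Blue": 4.47, "Silver": 4.03, "Gold": 3.99}  # Descriptives table
tukey_sig = {                                               # Tukey HSD table
    ("Blue", "Silver"): 0.001,
    ("Blue", "Gold"): 0.003,
    ("Silver", "Gold"): 0.959,
}

# 3 groups => 3 unordered pairs
pairs = list(combinations(group_means, 2))

for a, b in pairs:
    diff = group_means[a] - group_means[b]
    verdict = "significant" if tukey_sig[(a, b)] < 0.05 else "not significant"
    print(f"{a} vs {b}: mean difference {diff:.2f}, Sig. {tukey_sig[(a, b)]:.3f} -> {verdict}")

significant_pairs = [p for p in pairs if tukey_sig[p] < 0.05]
```

Only the two pairs involving Blue pass the 0.05 threshold; Silver and Gold do not differ significantly.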
ANOVA
total
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   1033.572         4     258.393       1.109   .357
Within Groups    23310.142        100   233.101
Total            24343.714        104
⇨ 0.357 > 0.05
⇨ There is no significant difference in the mean total score among ethnic groups. (The post hoc results can also be checked to see whether every Sig. is greater than 0.05.)
⇨ Consistent with the result from the ANOVA table.
Conclusion: There is no significant difference in the mean total score among ethnic groups.
Absolute correlation: take the absolute value of the correlation coefficient
Correlations (variables S1, S2, S3, S4)
Example: 0.739 > 0.4 => Conclusion: the correlation between s1 (“ … with Oddjob Airways you will arrive on time.”) and s2 (“ … the entire journey with Oddjob Airways will occur as booked.”) is 0.739, which indicates a strong relationship.
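A tiny sketch of the rule as the notes state it: strength is judged on the absolute value of the coefficient, with 0.4 as the cut-off used in the example. The values -0.739 and 0.25 are hypothetical, added only to show that the sign does not matter.

```python
def is_strong(r, cutoff=0.4):
    """Judge strength by |r|, using the 0.4 cut-off from the notes' example."""
    return abs(r) > cutoff

print(is_strong(0.739))   # s1 vs. s2 from the notes
print(is_strong(-0.739))  # hypothetical: same strength, opposite direction
print(is_strong(0.25))    # hypothetical weaker correlation
```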
Purpose: check the relationship between X and Y, i.e., whether X affects Y => (linear) regression analysis
Y = b0 + b1*X1 + b2*X2 + b3*X3 + b4*X4
Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .731a   .534       .492                5.308
Purpose: ANOVA for multiple regression tests the significance of the entire model (it assesses how well the model fits).
ANOVA (a)
Model          Sum of Squares   df   Mean Square   F        Sig.
1 Regression   1423.209         4    355.802       12.627   .000b
  Residual     1239.852         44   28.178
  Total        2663.061         48
a. Dependent Variable: Graduation %
b. Predictors: (Constant), Top 10% HS, Median SAT, Expenditures/Student, Acceptance Rate
⇨ The model is significant (Sig. < 0.05)
Compare Sig. with 0.05:
If Sig. < 0.05 => the model is statistically significant and fits the data
If Sig. > 0.05 => the model is not statistically significant and does not fit the data
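The Model Summary and the regression ANOVA table describe the same fit, so the summary figures can be rebuilt from the sums of squares. A minimal sketch using the standard formulas (R Square = SS regression / SS total, adjusted R Square penalised by the degrees of freedom, Std. Error of the Estimate = square root of the residual Mean Square):

```python
import math

ss_regression, df_regression = 1423.209, 4   # from the ANOVA table
ss_residual, df_residual = 1239.852, 44
ss_total = ss_regression + ss_residual

n = df_regression + df_residual + 1           # number of observations

r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
std_error_estimate = math.sqrt(ss_residual / df_residual)
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"n = {n}")
print(f"R Square = {r_square:.3f}, Adjusted R Square = {adj_r_square:.3f}")
print(f"Std. Error of the Estimate = {std_error_estimate:.3f}, F = {f_stat:.3f}")
```

These reproduce the .534, .492, 5.308 and 12.627 shown in the SPSS output, up to rounding.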
Coefficients (a)
                         Unstandardized Coefficients   Standardized Coefficients
Model                    B         Std. Error          Beta                        t        Sig.
1 (Constant)             17.921    24.557                                          .730     .469
  Median SAT             .072      .018                .606                        4.004    .000
  Acceptance Rate        -.249     .083                -.446                       -2.990   .005
  Expenditures/Student   .000      .000                -.282                       -2.057   .046
  Top 10% HS             -.163     .079                -.296                       -2.051   .046
a. Dependent Variable: Graduation %
Step 1: Check whether the regression coefficient of each IV is statistically significant (p-values for the independent variables) (SEE THE COEFFICIENTS TABLE)
Sig. = significance = p-value; the most common significance level is 0.05
If Sig. > 0.05 => not statistically significant => no evidence that X affects Y
Conclusion: We reject the null hypothesis that each partial regression coefficient is zero and conclude that each of them is statistically significant.
If the sign of the coefficient is (+): X has a positive effect on Y. Example: median SAT has a positive effect on graduation.
If the sign of the coefficient is (-): X has a negative/inverse effect on Y.
=> Higher SAT scores and lower acceptance rates suggest higher graduation rates.
Conclusion:
The coefficient of Median SAT is statistically significant and positive. This indicates that the SAT score has a positive association with/influence on the graduation rate.
If the median SAT increases by 1 point, the graduation rate increases by 0.072 percentage points, keeping all other independent variables constant.
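Two pieces of arithmetic sit behind the Coefficients table: t is B divided by its Std. Error, and the predicted change in Y is B times the change in X with the other variables held fixed. The sketch below uses the rounded table values, so the t ratios differ slightly from SPSS's; Expenditures/Student is left out because its B is displayed only as .000. The 50-point increase is a hypothetical illustration.

```python
coefficients = {  # (B, Std. Error) from the Coefficients table
    "Median SAT": (0.072, 0.018),
    "Acceptance Rate": (-0.249, 0.083),
    "Top 10% HS": (-0.163, 0.079),
}

# t statistic for each coefficient: B / Std. Error
t_ratios = {name: b / se for name, (b, se) in coefficients.items()}

def predicted_change(b, delta_x):
    """Change in predicted Y when one X moves by delta_x, other X held constant."""
    return b * delta_x

change_1 = predicted_change(0.072, 1)    # +1 SAT point
change_50 = predicted_change(0.072, 50)  # hypothetical +50 SAT points

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.3f}")
print(f"+1 SAT point  -> {change_1:+.3f} points of Graduation %")
print(f"+50 SAT points -> {change_50:+.3f} points of Graduation %")
```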
Checking for multicollinearity is run the same way as a correlation analysis (correlation coefficients).
Correlations
                                             Median SAT   Acceptance Rate   Expenditures/Student   Top 10% HS
Median SAT             Pearson Correlation   1            -.602**           .573**                 .503**
                       Sig. (2-tailed)                    .000              .000                   .000
                       N                     49           49                49                     49
Acceptance Rate        Pearson Correlation   -.602**      1                 -.284*                 -.610**
                       Sig. (2-tailed)       .000                           .048                   .000
                       N                     49           49                49                     49
Expenditures/Student   Pearson Correlation   .573**       -.284*            1                      .506**
                       Sig. (2-tailed)       .000         .048                                     .000
                       N                     49           49                49                     49
Top 10% HS             Pearson Correlation   .503**       -.610**           .506**                 1
                       Sig. (2-tailed)       .000         .000              .000
                       N                     49           49                49                     49
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
SLIDE 103:
We run the multicollinearity check among the IVs before we run the regression.
Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B          Std. Error         Beta                        t        Sig.
1 (Constant)     893.588    1824.575                                       .490     .628
  Age            1044.146   42.141             .975                        24.777   .000
Conclusion:
The coefficient of Age is statistically significant and positive (B = 1044.146 > 0). This indicates that Age has a positive association with/influence on Salary.
If Age increases by 1 year, the salary increases by 1044.15 USD, keeping all other independent variables constant.
The coefficient of MBA is statistically significant and positive. This indicates that having an MBA degree has a positive association with/influence on salary.
If an employee has an MBA degree, the salary is 14767.23 USD higher than that of an employee without an MBA degree, keeping all other independent variables constant.
SLIDE 110, 111
Age has an impact on salary => direct effect
MBA (moderating effect): positive, negative, or no moderating effect.
* To test a moderating effect => create an interaction variable (Age_MBA)
After creating the interaction variable, run the regression ↪ Analyze => Regression => Linear
Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B           Std. Error        Beta                        t        Sig.
1 (Constant)     3902.509    1336.398                                      2.920    .006
  Age            971.309     31.069            .907                        31.263   .000
  MBA            -2971.080   3026.242          -.086                       -.982    .334
  Age_MBA        501.848     81.552            .531                        6.154    .000
When running a regression with an interaction variable, the component variables must be included as well; however, do not interpret the results for the component variables (Age, MBA), because they are affected by multicollinearity. The sign of Age_MBA is positive (> 0) => Conclusion: The result shows that the positive impact of Age on Salary is higher when the employee has an MBA degree.
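A short sketch of how the interaction variable works, using the B values from the Coefficients table. It assumes MBA is coded 1 for employees with an MBA and 0 otherwise (the notes do not state the coding), and that Age_MBA is the product Age * MBA.

```python
# Unstandardized coefficients (B) from the Coefficients table
B_CONST, B_AGE, B_MBA, B_AGE_MBA = 3902.509, 971.309, -2971.080, 501.848

def predict_salary(age, mba):
    """Predicted salary; mba is assumed to be 1 with an MBA degree, 0 without."""
    age_mba = age * mba  # the interaction variable Age_MBA
    return B_CONST + B_AGE * age + B_MBA * mba + B_AGE_MBA * age_mba

def age_slope(mba):
    """Salary change per additional year of age for a given MBA status."""
    return B_AGE + B_AGE_MBA * mba

print(f"slope of Age without MBA: {age_slope(0):.3f}")
print(f"slope of Age with MBA:    {age_slope(1):.3f}")
```

Under this coding, each extra year of age adds about 971 USD without an MBA and about 1473 USD with one, which is what "the positive impact of Age is higher with an MBA" means in numbers.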
Chapter 7,8
Regression with a categorical variable: a categorical variable with k levels => create k-1 additional (dummy) variables.
With 4 levels => k-1 = 4-1 = 3 additional variables.
Type A is used as the reference variable (type A => reference group); B, C or D would work as well.
Coefficients table (dependent variable: Surface finish), summarized as the fitted equation:
⇨ Surface finish = 24.49 + 0.098 RPM - 13.31 Type B - 20.49 Type C - 26.04 Type D
Conclusion: The coefficient of Type B is statistically significant and negative. This indicates that using the Type B cutting tool has a negative association with/influence on Surface Finish compared with Type A, holding other variables constant.
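The k-1 coding and the fitted equation can be put together in a few lines. This sketch builds the 3 indicator variables with Type A as the reference group and plugs them into the equation from the notes; the RPM value of 200 is hypothetical, and the function names are illustrative.

```python
LEVELS = ["A", "B", "C", "D"]
REFERENCE = "A"  # reference group, as in the notes

def dummy_code(tool_type):
    """Return k-1 = 3 indicator variables; the reference level is all zeros."""
    return {f"type_{lvl}": int(tool_type == lvl) for lvl in LEVELS if lvl != REFERENCE}

# Coefficients of the fitted equation in the notes
COEF = {"const": 24.49, "rpm": 0.098, "type_B": -13.31, "type_C": -20.49, "type_D": -26.04}

def predict_surface_finish(rpm, tool_type):
    """Predicted Surface finish for a given RPM and cutting-tool type."""
    dummies = dummy_code(tool_type)
    y = COEF["const"] + COEF["rpm"] * rpm
    return y + sum(COEF[name] * value for name, value in dummies.items())

rpm = 200  # hypothetical value, for illustration only
for tool in LEVELS:
    print(tool, dummy_code(tool), round(predict_surface_finish(rpm, tool), 2))
```

At any fixed RPM, the gap between a Type B prediction and a Type A prediction is exactly the Type B coefficient (-13.31), which is why each dummy coefficient is read "compared with Type A".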