Assignment Statistic
SUBJECT BENG2143
MARKS
Question 1
Question 2
Question 3
Total Marks
Percentage (10%)
TABLE OF CONTENTS
Part A: Question 1
1. Background
2. Methodology
   - Questionnaire
   - Quantitative
4. Conclusion
5. 2.1
6. 2.2
7. Question 3
Appendix
PART A
BACKGROUND
METHODOLOGY
The respondents in the study comprised 50 university students and staff from all faculties
in UTeM, including diploma and bachelor's degree students. The survey for this project was
conducted using Google Forms. The survey link was circulated through WhatsApp, passed from
group to group, to obtain responses from different faculties across UTeM.
QUESTIONNAIRE
1. Gender
Male
Female
2. Faculty
FKM
FKE
FKP
FKEKK
FTMK
FTK
FPTT
4. How much did you spend on the watch (RM)?
0 – 100
100 – 200
200 – 300
300 – 400
400 – 500
DATA ANALYSIS AND RESULTS
1. Qualitative data
a) Pie Chart
[Pie chart of watch type: Analog 54%, Digital 26%, Digital + Analog 20%]
b) Bar Graph
[Bar graph: number of students by type of watch (Digital, Analog, Digital + Analog)]
As seen in the bar graph above, analog watches have more users than digital and
hybrid (digital + analog) watches combined. This is because analog watches are
generally cheaper than the other two types, and because they make the wearer look
more elegant and stylish.
Hybrid watches are usually the most expensive owing to the complexity of their
design, which is why they have the fewest users. Digital watches have the second-fewest
users because they look less elegant than analog watches.
2. Quantitative data
a) Describe the centre of your data set. Based on the measures of central tendency,
describe the shape of your data set.
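Mean:
$$\bar{x} = \frac{\sum mf}{\sum f} = \frac{9300}{50} = 186$$
Mode (modal class):
$$\hat{x} = 0 - 100$$
Median:
$$\tilde{x} = 162.5$$
∴ The data is positively skewed, as $\hat{x} < \tilde{x} < \bar{x}$.
[Figure: sketch of a positively skewed distribution with the mode, median and mean marked]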
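The centre measures above can be checked with a short script. This is a minimal sketch rather than the report's own working. The class frequencies (20, 8, 10, 8, 4) are an assumption: they are the histogram bar labels, ordered so that they reproduce the reported Σmf = 9300 and median of 162.5. All variable names are illustrative.

```python
# Centre of grouped watch-price data (RM).
# ASSUMPTION: class frequencies ordered 20, 8, 10, 8, 4 so that sum(m*f) = 9300.
lower_bounds = [0, 100, 200, 300, 400]
width = 100
freqs = [20, 8, 10, 8, 4]

n = sum(freqs)
midpoints = [lb + width / 2 for lb in lower_bounds]

# Grouped mean: sum(m * f) / sum(f)
sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
mean = sum_mf / n

# Modal class: the class with the largest frequency
modal_index = freqs.index(max(freqs))
modal_class = (lower_bounds[modal_index], lower_bounds[modal_index] + width)

# Grouped median: interpolate inside the class that contains the n/2-th value
cumulative = 0
for lb, f in zip(lower_bounds, freqs):
    if cumulative + f >= n / 2:
        median = lb + (n / 2 - cumulative) * width / f
        break
    cumulative += f

print(f"mean = {mean}, median = {median}, modal class = {modal_class}")
```

Under these assumptions the modal class lies below the median, which lies below the mean, the ordering expected for a positively skewed data set.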
b) Describe the spread/dispersion of your data set.
Based on the calculation, the data are positively skewed and the values are widely
dispersed. The general relationship among the measures of central tendency in a positively
skewed distribution is expressed by the inequality shown above: the mean is larger than the
median. In a positively skewed (right-skewed) distribution, a few data values are substantially
larger than the others. These larger values inflate the mean while having little or, in some
cases, no effect on the median.
c) Frequency table.
[Histogram: frequency, f, by price range (0 – 100, 100 – 200, 200 – 300, 300 – 400, 400 – 500); bar labels 20, 10, 8, 8 and 4]
[Frequency polygon: frequency, f, against midpoint, m]
Yes, there is an outlier in the data. The second point would need to be slightly
higher than the third point for the graph to show a steadily decreasing (negative
linear) trend. This happens because users who choose a cheap watch usually pay
RM 100 or below, whereas those who go for a higher-quality watch usually pay
RM 200 or above.
f) Construct an ogive for your data set. From your ogive, find the median, first quartile
and the third quartile.
[Ogive: percent cumulative frequency (%) against price, 0 to 500]
From ogive:
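Reading a quartile off an ogive amounts to linear interpolation within the cumulative frequency classes. The sketch below illustrates that idea only and does not reproduce values read from the plot. The class frequencies (20, 8, 10, 8, 4) are an assumption, chosen because they reproduce the reported Σmf = 9300, and `grouped_quantile` is a hypothetical helper name.

```python
# Quartiles of grouped data by linear interpolation along the ogive.
# ASSUMPTION: class frequencies ordered 20, 8, 10, 8, 4 (not stated explicitly in the report).
lower_bounds = [0, 100, 200, 300, 400]
width = 100
freqs = [20, 8, 10, 8, 4]


def grouped_quantile(p):
    """Return the value below which a fraction p of the grouped data falls."""
    target = p * sum(freqs)
    cumulative = 0
    for lb, f in zip(lower_bounds, freqs):
        if cumulative + f >= target:
            # Move proportionally through the class that contains the target rank
            return lb + (target - cumulative) * width / f
        cumulative += f
    return lower_bounds[-1] + width


q1 = grouped_quantile(0.25)
median = grouped_quantile(0.50)
q3 = grouped_quantile(0.75)
print(f"Q1 = {q1}, median = {median}, Q3 = {q3}")
```

Values read from a hand-drawn ogive may differ slightly from the interpolated ones because of plotting and reading precision.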
g) Calculate the variance and standard deviation for your grouped data set.
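- Variance, S²
$$S^2 = \frac{1}{\sum f - 1}\left[\sum m^2 f - \frac{\left(\sum mf\right)^2}{\sum f}\right]$$
$$S^2 = \frac{1}{50 - 1}\left[2\,645\,000 - \frac{9300^2}{50}\right]$$
$$S^2 = 18\,677.551$$
- Standard deviation, S
$$S = \sqrt{S^2} = \sqrt{18\,677.55} = 136.666$$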
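The variance and standard deviation can be verified numerically with the same shortcut formula. This is a sketch under one stated assumption: the class frequencies (20, 8, 10, 8, 4) are inferred from the reported totals Σmf = 9300 and Σm²f = 2 645 000, and are not listed explicitly in the report.

```python
import math

# Sample variance and standard deviation for grouped data.
# ASSUMPTION: class frequencies ordered 20, 8, 10, 8, 4, which reproduce the reported totals.
midpoints = [50, 150, 250, 350, 450]
freqs = [20, 8, 10, 8, 4]

n = sum(freqs)
sum_mf = sum(m * f for m, f in zip(midpoints, freqs))
sum_m2f = sum(m * m * f for m, f in zip(midpoints, freqs))

# Shortcut formula: S^2 = [sum(m^2 f) - (sum(m f))^2 / n] / (n - 1)
variance = (sum_m2f - sum_mf ** 2 / n) / (n - 1)
std_dev = math.sqrt(variance)

print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
```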
h) Briefly give your own comment to summarize your data based on any calculations or
any diagrams that you have done above.
Figure 1.4 shows that the graph fluctuates. The highest number of users, 20, is
in the price range RM 0 – RM 100. This is because most of the respondents are
students, who have no income and therefore choose watches that suit their budget.
The lowest number, 4 users, is in the price range RM 400 – RM 500. Users in this
range are UTeM staff: since they have their own income, they choose higher-quality
watches that can last longer.
CONCLUSION
As mentioned earlier in the introduction, the intent of this project is to identify the
number of UTeM citizens wearing different types of watches and the price range that they
spend on their watches. The findings are based on the analysis of data collected from
questionnaires. One of the significant findings of the study is that the majority of
respondents use analog watches. Since analog watches are cheaper, they are more suitable
for a student budget. However, the findings are limited, as the study does not include
respondents from outside UTeM. This study is timely and important as it has provided a new
path for the watch industry.
PART B
QUESTION 2
I. Do the algorithms differ in their mean cost estimation accuracy? Use a significance
level of 0.05
Hypothesis Testing
Step 1 – Hypothesis Statement
H₀ : µ₁ = µ₂ = µ₃ = µ₄ = µ₅ = µ₆
H₁ : At least two of the means are not equal
Step 2 – Test statistic (F-value)
F = 4.0329
Step 3 – α = 0.05
Fα, 5, 42 = 2.4377
Step 4 – Decision
Reject H₀ at α = 0.05, since F = 4.0329 > 2.4377
Step 5 – Conclusion
Yes, there is a significant difference in the mean cost estimation accuracy of the algorithms.
(II) Table 2 shows the protopectin content (expressed as hydrochloric acid soluble fraction
mg/kg).
Storage Time Lot 1 Lot 2 Lot 3 Lot 4 Lot 5 Lot 6 Lot 7 Lot 8 Lot 9
0 days 1694 989 917.3 346.1 1260 965.6 1123 1106 1116
7 days 1802 1074 278.8 1375 544 672.2 818 406.8 461.6
14 days 1568 646.2 1820 1150 983.7 395.3 422.3 420 409.5
21 days 415.5 845.4 377.6 279.4 447.8 272.1 394.1 356.4 351.2
An article describes a study on the protopectin content of tomatoes during storage. Four storage
times were selected, and samples from nine lots of tomatoes were analysed.
(a) The researchers in this study hypothesized that mean protopectin content would be different
at different storage times. Can you confirm this hypothesis with a statistical test using
α = 0.75?
Table 2.4:
Storage Time Lot 1 Lot 2 Lot 3 Lot 4 Lot 5 Lot 6 Lot 7 Lot 8 Lot 9 Total Average
0 days 1694 989 917.3 346.1 1260 965.6 1123 1106 1116 9517 1057.444
7 days 1802 1074 278.8 1375 544 672.2 818 406.8 461.6 7432.4 825.8222
14 days 1568 646.2 1820 1150 983.7 395.3 422.3 420 409.5 7815 868.3333
21 days 415.5 845.4 377.6 279.4 447.8 272.1 394.1 356.4 351.2 3739.5 415.5
Total 28503.9 3167.1
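Hypothesis Testing
Step 1 – Hypothesis Statement
H₀ : µ₁ = µ₂ = µ₃ = µ₄
H₁ : At least two of the means are not equal
Step 2 – Test statistic (F-value)
F = 3.7390
Step 3 – α = 0.75
Fα, 3, 32 = 0.4056
Step 4 – Decision
Reject H₀ at α = 0.75, since F = 3.7390 > 0.4056
Step 5 – Conclusion
Yes, the test supports the hypothesis that mean protopectin content differs at different
storage times.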
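The F statistic for this test can be recomputed directly from the protopectin table. The following is a minimal one-way ANOVA sketch in plain Python, not the tool used in the report. The function name `one_way_anova` is illustrative, and the critical value 0.4056 is taken from Step 3 rather than computed.

```python
# One-way ANOVA on protopectin content (mg/kg), data copied from Table 2.
groups = {
    "0 days": [1694, 989, 917.3, 346.1, 1260, 965.6, 1123, 1106, 1116],
    "7 days": [1802, 1074, 278.8, 1375, 544, 672.2, 818, 406.8, 461.6],
    "14 days": [1568, 646.2, 1820, 1150, 983.7, 395.3, 422.3, 420, 409.5],
    "21 days": [415.5, 845.4, 377.6, 279.4, 447.8, 272.1, 394.1, 356.4, 351.2],
}


def one_way_anova(samples):
    """Return (F statistic, between-groups df, within-groups df)."""
    all_values = [v for s in samples for v in s]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(s) / len(s) for s in samples]
    # Between-groups and within-groups sums of squares
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    ss_within = sum((v - m) ** 2 for s, m in zip(samples, means) for v in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_anova(list(groups.values()))
f_critical = 0.4056  # critical value quoted in Step 3 (assumed, not recomputed here)
reject_h0 = f_stat > f_critical
print(f"F = {f_stat:.4f} on ({df1}, {df2}) df, reject H0: {reject_h0}")
```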
(b) Find the p-value for the test in part (a).
p-value ≈ 0.0207
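The p-value is the upper-tail area of the F distribution with (3, 32) degrees of freedom beyond F = 3.7390. As a rough check that needs no statistics library, the sketch below integrates the F density numerically with Simpson's rule. The function names and the step count are illustrative choices, and the approach assumes a numerator degree of freedom greater than 2 so that the density is zero at the origin.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution (d1 > 2 assumed, so the density is 0 at x = 0)."""
    if x <= 0:
        return 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = (
        0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
               - (d1 + d2) * math.log(d1 * x + d2))
        - math.log(x)
        - log_beta
    )
    return math.exp(log_pdf)


def f_upper_tail(f_value, d1, d2, steps=20000):
    """P(F > f_value), using Simpson's rule on [0, f_value]; steps must be even."""
    h = f_value / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(f_value, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1.0 - total * h / 3


p_value = f_upper_tail(3.7390, 3, 32)
print(f"p-value = {p_value:.4f}")
```

A statistics package or a spreadsheet F-distribution function would normally be used instead. The numerical integration here only cross-checks the order of magnitude.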
(c) Which specific storage times are different? Would you agree with the statement that
protopectin content decreases as storage time increases?
$$\text{Confidence Interval (CI)} = t_{3,\,32}\sqrt{\frac{MSE}{n}} = (0.4056)\sqrt{\frac{175684}{9}} = 56.6686$$
QUESTION 3
(a) Draw a scatter plot of y (blood pressure rise in millimetres of mercury) versus x (sound
pressure level in decibels). Is it reasonable to assume that y and x are linearly related?
[Scatter plot: y (blood pressure rise, mm of mercury) versus x (sound pressure level, dB)]
Yes, it is reasonable to assume that X and Y are linearly related. The plot above of the
Blood Pressure Rise and Sound Pressure Level data set shows that a straight line fits
comfortably through the data; hence a linear relationship exists. The scatter about the line
is quite small, so the linear relationship is strong. The slope of the line is positive (small
values of X correspond to small values of Y; large values of X correspond to large values
of Y), so there is a positive correlation between X and Y.
(b) Find the correlation between these two variables. Interpret your result.
Using Excel, the correlation coefficient is r = 0.865019. Because r is positive and close
to 1, the two variables have a strong positive correlation.
(c) Using the LINEST function in Microsoft Excel, find the simple linear regression
model.
The least squares regression line is ŷ = a + bx, where a is the y-intercept and b is the slope.
Thus, using the LINEST function in Microsoft Excel, the estimated regression model is
ŷ = -10.1315 + 0.174294x.
(d) Find the predicted mean rise in blood pressure level associated with a sound pressure
level of 85 decibels.
For x = 85, ŷ = -10.1315 + 0.174294(85) = 4.6835.
Thus, when the sound pressure level is 85 decibels, the predicted mean rise in blood pressure
is 4.6835 mm of mercury.
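The prediction is a direct substitution into the fitted line. A minimal sketch, assuming only the intercept and slope reported from LINEST; the function name `predict_rise` is illustrative:

```python
# Predicted mean blood pressure rise from the fitted line y-hat = a + b*x.
# The coefficients are the LINEST estimates quoted in part (c).
intercept = -10.1315
slope = 0.174294


def predict_rise(sound_level_db):
    """Return the predicted mean rise in blood pressure (mm of mercury)."""
    return intercept + slope * sound_level_db


predicted = predict_rise(85)
print(f"Predicted rise at 85 dB: {predicted:.4f}")
```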
For a 99% confidence interval, α = 0.01.
APPENDIX