Assignment Statistic

The document is a group assignment submission for a statistics course. It contains a group member list, submission details, and a table of contents for the assignment. The assignment involves conducting a survey of 50 university students and staff about the type of watches they wear and how much they spent. The results are analyzed using qualitative and quantitative methods. Qualitatively, most respondents wore analog watches. Quantitatively, the average amount spent was 186 Malaysian Ringgit, with a positively skewed distribution where most spent under 200 RM.

Uploaded by

Rashiqah Razlan
Copyright: © All Rights Reserved

ASSIGNMENT

SUBJECT BENG2143

Group Name: Valkyrie


Group Members:

No Name Matrix Number


1 Ainul Zazrairie Bin Zairi B041810029
2 Muhammad Bin zulkifli B041810032
3 Izzat Najmi Bin Ibrahim B041810020
4 Nyak Syafikah Binti Tungku Rauyani B041810023
5 Nurazimah Binti Jamil B041810191

Submission date: 30. May. 2019 (Day)


Submitted to: Dr. Nurhidayah Binti Ismail

MARKS
Question 1

Question 2

Question 3

Total Marks

Percentage (10%)
TABLE OF CONTENTS

Part A: Question 1
1. Background
2. Methodology
   - Questionnaire
3. Data analysis and results
   - Qualitative data
   - Quantitative data
4. Conclusion

Part B: Question 2 & Question 3
5. Question 2.1
6. Question 2.2
7. Question 3

Appendix
PART A

BACKGROUND

A watch is a timepiece intended to be carried or worn by a person. It is designed to keep
working despite the motions caused by the person's activities.

This project was conducted to complete the assignment for BENG2143 Engineering
Statistics. Its purpose is to observe what types of watch UTeM citizens wear and how much
they spent on their watches. It also aims to identify which type of watch UTeM citizens
prefer for daily use. The project takes the form of a survey, and the respondents are
limited to UTeM citizens only.

METHODOLOGY

The respondents in the study comprised 50 university students and staff, drawn from all
faculties in UTeM. The students include both diploma and bachelor's degree students. The
survey was conducted using Google Forms. The survey link was distributed through the
WhatsApp application and forwarded from group to group to obtain responses from
different faculties across UTeM.

QUESTIONNAIRE

Title: Types of hand watch worn by UTeM students

1. Gender
Male
Female

2. Faculty
FKM
FKE
FKP
FKEKK
FTMK
FTK
FPTT

3. What type of hand watch do you use?
Digital
Analog
Digital + Analog

4. How much did you spend on the watch?
0 – 100
100 – 200
200 – 300
300 – 400
400 – 500

5. How long have you been using the watch?
0 – 1 year
2 – 3 years
4 – 5 years
6 – 7 years

DATA ANALYSIS AND RESULTS

1. Qualitative data

Table 1.1: Number of students wearing different types of watch

Type of watch       No. of students
Digital             13
Analog              27
Digital + Analog    10
Total               50

a) Pie Chart

[Pie chart: Analog 54%, Digital 26%, Digital + Analog 20%]

Figure 1.1: Number of students wearing different types of hand watch
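
The percentages in the pie chart follow directly from the counts in Table 1.1. As a minimal sketch (the variable names are illustrative, not part of the original assignment), the shares can be reproduced like this:

```python
# Counts taken from Table 1.1 of the survey.
counts = {"Digital": 13, "Analog": 27, "Digital + Analog": 10}

total = sum(counts.values())

# Percentage share of each watch type, rounded to the nearest whole percent.
shares = {kind: round(100 * n / total) for kind, n in counts.items()}

for kind, pct in shares.items():
    print(f"{kind}: {pct}%")
```

With 50 respondents, each respondent accounts for 2 percentage points, which is why every share is an even number.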

b) Bar Graph

[Bar graph: x-axis: Type of watch (Digital, Analog, Digital + Analog); y-axis: No. of students]

Figure 1.2: Number of students wearing different types of watch

c) Interpret finding / Comment on result

As seen in the bar graph above, analog watches have the highest number of users
(27), more than digital and hybrid (digital + analog) watch users combined (23). We
attribute this to analog watches being cheaper than the other two types of watch, and
to analog watches making the wearer look more elegant and stylish.
Compared with analog watches, hybrid watches are usually the most expensive
because of the complexity of their design, which is why they have the fewest users.
Digital watches have the second-fewest users because they look less elegant than
analog watches.

2. Quantitative data

Table 1.2: Amount students spent on their watch

Watch price (RM)    No. of students
0 – 100             20
100 – 200           8
200 – 300           10
300 – 400           8
400 – 500           4
Total               50

a) Describe the centre of your data set. Based on the measures of central tendency,
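describe the shape of your data set.

Mean:
\[ \bar{x} = \frac{\sum mf}{\sum f} = \frac{9300}{50} = 186 \]

Mode:
\[ \hat{x} \text{ lies in the modal class } 0 - 100 \]

Median:
\[ \tilde{x} = 162.5 \]

\[ \therefore \text{The data are positively skewed, as } \hat{x} < \tilde{x} < \bar{x}. \]

[Sketch of a right-skewed curve with the mode, median and mean marked from left to right]

Figure 1.3: Positively skewed distribution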
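
The three measures of centre can be checked from the grouped data in Table 1.2. The sketch below is an illustration only: it assumes the usual linear-interpolation formula for the median of grouped data, and the helper name `grouped_quantile` is hypothetical.

```python
# (lower boundary, upper boundary, frequency) for each class in Table 1.2.
classes = [(0, 100, 20), (100, 200, 8), (200, 300, 10), (300, 400, 8), (400, 500, 4)]

n = sum(f for _, _, f in classes)

# Grouped mean: sum of (midpoint * frequency) divided by the total frequency.
mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

# Modal class: the class with the largest frequency.
modal_class = max(classes, key=lambda c: c[2])[:2]

def grouped_quantile(p):
    """Interpolate the value below which a fraction p of the responses fall."""
    target = p * n
    cumulative = 0
    for lo, hi, f in classes:
        if cumulative + f >= target:
            return lo + (target - cumulative) / f * (hi - lo)
        cumulative += f
    return classes[-1][1]

median = grouped_quantile(0.5)

print(mean, modal_class, median)
# Mode (in 0-100) < median < mean is the ordering expected for positive skew.
print(modal_class[1] <= median < mean)
```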

b) Describe the spread/dispersion of your data set.

Based on the calculation, the data are positively skewed and the values are widely
dispersed. The general relationship among the measures of central tendency in a positively
skewed distribution is given by the inequality shown above: the mean is larger than the
median. In a positively skewed (right-skewed) distribution, a few data values are
substantially larger than the others. These larger values inflate the mean while having
little or, in some cases, no effect on the median.

c) Frequency table.

Table 1.3: Frequency distribution table for quantitative data

Class limit  Lower boundary  Upper boundary  Frequency,  Cumulative     Cumulative frequency  Midpoint,  mf    m²f
(RM)         (RM)            (RM)            f           frequency, cf  percentage, %         m
                             0                           0              0
0 – 100      0               100             20          20             40                    50         1000  50000
100 – 200    100             200             8           28             56                    150        1200  180000
200 – 300    200             300             10          38             76                    250        2500  625000
300 – 400    300             400             8           46             92                    350        2800  980000
400 – 500    400             500             4           50             100                   450        1800  810000
Total                                        50                                                          9300  2645000
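
The derived columns of Table 1.3 can be rebuilt from the class boundaries and frequencies alone. The following sketch (variable names are illustrative) shows one way to do so and to confirm the column totals used later in the mean and variance calculations:

```python
from itertools import accumulate

# Class boundaries (RM) and frequencies from Table 1.3.
lower = [0, 100, 200, 300, 400]
upper = [100, 200, 300, 400, 500]
freq = [20, 8, 10, 8, 4]

n = sum(freq)
cum_freq = list(accumulate(freq))             # cumulative frequency, cf
cum_pct = [100 * cf / n for cf in cum_freq]   # cumulative frequency percentage
mid = [(lo + hi) / 2 for lo, hi in zip(lower, upper)]   # class midpoint, m
mf = [m * f for m, f in zip(mid, freq)]       # midpoint times frequency
m2f = [m * m * f for m, f in zip(mid, freq)]  # squared midpoint times frequency

for row in zip(lower, upper, freq, cum_freq, cum_pct, mid, mf, m2f):
    print(*row, sep="\t")
print("Totals:", n, sum(mf), sum(m2f))
```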

d) Based on the frequency table above, construct a histogram and a polygon.

[Histogram: x-axis: Class boundary (RM); y-axis: Frequency, f; bar heights 20, 8, 10, 8 and 4 for the classes 0 – 100 to 400 – 500]

Figure 1.4: Histogram of quantitative data

[Frequency polygon: x-axis: Midpoint, m; y-axis: Frequency, f]

Figure 1.5: Frequency polygon of quantitative data

e) Do your data contain any outlier?

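The frequency polygon shows one irregular point. The second point (RM 100 – 200,
8 respondents) would need to be slightly higher than the third point (RM 200 – 300,
10 respondents) for the frequencies to decrease steadily from class to class. This
happens because users who choose a cheap watch usually pay RM 100 or less, whereas
users who go for a higher-quality watch usually pay RM 200 or more. This irregularity
is a feature of the frequency pattern rather than an outlier in the usual sense: all
responses fall within the five price classes, so no individual value lies far from
the rest.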

f) Construct an ogive for your data set. From your ogive, find the median, first quartile
and the third quartile.

[Ogive: x-axis: Upper boundary (RM); y-axis: Cumulative frequency percentage, %]

Figure 1.6: Ogive of quantitative data

From the ogive:

- First quartile = 62.5
- Median = 162.5
- Third quartile = 295
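
Reading a quartile off an ogive amounts to linear interpolation between the plotted points. The sketch below assumes the ogive passes through the (upper boundary, cumulative percentage) pairs of Table 1.3, starting from (0, 0); the function name `read_ogive` is hypothetical.

```python
# Ogive points: (upper boundary in RM, cumulative frequency percentage).
ogive = [(0, 0), (100, 40), (200, 56), (300, 76), (400, 92), (500, 100)]

def read_ogive(percent):
    """Return the RM value at which the ogive reaches the given cumulative percentage."""
    for (x0, y0), (x1, y1) in zip(ogive, ogive[1:]):
        if y0 <= percent <= y1:
            # Straight-line interpolation within the segment that contains `percent`.
            return x0 + (percent - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("percent must lie between 0 and 100")

q1, median, q3 = (read_ogive(p) for p in (25, 50, 75))
iqr = q3 - q1

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
```

The interpolated values agree with the quartiles quoted from the ogive, and the interquartile range Q3 − Q1 gives a spread measure that is less sensitive to the long right tail than the standard deviation.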

g) Calculate the variance and standard deviation for your grouped data set.

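- Variance, S²

\[ S^2 = \frac{1}{\sum f - 1}\left[\sum m^2 f - \frac{\left(\sum mf\right)^2}{\sum f}\right] \]
\[ S^2 = \frac{1}{50 - 1}\left[2\,645\,000 - \frac{9300^2}{50}\right] \]
\[ S^2 = 18\,677.551 \]

∴ from Excel, S² = 18304

- Standard deviation, S

\[ S = \sqrt{S^2} = \sqrt{18\,677.551} = 136.666 \]

∴ from Excel, S = 135.292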
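
The hand calculation and the Excel figures differ slightly. A short check, using only the column totals from Table 1.3, suggests why: dividing the corrected sum of squares by Σf − 1 gives the hand-calculated values, while dividing by Σf gives the Excel values. That the spreadsheet used a population formula is an assumption here, since the worksheet itself is not reproduced.

```python
import math

# Column totals from Table 1.3.
sum_f, sum_mf, sum_m2f = 50, 9300, 2_645_000

# Corrected sum of squares: sum(m^2 f) - (sum(mf))^2 / sum(f).
ss = sum_m2f - sum_mf ** 2 / sum_f

sample_var = ss / (sum_f - 1)   # divisor n - 1, as in the formula above
population_var = ss / sum_f     # divisor n (assumed to be what the Excel figure reflects)

sample_sd = math.sqrt(sample_var)
population_sd = math.sqrt(population_var)

print(f"Sample:     S^2 = {sample_var:.3f}, S = {sample_sd:.3f}")
print(f"Population: S^2 = {population_var:.3f}, S = {population_sd:.3f}")
```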

h) Briefly give your own comment to summarize your data based on any calculations or
any diagrams that you have done above.

Figure 1.4 shows that the frequencies fluctuate across the price classes. The
largest group, 20 users, is in the price range RM 0 – RM 100. The reason is that
most of the respondents are students: since students generally have no income, they
choose a watch that suits their budget. The smallest group, 4 users, is in the price
range RM 400 – RM 500. Users in this range are UTeM staff, who have their own income
and so go for higher-quality watches that last longer.

Figure 1.7: Calculation of data using Microsoft Excel

CONCLUSION

As mentioned in the background, the intent of this project is to identify the
number of UTeM citizens wearing different types of watch and the price range that they
spend on their watches. The findings are based on the analysis of data collected from
questionnaires. One of the significant findings of the study is that the majority of
respondents use analog watches. Since analog watches are cheaper, they are more suitable
for a student budget. However, the findings are limited, as the study does not include
respondents outside UTeM. This study is timely and important, as it provides a new path
for the watch industry.

PART B

QUESTION 2

I. Do the algorithms differ in their mean cost estimation accuracy? Use a significance
level of 0.05

Table 2.1: Data on six different algorithms


Algorithm  Project 1  Project 2  Project 3  Project 4  Project 5  Project 6  Project 7  Project 8  Total  Average
1          1244       21         82         2221       905        839        527        122        5961   745.125
2          281        129        396        1306       336        910        473        199        4030   503.75
3          220        84         458        543        300        794        488        142        3029   378.625
4          225        83         425        552        291        826        509        153        3064   383
5          19         11         -34        121        15         103        87         -17        305    38.125
6          -20        35         -53        170        104        199        142        41         618    77.25
Total                                                                                              17007  2125.875

Table 2.2: ANOVA table

Hypothesis Testing
Step 1 – Hypothesis Statement
H₀ : µ₁ = µ₂ = µ₃ = µ₄ = µ₅ = µ₆
H₁ : At least two of the means are not equal
Step 2 – Test statistic (F-value)
F = 4.0329
Step 3 – Critical value at α = 0.05
Fα ; 5, 42 = 2.4377
Step 4 – Decision
Since F = 4.0329 > 2.4377, reject H₀ at α = 0.05
Step 5 – Conclusion
Yes, there is a significant difference in the mean cost estimation accuracy of the algorithms.
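
The F-value can be reproduced from Table 2.1 without a spreadsheet. The sketch below follows the one-way layout implied by the degrees of freedom (5, 42), treating each algorithm as a group of eight observations; the function name `one_way_f` is hypothetical, and the critical value is simply the one quoted in Step 3 rather than recomputed.

```python
# Cost estimation errors from Table 2.1, one list per algorithm.
data = {
    1: [1244, 21, 82, 2221, 905, 839, 527, 122],
    2: [281, 129, 396, 1306, 336, 910, 473, 199],
    3: [220, 84, 458, 543, 300, 794, 488, 142],
    4: [225, 83, 425, 552, 291, 826, 509, 153],
    5: [19, 11, -34, 121, 15, 103, 87, -17],
    6: [-20, 35, -53, 170, 104, 199, 142, 41],
}

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

f_stat, df1, df2 = one_way_f(list(data.values()))

F_CRIT = 2.4377  # critical value quoted in the text for alpha = 0.05, df = (5, 42)
reject_h0 = f_stat > F_CRIT

print(f"F = {f_stat:.4f} with df = ({df1}, {df2}); reject H0: {reject_h0}")
```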

Figure 2.1: Calculation using Excel

(II) Table 2 shows the protopectin content (expressed as hydrochloric acid soluble fraction
mg/kg).

Table 2.3: Protopectin content

Storage Time  Lot 1   Lot 2   Lot 3   Lot 4   Lot 5   Lot 6   Lot 7   Lot 8   Lot 9
0 days        1694    989     917.3   346.1   1260    965.6   1123    1106    1116
7 days        1802    1074    278.8   1375    544     672.2   818     406.8   461.6
14 days       1568    646.2   1820    1150    983.7   395.3   422.3   420     409.5
21 days       415.5   845.4   377.6   279.4   447.8   272.1   394.1   356.4   351.2

An article describes a study on the protopectin content of tomatoes during storage. Four storage
times were selected, and samples from nine lots of tomatoes were analysed.

(a) The researchers in this study hypothesized that mean protopectin content would be different
at different storage times. Can you confirm this hypothesis with a statistical test using
α = 0.75?
Table 2.4:
Storage Time  Lot 1   Lot 2   Lot 3   Lot 4   Lot 5   Lot 6   Lot 7   Lot 8   Lot 9   Total    Average
0 days        1694    989     917.3   346.1   1260    965.6   1123    1106    1116    9517     1057.444
7 days        1802    1074    278.8   1375    544     672.2   818     406.8   461.6   7432.4   825.8222
14 days       1568    646.2   1820    1150    983.7   395.3   422.3   420     409.5   7815     868.3333
21 days       415.5   845.4   377.6   279.4   447.8   272.1   394.1   356.4   351.2   3739.5   415.5
Total                                                                                 28503.9  3167.1

Table 2.5: ANOVA Table

Hypothesis Testing

Step 1 – Hypothesis Statement

H₀ : µ₁ = µ₂ = µ₃ = µ₄

H₁ : At least two of the means are not equal

Step 2 – Test statistic (F-value)

F = 3.7390

Step 3 – Critical value at α = 0.75

Fα ; 3, 32 = 0.4056

Step 4 – Decision

Since F = 3.7390 > 0.4056, reject H₀ at α = 0.75

Step 5 – Conclusion

Yes, the test supports the hypothesis that mean protopectin content differs across
storage times.
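
The entries of the ANOVA table can be rebuilt from Table 2.4. The sketch below (variable names are illustrative) splits the total variation into between-storage-time and within-storage-time sums of squares and forms the F-ratio. When run on the tabulated data, the within-group mean square comes out near 175 864, slightly different from the 175684 used in part (c); this may indicate a transposed digit there.

```python
# Protopectin content (mg/kg) from Table 2.4, one list per storage time.
storage = {
    "0 days": [1694, 989, 917.3, 346.1, 1260, 965.6, 1123, 1106, 1116],
    "7 days": [1802, 1074, 278.8, 1375, 544, 672.2, 818, 406.8, 461.6],
    "14 days": [1568, 646.2, 1820, 1150, 983.7, 395.3, 422.3, 420, 409.5],
    "21 days": [415.5, 845.4, 377.6, 279.4, 447.8, 272.1, 394.1, 356.4, 351.2],
}

k = len(storage)                                  # number of storage times
n_total = sum(len(g) for g in storage.values())   # total number of observations
grand_mean = sum(sum(g) for g in storage.values()) / n_total
means = {label: sum(g) / len(g) for label, g in storage.items()}

# Between-group and within-group sums of squares.
ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in storage.items())
ss_within = sum((x - means[label]) ** 2 for label, g in storage.items() for x in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)   # mean square error, MSE
f_stat = ms_between / ms_within

print(f"MS between = {ms_between:.2f}, MSE = {ms_within:.2f}, F = {f_stat:.4f}")
```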

Figure 2.2: Calculation using Excel

(b) Find the p-value for the test in part (a).

P-value = P (F 3, 32 > 3.7390)

≈ 0.0207

(c) Which specific storage times are different? Would you agree with the statement that
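protopectin content decreases as storage time increases?

\[ \text{Confidence Interval (CI)} = t_{3,32}\sqrt{\frac{MSE}{n}} = 0.4056\sqrt{\frac{175684}{9}} = 56.6686 \]

In the comparisons below, treatments 1, 2, 3 and 4 denote storage times of 0, 7, 14 and
21 days respectively.

4 vs 1 = 1057.4444 - 415.5 = 641.9444 > 56.6686

4 vs 2 = 825.8222 - 415.5 = 410.3222 > 56.6686

4 vs 3 = 868.3333 - 415.5 = 452.8333 > 56.6686

3 vs 1 = 1057.4444 - 868.3333 = 189.1111 > 56.6686

3 vs 2 = 868.3333 - 825.8222 = 42.5111 < 56.6686

2 vs 1 = 1057.4444 - 825.8222 = 231.6222 > 56.6686

On this criterion, every pair of storage times differs except 14 days and 7 days (3 vs 2).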
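
The six pairwise comparisons can be generated mechanically from the group means. The sketch below takes the threshold 56.6686 exactly as computed in the text and does not recompute it. As a point of comparison, the standard Fisher least significant difference is usually written as t(α/2, df) × √(2·MSE/n), which is a different quantity from the one used here.

```python
from itertools import combinations

# Mean protopectin content per storage time, from Table 2.4.
means = {"0 days": 1057.4444, "7 days": 825.8222, "14 days": 868.3333, "21 days": 415.5}

THRESHOLD = 56.6686  # value taken from the text, not recomputed here

different = []
for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
    gap = abs(mean_a - mean_b)
    verdict = ">" if gap > THRESHOLD else "<"
    if gap > THRESHOLD:
        different.append((a, b))
    print(f"{a} vs {b}: |difference| = {gap:.4f} {verdict} {THRESHOLD}")

print("Pairs exceeding the threshold:", different)
```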

QUESTION 3

(a) Draw a scatter plot of y (blood pressure rises in millimetres of mercury) versus x (sound
pressure level in decibels). Is it reasonable to assume that y and x is linearly related?

[Scatter plot: x-axis: x (Sound Pressure Level, dB); y-axis: y (Blood Pressure Rise, mm of mercury)]

Figure 3.1: Scatter plot of y (blood pressure rise) vs x (sound pressure level)

Yes, it is reasonable to assume that x and y are linearly related. The plot of blood
pressure rise against sound pressure level above shows that a straight line fits comfortably
through the data; hence a linear relationship exists. The scatter about the line is quite small,
so the linear relationship is strong. The slope of the line is positive (small values
of x correspond to small values of y; large values of x correspond to large values of y), so
there is a positive correlation between x and y.

(b) Find the correlation between these two variables. Interpret your result.

Figure 3.2: Correlation between two variables (x and y)

Using Excel, the correlation coefficient is r = 0.865019. This value is positive and close to 1,
which indicates that the two variables have a strong positive correlation.

(c) Using the LINEST function in Microsoft Excel, find the simple linear regression
model.

Figure 3.3: LINEST function for Simple Linear Regression model

The least squares regression line is ŷ = a + bx, where a is the y-intercept and b is the slope.

Thus, using the LINEST function in Microsoft Excel, the estimated regression model is

ŷ = -10.1315 + 0.174294x.

(d) Find the predicted mean rise in blood pressure level associated with a sound pressure
level of 85 decibels.

For x = 85

ŷ = -10.1315 + 0.174294(85) = 4.68349

Thus, when the sound pressure level is 85 decibels, the predicted mean rise in blood pressure
is 4.6835 mm of mercury.
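
The prediction is a direct substitution into the fitted line. As a small sketch using the coefficients reported from LINEST (the function name `predict_rise` is hypothetical):

```python
# Coefficients of the fitted line y-hat = a + b*x, as reported from LINEST.
INTERCEPT = -10.1315
SLOPE = 0.174294

def predict_rise(sound_level_db):
    """Predicted mean blood pressure rise for a given sound pressure level in dB."""
    return INTERCEPT + SLOPE * sound_level_db

y_hat_85 = predict_rise(85)
print(f"Predicted rise at 85 dB: {y_hat_85:.5f}")
```

The scatter plot axis runs from about 50 to 110 dB, so 85 dB lies inside the plotted range and the prediction is an interpolation rather than an extrapolation.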

(e) Compute a 99% confidence interval (CI) for the slope, B.

Figure 3.4: 99% Confidence Interval for slope, B

For a 99% confidence interval, α = 0.01.

Standard error of the slope b, s_b = 0.02382864

From the t-table, t(α/2) with 18 degrees of freedom = 2.878

Therefore, using Excel, a 99% confidence interval for the slope, B, is:

b ± t(α/2) s_b = 0.10570469 to 0.242883257
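
The interval can be checked by hand from the three quantities listed above. This sketch uses the t-value rounded to three decimals from the table, so its endpoints differ from the Excel figures in about the fifth decimal place; Excel presumably carries more digits of the t-value.

```python
# Quantities reported in the text.
b = 0.174294        # estimated slope
s_b = 0.02382864    # standard error of the slope
t_crit = 2.878      # t(alpha/2 = 0.005) with 18 degrees of freedom, from the t-table

margin = t_crit * s_b
ci = (b - margin, b + margin)

print(f"99% CI for the slope: {ci[0]:.6f} to {ci[1]:.6f}")
# The interval excludes zero, consistent with a positive slope at the 1% level.
print(ci[0] > 0)
```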

APPENDIX
