Introduction To Data Analysis Solutions

This document provides an introduction to data analysis concepts including probability, distributions, hypothesis testing, and other statistical techniques. It includes 9 questions with examples and explanations of how to calculate probabilities, determine the median and quartiles from a histogram, perform a chi-square test of independence, and carry out other statistical analyses. The key concepts covered are probabilities, distributions, hypothesis testing, and applying statistical methods to analyze data.

Uploaded by

Oumaima Ziat


Introduction to Data Analysis

Correction
Q1:

a) P(26-39/+3 accidents) = 25/(25+25+50) = 25/100 = 0.25

b) P(2-3 accidents / 18-25) = 150/(150 + 100 + 50) = 150/300 = 0.5
c) P(0-1 accidents / 26-39 or 40-55) = (150 + 250)/ (150+25+25 + 250+125+25) = 400/600 = 0.67
d) P(40-55) = (250+125+25)/1000 = 0.4
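The four ratios above can be checked with exact fractions. This is only a minimal sketch: the counts are the ones quoted in the solution, the total of 1000 is taken from part d) as written, and the variable names are illustrative.

```python
from fractions import Fraction

# Counts quoted in the solution above.
p_a = Fraction(25, 25 + 25 + 50)        # P(26-39 / +3 accidents)
p_b = Fraction(150, 150 + 100 + 50)     # P(2-3 accidents / 18-25)
p_c = Fraction(150 + 250, (150 + 25 + 25) + (250 + 125 + 25))
p_d = Fraction(250 + 125 + 25, 1000)    # total of 1000 as stated in part d)

for label, p in [("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)]:
    print(f"{label}) {p} = {float(p):.4f}")
```

Using `Fraction` keeps the intermediate values exact, so any rounding only happens when the result is printed.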

Q2:

1) 0.75 of the employees are satisfactory (S).

2) 0.25 of the employees are unsatisfactory (S’).
3) 0.8 of the satisfactory employees had previous work experience (E), which means 0.2 of them
had no work experience (E’).
4) 0.15 of the unsatisfactory employees had no work experience which means 0.85 of them had
work experience.

Probability Tree:

Bayes theorem: P(A/B) = P(B/A)xP(A)/P(B)

The question is: what is the probability that a person is unsatisfactory given that he has previous work
experience? So here we are looking for the probability of S’ given E.

P(S’/E) = P(E/S’)xP(S’)/P(E)

P(E/S’) = 0.85 (using the tree)

P(S’) = 0.25

P(E) = P(E/S)xP(S) + P(E/S’)xP(S’) (law of total probability, because P(E) is not given directly)

P(E) = 0.8x0.75 + 0.85x0.25 = 0.6 + 0.2125 = 0.8125

P(S’/E) = 0.85x0.25/0.8125 = 0.26
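The Bayes computation can be reproduced in a few lines. This is a minimal sketch using the probabilities stated in the exercise; the variable names are illustrative.

```python
# Probabilities given in the exercise.
p_s = 0.75                 # P(S): satisfactory
p_not_s = 1 - p_s          # P(S'): unsatisfactory
p_e_given_s = 0.80         # P(E/S)
p_e_given_not_s = 0.85     # P(E/S')

# Law of total probability for P(E).
p_e = p_e_given_s * p_s + p_e_given_not_s * p_not_s

# Bayes theorem for P(S'/E).
posterior = p_e_given_not_s * p_not_s / p_e

print(f"P(E) = {p_e:.4f}")
print(f"P(S'/E) = {posterior:.4f}")
```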

Q3:

Let random variable X denote the height (in inches) of male soccer players. The figure below shows the histogram of
the heights of 100 male soccer players. Using this histogram:

I. The sample size is 100 because we have 100 male soccer players.
II. Q2 = median: accumulate the relative frequencies bin by bin until reaching 0.5 of the data or more:

0.05 < 0.5

0.05 + 0.03 = 0.08 < 0.5
0.08 + 0.15 = 0.23 < 0.5
0.23 + 0.4 = 0.63 >= 0.5 ➔ the median or Q2 is within this bin, thus we choose the midpoint of that
bin, which is 66.95

III. The value of Q1 is also 66.95 because we reach 0.25 of the data in the 4th bin (0.23 < 0.25 <= 0.63). But for the Q3 quartile we
need to add another bin to reach 0.75 of the data or more ➔ Q3 = 68.95.
IQR = Q3 – Q1 = 68.95 – 66.95 = 2

IV. The mean or the expected value is E(X) = ∑ x_i P(X = x_i) = 60.95x0.05 + 62.95x0.03 + … + 74.95x0.01
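The cumulative search used for the median and Q1 can be sketched as follows. Only the first four bins are used, because only their relative frequencies are quoted in the solution; the midpoint 64.95 of the third bin is an assumption based on the 2-inch spacing of the other midpoints, and `quantile_bin` is a hypothetical helper name.

```python
# First four bins of the histogram: midpoints and relative frequencies.
# The third midpoint (64.95) is assumed from the 2-inch bin spacing.
midpoints = [60.95, 62.95, 64.95, 66.95]
rel_freqs = [0.05, 0.03, 0.15, 0.40]

def quantile_bin(midpoints, rel_freqs, q):
    """Return the midpoint of the first bin whose cumulative frequency reaches q."""
    cumulative = 0.0
    for mid, freq in zip(midpoints, rel_freqs):
        cumulative += freq
        if cumulative >= q:
            return mid
    return None  # q is not reached within the bins supplied

median = quantile_bin(midpoints, rel_freqs, 0.50)
q1 = quantile_bin(midpoints, rel_freqs, 0.25)
print(median, q1)
```

Q3 and the mean need the remaining bins of the histogram, which are not listed in the text, so they are left out of this sketch.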

Q4:

Fair game means the expected value of winning/losing equals zero.

a) Finding the value of N:

∑ x_i P(X = x_i) = 0 ➔ P(Winning)xN + P(Losing)x(-1) = 0 (because we lose one dollar when we lose).

P(Winning) = P(HHH or TTT) = 0.5x0.5x0.5 + 0.5x0.5x0.5 (the probability of heads or tails is 0.5)

P(Winning) = 0.25
P(Losing) = 1 – P(Winning) = 1 – 0.25 = 0.75
P(Winning)xN + P(Losing)x(-1) = 0 ➔ 0.25xN + 0.75x(-1) = 0 ➔ N = 3

b) Standard deviation SD = sqrt(Variance)

Variance = E(X^2) – [E(X)]^2

E(X) = 0 (fair game) ➔ [E(X)]^2 = 0

E(X^2) = 0.25xN^2 + 0.75x(-1)^2

SD = sqrt(0.25x9 + 0.75x1) = sqrt(3) ≈ 1.73
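Both parts of this question can be verified numerically. This is a minimal sketch assuming, as in the solution, three fair coin tosses, a win of N dollars on HHH or TTT and a loss of one dollar otherwise; the variable names are illustrative.

```python
import math
from fractions import Fraction

p_head = Fraction(1, 2)
p_win = 2 * p_head ** 3          # HHH or TTT
p_lose = 1 - p_win

# Fair game: p_win * N + p_lose * (-1) = 0, solved for N.
payoff_n = p_lose / p_win

# Variance = E(X^2) - [E(X)]^2, with E(X) = 0 for a fair game.
expected_value = p_win * payoff_n + p_lose * (-1)
second_moment = p_win * payoff_n ** 2 + p_lose * (-1) ** 2
variance = second_moment - expected_value ** 2
sd = math.sqrt(variance)

print(f"N = {payoff_n}, variance = {variance}, SD = {sd:.4f}")
```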

Q5:
Expected Earnings Per Roll (Excluding Rolling a 6):

• Expected earnings per roll E = (1+2+3+4+5)/5=3

Expected Profit Function for N Rolls:

• Probability of not rolling a 6 in N rolls: (5/6)^N

• Total expected earnings from N rolls: N×E = N×3.
• Expected profit for N rolls: 3*N*(5/6)^N − 3.

To find the value of N that maximizes the expected profit, we can evaluate it for different values of N, for
example N = {1, 2, 3, …, 15}, and then choose the N that gives the maximum expected profit. In this exercise N = 6 (N = 5 gives the same expected profit).
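The search over N described above can be sketched as a simple scan. The profit function is taken as 3*N*(5/6)^N − 3, which is how the formula in the solution is read here; the constant term does not change which N is optimal. Exact fractions are used so that ties are detected reliably.

```python
from fractions import Fraction

def expected_profit(n, per_roll=3, offset=3):
    """Expected profit for n rolls, read as per_roll * n * (5/6)^n - offset."""
    return per_roll * n * Fraction(5, 6) ** n - offset

profits = {n: expected_profit(n) for n in range(1, 16)}
best_value = max(profits.values())
best_n = [n for n, value in profits.items() if value == best_value]

for n, value in profits.items():
    print(n, round(float(value), 4))
print("maximizing N:", best_n)
```

With this reading of the formula, the scan reports every N that attains the maximum, so a tie between neighbouring values of N shows up explicitly.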

Q6:

The obtained scores are normally distributed with mean = 82 and SD = 6. The question is how many
students had scores between 76 and 88?
We use the Z-score transformation: P(76 < X < 88) = P((76-82)/6 < Z < (88-82)/6) = P(-1 < Z < 1) = P(Z < 1) – P(Z < -1)

= P(Z < 1) – (1 – P(Z < 1))

= 0.84 – (1 – 0.84) = 0.68

So about 68% of the students had scores between 76 and 88.
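The table lookup can be cross-checked with the error function from the standard library. This is a minimal sketch; `normal_cdf` is a hypothetical helper built on `math.erf`.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

mean, sd = 82, 6
p_between = normal_cdf(88, mean, sd) - normal_cdf(76, mean, sd)
print(f"P(76 < X < 88) = {p_between:.4f}")
```

Multiplying this proportion by the class size would give the number of students; the class size is not stated in the solution, so it is left out here.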

Q7:
To solve this problem, we'll perform a hypothesis test for the difference between two proportions and then calculate a
95% confidence interval.

i. Hypothesis Test
1) State the hypotheses.
• Null Hypothesis (H0): The proportion of men using smartphones (Pm) is less than or equal
to that of women (Pw), i.e., Pm≤Pw.
• Alternative Hypothesis (H1): The proportion of men using smartphones is greater than that
of women, i.e., Pm>Pw.

2) Calculate Sample Proportions:

Pm = 379/973 = 0.3895
Pw = 404/1304 = 0.3098

3) Standard Error and Z-score:

• In this case, we don't use pooled proportion since the null hypothesis does not assume
the proportions are equal.
• Calculate the standard error (SE) using the formula: SE = sqrt(Pm(1 - Pm)/nm + Pw(1 - Pw)/nw), with nm = 973 and nw = 1304, which gives SE = 0.020
• Compute the Z-score Z = (Pm - Pw)/SE

Z = 3.94
4) P-value:
P-value = 1 – P(Z < 3.94) ≈ 0.00004
5) Decision:
Since the P-value < 0.05, we reject H0.
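The test steps can be sketched with the standard library only. The counts are an assumption read off the sample proportions (379 smartphone users among 973 men, 404 among 1304 women), and the unpooled standard error follows the choice made in the solution.

```python
import math

# Assumed counts: smartphone users / sample size for men and women.
x_m, n_m = 379, 973
x_w, n_w = 404, 1304

p_m = x_m / n_m
p_w = x_w / n_w

# Unpooled standard error of the difference between two proportions.
se = math.sqrt(p_m * (1 - p_m) / n_m + p_w * (1 - p_w) / n_w)
z = (p_m - p_w) / se

# One-sided (upper-tail) p-value for H1: Pm > Pw.
p_value = 0.5 * math.erfc(z / math.sqrt(2))

print(f"Pm = {p_m:.4f}, Pw = {p_w:.4f}, SE = {se:.4f}")
print(f"Z = {z:.2f}, p-value = {p_value:.6f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

`math.erfc` gives the upper tail directly, which avoids the loss of precision of computing 1 minus a value very close to 1.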

ii. 95% Confidence Interval.

CI = point estimate +/- 1.96*SE = point estimate +/- ME

ME = 1.96*0.0202 = 0.040
lower = (0.3895 – 0.3098) - 0.040 = 0.040
upper = (0.3895 – 0.3098) + 0.040 = 0.119

CI = [0.04, 0.12]
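The interval can be recomputed without intermediate rounding. As before, the counts (379 of 973 men, 404 of 1304 women) are an assumption read off the sample proportions, and 1.96 is the usual normal quantile for a 95% interval.

```python
import math

# Assumed counts: smartphone users / sample size for men and women.
x_m, n_m = 379, 973
x_w, n_w = 404, 1304

p_m, p_w = x_m / n_m, x_w / n_w
point_estimate = p_m - p_w

se = math.sqrt(p_m * (1 - p_m) / n_m + p_w * (1 - p_w) / n_w)
margin = 1.96 * se  # margin of error (ME)

lower = point_estimate - margin
upper = point_estimate + margin
print(f"ME = {margin:.4f}, CI = [{lower:.4f}, {upper:.4f}]")
```

Since the whole interval lies above zero, it points in the same direction as the decision of the hypothesis test.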

Q8:
H0: There is no inconsistency between the observed and the expected counts. The
observed counts follow the same distribution as the expected counts.
HA: There is an inconsistency between the observed and the expected counts. The
observed counts do not follow the same distribution as the expected counts.

Observed distribution:
Underweight BMI<18.5 = 20

Normal Weight BMI 18.5-24.9 = 932

Overweight BMI 25.0-29.9 = 1374

Obese BMI > 30 = 1000

Expected distribution:

Underweight BMI<18.5 = 0.02 * 3326 = 66.52

Normal Weight BMI 18.5-24.9 = 0.39 * 3326 = 1297.14

Overweight BMI 25.0-29.9 = 0.36 * 3326 = 1197.36

Obese BMI > 30 = 0.23 * 3326 = 764.98

Calculation of chi-square:

X^2 = ∑(Observed – Expected)^2/Expected

X^2 = 233.58

The degrees of freedom (df) is the number of categories minus 1; here df = 4 – 1 = 3.

From the chi-square table with df = 3 and a 5% level of significance, the critical value is 7.81.

Since X^2 >> 7.81, we reject H0, which means the observed counts do not follow the same distribution as
the expected counts.
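The goodness-of-fit statistic can be reproduced directly from the observed counts and the expected proportions listed above. This is a minimal sketch; the critical value 7.81 is the table value quoted in the solution rather than something computed here.

```python
# Observed counts per BMI category: underweight, normal, overweight, obese.
observed = [20, 932, 1374, 1000]
# Expected proportions per category, as used in the solution.
proportions = [0.02, 0.39, 0.36, 0.23]

total = sum(observed)
expected = [p * total for p in proportions]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
critical_value = 7.81  # chi-square table, df = 3, 5% significance level

print(f"total = {total}, chi2 = {chi2:.2f}, df = {df}")
print("reject H0" if chi2 > critical_value else "fail to reject H0")
```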

Q9:

H0: Gender and opinion are independent.

HA: Gender and opinion are dependent.
Each cell of the table shows observed count / expected count. For each cell, the expected frequency is: Eij = (row total) * (column total)/total

               Agree          No opinion    Disagree      Row total
Males          75 / 95.2      10 / 8.74     85 / 66.06    170
Females        121 / 100.8    8 / 9.26      51 / 69.94    180
Column total   196            18            136           350

Chi-Squared Statistic:
X^2 = ∑(Observed – Expected)^2/Expected; in this example X^2 = 19.25

Degrees of Freedom:

df = (number of rows – 1)*(number of columns – 1)

df = (2 – 1)*(3-1) = 2

Critical value: X^2 (df = 2, 5% level of significance) = 5.99.

Since X^2 = 19.25 > 5.99 (the corresponding p-value is very small; using Python I got 6.6117999886364e-05), we
reject H0.
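The expected counts, the statistic and the p-value can be recomputed from the observed table. This is a minimal sketch with the standard library only; the closed form exp(-X^2/2) for the p-value is valid specifically because df = 2, so it is not a general replacement for a chi-square distribution function.

```python
import math

# Observed counts: rows = Males, Females; columns = Agree, No opinion, Disagree.
observed = [
    [75, 10, 85],
    [121, 8, 51],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count per cell: (row total) * (column total) / total.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 2 the chi-square upper-tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.3e}")
```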
