Introduction To Data Analysis Solutions
Correction
Q1:
Q2:
Probability Tree:
The question is: what is the probability that the person will be unsatisfactory given that he has had previous work
experience? So here we are looking for the probability of S’ given E. By Bayes' rule:
P(S’|E) = P(E|S’) × P(S’) / P(E)
P(S’) = 0.25
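The Bayes' rule step can be sketched in Python as follows. Only P(S’) = 0.25 comes from the solution. The two conditional probabilities of E are hypothetical placeholders, because the probability tree is not reproduced here, so the printed result is illustrative and not the answer to Q2.

```python
def total_probability(prior, lik_given_a, lik_given_not_a):
    """P(B) = P(B|A) P(A) + P(B|not A) P(not A)."""
    return lik_given_a * prior + lik_given_not_a * (1.0 - prior)


def bayes_posterior(prior, likelihood, evidence):
    """P(A|B) = P(B|A) P(A) / P(B)."""
    if not 0.0 < evidence <= 1.0:
        raise ValueError("evidence must be a probability in (0, 1]")
    return likelihood * prior / evidence


p_s_prime = 0.25          # P(S'), taken from the solution
p_e_given_s_prime = 0.40  # P(E|S'), HYPOTHETICAL value (read it off the tree)
p_e_given_s = 0.80        # P(E|S),  HYPOTHETICAL value (read it off the tree)

# P(E) by the law of total probability over S' and S
p_e = total_probability(p_s_prime, p_e_given_s_prime, p_e_given_s)

# P(S'|E) by Bayes' rule
p_s_prime_given_e = bayes_posterior(p_s_prime, p_e_given_s_prime, p_e)
print(f"P(E) = {p_e:.4f}, P(S'|E) = {p_s_prime_given_e:.4f}")
```

Replacing the two hypothetical values with the branch probabilities from the tree gives the required P(S’|E).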
Q3:
Let random variable X denote the height (in inches) of male soccer players. The figure below shows the histogram of
the heights of 100 male soccer players. Using this histogram:
I. The sample size is 100 because we have 100 male soccer players.
II. The Q2 = Median (looking for 0.5 of the data or more):
III. The value of Q1 is 66.95 because we reach 0.25 of the data in the 4th bin. For the third quartile we
need to add another bin to reach 0.75 of the data or more ➔ Q3 = 68.95.
IQR = Q3 – Q1 = 68.95 – 66.95 = 2
IV. The mean or the expected value is E(X) = ∑ x_i P(X = x_i) = 60.95×0.05 + 62.95×0.03 + … + 74.95×0.01
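The quartile and mean calculations can be checked with a short Python sketch. The histogram itself is not reproduced here, so the bin counts below are an assumption: they agree with the relative frequencies quoted in the solution (0.05, 0.03 and 0.01) and with n = 100, but the middle bins are illustrative. The bin midpoints are assumed to be 2 inches apart.

```python
# Bin midpoints (inches), ASSUMED to be spaced 2 inches apart.
midpoints = [60.95, 62.95, 64.95, 66.95, 68.95, 70.95, 72.95, 74.95]
# Counts per bin: the first two and the last match the text (5%, 3%, 1% of 100);
# the remaining counts are ASSUMED for illustration.
counts = [5, 3, 15, 40, 17, 12, 7, 1]


def binned_quantile(midpoints, counts, q):
    """Return the midpoint of the first bin where the cumulative share reaches q."""
    total = sum(counts)
    running = 0
    for x, c in zip(midpoints, counts):
        running += c
        if running >= q * total:
            return x
    return midpoints[-1]


def binned_mean(midpoints, counts):
    """E(X) = sum of x_i * P(X = x_i), with P(X = x_i) = count_i / n."""
    total = sum(counts)
    return sum(x * c for x, c in zip(midpoints, counts)) / total


q1 = binned_quantile(midpoints, counts, 0.25)
q3 = binned_quantile(midpoints, counts, 0.75)
print("n   =", sum(counts))
print("Q1  =", q1)
print("Q3  =", q3)
print("IQR =", round(q3 - q1, 2))
print("E(X) =", round(binned_mean(midpoints, counts), 2))
```

With these assumed counts the sketch gives Q1 = 66.95 and Q3 = 68.95, as in part III.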
Q4:
Q5:
Expected Earnings Per Roll (Excluding Rolling a 6):
To find the value of N that maximizes the expected profit, we can evaluate the expected profit for a range of
values, for example N = 1, 2, 3, …, 15, and then choose the N that gives the largest expected profit. In this exercise N = 6.
Q6:
The obtained scores are normally distributed with mean = 82 and SD = 6. The question is how many
students had scores between 76 and 88?
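We use the Z-score transformation: P(76 < X < 88) = P((76 − 82)/6 < Z < (88 − 82)/6) = P(−1 < Z < 1) = P(Z < 1) − P(Z < −1)
= P(Z < 1) − (1 − P(Z < 1)) = 2 P(Z < 1) − 1 ≈ 2 × 0.8413 − 1 = 0.6826
The number of students with scores between 76 and 88 is this probability multiplied by the total number of students.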
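As a numerical check, the probability can be computed with the standard normal CDF written in terms of the error function from Python's `math` module. The class size `n_students` is a hypothetical value used only to show the final multiplication.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X < x) for a normal distribution, via the error function."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mean, sd = 82.0, 6.0

# P(76 < X < 88) = P(X < 88) - P(X < 76)
p_between = normal_cdf(88, mean, sd) - normal_cdf(76, mean, sd)
print(f"P(76 < X < 88) = {p_between:.4f}")

n_students = 200  # HYPOTHETICAL class size, not taken from the exercise
print(f"Expected number of students: {p_between * n_students:.1f}")
```

The probability is about 0.68, the familiar share of a normal distribution within one standard deviation of the mean.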
Q7:
To solve this problem, we'll perform a hypothesis test for the difference between two proportions and then calculate a
95% confidence interval.
i. Hypothesis Test
1) State the hypotheses.
• Null Hypothesis (H0): The proportion of men using smartphones (Pm) is less than or equal
to that of women (Pw), i.e., Pm≤Pw.
• Alternative Hypothesis (H1): The proportion of men using smartphones is greater than that
of women, i.e., Pm>Pw.
CI = [0.04, 0.11]
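The test and the interval can be sketched in Python as below. This is one common approach: a pooled standard error for the one-sided z-test and an unpooled standard error for the 95% confidence interval. The sample counts are hypothetical, since the survey data are not reproduced here, so the output will not match the interval above.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def two_proportion_test(x1, n1, x2, n2, z_crit=1.96):
    """One-sided z-test of H1: p1 > p2, plus a CI for p1 - p2.

    z_crit = 1.96 corresponds to a 95% two-sided confidence interval.
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Pooled proportion and standard error, used for the test statistic
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = diff / se_pooled
    p_value = 1.0 - normal_cdf(z)  # upper tail, because H1 is Pm > Pw

    # Unpooled standard error, used for the confidence interval
    se_unpooled = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    ci = (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled)
    return z, p_value, ci


# HYPOTHETICAL data: 300 of 500 men and 260 of 500 women use smartphones
z, p_value, ci = two_proportion_test(300, 500, 260, 500)
print(f"z = {z:.3f}, one-sided p-value = {p_value:.4f}")
print(f"95% CI for Pm - Pw: [{ci[0]:.3f}, {ci[1]:.3f}]")
```

H0 is rejected at the 5% level when the one-sided p-value is below 0.05.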
Q8:
H0: There is no inconsistency between the observed and the expected counts. The
observed counts follow the same distribution as the expected counts.
HA: There is an inconsistency between the observed and the expected counts. The
observed counts do not follow the same distribution as the expected counts.
Observed distribution:
Underweight (BMI < 18.5) = 20
Expected distribution:
Calculation of chi-square:
χ² = ∑ (Observed − Expected)² / Expected
χ² = 233.58
The degrees of freedom (df) equal the number of categories minus 1; here df = 4 − 1 = 3.
From the chi-square table with df = 3 and a 5% level of significance, the critical value is 7.81.
Since χ² = 233.58 is far greater than 7.81, we reject H0, which means the observed counts do not follow the same
distribution as the expected counts.
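The goodness-of-fit calculation can be sketched in Python as follows. Only the observed underweight count (20) and the critical value 7.81 come from the solution. The other counts and the category names other than "Underweight" are hypothetical, so the statistic printed here is not the 233.58 obtained above.

```python
def chi_square_gof(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Four BMI categories. Names after the first are ASSUMED for illustration.
categories = ["Underweight", "Normal", "Overweight", "Obese"]
observed = [20, 50, 80, 50]    # 20 is from the solution; the rest are HYPOTHETICAL
expected = [10, 100, 60, 30]   # HYPOTHETICAL expected counts (same total of 200)

stat = chi_square_gof(observed, expected)
df = len(categories) - 1
critical_value = 7.81  # chi-square table, df = 3, 5% significance level

reject_h0 = stat > critical_value
print(f"chi-square = {stat:.2f}, df = {df}, reject H0: {reject_h0}")
```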
Q9:
For each cell in the table, calculate the expected frequency: Eij = (row i total) × (column j total) / (grand total)
Chi-Squared Statistic:
In this example X2 = 19.24
Degrees of Freedom:
df = (2 – 1)*(3-1) = 2
Since the p-value corresponding to this chi-square statistic is very small (using Python I got about 6.61e-05),
we reject H0.
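The steps for the test of independence can be sketched in Python as below. The 2×3 table is hypothetical, because the original contingency table is not reproduced here. For df = 2 the upper-tail probability of the chi-square distribution has the closed form exp(−χ²/2), which the sketch uses instead of a statistics library. Applied to the rounded statistic 19.24 it gives roughly 6.6e-05, in line with the p-value reported above.

```python
import math


def expected_counts(table):
    """E_ij = (row i total) * (column j total) / (grand total)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_independence(table):
    """Return the chi-square statistic and its degrees of freedom."""
    expected = expected_counts(table)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable; valid for df = 2 only."""
    return math.exp(-x / 2.0)


# HYPOTHETICAL 2x3 table of observed counts
table = [[30, 40, 30],
         [50, 30, 20]]

stat, df = chi_square_independence(table)
p_value = chi2_sf_df2(stat)
print(f"chi-square = {stat:.3f}, df = {df}, p-value = {p_value:.4f}")

# p-value for the rounded statistic reported in the solution
print(f"p-value for 19.24 with df = 2: {chi2_sf_df2(19.24):.3e}")
```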