Estimation
Biostatistics Course 2021-2022
Block 4 – Session 4.1
Ali Lateef Jasim
MBChB.
Learning objectives
❑ Define statistical inference.
❑ Explain the concept of estimation.
❑ List and define the types of estimation: point and interval estimation.
❑ Define the confidence interval and appraise its importance.
❑ Practice some examples.
Statistical Inference
❖ It is the procedure by which an inference about a population is made on the basis of the results obtained from a sample drawn from that population.
❖ This can be achieved by:
A. Hypothesis testing
B. Estimation:
✓ Point estimation
✓ Interval estimation
Estimation
❖ If the mean and the variance of a normal distribution are known, then the probabilities of various events can be determined.
❖ But these values are almost never known, and we have to estimate them from the information in a simple random sample.
❖ The process of estimation involves calculating, from the data of a sample, some “statistic” which is an approximation of the corresponding “parameter” of the population from which the sample was drawn.
Point Estimation
❖ It is a single numerical value obtained from a random sample and used to estimate the corresponding population parameter.
❖ The sample mean (X̄) is the best point estimate of the population mean (µ).
❖ The sample standard deviation (S) is the best point estimate of the population standard deviation (σ).
❖ The sample proportion (p) is the best point estimate of the population proportion (P).
❖ But there is always some sampling error. It can be measured by the standard error of the mean, which reflects the precision of the estimated mean.
❖ Because of sampling variation, we cannot say that the exact parameter value is some specific number.
❖ But we can determine a range of values within which we are confident the unknown parameter lies.
Interval Estimation
❖ It consists of two numerical values defining an interval within which, with a specified degree of confidence, lies the unknown parameter we want to estimate.
❖ The values depend on the confidence level, which is equal to 1-α (α is the probability of error).
❖ The interval estimate may be expressed as:
Estimator ± Reliability coefficient (Z) × Standard error (SE)
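As a minimal sketch of the expression above, the reliability coefficient can be taken as the standard normal quantile at 1-α/2 and plugged into estimator ± Z × SE. The function names `reliability_coefficient` and `interval_estimate` are illustrative choices, not part of the lecture; only the Python standard library is assumed.

```python
from statistics import NormalDist


def reliability_coefficient(confidence_level):
    """Return Z at 1 - alpha/2 for a two-sided interval (hypothetical helper)."""
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def interval_estimate(estimator, standard_error, confidence_level):
    """Return (lower, upper) = estimator -/+ Z * SE."""
    margin = reliability_coefficient(confidence_level) * standard_error
    return estimator - margin, estimator + margin


# Z values for the three confidence levels used in this session.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: Z = {reliability_coefficient(level):.3f}")
```

The unrounded value for 99% is about 2.576; the worked examples in this session use the rounded value 2.58.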
Equations for SE
A. Parameter: population mean (µ)
   Estimator: sample mean (X̄)
   Standard error: SE = σ/√n
B. Parameter: difference between two population means (µ1-µ2)
   Estimator: difference between two sample means (X̄1-X̄2)
   Standard error: SE = √(σ1²/n1 + σ2²/n2)
C. Parameter: population proportion (P)
   Estimator: sample proportion (p)
   Standard error: SE = √(p(1-p)/n)
D. Parameter: difference between two population proportions (P1-P2)
   Estimator: difference between two sample proportions (p1-p2)
   Standard error: SE = √(p1(1-p1)/n1 + p2(1-p2)/n2)
Reliability Coefficient
❖ It is the value of Z at 1-α/2 corresponding to the confidence level.
Confidence Interval
❖ The confidence interval is central and symmetric around the point estimate (for example, the sample mean), so that there is a probability of α/2 that the parameter is above the upper limit and a probability of α/2 that it is below the lower limit.
❖ The width of the interval estimate is increased by:
✓ Increasing the confidence level (i.e., decreasing the alpha value).
✓ Decreasing the sample size.
A confidence interval can shed light on the following information:
1. The range within which the true value of the estimated parameter lies, with the stated degree of confidence.
2. The statistical significance of a difference (in population means or proportions). If the value ZERO is included in the interval of such a difference (i.e., the range lies between a negative value and a positive value), then we can state that there is no statistically significant difference between the two population values (parameters), although the sample values (statistics) showed a difference.
3. The sample size.
✓ A narrow interval indicates a “large” sample size.
✓ A wide interval indicates a “small” sample size (with a fixed confidence level).
Single Mean
Exercise 1.
The mean serum indirect bilirubin level of 16 four-day-old infants was found to be 5.98 mg/dl. The population SD (σ) = 3.5 mg/dl. Assuming normality, find the 90%, 95%, and 99% CI for µ.
Answer:
The interval estimate = Estimator (statistic) ± Reliability coefficient (Z) × Standard error (SE)
1. Sample mean (estimator) = 5.98 mg/dl
Population SD = 3.5 mg/dl
Standard error for a single mean (SE) = σ/√n
2. Standard error = 3.5/√16 = 3.5/4 = 0.875
3. Reliability coefficient (Z), according to the level of confidence:
For 90% CI: Z = 1.645
For 95% CI: Z = 1.96
For 99% CI: Z = 2.58
A. 90% CI for µ = X̄ ± (Z × SE) = 5.98 ± (1.645 × 0.875) = 5.98 ± 1.44
So, the confidence interval is (4.54 – 7.42).
B. 95% CI for µ = X̄ ± (Z × SE) = 5.98 ± (1.96 × 0.875) = 5.98 ± 1.715
So, the confidence interval is (4.265 – 7.695).
C. 99% CI for µ = X̄ ± (Z × SE) = 5.98 ± (2.58 × 0.875) = 5.98 ± 2.26
So, the confidence interval is (3.72 – 8.24).
What happened to the CI on increasing the confidence level?
✓ 90% CI = (4.54 – 7.42).
✓ 95% CI = (4.265 – 7.695).
✓ 99% CI = (3.72 – 8.24).
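The three intervals of Exercise 1 can be reproduced with a short sketch. The function name `mean_ci` is a hypothetical helper introduced for illustration; the Z values are the rounded ones used in the exercise, and σ is assumed known, as stated in the problem.

```python
import math


def mean_ci(sample_mean, sigma, n, z):
    """CI for a population mean with known sigma: mean -/+ z * sigma / sqrt(n)."""
    se = sigma / math.sqrt(n)  # standard error of the mean
    margin = z * se
    return sample_mean - margin, sample_mean + margin


# Exercise 1 data: mean = 5.98 mg/dl, sigma = 3.5 mg/dl, n = 16.
for label, z in (("90%", 1.645), ("95%", 1.96), ("99%", 2.58)):
    lower, upper = mean_ci(5.98, 3.5, 16, z)
    print(f"{label} CI: ({lower:.3f}, {upper:.3f}), width = {upper - lower:.3f}")
```

Printing the width alongside each interval makes it easy to compare how the interval changes from one confidence level to the next.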
We can notice that as the confidence level is increased (i.e., as the alpha level of the estimation is lowered), the confidence interval becomes wider (more values are included).
Difference between two means
Exercise 2.
A sample of 10 twelve-year-old boys and a sample of 10 twelve-year-old girls yielded mean heights of 59.8 inches (boys) and 58.5 inches (girls). Assuming normality, σ1 = 2 inches (boys), and σ2 = 3 inches (girls), find the 90% CI for the difference in mean height between boys and girls at this age (µboys - µgirls).
Answer:
We have 2 samples, 2 means, and 2 standard deviations, so we have to use the equation for the difference between two means.
The interval estimate = Estimator (statistic) ± Reliability coefficient (Z) × Standard error (SE)
1. Estimator (X̄boys - X̄girls) = 59.8 - 58.5 = 1.3
2. Standard error of (X̄boys - X̄girls) = √(σ1²/n1 + σ2²/n2)
= √((2)²/10 + (3)²/10)
= √(4/10 + 9/10)
= √(0.4 + 0.9)
= √1.3 = 1.14
3. Reliability coefficient (Z) for 90% CI = 1.645
90% CI for µboys - µgirls = 1.3 ± (1.645 × 1.14)
= 1.3 ± 1.875 = (-0.575 – 3.175)
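The arithmetic of Exercise 2 can be checked with a sketch along the following lines. The name `diff_means_ci` is hypothetical, and both population standard deviations are assumed known, as in the exercise. Because the code does not round SE to 1.14, the limits differ from the hand calculation in the third decimal place.

```python
import math


def diff_means_ci(mean1, mean2, sigma1, sigma2, n1, n2, z):
    """CI for mu1 - mu2 with known sigmas: (mean1 - mean2) -/+ z * SE."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # SE of the difference
    diff = mean1 - mean2
    return diff - z * se, diff + z * se


# Exercise 2 data: boys (59.8, sigma 2, n 10) vs girls (58.5, sigma 3, n 10).
lower, upper = diff_means_ci(59.8, 58.5, 2, 3, 10, 10, 1.645)
includes_zero = lower <= 0 <= upper  # zero inside the interval?

print(f"90% CI: ({lower:.3f}, {upper:.3f})")
print("Zero included:", includes_zero)
```

The `includes_zero` flag applies the rule stated earlier in the session: an interval for a difference that spans zero does not indicate a statistically significant difference.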
Since ZERO is included in the interval, there is no statistically significant difference between the two population means.
Single Proportion
Exercise 3.
In a survey, 300 adults were interviewed; 123 said they had a yearly medical checkup. Find the 95% CI for the true proportion of adults having a yearly medical checkup.
Answer:
We have one sample (n = 300), of whom 123 had the checkup, so we have to use the equation for a single proportion.
The interval estimate = Estimator (statistic) ± Reliability coefficient (Z) × Standard error (SE)
1. The estimator (sample proportion) = p = 123/300 = 0.41
2. Standard error (SE) for p = √(p(1-p)/n)
= √(0.41 × (1-0.41)/300)
= √(0.41 × 0.59/300)
= √(0.242/300)
= √0.0008 = 0.028
3. Reliability coefficient (Z) for 95% CI = 1.96
95% CI for P = p ± (Z × SE) = 0.41 ± (1.96 × 0.028)
= 0.41 ± 0.055 = (0.355 – 0.465)
Difference between two proportions
Exercise 4.
200 patients suffering from a certain disease were randomly divided into two equal groups. Of the first group, who received the NEW treatment, 90 recovered within three days. Of the other 100, who received the STANDARD treatment, 78 recovered within three days. Find the 95% CI for the difference between the proportions of recovery in the populations receiving the two treatments.
Answer:
We have 2 groups (samples) and the proportion who recovered in each group, so we need to use the equation for the difference between 2 proportions.
The interval estimate = Estimator (statistic) ± Reliability coefficient (Z) × Standard error (SE)
1. The estimator = the difference between the 2 sample proportions = p1 - p2
= 90/100 - 78/100 = 0.9 - 0.78 = 0.12
2. Standard error (SE) for p1 - p2 = √(p1(1-p1)/n1 + p2(1-p2)/n2)
= √(0.9 × (1-0.9)/100 + 0.78 × (1-0.78)/100)
= √(0.0009 + 0.001716)
= √0.002616 = 0.05
3. Reliability coefficient (Z) for 95% CI = 1.96
95% CI = 0.12 ± (1.96 × 0.05) = 0.12 ± 0.1 = (0.02 – 0.22)
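Exercises 3 and 4 can be reproduced with the sketch below, which applies the two proportion formulas directly to the counts. The function names `proportion_ci` and `diff_proportions_ci` are hypothetical. The code keeps full precision for SE, so its limits can differ slightly from the hand-rounded figures above.

```python
import math


def proportion_ci(successes, n, z):
    """CI for a single proportion: p -/+ z * sqrt(p(1-p)/n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def diff_proportions_ci(x1, n1, x2, n2, z):
    """CI for P1 - P2: (p1 - p2) -/+ z * sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Exercise 3: 123 of 300 adults had a yearly checkup.
print("Exercise 3:", proportion_ci(123, 300, 1.96))

# Exercise 4: 90/100 recovered (new) vs 78/100 recovered (standard).
lower, upper = diff_proportions_ci(90, 100, 78, 100, 1.96)
print("Exercise 4:", (lower, upper), "zero included:", lower <= 0 <= upper)
```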
Since ZERO is not included in the interval, there is a statistically significant difference between the two population proportions.
Thank You