Introduction to Biostatistics
Introduction
• Key words:
– Statistics, data, Biostatistics,
– Variable, Population, Sample
Mekele University: Biostatistics 2
Introduction
Some Basic concepts
Statistics is a field of study concerned with
1- collection, organization, summarization and
analysis of data.
2- drawing of inferences about a body of data
when only a part of the data is observed.
• Statisticians try to interpret and
communicate the results to others.
* Biostatistics:
The tools of statistics are employed in many fields:
business, education, psychology, agriculture,
economics, … etc.
When the data analyzed are derived from the
biological sciences and medicine,
we use the term biostatistics to distinguish this
particular application of statistical tools and
concepts.
Data:
The raw material of Statistics is data.
We may define data as figures. Figures result
from the process of counting or from taking a
measurement.
For example:
- When a hospital administrator counts the
number of patients (counting).
- When a nurse weighs a patient (measurement)
Sources of data: Records, Surveys (comprehensive or sample), Experiments
We search for suitable data to serve as the raw
material for our investigation.
Such data are available from one or more of the
following sources:
1- Routinely kept records.
For example:
- Hospital medical records contain immense
amounts of information on patients.
- Hospital accounting records contain a wealth of
data on the facility’s business activities.
* Sources of Data:
2- Surveys:
A survey may be the source when the data are needed
to answer specific questions.
For example:
If the administrator of a clinic wishes to obtain
information regarding the mode of transportation
used by patients to visit the clinic, then a survey
may be conducted among patients to obtain this
information.
3- Experiments.
Frequently the data needed to answer
a question are available only as the
result of an experiment.
For example:
If a nurse wishes to know which of several strategies is
best for maximizing patient compliance,
she might conduct an experiment in which the
different strategies of motivating compliance
are tried with different patients.
* A variable:
It is a characteristic that takes on different values
in different persons, places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Types of variables: Quantitative, Qualitative
Quantitative Variables
A quantitative variable can be measured in the
usual sense.
For example:
- the heights of adult
males,
- the weights of
preschool children,
- the ages of patients
seen in a dental clinic.
Qualitative Variables
Many characteristics are not
capable of being measured.
Some of them can be ordered
or ranked.
For example:
- classification of people into socio-
economic groups,
- social classes based on income,
education, etc.
Types of quantitative variables: Discrete, Continuous
A discrete variable
is characterized by gaps or
interruptions in the
values that it can
assume.
For example:
- The number of daily
admissions to a general
hospital,
- The number of decayed,
missing or filled teeth per
child in an elementary
school.
A continuous variable
can assume any value within a specified
relevant interval of values assumed
by the variable.
For example:
- Height,
- weight,
- skull circumference.
No matter how close together the
observed heights of two people, we
can find another person whose
height falls somewhere in between.
Types of variables & scale of measurement
Quantitative (numerical) variables: Interval, Ratio
Qualitative (categorical) variables: Nominal, Ordinal
Nominal
unordered categories
numbers used to represent categories
averages are meaningless; look at
frequency/proportion in each category
dichotomous e.g. gender: male = 1, female = 0
polytomous e.g. blood type: O = 1, A = 2, B = 3,
AB = 4
Ordinal
ordered categories
numbers used to represent categories
order matters; magnitude does not
differences between categories are
meaningless
Example:- severity of injury:
fatal = 1,
severe = 2,
moderate = 3,
minor = 4
Interval
The differences between adjacent units on the scale are
equal
The zero point is arbitrary and does not imply the
absence of the property being measured
Examples:
Degrees Fahrenheit:
The difference between 30 and 40 degrees is the same as that
between 70 and 80 degrees, but 80 is not twice as hot as 40.
Years:
The difference between 1993 and 1994 is the same as that between 1995
and 1996, but year 0 was not the beginning of time.
Ratio
The most detailed and objectively
interpretable of the measurement scales.
Interval scale with an absolute zero-it has a
true zero point (absence of property being
measured) as well as equal intervals
E.g. Height, weight, money, age, time, speed,
class size, the Kelvin scale of temperature
Cont…
Independent variables
Precede dependent variables in time
Are often manipulated by the researcher
The treatment or intervention that is used in a
study
Dependent variables
What is measured as an outcome in a study
Values depend on the independent variable
* A population:
It is the largest collection of values of a random
variable for which we have an interest at a
particular time.
For example:
• headache patients in a chiropractic office;
automobile crash victims in an emergency room
• In research, it is not practical to include all members
of a population
• Thus, a sample (a subset of a population) is taken
• Populations may be finite or infinite.
A Sample
It is a part of a population,
e.g. a fraction of these patients.
Random sample
Subjects are selected from a population so that each
individual has an equal chance of being selected.
Random samples tend to be representative of the source
population.
Non-random samples may not be representative;
they may be biased regarding age, severity of the
condition, socioeconomic status, etc.
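The idea of giving each individual an equal chance of selection can be sketched in Python. This is only an illustration: the population of 200 patient IDs, the sample size of 20 and the seed are assumptions made for the example, not values from the slides.

```python
import random

# Hypothetical sampling frame: 200 patient IDs (illustrative only).
population = list(range(1, 201))

# A fixed seed makes the draw reproducible; any seed would do.
rng = random.Random(42)

# Draw 20 distinct patients; every patient has the same chance of selection.
sample = rng.sample(population, 20)

print(sorted(sample))
```

Because `random.Random.sample` draws without replacement, no patient can appear twice in the sample.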
Types of statistical methods
Descriptive statistics
Describe the data by summarizing them
Inferential statistics
Techniques by which inferences about population
parameters are drawn from sample statistics;
that is, the sample statistics observed are used to infer the
corresponding population parameters
Cont…
Parameter
Summary data from a population
Statistic
Summary data from a sample
Examples of Scales of Measurement
• Low income: ordinal
• CD4 count: ratio
• Year of birth: interval
• IQ scores: interval
• Severe injury: ordinal
• Raw score on a statistics exam: interval
• Room temperature in Kelvin: ratio
• Nationality of MU students: nominal
Descriptive statistics
Strategies for understanding
the meanings of Data
• Key words:
Frequency table, bar chart, range,
width of interval, mid-interval,
histogram, polygon
Descriptive statistics
Before performing any analyses, you must
first get to know your data
Descriptive statistics are used to
summarize data in the form of tables,
graphs and numerical measures
The summary technique used depends on
the data type under consideration
Presentation techniques for qualitative/categorical data
Statistics: frequency, relative frequency, cumulative frequency
Figure/chart: pie chart, bar chart
Frequency Distribution
for Discrete Random Variables
Example:
Suppose that we take a sample of size 16 from children in
a primary school and get the following data about the
number of their decayed teeth,
3,5,2,4,0,1,3,5,2,3,2,3,3,2,4,1
To construct a frequency table:
1- Order the values from the smallest to the largest.
0,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5
2- Count how many numbers are the same.
No. of decayed teeth   Frequency   Relative frequency
0                      1           0.0625
1                      2           0.125
2                      4           0.25
3                      5           0.3125
4                      2           0.125
5                      2           0.125
Total                  16          1
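The two steps above (order the values, then count repeats) can be sketched in Python. This is a minimal illustration using the 16 observations from the example; the variable names are chosen here and are not from the slides.

```python
from collections import Counter

# Number of decayed teeth for the 16 sampled children (from the example).
teeth = [3, 5, 2, 4, 0, 1, 3, 5, 2, 3, 2, 3, 3, 2, 4, 1]
n = len(teeth)

# Count how many times each value occurs.
freq = Counter(teeth)

# One row per distinct value: (value, frequency, relative frequency).
table = [(value, freq[value], freq[value] / n) for value in sorted(freq)]

print(f"{'Teeth':>5} {'Frequency':>9} {'Relative frequency':>18}")
for value, f, rf in table:
    print(f"{value:>5} {f:>9} {rf:>18.4f}")
print(f"{'Total':>5} {n:>9} {sum(rf for _, _, rf in table):>18.4f}")
```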
Representing the simple frequency table
using the bar chart
[Bar chart: x-axis, number of decayed teeth (0 to 5); y-axis, frequency (bar heights 1, 2, 4, 5, 2, 2)]
We can represent
the above simple
frequency table
using the bar
chart.
Ordinal or nominal
data
Height of each bar
is the frequency of
that category
[Bar chart: percentage (%) by marital status (Single, Married, Divorced, Widowed), with separate bars for Male and Female]
Cont…
Instead of "stacks" rising up from the
horizontal axis (bar chart), we could plot
the shares of a pie.
Recalling that a circle has 360 degrees:
50% means 180 degrees
25% means 90 degrees
Frequency Distribution
for Continuous Random Variables
For large samples, we can’t use the simple frequency table to
represent the data.
We need to divide the data into groups or intervals or
classes.
So, we need to determine:
1- The number of intervals (k).
Too few intervals are not good because information will be
lost.
Too many intervals are not helpful to summarize the data.
A commonly followed rule is that 6 ≤ k ≤ 15,
or the following formula (Sturges' rule) may be used:
k = 1 + 3.322 (log10 n)
2- The range (R).
It is the difference between the largest and the
smallest observation in the data set.
3- The Width of the interval (w).
Class intervals generally should be of the same
width. Thus, if we want k intervals, then w is
chosen such that
w ≥ R / k.
Example:
Assume that the number of observations
is 100; then
k = 1 + 3.322(log10 100)
= 1 + 3.322(2) = 7.644 ≈ 8.
Assume that the smallest value is 5 and the largest value in
the data is 61; then
R = 61 – 5 = 56 and
w = 56 / 8 = 7.
To make the summarization more
comprehensible, the class width may be rounded to 5,
10, or a multiple of 10.
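The calculation of k, R and w can be sketched as a small Python function. The function name `class_plan` is hypothetical, and rounding k up with `math.ceil` is an assumption made here; the slides simply round 7.6 to 8.

```python
import math

def class_plan(n, smallest, largest):
    """Return (k, R, w): number of intervals, range, and minimum width."""
    # k = 1 + 3.322 log10(n), rounded up to a whole number of intervals
    k = math.ceil(1 + 3.322 * math.log10(n))
    r = largest - smallest   # range R
    w = r / k                # minimum class width, w >= R / k
    return k, r, w

# n = 100 observations with smallest value 5 and largest value 61
print(class_plan(100, 5, 61))
```

The resulting width is a lower bound; in practice it is usually rounded up to a convenient value such as 5 or 10.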
Table 1.4.1
Example 2.3.1
• We wish to know how many class intervals to have in the
frequency distribution of the data in Table 1.4.1, the ages of 189
subjects who participated in a study on smoking cessation.
Solution:
• Since the number of observations
is 189,
• k = 1 + 3.322(log10 189)
• = 1 + 3.322(2.276) = 8.56 ≈ 9,
• R = 82 – 30 = 52, and
• w = 52 / 9 = 5.778.
• It is more convenient to let w = 10; the intervals then take the
form:
Class interval   Frequency
30 – 39          11
40 – 49          46
50 – 59          70
60 – 69          45
70 – 79          16
80 – 89          1
Total            189
Sum of frequencies = sample size = n
The Cumulative Frequency:
It can be computed by adding successive frequencies.
The Cumulative Relative Frequency:
It can be computed by adding successive relative
frequencies.
The Mid-interval:
It can be computed by adding the lower bound of the
interval to its upper bound and then dividing
by 2.
For the above example, the following table represents the cumulative
frequency, the relative frequency, the cumulative relative frequency and the
mid-interval.
Class interval   Mid-interval   Frequency (f)   Cumulative frequency   Relative frequency (R.f)   Cumulative relative frequency
30 – 39          34.5           11              11                     0.0582                     0.0582
40 – 49          44.5           46              57                     0.2434                     -
50 – 59          54.5           -               127                    -                          0.6720
60 – 69          -              45              -                      0.2381                     0.9101
70 – 79          74.5           16              188                    0.0847                     0.9948
80 – 89          84.5           1               189                    0.0053                     1
Total                           189                                    1
R.f = freq/n
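The columns of such a table can be derived from the class limits and frequencies alone. The sketch below does this in Python for the age data, using the frequencies from the grouped table (11, 46, 70, 45, 16, 1); variable names are illustrative. Values are computed from exact fractions, so the last decimal may differ slightly from a table built by adding rounded figures.

```python
from itertools import accumulate

# Class limits and frequencies for the ages of the 189 subjects.
classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89)]
freqs = [11, 46, 70, 45, 16, 1]
n = sum(freqs)

mids = [(lo + hi) / 2 for lo, hi in classes]  # mid-interval
cum = list(accumulate(freqs))                 # cumulative frequency
rel = [f / n for f in freqs]                  # relative frequency = freq / n
cum_rel = [c / n for c in cum]                # cumulative relative frequency

for (lo, hi), m, f, c, r, cr in zip(classes, mids, freqs, cum, rel, cum_rel):
    print(f"{lo} - {hi}  {m:5.1f}  {f:3d}  {c:3d}  {r:.4f}  {cr:.4f}")
```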
Example :
• Complete the frequency table above, then
answer the following questions:
1- The number of subjects with age less than 50 years?
2- The number of subjects with age between 40 and 69 years?
3- The relative frequency of subjects with age between 70 and 79 years?
4- The relative frequency of subjects with age more than 69 years?
5- The percentage of subjects with age between 40 and 49 years?
6- The percentage of subjects with age less than 60 years?
7- The range (R)?
8- The number of intervals (k)?
9- The width of the interval (w)?
Representing the grouped frequency table using the
histogram
To draw the histogram, the true class limits should be used. They can be
computed by subtracting 0.5 from the lower limit and adding 0.5 to the upper
limit of each interval.
True class limits   Frequency
29.5 – < 39.5       11
39.5 – < 49.5       46
49.5 – < 59.5       70
59.5 – < 69.5       45
69.5 – < 79.5       16
79.5 – < 89.5       1
Total               189
[Histogram: x-axis, class midpoints 34.5 to 84.5; y-axis, frequency 0 to 80]
Representing the grouped frequency table using
the Polygon
[Frequency polygon: x-axis, class midpoints 34.5 to 84.5; y-axis, frequency 0 to 80]
Histogram
• continuous data divided into categories
• graphical representation of frequency
distribution
• height of each bar is the frequency of that
category
• used to assess skewness and modality of the data
Frequency polygon
- is an alternative to the histogram
In a histogram:
- X-axis shows intervals of values
- Y-axis shows bars of frequencies
In a frequency polygon:
- X-axis shows midpoints of intervals of
values
- Y-axis shows dots instead of bars
Box plots
- discrete or continuous data
- displays the 25th, 50th and 75th percentiles of
the data also known as the first, second and third
quartiles respectively
- whiskers extend to adjacent values which are not
outliers
- outliers indicated as circles
- box shows the interquartile range of the data
- can be used to assess skewness
Two-way scatter plots
• used to assess the relationship between
two discrete or continuous measures
• nature of the relationship described as
positive, negative or no relationship
Line graph
• two continuous measures
• each x value has only one corresponding y value
• useful for looking at patterns over time
• can be used to compare 2 or more groups
Line Graph
[Line graph: x-axis, Year (1960 to 2000); y-axis, MMR/1000 (0 to 60)]
Year   MMR
1960   50
1970   45
1980   26
1990   15
2000   12
Figure (1): Maternal mortality rate of (country), 1960-2000
Chapter-3
Measures of Central
Tendency
Key words:
Descriptive statistic, measure of central
tendency, statistic, parameter, mean (μ),
median, mode.
The Statistic and The Parameter
• A Statistic:
It is a descriptive measure computed from the data
of a sample.
• A Parameter:
It is a descriptive measure computed from the data
of a population.
Since it is difficult to measure a parameter from the
population, a sample of size n is drawn, whose values are
$x_1, x_2, \ldots, x_n$. From these data, we compute the statistic.
Measures of Central Tendency
A measure of central tendency is a measure which indicates
where the middle of the data is.
The three most commonly used measures of central
tendency are:
The Mean, the Median, and the Mode.
The Mean:
It is the average of the data.
The Population Mean:
$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$, which is usually unknown, so we use the
sample mean to estimate or approximate it.
The Sample Mean:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example:
Here is a random sample of size 10 of ages, where
$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$,
$x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.
$\bar{x} = (42 + 28 + \dots + 37) / 10 = 36.6$
Properties of the Mean:
• Uniqueness. For a given set of data there is one and
only one mean.
• Simplicity. It is easy to understand and to compute.
• Affected by extreme values. Since all values
enter into the computation.
Example: Assume the values are 115, 110, 119, 117, 121 and 126.
The mean = 118.
But assume that the values are 75, 75, 80, 80 and 280.
The mean = 118, a value that is not representative of the set of data
as a whole.
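The sample mean, and its sensitivity to an extreme value, can be checked with Python's standard `statistics` module. This is a small sketch using the data from the examples above; the median is printed only as a contrast for the data set containing the extreme value 280.

```python
from statistics import mean, median

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]  # random sample of ages
typical = [115, 110, 119, 117, 121, 126]         # values close together
with_outlier = [75, 75, 80, 80, 280]             # one extreme value

print(mean(ages))                          # sample mean of the ten ages
print(mean(typical), mean(with_outlier))   # both means equal 118
print(median(with_outlier))                # middle value of the second set
```

Both sets share the same mean, but only in the first set is that mean close to the individual values.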
The Median:
When the data are ordered, the median is the observation that divides
the set of observations into two equal parts, such that half
of the data lie before it and the other half after it.
• If n is odd, the median is the middle observation,
that is, the ((n+1)/2)th ordered observation.
When n = 11, the median is the 6th observation.
• If n is even, there are two middle observations.
• The median is the mean of these two middle
observations,
that is, the mean of the (n/2)th and ((n/2)+1)th ordered observations.
When n = 12, the median is the value halfway
between the 6th and 7th ordered observations.
Example:
For the same random sample, the ordered observations will
be as:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, the median lies halfway between the 5th and 6th ordered
observations, i.e. (32+34)/2 = 33.
Properties of the Median:
• Uniqueness. For a given set of data there is one and
only one median.
• Simplicity. It is easy to calculate.
• It is not affected by extreme values as is the
mean.
The Mode:
It is the value which occurs most frequently.
If all values are different there is no mode.
Sometimes, there is more than one mode.
Example:
For the same random sample, the value 28 is
repeated two times, so it is the mode.
Properties of the Mode:
• Sometimes, it is not unique.
• It may be used for describing qualitative data.
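The median and mode of the same random sample of ages can be verified with the standard `statistics` module. In this sketch, `multimode` is used because it returns every most-frequent value, which covers the case where the mode is not unique.

```python
from statistics import median, multimode

ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

print(sorted(ages))     # ordered observations
print(median(ages))     # mean of the 5th and 6th ordered values
print(multimode(ages))  # list of the most frequent value(s)
```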
Quantiles
• Dividing the distribution of ordered values into
equal-sized parts
– Quartiles: 4 equal parts
– Deciles: 10 equal parts
– Percentiles: 100 equal parts
First 25% Second 25% Third 25% Fourth 25%
Q1 Q2 Q3
Q1:first quartile
Q2:second quartile = median
Q3:third quartile
Example:
Given the following data set (age of patients):
18, 59, 24, 42, 21, 23, 24, 32, find the third quartile.
Solution:
Sort the data from lowest to highest:
18 21 23 24 24 32 42 59
3rd quartile = {3/4 (n+1)}th observation = (6.75)th
observation
= 32 + (42 − 32) × 0.75 = 39.5
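The position rule used in this solution, the p(n+1)th ordered observation with linear interpolation between neighbours, can be sketched in Python. The function name `quantile_n_plus_1` is hypothetical; other software may use different quantile conventions and give slightly different answers.

```python
def quantile_n_plus_1(values, p):
    """Quantile at the p*(n+1)-th ordered observation, interpolating linearly."""
    data = sorted(values)
    n = len(data)
    pos = p * (n + 1)       # 1-based position, e.g. 6.75
    lower = int(pos)        # whole part of the position
    frac = pos - lower      # fractional part of the position
    if lower < 1:           # position falls before the first observation
        return data[0]
    if lower >= n:          # position falls at or after the last observation
        return data[-1]
    return data[lower - 1] + frac * (data[lower] - data[lower - 1])

ages = [18, 59, 24, 42, 21, 23, 24, 32]
q3 = quantile_n_plus_1(ages, 0.75)
print(q3)
```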
Measures of Dispersion
Key words:
Descriptive statistic, measure of dispersion,
range, variance, coefficient of variation.
Measures of Dispersion:
• A measure of dispersion conveys information regarding the
amount of variability present in a set of data.
• Note:
1. If all the values are the same
→ there is no dispersion.
2. If the values are not all the same
→ there is dispersion:
a) If the values are close to each other
→ the amount of dispersion is small.
b) If the values are widely scattered
→ the dispersion is greater.
Example
• Measures of dispersion are:
1. Range (R).
2. Variance.
3. Standard deviation.
4. Coefficient of variation (C.V).
1. The Range (R):
• Range = Largest value − Smallest value
= $x_L - x_S$
• Note:
– The range depends on only two values
– Highly sensitive to outliers
– Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
• Find the range. Range = 66 − 38 = 28
• Inter-quartile range
– 3rd quartile − 1st quartile (75th − 25th percentile)
– Robust to outliers
– Middle 50% of observations
2. The Variance:
• It measures dispersion relative to the scatter of the values
about their mean.
a) Sample Variance ($S^2$):
• $S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$, where $\bar{x}$ is the sample mean
• Find the sample variance of the ages above, for which $\bar{x} = 56$
• Solution:
• $S^2 = [(43-56)^2 + (66-56)^2 + \dots + (50-56)^2] / 9$
• $= 810/9 = 90$
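The sample variance of the ten ages can be computed step by step and compared with the standard `statistics` module. This is a sketch: `variance` and `stdev` divide by n − 1 (sample formulas), while `pvariance` divides by N (population formula).

```python
from statistics import mean, pvariance, stdev, variance

ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

x_bar = mean(ages)                         # sample mean
ss = sum((x - x_bar) ** 2 for x in ages)   # sum of squared deviations
s2 = ss / (len(ages) - 1)                  # sample variance, divisor n - 1

print(x_bar, ss, s2)
print(variance(ages))    # library sample variance, should match s2
print(pvariance(ages))   # population variance, divisor N
print(stdev(ages))       # sample standard deviation, sqrt of the variance
```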
• b) Population Variance ($\sigma^2$):
$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$, where $\mu$ is the population mean
3. The Standard Deviation:
• It is the square root of the variance: $\sqrt{\text{Variance}}$
a) Sample Standard Deviation = $S = \sqrt{S^2}$
b) Population Standard Deviation = $\sigma = \sqrt{\sigma^2}$
STANDARD DEVIATION (SD)
Data: 7, 7, 7, 7, 7, 7 → Mean = 7, SD = 0
Data: 7, 8, 7, 7, 7, 6 → Mean = 7, SD = 0.63
Data: 3, 2, 7, 8, 13, 9 → Mean = 7, SD = 4.05
Standard deviation
Caution must be exercised when using standard
deviation as a comparative index of dispersion
Weights of newborn elephants (kg):
929, 853, 878, 939, 895, 972, 937, 841, 801, 826
n = 10, $\bar{X}$ = 887.1, sd = 56.50
Weights of newborn mice (kg):
0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89
n = 10, $\bar{X}$ = 0.68, sd = 0.255
It is incorrect to say that elephants show greater variation in birth
weights than mice merely because of their higher standard deviation.
4. The Coefficient of Variation (C.V):
• It is a measure used to compare the dispersion in
two sets of data; it is independent of the
unit of measurement.
• $C.V = \frac{S}{\bar{X}}(100)$
where S is the sample standard deviation
and $\bar{X}$ is the sample mean.
Coefficient of variation
The coefficient of variation expresses the standard deviation
relative to its mean:
$cv = \frac{s}{\bar{X}}$
Weights of newborn elephants (kg):
929, 853, 878, 939, 895, 972, 937, 841, 801, 826
n = 10, $\bar{X}$ = 887.1, s = 56.50, cv = 0.0637
Weights of newborn mice (kg):
0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89
n = 10, $\bar{X}$ = 0.68, s = 0.255, cv = 0.375
Mice show greater birth-weight variation
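The comparison of the two weight samples can be reproduced with a short Python sketch. The helper name `cv` is chosen here for illustration; it returns the ratio s / mean without multiplying by 100. Small differences from the slide figures can arise because the slides round the mean before dividing.

```python
from statistics import mean, stdev

elephants = [929, 853, 878, 939, 895, 972, 937, 841, 801, 826]
mice = [0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89]

def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return stdev(values) / mean(values)

for name, data in (("elephants", elephants), ("mice", mice)):
    print(f"{name}: mean={mean(data):.3f} sd={stdev(data):.3f} cv={cv(data):.4f}")
```

Although the elephants have by far the larger standard deviation, their coefficient of variation is the smaller of the two.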
Example:
• Suppose two samples of human males yield the
following data:
                     Sample 1       Sample 2
Age                  25-year-olds   11-year-olds
Mean weight          145 pounds     80 pounds
Standard deviation   10 pounds      10 pounds
• We wish to know which is more variable.
Solution:
• c.v (Sample 1) = (10/145) × 100 = 6.9
• c.v (Sample 2) = (10/80) × 100 = 12.5
• So the weights of the 11-year-olds (Sample 2) show more relative variation.
When to use the coefficient of variation
• When comparison groups have very different means
(CV is suitable as it expresses the standard deviation
relative to its corresponding mean)
• When different units of measurement are involved,
e.g. group 1 unit is mm, and group 2 unit is gm (CV is
suitable for comparison as it is unit free)
• In such cases, sd should not be used for comparison
Chapter-4
Elementary Probability and
probability distribution
• Key words:
• Probability, objective probability, subjective
probability, equally likely, mutually exclusive,
multiplicative rule, conditional probability,
independent events
Introduction
• The concept of probability is frequently encountered in
everyday communication. For example, a physician may
say that a patient has a 50-50 chance of surviving a certain
operation.
Another physician may say that she is 95 percent certain
that a patient has a particular disease.
• Most people express probabilities in terms of percentages.
• But, it is more convenient to express probabilities as
fractions. Thus, we may measure the probability of the
occurrence of some event by a number between 0 and 1.
• The more likely the event, the closer the number is to one.
An event that can't occur has a probability of zero, and an
event that is certain to occur has a probability of one.
Two views of probability: objective and
subjective
• *** Objective Probability
• ** Classical and relative frequency
• Some definitions:
1. Equally likely outcomes:
Outcomes that have the same chance of
occurring.
2. Mutually exclusive:
Two events are said to be mutually exclusive if they
cannot occur simultaneously, that is, A ∩ B = Φ.
• The universal set (S): the set of all possible outcomes.
• The empty set Φ: contains no elements.
• The event E: a set of outcomes in S that has a
certain characteristic.
• Classical probability: If an event can occur in N
mutually exclusive and equally likely ways, and if m
of these possess a trait E, the probability of the
occurrence of event E is equal to m/N.
• For example: in the rolling of a die, each of the six
sides is equally likely to be observed. So, the
probability that a 4 will be observed is equal to 1/6.
• Relative Frequency Probability:
• Def: If some process is repeated a large number of
times, n, and if some resulting event E occurs m
times, the relative frequency of occurrence of E,
m/n, will be approximately equal to the probability of E:
P(E) = m/n.
• *** Subjective Probability :
• Probability measures the confidence that a particular
individual has in the truth of a particular proposition.
• For Example : the probability that a cure for cancer
will be discovered within the next 10 years.
Elementary Properties of Probability:
• Given some process (or experiment) with n
mutually exclusive events E1, E2, E3, …,
En, then
1. P(Ei) ≥ 0, i = 1, 2, 3, …, n
2. P(E1) + P(E2) + … + P(En) = 1
3. P(Ei + Ej) = P(Ei) + P(Ej), where Ei and Ej are mutually
exclusive
Rules of Probability
1- Addition Rule
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
2- If A and B are mutually exclusive (disjoint), then
P(A ∩ B) = 0
and the addition rule becomes
P(A ∪ B) = P(A) + P(B).
3- Complementary Rule
P(A') = 1 – P(A)
where A' is the complement of event A.
Example
Family history of mood disorders   Early ≤ 18 (E)   Later > 18 (L)   Total
Negative (A)                       28               35               63
Bipolar disorder (B)               19               38               57
Unipolar (C)                       41               44               85
Unipolar and bipolar (D)           53               60               113
Total                              141              177              318
**Answer the following questions:
Suppose we pick a person at random from this sample.
1- What is the probability that this person will be 18 years old or younger?
2- What is the probability that this person has a family history of mood disorders
Unipolar (C)?
3- What is the probability that this person does not have a family history of mood
disorders Unipolar ($\bar{C}$)?
4- What is the probability that this person is 18 years old or younger or has
a Negative family history of mood disorders (A)?
5- What is the probability that this person is more than 18 years old and has
a family history of mood disorders Unipolar and Bipolar (D)?
Solution:
1. P(E) = 141/318
2. P(C) = 85/318
3. P($\bar{C}$) = 1 − P(C) = 1 − 85/318 = 233/318
4. P(E ∪ A) = P(E) + P(A) − P(E ∩ A)
= (141/318) + (63/318) − (28/318)
= 176/318
5. P(L ∩ D) = 60/318
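These probabilities can be recomputed directly from the cell counts of the table. The sketch below stores the counts in a dictionary and uses exact fractions; the helper names (`p_history`, `p_age`, `p_joint`) are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# Cell counts: family history (A, B, C, D) by age at onset (E = early, L = later).
counts = {
    "A": {"E": 28, "L": 35},   # Negative
    "B": {"E": 19, "L": 38},   # Bipolar disorder
    "C": {"E": 41, "L": 44},   # Unipolar
    "D": {"E": 53, "L": 60},   # Unipolar and bipolar
}
total = sum(sum(row.values()) for row in counts.values())

def p_history(h):
    """Marginal probability of a family-history category (row total / total)."""
    return Fraction(sum(counts[h].values()), total)

def p_age(a):
    """Marginal probability of an age group (column total / total)."""
    return Fraction(sum(row[a] for row in counts.values()), total)

def p_joint(h, a):
    """Joint probability of a history category and an age group."""
    return Fraction(counts[h][a], total)

p_e = p_age("E")
p_c = p_history("C")
p_not_c = 1 - p_c                                     # complementary rule
p_e_or_a = p_e + p_history("A") - p_joint("A", "E")   # addition rule
p_l_and_d = p_joint("D", "L")

print(p_e, p_c, p_not_c, p_e_or_a, p_l_and_d)
```

`Fraction` prints each result in lowest terms, so 141/318 appears as 47/106.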
Conditional Probability:
P(A|B) is the probability of A assuming that B has
happened.
• $P(A|B) = \frac{P(A \cap B)}{P(B)}$, P(B) ≠ 0
• $P(B|A) = \frac{P(A \cap B)}{P(A)}$, P(A) ≠ 0
Example
From the previous example, answer:
• Suppose we pick a person at random and find that he is 18
years or younger (E). What is the probability that this
person will be one who has no family history of mood
disorders (A)?
• Solution:
• P(A ∩ E) = 28/318 and P(E) = 141/318, so P(A|E) = (28/318)/(141/318) = 28/141
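The conditional probability in this example follows from dividing the joint probability by the marginal probability of the conditioning event. A minimal Python check, using exact fractions and the counts quoted in the solution:

```python
from fractions import Fraction

p_a_and_e = Fraction(28, 318)   # P(A and E): negative history and early onset
p_e = Fraction(141, 318)        # P(E): early onset

# P(A | E) = P(A and E) / P(E), defined only when P(E) != 0
p_a_given_e = p_a_and_e / p_e

print(p_a_given_e, float(p_a_given_e))
```

The common denominator 318 cancels, which is why the answer can also be read straight from the E column of the table as 28 out of 141.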
Exercise
• Suppose we pick a person at random and
find that he has a family history of unipolar and bipolar mood disorders (D). What
is the probability that this person will be 18
years or younger (E)?
Multiplicative Rule:
• P(A ∩ B) = P(A|B) P(B)
• P(A ∩ B) = P(B|A) P(A)
where
• P(A): marginal probability of A.
• P(B): marginal probability of B.
• P(B|A): the conditional probability of B given A.
Independent Events:
• If the occurrence of A has no effect on the probability of B, we say that A and B are
independent events.
• Then,
1- P(A ∩ B) = P(A) P(B)
2- P(A|B) = P(A)
3- P(B|A) = P(B)
Example
• In a certain high school class consisting of 60 girls
and 40 boys, it is observed that 24 girls and 16 boys
wear eyeglasses . If a student is picked at random
from this class ,the probability that the student
wears eyeglasses , P(E), is 40/100 or 0.4 .
• What is the probability that a student picked at
random wears eyeglasses given that the student is a
boy?
• What is the probability of the joint occurrence of the
events of wearing eye glasses and being a boy?
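One way to work through the two questions is sketched below in Python, using exact fractions and the counts stated in the example. The variable names are illustrative, and B denotes the event "the student is a boy".

```python
from fractions import Fraction

girls, boys = 60, 40
girls_glasses, boys_glasses = 24, 16
total = girls + boys

p_e = Fraction(girls_glasses + boys_glasses, total)   # P(E): wears eyeglasses
p_b = Fraction(boys, total)                           # P(B): is a boy
p_e_and_b = Fraction(boys_glasses, total)             # P(E and B): joint

# Conditional probability of eyeglasses given that the student is a boy
p_e_given_b = p_e_and_b / p_b

# The events are independent when P(E and B) equals P(E) * P(B)
independent = p_e_and_b == p_e * p_b

print(p_e_given_b, p_e_and_b, independent)
```

Here P(E|B) equals P(E), so wearing eyeglasses and being a boy are independent events in this class.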
Example
• Suppose that of 1200 admissions to a general
hospital during a certain period of time, 750 are
private admissions. If we designate these as a set A,
then compute P(A) and P($\bar{A}$).
The Random Variable (X):
• When the values of a variable (height, weight, or
age) can’t be predicted in advance, the variable is
called a random variable.
• An example is the adult height.
• When a child is born, we can’t predict exactly his
or her height at maturity.
4.2 Probability Distributions for Discrete
Random Variables
• Definition:
• The probability distribution of a discrete
random variable is a table, graph,
formula, or other device used to specify
all possible values of a discrete random
variable along with their respective
probabilities.
The Cumulative Probability Distribution of X,
F(x):
• It shows the probability that the variable X is
less than or equal to a certain value, P(X ≤ x).
Example:
Number of programs (x)   Frequency   P(X = x)   F(x) = P(X ≤ x)
1                        62          0.2088     0.2088
2                        47          0.1582     0.3670
3                        39          0.1313     0.4983
4                        39          0.1313     0.6296
5                        58          0.1953     0.8249
6                        37          0.1246     0.9495
7                        4           0.0135     0.9630
8                        11          0.0370     1.0000
Total                    297         1.0000
• Properties of the probability distribution of a discrete
random variable:
1. $0 \le P(X = x) \le 1$
2. $\sum P(X = x) = 1$
3. P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a−1)
4. P(X < b) = P(X ≤ b−1)
4.3 The Binomial Distribution:
• It is derived from a process known as a Bernoulli
trial.
• Bernoulli trial is :
When a random process or experiment called a
trial can result in only one of two mutually
exclusive outcomes, such as dead or alive, sick or
well, the trial is called a Bernoulli trial.
The Bernoulli Process
• A sequence of Bernoulli trials forms a Bernoulli
process under the following conditions
1- Each trial results in one of two possible, mutually
exclusive, outcomes. One of the possible outcomes
is denoted (arbitrarily) as a success, and the other is
denoted a failure.
2- The probability of a success, denoted by p, remains
constant from trial to trial. The probability of a
failure, 1-p, is denoted by q.
3- The trials are independent, that is the outcome of
any particular trial is not affected by the outcome of
any other trial
• The probability distribution of the binomial
random variable X, the number of successes in
n independent trials, is:
$f(x) = P(X = x) = \binom{n}{x} p^{x} q^{n-x}$, x = 0, 1, 2, …, n
• where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the number of combinations of
n distinct objects taken x at a time, and
$x! = x(x-1)(x-2)\cdots(1)$
* Note: 0! = 1
Properties of the binomial distribution
• 1. $f(x) \ge 0$
• 2. $\sum f(x) = 1$
• 3. The parameters of the binomial distribution
are n and p
• 4. $\mu = E(X) = np$
• 5. $\sigma^2 = \mathrm{var}(X) = np(1-p)$
Example
• If we examine all birth records from the North
Carolina State Center for Health statistics for year
2001, we find that 85.8 percent of the pregnancies
had delivery in week 37 or later (full- term birth).
If we randomly selected five birth records from this
population what is the probability that exactly three
of the records will be for full-term births?
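This question is a direct application of the binomial formula with n = 5, x = 3 and p = 0.858. A minimal Python sketch follows; the function name `binom_pmf` is chosen here for illustration.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly three full-term births among five randomly selected records
prob = binom_pmf(3, 5, 0.858)
print(round(prob, 4))
```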
Example
• Suppose it is known that in a certain
population 10 percent of the population is
color blind. If a random sample of 25 people is
drawn from this population, find the
probability that
a) Five or fewer will be color blind.
b) Six or more will be color blind
c) Between six and nine inclusive will be color
blind.
d) Two, three, or four will be color blind.
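The two examples above can be checked numerically. The following is a minimal sketch that uses only the binomial probability function; the helper names `binom_pmf` and `binom_cdf` are illustrative assumptions, not part of the original slides.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x), obtained by summing the probability function from 0 to x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))


# Full-term births: n = 5 records, p = 0.858, exactly three full-term births.
p_three_full_term = binom_pmf(3, 5, 0.858)

# Color blindness: n = 25 people, p = 0.10.
n, p = 25, 0.10
p_five_or_fewer = binom_cdf(5, n, p)                     # a) P(X <= 5)
p_six_or_more = 1 - p_five_or_fewer                      # b) P(X >= 6)
p_six_to_nine = binom_cdf(9, n, p) - binom_cdf(5, n, p)  # c) P(6 <= X <= 9)
p_two_to_four = binom_cdf(4, n, p) - binom_cdf(1, n, p)  # d) P(2 <= X <= 4)

for label, value in [
    ("exactly 3 full-term", p_three_full_term),
    ("a) five or fewer", p_five_or_fewer),
    ("b) six or more", p_six_or_more),
    ("c) six to nine", p_six_to_nine),
    ("d) two to four", p_two_to_four),
]:
    print(f"{label}: {value:.4f}")
```

Parts b) to d) are all written as differences of cumulative probabilities, which mirrors how a cumulative binomial table would be used by hand.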
Properties of continuous probability distributions:
* A continuous variable is one that can assume any value within a specified interval of values.
1- Area under the curve = 1.
2- P(X = a) = 0, where a is a constant.
3- Area between two points a and b = P(a < X < b).
4.6 The normal distribution:
• It is one of the most important probability distributions in statistics.
• The normal density is given by
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$
• π, e : constants
• µ : population mean.
• σ : population standard deviation.
Characteristics of the normal distribution
• The following are some important characteristics
of the normal distribution:
1- It is symmetrical about its mean, µ.
2- The mean, the median, and the mode are all
equal.
3- The total area under the curve above the x-axis is
one.
4-The normal distribution is completely determined
by the parameters µ and σ.
5- The normal distribution depends on the two parameters µ and σ.
µ determines the location of the curve, whereas σ determines the scale of the curve, i.e. the degree of flatness or peakedness of the curve.
[Figure: normal curves with different means, µ1 < µ2 < µ3, and normal curves with different standard deviations, σ1 < σ2 < σ3]
Note that :
1. P( µ- σ < x < µ+ σ) = 0.68
2. P( µ- 2σ< x < µ+ 2σ)= 0.95
3. P( µ-3σ < x < µ+ 3σ) = 0.997
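These three probabilities can be reproduced from the normal cumulative distribution function. The sketch below uses Python's `statistics.NormalDist`; the helper name `prob_within` is an illustrative assumption. The exact values are slightly more precise than the rounded figures quoted above.

```python
from statistics import NormalDist


def prob_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(mu - k*sigma < X < mu + k*sigma) for a normal variable X."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# The result does not depend on mu or sigma, only on k.
within = {k: prob_within(k) for k in (1, 2, 3)}
for k, prob in within.items():
    print(f"within {k} standard deviation(s): {prob:.4f}")
```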
The standard normal distribution:
• It is a special case of the normal distribution with a mean of 0 and a standard deviation of 1.
• The equation for the standard normal distribution is written as
$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}, \quad -\infty < z < \infty$
Characteristics of the standard normal
distribution
1- It is symmetrical about 0.
2- The total area under the curve above the x-
axis is one.
3- We can use table D to find the probabilities
and areas.
"How to use tables of Z"
Note that the cumulative probabilities P(Z ≤ z) are given in tables for -3.49 < z < 3.49. Thus,
P(-3.49 < Z < 3.49) ≈ 1.
For the standard normal distribution,
P(Z > 0) = P(Z < 0) = 0.5
Example 4.6.1:
If Z has the standard normal distribution, then
P(Z < 2) is the area to the left of 2, and it equals 0.9772.
Example 4.6.2:
P(-2.55 < Z < 2.55) is the area between
-2.55 and 2.55, Then it equals
P(-2.55 < Z < 2.55) =0.9946 – 0.0054
= 0.9892.
Example 4.6.3:
P(-2.74 < Z < 1.53) is the area between
-2.74 and 1.53.
P(-2.74 < Z < 1.53) =0.9370 – 0.0031
= 0.9339.
Example :
P(Z > 2.71) is the area to the right of 2.71.
So,
P(Z > 2.71) = 1 – 0.9966 = 0.0034.
Example :
P(Z = 0.84) is the area at the single point z = 0.84. Because Z is continuous, a single point has zero area.
So,
P(Z = 0.84) = 0.
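The table lookups in the examples above can be sketched in code. This is a minimal illustration that replaces the printed Z table with `statistics.NormalDist().cdf`; the variable names are assumptions chosen for readability.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_left_of_2 = Z.cdf(2.0)                    # P(Z < 2)
p_between_sym = Z.cdf(2.55) - Z.cdf(-2.55)  # P(-2.55 < Z < 2.55)
p_between = Z.cdf(1.53) - Z.cdf(-2.74)      # P(-2.74 < Z < 1.53)
p_right_of_271 = 1 - Z.cdf(2.71)            # P(Z > 2.71)

for name, value in [
    ("P(Z < 2)", p_left_of_2),
    ("P(-2.55 < Z < 2.55)", p_between_sym),
    ("P(-2.74 < Z < 1.53)", p_between),
    ("P(Z > 2.71)", p_right_of_271),
]:
    print(f"{name} = {value:.4f}")
```

Small differences in the fourth decimal place relative to hand calculations can occur because the table values are rounded before they are subtracted.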
How to transform a normal distribution (X) to the standard normal distribution (Z)?
• This is done by the following formula:
$z = \frac{x - \mu}{\sigma}$
• Example:
• If X is normal with µ = 3 and σ = 2, find the value of the standard normal Z if X = 6.
• Answer:
$z = \frac{x - \mu}{\sigma} = \frac{6 - 3}{2} = 1.5$
Normal Distribution Applications
The normal distribution can be used to model the distribution of many variables of interest. This allows us to answer probability questions about these random variables.
Example 4.7.1:
The 'Uptime' is a custom-made, lightweight, battery-operated activity monitor that records the amount of time an individual spends in the upright position. In a study of children ages 8 to 15 years, the researchers found that the amount of time children spend in the upright position followed a normal distribution with a mean of 5.4 hours and a standard deviation of 1.3 hours. Find the following probabilities.
If a child is selected at random, then
1- The probability that the child spends less than 3 hours in the upright position in a 24-hour period:
$P(X < 3) = P\left(\frac{X-\mu}{\sigma} < \frac{3-5.4}{1.3}\right) = P(Z < -1.85) = 0.0322$
-------------------------------------------------------------------------
2- The probability that the child spends more than 5 hours in the upright position in a 24-hour period:
$P(X > 5) = P\left(\frac{X-\mu}{\sigma} > \frac{5-5.4}{1.3}\right) = P(Z > -0.31)$
= 1 - P(Z < -0.31) = 1 - 0.3783 = 0.6217
-----------------------------------------------------------------------
3- The probability that the child spends exactly 6.2 hours in the upright position in a 24-hour period:
P(X = 6.2) = 0
-----------------------------------------------------------------------
4- The probability that the child spends from 4.5 to 7.3 hours in the upright position in a 24-hour period:
$P(4.5 < X < 7.3) = P\left(\frac{4.5-5.4}{1.3} < \frac{X-\mu}{\sigma} < \frac{7.3-5.4}{1.3}\right)$
= P(-0.69 < Z < 1.46) = P(Z < 1.46) – P(Z < -0.69)
= 0.9279 – 0.2451 = 0.6828
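The standardize-then-look-up procedure used in this example can be sketched as follows. The helper `z_score` is an illustrative assumption; it rounds z to two decimals only to mimic a printed Z table, so that the results agree with table-based hand calculations.

```python
from statistics import NormalDist

MU, SIGMA = 5.4, 1.3  # mean and standard deviation of upright time, in hours
Z = NormalDist()      # standard normal distribution


def z_score(x: float) -> float:
    """Standardize x and round to two decimals, as when reading a Z table."""
    return round((x - MU) / SIGMA, 2)


p_less_than_3 = Z.cdf(z_score(3))                      # P(X < 3)
p_more_than_5 = 1 - Z.cdf(z_score(5))                  # P(X > 5)
p_exactly_6_2 = 0.0                                    # a single point has zero area
p_4_5_to_7_3 = Z.cdf(z_score(7.3)) - Z.cdf(z_score(4.5))  # P(4.5 < X < 7.3)

print(f"P(X < 3)         = {p_less_than_3:.4f}")
print(f"P(X > 5)         = {p_more_than_5:.4f}")
print(f"P(X = 6.2)       = {p_exactly_6_2:.4f}")
print(f"P(4.5 < X < 7.3) = {p_4_5_to_7_3:.4f}")
```

Without the rounding step, `NormalDist(MU, SIGMA).cdf(x)` gives the same probabilities to about two decimal places.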
Estimation
• Key words:
• Point estimate, interval estimate, estimator,
confidence level, α, confidence interval for the mean μ,
confidence interval for two means,
confidence interval for the population proportion P,
confidence interval for two proportions
• 6.1 Introduction:
• Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population.
• Suppose that:
• an administrator of a large hospital is interested in the mean age of patients admitted to his hospital during a given year.
1. It will be too expensive to go through the records of all patients admitted during that particular year.
2. He consequently elects to examine a sample of the records, from which he can compute an estimate of the mean age of patients admitted to his hospital that year.
• For any parameter, we can compute two types of estimate: a point estimate and an interval estimate.
• A point estimate is a single numerical value used to estimate the corresponding population parameter.
• An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, we feel includes the parameter being estimated.
• The Estimate and The Estimator:
• The estimate is a single computed value, but the estimator is the rule that tells us how to compute this value, or estimate.
• For example,
• $\bar{x} = \frac{\sum_i x_i}{n}$ is an estimator of the population mean, µ. The single numerical value that results from evaluating this formula is called an estimate of the parameter µ.
Confidence Interval for a Population Mean: (C.I)
Suppose researchers wish to estimate the mean of some normally distributed population.
• They draw a random sample of size n from the population and compute $\bar{x}$, which they use as a point estimate of µ.
• Because random sampling involves chance, $\bar{x}$ can't be expected to be equal to µ.
• The value of $\bar{x}$ may be greater than or less than µ.
• It would be much more meaningful to estimate µ by an interval.
The 100(1-α) percent confidence interval (C.I.) for µ:
• We want to find two values L and U between which µ lies with high probability, i.e.
P( L ≤ µ ≤ U ) = 1-α
For example:
• When α = 0.01, then 1-α = 0.99
• When α = 0.05, then 1-α = 0.95
We have the following cases
a) When the population is normal
1) When the variance is known and the sample size is large or small, the C.I. has the form:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$
2) When the variance is unknown and the sample size is small, the C.I. has the form:
$P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1-\alpha$
b) When the population is not normal and n is large (n > 30)
1) When the variance is known, the C.I. has the form:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$
2) When the variance is unknown, the C.I. has the form:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1-\alpha$
Example:
• Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of approximately $\bar{x} = 22$.
Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate µ. (α = 0.05)
Solution:
• 1-α = 0.95 → α = 0.05 → α/2 = 0.025,
• variance = σ² = 45 → σ = √45, n = 10, $\bar{x} = 22$
• The 95% confidence interval for µ is given by:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) = 1-\alpha$
• Z(1-α/2) = Z0.975 = 1.96 (refer to table D)
• Z0.975 (σ/√n) = 1.96 (√45 / √10) = 4.1578
• 22 ± 1.96 (√45 / √10) →
• (22 - 4.1578, 22 + 4.1578) → (17.84, 26.16)
• Exercise example 6.2.2 page 169
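The interval for a normal population with known variance can be sketched in a few lines. The function name `z_interval` is an illustrative assumption; the critical value comes from `NormalDist().inv_cdf` in place of table D, so it is 1.95996 rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(xbar: float, sigma: float, n: int, confidence: float):
    """Confidence interval for the mean when the population variance is known."""
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{1 - alpha/2}
    margin = z_crit * sigma / sqrt(n)
    return xbar - margin, xbar + margin


# Enzyme example: sample mean 22, variance 45, n = 10, 95% confidence.
lower, upper = z_interval(xbar=22, sigma=sqrt(45), n=10, confidence=0.95)
print(f"95% C.I. for the mean: ({lower:.2f}, {upper:.2f})")
```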
Example
The activity values of a certain enzyme measured in normal gastric tissue of 35 patients with gastric carcinoma have a mean of 0.718 and a standard deviation of 0.511. We want to construct a 90% confidence interval for the population mean.
• Solution:
• Note that the population is not normal,
• n = 35 (n > 30), so n is large; σ is unknown, s = 0.511
• 1-α = 0.90 → α = 0.1
• → α/2 = 0.05 → 1-α/2 = 0.95,
Then the 90% confidence interval for µ is given by:
$P\left(\bar{x} - Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + Z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1-\alpha$
• Z(1-α/2) = Z0.95 = 1.645 (refer to table D)
• Z0.95 (s/√n) = 1.645 (0.511 / √35) = 0.1421
• 0.718 ± 1.645 (0.511) / √35 →
(0.718 - 0.1421, 0.718 + 0.1421) →
(0.576, 0.860).
• Exercise example 6.2.3 page 164:
Example 6.3.1 Page 174:
• Suppose a researcher studied the effectiveness of early weight bearing and ankle therapies following acute repair of a ruptured Achilles tendon. One of the variables measured following treatment was muscle strength. In 19 subjects, the mean strength was 250.8 with a standard deviation of 130.9.
We assume that the sample was taken from an approximately normally distributed population. Calculate the 95% confidence interval for the mean strength.
Solution:
• 1-α = 0.95 → α = 0.05 → α/2 = 0.025,
• Standard deviation = s = 130.9, n = 19, $\bar{x} = 250.8$
• The 95% confidence interval for µ is given by:
$P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}\right) = 1-\alpha$
• t(1-α/2),n-1 = t0.975,18 = 2.1009 (refer to table E)
• t0.975,18 (s/√n) = 2.1009 (130.9 / √19) = 63.1
• 250.8 ± 2.1009 (130.9 / √19) →
• (250.8 - 63.1, 250.8 + 63.1) → (187.7, 313.9)
• Exercise 6.2.1, 6.2.2
• 6.3.2 page 171
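When the variance is unknown and the sample is small, the same calculation uses a t critical value. The sketch below takes that critical value as an input, because the Python standard library has no t distribution; the value 2.1009 is the table E figure used in the solution, and the function name `t_interval` is an illustrative assumption.

```python
from math import sqrt


def t_interval(xbar: float, s: float, n: int, t_crit: float):
    """Confidence interval for the mean using a tabulated t critical value."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin


T_0975_DF18 = 2.1009  # t_{0.975, 18}, read from a t table (not computed here)

# Muscle strength example: mean 250.8, s = 130.9, n = 19.
lower, upper = t_interval(xbar=250.8, s=130.9, n=19, t_crit=T_0975_DF18)
print(f"95% C.I. for the mean: ({lower:.1f}, {upper:.1f})")
```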
6.3 Confidence Interval for the difference between two Population Means: (C.I)
If we draw two samples from two independent populations and we want the confidence interval for the difference between the two population means, then we have the following cases:
a) When the populations are normal
1) When the variances are known and the sample sizes are large or small, the C.I. has the form:
$(\bar{x}_1-\bar{x}_2) - Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} < \mu_1-\mu_2 < (\bar{x}_1-\bar{x}_2) + Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}$
2) When the variances are unknown but equal, and the sample sizes are small, the C.I. has the form:
$(\bar{x}_1-\bar{x}_2) - t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} < \mu_1-\mu_2 < (\bar{x}_1-\bar{x}_2) + t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$
where
$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$
Example 6.4.1 P174:
The research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. In a large hospital for the treatment of the mentally retarded, a sample of 12 individuals with Down's syndrome yielded a mean of $\bar{x}_1 = 4.5$ mg/100 ml. In a general hospital, a sample of 15 normal individuals of the same age and sex were found to have a mean value of $\bar{x}_2 = 3.4$ mg/100 ml. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95% C.I. for μ1 - μ2.
Solution:
1-α = 0.95 → α = 0.05 → α/2 = 0.025 → Z(1-α/2) = Z0.975 = 1.96
$(\bar{x}_1-\bar{x}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} = (4.5-3.4) \pm 1.96\sqrt{\frac{1}{12}+\frac{1.5}{15}}$
• = 1.1 ± 1.96(0.4282) = 1.1 ± 0.84 = (0.26, 1.94)
Example 6.4.1 P178:
The purpose of the study was to determine the effectiveness of an integrated outpatient dual-diagnosis treatment program for mentally ill subjects. The authors were addressing the problem of substance abuse issues among people with severe mental disorders. A retrospective chart review was carried out on 50 patients. The researchers were interested in the number of inpatient treatment days for psychiatric disorder during the year following the end of the program. Among 18 patients with schizophrenia, the mean number of treatment days was 4.7 with a standard deviation of 9.3. For 10 subjects with bipolar disorder, the mean number of treatment days was 8.8 with a standard deviation of 11.5. We wish to construct a 99% C.I. for the difference between the means of the populations represented by the two samples.
Solution :
• 1-α = 0.99 → α = 0.01 → α/2 = 0.005 → 1-α/2 = 0.995
• n1 + n2 – 2 = 18 + 10 – 2 = 26
• t(1-α/2),(n1+n2-2) = t0.995,26 = 2.7787, then the 99% C.I. for μ1 – μ2 is
$(\bar{x}_1-\bar{x}_2) \pm t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$
• where
$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2} = \frac{17(9.3)^2 + 9(11.5)^2}{18+10-2} = 102.33$
• then
(4.7 - 8.8) ± 2.7787 √102.33 √((1/18) + (1/10))
= -4.1 ± 11.086 = (-15.186, 6.986)
Exercises: 6.4.2, 6.4.6, 6.4.7, 6.4.8 Page 180
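The pooled-variance interval can be sketched from the summary statistics alone. The function names below are illustrative assumptions, and the critical value 2.7787 is the tabulated t0.995,26 from the solution, passed in rather than computed.

```python
from math import sqrt


def pooled_variance(s1: float, n1: int, s2: float, n2: int) -> float:
    """Pooled estimate of a common variance from two sample standard deviations."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)


def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """C.I. for mu1 - mu2 assuming normal populations with equal variances."""
    sp = sqrt(pooled_variance(s1, n1, s2, n2))
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return diff - margin, diff + margin


# Schizophrenia (group 1) versus bipolar disorder (group 2), 99% confidence.
sp2 = pooled_variance(9.3, 18, 11.5, 10)
lower, upper = pooled_t_interval(4.7, 9.3, 18, 8.8, 11.5, 10, t_crit=2.7787)
print(f"pooled variance = {sp2:.2f}")
print(f"99% C.I. for mu1 - mu2: ({lower:.3f}, {upper:.3f})")
```

Because the interval contains zero, these data do not rule out equal population means at the 99% level.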
6.5 Confidence Interval for a Population proportion (P):
A sample is drawn from the population of interest, and the sample proportion $\hat{p}$ is computed as
$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$
This sample proportion is used as the point estimator of the population proportion. A confidence interval is obtained by the following formula:
$\hat{p} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Example 6.5.1
The Pew Internet and American Life Project reported in 2003 that 18% of internet users have used the internet to search for information regarding experimental treatments or medicine. The sample consisted of 1220 adult internet users, and information was collected from telephone interviews. We wish to construct a 98% C.I. for the proportion of internet users who have searched for information about experimental treatments or medicine.
Solution :
1-α = 0.98 → α = 0.02 → α/2 = 0.01 → 1-α/2 = 0.99
Z1-α/2 = Z0.99 = 2.33, n = 1220, $\hat{p} = \frac{18}{100} = 0.18$
The 98% C.I. is
$\hat{p} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.18 \pm 2.33\sqrt{\frac{0.18(1-0.18)}{1220}}$
= 0.18 ± 0.0256 = (0.1544, 0.2056)
Exercises: 6.5.1, 6.5.3 Page 187
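The interval for a single proportion follows the same pattern as the interval for a mean. In this sketch the function name `proportion_interval` is an illustrative assumption, and the critical value 2.33 is the rounded table value used in the solution.

```python
from math import sqrt


def proportion_interval(p_hat: float, n: int, z_crit: float):
    """Large-sample confidence interval for a population proportion."""
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Internet users example: p_hat = 0.18, n = 1220, 98% confidence (Z = 2.33).
lower, upper = proportion_interval(p_hat=0.18, n=1220, z_crit=2.33)
print(f"98% C.I. for P: ({lower:.4f}, {upper:.4f})")
```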
Confidence Interval for the difference between two Population proportions :
Two samples are drawn from two independent populations of interest, and the sample proportion of the characteristic of interest is computed for each sample. An unbiased point estimator for the difference between the two population proportions is $\hat{P}_1 - \hat{P}_2$.
A 100(1-α)% confidence interval for P1 - P2 is given by
$(\hat{P}_1 - \hat{P}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_1(1-\hat{P}_1)}{n_1} + \frac{\hat{P}_2(1-\hat{P}_2)}{n_2}}$
Example
Connor investigated gender differences in proactive and reactive aggression in a sample of 323 adults (68 females and 255 males). In the sample, 31 of the females and 53 of the males were using the internet in the internet café. We wish to construct a 99% confidence interval for the difference between the proportions of adults who go to the internet café in the two sampled populations.
Solution :
1-α = 0.99 → α = 0.01 → α/2 = 0.005 → 1-α/2 = 0.995
Z1-α/2 = Z0.995 = 2.58, nF = 68, nM = 255,
$\hat{p}_F = \frac{a_F}{n_F} = \frac{31}{68} = 0.4559, \quad \hat{p}_M = \frac{a_M}{n_M} = \frac{53}{255} = 0.2078$
The 99% C.I. is
$(\hat{P}_F - \hat{P}_M) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_F(1-\hat{P}_F)}{n_F} + \frac{\hat{P}_M(1-\hat{P}_M)}{n_M}}$
$= (0.4559 - 0.2078) \pm 2.58\sqrt{\frac{0.4559(1-0.4559)}{68} + \frac{0.2078(1-0.2078)}{255}}$
= 0.2481 ± 2.58(0.0655) = (0.0791, 0.4171)
Chapter-8
Hypothesis Testing
• Key words :
• Null hypothesis H0, Alternative hypothesis HA , testing
hypothesis , test statistic , P-value
Hypothesis Testing
• One type of statistical inference, estimation,
was discussed previously.
• The other type ,hypothesis testing ,is
discussed in this session.
Definition of a hypothesis
• It is a statement about one or more populations
.
It is usually concerned with the parameters of
the population. e.g. the hospital administrator
may want to test the hypothesis that the
average length of stay of patients admitted to
the hospital is 5 days
Definition of Statistical hypotheses
• They are hypotheses that are stated in such a way
that they may be evaluated by appropriate statistical
techniques.
• There are two hypotheses involved in hypothesis
testing
• Null hypothesis H0: It is the hypothesis to be tested .
• Alternative hypothesis HA : It is a statement of what
we believe is true if our sample data cause us to
reject the null hypothesis
Testing a hypothesis about the mean of a population:
• We have the following steps:
1. Data: determine the variable, sample size (n), sample mean ($\bar{x}$), population standard deviation (σ), or sample standard deviation (s) if σ is unknown.
2. Assumptions: We have two cases:
• Case 1: Population is normally or approximately normally distributed with known or unknown variance (sample size n may be small or large),
• Case 2: Population is not normal with known or unknown variance (n is large, i.e. n ≥ 30).
• 3. Hypotheses:
• we have three cases
• Case I : H0: μ = μ0
HA: μ ≠ μ0
• e.g. we want to test that the population mean is different from 50
• Case II : H0: μ = μ0
HA: μ > μ0
• e.g. we want to test that the population mean is greater than 50
• Case III : H0: μ = μ0
HA: μ < μ0
• e.g. we want to test that the population mean is less than 50
4. Test Statistic:
• Case 1: population is normal or approximately normal
– σ² is known (n large or small): $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$
– σ² is unknown, n large: $Z = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$
– σ² is unknown, n small: $T = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$
• Case 2: If the population is not normally distributed and n is large
• i) If σ² is known: $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}$
• ii) If σ² is unknown: $Z = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}$
5. Decision Rule:
i) If HA: μ ≠ μ0
• Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 (when using the Z-test)
Or reject H0 if T > t1-α/2,n-1 or T < -t1-α/2,n-1 (when using the T-test)
• ii) If HA: μ > μ0
• Reject H0 if Z > Z1-α (when using the Z-test)
Or reject H0 if T > t1-α,n-1 (when using the T-test)
• iii) If HA: μ < μ0
Reject H0 if Z < -Z1-α (when using the Z-test)
• Or
Reject H0 if T < -t1-α,n-1 (when using the T-test)
Note:
Z1-α/2, Z1-α, Zα are tabulated values obtained from table D
t1-α/2, t1-α, tα are tabulated values obtained from table E with (n-1) degrees of freedom (df)
• 6. Decision :
• If we reject H0, we can conclude that HA is true.
• If, however, we do not reject H0, we conclude that H0 may be true: the data do not provide sufficient evidence against it.
An Alternative Decision Rule using the
p - value Definition
• The p-value is defined as the smallest value of
α for which the null hypothesis can be
rejected.
• If the p-value is less than or equal to α ,we
reject the null hypothesis (p ≤ α)
• If the p-value is greater than α ,we do not
reject the null hypothesis (p > α)
Example
• Researchers are interested in the mean age of a
certain population.
• A random sample of 10 individuals drawn from the
population of interest has a mean of 27.
• Assuming that the population is approximately
normally distributed with variance 20,can we
conclude that the mean is different from 30 years ?
(α=0.05) .
• If the p - value is 0.0340 how can we use it in making
a decision?
Solution
1- Data: variable is age, n = 10, $\bar{x}$ = 27, σ² = 20, α = 0.05
2- Assumptions: the population is approximately normally distributed with variance 20
3- Hypotheses:
• H0 : μ = 30
• HA: μ ≠ 30
4- Test Statistic:
$Z = \frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}} = \frac{27-30}{\sqrt{20/10}} = -2.12$
5. Decision Rule
• The alternative hypothesis is
• HA: μ ≠ 30
• Hence we reject H0 if Z > Z1-0.05/2 = Z0.975
• or Z < -Z1-0.05/2 = -Z0.975
• Z0.975 = 1.96 (from table D)
• 6.Decision:
• We reject H0 ,since -2.12 is in the rejection
region .
• We can conclude that μ is not equal to 30
• Using the p value ,we note that p-value
=0.0340< 0.05,therefore we reject H0
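The test statistic and p-value in this example can be reproduced with a short script. This is a minimal sketch; the function name `z_test_mean` and the dictionary keys are illustrative assumptions. The p-value is computed from the unrounded Z, so it can differ slightly in the last digit from a value based on Z = -2.12.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar: float, mu0: float, sigma: float, n: int) -> dict:
    """Z statistic and p-values for H0: mu = mu0 with known population variance."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    std = NormalDist()
    return {
        "z": z,
        "p_two_sided": 2 * std.cdf(-abs(z)),  # HA: mu != mu0
        "p_lower": std.cdf(z),                # HA: mu < mu0
        "p_upper": 1 - std.cdf(z),            # HA: mu > mu0
    }


# Mean age example: xbar = 27, mu0 = 30, variance 20, n = 10, alpha = 0.05.
ALPHA = 0.05
result = z_test_mean(xbar=27, mu0=30, sigma=sqrt(20), n=10)
reject_two_sided = result["p_two_sided"] <= ALPHA

print(f"Z = {result['z']:.2f}")
print(f"two-sided p-value = {result['p_two_sided']:.4f}")
print("reject H0" if reject_two_sided else "fail to reject H0")
```

The same result dictionary also carries the one-sided p-values, which is convenient when the alternative hypothesis is directional.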
Example
• Referring to the previous example, suppose that the researchers have asked: Can we conclude that μ < 30?
1. Data: see previous example
2. Assumptions: see previous example
3. Hypotheses:
• H0: μ = 30
• HA: μ < 30
4. Test Statistic :
• $Z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}} = \frac{27-30}{\sqrt{20/10}} = -2.12$
5. Decision Rule: Reject H0 if Z < Zα, where
• Zα = Z0.05 = -1.645 (from table D)
6. Decision: Reject H0, since -2.12 < -1.645; thus we can conclude that the population mean is smaller than 30.
Example
• Among 157 African-American men, the mean systolic blood pressure was 146 mm Hg with a standard deviation of 27. We wish to know if, on the basis of these data, we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140. Use α = 0.01.
Solution
1. Data: Variable is systolic blood pressure, n = 157, $\bar{x}$ = 146, s = 27, α = 0.01.
2. Assumption: population is not normal, σ² is unknown
3. Hypotheses: H0: μ = 140
HA: μ > 140
4. Test Statistic:
$Z = \frac{\bar{X}-\mu_0}{s/\sqrt{n}} = \frac{146-140}{27/\sqrt{157}} = \frac{6}{2.1548} = 2.78$
5. Decision Rule:
We reject H0 if Z > Z1-α = Z0.99 = 2.33 (from table D)
6. Decision: We reject H0, since 2.78 > 2.33.
Hence we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140.
Hypothesis Testing: The Difference between two population means:
• We have the following steps:
1. Data: determine the variable, sample sizes (n1, n2), sample means, and population standard deviations (σ1, σ2), or sample standard deviations (s1, s2) if σ1 and σ2 are unknown, for the two populations.
2. Assumptions: We have two cases:
• Case 1: Populations are normally or approximately normally distributed with known or unknown variances (sample sizes may be small or large),
• Case 2: Populations are not normal with known variances (n is large, i.e. n ≥ 30).
• 3.Hypotheses:
• we have three cases
• Case I : H0: μ 1 = μ2 → μ 1 - μ2 = 0
• HA: μ 1 ≠ μ 2 → μ 1 - μ 2 ≠ 0
• e.g. we want to test that the mean for first population is
different from second population mean.
• Case II : H0: μ 1 = μ2 → μ 1 - μ2 = 0
HA: μ 1 > μ 2 →μ 1 - μ 2 > 0
• e.g. we want to test that the mean for first population is
greater than second population mean.
• Case III : H0: μ 1 = μ2 → μ 1 - μ2 = 0
HA: μ 1 < μ 2 → μ 1 - μ 2 < 0
• e.g. we want to test that the mean for the first population is less than the second population mean.
4. Test Statistic:
• Case 1: The two populations are normal or approximately normal
– σ² is known (n1, n2 large or small):
$Z = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$
– σ² is unknown (n1, n2 small), population variances equal:
$T = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}$
where
$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$
– σ² is unknown (n1, n2 small), population variances not equal:
$T = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}}$
• Case 2: If the populations are not normally distributed
• and n1, n2 are large (n1 ≥ 30, n2 ≥ 30)
• and the population variances are known,
$Z = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}$
5.Decision Rule:
i) If HA: μ 1 ≠ μ 2 → μ 1 - μ 2 ≠ 0
• Reject H 0 if Z >Z1-α/2 or Z< - Z1-α/2
(when use Z - test)
Or Reject H 0 if T >t1-α/2 ,(n1+n2 -2) or T< - t1-α/2,,(n1+n2 -2)
(when use T- test)
• __________________________
• ii) HA: μ 1 > μ 2 → μ 1 - μ 2 > 0
• Reject H0 if Z>Z1-α (when use Z - test)
Or Reject H0 if T>t1-α,(n1+n2 -2) (when use T - test)
• iii) If HA: μ 1 < μ 2 → μ 1 - μ 2 < 0 Reject H0
if Z< - Z1-α (when use Z - test)
• Or
Reject H0 if T<- t1-α, ,(n1+n2 -2) (when use T - test)
Note:
Z1-α/2 , Z1-α , Zα are tabulated values obtained
from table D
t1-α/2 , t1-α , tα are tabulated values obtained from
table E with (n1+n2 -2) degree of freedom (df)
6. Conclusion: reject or fail to reject H0
Example
• Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome, from a normal distribution with variance 1, and 15 normal individuals, from a normal distribution with variance 1.5. The means are $\bar{X}_1 = 4.5$ mg/100 and $\bar{X}_2 = 3.4$ mg/100, and α = 0.05.
Solution:
1. Data: Variable is serum uric acid levels, n1 = 12, n2 = 15, σ1² = 1, σ2² = 1.5, α = 0.05.
2. Assumption: The two populations are normal; σ1² and σ2² are known
3. Hypotheses: H0: μ1 = μ2 → μ1 - μ2 = 0
• HA: μ1 ≠ μ2 → μ1 - μ2 ≠ 0
4. Test Statistic:
• $Z = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}} = \frac{(4.5-3.4) - 0}{\sqrt{\frac{1}{12}+\frac{1.5}{15}}} = 2.57$
5. Decision Rule:
Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2
Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 (from table D)
6- Conclusion: Reject H0 since 2.57 > 1.96
Or, if the p-value = 0.0102 → reject H0 if p < α → then reject H0
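The two-sample Z statistic for known variances can be sketched as below. The function name `two_sample_z` is an illustrative assumption; the p-value is two-sided because the alternative hypothesis in this example is μ1 ≠ μ2.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, var1, n1, x2, var2, n2, delta0=0.0):
    """Z statistic and two-sided p-value for H0: mu1 - mu2 = delta0 (variances known)."""
    se = sqrt(var1 / n1 + var2 / n2)  # standard error of xbar1 - xbar2
    z = ((x1 - x2) - delta0) / se
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return z, p_two_sided


# Serum uric acid example: Down's syndrome (group 1) versus normal (group 2).
z, p_value = two_sample_z(x1=4.5, var1=1.0, n1=12, x2=3.4, var2=1.5, n2=15)
reject = abs(z) > 1.96  # table D critical value for alpha = 0.05, two-sided

print(f"Z = {z:.2f}, two-sided p-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```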
Example
The purpose of a study by Tam was to investigate wheelchair maneuvering in individuals with lower-level spinal cord injury (SCI) and healthy controls (C). Subjects used a modified wheelchair to incorporate a rigid seat surface to facilitate the specified experimental measurements. The data for measurements of the left ischial tuberosity for SCI and control C are shown below.
C:   131 115 124 131 122 117 88 114 150 169
SCI: 60 150 130 180 163 130 121 119 130 143
We wish to know if we can conclude, on the basis of the above data, that the mean of left ischial tuberosity for control C is lower than the mean of left ischial tuberosity for SCI. Assume normal populations with equal variances. α = 0.05, p-value > 0.10
Solution:
1. Data: nC = 10, nSCI = 10, SC = 21.8, SSCI = 32.2, α = 0.05.
• $\bar{X}_C = 126.1$, $\bar{X}_{SCI} = 133.1$ (calculated from data)
2. Assumption: The two populations are normal; σ1² and σ2² are unknown but equal
3. Hypotheses: H0: μC = μSCI → μC - μSCI = 0
HA: μC < μSCI → μC - μSCI < 0
4. Test Statistic:
• $T = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}} = \frac{(126.1-133.1) - 0}{\sqrt{756.04}\,\sqrt{\frac{1}{10}+\frac{1}{10}}} = -0.569$
Where,
$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2} = \frac{9(21.8)^2 + 9(32.2)^2}{10+10-2} = 756.04$
5. Decision Rule:
Reject H0 if T < -t1-α,(n1+n2-2)
t1-α,(n1+n2-2) = t0.95,18 = 1.7341 (from table E)
6- Conclusion: Fail to reject H0 since -0.569 > -1.7341
Or
Fail to reject H0 since the p-value > 0.10 > α = 0.05
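The pooled t statistic in this example can be recomputed from the summary statistics. In this sketch the function name `pooled_t` is an illustrative assumption, the critical value 1.7341 is the table E value quoted in the decision rule, and the SCI standard deviation of 32.2 is an assumption chosen because it reproduces the pooled variance of 756.04 shown in the solution.

```python
from math import sqrt


def pooled_t(x1, s1, n1, x2, s2, n2, delta0=0.0):
    """Pooled-variance t statistic for H0: mu1 - mu2 = delta0."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = ((x1 - x2) - delta0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2


T_095_DF18 = 1.7341  # t_{0.95, 18}, read from a t table (not computed here)

# Control (group 1) versus SCI (group 2); HA: mu_C < mu_SCI (lower-tailed test).
t_stat, sp2 = pooled_t(x1=126.1, s1=21.8, n1=10, x2=133.1, s2=32.2, n2=10)
reject = t_stat < -T_095_DF18

print(f"pooled variance = {sp2:.2f}, T = {t_stat:.3f}")
print("reject H0" if reject else "fail to reject H0")
```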
Example
Dernellis and Panaretou examined subjects with hypertension and healthy control subjects. One of the variables of interest was the aortic stiffness index. Measures of this variable were calculated from the aortic diameter evaluated by M-mode and blood pressure measured by a sphygmomanometer. Physicians wish to reduce aortic stiffness. In the 15 patients with hypertension (Group 1), the mean aortic stiffness index was 19.16 with a standard deviation of 5.29. In the 30 control subjects (Group 2), the mean aortic stiffness index was 9.53 with a standard deviation of 2.69. We wish to determine if the two populations represented by these samples differ with respect to mean stiffness index.
We wish to know if we can conclude that, in general, persons with thrombosis have on average higher IgG levels than persons without thrombosis, at α = 0.01, p-value = 0.0559.
Solution:
1. Data: n1 = 53, n2 = 54, S1 = 44.89, S2 = 34.85, α = 0.01.

Group           Mean IgG level   Sample size   Standard deviation
Thrombosis      59.01            53            44.89
No thrombosis   46.61            54            34.85

2. Assumption: The two populations are not normal, σ1² and σ2² are unknown, and the sample sizes are large
3. Hypotheses: H0: μ1 = μ2 → μ1 - μ2 = 0
HA: μ1 > μ2 → μ1 - μ2 > 0
4. Test Statistic:
• $Z = \frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}} = \frac{(59.01-46.61) - 0}{\sqrt{\frac{44.89^2}{53}+\frac{34.85^2}{54}}} = 1.59$
5. Decision Rule:
Reject H0 if Z > Z1-α
Z1-α = Z0.99 = 2.33 (from table D)
6- Conclusion: Fail to reject H0 since 1.59 < 2.33
Or
Fail to reject H0 since p = 0.0559 > α = 0.01
Hypothesis Testing: A single population proportion:
• Testing hypotheses about a population proportion (P) is carried out in much the same way as for a mean when the conditions necessary for using the normal curve are met.
• We have the following steps:
1. Data: sample size (n), sample proportion ($\hat{p}$), P0
$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$
2. Assumptions: the sampling distribution of $\hat{p}$ is approximately normal
• 3.Hypotheses:
• we have three cases
• Case I : H0: P = P0
HA: P ≠ P0
• Case II : H0: P = P0
HA: P > P0
• Case III : H0: P = P0
HA: P < P0
4. Test Statistic:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$
which, when H0 is true, is distributed approximately as the standard normal.
5.Decision Rule:
i) If HA: P ≠ P0
• Reject H 0 if Z >Z1-α/2 or Z< - Z1-α/2
• _______________________
• ii) If HA: P> P0
• Reject H0 if Z>Z1-α
• _____________________________
• iii) If HA: P< P0
Reject H0 if Z< - Z1-α
Note: Z1-α/2 , Z1-α , Zα are tabulated values obtained from
table D
6. Conclusion: reject or fail to reject H0
Example
Wagen collected data on a sample of 301 Hispanic women living in Texas. One variable of interest was the percentage of subjects with impaired fasting glucose (IFG). In the study, 24 women were classified in the IFG stage. The article cites population estimates for IFG among Hispanic women in Texas as 6.3 percent. Is there sufficient evidence to indicate that the population of Hispanic women in Texas has a prevalence of IFG higher than 6.3 percent? Let α = 0.05.
Solution:
1. Data: n = 301, p0 = 6.3/100 = 0.063, a = 24,
q0 = 1 - p0 = 1 - 0.063 = 0.937, α = 0.05
$\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$
2. Assumptions: $\hat{p}$ is approximately normally distributed
3. Hypotheses:
• H0: P = 0.063
HA: P > 0.063
• 4. Test Statistic :
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.08 - 0.063}{\sqrt{\frac{0.063(0.937)}{301}}} = 1.21$
5. Decision Rule: Reject H0 if Z > Z1-α
where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Conclusion: Fail to reject H0, since Z = 1.21 < Z1-α = 1.645.
Or, if the P-value = 0.1131, fail to reject H0 since P > α.
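The one-sample test for a proportion can be sketched as follows. The function name `proportion_z_test` is an illustrative assumption. The sample proportion is entered as the rounded value 0.08 used in the solution; with the unrounded 24/301 the statistic is slightly smaller, and the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat: float, p0: float, n: int):
    """Z statistic and upper-tail p-value for H0: P = p0 against HA: P > p0."""
    q0 = 1 - p0
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    p_upper = 1 - NormalDist().cdf(z)
    return z, p_upper


# IFG example: 24 of 301 women, p_hat rounded to 0.08, p0 = 0.063, alpha = 0.05.
ALPHA = 0.05
z, p_value = proportion_z_test(p_hat=0.08, p0=0.063, n=301)
reject = p_value <= ALPHA

print(f"Z = {z:.2f}, upper-tail p-value = {p_value:.4f}")
print("reject H0" if reject else "fail to reject H0")
```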
Hypothesis Testing: The Difference between two population proportions:
• Testing hypotheses about two population proportions (P1, P2) is carried out in much the same way as for the difference between two means when the conditions necessary for using the normal curve are met.
• We have the following steps:
1. Data: sample sizes (n1, n2), sample proportions ($\hat{P}_1, \hat{P}_2$), number with the characteristic in the two samples (x1, x2), and the pooled proportion
$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$
2- Assumption : The two populations are independent.
• 3.Hypotheses:
• we have three cases
• Case I : H0: P1 = P2 → P1 - P2 = 0
HA: P1 ≠ P2 → P1 - P2 ≠ 0
• Case II : H0: P1 = P2 → P1 - P2 = 0
HA: P1 > P2 → P1 - P2 > 0
• Case III : H0: P1 = P2 → P1 - P2 = 0
HA: P1 < P2 → P1 - P2 < 0
4. Test Statistic:
$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}$$
When H0 is true, $p_1 - p_2 = 0$ and Z is distributed approximately as the standard
normal.
5. Decision Rule:
i) If HA: P1 ≠ P2
• Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2
ii) If HA: P1 > P2
• Reject H0 if Z > Z1-α
iii) If HA: P1 < P2
• Reject H0 if Z < -Z1-α
Note: Z1-α/2, Z1-α and Zα are tabulated values obtained from
Table D.
6. Conclusion: reject or fail to reject H0.
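The three decision rules can be written as one small function. This is only a sketch: the function name `reject_h0` and the labels "two-sided", "greater" and "less" are illustrative choices, and the critical values are computed from the standard normal distribution instead of being read from Table D.

```python
from statistics import NormalDist


def reject_h0(z, alpha, alternative):
    """Apply the Z-test decision rule.

    alternative is one of (illustrative labels, not from the slides):
      "two-sided" for HA: P1 != P2
      "greater"   for HA: P1 >  P2
      "less"      for HA: P1 <  P2
    """
    std_normal = NormalDist()
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "less":
        return z < -std_normal.inv_cdf(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")


# A computed Z of 1.39 with a "greater than" alternative at alpha = 0.05
# does not exceed the critical value of about 1.645.
print(reject_h0(1.39, 0.05, "greater"))
```

Keeping the alternative as an explicit argument makes it harder to apply a one-sided critical value to a two-sided test by mistake.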
Example
Noonan syndrome is a genetic condition that can affect the heart, growth,
blood clotting, and mental and physical development. Noonan examined
the stature of men and women with Noonan syndrome. The study contained 29
male and 44 female adults. One of the cut-off values used to assess
stature was the third percentile of adult height. Eleven of the males fell
below the third percentile of adult male height, while 24 of the females
fell below the third percentile of adult female height. Does this study
provide sufficient evidence for us to conclude that, among subjects with
Noonan syndrome, females are more likely than males to fall below the respective
third percentile of adult height? Let α = 0.05.
Solution:
1. Data: nM = 29, nF = 44, xM = 11, xF = 24, α = 0.05
$$\bar{p} = \frac{x_M + x_F}{n_M + n_F} = \frac{11 + 24}{29 + 44} = 0.479$$
$$\hat{p}_M = \frac{x_M}{n_M} = \frac{11}{29} = 0.379, \qquad \hat{p}_F = \frac{x_F}{n_F} = \frac{24}{44} = 0.545$$
2. Assumption: The two populations are independent.
3. Hypotheses:
• Case II: H0: PF = PM → PF - PM = 0
HA: PF > PM → PF - PM > 0
4. Test Statistic:
$$Z = \frac{(\hat{p}_F - \hat{p}_M) - (p_F - p_M)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_F} + \dfrac{\bar{p}(1 - \bar{p})}{n_M}}} = \frac{(0.545 - 0.379) - 0}{\sqrt{\dfrac{(0.479)(0.521)}{44} + \dfrac{(0.479)(0.521)}{29}}} = 1.39$$
5. Decision Rule:
Reject H0 if Z > Z1-α, where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Conclusion: Fail to reject H0,
since Z = 1.39 < Z1-α = 1.645.
Equivalently, the P-value = 0.0823 > α = 0.05, so we fail to reject H0.
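The two-sample calculation can be reproduced with a few lines of code. This is a minimal sketch, assuming the pooled-proportion test statistic given above and the counts from the Noonan example; the function name `two_proportion_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: P1 = P2, using the pooled proportion."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    standard_error = sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    return (p1_hat - p2_hat) / standard_error


# Females are sample 1 and males are sample 2, matching HA: PF > PM.
z = two_proportion_z(x1=24, n1=44, x2=11, n2=29)
p_value = 1 - NormalDist().cdf(z)  # upper-tail P-value

print(f"Z = {z:.2f}, P-value = {p_value:.4f}")
```

Because the code keeps full precision in the proportions, the P-value may differ from the tabulated 0.0823 in the last digit.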
An Introduction to the Chi-Square
Distribution
TESTS OF INDEPENDENCE
• The test is used to decide whether two criteria of classification are
independent. For example, we may ask whether socioeconomic status and
area of residence of people in a city are independent.
• We divide our sample according to status (low, medium
and high income, etc.), and the same sample is
categorized according to area (urban, rural, suburban,
slum, etc.).
• Put the first criterion in columns, equal in number to the
categories of the 1st criterion (socioeconomic status), and
the 2nd in rows, where the number of rows equals the number
of categories of the 2nd criterion (areas of the city).
The Contingency Table
• Table: Two-way classification of a sample
                        First Criterion of Classification →
Second Criterion ↓      1      2      3      ...    c      Total
1                       N11    N12    N13    ...    N1c    N1.
2                       N21    N22    N23    ...    N2c    N2.
3                       N31    N32    N33    ...    N3c    N3.
.                       .      .      .      ...    .      .
.                       .      .      .      ...    .      .
r                       Nr1    Nr2    Nr3    ...    Nrc    Nr.
Total                   N.1    N.2    N.3    ...    N.c    N
Observed versus Expected Frequencies
• Oij: The frequencies in the ith row and jth column of a
contingency table are called observed frequencies; they result
from cross-classifying the sample according to the two
criteria.
• eij: Expected frequencies, on the assumption that the two criteria
are independent, are calculated by multiplying the row and column marginal
totals of the cell and then dividing by the total frequency.
• Formula:
$$e_{ij} = \frac{(N_{i\cdot})(N_{\cdot j})}{N}$$
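The expected-frequency formula translates directly into code. The sketch below is illustrative only: the function name `expected_frequencies` and the 2 × 2 table of counts are made up for demonstration and do not come from the slides.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence: (row total)(column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts, chosen only so the arithmetic is easy to follow:
# row totals are 30 and 70, column totals are 40 and 60, N = 100.
table = [[10, 20],
         [30, 40]]
print(expected_frequencies(table))
```

Each expected row sums to the same row total as the observed row, which is a quick check that the marginal totals were used correctly.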
Chi-square Test
• After calculating the expected frequencies,
prepare a table of expected frequencies and use the chi-square statistic
$$\chi^2 = \sum_{i=1}^{k} \left[ \frac{(o_i - e_i)^2}{e_i} \right]$$
where the summation is over all r × c = k cells.
• D.F.: the degrees of freedom for using the table are (r-1)(c-1)
for the α level of significance.
• Note that the test is always one-sided.
Example
The researchers are interested in determining whether preconception
use of folic acid and race are independent. The data are:
Observed Frequencies Table
Race      Use of Folic Acid
          Yes      No       Total
White     260      299      559
Black     15       41       56
Other     7        14       21
Total     282      354      636
Expected Frequencies Table
Race      Yes                        No                         Total
White     (282)(559)/636 = 247.86    (354)(559)/636 = 311.14    559
Black     (282)(56)/636 = 24.83      (354)(56)/636 = 31.17      56
Other     (282)(21)/636 = 9.31       (354)(21)/636 = 11.69      21
Calculations and Testing
• Data: See the given table.
• Assumption: Simple random sample.
• Hypotheses: H0: race and use of folic acid are independent
HA: the two variables are not independent. Let α = 0.05
• The test statistic is the chi-square statistic given earlier.
• Distribution when H0 is true: chi-square with (r-1)(c-1) = (3-1)(2-1) = 2 d.f.
• Decision Rule: Reject H0 if the computed value of $\chi^2$ is greater than
$\chi^2_{\alpha,(r-1)(c-1)} = 5.991$
• Calculations:
$$\chi^2 = \frac{(260 - 247.86)^2}{247.86} + \frac{(299 - 311.14)^2}{311.14} + \dots + \frac{(14 - 11.69)^2}{11.69} = 9.091$$
Conclusion
• Statistical decision: We reject H0 since 9.08960 > 5.991.
• Conclusion: We conclude that H0 is false, and that there is a
relationship between race and preconception use of folic
acid.
• P-value: Since 7.378 < 9.08960 < 9.210, 0.01 < p < 0.025.
• We therefore also reject H0 at the 0.025 level of significance but
do not reject it at the 0.01 level.
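The whole folic acid example can be reproduced in a short script. This is a sketch under one stated shortcut: for exactly 2 degrees of freedom the upper-tail area of the chi-square distribution has the closed form exp(-x/2), so the critical value and P-value can be computed without a table. That shortcut does not apply to other degrees of freedom. Variable names are illustrative.

```python
from math import exp, log

# Observed counts: rows are White, Black, Other; columns are Yes, No.
observed = [[260, 299],
            [15, 41],
            [7, 14]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (o - e)^2 / e over all r x c cells, with e = (row total)(col total) / N.
chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
assert df == 2, "the closed-form tail area below is valid only for 2 d.f."

alpha = 0.05
critical = -2 * log(alpha)       # solves exp(-x / 2) = alpha
p_value = exp(-chi_square / 2)   # upper-tail area for 2 d.f.

print(f"chi-square = {chi_square:.3f}, critical = {critical:.3f}, P = {p_value:.4f}")
print("reject H0" if chi_square > critical else "fail to reject H0")
```

With unrounded expected frequencies the statistic is close to the 9.091 shown in the calculations, and the P-value falls inside the interval 0.01 < p < 0.025 read from the table.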

INTRODUCTION TO BIO STATISTICS

  • 2. Introduction • Key words : – Statistics , data , Biostatistics, – Variable ,Population ,Sample Mekele University: Biostatistics 2
  • 3. Introduction Some Basic concepts Statistics is a field of study concerned with 1- collection, organization, summarization and analysis of data. 2- drawing of inferences about a body of data when only a part of the data is observed.  Statisticians try to interpret and communicate the results to others. Mekele University: Biostatistics 3
  • 4. * Biostatistics: The tools of statistics are employed in many fields: business, education, psychology, agriculture, economics, … etc. When the data analyzed are derived from the biological science and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts. Mekele University: Biostatistics 4
  • 5. Data: The raw material of Statistics is data. We may define data as figures. Figures result from the process of counting or from taking a measurement. For example: - When a hospital administrator counts the number of patients (counting). - When a nurse weighs a patient (measurement) Mekele University: Biostatistics 5
  • 6. Sources of data Records Surveys Experiments Comprehensive Sample Mekele University: Biostatistics 6
  • 7. We search for suitable data to serve as the raw material for our investigation. Such data are available from one or more of the following sources: 1- Routinely kept records. For example: - Hospital medical records contain immense amounts of information on patients. - Hospital accounting records contain a wealth of data on the facility’s business activities. Mekele University: Biostatistics 7 * Sources of Data:
  • 8. 2- Surveys: The source may be a survey, if the data needed is about answering certain questions. For example: If the administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic, then a survey may be conducted among patients to obtain this information. Mekele University: Biostatistics 8
  • 9. 3- Experiments. Frequently the data needed to answer a question are available only as the result of an experiment. For example: If a nurse wishes to know which of several strategies is best for maximizing patient compliance, she might conduct an experiment in which the different strategies of motivating compliance are tried with different patients. Mekele University: Biostatistics 9
  • 10. * A variable: It is a characteristic that takes on different values in different persons, places, or things. For example: - heart rate, - the heights of adult males, - the weights of preschool children, - the ages of patients seen in a dental clinic. Mekele University: Biostatistics 10
  • 11. Types of variables Quantitative Qualitative Quantitative Variables It can be measured in the usual sense. For example: - the heights of adult males, - the weights of preschool children, - the ages of patients seen in a dental clinic. Mekele University: Biostatistics 11 Qualitative Variables Many characteristics are not capable of being measured. Some of them can be ordered or ranked. For example: - classification of people into socio- economic groups, - social classes based on income, education, etc.
  • 12. Types of quantitative variables Discrete Continuous A discrete variable is characterized by gaps or interruptions in the values that it can assume. For example: - The number of daily admissions to a general hospital, - The number of decayed, missing or filled teeth per child in an elementary school. Mekele University: Biostatistics 12 A continuous variable can assume any value within a specified relevant interval of values assumed by the variable. For example: - Height, - weight, - skull circumference. No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between.
  • 13. Interval Types of variables & scale of measurement Quantitative variables Numerical Qualitative variables Categorical Ratio Nominal Ordinal 13Mekele University: Biostatistics
  • 14. Nominal unordered categories numbers used to represent categories averages are meaningless; look at frequency/proportion in each category dichotomous e.g. gender: male = 1, female = 0 polytomous e.g. blood type: O = 1, A = 2, B = 3, AB = 4 Mekele University: Biostatistics 14
  • 15. Ordinal ordered categories numbers used to represent categories order matters; magnitude does not differences between categories are meaningless Example:- severity of injury: fatal = 1, severe = 2, moderate = 3, minor = 4 Mekele University: Biostatistics 15
  • 16. Interval The differences between observational units is equal The zero point is arbitrary and does not infer the absence of the property being measured Examples: Degrees Fahrenheit The difference between 30 and 40 is the same as that between 70 and 80 degrees. But 80 is not twice as hot as 40. years: The difference between 1993-1994 is the same as 1995- 1996, but year 0 was not the beginning of time. Mekele University: Biostatistics 16
  • 17. Ratio The most detailed and objectively interpretable of the measurement scales. Interval scale with an absolute zero-it has a true zero point (absence of property being measured) as well as equal intervals E.g. Height, weight, money, age, time, speed, class size, the Kelvin scale of temperature Mekele University: Biostatistics 17
  • 18. Cont… Independent variables Precede dependent variables in time Are often manipulated by the researcher The treatment or intervention that is used in a study Dependent variables What is measured as an outcome in a study Values depend on the independent variable Mekele University: Biostatistics 18
  • 19. * A population: It is the largest collection of values of a random variable for which we have an interest at a particular time. For example: • headache patients in a chiropractic office; automobile crash victims in an emergency room • In research, it is not practical to include all members of a population • Thus, a sample (a subset of a population) is taken • Populations may be finite or infinite. Mekele University: Biostatistics 19
  • 20. A Sample it is a part of a population e.g. the fraction of these patients Random sample Subjects are selected from a population so that each individual has an equal chance of being selected Random samples are representative of the source population Non-random samples are not representative May be biased regarding age, severity of the condition, socioeconomic status etc Mekele University: Biostatistics 20
  • 22. Types of statistical methods Descriptive statistics Describe the data by summarizing them Inferential statistics Techniques, by which inferences are drawn for the population parameters from the sample statistics OR sample statistics observed are inferred to the corresponding population parameters Mekele University: Biostatistics 22
  • 23. Cont… Parameter Summary data from a population Statistic Summary data from a sample Mekele University: Biostatistics 23
  • 24. Examples of Scales of Measurements • Low income ordinal • CD4 count ratio • Year of birth interval • IQ scores interval • Severe injury ordinal • Raw score on a statistics exam interval • Room temperature in Kelvin ratio • Nationality of MU students nominal 24Mekele University: Biostatistics
  • 25. Descriptive statistics Strategies for understanding the meanings of Data
  • 26. Mekele University: Biostatistics 26 • Key words Frequency table, bar chart ,range width of interval , mid-interval Histogram , Polygon
  • 27. Descriptive statistics Before performing any analyses, you must first get to know your data Descriptive statistics are used to summarize data in the form of tables, graphs and numerical measures The summary technique used depends on the data type under consideration Mekele University: Biostatistics 27
  • 28. Presentation techniques for qualitative/categorical data statistics Frequency Relative frequency Cumulative frequency Figure/chart Pie chart Bar chart Mekele University: Biostatistics 28
  • 29. Frequency Distribution for Discrete Random Variables Example: Suppose that we take a sample of size 16 from children in a primary school and get the following data about the number of their decayed teeth, 3,5,2,4,0,1,3,5,2,3,2,3,3,2,4,1 To construct a frequency table: 1- Order the values from the smallest to the largest. 0,1,1,2,2,2,2,3,3,3,3,3,4,4,5,5 2- Count how many numbers are the same.
  • 31. Mekele University: Biostatistics 31 Representing the simple frequency table using the bar chart Number of decayed teeth 5.004.003.002.001.00.00 Frequency 6 5 4 3 2 1 0 22 5 4 2 1 We can represent the above simple frequency table using the bar chart. Ordinal or nominal data Height of each bar is the frequency of that category
  • 33. Cont… 0 10 20 30 40 50 % Single Married Divorced Widowed Marital status Male Female 33Mekele University: Biostatistics
  • 34. Cont… instead of ā€œstacksā€ rising up from horizontal (bar chart), we could plot instead the shares of a pie Recalling that a circle has 360 degrees 50% means 180 degrees 25% means 90 degrees Mekele University: Biostatistics 34
  • 36. Mekele University: Biostatistics 36 Frequency Distribution for Continuous Random Variables For large samples, we can’t use the simple frequency table to represent the data. We need to divide the data into groups or intervals or classes. So, we need to determine: 1- The number of intervals (k). Too few intervals are not good because information will be lost. Too many intervals are not helpful to summarize the data. A commonly followed rule is that 6 ≤ k ≤ 15, or the following formula may be used, k = 1 + 3.322 (log n)
  • 37. Mekele University: Biostatistics 37 2- The range (R). It is the difference between the largest and the smallest observation in the data set. 3- The Width of the interval (w). Class intervals generally should be of the same width. Thus, if we want k intervals, then w is chosen such that w ≄ R / k.
  • 38. Mekele University: Biostatistics 38 Example: Assume that the number of observations equal 100, then k = 1+3.322(log 100) = 1 + 3.3222 (2) = 7.6  8. Assume that the smallest value = 5 and the largest one of the data = 61, then R = 61 – 5 = 56 and w = 56 / 8 = 7. To make the summarization more comprehensible, the class width may be 5 or 10 or the multiples of 10.
  • 40. Mekele University: Biostatistics 40 Example 2.3.1 • We wish to know how many class interval to have in the frequency distribution of the data in Table 1.4.1 of ages of 189 subjects who Participated in a study on smoking cessation Solution : • Since the number of observations equal 189, then • k = 1+3.322(log 189) • = 1 + 3.3222 (2.276)  9, • R = 82 – 30 = 52 and • w = 52 / 9 = 5.778  It is better to let w = 10, then the intervals will be in the form:
  • 41. Mekele University: Biostatistics 41 FrequencyClass interval 1130 – 39 4640 – 49 7050 – 59 4560 – 69 1670 – 79 180 – 89 189Total Sum of frequency =sample size=n
  • 42. Mekele University: Biostatistics 42 The Cumulative Frequency: It can be computed by adding successive frequencies. The Cumulative Relative Frequency: It can be computed by adding successive relative frequencies. The Mid-interval: It can be computed by adding the lower bound of the interval plus the upper bound of it and then divide over 2.
  • 43. Mekele University: Biostatistics 43 For the above example, the following table represents the cumulative frequency, the relative frequency, the cumulative relative frequency and the mid-interval. Cumulative Relative Frequency Relative Frequency R.f Cumulative Frequency Frequency Freq (f) Mid – interval Class interval 0.05820.0582111134.530 – 39 -0.2434574644.540 – 49 0.6720-127-54.550 – 59 0.91010.2381-45-60 – 69 0.99480.08471881674.570 – 79 10.0053189184.580 – 89 1189Total R.f= freq/n
  • 44. Mekele University: Biostatistics 44 Example : • From the above frequency table, complete the table then answer the following questions: 1-The number of objects with age less than 50 years ? 2-The number of objects with age between 40-69 years ? 3-Relative frequency of objects with age between 70-79 years ? 4-Relative frequency of objects with age more than 69 years ? 5-The percentage of objects with age between 40-49 years ? 6- The percentage of objects with age less than 60 years ? 7-The Range (R) ? 8- Number of intervals (K)? 9- The width of the interval ( W) ?
  • 45. Mekele University: Biostatistics 45 Representing the grouped frequency table using the histogram To draw the histogram, the true classes limits should be used. They can be computed by subtracting 0.5 from the lower limit and adding 0.5 to the upper limit for each interval. FrequencyTrue class limits 1129.5 – <39.5 4639.5 – < 49.5 7049.5 – < 59.5 4559.5 – < 69.5 1669.5 – < 79.5 179.5 – < 89.5 189Total 0 10 20 30 40 50 60 70 80 34.5 44.5 54.5 64.5 74.5 84.5
  • 46. Mekele University: Biostatistics 46 Representing the grouped frequency table using the Polygon 0 10 20 30 40 50 60 70 80 34.5 44.5 54.5 64.5 74.5 84.5
  • 47. Histogram •continuous data divided into categories •graphical representation of frequency distribution •height of each bar is the frequency of that category •assess skewness and modality of the data 47 Mekele University: Biostatistics
  • 49. frequency polygon - is an alternative to the histogram whereas in a histogram - X-axis shows intervals of values - Y-axis shows bars of frequencies . in a frequency polygon X-axis shows midpoints of intervals of values Y-axis shows dot instead of bars 49 Mekele University: Biostatistics
  • 51. Box plots - discrete or continuous data - displays the 25th, 50th and 75th percentiles of the data also known as the first, second and third quartiles respectively - whiskers extend to adjacent values which are not outliers - outliers indicated as circles - box shows the interquartile range of the data can be used to assess skewness 51 Mekele University: Biostatistics
  • 53. Two-way scatter plots • used to assess the relationship between two discrete or continuous measures nature of the relationship described as positive, negative or no relationship 53 Mekele University: Biostatistics
  • 54. Line graph • two continuous measures each x value has only one corresponding y value useful for looking at patterns over time can be used to compare 2 or more groups 54 Mekele University: Biostatistics
  • 55. CONTI… 0 0.5 1 1.5 2 2.5 3 3.5 0 0.5 1 1.5 2 2.5 3 55 Mekele University: Biostatistics
  • 56. Line Graph 0 10 20 30 40 50 60 1960 1970 1980 1990 2000 Year MMR/1000 Year MMR 1960 50 1970 45 1980 26 1990 15 2000 12 Figure (1): Maternal mortality rate of (country), 1960-2000 56Mekele University: Biostatistics
  • 57. GRAPHS & CHARTS – LINE GRAPH 57 Mekele University: Biostatistics
  • 59. Mekele University: Biostatistics 59 key words: Descriptive Statistic, measure of central tendency ,statistic, parameter, mean (μ) ,median, mode.
  • 60. Mekele University: Biostatistics 60 The Statistic and The Parameter • A Statistic: It is a descriptive measure computed from the data of a sample. • A Parameter: It is a a descriptive measure computed from the data of a population. Since it is difficult to measure a parameter from the population, a sample is drawn of size n, whose values are  1 ,  2 , …,  n. From this data, we measure the statistic.
  • 61. Mekele University: Biostatistics 61 Measures of Central Tendency A measure of central tendency is a measure which indicates where the middle of the data is. The three most commonly used measures of central tendency are: The Mean, the Median, and the Mode. The Mean: It is the average of the data.
  • 62. Mekele University: Biostatistics 62 The Population Mean:  = which is usually unknown, then we use the sample mean to estimate or approximate it. The Sample Mean: = Example: Here is a random sample of size 10 of ages, where  1 = 42,  2 = 28,  3 = 28,  4 = 61,  5 = 31,  6 = 23,  7 = 50,  8 = 34,  9 = 32,  10 = 37. = (42 + 28 + … + 37) / 10 = 36.6 x 1 N i i N X  x 1 n i i n x 
  • 63. Properties of the Mean: • Uniqueness. For a given set of data there is one and only one mean. • Simplicity. It is easy to understand and to compute. • Affected by extreme values, since all values enter into the computation. • Example: Assume the values are 115, 110, 119, 117, 121 and 126. The mean = 118. Now assume that the values are 75, 75, 80, 80 and 280. The mean is again 118, a value that is not representative of the set of data as a whole.
  • 64. The Median: When the data are ordered, the median is the observation that divides the set of observations into two equal parts, such that half of the data are before it and the other half are after it. • If n is odd, the median is the middle observation, i.e. the ((n+1)/2)th ordered observation. When n = 11, the median is the 6th ordered observation. • If n is even, there are two middle observations, and the median is their mean, i.e. the mean of the (n/2)th and ((n/2)+1)th ordered observations. When n = 12, the median is the value halfway between the 6th and 7th ordered observations.
  • 65. Example: For the same random sample, the ordered observations are: 23, 28, 28, 31, 32, 34, 37, 42, 50, 61. Since n = 10, the median lies halfway between the 5th and 6th ordered observations, i.e. median = (32 + 34)/2 = 33. • Properties of the Median: • Uniqueness. For a given set of data there is one and only one median. • Simplicity. It is easy to calculate. • It is not affected by extreme values as the mean is.
  • 66. The Mode: It is the value which occurs most frequently. If all values are different there is no mode. Sometimes there is more than one mode. • Example: For the same random sample, the value 28 occurs twice and every other value occurs once, so 28 is the mode. • Properties of the Mode: • Sometimes it is not unique. • It may be used for describing qualitative data.
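The three measures of central tendency can be checked numerically. The following is a minimal Python sketch using the standard-library `statistics` module on the ten ages from the examples; the variable names are illustrative only and are not part of the slides.

```python
import statistics

# Ages from the random sample of size 10 used in the examples
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

mean_age = statistics.mean(ages)      # sum of the values divided by n
median_age = statistics.median(ages)  # mean of the two middle ordered values when n is even
mode_age = statistics.mode(ages)      # most frequently occurring value

print(mean_age, median_age, mode_age)
```

Replacing one age by an extreme value (for example 280) and rerunning the sketch shows the mean shifting while the median barely moves.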
  • 67. Quantiles • Quantiles divide the distribution of ordered values into equal-sized parts: – Quartiles: 4 equal parts – Deciles: 10 equal parts – Percentiles: 100 equal parts • Layout of the quartiles: First 25% | Q1 | Second 25% | Q2 | Third 25% | Q3 | Fourth 25% • Q1: first quartile • Q2: second quartile = median • Q3: third quartile
  • 68. Example: Given the following data set (age of patients): 18, 59, 24, 42, 21, 23, 24, 32, find the third quartile. • Solution: Sort the data from lowest to highest: 18, 21, 23, 24, 24, 32, 42, 59. • 3rd quartile = the {(3/4)(n+1)}th observation = the (6.75)th observation = 32 + (42 - 32) × 0.75 = 39.5.
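The position rule used in this example can be written as a short function. This is a sketch under the assumption that quantiles are located at position fraction × (n + 1) in the ordered data and interpolated linearly between neighbours; the function name `quantile_n_plus_1` is hypothetical, and other software may use different quantile rules.

```python
def quantile_n_plus_1(values, fraction):
    """Quantile located at the fraction * (n + 1) position, with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = fraction * (n + 1)  # 1-based position, e.g. 6.75
    lower = int(position)          # whole part: rank of the lower neighbour
    weight = position - lower      # fractional part used for interpolation
    if lower < 1:                  # position falls before the first observation
        return ordered[0]
    if lower >= n:                 # position falls at or after the last observation
        return ordered[-1]
    return ordered[lower - 1] + weight * (ordered[lower] - ordered[lower - 1])

ages = [18, 59, 24, 42, 21, 23, 24, 32]
q3 = quantile_n_plus_1(ages, 0.75)
print(q3)
```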
  • 70. Key words: descriptive statistic, measure of dispersion, range, variance, coefficient of variation.
  • 71. Measures of Dispersion: • A measure of dispersion conveys information regarding the amount of variability present in a set of data. • Note: 1. If all the values are the same, there is no dispersion. 2. If the values are different, there is dispersion: a) If the values are close to each other, the amount of dispersion is small. b) If the values are widely scattered, the dispersion is greater.
  • 72. Example • The measures of dispersion are: 1. Range (R). 2. Variance. 3. Standard deviation. 4. Coefficient of variation (C.V).
  • 73. 1. The Range (R): • Range = Largest value - Smallest value = $x_L - x_S$. • Note: – The range depends on only two values. – It is highly sensitive to outliers. • Example: Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50. Find the range. Range = 66 - 38 = 28. • Inter-quartile range: – 3rd quartile - 1st quartile (75th - 25th percentile). – Robust to outliers. – Covers the middle 50% of observations.
  • 74. 2. The Variance: • It measures dispersion relative to the scatter of the values about their mean. a) Sample Variance ($S^2$): • $S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$, where $\bar{x}$ is the sample mean. • Example: Find the sample variance of the ages 43, 66, 61, 64, 65, 38, 59, 57, 57, 50, for which $\bar{x} = 56$. • Solution: $S^2 = [(43-56)^2 + (66-56)^2 + \dots + (50-56)^2]/(10-1) = 810/9 = 90$.
  • 75. b) Population Variance ($\sigma^2$): • $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$, where $\mu$ is the population mean. • 3. The Standard Deviation: • It is the square root of the variance, $\sqrt{\text{Variance}}$. a) Sample standard deviation: $S = \sqrt{S^2}$. b) Population standard deviation: $\sigma = \sqrt{\sigma^2}$.
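The variance and standard deviation formulas can be traced step by step in code. The sketch below is illustrative only: it computes the sum of squared deviations for the ten ages from the range example, then divides by n - 1 for the sample variance and by N for the population variance. The variable names are assumptions made for this example.

```python
import math
import statistics

ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

mean_age = statistics.mean(ages)
# Sum of squared deviations about the mean
ss = sum((x - mean_age) ** 2 for x in ages)

sample_variance = ss / (len(ages) - 1)   # sample formula divides by n - 1
sample_sd = math.sqrt(sample_variance)   # standard deviation = square root of variance
population_variance = ss / len(ages)     # population formula divides by N

print(mean_age, ss, sample_variance, round(sample_sd, 2), population_variance)
```

The standard-library function `statistics.variance` applies the same n - 1 divisor, so it can serve as a cross-check.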
  • 76. Standard Deviation (SD) • Data 7, 7, 7, 7, 7, 7: mean = 7, SD = 0. • Data 7, 8, 7, 7, 7, 6: mean = 7, SD = 0.63. • Data 3, 2, 7, 8, 13, 9: mean = 7, SD = 4.05.
  • 77. Standard deviation • Caution must be exercised when using the standard deviation as a comparative index of dispersion. • Weights of newborn elephants (kg): 929, 853, 878, 939, 895, 972, 937, 841, 801, 826; n = 10, $\bar{X}$ = 887.1, sd = 56.50. • Weights of newborn mice (kg): 0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89; n = 10, $\bar{X}$ = 0.68, sd = 0.255. • It is incorrect to say that elephants show greater variation in birth weights than mice merely because their standard deviation is higher.
  • 78. 4. The Coefficient of Variation (C.V): • It is a measure used to compare the dispersion in two sets of data, and it is independent of the unit of measurement. • $C.V = \frac{S}{\bar{X}}(100)$, where $S$ is the sample standard deviation and $\bar{X}$ is the sample mean.
  • 79. Coefficient of variation • The coefficient of variation expresses the standard deviation relative to its mean: $cv = \frac{s}{\bar{X}}$. • Weights of newborn elephants (kg): 929, 853, 878, 939, 895, 972, 937, 841, 801, 826; n = 10, $\bar{X}$ = 887.1, s = 56.50, cv = 0.0637. • Weights of newborn mice (kg): 0.72, 0.42, 0.63, 0.31, 0.59, 0.38, 0.79, 0.96, 1.06, 0.89; n = 10, $\bar{X}$ = 0.68, s = 0.255, cv = 0.375. • Mice show greater birth-weight variation.
  • 80. Example: • Suppose two samples of human males yield the following data: • Sample 1: age 25-year-olds; mean weight 145 pounds; standard deviation 10 pounds. • Sample 2: age 11-year-olds; mean weight 80 pounds; standard deviation 10 pounds.
  • 81. We wish to know which sample is more variable. • Solution: • C.V (Sample 1) = (10/145) × 100 = 6.9 • C.V (Sample 2) = (10/80) × 100 = 12.5 • The weights of the 11-year-olds (Sample 2) are therefore relatively more variable, even though both samples have the same standard deviation.
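The comparison in this example amounts to one division per sample. A minimal sketch follows; the function name `coefficient_of_variation` is hypothetical and simply applies C.V = (S / mean) × 100 to the summary values given on the slide.

```python
def coefficient_of_variation(sd, mean):
    """Return the coefficient of variation as a percentage."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return sd / mean * 100

cv_sample1 = coefficient_of_variation(10, 145)  # 25-year-olds
cv_sample2 = coefficient_of_variation(10, 80)   # 11-year-olds

print(round(cv_sample1, 1), round(cv_sample2, 1))
```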
  • 82. When to use the coefficient of variation • When comparison groups have very different means (the CV is suitable as it expresses the standard deviation relative to its corresponding mean). • When different units of measurement are involved, e.g. group 1 is measured in mm and group 2 in gm (the CV is suitable for comparison as it is unit free). • In such cases, the sd should not be used for comparison.
  • 84. Key words: probability, objective probability, subjective probability, equally likely, mutually exclusive, multiplicative rule, conditional probability, independent events.
  • 85. Introduction • The concept of probability is frequently encountered in everyday communication. For example, a physician may say that a patient has a 50-50 chance of surviving a certain operation. Another physician may say that she is 95 percent certain that a patient has a particular disease. • Most people express probabilities in terms of percentages. • But, it is more convenient to express probabilities as fractions. Thus, we may measure the probability of the occurrence of some event by a number between 0 and 1. • The more likely the event, the closer the number is to one. An event that can't occur has a probability of zero, and an event that is certain to occur has a probability of one.
  • 86. Two views of probability: objective and subjective • Objective probability: classical and relative frequency. • Some definitions: 1. Equally likely outcomes: outcomes that have the same chance of occurring. 2. Mutually exclusive: two events A and B are said to be mutually exclusive if they cannot occur simultaneously, i.e. $A \cap B = \Phi$.
  • 87. • The universal set (S): the set of all possible outcomes. • The empty set Φ: contains no elements. • The event E: a set of outcomes in S which has a certain characteristic. • Classical probability: if an event can occur in N mutually exclusive and equally likely ways, and if m of these possess a trait E, the probability of the occurrence of event E is equal to m/N. • For example: in the rolling of a die, each of the six sides is equally likely to be observed. So the probability that a 4 will be observed is equal to 1/6.
  • 88. • Relative frequency probability: • Definition: if some process is repeated a large number of times, n, and if some resulting event E occurs m times, the relative frequency of occurrence of E, m/n, will be approximately equal to the probability of E: P(E) = m/n. • Subjective probability: • Probability measures the confidence that a particular individual has in the truth of a particular proposition. • For example: the probability that a cure for cancer will be discovered within the next 10 years.
  • 89. Elementary Properties of Probability: • Given some process (or experiment) with n mutually exclusive events $E_1, E_2, E_3, \dots, E_n$, then 1. $P(E_i) \ge 0$, $i = 1, 2, 3, \dots, n$ 2. $P(E_1) + P(E_2) + \dots + P(E_n) = 1$ 3. $P(E_i + E_j) = P(E_i) + P(E_j)$, where $E_i$ and $E_j$ are mutually exclusive.
  • 90. Rules of Probability 1- Addition rule: P(A ∪ B) = P(A) + P(B) – P(A ∩ B). 2- If A and B are mutually exclusive (disjoint), then P(A ∩ B) = 0, and the addition rule becomes P(A ∪ B) = P(A) + P(B). 3- Complementary rule: P(A') = 1 – P(A), where A' is the complement of event A.
  • 91. Example • Table columns: Family history of mood disorders | Early ≤ 18 (E) | Later > 18 (L) | Total • Negative (A) | 28 | 35 | 63 • Bipolar disorder (B) | 19 | 38 | 57 • Unipolar (C) | 41 | 44 | 85 • Unipolar and bipolar (D) | 53 | 60 | 113 • Total | 141 | 177 | 318
  • 92. Answer the following questions: Suppose we pick a person at random from this sample. 1- What is the probability that this person is 18 years old or younger? 2- What is the probability that this person has a family history of unipolar mood disorder (C)? 3- What is the probability that this person does not have a family history of unipolar mood disorder ($\bar{C}$)? 4- What is the probability that this person is 18 years old or younger or has no family history of mood disorders (Negative, A)? 5- What is the probability that this person is more than 18 years old and has a family history of unipolar and bipolar mood disorders (D)?
  • 93. Solution: 1. P(E) = 141/318 2. P(C) = 85/318 3. P($\bar{C}$) = 1 - P(C) = 1 - 85/318 = 233/318 4. P(E ∪ A) = P(E) + P(A) - P(E ∩ A) = (141/318) + (63/318) - (28/318) = 176/318 5. P(L ∩ D) = 60/318
  • 94. Conditional Probability: P(A|B) is the probability of A assuming that B has happened. • $P(A|B) = \frac{P(A \cap B)}{P(B)}$, P(B) ≠ 0 • $P(B|A) = \frac{P(A \cap B)}{P(A)}$, P(A) ≠ 0
  • 95. Example: From the previous example, answer the following. • Suppose we pick a person at random and find that he is 18 years old or younger (E). What is the probability that this person has no family history of mood disorders (A)? • Solution: P(E) = 141/318 and P(A ∩ E) = 28/318, so P(A|E) = P(A ∩ E)/P(E) = (28/318)/(141/318) = 28/141.
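The marginal, joint, union, and conditional probabilities for the family-history table can all be derived from the cell counts. The sketch below uses exact fractions; the dictionary layout and variable names are assumptions chosen for illustration. Note that Python prints fractions in lowest terms, so 141/318 appears as 47/106.

```python
from fractions import Fraction

# Cell counts from the family-history table: (early, later) for each row
counts = {
    "A": (28, 35),   # Negative
    "B": (19, 38),   # Bipolar disorder
    "C": (41, 44),   # Unipolar
    "D": (53, 60),   # Unipolar and bipolar
}
total = sum(early + later for early, later in counts.values())

p_E = Fraction(sum(early for early, _ in counts.values()), total)  # marginal P(E)
p_A = Fraction(sum(counts["A"]), total)                            # marginal P(A)
p_A_and_E = Fraction(counts["A"][0], total)                        # joint P(A ∩ E)
p_E_or_A = p_E + p_A - p_A_and_E                                   # addition rule
p_A_given_E = p_A_and_E / p_E                                      # conditional P(A | E)

print(total, p_E, p_E_or_A, p_A_given_E)
```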
  • 96. Exercise • Suppose we pick a person at random and find that he has a family history of unipolar and bipolar mood disorders (D). What is the probability that this person is 18 years old or younger (E)?
  • 97. Multiplicative Rule: • P(A ∩ B) = P(A|B) P(B) • P(A ∩ B) = P(B|A) P(A) where • P(A): marginal probability of A. • P(B): marginal probability of B. • P(B|A): the conditional probability of B given A.
  • 98. Independent Events: • If the occurrence of A has no effect on the probability of B, we say that A and B are independent events. • Then: 1- P(A ∩ B) = P(A) P(B) 2- P(A|B) = P(A) 3- P(B|A) = P(B)
  • 99. Example • In a certain high school class consisting of 60 girls and 40 boys, it is observed that 24 girls and 16 boys wear eyeglasses . If a student is picked at random from this class ,the probability that the student wears eyeglasses , P(E), is 40/100 or 0.4 . • What is the probability that a student picked at random wears eyeglasses given that the student is a boy? • What is the probability of the joint occurrence of the events of wearing eye glasses and being a boy?
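The two questions in the eyeglasses example can be answered directly from the counts, and the same numbers also illustrate the independence conditions. This is an illustrative sketch with hypothetical variable names, using exact fractions.

```python
from fractions import Fraction

girls, boys = 60, 40
girls_glasses, boys_glasses = 24, 16
total = girls + boys

p_glasses = Fraction(girls_glasses + boys_glasses, total)  # P(E)
p_boy = Fraction(boys, total)                              # P(B)
p_glasses_and_boy = Fraction(boys_glasses, total)          # joint P(E ∩ B)
p_glasses_given_boy = p_glasses_and_boy / p_boy            # conditional P(E | B)

# E and B are independent when P(E ∩ B) = P(E) P(B)
independent = p_glasses_and_boy == p_glasses * p_boy

print(p_glasses_given_boy, p_glasses_and_boy, independent)
```

Here P(E | B) equals P(E), so wearing eyeglasses and being a boy are independent events in this class.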
  • 100. Example • Suppose that of 1200 admissions to a general hospital during a certain period of time, 750 are private admissions. If we designate these as a set A, compute P(A) and P($\bar{A}$).
  • 101. The Random Variable (X): • When the values of a variable (height, weight, or age) can't be predicted in advance, the variable is called a random variable. • An example is adult height. • When a child is born, we can't predict exactly his or her height at maturity.
  • 102. 4.2 Probability Distributions for Discrete Random Variables • Definition: • The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of a discrete random variable along with their respective probabilities.
  • 103. The Cumulative Probability Distribution of X, F(x): • It shows the probability that the variable X is less than or equal to a certain value, P(X ≤ x).
  • 104. Example: • Table columns: Number of programs (x) | Frequency | P(X = x) | F(x) = P(X ≤ x) • 1 | 62 | 0.2088 | 0.2088 • 2 | 47 | 0.1582 | 0.3670 • 3 | 39 | 0.1313 | 0.4983 • 4 | 39 | 0.1313 | 0.6296 • 5 | 58 | 0.1953 | 0.8249 • 6 | 37 | 0.1246 | 0.9495 • 7 | 4 | 0.0135 | 0.9630 • 8 | 11 | 0.0370 | 1.0000 • Total | 297 | 1.0000
  • 105. Properties of the probability distribution of a discrete random variable: 1. $0 \le P(X = x) \le 1$ 2. $\sum P(X = x) = 1$ 3. P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a-1) 4. P(X < b) = P(X ≤ b-1)
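The probability and cumulative columns of the number-of-programs table can be rebuilt from the frequencies, which also gives a concrete check of properties 2 and 3. The sketch below is illustrative; the choice of a = 3 and b = 5 is an assumption made only to demonstrate property 3.

```python
from itertools import accumulate

# Frequencies for X = 1..8 (number of programs) from the table
frequencies = {1: 62, 2: 47, 3: 39, 4: 39, 5: 58, 6: 37, 7: 4, 8: 11}
n = sum(frequencies.values())

pmf = {x: f / n for x, f in frequencies.items()}  # P(X = x)
cdf = dict(zip(pmf, accumulate(pmf.values())))    # F(x) = P(X <= x)

# Property 3 with a = 3, b = 5: P(3 <= X <= 5) = F(5) - F(2)
p_3_to_5 = cdf[5] - cdf[2]

print(n, round(cdf[4], 4), round(p_3_to_5, 4))
```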
  • 106. 4.3 The Binomial Distribution: • It is derived from a process known as a Bernoulli trial. • Bernoulli trial: when a random process or experiment, called a trial, can result in only one of two mutually exclusive outcomes, such as dead or alive, sick or well, the trial is called a Bernoulli trial.
  • 107. The Bernoulli Process • A sequence of Bernoulli trials forms a Bernoulli process under the following conditions: 1- Each trial results in one of two possible, mutually exclusive, outcomes. One of the possible outcomes is denoted (arbitrarily) as a success, and the other is denoted a failure. 2- The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure, 1-p, is denoted by q. 3- The trials are independent; that is, the outcome of any particular trial is not affected by the outcome of any other trial.
  • 108. The probability distribution of the binomial random variable X, the number of successes in n independent trials, is: • $f(x) = P(X = x) = \binom{n}{x} p^x q^{n-x}$, $x = 0, 1, 2, \dots, n$ • where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the number of combinations of n distinct objects taken x at a time, and $x! = x(x-1)(x-2)\cdots(1)$. • Note: 0! = 1.
  • 109. Properties of the binomial distribution • 1. $f(x) \ge 0$ • 2. $\sum f(x) = 1$ • 3. The parameters of the binomial distribution are n and p. • 4. $\mu = E(X) = np$ • 5. $\sigma^2 = \mathrm{var}(X) = np(1-p)$
  • 110. Example • If we examine all birth records from the North Carolina State Center for Health Statistics for the year 2001, we find that 85.8 percent of the pregnancies had delivery in week 37 or later (full-term birth). If we randomly select five birth records from this population, what is the probability that exactly three of the records will be for full-term births?
  • 111. Example • Suppose it is known that in a certain population 10 percent of the population is color blind. If a random sample of 25 people is drawn from this population, find the probability that: a) Five or fewer will be color blind. b) Six or more will be color blind. c) Between six and nine inclusive will be color blind. d) Two, three, or four will be color blind.
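Both binomial examples can be worked directly from the probability function f(x) instead of a printed table. The following is a minimal sketch; the helper names `binomial_pmf` and `binomial_cdf` are hypothetical, and `math.comb` supplies the number of combinations.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with parameters n and p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """P(X <= x), obtained by summing the probability function from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# Full-term births: exactly 3 of 5 records, p = 0.858
p_three_full_term = binomial_pmf(3, 5, 0.858)

# Color blindness: n = 25, p = 0.10
n, p = 25, 0.10
a = binomial_cdf(5, n, p)                          # five or fewer
b = 1 - binomial_cdf(5, n, p)                      # six or more (complement rule)
c = binomial_cdf(9, n, p) - binomial_cdf(5, n, p)  # six to nine inclusive
d = binomial_cdf(4, n, p) - binomial_cdf(1, n, p)  # two, three, or four

print(round(p_three_full_term, 4), round(a, 4), round(b, 4), round(c, 4), round(d, 4))
```

Parts c) and d) use the cumulative property P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a-1).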
  • 112. Properties of continuous probability distributions: • A continuous variable is one that can assume any value within a specified interval of values assumed by the variable. 1- Area under the curve = 1. 2- P(X = a) = 0, where a is a constant. 3- Area between two points a and b = P(a < X < b).
  • 113. 4.6 The normal distribution: • It is one of the most important probability distributions in statistics. • The normal density is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, -∞ < x < ∞, -∞ < µ < ∞, σ > 0 • π, e: constants. • µ: population mean. • σ: population standard deviation.
  • 114. Characteristics of the normal distribution • The following are some important characteristics of the normal distribution: 1- It is symmetrical about its mean, µ. 2- The mean, the median, and the mode are all equal. 3- The total area under the curve above the x-axis is one. 4-The normal distribution is completely determined by the parameters µ and σ.
  • 115. 5- The normal distribution depends on the two parameters µ and σ. µ determines the location of the curve, whereas σ determines the scale of the curve, i.e. the degree of flatness or peakedness of the curve. [Figures: normal curves with µ1 < µ2 < µ3; normal curves with σ1 < σ2 < σ3]
  • 116. Note that, approximately: 1. P(µ - σ < X < µ + σ) = 0.68 2. P(µ - 2σ < X < µ + 2σ) = 0.95 3. P(µ - 3σ < X < µ + 3σ) = 0.997
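These three areas do not depend on µ or σ, which can be confirmed numerically. The sketch below relies on the standard identity that, for any normal variable, the probability of falling within k standard deviations of the mean equals erf(k/√2); the function name `prob_within` is hypothetical.

```python
from math import erf, sqrt

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```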
  • 117. The Standard normal distribution: • It is a special case of the normal distribution with a mean of 0 and a standard deviation of 1. • The equation for the standard normal distribution is written as $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$, -∞ < z < ∞
  • 118. Characteristics of the standard normal distribution 1- It is symmetrical about 0. 2- The total area under the curve above the x-axis is one. 3- We can use table D to find the probabilities and areas.
  • 119. "How to use tables of Z" • Note that the cumulative probabilities P(Z ≤ z) are given in tables for -3.49 < z < 3.49. Thus, P(-3.49 < Z < 3.49) ≈ 1. • For the standard normal distribution, P(Z > 0) = P(Z < 0) = 0.5. • Example 4.6.1: If Z has a standard normal distribution, then P(Z < 2) is the area to the left of 2, and it equals 0.9772.
  • 120. Example 4.6.2: P(-2.55 < Z < 2.55) is the area between -2.55 and 2.55. It equals P(-2.55 < Z < 2.55) = 0.9946 – 0.0054 = 0.9892. • Example 4.6.2: P(-2.74 < Z < 1.53) is the area between -2.74 and 1.53. P(-2.74 < Z < 1.53) = 0.9370 – 0.0031 = 0.9339.
  • 122. Example: P(Z > 2.71) is the area to the right of 2.71. So, P(Z > 2.71) = 1 – 0.9966 = 0.0034. • Example: P(Z = 0.84) is the area at the single point z = 0.84. Since Z is a continuous random variable, P(Z = 0.84) = 0.
  • 123. How to transform a normal distribution (X) to the standard normal distribution (Z)? • This is done by the following formula: $z = \frac{x - \mu}{\sigma}$ • Example: If X is normal with µ = 3 and σ = 2, find the value of the standard normal Z when X = 6. • Answer: $z = \frac{x - \mu}{\sigma} = \frac{6 - 3}{2} = 1.5$
  • 124. Normal Distribution Applications • The normal distribution can be used to model the distribution of many variables that are of interest. This allows us to answer probability questions about these random variables. • Example 4.7.1: The 'Uptime' is a custom-made, lightweight, battery-operated activity monitor that records the amount of time an individual spends in the upright position. In a study of children aged 8 to 15 years, the researchers found that the amount of time children spend in the upright position followed a normal distribution with a mean of 5.4 hours and a standard deviation of 1.3 hours. Find the following.
  • 125. If a child is selected at random, then: 1- The probability that the child spends less than 3 hours in the upright position in a 24-hour period: $P(X < 3) = P\left(\frac{X - \mu}{\sigma} < \frac{3 - 5.4}{1.3}\right) = P(Z < -1.85) = 0.0322$ 2- The probability that the child spends more than 5 hours in the upright position in a 24-hour period: $P(X > 5) = P\left(\frac{X - \mu}{\sigma} > \frac{5 - 5.4}{1.3}\right) = P(Z > -0.31) = 1 - P(Z < -0.31) = 1 - 0.3783 = 0.6217$ 3- The probability that the child spends exactly 6.2 hours in the upright position in a 24-hour period: P(X = 6.2) = 0
  • 126. 4- The probability that the child spends from 4.5 to 7.3 hours in the upright position in a 24-hour period: $P(4.5 < X < 7.3) = P\left(\frac{4.5 - 5.4}{1.3} < \frac{X - \mu}{\sigma} < \frac{7.3 - 5.4}{1.3}\right) = P(-0.69 < Z < 1.46) = P(Z < 1.46) - P(Z < -0.69) = 0.9279 - 0.2451 = 0.6828$
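The Uptime probabilities can also be obtained without a printed Z table. The sketch below uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); the variable names are illustrative. Because the code does not round z to two decimals before looking up the area, its results may differ from table-based answers in the third decimal place.

```python
from statistics import NormalDist

# Hours spent upright per 24-hour period: mean 5.4, standard deviation 1.3
uptime = NormalDist(mu=5.4, sigma=1.3)

p_less_3 = uptime.cdf(3)                       # P(X < 3)
p_more_5 = 1 - uptime.cdf(5)                   # P(X > 5)
p_between = uptime.cdf(7.3) - uptime.cdf(4.5)  # P(4.5 < X < 7.3)

# Standardised value for x = 3, as used with the Z table
z = (3 - uptime.mean) / uptime.stdev

print(round(z, 2), round(p_less_3, 4), round(p_more_5, 4), round(p_between, 4))
```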
  • 128. Key words: point estimate, interval estimate, estimator, confidence level, α, confidence interval for a mean μ, confidence interval for two means, confidence interval for a population proportion P, confidence interval for two proportions.
  • 129. 6.1 Introduction: • Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population. • Suppose that an administrator of a large hospital is interested in the mean age of patients admitted to his hospital during a given year. 1. It would be too expensive to go through the records of all patients admitted during that particular year. 2. He consequently elects to examine a sample of the records, from which he can compute an estimate of the mean age of patients admitted that year.
  • 130. • For any parameter, we can compute two types of estimate: a point estimate and an interval estimate. • A point estimate is a single numerical value used to estimate the corresponding population parameter. • An interval estimate consists of two numerical values defining a range of values that, with a specified degree of confidence, we feel includes the parameter being estimated. • The Estimate and The Estimator: • The estimate is a single computed value, whereas the estimator is the rule that tells us how to compute this value. • For example, $\bar{x} = \frac{\sum_i x_i}{n}$ is an estimator of the population mean µ. The single numerical value that results from evaluating this formula is called an estimate of the parameter µ.
  • 131. Confidence Interval for a Population Mean (C.I): Suppose researchers wish to estimate the mean of some normally distributed population. • They draw a random sample of size n from the population and compute $\bar{x}$, which they use as a point estimate of µ. • Because random sampling involves chance, $\bar{x}$ can't be expected to be equal to µ. • The value of $\bar{x}$ may be greater than or less than µ. • It would be much more meaningful to estimate µ by an interval.
  • 132. The (1 - α)100 percent confidence interval (C.I.) for µ: • We want to find two values L and U between which µ lies with high probability, i.e. P(L ≤ µ ≤ U) = 1 - α
  • 133. For example: • When α = 0.01, then 1 - α = 0.99 • When α = 0.05, then 1 - α = 0.95
  • 136. We have the following cases: a) When the population is normal: 1) When the variance is known and the sample size is large or small, the C.I. has the form: $P\left(\bar{x} - Z_{1-\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{x} + Z_{1-\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$ 2) When the variance is unknown and the sample size is small, the C.I. has the form: $P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,s/\sqrt{n} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,s/\sqrt{n}\right) = 1 - \alpha$
  • 137. b) When the population is not normal and n is large (n > 30): 1) When the variance is known, the C.I. has the form: $P\left(\bar{x} - Z_{1-\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{x} + Z_{1-\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$ 2) When the variance is unknown, the C.I. has the form: $P\left(\bar{x} - Z_{1-\alpha/2}\,s/\sqrt{n} < \mu < \bar{x} + Z_{1-\alpha/2}\,s/\sqrt{n}\right) = 1 - \alpha$
  • 138. Example: • Suppose a researcher, interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of approximately $\bar{x} = 22$. Suppose further that the variable of interest is known to be approximately normally distributed with a variance of 45. We wish to estimate µ (α = 0.05).
  • 139. Solution: • 1 - α = 0.95 → α = 0.05 → α/2 = 0.025 • $\bar{x} = 22$, variance = σ² = 45 → σ = √45, n = 10 • The 95% confidence interval for µ is given by: $P\left(\bar{x} - Z_{1-\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{x} + Z_{1-\alpha/2}\,\sigma/\sqrt{n}\right) = 1 - \alpha$ • $Z_{1-\alpha/2} = Z_{0.975} = 1.96$ (refer to table D) • $Z_{0.975}(\sigma/\sqrt{n}) = 1.96(\sqrt{45}/\sqrt{10}) = 4.1578$ • 22 ± 1.96(√45/√10) → (22 - 4.1578, 22 + 4.1578) → (17.84, 26.16) • Exercise: example 6.2.2, page 169
  • 140. Example • The activity values of a certain enzyme measured in normal gastric tissue of 35 patients with gastric carcinoma have a mean of 0.718 and a standard deviation of 0.511. We want to construct a 90% confidence interval for the population mean. • Solution: • Note that the population is not assumed to be normal. • n = 35 (n > 30), so n is large; σ is unknown, s = 0.511. • 1 - α = 0.90 → α = 0.1 → α/2 = 0.05 → 1 - α/2 = 0.95
  • 141. Then the 90% confidence interval for µ is given by: $P\left(\bar{x} - Z_{1-\alpha/2}\,s/\sqrt{n} < \mu < \bar{x} + Z_{1-\alpha/2}\,s/\sqrt{n}\right) = 1 - \alpha$ • $Z_{1-\alpha/2} = Z_{0.95} = 1.645$ (refer to table D) • $Z_{0.95}(s/\sqrt{n}) = 1.645(0.511/\sqrt{35}) = 0.1421$ • 0.718 ± 1.645(0.511)/√35 → (0.718 - 0.1421, 0.718 + 0.1421) → (0.576, 0.860) • Exercise: example 6.2.3, page 164
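The two Z-based intervals (enzyme level with known variance, and gastric tissue with a large sample) follow the same recipe: point estimate ± critical value × standard error. The sketch below is illustrative only; the function name `z_interval` is hypothetical, and the critical value is taken from `statistics.NormalDist().inv_cdf` rather than from table D, so it carries more decimals than 1.96 or 1.645.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sd, n, confidence):
    """Confidence interval for a mean: mean ± z * sd / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. about 1.96 for 95%
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin

# Enzyme level: n = 10, sample mean 22, known variance 45, 95% interval
low1, high1 = z_interval(22, sqrt(45), 10, 0.95)

# Gastric tissue: n = 35, sample mean 0.718, s = 0.511, 90% interval
low2, high2 = z_interval(0.718, 0.511, 35, 0.90)

print(round(low1, 2), round(high1, 2), round(low2, 3), round(high2, 3))
```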
  • 142. Example 6.3.1, page 174: • Suppose researchers studied the effectiveness of early weight bearing and ankle therapies following acute repair of a ruptured Achilles tendon. One of the variables they measured following treatment was muscle strength. In 19 subjects, the mean strength was 250.8 with a standard deviation of 130.9. We assume that the sample was taken from an approximately normally distributed population. Calculate the 95% confidence interval for the mean strength.
  • 143. Solution: • 1 - α = 0.95 → α = 0.05 → α/2 = 0.025 • $\bar{x} = 250.8$, standard deviation = S = 130.9, n = 19 • The 95% confidence interval for µ is given by: $P\left(\bar{x} - t_{1-\alpha/2,\,n-1}\,s/\sqrt{n} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,s/\sqrt{n}\right) = 1 - \alpha$ • $t_{1-\alpha/2,\,n-1} = t_{0.975,\,18} = 2.1009$ (refer to table E) • $t_{0.975,\,18}(s/\sqrt{n}) = 2.1009(130.9/\sqrt{19}) = 63.1$ • 250.8 ± 2.1009(130.9/√19) → (250.8 - 63.1, 250.8 + 63.1) → (187.7, 313.9) • Exercises: 6.2.1, 6.2.2 • 6.3.2, page 171
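The t-based interval differs from the Z-based one only in the critical value. The Python standard library has no t quantile function, so this sketch takes the value 2.1009 from the t table as an input; the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(mean, s, n, t_critical):
    """Confidence interval for a mean: mean ± t * s / sqrt(n), with t read from a t table."""
    margin = t_critical * s / sqrt(n)
    return mean - margin, mean + margin

# Muscle strength: n = 19, mean 250.8, s = 130.9, t(0.975, 18) = 2.1009
low, high = t_interval(250.8, 130.9, 19, 2.1009)

print(round(low, 1), round(high, 1))
```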
  • 144. 6.3 Confidence Interval for the difference between two Population Means (C.I): If we draw two samples from two independent populations and we want the confidence interval for the difference between the two population means, then we have the following cases: a) When the populations are normal: 1) When the variances are known and the sample sizes are large or small, the C.I. has the form: $(\bar{x}_1 - \bar{x}_2) - Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
  • 145. 2) When the variances are unknown but equal, and the sample sizes are small, the C.I. has the form: $(\bar{x}_1 - \bar{x}_2) - t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
  • 146. Example 6.4.1, page 174: A research team is interested in the difference between serum uric acid levels in patients with and without Down's syndrome. In a large hospital for the treatment of the mentally retarded, a sample of 12 individuals with Down's syndrome yielded a mean of $\bar{x}_1 = 4.5$ mg/100 ml. In a general hospital, a sample of 15 normal individuals of the same age and sex were found to have a mean value of $\bar{x}_2 = 3.4$. If it is reasonable to assume that the two populations of values are normally distributed with variances equal to 1 and 1.5, find the 95% C.I for μ1 - μ2. • Solution: 1 - α = 0.95 → α = 0.05 → α/2 = 0.025 → $Z_{1-\alpha/2} = Z_{0.975} = 1.96$ • $(\bar{x}_1 - \bar{x}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (4.5 - 3.4) \pm 1.96\sqrt{\frac{1}{12} + \frac{1.5}{15}}$ = 1.1 ± 1.96(0.4282) = 1.1 ± 0.84 = (0.26, 1.94)
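The interval for the difference between two means with known variances can be sketched as follows. The function name `two_mean_z_interval` is hypothetical, and the critical value comes from `statistics.NormalDist().inv_cdf` rather than a printed table.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(mean1, var1, n1, mean2, var2, n2, confidence):
    """Interval for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(var1 / n1 + var2 / n2)
    diff = mean1 - mean2
    return diff - z * standard_error, diff + z * standard_error

# Serum uric acid: n1 = 12, mean 4.5, variance 1; n2 = 15, mean 3.4, variance 1.5
low, high = two_mean_z_interval(4.5, 1, 12, 3.4, 1.5, 15, 0.95)

print(round(low, 2), round(high, 2))
```

Since the interval does not include zero, the data are consistent with a real difference between the two population means at this confidence level.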
  • 147. Example 6.4.1, page 178: The purpose of the study was to determine the effectiveness of an integrated outpatient dual-diagnosis treatment program for mentally ill subjects. The authors were addressing the problem of substance abuse among people with severe mental disorders. A retrospective chart review was carried out on 50 patients; the researchers were interested in the number of inpatient treatment days for psychiatric disorder during the year following the end of the program. Among 18 patients with schizophrenia, the mean number of treatment days was 4.7 with a standard deviation of 9.3. For 10 subjects with bipolar disorder, the mean number of treatment days was 8.8 with a standard deviation of 11.5. We wish to construct a 99% C.I for the difference between the means of the populations represented by the two samples.
  • 148. Solution: • 1 - α = 0.99 → α = 0.01 → α/2 = 0.005 → 1 - α/2 = 0.995 • n1 + n2 – 2 = 18 + 10 - 2 = 26 • $t_{1-\alpha/2,\,(n_1+n_2-2)} = t_{0.995,\,26} = 2.7787$; then the 99% C.I for μ1 – μ2 is $(\bar{x}_1 - \bar{x}_2) \pm t_{1-\alpha/2,\,(n_1+n_2-2)}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ • where $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{(17 \times 9.3^2) + (9 \times 11.5^2)}{18 + 10 - 2} = 102.33$ • then (4.7 - 8.8) ± 2.7787 √102.33 √((1/18) + (1/10)) = -4.1 ± 11.086 = (-15.186, 6.986) • Exercises: 6.4.2, 6.4.6, 6.4.7, 6.4.8, page 180
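The pooled-variance calculation has several intermediate steps, so it is a natural candidate for a short function. This is a sketch only: the name `pooled_t_interval` is hypothetical, and the critical value 2.7787 is supplied from the t table because the standard library has no t quantile function.

```python
from math import sqrt

def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_critical):
    """Interval for mu1 - mu2 assuming equal but unknown population variances."""
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    margin = t_critical * sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return sp2, diff - margin, diff + margin

# Schizophrenia: n = 18, mean 4.7, s = 9.3; bipolar: n = 10, mean 8.8, s = 11.5
sp2, low, high = pooled_t_interval(4.7, 9.3, 18, 8.8, 11.5, 10, 2.7787)

print(round(sp2, 2), round(low, 2), round(high, 2))
```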
  • 149. 6.5 Confidence Interval for a Population Proportion (P): A sample is drawn from the population of interest, and the sample proportion $\hat{P}$ is computed as $\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$ • This sample proportion is used as the point estimator of the population proportion. • A confidence interval is obtained by the following formula: $\hat{P} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}}$
  • 150. Example 6.5.1 • The Pew Internet and American Life Project reported in 2003 that 18% of internet users have used the internet to search for information regarding experimental treatments or medicine. The sample consisted of 1220 adult internet users, and information was collected from telephone interviews. We wish to construct a 98% C.I for the proportion of internet users who have searched for information about experimental treatments or medicine.
  • 151. Solution: 1 - α = 0.98 → α = 0.02 → α/2 = 0.01 → 1 - α/2 = 0.99 • $Z_{1-\alpha/2} = Z_{0.99} = 2.33$, n = 1220, $\hat{p} = \frac{18}{100} = 0.18$ • The 98% C.I is $\hat{P} \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}(1 - \hat{P})}{n}} = 0.18 \pm 2.33\sqrt{\frac{0.18(1 - 0.18)}{1220}}$ = 0.18 ± 0.0256 = (0.1544, 0.2056) • Exercises: 6.5.1, 6.5.3, page 187
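The proportion interval can be sketched in a few lines. The function name `proportion_interval` is hypothetical; the critical value is computed with `statistics.NormalDist().inv_cdf`, which gives about 2.326 rather than the table value 2.33, so the limits agree with the hand calculation only to about three decimals.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence):
    """Confidence interval for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Internet users: p_hat = 0.18, n = 1220, 98% interval
low, high = proportion_interval(0.18, 1220, 0.98)

print(round(low, 4), round(high, 4))
```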
  • 152. Confidence Interval for the difference between two Population Proportions: Two samples are drawn from two independent populations of interest, and the sample proportion for the characteristic of interest is computed for each sample. • An unbiased point estimator for the difference between the two population proportions is $\hat{P}_1 - \hat{P}_2$. • A 100(1 - α)% confidence interval for P1 - P2 is given by $(\hat{P}_1 - \hat{P}_2) \pm Z_{1-\alpha/2}\sqrt{\frac{\hat{P}_1(1 - \hat{P}_1)}{n_1} + \frac{\hat{P}_2(1 - \hat{P}_2)}{n_2}}$
  • 153. Example • Connor investigated gender differences in proactive and reactive aggression in a sample of 323 adults (68 females and 255 males). In the sample, 31 of the females and 53 of the males were using the internet in the internet café. We wish to construct a 99% confidence interval for the difference between the proportions of adults who go to the internet café in the two sampled populations.
  • 154. Solution: 1-α = 0.99 → α = 0.01 → α/2 = 0.005 → 1-α/2 = 0.995, Z 1-α/2 = Z 0.995 = 2.58, nF = 68, nM = 255, $\hat{p}_F = \dfrac{a_F}{n_F} = \dfrac{31}{68} = 0.4559$, $\hat{p}_M = \dfrac{a_M}{n_M} = \dfrac{53}{255} = 0.2078$. The 99% C.I. is $(\hat{P}_F - \hat{P}_M) \pm Z_{1-\alpha/2}\sqrt{\dfrac{\hat{P}_F(1-\hat{P}_F)}{n_F} + \dfrac{\hat{P}_M(1-\hat{P}_M)}{n_M}} = (0.4559 - 0.2078) \pm 2.58\sqrt{\dfrac{0.4559(1-0.4559)}{68} + \dfrac{0.2078(1-0.2078)}{255}} = 0.2481 \pm 2.58(0.0655) = (0.0791,\ 0.4171)$
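The same calculation for two proportions can be sketched as follows. The function name `two_proportion_ci` is an illustrative assumption; the quantile is computed exactly (about 2.576) instead of the tabulated 2.58, so the limits agree with the slide only to about three decimals.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(a1, n1, a2, n2, confidence):
    """Large-sample C.I. for P1 - P2 from two independent samples."""
    p1, p2 = a1 / n1, a2 / n2
    # Standard error uses each sample's own proportion (no pooling)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Females: 31 of 68; males: 53 of 255; 99% confidence
lower, upper = two_proportion_ci(31, 68, 53, 255, 0.99)
print(f"99% C.I. for P_F - P_M: ({lower:.3f}, {upper:.3f})")
```

Because the whole interval lies above zero, the data suggest that the two population proportions differ.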
  • 156. • Key words: • Null hypothesis H0, alternative hypothesis HA, testing hypothesis, test statistic, P-value
  • 157. Hypothesis Testing • One type of statistical inference, estimation, was discussed previously. • The other type, hypothesis testing, is discussed in this session.
  • 158. Definition of a hypothesis • It is a statement about one or more populations. It is usually concerned with the parameters of the population, e.g. the hospital administrator may want to test the hypothesis that the average length of stay of patients admitted to the hospital is 5 days.
  • 159. Definition of Statistical hypotheses • They are hypotheses that are stated in such a way that they may be evaluated by appropriate statistical techniques. • There are two hypotheses involved in hypothesis testing • Null hypothesis H0: It is the hypothesis to be tested. • Alternative hypothesis HA: It is a statement of what we believe is true if our sample data cause us to reject the null hypothesis.
  • 160. Testing a hypothesis about the mean of a population: • We have the following steps: 1. Data: determine the variable, sample size (n), sample mean ($\bar{x}$), and population standard deviation (σ), or sample standard deviation (s) if σ is unknown. 2. Assumptions: We have two cases: • Case 1: Population is normally or approximately normally distributed with known or unknown variance (sample size n may be small or large), • Case 2: Population is not normal with known or unknown variance (n is large, i.e. n ≥ 30).
  • 161. • 3. Hypotheses: • we have three cases • Case I: H0: μ = μ0, HA: μ ≠ μ0 • e.g. we want to test that the population mean is different from 50 • Case II: H0: μ = μ0, HA: μ > μ0 • e.g. we want to test that the population mean is greater than 50 • Case III: H0: μ = μ0, HA: μ < μ0 • e.g. we want to test that the population mean is less than 50
  • 162. 4. Test Statistic: • Case 1: population is normal or approximately normal: if σ² is known (n large or small), $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$; if σ² is unknown and n is large, $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$; if σ² is unknown and n is small, $T = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$ • Case 2: population is not normally distributed and n is large: • i) if σ² is known, $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$; ii) if σ² is unknown, $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
  • 163. 5. Decision Rule: i) If HA: μ ≠ μ0 • Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 (when using the Z-test), or reject H0 if T > t1-α/2,n-1 or T < -t1-α/2,n-1 (when using the T-test) • ii) If HA: μ > μ0 • Reject H0 if Z > Z1-α (when using the Z-test), or reject H0 if T > t1-α,n-1 (when using the T-test)
  • 164. • iii) If HA: μ < μ0, reject H0 if Z < -Z1-α (when using the Z-test) • or reject H0 if T < -t1-α,n-1 (when using the T-test). Note: Z1-α/2, Z1-α, Zα are tabulated values obtained from table D; t1-α/2, t1-α, tα are tabulated values obtained from table E with (n-1) degrees of freedom (df).
  • 165. • 6. Decision: • If we reject H0, we can conclude that HA is true. • If, however, we do not reject H0, we may conclude only that H0 may be true: the data do not provide sufficient evidence against it.
  • 166. An Alternative Decision Rule using the p-value. Definition • The p-value is defined as the smallest value of α for which the null hypothesis can be rejected. • If the p-value is less than or equal to α, we reject the null hypothesis (p ≤ α) • If the p-value is greater than α, we do not reject the null hypothesis (p > α)
  • 167. Example • Researchers are interested in the mean age of a certain population. • A random sample of 10 individuals drawn from the population of interest has a mean of 27. • Assuming that the population is approximately normally distributed with variance 20, can we conclude that the mean is different from 30 years? (α = 0.05) • If the p-value is 0.0340, how can we use it in making a decision?
  • 168. Solution 1-Data: variable is age, n = 10, $\bar{x}$ = 27, σ² = 20, α = 0.05 2-Assumptions: the population is approximately normally distributed with variance 20 3-Hypotheses: • H0: μ = 30 • HA: μ ≠ 30
  • 169. 4-Test Statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{27 - 30}{\sqrt{20/10}} = -2.12$ 5. Decision Rule • The alternative hypothesis is • HA: μ ≠ 30 • Hence we reject H0 if Z > Z1-0.05/2 = Z0.975 • or Z < -Z1-0.05/2 = -Z0.975 • Z0.975 = 1.96 (from table D)
  • 170. • 6. Decision: • We reject H0, since -2.12 is in the rejection region. • We can conclude that μ is not equal to 30 • Using the p-value, we note that p-value = 0.0340 < 0.05, therefore we reject H0
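The steps of this example, including the p-value, can be reproduced with a short sketch. The function name `z_test_mean` and its `alternative` argument are illustrative assumptions, not notation from the slides; the normal probabilities come from Python's standard library instead of table D.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, alternative="two-sided"):
    """One-sample Z test for a mean with known population sigma."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    phi = NormalDist().cdf  # standard normal CDF
    if alternative == "two-sided":   # HA: mu != mu0
        p = 2 * (1 - phi(abs(z)))
    elif alternative == "greater":   # HA: mu > mu0
        p = 1 - phi(z)
    elif alternative == "less":      # HA: mu < mu0
        p = phi(z)
    else:
        raise ValueError("alternative must be two-sided, greater or less")
    return z, p


# Age example: n = 10, sample mean 27, variance 20, H0: mu = 30
z, p = z_test_mean(27, 30, sqrt(20), 10)
alpha = 0.05
print(f"Z = {z:.2f}, p-value = {p:.4f}, reject H0: {p <= alpha}")
```

Comparing p with α gives the same decision as comparing Z with ±1.96, which is the point of the alternative decision rule.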
  • 171. Example • Referring to the previous example, suppose that the researchers have asked: can we conclude that μ < 30? 1. Data: see previous example 2. Assumptions: see previous example 3. Hypotheses: • H0: μ = 30 • HA: μ < 30
  • 172. 4. Test Statistic: • $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{27 - 30}{\sqrt{20/10}} = -2.12$ 5. Decision Rule: Reject H0 if Z < Zα, where • Zα = Z0.05 = -1.645 (from table D) 6. Decision: Reject H0, since -2.12 < -1.645; thus we can conclude that the population mean is smaller than 30.
  • 173. Example • Among 157 African-American men, the mean systolic blood pressure was 146 mm Hg with a standard deviation of 27. We wish to know if, on the basis of these data, we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140. Use α = 0.01.
  • 174. Solution 1. Data: Variable is systolic blood pressure, n = 157, $\bar{x}$ = 146, s = 27, α = 0.01. 2. Assumption: population is not normal, σ² is unknown, n is large 3. Hypotheses: H0: μ = 140, HA: μ > 140 4. Test Statistic: $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} = \dfrac{146 - 140}{27/\sqrt{157}} = \dfrac{6}{2.1548} = 2.78$
  • 175. 5. Decision Rule: we reject H0 if Z > Z1-α = Z0.99 = 2.33 (from table D) 6. Decision: We reject H0, since 2.78 > 2.33. Hence we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140.
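The critical-value form of the decision rule for this one-sided test can be sketched as below. The function name `upper_tail_z_test` is hypothetical, and the critical value is computed (about 2.326) where the slides use the tabulated 2.33.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_z_test(x_bar, mu0, s, n, alpha):
    """Large-sample Z test of H0: mu = mu0 against HA: mu > mu0."""
    z = (x_bar - mu0) / (s / sqrt(n))          # s replaces sigma since n is large
    z_crit = NormalDist().inv_cdf(1 - alpha)   # Z_{1-alpha}
    return z, z_crit, z > z_crit


# Systolic blood pressure example: n = 157, mean 146, s = 27, mu0 = 140
z, z_crit, reject = upper_tail_z_test(146, 140, 27, 157, 0.01)
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```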
  • 176. Hypothesis Testing: The Difference between Two Population Means: • We have the following steps: 1. Data: determine the variable, sample sizes (n1, n2), sample means, and the population standard deviations (σ1, σ2), or the sample standard deviations (s1, s2) if σ1, σ2 are unknown, for the two populations. 2. Assumptions: We have two cases: • Case 1: Populations are normally or approximately normally distributed with known or unknown variances (sample sizes may be small or large), • Case 2: Populations are not normal with known variances (n1, n2 are large, i.e. n ≥ 30).
  • 177. • 3. Hypotheses: • we have three cases • Case I: H0: μ1 = μ2 → μ1 - μ2 = 0 • HA: μ1 ≠ μ2 → μ1 - μ2 ≠ 0 • e.g. we want to test that the mean of the first population is different from the mean of the second population. • Case II: H0: μ1 = μ2 → μ1 - μ2 = 0, HA: μ1 > μ2 → μ1 - μ2 > 0 • e.g. we want to test that the mean of the first population is greater than the mean of the second population. • Case III: H0: μ1 = μ2 → μ1 - μ2 = 0, HA: μ1 < μ2 → μ1 - μ2 < 0 • e.g. we want to test that the mean of the first population is less than the mean of the second population.
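  • 178. 4. Test Statistic: • Case 1: The two populations are normal or approximately normal: if σ1², σ2² are known (n1, n2 large or small), $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$; if σ1², σ2² are unknown (n1, n2 small) and the population variances are equal, $T = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$; if they are unknown and the population variances are not equal, $T = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$; where $S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$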
  • 179. • Case 2: If the populations are not normally distributed • and n1, n2 are large (n1 ≥ 30, n2 ≥ 30) • and the population variances are known, $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
  • 180. 5. Decision Rule: i) If HA: μ1 ≠ μ2 → μ1 - μ2 ≠ 0 • Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 (when using the Z-test), or reject H0 if T > t1-α/2,(n1+n2-2) or T < -t1-α/2,(n1+n2-2) (when using the T-test) • ii) If HA: μ1 > μ2 → μ1 - μ2 > 0 • Reject H0 if Z > Z1-α (when using the Z-test), or reject H0 if T > t1-α,(n1+n2-2) (when using the T-test)
  • 181. • iii) If HA: μ1 < μ2 → μ1 - μ2 < 0, reject H0 if Z < -Z1-α (when using the Z-test) • or reject H0 if T < -t1-α,(n1+n2-2) (when using the T-test). Note: Z1-α/2, Z1-α, Zα are tabulated values obtained from table D; t1-α/2, t1-α, tα are tabulated values obtained from table E with (n1+n2-2) degrees of freedom (df). 6. Conclusion: reject or fail to reject H0
  • 182. Example • Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome, from a normal distribution with variance 1, and on 15 normal individuals, from a normal distribution with variance 1.5. The means are $\bar{X}_1 = 4.5$ mg/100 ml and $\bar{X}_2 = 3.4$ mg/100 ml, and α = 0.05. Solution: 1. Data: Variable is serum uric acid level, n1 = 12, n2 = 15, σ1² = 1, σ2² = 1.5, α = 0.05.
  • 183. 2. Assumption: The two populations are normal; σ1², σ2² are known 3. Hypotheses: H0: μ1 = μ2 → μ1 - μ2 = 0 • HA: μ1 ≠ μ2 → μ1 - μ2 ≠ 0 4. Test Statistic: • $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(4.5 - 3.4) - 0}{\sqrt{\dfrac{1}{12} + \dfrac{1.5}{15}}} = 2.57$ 5. Decision Rule: Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2, where Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 (from table D) 6. Conclusion: Reject H0 since 2.57 > 1.96. Or, using the p-value: p-value = 0.0102 < α = 0.05, so we reject H0.
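A sketch of the two-sample Z test with known variances, applied to the serum uric acid data. The function name `two_sample_z_test` is an assumption made for illustration; it takes variances (not standard deviations) as inputs and returns the two-sided p-value.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_test(x1_bar, x2_bar, var1, var2, n1, n2, delta0=0.0):
    """Z test of H0: mu1 - mu2 = delta0 against a two-sided alternative."""
    se = sqrt(var1 / n1 + var2 / n2)       # known population variances
    z = ((x1_bar - x2_bar) - delta0) / se
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Down's syndrome group: mean 4.5, variance 1, n = 12
# Normal group: mean 3.4, variance 1.5, n = 15
z, p = two_sample_z_test(4.5, 3.4, 1.0, 1.5, 12, 15)
print(f"Z = {z:.2f}, two-sided p-value = {p:.4f}")
```

For the large-sample case with unknown variances, the same function can be called with the squared sample standard deviations in place of `var1` and `var2`.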
  • 184. Example: The purpose of a study by Tam was to investigate wheelchair maneuvering in individuals with lower-level spinal cord injury (SCI) and healthy controls (C). Subjects used a wheelchair modified to incorporate a rigid seat surface to facilitate the specified experimental measurements. The data for measurements of the left ischial tuberosity for the SCI and control (C) groups are shown below:
      C:   131 115 124 131 122 117 88 114 150 169
      SCI: 60 150 130 180 163 130 121 119 130 148
  • 185. We wish to know if we can conclude, on the basis of the above data, that the mean of left ischial tuberosity for the control group (C) is lower than the mean of left ischial tuberosity for the SCI group. Assume normal populations with equal variances. α = 0.05, p-value > 0.10.
  • 186. Solution: 1. Data: nC = 10, nSCI = 10, SC = 21.8, SSCI = 32.2, α = 0.05, $\bar{X}_C = 126.1$, $\bar{X}_{SCI} = 133.1$ (calculated from data) 2. Assumption: The two populations are normal; σ1², σ2² are unknown but equal 3. Hypotheses: H0: μC = μSCI → μC - μSCI = 0, HA: μC < μSCI → μC - μSCI < 0 4. Test Statistic: • $T = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{(126.1 - 133.1) - 0}{\sqrt{756.04}\sqrt{\dfrac{1}{10} + \dfrac{1}{10}}} = -0.569$, where $S_p^2 = \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \dfrac{9(21.8)^2 + 9(32.2)^2}{10 + 10 - 2} = 756.04$
  • 187. 5. Decision Rule: Reject H0 if T < -t1-α,(n1+n2-2), where t1-α,(n1+n2-2) = t0.95,18 = 1.7341 (from table E) 6. Conclusion: Fail to reject H0 since -0.569 > -1.7341. Or fail to reject H0 since p > 0.10 > α = 0.05 (because -1.330 < -0.569, where t0.90,18 = 1.330).
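The pooled-variance T statistic for this example can be checked from the summary statistics. This is a sketch under stated assumptions: the function name `pooled_t_statistic` is hypothetical, the SCI standard deviation is taken as 32.2 (the value that reproduces the pooled variance 756.04), and the critical value 1.7341 is the tabulated one quoted on the slide, since the standard library has no t distribution.

```python
from math import sqrt


def pooled_t_statistic(x1_bar, s1, n1, x2_bar, s2, n2):
    """Pooled variance and T statistic for H0: mu1 - mu2 = 0."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1_bar - x2_bar) / sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return sp2, t, df


# Control (C): mean 126.1, s = 21.8, n = 10; SCI: mean 133.1, s = 32.2, n = 10
sp2, t, df = pooled_t_statistic(126.1, 21.8, 10, 133.1, 32.2, 10)

T_CRIT = 1.7341        # t_{0.95, 18} as quoted from table E
reject = t < -T_CRIT   # lower-tail test, HA: mu_C < mu_SCI
print(f"Sp^2 = {sp2:.2f}, T = {t:.3f}, df = {df}, reject H0: {reject}")
```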
  • 188. Example: Dernellis and Panaretou examined subjects with hypertension and healthy control subjects. One of the variables of interest was the aortic stiffness index. Measures of this variable were calculated from the aortic diameter evaluated by M-mode and blood pressure measured by a sphygmomanometer. Physicians wish to reduce aortic stiffness. In the 15 patients with hypertension (Group 1), the mean aortic stiffness index was 19.16 with a standard deviation of 5.29. In the 30 control subjects (Group 2), the mean aortic stiffness index was 9.53 with a standard deviation of 2.69. We wish to determine if the two populations represented by these samples differ with respect to mean stiffness index. Example: We wish to know if we can conclude that, in general, persons with thrombosis have, on average, higher IgG levels than persons without thrombosis. Let α = 0.01; p-value = 0.0559.
  • 189. Solution: 1. Data: n1 = 53, n2 = 54, S1 = 44.89, S2 = 34.85, α = 0.01.
      Group         | Mean IgG level | Sample size | Standard deviation
      Thrombosis    | 59.01          | 53          | 44.89
      No thrombosis | 46.61          | 54          | 34.85
    2. Assumption: The two populations are not normal; σ1², σ2² are unknown and the sample sizes are large 3. Hypotheses: H0: μ1 = μ2 → μ1 - μ2 = 0, HA: μ1 > μ2 → μ1 - μ2 > 0 4. Test Statistic: • $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \dfrac{(59.01 - 46.61) - 0}{\sqrt{\dfrac{44.89^2}{53} + \dfrac{34.85^2}{54}}} = 1.59$
  • 190. 5. Decision Rule: Reject H0 if Z > Z1-α, where Z1-α = Z0.99 = 2.33 (from table D) 6. Conclusion: Fail to reject H0 since 1.59 < 2.33. Or fail to reject H0 since p = 0.0559 > α = 0.01
  • 191. Hypothesis Testing: A Single Population Proportion: • Testing hypotheses about a population proportion (P) is carried out in much the same way as for a mean, when the conditions necessary for using the normal curve are met • We have the following steps: 1. Data: sample size (n), sample proportion ($\hat{p}$), P0, where $\hat{p} = \dfrac{a}{n} = \dfrac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$ 2. Assumptions: $\hat{p}$ is approximately normally distributed
  • 192. • 3. Hypotheses: • we have three cases • Case I: H0: P = P0, HA: P ≠ P0 • Case II: H0: P = P0, HA: P > P0 • Case III: H0: P = P0, HA: P < P0 4. Test Statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$, which, when H0 is true, is distributed approximately as the standard normal
  • 193. 5. Decision Rule: i) If HA: P ≠ P0 • Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 • ii) If HA: P > P0 • Reject H0 if Z > Z1-α • iii) If HA: P < P0, reject H0 if Z < -Z1-α. Note: Z1-α/2, Z1-α, Zα are tabulated values obtained from table D 6. Conclusion: reject or fail to reject H0
  • 194. 2. Assumptions: $\hat{p}$ is approximately normally distributed 3. Hypotheses: • H0: P = 0.063, HA: P > 0.063 • 4. Test Statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \dfrac{0.08 - 0.063}{\sqrt{\dfrac{0.063(0.937)}{301}}} = 1.21$ 5. Decision Rule: Reject H0 if Z > Z1-α, where Z1-α = Z1-0.05 = Z0.95 = 1.645
  • 195. 6. Conclusion: Fail to reject H0 since Z = 1.21 < Z1-α = 1.645. Or, if P-value = 0.1131, fail to reject H0 since P-value > α
  • 196. Example: Wagen collected data on a sample of 301 Hispanic women living in Texas. One variable of interest was the percentage of subjects with impaired fasting glucose (IFG). In the study, 24 women were classified in the IFG stage. The article cites population estimates for IFG among Hispanic women in Texas as 6.3 percent. Is there sufficient evidence to indicate that the population of Hispanic women in Texas has a prevalence of IFG higher than 6.3 percent? Let α = 0.05. Solution: 1. Data: n = 301, p0 = 6.3/100 = 0.063, a = 24, $\hat{p} = \dfrac{a}{n} = \dfrac{24}{301} = 0.08$, q0 = 1 - p0 = 1 - 0.063 = 0.937, α = 0.05
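The test for a single proportion in this example can be sketched as follows. The function name `one_proportion_z_test` is an illustrative assumption. The slide rounds the sample proportion to 0.08 before computing Z, so the unrounded value 24/301 gives a slightly smaller statistic; both are shown.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(p_hat, p0, n):
    """Z test of H0: P = p0 against HA: P > p0 (upper tail)."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0, not p_hat
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# IFG example: 24 of 301 women, p0 = 0.063
z_rounded, p_rounded = one_proportion_z_test(0.08, 0.063, 301)   # as on the slide
z_exact, p_exact = one_proportion_z_test(24 / 301, 0.063, 301)   # unrounded p_hat
print(f"rounded p_hat:   Z = {z_rounded:.2f}, p-value = {p_rounded:.4f}")
print(f"unrounded p_hat: Z = {z_exact:.2f}, p-value = {p_exact:.4f}")
```

Either way the p-value exceeds α = 0.05, so the decision does not depend on the rounding.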
  • 197. Hypothesis Testing: The Difference between Two Population Proportions: • Testing hypotheses about two population proportions (P1, P2) is carried out in much the same way as for the difference between two means, when the conditions necessary for using the normal curve are met • We have the following steps: 1. Data: sample sizes (n1, n2), sample proportions ($\hat{P}_1, \hat{P}_2$), numbers with the characteristic in the two samples (x1, x2), and the pooled proportion $\bar{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ 2. Assumption: The two populations are independent.
  • 198. • 3. Hypotheses: • we have three cases • Case I: H0: P1 = P2 → P1 - P2 = 0, HA: P1 ≠ P2 → P1 - P2 ≠ 0 • Case II: H0: P1 = P2 → P1 - P2 = 0, HA: P1 > P2 → P1 - P2 > 0 • Case III: H0: P1 = P2 → P1 - P2 = 0, HA: P1 < P2 → P1 - P2 < 0 4. Test Statistic: $Z = \dfrac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}$, which, when H0 is true, is distributed approximately as the standard normal
  • 199. 5. Decision Rule: i) If HA: P1 ≠ P2 • Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 • ii) If HA: P1 > P2 • Reject H0 if Z > Z1-α • iii) If HA: P1 < P2 • Reject H0 if Z < -Z1-α. Note: Z1-α/2, Z1-α, Zα are tabulated values obtained from table D 6. Conclusion: reject or fail to reject H0
  • 200. Example: Noonan syndrome is a genetic condition that can affect the heart, growth, blood clotting, and mental and physical development. Noonan examined the stature of men and women with Noonan syndrome. The study contained 29 male and 44 female adults. One of the cut-off values used to assess stature was the third percentile of adult height. Eleven of the males fell below the third percentile of adult male height, while 24 of the females fell below the third percentile of female adult height. Does this study provide sufficient evidence for us to conclude that, among subjects with Noonan syndrome, females are more likely than males to fall below the respective third percentile of adult height? Let α = 0.05. Solution: 1. Data: nM = 29, nF = 44, xM = 11, xF = 24, α = 0.05, $\bar{p} = \dfrac{x_M + x_F}{n_M + n_F} = \dfrac{11 + 24}{29 + 44} = 0.479$, $\hat{p}_M = \dfrac{x_M}{n_M} = \dfrac{11}{29} = 0.379$, $\hat{p}_F = \dfrac{x_F}{n_F} = \dfrac{24}{44} = 0.545$
  • 201. 2. Assumption: The two populations are independent. 3. Hypotheses: • Case II: H0: PF = PM → PF - PM = 0, HA: PF > PM → PF - PM > 0 • 4. Test Statistic: $Z = \dfrac{(\hat{p}_F - \hat{p}_M) - (p_F - p_M)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_F} + \dfrac{\bar{p}(1 - \bar{p})}{n_M}}} = \dfrac{(0.545 - 0.379) - 0}{\sqrt{\dfrac{(0.479)(0.521)}{44} + \dfrac{(0.479)(0.521)}{29}}} = 1.39$ 5. Decision Rule: Reject H0 if Z > Z1-α, where Z1-α = Z1-0.05 = Z0.95 = 1.645 6. Conclusion: Fail to reject H0 since Z = 1.39 < Z1-α = 1.645. Or, if P-value = 0.0823, fail to reject H0 since P-value > α
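A sketch of the pooled two-proportion Z test for this example. The function name `two_proportion_z_test` is hypothetical; it tests the upper-tail alternative P1 > P2, with the female group passed first to match HA: PF > PM.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Z test of H0: P1 = P2 against HA: P1 > P2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)  # pooled estimate under H0
    se = sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


# Females: 24 of 44 below the third percentile; males: 11 of 29
z, p = two_proportion_z_test(24, 44, 11, 29)
print(f"Z = {z:.2f}, p-value = {p:.4f}, reject H0 at 0.05: {p <= 0.05}")
```

Note that the test pools the two samples for the standard error, whereas the confidence interval for P1 - P2 uses each sample's own proportion.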
  • 202. An Introduction to the Chi-Square Distribution
  • 203. TESTS OF INDEPENDENCE • To test whether two criteria of classification are independent; for example, whether socioeconomic status and area of residence of people in a city are independent. • We divide our sample according to status (low, medium and high income, etc.), and the same sample is categorized according to area (urban, rural, suburban, slums, etc.). • Put the first criterion in columns, equal in number to the categories of the 1st criterion (socioeconomic status), and the 2nd in rows, where the no. of rows equals the no. of categories of the 2nd criterion (areas of the city).
  • 204. The Contingency Table • Table: Two-Way Classification of Sample (columns: first criterion of classification; rows: second criterion)
      Second criterion | 1   | 2   | 3   | … | c   | Total
      1                | N11 | N12 | N13 | … | N1c | N1.
      2                | N21 | N22 | N23 | … | N2c | N2.
      3                | N31 | N32 | N33 | … | N3c | N3.
      .                | .   | .   | .   |   | .   | .
      r                | Nr1 | Nr2 | Nr3 | … | Nrc | Nr.
      Total            | N.1 | N.2 | N.3 | … | N.c | N
  • 205. Observed versus Expected Frequencies • Oij: The frequencies in the ith row and jth column of a contingency table are called observed frequencies; they result from the cross-classification according to the two criteria. • eij: Expected frequencies, on the assumption of independence of the two criteria, are calculated by multiplying the marginal totals of the cell's row and column and then dividing by the total frequency • Formula: $e_{ij} = \dfrac{(N_{i\cdot})(N_{\cdot j})}{N}$
  • 206. Chi-square Test • After calculating the expected frequencies, prepare a table of expected frequencies and use the chi-square statistic $\chi^2 = \sum_{i=1}^{k}\left[\dfrac{(o_i - e_i)^2}{e_i}\right]$, where the summation is over all r × c = k cells. • D.F.: the degrees of freedom for using the table are (r-1)(c-1) for α level of significance • Note that the test is always one-sided.
  • 207. Example: The researchers are interested in determining whether preconception use of folic acid and race are independent. The data are:
      Observed Frequencies Table (use of folic acid)
      Race  | Yes | No  | Total
      White | 260 | 299 | 559
      Black | 15  | 41  | 56
      Other | 7   | 14  | 21
      Total | 282 | 354 | 636
      Expected Frequencies Table
      Race  | Yes                     | No                      | Total
      White | (282)(559)/636 = 247.86 | (354)(559)/636 = 311.14 | 559
      Black | (282)(56)/636 = 24.83   | (354)(56)/636 = 31.17   | 56
      Other | (282)(21)/636 = 9.31    | (354)(21)/636 = 11.69   | 21
  • 208. Calculations and Testing • Data: See the given table • Assumption: Simple random sample • Hypothesis: H0: race and use of folic acid are independent, HA: the two variables are not independent. Let α = 0.05 • The test statistic is the chi-square statistic given earlier • Distribution: when H0 is true, the chi-square distribution is valid with (r-1)(c-1) = (3-1)(2-1) = 2 d.f. • Decision Rule: Reject H0 if the value of $\chi^2$ is greater than $\chi^2_{\alpha,(r-1)(c-1)}$ = 5.991 • Calculations: $\chi^2 = \dfrac{(260 - 247.86)^2}{247.86} + \dfrac{(299 - 311.14)^2}{311.14} + \dots + \dfrac{(14 - 11.69)^2}{11.69} = 9.091$
  • 209. Conclusion • Statistical decision: We reject H0 since 9.091 > 5.991 • Conclusion: we conclude that H0 is false, and that there is a relationship between race and preconception use of folic acid. • P-value: Since 7.378 < 9.091 < 9.210, 0.01 < p < 0.025 • We therefore also reject the hypothesis at the 0.025 level of significance, but do not reject it at the 0.01 level.
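The expected frequencies and the chi-square statistic for the folic acid example can be verified with a short sketch. The function name `chi_square_independence` is an assumption for illustration. The p-value line uses the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution equals exp(-x/2); this shortcut does not apply to other degrees of freedom.

```python
from math import exp


def chi_square_independence(table):
    """Chi-square statistic, d.f. and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # e_ij = (row total)(column total) / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


# Rows: White, Black, Other; columns: folic acid use Yes, No
observed = [[260, 299], [15, 41], [7, 14]]
stat, df, expected = chi_square_independence(observed)

CRITICAL = 5.991            # tabulated chi-square value for alpha = 0.05, 2 d.f.
p_value = exp(-stat / 2)    # exact upper-tail probability only because df == 2
print(f"chi-square = {stat:.3f}, d.f. = {df}, p-value = {p_value:.4f}")
print("reject H0:", stat > CRITICAL)
```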