
Biostatistics for Nurses

Biostatistics is the application of statistics to biological topics. Descriptive statistics summarize sample data, while inferential statistics allow conclusions about populations from samples. The document defines variables, sampling, and statistical terms. It also describes common methods for representing statistical data graphically, including dot plots, bar graphs, line graphs, and histograms.

Uploaded by

Ammar Bhatti
Copyright © All Rights Reserved


Biostatistics for Nursing Students

University of Basra

Professor Dr. Mahfoudh F. Kinany



Chapter one

Definition of Biostatistics

What is Statistics?

Types of Variables

Types of Statistics

Descriptive statistics

Inferential statistics

Statistical Terms

Sampling

Methods or Types of Sampling

What is Biostatistics?
Biostatistics is the use of statistical data to study biological changes in humans, such as body functions and diseases, reproduction, growth, and death.
Biostatistics is the application of statistics to a variety of topics in biology.

What is Statistics?
Statistics is the study of how to collect, organize, analyze, and interpret information from numerical data.
A statistic is a descriptive measure calculated from sample data to serve as an estimate of an unknown population parameter.

Data analysis is the process of systematically examining data with the purpose of spotlighting useful information. Data analysis is the foundation of scientific research.

Data are pieces of information about individuals organized into variables.
• By an individual, we mean a particular person or object.
• By a variable, we mean a particular characteristic of the individual.
A dataset is a set of data identified with a particular experiment, scenario, or circumstance.

Types of Variables
A variable is a property that can take on many values.
A variable is a characteristic of interest that is measured or observed on each individual, such as age, length, gender, or blood group.
They are called variables because their values change from one respondent to another.

Figure 1.1 Types of variables

There are two kinds of variables: quantitative variables and qualitative (categorical) variables. Below we define these two main types of variables and provide further sub-classifications for each type.
1- Quantitative variables take numerical values and represent some kind of measurement.
Quantitative variables are variables which have numerical values, such as number of hours, height, weight, age, or time to recovery (days).

There are two further kinds of quantitative variables:

a) Discrete, when the variable takes on a countable number of values.
Discrete quantitative variables are mostly integers, or numbers used for counting, e.g. number of children, number of students, number of families, etc.
X = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, …

b) Continuous, when the variable can take on any value in some range of values. Continuous quantitative variables are obtained through a measuring process, e.g. the height of a person, exam results, etc.

Take temperature for example. Temperature can take on an infinite number of values, such as 80 degrees, or 80.01 degrees, or 80.0050592359 degrees. In the previous (discrete) example, we were limited to countable values.

2- Qualitative/categorical variables are variables which have no numerical value, such as gender, blood group, marital status, ethnicity, etc.

There are 2 types of qualitative variables:

a) Nominal: refers to the name of a category; it cannot be measured or counted and cannot be arranged in sequence. Usually it is represented by a code number, e.g. Gender: "1" = Male, "2" = Female.

Examples of Nominal Variables

• Gender (Male, Female, Transgender).
• Eye color (Blue, Green, Brown, Hazel).
• Type of house (Bungalow, Duplex, Ranch).
• Type of pet (Dog, Cat, Rodent, Fish, Bird).

b) Ordinal: variables that are non-numerical but whose categorical values can be arranged in some order, e.g. ranking of satisfaction level or perception level, Likert scale of degree of agreement, etc.
For example: first, second, third, or fourth.

Education level (Elementary, Secondary, Institute, College)

Example: Medical Records

Let's revisit the dataset showing medical records for a sample of patients:

            Gender   Age       Weight   Height   Smoking              Race
            (M/F)    (years)   (kg)     (cm)     (Yes = 1, No = 0)
Patient_1   F        36        81       168      1                    White
Patient_2   M        32        76       174      0                    Black
Patient_3   M        46        92       186      1                    Asian
...         ...      ...       ...      ...      ...                  ...
In our example of medical records, there are several variables of each type:

• Age, Weight, and Height are quantitative variables.
• Race, Gender, and Smoking are categorical variables.

Comments:
• Notice that the values of the categorical variable Smoking have been coded as the numbers 0 or 1.
It is quite common to code the values of a categorical variable as numbers, but you should remember that these are just codes.
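A minimal Python sketch of this point, using the three patients listed in the table. The record layout and the name `SMOKING_LABELS` are hypothetical, chosen only for illustration: arithmetic such as a mean is meaningful for a quantitative variable like Age, while coded categorical values are translated back to labels and counted instead of averaged.

```python
from collections import Counter

# Hypothetical code-to-label mapping for Smoking (1 = Yes, 0 = No)
SMOKING_LABELS = {1: "Yes", 0: "No"}

patients = [
    {"id": "Patient_1", "gender": "F", "age": 36, "smoking": 1},
    {"id": "Patient_2", "gender": "M", "age": 32, "smoking": 0},
    {"id": "Patient_3", "gender": "M", "age": 46, "smoking": 1},
]

# Quantitative variable: averaging the values makes sense
mean_age = sum(p["age"] for p in patients) / len(patients)

# Categorical variable: decode the codes, then count each category
smoking_counts = Counter(SMOKING_LABELS[p["smoking"]] for p in patients)

print(mean_age, dict(smoking_counts))
```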

Types of Statistics (Descriptive & Inferential)

Statistics is one of the most important parts of research today because it organizes data into measurable forms. However, some students confuse descriptive and inferential statistics, which makes it hard for them to select the best option for their research.
If you look closely, the difference between descriptive and inferential statistics is already apparent in their names. "Descriptive" statistics describe data, while "inferential" statistics allow the researcher to infer, or arrive at, a conclusion based on the collected information.

Figure 1.2 Types of statistics

Descriptive statistics

1. Descriptive statistics merely "describes" the data and does not allow for conclusions or predictions beyond the data described.

2. Descriptive statistics usually operates within a specific area that contains the entire target population.

Summary

Descriptive: used to organize and describe a sample.

Descriptive statistics: any of numerous calculations which attempt to provide a concise summary of the information content of data (for example, measures of central tendency, measures of dispersion, etc.).

Inferential statistics

1. Inferential statistics makes it possible for the researcher to arrive at a conclusion and predict changes that may occur regarding the area of concern.

2. Inferential statistics usually takes a sample of a population, especially if the population is too large to conduct research on.
Summary
Inferential: used to extrapolate from a sample to a larger population.
Inferential statistics involves methods of using information from a sample to draw conclusions about the population.

Statistical Terms
Population: the entire group of individuals we want information about; it can be huge, like "all women", or small, like "all third-year statistics students".

Sample: a part of the population that we actually examine in order to gather information.

Sample design: the method used to choose the sample from the population; poor sample designs can lead to misleading conclusions.

Parameter: a value that describes a population.

A measurement is a number or attribute computed for each member of a population or of a sample. The measurements of sample elements are collectively called the sample data.
Raw data, also known as primary data, are data (e.g., numbers, figures, etc.) collected from a source.

Sampling is the process of choosing a representative sample from a target population and collecting data from that sample in order to understand something about the population as a whole.

Methods or Types of Sampling

1- Probability sampling. With probability sampling, every element of the population has a known probability of being included in the sample.

2- Non-probability sampling. With non-probability sampling, we cannot specify the probability that each element will be included in the sample.
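Simple random sampling is one common probability sampling method; the notes do not name specific methods, so choosing it here is an assumption. In this sketch every one of N elements has the same known inclusion probability n/N. The population of 200 student ID numbers and the function name `simple_random_sample` are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; each element's inclusion probability is n / N."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

# Hypothetical population: ID numbers of 200 students
population = list(range(1, 201))
sample = simple_random_sample(population, 20, seed=1)
inclusion_probability = 20 / len(population)  # the same for every student

print(sorted(sample), inclusion_probability)
```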

Chapter two

Statistical data representation

Dot Plots

Bar Graph

Line Graph

Histogram and Frequency Polygon



Statistical data representation

Dot Plots
The dot plot is one of the simplest ways of graphically representing statistical data. As the name suggests, a dot plot uses dots. It is a graphic display that usually compares frequencies across different categories.

The dot plot is composed of dots plotted on graph paper.

A dot plot may look like:

Example 1: Draw a dot plot for the following data.

Favorite color       Red   Blue   Green   Yellow   Orange   Indigo   Violet
Number of students     9      7       5        3        2        1        3

Solution:
The dot plot for the data is given below:
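Where no graphing tool is at hand, a text-only dot plot can be sketched in Python. It is turned on its side (one row per category, one "o" per student) instead of stacking dots vertically; the function name `text_dot_plot` is hypothetical.

```python
def text_dot_plot(categories, counts, symbol="o"):
    """Return one text row per category, with one symbol per observation."""
    width = max(len(c) for c in categories)
    rows = []
    for category, count in zip(categories, counts):
        rows.append(f"{category:<{width}} | " + " ".join([symbol] * count))
    return rows

colors = ["Red", "Blue", "Green", "Yellow", "Orange", "Indigo", "Violet"]
students = [9, 7, 5, 3, 2, 1, 3]
for row in text_dot_plot(colors, students):
    print(row)
```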

Bar Graph
A bar graph is a very frequently used graph in statistics as well as in the media. A bar graph is a type of graph made of rectangular bars. The lengths of these bars should be proportional to the numerical values they represent. In a bar graph, the bars may be plotted either horizontally or vertically, but a vertical bar graph (also known as a column bar graph) is used more often than a horizontal one.

A vertical bar graph is shown below:

Number of students who went to different states to study:

The rectangular bars are separated by some distance in order to distinguish them from one another. The bar graph shows a comparison among the given categories.

Mostly, the horizontal axis of the graph represents specific categories and the vertical axis shows the discrete numerical values.

Example 2: Plot a bar graph from the data given below.

Student    A    B    C    D
Marks      8   14    9    5

Solution:
The following bar graph is obtained:
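A rough text version of this bar graph can be sketched in Python. The bars are drawn horizontally here (one "#" per mark), which is a simplification of the vertical graph described above; the function name `text_bar_graph` is hypothetical.

```python
def text_bar_graph(labels, values, unit=1):
    """Return horizontal text bars whose lengths are proportional to the values."""
    rows = []
    for label, value in zip(labels, values):
        bar = "#" * round(value / unit)  # one '#' per `unit` of the value
        rows.append(f"{label} | {bar} {value}")
    return rows

students = ["A", "B", "C", "D"]
marks = [8, 14, 9, 5]
for row in text_bar_graph(students, marks):
    print(row)
```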

Line Graph
A line graph represents data as a series of points connected by straight line segments. In a line graph, the data points are plotted on a graph and joined together with straight lines.

A sample line graph is illustrated in the following diagram:

Line graphs are used in science, statistics, and the media. Line graphs are very easy to create. They are quite popular compared with other graphs because they reveal data trends very clearly. A line graph gives a clear visual comparison between two variables, which are represented on the X-axis and Y-axis.

Example 3: Draw a histogram from the given data.

Test score   24-30   30-36   36-42   42-48   48-54   54-60
Frequency        5       6       8       5      10       4

Solution:
We drew the following histogram:

Histogram and Frequency Polygon

Histograms and frequency polygons are very common graphs in statistics. A histogram is a graphical representation of data grouped into mutually exclusive classes. A histogram is quite similar to a bar graph: both are made up of rectangular bars. The difference is that there is no gap between any two bars in a histogram. The histogram is used to represent continuous data.

A histogram may look like the following graph:

The frequency polygon is a type of graphical representation that gives us a better understanding of the shape of a given distribution. Frequency polygons serve almost the same purpose as histograms.

Let us have a look at a sample frequency polygon:
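The points of a frequency polygon are the class midpoints paired with their frequencies. This Python sketch computes them for the test-score table of Example 3. Closing the polygon with an empty class at each end, so that it touches the horizontal axis, is a common convention assumed here; the function name is hypothetical.

```python
def frequency_polygon_points(class_limits, frequencies, close_ends=True):
    """Return (midpoint, frequency) pairs for a frequency polygon."""
    midpoints = [(lo + hi) / 2 for lo, hi in class_limits]
    points = list(zip(midpoints, frequencies))
    if close_ends:
        width = class_limits[0][1] - class_limits[0][0]
        # Add zero-frequency classes at both ends so the polygon meets the x-axis
        points = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]
    return points

classes = [(24, 30), (30, 36), (36, 42), (42, 48), (48, 54), (54, 60)]
freqs = [5, 6, 8, 5, 10, 4]
pts = frequency_polygon_points(classes, freqs)
print(pts)
```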

Chapter Three

Measures of central tendency

The mean

The mode

The median

The geometric mean

The weighted mean



The harmonic mean

Measures of central tendency

1- The mean (x̄), often called the average, of a numerical set of data is simply the sum of the data values divided by the number of values. This is also referred to as the arithmetic mean. The mean is the balance point of a distribution.

A- Ungrouped data

$$\bar{x} = \frac{\text{sum of the values}}{\text{number of values}} = \frac{\sum x}{N}$$

where ∑x = the sum of the values and N = the number of values.

Example 1: Find the mean (average) of the following ungrouped data:

24, 25, 33, 50, 53, 66, 78

Solution: ∑x = 24 + 25 + 33 + 50 + 53 + 66 + 78 = 329

N = 7

Mean = ∑x / N = 329 / 7 = 47

Example 2: Find the mean of the data set (2, 4, 7, 10, 13, 16, 18).

Solution:

Mean = (2 + 4 + 7 + 10 + 13 + 16 + 18) / 7 = 70 / 7 = 10
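Both ungrouped examples can be checked in a few lines of Python, assuming Python's standard `statistics` module. The first list uses the values summed in the worked solution (including 33), and the manual line mirrors the formula ∑x / N.

```python
from statistics import mean

example1 = [24, 25, 33, 50, 53, 66, 78]
example2 = [2, 4, 7, 10, 13, 16, 18]

# Mean = sum of the values / number of values
manual_mean = sum(example1) / len(example1)
print(manual_mean, mean(example1), mean(example2))
```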

B- Grouped data
We use a modified formula when finding the mean of grouped data:

$$\bar{x} = \frac{\sum f x_i}{\sum f}$$

where f = the frequency, xi = the class mark (midpoint of the class), and ∑f = the sum of the frequencies.

Example 1: Find the arithmetic mean of the following data.

Class      Frequency
60 - 62     5
63 - 65    18
66 - 68    42
69 - 71    27
72 - 74     8

Solution:
Class      f     Class mark (xi)     f·xi
60 - 62     5    (60+62)/2 = 61       305
63 - 65    18    (63+65)/2 = 64      1152
66 - 68    42    (66+68)/2 = 67      2814
69 - 71    27    (69+71)/2 = 70      1890
72 - 74     8    (72+74)/2 = 73       584
          ∑f = 100                ∑f·xi = 6745

Mean = ∑f·xi / ∑f = 6745 / 100 = 67.45
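The grouped-data formula can be sketched in Python as follows: each class is replaced by its class mark, weighted by its frequency. The function name `grouped_mean` is hypothetical.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of grouped data: sum(f * xi) / sum(f), where xi is the class mark."""
    class_marks = [(lo + hi) / 2 for lo, hi in class_limits]
    total_fx = sum(f * x for f, x in zip(frequencies, class_marks))
    return total_fx / sum(frequencies)

classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]
freqs = [5, 18, 42, 27, 8]
print(grouped_mean(classes, freqs))
```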

2- The mode of a set of data values is the value that appears most often.

A – Ungrouped data
Example 1: Find the mode of the following data:
5, 9, 7, 4, 6, 8, 2, 4, 1, 3, 5, 1, 4, 6, 9, 8, 7, 5, 2, 4, 1
Solution:
Ordered: 1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9

Mode = 4 (it appears four times)
Note:
- There is no mode when all the values are different (or when every value appears the same number of times).
- Sometimes there is more than one mode.
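Python's `statistics.multimode` (Python 3.8 or later) returns every value that has the highest count, which also covers the "more than one mode" case from the note.

```python
from statistics import multimode

data = [5, 9, 7, 4, 6, 8, 2, 4, 1, 3, 5, 1, 4, 6, 9, 8, 7, 5, 2, 4, 1]
# multimode lists every value that shares the highest frequency
print(multimode(data))
```

When every value appears equally often, `multimode` returns all of them (for example `multimode([1, 2, 3])` gives `[1, 2, 3]`); under the note above, that situation is read as having no mode.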

B – Grouped data

The mode for grouped data: you can calculate the mode from a grouped frequency table using the following formula:

$$\text{Mode} = L + \frac{f_m - f_1}{(f_m - f_1) + (f_m - f_2)} \times h$$

Where:
Modal class: the class with the highest frequency.

• L: the lower class boundary of the modal class.
• fm: the frequency of the modal class.
• f1: the frequency of the class before the modal class.
• f2: the frequency of the class after the modal class.
• h: the length (width) of the modal class.

Ex 1: Calculate the mode from the following data.

Class      Frequency
5-25          12
25-45          8
45-65         14
65-85         20
85-105         6

Solution:

The modal class is the class with the highest frequency.

The highest frequency = 20, so the modal class = (65 - 85)

L = 65, h = 20, fm = 20, f1 = 14, f2 = 6

Mode = 65 + [(20 − 14) / ((20 − 14) + (20 − 6))] × 20

Mode = 65 + [6 / (6 + 14)] × 20

Mode = 65 + (6 / 20) × 20

Mode = 65 + 0.3 × 20

Mode = 65 + 6

Mode = 71
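A Python sketch of the grouped-mode formula, taking class boundaries as (lower, upper) pairs. The function name is hypothetical, and treating a missing neighbouring class (when the modal class is first or last) as frequency 0 is an assumption the notes do not address.

```python
def grouped_mode(boundaries, frequencies):
    """Mode = L + (fm - f1) / ((fm - f1) + (fm - f2)) * h."""
    m = frequencies.index(max(frequencies))            # index of the modal class
    lower, upper = boundaries[m]
    fm = frequencies[m]
    f1 = frequencies[m - 1] if m > 0 else 0            # class before (assumed 0 if none)
    f2 = frequencies[m + 1] if m + 1 < len(frequencies) else 0  # class after
    h = upper - lower
    return lower + (fm - f1) / ((fm - f1) + (fm - f2)) * h

classes = [(5, 25), (25, 45), (45, 65), (65, 85), (85, 105)]
freqs = [12, 8, 14, 20, 6]
print(grouped_mode(classes, freqs))
```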

3- Median: the middle value of an ordered set of data.

A- Ungrouped data
Example 1 (odd number of values): Find the median of the following data:
13, 24, 8, 11, 3, 12, 11, 46, 35
Solution:
Ordered: 3, 8, 11, 11, 12, 13, 24, 35, 46
Position of median = (n + 1) / 2
Position of median = (9 + 1) / 2 = 5
Median = 12

Example 2 (even number of values): Find the median of the following data: 3, 24, 35, 8, 11, 13, 46, 11, 12, 48
Solution:
Ordered: 3, 8, 11, 11, 12, 13, 24, 35, 46, 48
With n = 10 values, the median is the average of the two middle (5th and 6th) values:
Median = (12 + 13) / 2 = 12.5
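Both cases can be checked with `statistics.median`, which sorts the data and, for an even number of values, averages the two middle ones.

```python
from statistics import median

odd_data = [13, 24, 8, 11, 3, 12, 11, 46, 35]
even_data = [3, 24, 35, 8, 11, 13, 46, 11, 12, 48]
# median() sorts internally; with an even count it averages the two middle values
print(median(odd_data), median(even_data))
```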

B- The median for grouped data

You can calculate the median from a grouped frequency table using the following formula:

Median = L + [(N/2 – C) / f] × h
Where:

Median class: locate N/2 in the cumulative frequency column. The class in which it lies is called the median class.

• L: the lower class boundary of the median class.
• f: the frequency of the median class.
• N: the sum of the frequencies.
• C: the cumulative frequency up to the class before the median class.
• h: the length (width) of the median class.

Ex 1: Calculate the median of the following data.

Class      Frequency
5-9            3
10-14          5
15-19          8
20-24         10
25-29         18
30-34         17
35-39         11
40-44          9
45-49          7

Solution:

Class      Frequency   Cumulative frequency
5-9            3             3
10-14          5             8
15-19          8            16
20-24         10            26
25-29         18            44
30-34         17            61
35-39         11            72
40-44          9            81
45-49          7            88

The median class is the first class whose cumulative frequency reaches N/2.

N/2 = 88 / 2 = 44, so the median class is (25 – 29).

Because the classes have gaps (20-24 ends at 24 and 25-29 starts at 25), the lower class boundary of the median class is L = 24.5; also C = 26, f = 18, h = 5.

Median = L + [(N/2 – C) / f] × h
Median = 24.5 + [(88 / 2 – 26) / 18] × 5
Median = 24.5 + [(44 – 26) / 18] × 5
Median = 24.5 + [18 / 18] × 5
Median = 24.5 + [1] × 5, Median = 24.5 + 5, Median = 29.5
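A Python sketch of the grouped-median formula. It follows the "lower class boundary" definition, so for integer class limits separated by gaps of 1 the boundary of class 25-29 is taken as 24.5 (not the class limit 25), which gives a median of 29.5. The function name and the `gap` parameter are hypothetical.

```python
from itertools import accumulate

def grouped_median(class_limits, frequencies, gap=1):
    """Median = L + ((N/2 - C) / f) * h, with L = lower limit - gap/2."""
    n = sum(frequencies)
    cumulative = list(accumulate(frequencies))
    # Median class: first class whose cumulative frequency reaches N/2
    k = next(i for i, cf in enumerate(cumulative) if cf >= n / 2)
    lower, upper = class_limits[k]
    L = lower - gap / 2              # lower class boundary, e.g. 25 -> 24.5
    h = upper - lower + gap          # class width, e.g. 29.5 - 24.5 = 5
    C = cumulative[k - 1] if k > 0 else 0
    f = frequencies[k]
    return L + (n / 2 - C) / f * h

classes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29),
           (30, 34), (35, 39), (40, 44), (45, 49)]
freqs = [3, 5, 8, 10, 18, 17, 11, 9, 7]
print(grouped_median(classes, freqs))
```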

The geometric mean is a type of average, usually used for growth rates, like population growth.

Definition: For n numbers, multiply them all together and then take the nth root. More formally, the geometric mean of n numbers a1 to an is:

$$GM = \sqrt[n]{a_1 \times a_2 \times \cdots \times a_n}$$

Example 1: What is the geometric mean of 10, 51.2 and 8?

• First we multiply them: 10 × 51.2 × 8 = 4096
• Then (as there are three numbers) take the cube root: $\sqrt[3]{4096} = 16$

GM = $\sqrt[3]{10 \times 51.2 \times 8} = 16$

Example 2: What is the geometric mean of 1, 3, 9, 27 and 81?

• First we multiply them: 1 × 3 × 9 × 27 × 81 = 59049
• Then (as there are 5 numbers) take the 5th root: $\sqrt[5]{59049} = 9$

GM = $\sqrt[5]{1 \times 3 \times 9 \times 27 \times 81} = 9$
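Both examples can be checked in Python: once by following the definition literally (multiply, then take the nth root) and once with `statistics.geometric_mean` (Python 3.8 or later), which requires positive values. The helper name `nth_root_of_product` is hypothetical.

```python
import math
from statistics import geometric_mean

def nth_root_of_product(values):
    """Geometric mean by definition: multiply all values, then take the nth root."""
    return math.prod(values) ** (1 / len(values))

print(nth_root_of_product([10, 51.2, 8]))   # cube root of 4096
print(geometric_mean([1, 3, 9, 27, 81]))    # 5th root of 59049
```

Floating-point roots can come out a hair away from the exact whole number (for example 15.999999999999998), so round the result when presenting it.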

A weighted mean is a kind of average. Instead of each data point contributing equally to the final mean, some data points contribute more "weight" than others. If all the weights are equal, the weighted mean equals the arithmetic mean.

Formula of the weighted mean

The weighted mean of a given set of non-negative data x1, x2, x3, …, xn with non-negative weights w1, w2, w3, …, wn is given by the formula:

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$

EX: A candidate obtained the following percentages of marks: English 70, Math 90, Stat 75, Chemistry 88 and Physics 79, and the weights are 1, 2, 2, 3, 3. Find the weighted average.

x̄ = (1×70 + 2×90 + 2×75 + 3×88 + 3×79) / (1 + 2 + 2 + 3 + 3)

x̄ = (70 + 180 + 150 + 264 + 237) / 11

x̄ = 901 / 11

x̄ = 81.91

EX: The following table shows the prices per 100 grams of coffee of different brands. Using quantities as weights, find the weighted average.

Brand             A      B      C      D      E      F
Price ($)       2.5      3    3.5      4    4.5      5
Quantity (kg)    10      8      3      5      7      2
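A Python sketch of the weighted-mean formula applied to both examples. The function name `weighted_mean` is hypothetical. The coffee example has no worked solution in the notes; with quantities as weights the calculation gives 121 / 35, about 3.46 dollars per 100 grams.

```python
def weighted_mean(values, weights):
    """Weighted mean = sum(w * x) / sum(w); the weights must not all be zero."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 1: marks weighted by subject
marks = [70, 90, 75, 88, 79]
subject_weights = [1, 2, 2, 3, 3]
print(round(weighted_mean(marks, subject_weights), 2))

# Example 2: coffee prices weighted by quantity (kg)
prices = [2.5, 3, 3.5, 4, 4.5, 5]
quantities = [10, 8, 3, 5, 7, 2]
print(round(weighted_mean(prices, quantities), 2))
```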

The harmonic mean (sometimes called the subcontrary mean) is one of several kinds of average, and in particular one of the Pythagorean means. Typically, it is appropriate when the average of rates is desired.

EX: What is the harmonic mean of 1, 5, 8, 10?

Solution:

1- Add the reciprocals of the numbers in the set: 1/1 + 1/5 + 1/8 + 1/10 = 1.425

2- Divide the number of items in the set by your answer to Step 1. There are 4 items in the set, so:

4 / 1.425 = 2.80702

EX: Calculate the harmonic mean of the numbers 13.2, 14.2, 14.8, 15.2 and 16.1

Solution: the harmonic mean is calculated as below:

x        1/x
13.2     0.0758
14.2     0.0704
14.8     0.0676
15.2     0.0658
16.1     0.0621
Total    ∑(1/x) = 0.3417

$$H = \frac{n}{\sum (1/x)} = \frac{5}{0.3417} = 14.632$$
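A Python sketch of the two steps (sum the reciprocals, then divide n by that sum), alongside `statistics.harmonic_mean`. The helper name `harmonic_mean_steps` is hypothetical.

```python
from statistics import harmonic_mean

def harmonic_mean_steps(values):
    """Harmonic mean = n / sum of the reciprocals of the values."""
    reciprocal_sum = sum(1 / x for x in values)   # Step 1
    return len(values) / reciprocal_sum           # Step 2

print(harmonic_mean_steps([1, 5, 8, 10]))
print(harmonic_mean([13.2, 14.2, 14.8, 15.2, 16.1]))
```

The hand calculation rounds each reciprocal to four decimals and gets 14.632; with unrounded reciprocals the result is about 14.635.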

Chapter four

Measures of dispersion (variability)

The range

The standard deviation (S)



The Variance (dispersion) (S2)

Standard error (SE)

Coefficient of Variation (CV)

Measures of dispersion (variability)


Dispersion in statistics (also called variability) is a way of describing how spread out a set of data is.

1- The range:

The range of a set of data is the difference between the largest and smallest values.

Range = (x max − x min)

Example 1: In {4, 6, 9, 3, 7} the lowest value is 3, and the highest is 9.

Range = (x max − x min), so the range is 9 − 3 = 6.

Example 2: Find the range of the following data set:

23 56 45 65 59 55 62 54 85 25

Range = (x max − x min)

The maximum value is 85 and the minimum value is 23.

So the range = 85 − 23

Range = 62
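The range formula translates directly into Python; the function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)

print(data_range([4, 6, 9, 3, 7]))
print(data_range([23, 56, 45, 65, 59, 55, 62, 54, 85, 25]))
```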

2- The standard deviation (S):

The standard deviation is a measure used to quantify the amount of variation or dispersion of a set of data values. Its symbol is (S).

Method I: deviations method

$$S = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

n = the number of values

x̄ = the mean of the values

x = each value in the sample

Example: Compute the standard deviation of the following scores:

xi = 17, 20, 23, 18, 19, 21, 22

Solution:
x̄ = ∑x / n
x̄ = (17 + 20 + 23 + 18 + 19 + 21 + 22) / 7 = 140 / 7 = 20

x      x − x̄           (x − x̄)²
17     17 − 20 = −3     9
20     20 − 20 = 0      0
23     23 − 20 = 3      9
18     18 − 20 = −2     4
19     19 − 20 = −1     1
21     21 − 20 = 1      1
22     22 − 20 = 2      4
∑x = 140    ∑(x − x̄) = 0    ∑(x − x̄)² = 28

S = √[28 / (7 − 1)] = √(28 / 6) = √4.667, then S = 2.16

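Method II: squares method

$$S = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$$

Example: Calculate the standard deviation of the following sample data using the squares method: 2, 4, 8, 6, 10, and 12.
Solution:
x      x²
2      4
4      16
8      64
6      36
10     100
12     144
∑x = 42    ∑x² = 364

S = √[(364 − 42²/6) / (6 − 1)]

S = √[(364 − 1764/6) / 5]

S = √[(364 − 294) / 5]

S = √(70 / 5) = √14, then S = 3.741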
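Both methods can be written directly in Python and compared with `statistics.stdev`, which also divides by n − 1 (the sample standard deviation). The function names `sd_deviations` and `sd_squares` are hypothetical.

```python
import math
from statistics import stdev

def sd_deviations(xs):
    """Method I: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(xs)
    m = sum(xs) / n
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))

def sd_squares(xs):
    """Method II: sqrt((sum(x^2) - (sum(x))^2 / n) / (n - 1))."""
    n = len(xs)
    return math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))

scores = [17, 20, 23, 18, 19, 21, 22]
sample = [2, 4, 8, 6, 10, 12]
print(round(sd_deviations(scores), 2), round(sd_squares(sample), 3))
print(round(stdev(scores), 2), round(stdev(sample), 3))  # library cross-check
```

The two methods are algebraically equivalent; the squares method avoids computing each deviation but can lose precision on large values in floating point.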

3- Variance (dispersion) (S²):

Variance measures how far a set of (random) numbers is spread out from its average value. Variance has a central role in statistics, where ideas that use it include descriptive statistics, statistical inference, hypothesis testing, and goodness of fit. Its symbol is (S²).

Formula:

1- $$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

2- $$S^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$

Example 1: Find the variance (S²) and standard deviation (S) from the table.

Animal   Metabolic rate
1          727.7
2         1086.5
3         1091.0
4         1361.3
5         1490.5
6         1956.1

Solution:

Mean = x̄ = ∑x ÷ n
x̄ = (727.7 + 1086.5 + 1091.0 + 1361.3 + 1490.5 + 1956.1) ÷ 6

x̄ = 1285.5

Animal   Metabolic rate   Mean     Difference from mean   Squared difference from mean
1          727.7          1285.5        −557.8                 311140.84
2         1086.5          1285.5        −199.0                  39601.00
3         1091.0          1285.5        −194.5                  37830.25
4         1361.3          1285.5          75.8                   5745.64
5         1490.5          1285.5         205.0                  42025.00
6         1956.1          1285.5         670.6                 449704.36
                                    ∑(x − x̄) ≈ 0           ∑(x − x̄)² = 886047.09

S² = ∑(x − x̄)² / (n − 1)

S² = 886047.09 / 5

Then variance S² = 177209.418

S = √S²

S = √177209.418

Then standard deviation S = 420.96
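The metabolic-rate example can be cross-checked with Python's `statistics` module; `variance` and `stdev` both divide by n − 1, like the formulas above. With unrounded arithmetic the data give S² ≈ 177209.42 and S ≈ 420.96.

```python
from statistics import mean, variance, stdev

metabolic_rate = [727.7, 1086.5, 1091.0, 1361.3, 1490.5, 1956.1]

x_bar = mean(metabolic_rate)
s2 = variance(metabolic_rate)   # sample variance, divides by n - 1
s = stdev(metabolic_rate)       # square root of the sample variance
print(round(x_bar, 1), round(s2, 2), round(s, 2))
```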

4- The standard error (SE):

The standard error is a measure of the variability of a statistic. It is an estimate of the standard deviation of a sampling distribution.

Formula:

$$SE = \frac{S}{\sqrt{n}}$$

5- Coefficient of Variation (CV)

The coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation to the mean (or its absolute value).

Formula:

$$CV = \frac{S}{\bar{x}} \times 100\%$$

Example 1: Calculate the standard error and coefficient of variation for the following data.

14, 8, 11, 12, 16, 10

Solution:

x̄ = ∑x / n = 71 / 6 = 11.833

S = √[∑(x − x̄)² / (n − 1)]

x     x̄        x − x̄     (x − x̄)²
14    11.83     2.17      4.71
8     11.83    −3.83     14.67
11    11.83    −0.83      0.69
12    11.83     0.17      0.029
16    11.83     4.17     17.39
10    11.83    −1.83      3.35
                      ∑(x − x̄)² = 40.839

S = √(40.839 / 5), S = √8.168, S ≈ 2.858

SE = S / √n

SE = 2.858 / √6 = 2.858 / 2.449, SE ≈ 1.167

CV = (S / x̄) × 100%

CV = (2.858 / 11.833) × 100%

CV ≈ 0.2415 × 100%, CV ≈ 24.15%

Example 2: Find the SE of a sample distribution whose standard deviation is 38 and whose sample size is 45.

Solution: Given that S = 38 and n = 45

SE = S / √n

SE = 38 / √45 = 38 / 6.708, SE ≈ 5.66

Example 3: Find the CV of a sample distribution whose standard deviation is 22.6 and whose mean is 140.8.

Solution: Given that S = 22.6 and x̄ = 140.8

CV = (S / x̄) × 100%

CV = (22.6 / 140.8) × 100%, CV = 0.1605 × 100%

So CV = 16.05%
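The SE and CV formulas for all three examples can be sketched in Python as follows; the function names are hypothetical. Unrounded, Example 1 gives SE = 7/6 ≈ 1.167 and CV ≈ 24.15%.

```python
import math
from statistics import mean, stdev

def standard_error(s, n):
    """SE = S / sqrt(n)."""
    return s / math.sqrt(n)

def coefficient_of_variation(s, x_bar):
    """CV = S / |mean| * 100 (in percent)."""
    return s / abs(x_bar) * 100

data = [14, 8, 11, 12, 16, 10]
s = stdev(data)  # sample standard deviation (divides by n - 1)
print(round(standard_error(s, len(data)), 3),
      round(coefficient_of_variation(s, mean(data)), 2))   # Example 1
print(round(standard_error(38, 45), 2))                     # Example 2
print(round(coefficient_of_variation(22.6, 140.8), 2))      # Example 3
```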

Chapter Five

Probability Distribution definition

Types of probability distribution:

Discrete probability distribution

Continuous probability distribution



Normal distribution

Probability definition:

Single Event probability

Sample Space and Events

Multiple-event probability

Probability Distribution definition:

A variable can take two types of values: either fixed, separate numbers, known as discrete values, or any value within a specified range, known as continuous values.

So if a random variable is a discrete variable, its probability distribution is called a discrete probability distribution; otherwise, it is called a continuous probability distribution.

If a variable can take on any value between two specified values, it is called a continuous variable; otherwise, it is called a discrete variable.

Some examples will clarify the difference between discrete and continuous variables.

• Suppose the nursing department requires that all nurses weigh between 60 and 90 kg. The weight of nurses would be an example of a continuous variable, since a nurse's weight could take on any value between 60 and 90 kg.

• Suppose we flip a coin and count the number of heads. The number of heads could be any integer value between 0 and plus infinity. We could not, for example, get 2.5 heads. Therefore, the number of heads must be a discrete variable.

Just like variables, probability distributions can be classified as discrete or continuous.

Discrete Probability Distributions: if a random variable is a discrete variable, its probability distribution is called a discrete probability distribution.

An example will make this clear. Suppose you flip a coin two times. This
simple statistical experiment can have four possible outcomes: HH, HT, TH,
and TT. Now, let the random variable X represent the number of Heads
that result from this experiment. The random variable X can only take on
the values 0, 1, or 2, so it is a discrete random variable.

The probability distribution for this statistical experiment appears below.

Number of heads Probability

0 0.25
1 0.50
2 0.25

The above table represents a discrete probability distribution because it relates each value of a discrete random variable with its probability of occurrence.
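The two-flip table can be reproduced in Python by listing the four equally likely outcomes, counting heads in each, and turning the counts into exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All equally likely outcomes of flipping a coin twice: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# X = number of heads in each outcome
counts = Counter(outcome.count("H") for outcome in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, float(p))
```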

Normal distribution

The normal distribution is the most widely known and used of all
distributions. Because the normal distribution approximates many natural
phenomena so well, it has developed into a standard of reference for many
probability problems.

The normal distribution refers to a family of continuous probability distributions described by the normal equation.

Characteristics of the Normal distribution

• 1- Symmetric, bell shaped.
• 2- Continuous for all values of X between −∞ and ∞, so that each conceivable interval of real numbers has a probability other than zero.
• 3- X values take −∞ ≤ X ≤ ∞.
• 4- Two parameters, mean (μ) and standard deviation (σ). Note that the normal distribution is actually a family of distributions, where μ and σ determine the shape of the distribution.
• 5- The rule for a normal density function is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where x = random variable, μ = the mean, σ = the standard deviation, π ≈ 3.14, e ≈ 2.718.

• 6- About 2/3 of all cases fall within one standard deviation of the mean, that is P(μ − σ ≤ X ≤ μ + σ) = 0.6826
• 7- About 95% of cases lie within 2 standard deviations of the mean, that is P(μ − 2σ ≤ X ≤ μ + 2σ) = 0.9544
• 8- About 99.7% of cases lie within 3 standard deviations of the mean, that is P(μ − 3σ ≤ X ≤ μ + 3σ) = 0.9973
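The density function and the "within k standard deviations" probabilities can be sketched in Python. For any normal distribution, P(μ − kσ ≤ X ≤ μ + kσ) equals erf(k/√2), so the result does not depend on μ or σ. The function names are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_within_k_sd(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within_k_sd(k), 4))
```

Computed this way the three probabilities round to 0.6827, 0.9545 and 0.9973; the slightly smaller 0.6826 and 0.9544 come from rounded table values.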

Probability definition:

Probability measures how likely an event is to occur out of all possible outcomes. A probability always lies between 0 and 1.

Single Event probability definition:

Single-event probability is the probability of one event occurring in an experiment. For example, when tossing a coin, we get a single event (either head or tail) as the result.

Sample Space and Events

• Sample space: The set of all possible outcomes.

1. Roll a die: {1, 2, 3, 4, 5, 6}.

2. Flip a coin twice: {(H,H), (H, T), (T,H), (T,T)}.

• Event: A subset of the sample space.

1. Roll a die: the outcome is even {2, 4, 6}.

2. Flip a coin twice and the two results are different: {(H, T), (T,H)}.

Formula:

Probability that event A occurs: P(A) = n(A) / n(S).

where
n(A) = number of outcomes in which event A occurs
n(S) = number of possible outcomes

Probability that event A does not occur: P(A') = 1 − P(A).

where
P(A) is the probability of the single event A.

Example:

When a die is rolled, there are six possible outcomes: 1, 2, 3, 4, 5 and 6. Calculate the probability of getting an even number.

Solution:

n(A) = 3, the number of even outcomes (2, 4, 6).

n(S) = 6, the total number of outcomes.

Probability that event A occurs = P(A) = n(A) / n(S) = 3 / 6 = 0.5.

Probability that event A does not occur = P(A') = 1 − P(A) = 1 − 0.5 = 0.5.
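The formulas P(A) = n(A) / n(S) and P(A') = 1 − P(A) can be sketched in Python by building the sample space and the event as sets; `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}                      # S: one roll of a die
event_even = {x for x in sample_space if x % 2 == 0}   # A = {2, 4, 6}

p_a = Fraction(len(event_even), len(sample_space))     # P(A) = n(A) / n(S)
p_not_a = 1 - p_a                                      # P(A') = 1 - P(A)
print(float(p_a), float(p_not_a))
```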

Multiple-event probability definition:

Multiple-event probability is used to find the probability of several events in one experiment. For example, when tossing a coin twice, we may get a head the first time and a tail the second time. The result of the first toss does not affect the result of the second, so these events are said to be independent events.

Formula:

Probability that event A occurs: P(A) = n(A) / n(S).

where
n(A) = number of outcomes in which A occurs
n(S) = number of possible outcomes

Probability that event B occurs: P(B) = n(B) / n(S).

where
n(B) = number of outcomes in which B occurs
n(S) = number of possible outcomes

Probability that event A does not occur: P(A') = 1 − P(A).

Probability that event B does not occur: P(B') = 1 − P(B).

Probability that both events occur (for independent events): P(A ∩ B) = P(A) × P(B).

Probability that either event occurs: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Conditional probability: P(A | B) = P(A ∩ B) / P(B).

Example: A die is thrown twice. Let A be the event of getting an odd number on the first throw and B the event of getting an even number on the second throw. Calculate the probabilities of these events.
Solution:

n(A) = occurrences of odd numbers = 3,
n(B) = occurrences of even numbers = 3,
n(S) = size of the sample space for one throw = 6.

P(A) = n(A) / n(S) = 3 / 6 = 0.5.
Probability that event A occurs = 0.5.

P(B) = n(B) / n(S) = 3 / 6 = 0.5.
Probability that event B occurs = 0.5.

P(A') = 1 − P(A) = 1 − 0.5 = 0.5.
Probability that event A does not occur = 0.5.

P(B') = 1 − P(B) = 1 − 0.5 = 0.5.
Probability that event B does not occur = 0.5.

P(A ∩ B) = P(A) × P(B) = 0.5 × 0.5 = 0.25.
Probability that both events occur = 0.25.

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.5 + 0.5 − 0.25 = 0.75.
Probability that either event occurs = 0.75.

P(A | B) = P(A ∩ B) / P(B) = 0.25 / 0.5 = 0.5.
Conditional probability of A given B = 0.5.
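The die example can be checked by enumerating all 36 ordered outcomes of two throws and counting. This sketch assumes the reading that A is "odd on the first throw" and B is "even on the second throw", which is the reading that yields the results above; the helper name `p` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space for throwing a die twice: 36 equally likely ordered pairs
space = list(product(range(1, 7), repeat=2))

A = {o for o in space if o[0] % 2 == 1}   # first throw is odd
B = {o for o in space if o[1] % 2 == 0}   # second throw is even

def p(event):
    """Classical probability: favourable outcomes / all outcomes."""
    return Fraction(len(event), len(space))

p_a_and_b = p(A & B)
p_a_or_b = p(A | B)
p_a_given_b = p_a_and_b / p(B)
print(p(A), p(B), p_a_and_b, p_a_or_b, p_a_given_b)
```

Counting directly confirms that P(A ∩ B) equals P(A) × P(B) here, which is exactly what independence of the two throws means.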

Chapter Six

Hypothesis testing
Types of hypothesis:
T-Test: comparison of arithmetic means
Independent one-sample T-Test

Dependent T-test for paired samples

Independent two-sample T-test



Hypothesis testing:

Hypothesis testing is a scientific process to examine whether a hypothesis is plausible or not. The null hypothesis usually assumes no relationship (or no difference) between variables.

Types of hypothesis:

1- Null hypothesis (H0): the means of the groups are not significantly different. Its form:

H0: μ1 = μ2 = μ3 = μ4 = … = μn

Ex: There is no association between eye color and eyesight.

2- Alternative hypothesis (H1): the means of the groups are significantly different, that is, at least one mean differs from the others. For two groups its form is:

H1: μ1 ≠ μ2

Ex: There is an association between eye color and eyesight.

T-Test: comparison of arithmetic means

1- Independent one-sample T-test:

Used to compare the mean of a sample with the mean of the population (community).

Formula:

$$T = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

The degrees of freedom used in this test are n − 1.

Where: x̄ = the sample arithmetic mean, μ = the population arithmetic mean,

s = the sample standard deviation, n = the sample size

Example: The average intelligence score of College of Science students over the previous five years is 145. Compare this with the College of Nursing in 2011, where a sample of ten students had intelligence scores of (150, 170, 140, 110, 160, 120, 170, 140, 110, 130), and then interpret the result statistically.

Solution

T = (x̄ − μ) / (s / √n)

x      x²
150    22500
170    28900
140    19600
110    12100
160    25600
120    14400
170    28900
140    19600
110    12100
130    16900
∑x = 1400    ∑x² = 200600

x̄ = 1400 / 10 = 140

S = √S²

S² = [∑x² − (∑x)²/n] / (n − 1)

S² = [200600 − (1400)²/10] / (10 − 1)

S² = (200600 − 1960000/10) / 9, S² = (200600 − 196000) / 9

S² = 4600 / 9

S² = 511.11

Since S = √S²

Then S = √511.11, S = 22.6

T = (x̄ − μ) / (S / √n)

Since μ = 145

T = (140 − 145) / (22.6 / √10)

T = −5 / 7.15, T = −0.7, T-calculated (absolute value) = 0.7

(df) used in this test is n − 1, df = 9

T-tabular at 0.01 = 3.25

Since T-calculated = 0.7 < T-tabular = 3.25

So the difference between the arithmetic means is not significant.
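A Python sketch of the one-sample T computation. The standard library has no t-distribution, so the critical value is hard-coded from the text (3.25 for df = 9, which is the two-tailed 0.01 value in standard t tables); the function name `one_sample_t` is hypothetical.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu):
    """T = (x_bar - mu) / (s / sqrt(n)); returns (T, df) with df = n - 1."""
    n = len(sample)
    return (mean(sample) - mu) / (stdev(sample) / math.sqrt(n)), n - 1

scores = [150, 170, 140, 110, 160, 120, 170, 140, 110, 130]
t, df = one_sample_t(scores, mu=145)
T_TABLE = 3.25  # critical value for df = 9 at 0.01, as used in the text
print(round(t, 2), df, "significant" if abs(t) > T_TABLE else "not significant")
```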

2- Dependent t-test for paired samples:

This design is widely used in scientific research to measure how the same subjects change before and after an intervention.

Formula:

$$T = \frac{\bar{x}_1 - \bar{x}_2}{S_d / \sqrt{n}}$$

The degrees of freedom used in this test are n − 1.

Where Sd is the standard deviation of the differences between paired values, and x̄1 − x̄2 is the difference between the arithmetic means.

Ex: Five patients have heart rates (x1 = 100, 120, 115, 110, 130); after care they become (x2 = 90, 100, 90, 80, 95). Calculate the T value, and then interpret the result statistically.

Solution:

$$T = \frac{\bar{x}_1 - \bar{x}_2}{S_d/\sqrt{n}}$$

X1     X2     d     d²
100    90     10    100
120    100    20    400
115    90     25    625
110    80     30    900
130    95     35    1225
∑x1 = 575    ∑x2 = 455    ∑d = 120    ∑d² = 3250

x̄1 = 575 ÷ 5 = 115    x̄2 = 455 ÷ 5 = 91

Sd = √Sd²

$$S_d^2 = \frac{\sum d^2 - (\sum d)^2/n}{n - 1} = \frac{3250 - (120)^2/5}{5 - 1} = \frac{3250 - 2880}{4} = 92.5$$

Sd = √92.5 = 9.62

$$T = \frac{115 - 91}{9.62/\sqrt{5}} = \frac{24}{4.30} = 5.58$$

T-calculated = 5.58

The degrees of freedom (df) used in this test are n − 1, so df = 4.

T-tabular at the 0.01 level (two-tailed) = 4.60

Since T-calculated = 5.58 > T-tabular = 4.60, the difference between the means before and after care is significant.
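The paired test works on the differences d = x1 − x2 only. A minimal Python sketch under the same formulas follows; the helper `paired_t` is hypothetical, and the critical value 4.604 is copied from the t table (df = 4, two tails, 0.01 level).

```python
import math


def paired_t(before, after):
    """Return (t, df) for a paired t-test; hypothetical helper for this example."""
    d = [b - a for b, a in zip(before, after)]  # differences d = x1 - x2
    n = len(d)
    d_mean = sum(d) / n  # equals mean(x1) - mean(x2)
    # Variance of the differences with the same computational formula:
    sd2 = (sum(x * x for x in d) - sum(d) ** 2 / n) / (n - 1)
    t = d_mean / (math.sqrt(sd2) / math.sqrt(n))
    return t, n - 1


before = [100, 120, 115, 110, 130]  # heart rate before care (x1)
after = [90, 100, 90, 80, 95]       # heart rate after care (x2)
t, df = paired_t(before, after)

# Two-tailed critical value for df = 4 at the 0.01 level, from the t table.
T_TABULAR = 4.604

print(f"t = {t:.2f}, df = {df}")  # t = 5.58, df = 4
print("significant" if abs(t) > T_TABULAR else "not significant")  # significant
```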

3- Independent two-sample t-test

This form is used to compare the means of two different (independent) samples.

Formula:

$$T = \frac{\bar{x} - \bar{y}}{\sqrt{S_x^2/n_1 + S_y^2/n_2}}$$

The degrees of freedom used in this test are n1 + n2 − 2.

Where (x̄) is the arithmetic mean of the first sample, (ȳ) is the arithmetic mean of the second sample, (S²x) is the variance of the first sample, (S²y) is the variance of the second sample, and (n1) and (n2) are the two sample sizes.

Ex: Use a t-test to compare the degree of pain between the two groups of patients described below, then interpret the result statistically.

Government hospital patients = (8, 6, 5, 4, 7, 6, 6, 5)

Private hospital patients = (9, 7, 5, 4, 7, 6, 8, 4)

Solution:

$$T = \frac{\bar{x} - \bar{y}}{\sqrt{S_x^2/n_1 + S_y^2/n_2}}$$

x    x²    y    y²
8    64    9    81
6    36    7    49
5    25    5    25
4    16    4    16
7    49    7    49
6    36    6    36
6    36    8    64
5    25    4    16
∑x = 47    ∑x² = 287    ∑y = 50    ∑y² = 336

x̄ = 47 ÷ 8 = 5.875    ȳ = 50 ÷ 8 = 6.25

$$S_x^2 = \frac{287 - (47)^2/8}{8 - 1} = \frac{10.875}{7} = 1.55$$

$$S_y^2 = \frac{336 - (50)^2/8}{8 - 1} = \frac{23.5}{7} = 3.36$$

$$T = \frac{5.875 - 6.25}{\sqrt{1.55/8 + 3.36/8}} = \frac{-0.375}{\sqrt{0.614}} = \frac{-0.375}{0.784} = -0.48$$

T = −0.48, so T-calculated = |−0.48| = 0.48

The degrees of freedom (df) used in this test are n1 + n2 − 2, so df = 14.

T-tabular at the 0.01 level (two-tailed) = 2.977

Since T-calculated = 0.48 < T-tabular = 2.977, the difference between the two means is not significant.
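A minimal Python sketch of the two-sample calculation, assuming the formula above with the separate variances S²x/n1 + S²y/n2. The helpers `sample_variance` and `two_sample_t` are hypothetical names, and 2.977 is copied from the t table (df = 14, two tails, 0.01 level).

```python
import math


def sample_variance(values):
    """Sample variance via the computational formula (divides by n - 1)."""
    n = len(values)
    return (sum(v * v for v in values) - sum(values) ** 2 / n) / (n - 1)


def two_sample_t(x, y):
    """Return (t, df) for two independent samples; hypothetical helper."""
    nx, ny = len(x), len(y)
    mean_x, mean_y = sum(x) / nx, sum(y) / ny
    standard_error = math.sqrt(sample_variance(x) / nx + sample_variance(y) / ny)
    return (mean_x - mean_y) / standard_error, nx + ny - 2


government = [8, 6, 5, 4, 7, 6, 6, 5]
private = [9, 7, 5, 4, 7, 6, 8, 4]
t, df = two_sample_t(government, private)

# Two-tailed critical value for df = 14 at the 0.01 level, from the t table.
T_TABULAR = 2.977

print(f"t = {t:.2f}, df = {df}")  # t = -0.48, df = 14
print("significant" if abs(t) > T_TABULAR else "not significant")  # not significant
```

Because both groups have eight patients, this statistic is numerically identical to the pooled-variance t-test, whose degrees of freedom are n1 + n2 − 2. With unequal group sizes the two versions differ.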

PERCENTAGE POINTS OF THE T DISTRIBUTION

Tail Probabilities
One Tail 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
Two Tails 0.20 0.10 0.05 0.02 0.01 0.002 0.001
df (degrees of freedom)
-------+---------------------------------------------------------+-----
1 | 3.078 6.314 12.71 31.82 63.66 318.3 637 | 1
2 | 1.886 2.920 4.303 6.965 9.925 22.330 31.6 | 2
3 | 1.638 2.353 3.182 4.541 5.841 10.210 12.92 | 3
4 | 1.533 2.132 2.776 3.747 4.604 7.173 8.610 | 4
5 | 1.476 2.015 2.571 3.365 4.032 5.893 6.869 | 5
6 | 1.440 1.943 2.447 3.143 3.707 5.208 5.959 | 6
7 | 1.415 1.895 2.365 2.998 3.499 4.785 5.408 | 7
8 | 1.397 1.860 2.306 2.896 3.355 4.501 5.041 | 8
9 | 1.383 1.833 2.262 2.821 3.250 4.297 4.781 | 9
10 | 1.372 1.812 2.228 2.764 3.169 4.144 4.587 | 10
11 | 1.363 1.796 2.201 2.718 3.106 4.025 4.437 | 11
12 | 1.356 1.782 2.179 2.681 3.055 3.930 4.318 | 12
13 | 1.350 1.771 2.160 2.650 3.012 3.852 4.221 | 13
14 | 1.345 1.761 2.145 2.624 2.977 3.787 4.140 | 14
15 | 1.341 1.753 2.131 2.602 2.947 3.733 4.073 | 15
16 | 1.337 1.746 2.120 2.583 2.921 3.686 4.015 | 16
17 | 1.333 1.740 2.110 2.567 2.898 3.646 3.965 | 17
18 | 1.330 1.734 2.101 2.552 2.878 3.610 3.922 | 18
19 | 1.328 1.729 2.093 2.539 2.861 3.579 3.883 | 19
20 | 1.325 1.725 2.086 2.528 2.845 3.552 3.850 | 20
21 | 1.323 1.721 2.080 2.518 2.831 3.527 3.819 | 21
22 | 1.321 1.717 2.074 2.508 2.819 3.505 3.792 | 22
23 | 1.319 1.714 2.069 2.500 2.807 3.485 3.768 | 23
24 | 1.318 1.711 2.064 2.492 2.797 3.467 3.745 | 24
25 | 1.316 1.708 2.060 2.485 2.787 3.450 3.725 | 25
26 | 1.315 1.706 2.056 2.479 2.779 3.435 3.707 | 26
27 | 1.314 1.703 2.052 2.473 2.771 3.421 3.690 | 27
28 | 1.313 1.701 2.048 2.467 2.763 3.408 3.674 | 28
29 | 1.311 1.699 2.045 2.462 2.756 3.396 3.659 | 29
30 | 1.310 1.697 2.042 2.457 2.750 3.385 3.646 | 30
32 | 1.309 1.694 2.037 2.449 2.738 3.365 3.622 | 32
34 | 1.307 1.691 2.032 2.441 2.728 3.348 3.601 | 34
36 | 1.306 1.688 2.028 2.434 2.719 3.333 3.582 | 36
38 | 1.304 1.686 2.024 2.429 2.712 3.319 3.566 | 38
40 | 1.303 1.684 2.021 2.423 2.704 3.307 3.551 | 40
42 | 1.302 1.682 2.018 2.418 2.698 3.296 3.538 | 42
44 | 1.301 1.680 2.015 2.414 2.692 3.286 3.526 | 44
46 | 1.300 1.679 2.013 2.410 2.687 3.277 3.515 | 46
48 | 1.299 1.677 2.011 2.407 2.682 3.269 3.505 | 48
50 | 1.299 1.676 2.009 2.403 2.678 3.261 3.496 | 50
55 | 1.297 1.673 2.004 2.396 2.668 3.245 3.476 | 55
60 | 1.296 1.671 2.000 2.390 2.660 3.232 3.460 | 60
65 | 1.295 1.669 1.997 2.385 2.654 3.220 3.447 | 65
70 | 1.294 1.667 1.994 2.381 2.648 3.211 3.435 | 70
80 | 1.292 1.664 1.990 2.374 2.639 3.195 3.416 | 80
100 | 1.290 1.660 1.984 2.364 2.626 3.174 3.390 | 100
150 | 1.287 1.655 1.976 2.351 2.609 3.145 3.357 | 150
200 | 1.286 1.653 1.972 2.345 2.601 3.131 3.340 | 200

Chapter Six

Chi-Square distribution (χ²)

What is a correlation?

Pearson correlation

Chi-Square distribution (χ²): The chi-square test is a widely used non-parametric statistical test that describes the magnitude of the discrepancy between the observed data and the data expected under a specific hypothesis.

The observed and expected frequencies coincide completely when χ² = 0; as the value of χ² increases, the discrepancy between the observed and expected data grows, and a sufficiently large value indicates a significant discrepancy. The following formula is used to calculate chi-square:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where,
O = Observed frequency
E = Expected (theoretical) frequency
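The formula translates directly into a short sum over all cells. A minimal Python sketch follows; the function name `chi_square` is hypothetical, and the observed and expected counts are those of the nurse voting-preference example worked later in this chapter.

```python
def chi_square(observed, expected):
    """Return the chi-square statistic: the sum of (O - E)^2 / E over all cells.

    `observed` and `expected` are flat lists of cell frequencies in the same order.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Cells of the nurse voting-preference example, read row by row
observed = [200, 150, 50, 250, 300, 50]
expected = [180, 180, 40, 270, 270, 60]
x2 = chi_square(observed, expected)
print(f"chi-square = {x2:.2f}")  # chi-square = 16.20
```

When every observed count equals its expected count, the function returns 0, matching the statement that the frequencies coincide completely when χ² = 0.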

Chi-Square Test for Independence

Example: Calculate chi-square (χ²) for the following data table:

         Smoke    No smoke
Men      207      282
Women    231      242

Solution: Add up the rows and columns:

         Smoke      No smoke    Sum
Men      207        282         ∑ = 489
Women    231        242         ∑ = 473
Sum      ∑ = 438    ∑ = 524     ∑ = 962

Calculate the "expected value" for each entry:

E = (row total × column total) / overall sample size

Multiply each row total by each column total and divide by the overall total:

E1,1 = 489 × 438 / 962 = 222.64
E1,2 = 489 × 524 / 962 = 266.36
E2,1 = 473 × 438 / 962 = 215.36
E2,2 = 473 × 524 / 962 = 257.64
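The expected-value rule (row total × column total / overall total) can be sketched in Python for any table given as a list of rows. The helper name `expected_frequencies` is hypothetical; the data are the smoking table above. Note that the last cell uses the "no smoke" column total, 524.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total x column total / overall total.

    `table` is a list of rows of observed counts (hypothetical helper).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


smoking = [
    [207, 282],  # men: smoke, no smoke
    [231, 242],  # women: smoke, no smoke
]
expected = expected_frequencies(smoking)
for row in expected:
    print([round(e, 2) for e in row])
# [222.64, 266.36]
# [215.36, 257.64]
```

A quick sanity check is that the expected values have the same row, column and overall totals as the observed table.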



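Subtract the expected value from the observed (actual) value, square the result, then divide by the expected value:

$$\chi^2 = \frac{(207 - 222.64)^2}{222.64} + \frac{(282 - 266.36)^2}{266.36} + \frac{(231 - 215.36)^2}{215.36} + \frac{(242 - 257.64)^2}{257.64}$$

Which is: χ² = 1.099 + 0.918 + 1.136 + 0.949

Now add up those values:

Chi-square is χ² = 4.102

With degrees of freedom (df) = (2 − 1)(2 − 1) = 1

χ²-tabular at the 0.05 level = 3.841 (at the 0.01 level it is 6.635)

Since χ²-calculated = 4.102 > χ²-tabular = 3.841 at the 0.05 level, the χ²-calculated is significant at the 0.05 level. It is not significant at the 0.01 level, because 4.102 < 6.635.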

Example: A public opinion poll surveyed a simple random sample of 1000 nurses. Respondents were classified by gender (male or female) and by voting preference (laboratory, clinical or management). The results are shown in the contingency table below.

                 Voting preferences
         Laboratory    Clinical    Management
Male     200           150         50
Female   250           300         50

Is there a gender gap? Do the men's voting preferences differ significantly from the women's preferences? Use χ² at the 0.05 level of significance.

Solution:

                 Voting preferences
         Laboratory    Clinical    Management    Sum
Male     200           150         50            ∑ = 400
Female   250           300         50            ∑ = 600
Sum      ∑ = 450       ∑ = 450     ∑ = 100       ∑ = 1000

Calculate the "expected value" for each entry: E = (row total × column total) / overall sample size

E1,1 = (400 × 450) / 1000 = 180000/1000 = 180
E1,2 = (400 × 450) / 1000 = 180000/1000 = 180
E1,3 = (400 × 100) / 1000 = 40000/1000 = 40
E2,1 = (600 × 450) / 1000 = 270000/1000 = 270
E2,2 = (600 × 450) / 1000 = 270000/1000 = 270
E2,3 = (600 × 100) / 1000 = 60000/1000 = 60

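$$\chi^2 = \frac{(200 - 180)^2}{180} + \frac{(150 - 180)^2}{180} + \frac{(50 - 40)^2}{40} + \frac{(250 - 270)^2}{270} + \frac{(300 - 270)^2}{270} + \frac{(50 - 60)^2}{60}$$

χ² = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67

χ² = 16.2

With degrees of freedom (df) = (2 − 1)(3 − 1) = 2.

χ²-tabular at the 0.05 level = 5.99

Since χ²-calculated = 16.2 > χ²-tabular = 5.99 at the 0.05 level, the χ²-calculated is significant: the men's voting preferences differ significantly from the women's.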

Example: Does the type of sinusitis (chronic or acute) differ among men, women and children, as shown in the table below? Use χ² at the 0.05 level of significance.

Sinusitis    Men    Women    Children
Chronic      138    83       64
Acute        64     67       84

Solution:

Sinusitis    Men        Women      Children    Sum
Chronic      138        83         64          ∑ = 285
Acute        64         67         84          ∑ = 215
Sum          ∑ = 202    ∑ = 150    ∑ = 148     ∑ = 500

Calculate the "expected value" for each entry: E = (row total × column total) / overall sample size

E1,1 = (285 × 202) / 500 = 115.14
E1,2 = (285 × 150) / 500 = 85.5
E1,3 = (285 × 148) / 500 = 84.36
E2,1 = (215 × 202) / 500 = 86.86
E2,2 = (215 × 150) / 500 = 64.5
E2,3 = (215 × 148) / 500 = 63.64

$$\chi^2 = \frac{(138 - 115.14)^2}{115.14} + \frac{(83 - 85.5)^2}{85.5} + \frac{(64 - 84.36)^2}{84.36} + \frac{(64 - 86.86)^2}{86.86} + \frac{(67 - 64.5)^2}{64.5} + \frac{(84 - 63.64)^2}{63.64}$$

χ² = 22.152

With degrees of freedom (df) = (2 − 1)(3 − 1) = 2.

χ²-tabular at the 0.05 level = 5.99

Since χ²-calculated = 22.152 > χ²-tabular = 5.99 at the 0.05 level, the χ²-calculated is significant: the type of sinusitis differs significantly among men, women and children.
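The three worked examples follow the same steps: totals, expected values, the χ² sum, df = (rows − 1)(columns − 1), and comparison with the table. A minimal Python sketch of those steps, assuming the table is given as a list of rows; `chi_square_independence` is a hypothetical name, and `CHI2_CRITICAL_05` holds only the df 1 to 4 values copied from the chi-square table in this chapter.

```python
# Critical values of chi-square at the 0.05 level, copied from the
# chi-square table in this chapter (only df 1 to 4 are included here).
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(table):
    """Return (chi-square, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


sinusitis = [
    [138, 83, 64],  # chronic: men, women, children
    [64, 67, 84],   # acute: men, women, children
]
x2, df = chi_square_independence(sinusitis)
print(f"chi-square = {x2:.3f}, df = {df}")  # chi-square = 22.152, df = 2
print("significant" if x2 > CHI2_CRITICAL_05[df] else "not significant")  # significant
```

Because the sketch keeps the expected values unrounded, the smoking example gives about 4.10 (the hand result 4.102 reflects rounding of E). If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should reproduce these statistics; its default `correction=True` applies Yates' continuity correction to 2 × 2 tables such as the smoking example, which gives a smaller value than the hand calculation.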

Table: Chi-Square Probabilities (critical values of χ²; the column headings are upper-tail probabilities)

df 0.10 0.05 0.025 0.01 0.005


1 2.706 3.841 5.024 6.635 7.879
2 4.605 5.991 7.378 9.210 10.597
3 6.251 7.815 9.348 11.345 12.838
4 7.779 9.488 11.143 13.277 14.860
5 9.236 11.070 12.833 15.086 16.750
6 10.645 12.592 14.449 16.812 18.548
7 12.017 14.067 16.013 18.475 20.278
8 13.362 15.507 17.535 20.090 21.955
9 14.684 16.919 19.023 21.666 23.589
10 15.987 18.307 20.483 23.209 25.188
11 17.275 19.675 21.920 24.725 26.757
12 18.549 21.026 23.337 26.217 28.300
13 19.812 22.362 24.736 27.688 29.819
14 21.064 23.685 26.119 29.141 31.319
15 22.307 24.996 27.488 30.578 32.801
16 23.542 26.296 28.845 32.000 34.267
17 24.769 27.587 30.191 33.409 35.718
18 25.989 28.869 31.526 34.805 37.156
19 27.204 30.144 32.852 36.191 38.582
20 28.412 31.410 34.170 37.566 39.997
21 29.615 32.671 35.479 38.932 41.401
22 30.813 33.924 36.781 40.289 42.796
23 32.007 35.172 38.076 41.638 44.181
24 33.196 36.415 39.364 42.980 45.559
25 34.382 37.652 40.646 44.314 46.928
26 35.563 38.885 41.923 45.642 48.290
27 36.741 40.113 43.195 46.963 49.645
28 37.916 41.337 44.461 48.278 50.993
29 39.087 42.557 45.722 49.588 52.336
30 40.256 43.773 46.979 50.892 53.672
40 51.805 55.758 59.342 63.691 66.766
50 63.167 67.505 71.420 76.154 79.490
60 74.397 79.082 83.298 88.379 91.952
70 85.527 90.531 95.023 100.425 104.215
80 96.578 101.879 106.629 112.329 116.321
90 107.565 113.145 118.136 124.116 128.299
100 118.498 124.342 129.561 135.807 140.169
What is a correlation? (Pearson correlation)

A correlation is a number between −1 and +1 that measures the degree of linear association between two variables (call them X and Y).

Interpretation of the correlation coefficient: Here is how I tend to interpret correlations.

• −1.0 to −0.7: strong negative association.
• −0.7 to −0.3: weak negative association.
• −0.3 to +0.3: little or no association.
• +0.3 to +0.7: weak positive association.
• +0.7 to +1.0: strong positive association.

R-calculated:

1- If |R-calculated| > R-tabular, the correlation between X and Y is significant.

2- If |R-calculated| < R-tabular, the correlation between X and Y is not significant.

Degrees of freedom (df) = n − 2

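Computing the Pearson Correlation Coefficient

One formula for the Pearson correlation coefficient r is as follows:

$$r = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left[\sum x^2 - (\sum x)^2/n\right]\left[\sum y^2 - (\sum y)^2/n\right]}}$$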

EX1: Sample question: Compute the value of the correlation coefficient from the following table:

Age (x)    Glucose level (y)
43         99
21         65
25         79
42         75
57         87
59         81

Solution:

Age (x)    Glucose level (y)    xy      x²      y²
43         99                   4257    1849    9801
21         65                   1365    441     4225
25         79                   1975    625     6241
42         75                   3150    1764    5625
57         87                   4959    3249    7569
59         81                   4779    3481    6561
Σx = 247   Σy = 486             Σxy = 20485    Σx² = 11409    Σy² = 40022

Use the following formula to work out the correlation coefficient:

$$R = \frac{20485 - (247)(486)/6}{\sqrt{\left[11409 - (247)^2/6\right]\left[40022 - (486)^2/6\right]}}$$

$$R = \frac{478}{\sqrt{1240.83 \times 656}} = \frac{478}{\sqrt{813984.48}} = \frac{478}{902.21} = 0.530$$

Interpretation: R-calculated = 0.530 < R-tabular (0.811) at the 0.05 level with df = 4, so the correlation between X and Y is not significant.

EX2: Calculate the correlation coefficient between two variables, then interpret the R value. X = 1, 3, 4, 4 and Y = 2, 5, 5, 8.
Solution:

x    y    xy    x²    y²
1    2    2     1     4
3    5    15    9     25
4    5    20    16    25
4    8    32    16    64
Σx = 12    Σy = 20    Σxy = 69    Σx² = 42    Σy² = 118

$$R = \frac{69 - (12)(20)/4}{\sqrt{\left[42 - (12)^2/4\right]\left[118 - (20)^2/4\right]}} = \frac{9}{\sqrt{6 \times 18}} = \frac{9}{10.39} = 0.866$$

Interpretation: R-calculated = 0.866 < R-tabular (0.950) at the 0.05 level with df = 2, so the correlation between X and Y is not significant.
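Both examples can be checked with a short Python sketch of the computational formula for r. The helper `pearson_r` is hypothetical, and `R_TABULAR_05` holds only the two critical values (df = 2 and df = 4, two-tailed, 0.05 level) copied from the Pearson table in this chapter.

```python
import math


def pearson_r(x, y):
    """Pearson correlation via the computational formula used in this chapter."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


# Two-tailed critical values of R at the 0.05 level, from the table in this chapter
R_TABULAR_05 = {2: 0.950, 4: 0.811}

age = [43, 21, 25, 42, 57, 59]
glucose = [99, 65, 79, 75, 87, 81]
r1 = pearson_r(age, glucose)
r2 = pearson_r([1, 3, 4, 4], [2, 5, 5, 8])

for label, r, n in [("EX1", r1, 6), ("EX2", r2, 4)]:
    df = n - 2
    verdict = "significant" if abs(r) > R_TABULAR_05[df] else "not significant"
    print(f"{label}: r = {r:.3f}, df = {df}, {verdict}")
# EX1: r = 0.530, df = 4, not significant
# EX2: r = 0.866, df = 2, not significant
```

On Python 3.10 or later, `statistics.correlation(age, glucose)` from the standard library also computes Pearson's r and should agree with this sketch.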

Pearson Product-Moment Correlation Coefficient (critical values of R)

df = n − 2

        Level of significance (p) for a two-tailed test
df      .10     .05     .02     .01

1 .988 .997 .9995 .9999

2 .900 .950 .980 .990

3 .805 .878 .934 .959

4 .729 .811 .882 .917

5 .669 .754 .833 .874

6 .622 .707 .789 .834

7 .582 .666 .750 .798

8 .549 .632 .716 .765

9 .521 .602 .685 .735

10 .497 .576 .658 .708

11 .476 .553 .634 .684

12 .458 .532 .612 .661

13 .441 .514 .592 .641

14 .426 .497 .574 .623

15 .412 .482 .558 .606

16 .400 .468 .542 .590

17 .389 .456 .528 .575

18 .378 .444 .516 .561

19 .369 .433 .503 .549

20 .360 .423 .492 .537

21 .352 .413 .482 .526

22 .344 .404 .472 .515



23 .337 .396 .462 .505

24 .330 .388 .453 .496

25 .323 .381 .445 .487

26 .317 .374 .437 .479

27 .311 .367 .430 .471

28 .306 .361 .423 .463

29 .301 .355 .416 .456

30 .296 .349 .409 .449

35 .275 .325 .381 .418

40 .257 .304 .358 .393

45 .243 .288 .338 .372

50 .231 .273 .322 .354

60 .211 .250 .295 .325

70 .195 .232 .274 .303

80 .183 .217 .256 .283

90 .173 .205 .242 .267

100 .164 .195 .230 .254
