
Ipsita Panda-Biostats Assignment

The document covers key concepts in biostatistics, including the chi-square test, measures of central tendency (mean, median, mode), and various types of graphs. It explains how to calculate chi-square values and expected frequencies, as well as how to determine mean, median, and mode from data sets. Additionally, it describes different graphical representations such as bar graphs, pie charts, box plots, histograms, ogives, and scatter plots.

Uploaded by shrutigupta4142
Copyright © All Rights Reserved

BIOSTATISTICS ASSIGNMENT

NAME- IPSITA PANDA
EN. NO.- A35204122006
COURSE CODE- BIOT213
COURSE TITLE- BIOSTATISTICS
Topic: Measures of central tendency, graphs & chi-square test.

1. CHI-SQUARE TEST

- A chi-square (χ²) statistic is a measure of the difference between the observed (O) and expected (E) frequencies of the outcomes of a set of events or variables.
- The value of chi-square depends on the size of the differences between the observed and expected values, the degrees of freedom and the sample size.
- It can also be used to test the goodness of fit between an observed distribution and a theoretical distribution of frequencies.
- The formula used is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed values, E = expected values, and the sum runs over all classes.
- Goodness of fit: chi-square provides a way to test how well a sample of data matches the (known or assumed) characteristics of the larger population that the sample is intended to represent.
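As a minimal sketch of the formula above, the statistic is just the sum of (O - E)²/E over all classes. The function name `chi_square_statistic` and the coin-toss counts are hypothetical, chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the chi-square statistic: the sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of classes")
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected frequencies must be positive")
        total += (o - e) ** 2 / e  # contribution of one class
    return total


# Hypothetical example: 100 coin tosses giving 60 heads and 40 tails,
# compared with the 50:50 expected for a fair coin.
stat = chi_square_statistic([60, 40], [50, 50])
print(stat)  # 4.0
```

Each class contributes (10)²/50 = 2, so the statistic is 4.0; a larger value means a larger departure from the expected frequencies.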

Example 1: In a flowering plant, white flowers (B) are dominant over red flowers (b), and short plants (E) are dominant over tall plants (e). When two double-heterozygous (BbEe) plants were crossed, the following phenotypes were observed (O): white & short (206), red & short (83), white & tall (65) and red & tall (30).

- According to Mendel's dihybrid cross, the F2 phenotypic ratio should be 9:3:3:1.
- Therefore, the null hypothesis is that the four phenotypes occur in a 9:3:3:1 ratio.
- The expected value for each class is calculated as:
Expected value = (ratio term for the class (e.g. 9) × total observations) / 16.

(expected value calculation)

- Next, calculate the chi-square value, (O - E)²/E, for each class and then sum all the values.

(chi square= 4.32)


- Another method is to calculate the p-value: the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is correct. Degrees of freedom = number of classes - 1 (here 4 - 1 = 3).

(Formula to calculate P value.)


- Then calculate the test statistic (equal to the chi-square value) using the following formula:

(Formula to calculate test statistics.)

- Now calculate the critical chi-square value using the following formula:
(Formula to calculate critical chi square value.)

- The final output is as follows:

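- Since the critical chi-square value (7.815 at the usual α = 0.05 level with 3 degrees of freedom) is greater than the test statistic (4.32), the null hypothesis is not rejected: the observed phenotypes are consistent with a 9:3:3:1 ratio.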
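The whole procedure for Example 1 can be reproduced as a minimal sketch using only Python's standard library. Assumptions: the helper names `chi2_sf` and `chi2_critical` are hypothetical; the p-value uses the closed-form chi-square upper-tail probability for integer degrees of freedom; the critical value is found by bisection instead of being read from a table; and α = 0.05 is assumed as the significance level.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum x^(i-1) / (1*3*...*(2i-1))
    p = math.erfc(math.sqrt(half))
    if df == 1:
        return p
    term = total = 1.0
    for i in range(2, (df - 1) // 2 + 1):
        term *= x / (2 * i - 1)
        total += term
    return p + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def chi2_critical(alpha, df):
    """Critical value c such that P(X >= c) = alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until the tail drops below alpha
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Example 1: observed phenotypes and the 9:3:3:1 null hypothesis
observed = [206, 83, 65, 30]  # white & short, red & short, white & tall, red & tall
ratio = [9, 3, 3, 1]
n = sum(observed)  # 384
expected = [r * n / sum(ratio) for r in ratio]  # (ratio term x total) / 16

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # number of classes - 1
p_value = chi2_sf(chi_sq, df)
critical = chi2_critical(0.05, df)
decision = "reject H0" if chi_sq > critical else "fail to reject H0"

print("expected:", expected)
print(f"chi-square = {chi_sq:.2f}, df = {df}")
print(f"p-value = {p_value:.3f}, critical value = {critical:.3f}")
print(decision)
```

The expected values come out as 216, 72, 72 and 24, the statistic as 4.32, and the critical value as about 7.815, so the decision matches the conclusion above. If SciPy is available, `scipy.stats.chisquare(observed, expected)` and `scipy.stats.chi2.ppf(0.95, 3)` should give the same statistic, p-value and critical value.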

2. MEASURES OF CENTRAL TENDENCY

I. MEAN

- It is the ratio of the sum of all the observations to the number of observations.


- For a data set of size n, {x1, x2, x3, …, xn}, the mean (x̄, "x bar") is given by the following formula:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Numerator: the sum of all the observations (x1 + x2 + … + xn).
Denominator: n, the number of observations.
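- For a frequency distribution: let there be n items x1, x2, x3, …, xn in a set, with corresponding frequencies f1, f2, f3, …, fn. Then the mean is:

$$\bar{x} = \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$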

- For a continuous distribution: the first step is to calculate the midpoint of each class interval, and the second step is to apply the frequency-distribution mean formula given above, using the midpoints as the x values.
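The three cases above can be sketched as follows. The function names and the data are hypothetical, for illustration only; the grouped case simply replaces each class by its midpoint and reuses the frequency-weighted formula.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    return sum(values) / len(values)


def weighted_mean(values, freqs):
    """Mean of a frequency distribution: sum(f_i * x_i) / sum(f_i)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def grouped_mean(classes, freqs):
    """Mean of a continuous distribution, using each class midpoint as x_i."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    return weighted_mean(midpoints, freqs)


# Hypothetical data, for illustration only
print(mean([4, 8, 6, 2]))                                      # 5.0
print(weighted_mean([1, 2, 3], [2, 5, 3]))                     # 2.1
print(grouped_mean([(0, 10), (10, 20), (20, 30)], [3, 5, 2]))  # 14.0
```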
II. MEDIAN

- The median is the value of the observation that divides the entire data set into two equal parts, provided the data are arranged in ascending or descending order.
- The median is a positional average that locates the centre of the observations.
- If the number of observations n is odd, there is a unique median: the ((n + 1)/2)th observation from either end.
- If n is even, there is no single middle observation, so by convention the median is the average of the (n/2)th and (n/2 + 1)th observations.
- For a discrete frequency distribution: first arrange the data in ascending or descending order and find the cumulative frequencies. Then compute N/2, where N is the total frequency. Locate the first cumulative frequency greater than N/2; the value of the observation corresponding to it is the median.
- For a continuous frequency distribution: first find the cumulative frequencies, identify the median class (the first class whose cumulative frequency exceeds N/2), and then apply the formula:

$$\text{Median} = L + \frac{\frac{N}{2} - cf}{f} \times h$$

L = lower class limit of the median class
N/2 = half of the total frequency N
cf = cumulative frequency of the class before the median class
f = frequency of the median class
h = class width
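A minimal sketch of the raw-data rule and the continuous-distribution formula above; the function names and the data are hypothetical. The grouped version walks through the classes accumulating frequencies until the median class is reached, then interpolates within it.

```python
def median(values):
    """Median of raw data (the data are sorted first)."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]  # the ((n + 1)/2)th observation, counting from 1
    return (data[mid - 1] + data[mid]) / 2  # mean of the (n/2)th and (n/2 + 1)th


def grouped_median(classes, freqs):
    """Median of a continuous frequency distribution: L + (N/2 - cf) * h / f."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("median of empty distribution")
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= n / 2:  # first class whose cumulative frequency reaches N/2
            return lower + (n / 2 - cf) * (upper - lower) / f
        cf += f
    raise ValueError("classes and frequencies do not match")


# Hypothetical data, for illustration only
print(median([7, 1, 3]))     # 3
print(median([8, 2, 4, 6]))  # 5.0
print(grouped_median([(0, 10), (10, 20), (20, 30)], [3, 5, 2]))  # 14.0
```

In the grouped example N/2 = 5, the median class is 10-20 (cf before it = 3, f = 5), so the median is 10 + (5 - 3)/5 × 10 = 14.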

III. MODE

- The mode is the most frequently occurring item in the series.


- Unlike the mean and median, which summarize the data set by calculation or by position, the mode simply identifies the value that appears most frequently.
- Types of mode: unimodal (one mode), bimodal (two modes), trimodal (three modes) and ill-defined (multiple modes).
- For ungrouped data, the mode is the value that occurs the greatest number of times.
- For grouped data, the formula used is:

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$

- l = lower limit of the modal class
- f0 = frequency of the class preceding the modal class
- f1 = frequency of the modal class
- f2 = frequency of the class succeeding the modal class
- h = width of the modal class
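A minimal sketch of both cases above; the function names and the data are hypothetical. One assumption not stated in the text: when the modal class is the first or last class, the missing neighbouring frequency is taken as 0.

```python
from collections import Counter


def modes(values):
    """All values that occur the greatest number of times (ungrouped data)."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


def grouped_mode(classes, freqs):
    """Mode of grouped data: l + (f1 - f0) * h / (2*f1 - f0 - f2)."""
    k = freqs.index(max(freqs))  # modal class = class with the highest frequency
    lower, upper = classes[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0  # assumption: 0 if there is no preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # assumption: 0 if no succeeding class
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula undefined: neighbouring classes tie with the modal class")
    return lower + (f1 - f0) * (upper - lower) / denominator


print(modes([2, 3, 3, 5, 5, 7]))  # [3, 5] -> bimodal
print(grouped_mode([(0, 10), (10, 20), (20, 30)], [4, 10, 6]))  # 16.0
```

Returning a list from `modes` lets the same function report unimodal, bimodal or trimodal data; Python's `statistics.multimode` behaves similarly for ungrouped data.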

Example 2: A sample of birthweights (g) of live-born infants in a private hospital in San Diego, California, over a one-week period is given in the following table:
To calculate the mean, median and mode, the following steps are followed:

(fig: calculation of mean)

(fig: calculation of median)


(fig: calculation of mode)
Therefore, the final answers are as follows:
Example 3: Consider the data set in the given table, which consists of white blood cell counts taken on admission of patients entering a small hospital in Allentown, Pennsylvania, on a given day. Compute the mean, median and mode.

To calculate the mean, median and mode, the following steps are performed:

(Fig: calculation of mean)

(Fig: calculation of median)


(Fig: calculation of mode)
Therefore, the final answers are as follows:

(Mean = 10.77, median = 8, mode = 8)

3. GRAPHS
A graph can be defined as a pictorial representation or diagram that presents data or values in an organized manner.

I. BAR GRAPHS
A bar graph or bar chart is a visual presentation of a group of data made up of horizontal or vertical rectangular bars whose lengths are proportional to the values they represent.

II. PIE CHARTS
A pie chart is a way of summarizing a set of nominal data or displaying the different values of a given variable (e.g. a percentage distribution). This type of chart is a circle divided into a series of segments. Each segment occupies the same proportion of the circle (by area and by central angle) as its category does of the total.
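Because each segment takes the same share of the circle as its category, the central angle of a segment is (category count / total) × 360°. A minimal sketch; the function name and the blood-group counts are hypothetical.

```python
def pie_angles(counts):
    """Central angle (degrees) of each segment: count / total * 360."""
    total = sum(counts.values())
    return {label: count * 360 / total for label, count in counts.items()}


# Hypothetical blood-group counts, for illustration only
angles = pie_angles({"A": 30, "B": 25, "AB": 5, "O": 40})
print(angles)  # {'A': 108.0, 'B': 90.0, 'AB': 18.0, 'O': 144.0}
```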
III. BOX PLOT
When we display the data distribution in a standardized way using the five-number summary (minimum, Q1 (first quartile), median, Q3 (third quartile) and maximum), it is called a box plot.
The ends of the box are the lower and upper quartiles, so the box spans the interquartile range. A vertical line inside the box marks the median, and the two lines outside the box, the whiskers, extend to the highest and lowest observations.
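The five numbers needed to draw a box plot can be computed with Python's `statistics.quantiles`. This is a sketch under one assumption: the `"inclusive"` quartile method is used, and other textbooks and software use slightly different quartile conventions, so Q1 and Q3 may differ a little for small data sets. The function name and the data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for drawing a box plot."""
    data = sorted(values)
    # "inclusive" is one of several quartile conventions; textbooks and software differ
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]


minimum, q1, med, q3, maximum = five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(minimum, q1, med, q3, maximum)        # 1 3.0 5.0 7.0 9
print("IQR (length of the box):", q3 - q1)  # 4.0
```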

IV. HISTOGRAM
A histogram is a graphical representation of a grouped frequency distribution with continuous classes. It is an area diagram and can be defined as a set of rectangles with bases along the intervals between class boundaries and with areas proportional to the frequencies in the corresponding classes.
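Since the area of each rectangle must be proportional to its frequency, the bar height should be the frequency density (frequency / class width), which matters when classes have unequal widths. A minimal sketch; the function name, the data, and the convention that every class is [lower, upper) except the last (which also includes its upper boundary) are assumptions for illustration.

```python
def histogram_table(values, boundaries):
    """Frequency and frequency density for each class between consecutive boundaries.

    Classes are [lower, upper), except the last, which also includes its upper boundary.
    Bar height = frequency / class width, so bar area = frequency.
    """
    classes = list(zip(boundaries, boundaries[1:]))
    rows = []
    for i, (lower, upper) in enumerate(classes):
        last = i == len(classes) - 1
        freq = sum(1 for v in values if lower <= v < upper or (last and v == upper))
        rows.append((lower, upper, freq, freq / (upper - lower)))
    return rows


# Hypothetical data with unequal class widths, for illustration only
rows = histogram_table([2, 4, 5, 7, 11, 12, 18, 25], [0, 5, 10, 20, 30])
for lower, upper, freq, density in rows:
    print(f"{lower}-{upper}: frequency {freq}, height {density:.2f}, "
          f"area {density * (upper - lower):.0f}")
```

The 10-20 class holds the most observations (3), but because it is twice as wide its bar is drawn lower (0.30) than the 0-5 and 5-10 bars (0.40).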

V. OGIVE
The ogive is the cumulative frequency graph of a series. It plots data values on the horizontal axis and either the cumulative frequencies, the cumulative relative frequencies or the cumulative percent frequencies on the vertical axis.
The two methods of drawing an ogive are: (i) less-than ogive: the frequencies of all preceding classes are added to the frequency of a class; (ii) greater-than ogive: the frequencies of all succeeding classes are added to the frequency of a class.
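The two cumulative series can be computed with running totals: forward for the less-than ogive and backward for the greater-than ogive. A minimal sketch with hypothetical function names and frequencies; by the usual convention, less-than totals are plotted against upper class boundaries and greater-than totals against lower class boundaries.

```python
from itertools import accumulate


def less_than_cumulative(freqs):
    """Less-than ogive: running total of frequencies from the first class upward."""
    return list(accumulate(freqs))


def greater_than_cumulative(freqs):
    """Greater-than ogive: running total of frequencies from the last class downward."""
    return list(accumulate(reversed(freqs)))[::-1]


classes = [(0, 10), (10, 20), (20, 30), (30, 40)]
freqs = [3, 5, 2, 4]  # hypothetical frequencies
less = less_than_cumulative(freqs)        # [3, 8, 10, 14]
greater = greater_than_cumulative(freqs)  # [14, 11, 6, 4]

# Less-than ogive points: (upper boundary, cumulative frequency)
print([(upper, cf) for (_, upper), cf in zip(classes, less)])
# Greater-than ogive points: (lower boundary, cumulative frequency)
print([(lower, cf) for (lower, _), cf in zip(classes, greater)])
```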

VI. SCATTER PLOT
The scatter diagram graphs pairs of numerical data, with one variable on each axis, to show their relationship. It is used when we have paired numerical data, or when there are multiple values of the dependent variable for a unique value of an independent variable.
Example 4: The following are the weights of 57 children in a day care centre.
i) Bar graph:

ii) Pie chart:

iii) Box plot:

iv) Histogram:

v) Ogive:

vi) Scatter plot:
