Lec1 2

Introduction to Biostatistics

Shamik Sen
Dept. of Biosciences & Bioengineering
IIT Bombay
What is Statistics?

The science that deals with the collection, classification, analysis, and interpretation of numerical facts or data

Examples
Opinion Polls

Weather Patterns

Income as per professions


What is Biostatistics?

Application of statistics for understanding of biological systems or biological processes

The branch of statistics that deals with data relating to living organisms.

Examples
Biology - evolution

Medicine – drug resistance

Public Health – Zika virus


History of Statistics

• Foundation of statistics – probability distributions - gambling!

• Developed by De Moivre & Laplace

• 19th century:

• Galton’s discovery of regression

• Karl Pearson’s work on parametric fitting of probability distributions


Why study Biostatistics

Case 1: You are the clinical coordinator for a clinical trial. You are enrolling patients at X
locations and want to add a new site. In order to add a new site, an institutional review
board (IRB) must review and approve your trial protocol.

Board member asks: what is your stopping rule?

You are stumped!!

Stopping protocol:

• Group sequential statistical methods – allow data to be compared during the trial
• Based on data, trial can be declared successful
• Trial may be pre-empted
Why study Biostatistics

Case 2: Measuring cell motility – relevance to normal physiology (e.g., wound healing) &
diseases (e.g., cancer)

Role of a specific protein?


Drug effect?

What to measure? – total distance travelled, net distance, instantaneous speed?


How to measure? – what should be the time difference between frames?
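
To make the three candidate measures concrete, here is a minimal sketch that computes them from a tracked cell trajectory. The track coordinates, the units (µm, minutes), and the function name `motility_metrics` are illustrative assumptions, not data from the lecture.

```python
import math

def motility_metrics(track, dt):
    """Return (total distance, net distance, per-step speeds).

    track: list of (x, y) positions at successive frames (assumed µm)
    dt:    time difference between frames (assumed minutes)
    """
    # Length of each frame-to-frame step
    steps = [math.dist(p, q) for p, q in zip(track, track[1:])]
    total_distance = sum(steps)                    # path length
    net_distance = math.dist(track[0], track[-1])  # start-to-end displacement
    speeds = [s / dt for s in steps]               # instantaneous speed per step
    return total_distance, net_distance, speeds

# Hypothetical zig-zag track imaged every 5 minutes
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0), (9.0, 4.0), (12.0, 0.0)]
total_fine, net_fine, speeds_fine = motility_metrics(track, dt=5.0)

# Same cell, but keeping only every second frame (10-minute interval)
total_coarse, net_coarse, speeds_coarse = motility_metrics(track[::2], dt=10.0)

print(total_fine, net_fine, speeds_fine)
print(total_coarse, net_coarse, speeds_coarse)
```

In this made-up track the net distance is the same at both frame intervals, while the total distance and the instantaneous speed drop when frames are skipped, which is one reason the choice of time difference between frames matters.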
Why study Biostatistics

Case 3: Protein Unfolding

Unfolding forces for individual domains

Fluctuations in protein conformations lead to alterations in unfolding forces

[Figure: folded domains attached to a substrate, with force applied]
Why study Biostatistics

Case 4: Tumor Heterogeneity

Identify the number of sub-populations


Types of Studies

• Surveys & Cross-sectional Studies (reference point in time: now)


Texting Patterns in College Campuses

• Retrospective Studies (reference point: past)


Trying to figure out what caused a suspected outbreak of food-borne illnesses

• Prospective Studies (follow subjects from present to future)


tracking chronic diseases with long latency periods

• Clinical Trials
Sampling

Sample

Population

What is the average body temperature? (How to measure, under what conditions,
how many to measure?)
Literary Digest poll, US 1936: wrongly predicted that Landon would win. Roosevelt won.
The magazine ceased publication!
Types of Sampling

• Simple random sampling – unbiased

• Convenience sampling (e.g., average height of a class)

• Systematic sampling (when sample list is available)


• sort based on one measure (e.g., age, weight, etc)
• sample 1 in every n objects

• Stratified Random Sampling (when knowledge of sub-groups is present)

• Cluster Sampling

• Bootstrap Sampling
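
Three of the schemes listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the population of 40 patients, their ages and smoker/non-smoker groups, and the helper names are all hypothetical, and proportional allocation is just one way to do stratified sampling.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: (patient id, age, group)
population = [
    (i, 20 + (7 * i) % 50, "smoker" if i % 4 == 0 else "non-smoker")
    for i in range(40)
]

def systematic_sample(items, n, key):
    """Sort on one measure, then take 1 in every n objects from a random start."""
    ordered = sorted(items, key=key)
    start = rng.randrange(n)
    return ordered[start::n]

def stratified_sample(items, fraction, stratum):
    """Draw a simple random sample of the same fraction from each sub-group."""
    strata = {}
    for item in items:
        strata.setdefault(stratum(item), []).append(item)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))  # without replacement
    return sample

def bootstrap_sample(items):
    """Resample with replacement, keeping the original sample size."""
    return rng.choices(items, k=len(items))

by_age = systematic_sample(population, n=5, key=lambda p: p[1])
by_group = stratified_sample(population, fraction=0.2, stratum=lambda p: p[2])
resampled = bootstrap_sample(population)

print(len(by_age), len(by_group), len(resampled))
```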
Simple Random Sampling

• Choosing 4 patients from a group of six patients (A, B, C, D, E, F)

• Number of possible ways: 6C4 = 15 (enumerate)

• Take interval from [0, 1] and divide into 15 equal parts (= 0.0667)

• Look up table

• Choose one at a time
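
The enumeration approach on this slide can be sketched as follows. A pseudo-random number from Python's `random` module stands in for the look-up table, and the function name `pick_combination` is an assumption made for illustration.

```python
import itertools
import math
import random

patients = ["A", "B", "C", "D", "E", "F"]

# Enumerate every way of choosing 4 patients out of 6
combos = list(itertools.combinations(patients, 4))
assert len(combos) == math.comb(6, 4)

# Divide [0, 1] into equal parts, one per combination
width = 1 / len(combos)

def pick_combination(u):
    """Map a number u in [0, 1) to the combination whose sub-interval contains it."""
    index = min(int(u / width), len(combos) - 1)
    return combos[index]

u = random.random()        # stands in for a value read from a random number table
print(round(width, 4), u, pick_combination(u))

# Choosing one patient at a time without replacement is available directly:
print(random.sample(patients, 4))
```

Every combination has the same chance of selection because each sub-interval has the same width.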
Descriptive Vs Inferential Statistics

Descriptive Statistics: summarizes important characteristics of a set of measurements

Inferential Statistics: procedures for making inferences about population characteristics from a sample drawn from the population
Steps in Inferential Statistics

• Specify question to be asked & identify population

• How to select sample

• Select sample & analyze the information

• Make an inference about the population

• Determine reliability of inference


Descriptive Statistics
Variable

Variable: characteristic which varies with time and/or different individuals

e.g.: body temperature, height, weight, etc

Student  Gender  Year  Major      # Courses  CGPA
1        F       1st   Maths      5          7.4
2        M       2nd   Physics    9          8.1
3        M       2nd   Biology    10         8.2
4        F       3rd   English    18         6.9
5        F       1st   Chemistry  5          9.0
Types of Variables

Variable

• Qualitative

• Quantitative

• Discrete
• Continuous
Converting categorical data into plots:
Pie Charts

Grade  %age
A      10
B      30
C      40
D      20

[Figure: pie chart of the grade percentages, with slices A, B, C, D]

Problems?
Converting categorical data into plots:
Bar Charts

[Figure: bar chart of %age (y axis, 0 to 45) for grades A, B, C, D]

Problems?
Bar Charts: Concept of Breaks
Variation in data is huge

[Figure: bar chart of Time (sec) (y axis, 0 to 400) for categories A, B, C]
Working with quantitative data

Body Mass Index (BMI)

18.3 21.9 23.0 24.3 25.4

26.6 27.5 28.8 34.2 31.0

19.2 21.0 24.5 25.5 27.8

28.2 31.0 29.1 28.1 24.2

25.6 20.0 20.0 25.0 25.2


Histogram

• Convert data into frequency

• Can be plotted as numbers, or %age

• Unimodal Versus Bimodal Versus Multimodal distribution
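
As a rough sketch of the first two bullets, the BMI values from the previous slide can be converted into frequencies and percentages. The bin edges (width 5, starting at 15) are an assumption chosen for illustration; the slides do not specify a binning.

```python
# BMI values from the "Working with quantitative data" slide
bmi = [
    18.3, 21.9, 23.0, 24.3, 25.4,
    26.6, 27.5, 28.8, 34.2, 31.0,
    19.2, 21.0, 24.5, 25.5, 27.8,
    28.2, 31.0, 29.1, 28.1, 24.2,
    25.6, 20.0, 20.0, 25.0, 25.2,
]

# Assumed binning: [15, 20), [20, 25), [25, 30), [30, 35)
low, width, n_bins = 15.0, 5.0, 4

counts = [0] * n_bins
for value in bmi:
    counts[int((value - low) // width)] += 1   # index of the bin holding value

percentages = [100 * c / len(bmi) for c in counts]

# Text histogram: one '#' per observation
for i, (c, p) in enumerate(zip(counts, percentages)):
    left = low + i * width
    print(f"[{left:.0f}, {left + width:.0f})  {c:2d}  {p:5.1f}%  {'#' * c}")
```

With these bins the counts rise to a single peak and fall again, so the sample looks unimodal; a different bin width could change that impression.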


Scatter Plots

[Figure: scatter plot; x axis from 0 to 10]
Line Plots

[Figure: line plot against x]
Examples: Functions

• Function types: Polynomial, Exponential, Sigmoidal

• Examples: Bacterial Growth, Radioactive Decay
Things to keep in mind

• Label axes

• Units (choice of units, e.g. cell speed)

• Choose appropriate range

• Large variations in data – breaks & insets

• Outliers
Double Y Plots

[Figure: two quantities, Y and z, plotted on separate vertical axes against a shared x axis]
Example 1

Number of visits to a dental clinic in a typical week

6 7 5 1 8
4 9 3 3 4
7 2 1 4 5
5 5 5 5 7
3 4 4 5 8

Type of Variable?

Type of plot?
Example 2

Test Scores of 20 students

61 49
93 74
87 66
42 45
55 68
67 88
82 59
50 71
29 21
55 50

Test Scores of 10 students in 2 exams


Example 3

RBC counts of a healthy individual measured on 15 successive days (in units of 10^6 cells
per µL):

5.4 5.2 5.0 5.5 5.2


5.3 4.9 5.4 5.2 5.3
5.2 4.9 5.4 5.2 5.2

How to plot?

If today’s score is 5.7, is it unusual?
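
One way to approach the last question is to express today's value as a number of standard deviations from the mean of the 15 earlier measurements. The sketch below does this with Python's `statistics` module. Treating values more than about 2 standard deviations from the mean as unusual is a common convention assumed here, not a rule stated in the slides, and the calculation ignores any day-to-day dependence between measurements.

```python
import statistics

# RBC counts over 15 successive days (10^6 cells per µL), from the slide
rbc = [
    5.4, 5.2, 5.0, 5.5, 5.2,
    5.3, 4.9, 5.4, 5.2, 5.3,
    5.2, 4.9, 5.4, 5.2, 5.2,
]

mean = statistics.mean(rbc)
sd = statistics.stdev(rbc)      # sample standard deviation (n - 1 denominator)

today = 5.7
z = (today - mean) / sd         # distance from the mean in units of SD

print(f"mean = {mean:.2f}, sd = {sd:.3f}, z = {z:.2f}")
print("unusual" if abs(z) > 2 else "not unusual")   # assumed 2-SD threshold
```

Here today's count lies roughly 2.7 standard deviations above the mean, so under the assumed threshold it would be flagged as unusual.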
