Abs&rm 01

Uploaded by

kiran
RESEARCH STATISTICS &

METHODOLOGY

Muhammad Umar
STATISTICS
Statistics plays a vital role in research.
Statistics can be used in
• data collection,
• analysis,
• interpretation,
• explanation and
• presentation.

The use of statistics guides researchers in the proper
characterization, summarization, presentation and
interpretation of research results.
BIOSTATISTICS
Biostatistics is the application of statistics to a wide range of topics
in biology.
It encompasses the design of biological experiments,
especially in medicine, pharmacy, agriculture and
fishery; the collection, summarization, and analysis of
data from those experiments; and the interpretation of,
and inference from, the results.
A major branch is medical biostatistics, which is
exclusively concerned with medicine and health.
RESEARCH METHOD
A research method is a systematic plan for conducting research.
Researchers draw on a variety of both qualitative and
quantitative research methods, including experiments,
survey research, participant observation, and secondary
data.
Quantitative methods aim to classify features, count them,
and create statistical models to test hypotheses and explain
observations.
Qualitative methods aim for a complete, detailed
description of observations, including the context of
events and circumstances.
Scientific Inquiry
• Scientific inquiry is investigating something in a logical way to find
evidence that supports an answer.
• It typically follows a sequential type of inquiry called the scientific method.
• It begins with a question for which you guess an answer, called
a hypothesis.
• Then you do an experiment from which you collect data.
• Students often do controlled experiments, where every variable that
might affect the outcome is controlled: one is purposefully
changed and the rest are kept the same.
• Those data are then analyzed.
• Once that is done, you can draw a conclusion in which you either
accept your hypothesis because your evidence supports it or
reject your hypothesis because your evidence refutes it.
VARIABLES
Variables: the events, characteristics, behaviors, or
conditions that researchers measure and study.

A variable is either a result of some factor (dependent)
or is itself the factor (independent) that causes a
change in another variable.

Examples of variables: treatment, range of motion, pain intensity, etc.
Independent Variable
Independent variables are aspects (factors) of a study
that a practitioner can control or choose.

They are called independent variables because they do
not depend on other variables for change.

These variables are the cause of the outcome of a study.

TREATMENT IS AN INDEPENDENT VARIABLE
IN REHABILITATION SCIENCES
Dependent Variables
These are the factors of a study that change as a result
of a change in an independent variable.
These are the factors that a clinician measures or observes.
RANGE OF MOTION AND PAIN RATING ARE
COMMON DEPENDENT VARIABLES IN PHYSICAL
THERAPY.
e.g. Treatment of wrist drop will prevent overstretching of
the wrist extensors, maintain ROM and maintain muscle strength,
etc.
A dependent variable must be defined operationally.
Controlled or Constant Variables

These are the factors or conditions of a study that are KEPT
UNCHANGED in an experiment.

There can be more than one controlled variable in an
experiment.

Age is a constant factor if all participants belong to the same
age group, e.g. all participants are 40 years old.


Confounding Variables
A confounding variable is an extraneous variable in a
study that correlates (directly or inversely)
with both the independent and the dependent variable and distorts the results.

Age can be a confounding factor in
rehabilitation sciences when studying osteoarthritis.

Example: limited ROM among diabetic patients.


Types of Variables with Respect to
MEASUREMENT

• Variables can be classified with respect to measurement

into

– Categorical Variable

– Numerical Variable
Categorical Variable
• A categorical variable is one for which the
observations recorded result in a set of categories
• There is a distinct demarcation between the
categories
• For example:
– Gender (male and female)
– Recovery from treatment (not recovered, partially
recovered and completely recovered)
• Categorical variables are often referred to as
qualitative variables
Numerical Variable
• A numerical variable is one for which the
observations are recorded in numerical values
such as, age, height, etc.
• A numerical variable has two further types:
discrete and continuous
• A numerical variable is often referred to as a
quantitative variable
• Discrete Variable
– A variable that is capable of taking a set of discrete
numerical values such as 10, 15, 1, 199, etc., but not
every possible value between two given numbers
– For example, the number of heartbeats in a fixed
time period, the number of successful operations in a
hospital, or the number of cases reported at a casualty department
• Continuous Variable
– A variable that is capable of taking every
possible value between two given numbers is
termed a continuous variable.
– Age, weight, length, etc. are a few examples of
continuous variables
Data take two main forms: categorical variables and numerical variables.
Exercise
Variable                How it is measured                    Variable Type     Variable Subtype
Gender                  Male & Female
Marital Status          Divorced, Widowed, Single, Married
Blood Pressure          In mmHg
Hypertension            Mild, Moderate, Severe
Age                     In Years
Age Class               <40, 40-60, >60
Weight                  In Kg
Height                  In Inches
Number of Children      Frequency
Number of school days   Frequency
Depression              Mild, Moderate & Severe
Anxiety                 Yes & No
Quality of Life         Bad, average, good
Exercise
Variable                How it is measured                    Variable Type     Variable Subtype
Gender                  Male & Female                         Categorical Data  Dichotomous/binary
Marital Status          Divorced, Widowed, Single, Married    Categorical Data  Nominal
Blood Pressure          In mmHg                               Numerical Data    Continuous
Hypertension            Mild, Moderate, Severe                Categorical Data  Ordinal
Age                     In Years                              Numerical Data    Continuous
Age Class               <40, 40-60, >60                       Categorical Data  Ordinal
Weight                  In Kg                                 Numerical Data    Continuous
Height                  In Inches                             Numerical Data    Continuous
Number of Children      Frequency                             Numerical Data    Discrete
Number of school days   Frequency                             Numerical Data    Discrete
Depression              Mild, Moderate & Severe               Categorical Data  Ordinal
Anxiety                 Yes & No                              Categorical Data  Dichotomous/binary
Quality of Life         Bad, average, good                    Categorical Data  Ordinal
Measures of Central Tendency
Introduction
• A measure of central tendency is a single value that attempts to
describe a set of data by identifying the central position within that
set of data.
• Measures of central tendency are also called measures of central location.
• They are also classed as summary statistics.
• The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others,
such as the median and the mode.
• The mean, median and mode are all valid measures of central
tendency, but under different conditions, some measures of central
tendency become more appropriate to use than others.
Mean (Arithmetic)
The mean (or average) is the most popular and well
known measure of central tendency.
It can be used with both discrete and continuous data,
although its use is most often with continuous data.
The mean is equal to the sum of all the values in the
data set divided by the number of values in the data set.
So, if we have n values in a data set and they have
values x1, x2, ..., xn, the sample mean, usually denoted
by x̄ (pronounced "x bar"), is:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
This formula is usually written in a slightly different
manner using the Greek capital letter, Σ, pronounced
"sigma", which means "sum of...":
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
When not to use the mean
The mean has one main disadvantage: it is particularly
susceptible to the influence of outliers.
Outliers are values that are unusual compared to the rest
of the data set by being especially small or large in
numerical value. For example, consider the ages of ten
female neck pain (NP) patients below:
Patient  1   2   3   4   5   6   7   8   9   10
Age      15  18  16  14  15  15  12  17  60  65
The mean age of the ten NP patients is 24.7 years.
However, inspecting the raw data suggests that this mean
value might not be the best way to accurately reflect the age
of NP patients, as most patients have ages in the 12-18
range.
The mean is being skewed by the two large ages.
The median would be a better measure of central tendency
in this situation.
If we consider the normal distribution, the mean,
median and mode are identical.
As the data become skewed, the mean loses its ability to
provide the best central location, whereas the median better retains
this position and is not as strongly influenced by the
skewed values.
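The effect of the two large ages can be checked directly. The following is a minimal sketch using Python's standard `statistics` module and the ten patient ages from the table above; the variable names `ages` and `typical` are illustrative only.

```python
from statistics import mean, median

# Ages of the ten neck pain (NP) patients from the table above.
ages = [15, 18, 16, 14, 15, 15, 12, 17, 60, 65]

print("mean:", mean(ages))      # arithmetic mean of all ten ages
print("median:", median(ages))  # middle of the sorted ages

# Dropping the two outliers (60 and 65) shows how strongly they pull
# the mean, while the median barely moves.
typical = [a for a in ages if a < 60]
print("mean without outliers:", mean(typical))
print("median without outliers:", median(typical))
```

Comparing the two pairs of printed values shows why the median is the more stable summary when a few extreme values are present.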
Median
The median is the middle score for a set of data that has
been arranged in order of magnitude.
The median is less affected by outliers and skewed data.
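To find the median, first rearrange the data in order of magnitude
(smallest first):

Example (odd number of scores): 65 55 89 56 35 14 56 55 87 45 92
Sorted: 14 35 45 55 55 56 56 65 87 89 92
The median is the middle (6th) score: 56.
Example (even number of scores): 65 55 89 56 35 14 56 55 87 45
Sorted: 14 35 45 55 55 56 56 65 87 89
Take the middle two scores (55 and 56) and average the result: 55.5.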
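The odd and even cases can be written as one short procedure. This is a sketch only: the function name `median` and the list names are chosen here for illustration, and the scores are the ones used in the example above.

```python
def median(scores):
    """Return the median: the middle score, or the average of the middle two."""
    ordered = sorted(scores)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: single middle score
        return ordered[mid]
    # even count: average the two middle scores
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]
even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]

print(median(odd_scores))   # 56
print(median(even_scores))  # 55.5
```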
Mode
The mode is the most frequent score in our data set.
It represents the highest bar in a bar chart or histogram.
Normally, the mode is used for categorical data where
we wish to know which is the most common category.
One of the problems with the mode is that it is not always
unique: two or more values can share the highest frequency.
Another problem with the mode is that it will not
provide a very good measure of central tendency when
the most common mark is far away from the rest of the
data in the data set.
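Both points about the mode (its use with categorical data and its possible non-uniqueness) can be illustrated with `statistics.multimode`, available in Python 3.8 and later. The data below are hypothetical values made up for this sketch, not results from any study.

```python
from statistics import multimode

# Hypothetical recovery outcomes for four patients (categorical data).
recovery = [
    "partially recovered",
    "completely recovered",
    "not recovered",
    "completely recovered",
]
print(multimode(recovery))  # ['completely recovered']

# Hypothetical scores in which two values tie for most frequent,
# so the mode is not unique.
scores = [14, 35, 55, 55, 56, 56, 92]
print(multimode(scores))    # [55, 56]
```

`multimode` returns a list so that ties are visible instead of being silently resolved.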
Skewed Distributions and the Mean and
Median
• For a normally distributed sample, either the mean or the
median can be used as the measure of central tendency.
• In fact, in any symmetrical, unimodal distribution the mean, median
and mode are equal. However, the mean is widely preferred
as the best measure of central tendency
because it is the measure that includes
all the values in the data set for its
calculation, and any change in any of
the scores will affect the value of the
mean.
This is not the case with the median
or mode.
When data are skewed, the median is generally
considered to be the best representative of the central
location of the data.
The more skewed the distribution, the greater the
difference between the median and mean.
If the data are expected to be normally distributed but tests of
normality show that they are non-normal, it is
customary to use the median instead of the mean.
However, this is more a rule of thumb than a strict
guideline.
Sometimes, researchers wish to report the mean of a
skewed distribution
– if the median and mean are not appreciably different (a
subjective assessment), and
– if it allows easier comparisons to previous research to be
made.
Summary of when to use the mean,
median and mode
Type of Variable              Best measure of central tendency
Nominal                       Mode
Ordinal                       Median
Interval/Ratio (not skewed)   Mean
Interval/Ratio (skewed)       Median

Exercise
1. What is the best measure of central tendency?
2. In a strongly skewed distribution, what is the best indicator of
central tendency?
3. When is the mean the best measure of central tendency?
4. Do all types of data have a median, mode and mean?
5. When is the mode the best measure of central tendency?
6. When is the median the best measure of central tendency?
7. What is the most appropriate measure of central tendency when the
data has outliers?
8. In a normally distributed data set, which is greatest: mode, median
or mean?
9. For any data set, which measures of central tendency have only one
value?
MEASURES OF DISPERSION
Measures that describe the spread or variation in the
observations
Common measures of dispersion
Range
Standard Deviation
Coefficient of Variation
Percentiles
Inter-quartile Range
Range
The difference between the largest and the smallest
observation
Used with numerical data to emphasize extreme values
Serum cholesterol example
Minimum = 3.8, Maximum = 7.1
Range = 7.1 – 3.8 = 3.3
Standard Deviation
• Measure of the spread of the observations about the mean
• Used as a measure of dispersion when the mean is used to measure
central tendency for symmetric numerical data
• Standard deviation, like the mean, requires numerical data.
• A low standard deviation indicates that the values tend to be close to
the mean
• Essential part of many statistical tests
• The symbol for the standard deviation of a random variable
is "σ"; the symbol for a sample is "s".


Coefficient of Variation
The coefficient of variation (CV) is defined as the ratio
of the standard deviation to the mean
Measure of the relative spread in data
Used to compare variability between two sets of numerical
data measured on different scales
Coefficient of Variation (CV) = (s / mean) x 100%
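The formula above can be sketched in a few lines of Python. The function name `coefficient_of_variation` and the height and weight values are hypothetical, chosen only to show two variables measured on different scales; `statistics.stdev` computes the sample standard deviation s.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Return the CV as a percentage: (s / mean) x 100."""
    return stdev(data) / mean(data) * 100

# Hypothetical measurements of the same five people on two scales.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

# Both samples have the same standard deviation...
print(stdev(heights_cm), stdev(weights_kg))

# ...but relative to their means, the weights vary more than the heights.
print(round(coefficient_of_variation(heights_cm), 2))
print(round(coefficient_of_variation(weights_kg), 2))
```

Because the CV divides by the mean, it has no units, which is what makes the comparison across scales possible.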
Percentile
• A percentile (or a centile) is a measure used
in statistics indicating the value below which a
given percentage of observations in a group of observations
falls.
• For example, the 20th percentile is the value (or score) below
which 20% of the observations may be found.
• The terms percentile and percentile rank are
often used in the reporting of scores from norm-referenced tests.
• For example, if a score is at the 86th percentile, where 86 is the
percentile rank, it is equal to the value below which 86% of the
observations may be found. Contrast this carefully with a score being
in the 86th percentile, which means the score is at or below the value
below which 86% of the observations may be found; in that sense, every
score is in the 100th percentile.
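A percentile rank under the "percentage of observations below the value" definition given above can be sketched as follows. The function name `percentile_rank` is an assumption made for this illustration, other conventions (for example, counting ties as half) exist, and the ages are the ten NP patient ages used earlier in the deck.

```python
def percentile_rank(observations, value):
    """Percentage of observations that fall strictly below `value`."""
    below = sum(1 for x in observations if x < value)
    return 100 * below / len(observations)

# Ages of the ten neck pain (NP) patients used earlier.
ages = [15, 18, 16, 14, 15, 15, 12, 17, 60, 65]

# Six of the ten ages are below 17, so 17 sits at the 60th percentile.
print(percentile_rank(ages, 17))  # 60.0
```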
Interquartile Range (IQR)
Quartiles split the ranked data into 4 segments with an
equal number of values per segment.
The first quartile, Q1, is the value for which 25% of the observations
are smaller and 75% are larger.
The second quartile, Q2, is the same as the median (50% of the observations
are smaller and 50% are larger).
The third quartile, Q3, is the value for which only 25% of the
observations are greater.
The interquartile range is the difference between the third and
first quartiles: IQR = Q3 - Q1.
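As a sketch, the quartiles and the interquartile range (Q3 minus Q1) can be computed with `statistics.quantiles` (Python 3.8+). The data are the eleven sorted scores from the median example; note that textbooks and software use slightly different quartile conventions, so the default "exclusive" method used here is one choice among several and other tools may give slightly different values.

```python
from statistics import quantiles

# The eleven scores from the median example.
scores = [14, 35, 45, 55, 55, 56, 56, 65, 87, 89, 92]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = quantiles(scores, n=4)
iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data

print(q1, q2, q3)  # 45.0 56.0 87.0
print(iqr)         # 42.0
```

Here Q2 equals the median found earlier (56), as expected.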
EXERCISE
Which is the closest estimate of the percentile rank for
drivers with a driving time of 6 hours?
35th Percentile
50th Percentile
70th Percentile
75th Percentile
80th Percentile
EXERCISE
1. Problem I
The following data show the number of peanuts each student has:

4 4 10 11 15 7 14 12 6
2. Problem II
Number of songs in each album:
14, 12, 12, 11, 9, 7, 9, 10, 10, 10
Find the IQR.
Thank You
