Choosing Appropriate Statistical Tool

Statistical analysis plays an important role in scientific research by helping researchers carefully plan, analyze, and interpret the results of investigations. Common statistical procedures include descriptive statistics like means and distributions to summarize data, inferential statistics like t-tests and ANOVA to draw conclusions about populations from samples, and relationships between variables using correlation, regression, and contingency analysis. Frequency distributions and other graphical presentations are also effective tools for summarizing statistical data.

ROLE OF STATISTICS IN SCIENTIFIC RESEARCH

Statistics is a tool kit for the research worker in planning,
analyzing, and interpreting the results of an investigation,
using methods or techniques that have proved useful in many
fields of inquiry.
Statistical data analysis

The careful selection and execution of procedures, including
the specification of data requirements and the manner of data
collection, the conversion of raw data into analysis data sets,
the generation of statistical summaries and tests of hypotheses
in light of the study questions to be answered and the nature
of the data (assumptions), and the interpretations that can be
deduced from the results of the statistical calculations.
Common Procedures Involved in Processing/Analysis of Data
• Sorting
• Tabulating
• Graphical presentation
• Comparing groups
• Examining associations between variables
• Performing statistical inference
Using Descriptive Statistics

• Descriptive statistics - numbers that represent salient characteristics of the data
• Examples of what they measure:
  - Typical values
  - Variation
  - Location
  - Shape of distribution
• Also referred to as “summary statistics”
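
The measures listed above can be computed with Python's standard `statistics` module. This is a minimal sketch; the `scores` values are hypothetical and serve only as an illustration.

```python
import statistics

# Hypothetical sample of exam scores (illustrative values only).
scores = [72, 85, 85, 90, 68, 77, 85, 94, 61, 80]

summary = {
    "mean": statistics.mean(scores),      # typical value
    "median": statistics.median(scores),  # location (middle value)
    "mode": statistics.mode(scores),      # most frequent value
    "stdev": statistics.stdev(scores),    # variation (sample standard deviation)
    "range": max(scores) - min(scores),   # variation (distance between extremes)
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

A mean below the median, as in this sample, is one rough hint that the distribution has a longer left tail.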
Frequency Distributions

• A frequency distribution is a table that shows the number of times each value or category (or class) of values of a variable occurs
• Frequency distributions are very effective summarization tools
Frequency tabulation – qualitative variable
Table 1. Distribution of private physicians who encounter tuberculosis patients according to their specialization

Category                     Frequency   Percentage
General Internal Medicine        67         25.3
Family Medicine                  53         20.0
General Practice                 43         16.2
Pulmonology                      38         14.3
Infectious Disease                7          2.6
Other Specializations            57         21.5
Total                           265        100.0
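
The percentage column of Table 1 can be reproduced from the frequencies alone. The sketch below uses the counts from the table; the variable names are illustrative.

```python
# Counts taken from Table 1 (private physicians by specialization).
counts = {
    "General Internal Medicine": 67,
    "Family Medicine": 53,
    "General Practice": 43,
    "Pulmonology": 38,
    "Infectious Disease": 7,
    "Other Specializations": 57,
}

total = sum(counts.values())
# Percentage = 100 * frequency / total, rounded to one decimal place.
percentages = {k: round(100 * v / total, 1) for k, v in counts.items()}

for category, freq in counts.items():
    print(f"{category:<28}{freq:>5}{percentages[category]:>8.1f}")
print(f"{'Total':<28}{total:>5}{100.0:>8.1f}")
```

Because each percentage is rounded separately, the rounded column can sum to slightly less or more than 100.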
Frequency tabulation – for comparisons
Table 3. Distribution of newly diagnosed tuberculosis patients according to
time interval between the onset of symptoms and labelling of the patient as a
tuberculosis patient (diagnosis delay), by type of PPMD
Polygon
[Figure: frequency polygon. Percent Distribution of TB Patients According to Number of Days from Initial Health Consultation to Start of DOTS, Before and After Strengthening Program. Horizontal axis: Number of Days (0 to 40). Vertical axis: Percent (0 to 35). Series: Before, After.]
Inferential Statistics

• Methods used to draw inferences about a population from a sample
• Answers questions such as:
  - What is the value of this parameter in the population?
  - Does the difference really exist between the populations being compared?
Major Areas in Inferential Statistics

Regression Analysis: used to explain or predict the behavior of a variable using information on another set of variables

Analysis of Variance (ANOVA): an extension of the t-test of differences between means which can handle the comparison of mean scores between 3 or more groups
Major Areas in Inferential Statistics

Multivariate Analysis: allows the researcher to analyze simultaneous measurements on many variables; commonly used for the investigation of relationships, prediction, hypothesis-testing, and also for descriptive purposes such as data reduction, structural simplification, sorting and grouping; examples include multivariate ANOVA, factor analysis, discriminant analysis, cluster analysis and principal component analysis

Time Series Analysis: analyzes and explains changes in the behavior of a variable over time

Nonparametric Inferential Statistics: includes procedures that can be applied under relatively mild assumptions regarding the underlying population
Correlation Analysis

• It is a statistical technique used to determine the strength of the relationship between two variables, X and Y.
• It provides a measure of the strength of the linear relationship between two variables measured on at least an interval scale.

Objective: Determine the linear relationship (correlation) between two variables

Parametric Test: Pearson’s correlation coefficient (the sample coefficient r estimates the population correlation ρ)


Correlation Analysis

Illustration

A school official is interested in the relationship between
admission test scores in Math and Reading Comprehension.
Correlation Analysis

Illustration

A social scientist might be concerned with how a city’s
crime rate is related to its employment rate.
Correlation Analysis

Illustration

A nutritionist must try to relate the quantity of
carbohydrates in the diet consumed to the amount of
sugar in the blood of diabetic individuals.
Correlation Analysis

Test of Hypothesis
Ho: ρ = 0; There is no linear relationship between X and Y
Ha: i. ρ ≠ 0; There is a linear relationship between X and Y
    ii. ρ > 0; There is a positive linear relationship between X and Y
    iii. ρ < 0; There is a negative linear relationship between X and Y
Coefficient of Determination (R²)
- the proportion of the total variation in Y that is explained by X, usually expressed in percent.
Example:
The relationship between the number of hours spent studying and a
student’s exam score may be expressed in equation form. This equation
may be used to predict the student’s exam score given the number of
hours the student spent studying.
R² = 55.03%
Interpretation:
Around 55% of the total variation in examination scores is
explained by the number of hours spent studying. The remaining
45% is attributable to other variables not in the model, or
to the fact that the relationship is not exactly linear.
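
As a rough sketch of how the correlation coefficient, its square, and the test statistic for Ho: ρ = 0 are obtained, the code below works through a small data set. The `hours` and `score` values are hypothetical (the data behind the 55.03% figure are not given in the slides), and `pearson_r` is an illustrative helper name.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
score = [52, 60, 57, 70, 68, 80]

r = pearson_r(hours, score)
r_squared = r ** 2  # coefficient of determination for a single predictor
# t statistic for testing Ho: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((len(hours) - 2) / (1 - r_squared))

print(f"r = {r:.3f}, R^2 = {100 * r_squared:.1f}%, t = {t_stat:.2f}")
```

The t statistic would then be compared with a tabulated critical value of the t distribution with n - 2 degrees of freedom.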
Contingency Analysis

The purpose is to investigate the association between two
categorical variables, that is, whether the proportions
falling in the categories of one variable depend on the
category of the second variable.

The data are summarized in an r × c contingency table of
counts for combinations of categories of the two variables.
Contingency Analysis
r × c Contingency Table

                     Variable B
Variable A    B1     B2     …     Bc     Total
A1            O11    O12    …     O1c    R1
A2            O21    O22    …     O2c    R2
.             .      .            .      .
.             .      .            .      .
.             .      .            .      .
Ar            Or1    Or2    …     Orc    Rr
Total         C1     C2     …     Cc     n
Contingency Analysis

Illustration
It is of interest to determine whether a student’s handedness
(left-handed, right-handed, or ambidextrous) is associated
with the number of accident-related injuries requiring
attention (none or at least one injury) that the student
sustained in the past two years.
Contingency Analysis

Illustration

A study was done to determine if the effectiveness of an
arthritis pain reliever is associated with the gender of the user.
Chi-Square Test of Independence

• Objective: Determine if two categorical variables are related or independent

Test of Hypothesis:
Ho: The row variable is not associated with the column variable.
Ha: The row variable is associated with the column variable.

Chi-Square Test of Independence
Example
At the end of a course, a college professor makes the following tally based on grade
received and class attendance. Test the hypothesis that grade received in the course is
associated with the number of days absent. Use 𝛼 = 5%.
No. of days absent     Grade Received       Total
                       Pass      Fail
0-3                     24         0         24
4-6                     18         2         20
[label missing]          3        13         16
Total                   45        15         60

Ho: Grade received in the course is not associated with class attendance.
Ha: Grade received in the course is associated with class attendance.
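
The test statistic for this example can be sketched as follows. Expected counts are computed as (row total × column total) / n, and the statistic sums (observed − expected)² / expected over all cells. The counts come from the example table (the label of the third absence category is missing in the source); `chi_square_statistic` is an illustrative helper name, not part of the original material.

```python
import math

# Observed counts from the example; rows are the three absence categories
# (the third row's label is missing in the source), columns are Pass, Fail.
observed = [
    [24, 0],
    [18, 2],
    [3, 13],
]

def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_statistic(observed)
assert df == 2  # the closed form below holds only for 2 degrees of freedom
p_value = math.exp(-chi2 / 2)  # upper-tail chi-square probability when df = 2
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.2e}")
```

With these counts the statistic works out to 37.4 on 2 degrees of freedom, well above the 5% critical value of 5.991, so Ho would be rejected at 𝛼 = 5%.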
One-Way ANOVA

• Objective: Compare three or more treatments

Parametric Test: Hypothesis Testing for One-Way ANOVA
Ho: µ1 = µ2 = … = µk
    All treatment means are equal.
Ha: µi ≠ µj for at least one i≠j
    At least one of the treatment means is different from the rest.
Test Procedure: F-test using One-Way ANOVA
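
A minimal sketch of the F statistic follows: the between-treatment mean square divided by the within-treatment (error) mean square. The `treatments` data are hypothetical and `one_way_anova_f` is an illustrative helper name.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the treatment means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own treatment mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical responses under three treatments.
treatments = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_stat, df1, df2 = one_way_anova_f(treatments)
print(f"F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
```

For this toy data F = 21 on (2, 6) degrees of freedom, which exceeds the tabulated 5% critical value of about 5.14, so Ho would be rejected.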
Kruskal Wallis Test

• Objective: Compare three or more treatments

Nonparametric Test: Hypothesis Testing for Kruskal-Wallis Test
Ho: Md1 = Md2 = … = Mdk
    All medians are equal.
Ha: Mdi ≠ Mdj for at least one i≠j
    At least one of the medians is different from the rest.
Test Procedure: Kruskal-Wallis Test
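
The test replaces the observations with their ranks in the pooled sample and computes H = 12 / (N(N + 1)) × Σ Ri² / ni − 3(N + 1), where Ri is the rank sum of group i. The sketch below assumes this standard form without a tie correction; the `samples` data are hypothetical and the function names are illustrative.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H, df). No tie correction is applied in this sketch."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1

# Hypothetical responses under three treatments.
samples = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat, df = kruskal_wallis_h(samples)
print(f"H = {h_stat:.2f}, df = {df}")
```

H is usually compared with a chi-square critical value on k − 1 degrees of freedom; that approximation is rough for samples as small as these.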
Paired or Related Samples

Each observation in Sample 1 is matched or paired with an
observation in Sample 2.

• Pairing or use of similar units - measurements are taken on pairs of objects that are very similar to each other
• Self-pairing - measurements are taken on the same individual before and after a “treatment”
t-test on Dependent Samples

• Objective: Compare two treatment means

PARAMETRIC TEST
A. Test of Hypothesis for Dependent Samples (related samples)
Ho: µ1 = µ2
Ha: i. µ1 ≠ µ2
    ii. µ1 > µ2
    iii. µ1 < µ2
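
For dependent samples the test works on the pairwise differences: t = mean(d) / (sd(d) / √n) with n − 1 degrees of freedom. The sketch below assumes that standard form; the `before` and `after` values are hypothetical and `paired_t` is an illustrative helper name.

```python
import math
import statistics

def paired_t(sample1, sample2):
    """Return (t, df) for paired observations, based on their differences."""
    diffs = [a - b for a, b in zip(sample1, sample2)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical measurements on the same four individuals (self-pairing).
before = [10, 12, 9, 11]
after = [12, 15, 10, 13]

t_stat, df = paired_t(after, before)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t is compared with a tabulated critical value of the t distribution on n − 1 degrees of freedom, one-sided or two-sided depending on the chosen Ha.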
Test of Hypothesis on Two Means Using Independent Sample

Individuals are drawn independently from the two populations
to be compared.

Assumptions
1. The samples were taken independently.
2. The populations have equal variances.
3. The populations are normally distributed.
t-test on Independent Samples

B. Test of Hypothesis for Independent Samples
Ho: µ1 = µ2
Ha: i. µ1 ≠ µ2
    ii. µ1 > µ2
    iii. µ1 < µ2

Test Procedure: t-test on independent samples
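
Under the equal-variance assumption listed earlier, the two sample variances are pooled before forming the t statistic, which has n1 + n2 − 2 degrees of freedom. This is a minimal sketch; the two groups are hypothetical and `pooled_t` is an illustrative helper name.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (t, df) for two independent samples, assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * statistics.variance(sample1)
                  + (n2 - 1) * statistics.variance(sample2)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / standard_error
    return t, n1 + n2 - 2

# Hypothetical measurements from two independently drawn groups.
group1 = [21, 22, 23]
group2 = [24, 25, 26]

t_stat, df = pooled_t(group1, group2)
print(f"t = {t_stat:.3f}, df = {df}")
```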


NONPARAMETRIC TEST
C. Wilcoxon Rank Sum Test or Mann-Whitney U Test (for
Independent Samples)

Ho: The two samples came from identical populations
Ha: The two samples came from different populations

D. Sign Test (for Dependent Samples)
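
The two nonparametric tests named above can be sketched with the standard library alone. The Mann-Whitney U statistic is computed here by direct pair counting, and the sign test uses an exact two-sided binomial p-value on the signs of the paired differences. All data are hypothetical and the function names are illustrative.

```python
import math

def mann_whitney_u(sample1, sample2):
    """U statistic for independent samples by pair counting (ties count 0.5)."""
    u1 = 0.0
    for a in sample1:
        for b in sample2:
            if a > b:
                u1 += 1
            elif a == b:
                u1 += 0.5
    u2 = len(sample1) * len(sample2) - u1
    return min(u1, u2)

def sign_test_p_value(sample1, sample2):
    """Exact two-sided sign test p-value for paired samples.

    Pairs with a zero difference are dropped before counting signs.
    """
    diffs = [a - b for a, b in zip(sample1, sample2) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Probability of k or fewer of the rarer sign under Binomial(n, 1/2).
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical independent samples for the Mann-Whitney U test.
u_stat = mann_whitney_u([1, 2, 3], [4, 5, 6])

# Hypothetical paired measurements for the sign test.
before = [10, 12, 9, 11, 14, 8, 13, 10]
after = [12, 15, 10, 13, 15, 7, 16, 12]
p_sign = sign_test_p_value(after, before)

print(f"U = {u_stat}, sign test p = {p_sign:.4f}")
```

A small U indicates little overlap between the two samples; it is judged against tabulated critical values or a normal approximation for larger samples.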


Regression Analysis

A statistical technique used to study the functional
relationship between variables, which allows predicting
the value of one variable, say Y, given the value of
another variable, say X.
Assumption Of The Simple Linear Regression Model

The means of the subpopulations of Y all lie on a straight
line. This is known as the assumption of linearity. In
symbols, this is expressed as

μY|Xi = β0 + β1Xi

where μY|Xi is the mean of the subpopulation of Y given Xi.
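
The intercept β0 and slope β1 are commonly estimated by least squares: b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. The sketch below assumes that estimator (the slides do not name one); the data are hypothetical and `fit_line` is an illustrative helper name.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) of the intercept and slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: hours studied (X) and exam score (Y).
hours = [1, 2, 3, 4, 5, 6]
score = [52, 60, 57, 70, 68, 80]

b0, b1 = fit_line(hours, score)
# Estimated mean score for students who study 7 hours.
predicted = b0 + b1 * 7
print(f"Y-hat = {b0:.2f} + {b1:.2f} X; prediction at X = 7: {predicted:.1f}")
```

Predicting at X = 7 extrapolates one hour beyond the observed range, which is reasonable only if the linearity assumption is believed to hold there.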
MULTIPLE COMPARISON

When the ANOVA F-test rejects Ho, the commonly used tests for
pairwise mean comparison are:
• Least significant difference (LSD) test
• Duncan’s multiple range test (DMRT)
• Student-Newman-Keuls (SNK) test
• Tukey test / Tukey’s Honest Significant Difference (HSD) test
• Scheffé’s (S) test
An ANOVA test can tell you if your results are significant overall, but it won’t
tell you exactly where those differences lie. After you have run an ANOVA
and found significant results, you can use tests for pairwise mean
comparison to find out which specific pairs of group means differ
significantly from each other.
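
As one illustration, the least significant difference for a pair of groups is LSD = t × √(MSE × (1/ni + 1/nj)), where MSE is the error mean square from the ANOVA and t is the two-sided critical value on the error degrees of freedom. The sketch below assumes this form; the `groups` data are hypothetical, `lsd_comparisons` is an illustrative helper name, and the critical value is supplied by hand from a t table.

```python
import math
from itertools import combinations

def lsd_comparisons(groups, t_critical):
    """Map each pair of group names to True if their means differ by more than the LSD."""
    n_total = sum(len(g) for g in groups.values())
    means = {name: sum(g) / len(g) for name, g in groups.items()}
    ss_within = sum((x - means[name]) ** 2 for name, g in groups.items() for x in g)
    ms_within = ss_within / (n_total - len(groups))  # error mean square from the ANOVA
    results = {}
    for a, b in combinations(groups, 2):
        lsd = t_critical * math.sqrt(ms_within * (1 / len(groups[a]) + 1 / len(groups[b])))
        results[(a, b)] = abs(means[a] - means[b]) > lsd
    return results

# Hypothetical responses under three treatments.
groups = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [6, 7, 8]}
# 2.447 is the tabulated two-sided 5% critical value of t with 6 error df.
significant = lsd_comparisons(groups, t_critical=2.447)

for pair, flag in significant.items():
    print(pair, "differ" if flag else "do not differ")
```

The LSD test does not adjust for the number of comparisons, which is one reason the other listed procedures are often preferred when many pairs are compared.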


A normality test is used to determine whether sample data have been
drawn from a normally distributed population (within some tolerance). A
number of statistical tests, such as the t-test and one-way or two-way
ANOVA, assume that the data come from a normally distributed population.

The Shapiro-Wilk test is one such test of normality.


EXPERIMENTAL DESIGN

• The set of rules, plans and course of action taken in the conduct of an experiment.
• It involves the assignment of treatments to the experimental units and a thorough understanding of the analysis to be performed when the data become available.
RANDOMIZED COMPLETE BLOCK DESIGN
Practical Situations
• Experimental units are not homogeneous
• Experimental units (EUs) are markedly heterogeneous with respect to some criterion of classification
• Examples: plots differ in fertility, trees differ in age or height, analysts differ in efficiency
RANDOMIZED COMPLETE BLOCK DESIGN

• Assumes a single gradient running across the experimental units
• Experimental units are grouped into r blocks
• Blocks - groups of experimental units that are more or less homogeneous
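
In a randomized complete block design every treatment appears exactly once in each block, and the order of treatments is randomized separately within each block. A minimal sketch of that assignment follows; the treatment labels, number of blocks, and `randomize_rcbd` name are illustrative assumptions.

```python
import random

def randomize_rcbd(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)  # a seed makes the layout reproducible
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)  # separate randomization within each block
        layout.append(order)
    return layout

# Hypothetical experiment: four treatments in r = 3 blocks.
layout = randomize_rcbd(["T1", "T2", "T3", "T4"], n_blocks=3, seed=1)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {order}")
```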
References:

• Almeda, Josefina V., et al. Elementary Statistics. School of Statistics, University of the Philippines Diliman.
• Training Module on Basic Statistics with Exploratory Data Analysis. UP Statistical Center Research Foundation, Inc.
• Training Module on Statistical Data Analysis with STATA for Health Researchers. National Teacher Training Center for the Health Professions, University of the Philippines Manila.
