Chapter 5 Descriptive Inferential Statistics

Uploaded by

khanhly1009cg

Chapter V

DESCRIPTIVE & INFERENTIAL
STATISTICS
Descriptive vs Inferential statistics
• Descriptive statistics describe a collection
of data. E.g., mean, median, mode, and
standard deviation.
• Inferential statistics draw inferences
about characteristics of a population
from the examination of a sample. E.g.,
t-test, ANOVA.
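The descriptive measures named above can be computed directly. The following is a minimal sketch using Python's standard `statistics` module; the `scores` list is made-up illustration data, not taken from the slides.

```python
import statistics

# Hypothetical sample: scores of 9 respondents (made-up numbers).
scores = [6, 7, 7, 8, 5, 9, 7, 6, 8]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value after sorting
mode_score = statistics.mode(scores)      # most frequent value
sd_score = statistics.stdev(scores)       # sample standard deviation (n - 1)

print(mean_score, median_score, mode_score, round(sd_score, 2))
```

These numbers only describe the 9 scores themselves. Saying anything about the wider population they came from requires an inferential test.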
Inferential statistics – PARAMETRIC
• Parametric statistics : where the
population is assumed to fit a
parameterized distribution ( most
typically the normal distribution ).
• Parametric tests are used when the test variable
is measured on an interval or ratio scale.
Inferential statistics –
NON-PARAMETRIC
• Sometimes the data do not fit an assumed distribution.
In that case a non-parametric alternative is
more likely to detect a difference or lack of
difference.
• Non-parametric statistics are statistics where
it is not assumed that the population fits any
parameterized distribution.
• Non-parametric statistics are typically applied
to data that take on a ranked order.
What is a Parametric test ?
• Parametric tests apply to QuanTitative data
( discrete or continuous ).
• Parametric tests include : one-sample t-test,
two-sample t-test, Z-test, ANOVA. They are built
on the mean and standard deviation of the data.
• In Parametric tests, the measure of central
tendency is : Mean.
• To test whether the data are normally
distributed, we can use two tests : the
Kolmogorov–Smirnov test and the Shapiro–Wilk test.
Hypothesis testing :
PARAMETRIC testing
• An inferential statistical analysis
compares samples to see
whether they are likely to have been
drawn from the same population.
• For parametric testing of two groups :
Student’s t-test.
• For parametric testing of more than two groups
: Analysis of Variance (ANOVA)
What is a t-test ?
• Use the t-test when you have a small sample size,
or if you don’t know the population standard
deviation. The t-test is very similar to the Z-test.
• A t-test can be used to compare the mean of a
sample with a given value. We call this type a
“one-sample t-test.”
• t-tests can also compare the means of two
samples. We call these “two-sample t-tests”; they
include “two independent samples” and “two paired
samples.”
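A one-sample t-test can be sketched with the standard library alone. The data, the hypothesized value `mu0 = 9.0`, and the function name `one_sample_t` are assumptions made for illustration. The critical value 2.262 is the standard two-tailed table value for 9 degrees of freedom at α = 0.05.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic comparing the sample mean with a given value mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample SD, population SD unknown
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical completion times in months for 10 houses (made-up data).
sample = [9.2, 9.5, 8.8, 9.9, 9.4, 9.1, 9.6, 9.3, 9.0, 9.7]

t_stat = one_sample_t(sample, mu0=9.0)
T_CRIT = 2.262                             # table value: df = 9, two-tailed, alpha = 0.05
significant = abs(t_stat) > T_CRIT

print(round(t_stat, 2), significant)
```

A statistics package would normally report the p-value directly. Comparing against a table value is used here only to keep the sketch dependency-free.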
Types of t-test ?
• Independent samples t-tests consider two
distinct groups, such as males vs females.
• Paired samples t-tests relate to the same set
of respondents, and thus occur when
respondents are surveyed multiple times. This
situation occurs in pre-test/post-test studies
designed to measure a variable before and
after a treatment.
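The difference between the two types shows up in how the t statistic is built. The sketch below is illustrative only: the function names and data are hypothetical, and the independent version assumes equal variances (the classic pooled Student's t).

```python
import math
import statistics

def paired_t(before, after):
    """Paired samples: test the mean of the per-respondent differences."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def independent_t(group_a, group_b):
    """Independent samples: pooled-variance Student's t (assumes equal variances)."""
    na, nb = len(group_a), len(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se

# Hypothetical pre-test / post-test scores of the same 5 respondents.
pre = [60, 65, 70, 58, 62]
post = [64, 68, 75, 60, 67]
t_paired = paired_t(pre, post)

# Hypothetical scores of two distinct groups.
males = [5, 6, 7, 8, 9]
females = [3, 4, 5, 6, 7]
t_indep = independent_t(males, females)

print(round(t_paired, 2), round(t_indep, 2))
```

The paired version uses n - 1 degrees of freedom, where n is the number of pairs. The pooled version uses na + nb - 2.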
What is a Non-parametric test
( also called Non-metric test ) ?
• Non-parametric tests ( also called
distribution-free tests ) don’t assume that
your data follow a normal distribution.
• Non-parametric tests apply to QuaLitative
data : nominal or ordinal data ( also called
Non-metric independent variables ).
• In Non-parametric tests, the Measure of
central tendency is : Median.
Two cases of Standard Deviation

Distribution of Sample Means
Examples of Confidence level
Example of Significance level
• A contractor says that it will take 9 months to
construct a house. The house is finished in 9
months and 1 week. The completion time is not 9
months; however, it is Not significantly different
from the estimated time.

• Local authorities estimate that there are 20,000
people at a concert. Ticket receipts indicate there
are 42,000 attendees. This number of 42,000 is
significantly different from 20,000.
Significance level - Confidence level -
p-value
• Confidence level ( 1 - α ) is associated with the
region where we fail to reject the Null
hypothesis H0.
• Significance level ( α ) is associated with the areas
outside that region ( where we reject Null hypothesis
H0 ).
• Significance level α gives us the
Critical Value for testing.
• p-value is a quantitative measure of significance :
the probability, assuming H0 is true, of a result at
least as extreme as the one observed. We reject H0
when p-value < α.
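The link between α, the critical value, and the p-value can be illustrated for a Z-test with the standard library's `NormalDist`. The observed statistic `z = 2.5` is a made-up value used only for illustration.

```python
from statistics import NormalDist

alpha = 0.05                      # significance level
confidence_level = 1 - alpha      # 0.95

# Two-tailed critical value: alpha is split between both tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def two_tailed_p(z):
    """Probability of a |Z| at least this large if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = 2.5                           # hypothetical observed test statistic
p_value = two_tailed_p(z)
reject_h0 = p_value < alpha

print(round(z_crit, 2), round(p_value, 4), reject_h0)
```

Checking `abs(z) > z_crit` and checking `p_value < alpha` are two ways of stating the same decision rule.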
Illustration of confidence & significance
Two-tailed hypothesis test
One-tailed hypothesis test ( right hand)
One-tailed hypothesis test ( left hand)
SELECTING METHODS OF
DATA ANALYSIS
Analyzing QuaLitative data
( Non-metric )
• Nominal :
– Calculating Frequency
– Chi-square ( frequency test )

• Ordinal :
– Calculating Frequency & Median
– Testing Kolmogorov-Smirnov, Wilcoxon
Analyzing QuanTitative data ( metric )

Applies to Interval & Ratio Scales

– Calculating Mean
– Z-test, t-test
Depending on Parametric vs.
Non-parametric
• Parametric → Z-test, t-test

• Non-parametric → Chi-square,
Wilcoxon
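As an illustration of the non-parametric branch, a Chi-square test on frequencies can be sketched as follows. The counts are hypothetical. The critical value 5.991 is the standard table value for 2 degrees of freedom at α = 0.05.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 120 respondents choose among 3 brands (nominal data).
observed = [30, 50, 40]
expected = [40, 40, 40]           # H0: all brands are equally preferred

chi2 = chi_square(observed, expected)
CHI2_CRIT = 5.991                 # table value: df = 3 - 1 = 2, alpha = 0.05
reject_h0 = chi2 > CHI2_CRIT

print(chi2, reject_h0)
```

Only counts per category are needed. No mean or standard deviation is computed, which is why the test suits nominal data.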
Depending on number of samples
• Independent samples
e.g., Testing difference between the Means of
2 separate groups → use t for Independent samples

• Dependent samples
e.g., Testing difference between 2 Means measured
on the same respondents → use t for Paired samples
Depending on Number of Variables

• Univariate data analysis
• Bivariate
• Multivariate
Depending on the correlations between Variables
• Dependence method
– Independent variables & Dependent variables;
– e.g., Linear regression, Multiple regression,
Discriminant analysis, Multivariate analysis of
variance.
• Interdependence method
– When there are neither independent nor dependent
variables, but the variables depend on each other;
– e.g., Exploratory factor analysis, Cluster
analysis, Multidimensional scaling.
DATA ENTRY
Data Coding
• Precoding : coding Before interview [ with
Closed questions ]
• Postcoding : coding After interview [ with
Open questions ]
• Codes have two digits :
– 1st digit : for Variable
– 2nd digit : for Answer
• Data and codes are described in the Code book
Data Matrix

• Data Coding → Data Matrix → Software
• Columns of matrix express Coding of Variables
• Rows of matrix express elements of samples
• Intersections of columns and rows express the
answers
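The code book and data matrix can be sketched as plain Python structures. The variable names, labels, and answers below are hypothetical. `split_code` follows the two-digit scheme described under Data Coding (1st digit for the variable, 2nd digit for the answer), assuming single-digit variable and answer numbers.

```python
# Hypothetical code book: variable number -> (name, {answer number: label}).
codebook = {
    1: ("gender", {1: "male", 2: "female"}),
    2: ("satisfaction", {1: "low", 2: "medium", 3: "high"}),
}

def split_code(code):
    """Split a two-digit code into (variable number, answer number)."""
    return divmod(code, 10)

def decode(code):
    variable_no, answer_no = split_code(code)
    name, labels = codebook[variable_no]
    return name, labels[answer_no]

# Data matrix: one row per respondent, one column per variable.
data_matrix = [
    [11, 23],   # respondent 1: male, high
    [12, 22],   # respondent 2: female, medium
    [12, 23],   # respondent 3: female, high
]

first_respondent = [decode(code) for code in data_matrix[0]]
print(first_respondent)
```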
Clean missing cells
• Caused by a data entry mistake, or
• Caused by missing field data
• To determine the reason : count the total number
of answers for each variable ( each column )
• If from data entry → fill in from the
questionnaire
• If from a missing interview answer →
– Redo the interview
– Drop that element ( reduces sample size )
– Replace by the average of some answers or all
answers ( SPSS can do this quickly )
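The options above can be sketched in a few lines. The rows, variable names, and helper functions are hypothetical, and `None` is assumed to mark a missing cell.

```python
import statistics

# Hypothetical data matrix as a list of rows; None marks a missing cell.
rows = [
    {"age": 25, "income": 12},
    {"age": 31, "income": None},
    {"age": None, "income": 15},
    {"age": 28, "income": 18},
]

def count_answers(rows, variable):
    """Number of non-missing answers in one column."""
    return sum(1 for r in rows if r[variable] is not None)

def drop_incomplete(rows):
    """Option: drop elements with any missing answer (reduces sample size)."""
    return [r for r in rows if all(v is not None for v in r.values())]

def fill_with_mean(rows, variable):
    """Option: replace missing answers by the average of the given answers."""
    answered = [r[variable] for r in rows if r[variable] is not None]
    mean = statistics.mean(answered)
    return [dict(r, **{variable: mean if r[variable] is None else r[variable]})
            for r in rows]

print(count_answers(rows, "age"), len(drop_incomplete(rows)))
print(fill_with_mean(rows, "age")[2])
```

Both helper options return new lists and leave `rows` unchanged, so the raw data stays available for checking against the questionnaire.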
Clean unreasonable data
• Mainly due to data entry mistakes
• Identify by calculating the frequency of
occurrence in each column ( variable )
• Resolve by checking the questionnaire
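A frequency count per column makes out-of-range codes stand out. In this sketch the column values and the set of valid codes are made up for illustration.

```python
from collections import Counter

# Hypothetical column "gender": valid codes are 1 (male) and 2 (female).
valid_codes = {1, 2}
gender_column = [1, 2, 2, 1, 22, 1, 2, 0]

freq = Counter(gender_column)     # frequency of each code in the column

# Codes that occur but are not allowed by the code book.
suspicious = {code: n for code, n in freq.items() if code not in valid_codes}

# Row positions to look up in the original questionnaires.
rows_to_check = [i for i, code in enumerate(gender_column)
                 if code not in valid_codes]

print(suspicious, rows_to_check)
```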
SUMMARIZING DATA
• Statistical Summarization : mean, median, mode,
variance, standard deviation, range.
• Summarizing by Table : Simple table & Cross table
• Summarizing by Chart :
– Bar chart
– Pie chart
– Line graph
– Scatter plot
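The statistical summary and the two kinds of table can be sketched with the standard library. All data below are hypothetical. Charts are left out because they need a plotting package.

```python
import statistics
from collections import Counter

# Hypothetical metric variable.
values = [2, 4, 4, 4, 5, 5, 7, 9]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "variance": statistics.variance(values),   # sample variance (n - 1)
    "std_dev": statistics.stdev(values),
    "range": max(values) - min(values),
}

# Hypothetical nominal variables for the tables.
gender = ["M", "F", "F", "M", "F", "M"]
answer = ["yes", "yes", "no", "no", "yes", "yes"]

simple_table = Counter(answer)                 # frequency of one variable
cross_table = Counter(zip(gender, answer))     # frequency of each combination

print(summary)
print(simple_table)
print(cross_table)
```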
