
W07 Statistical Analysis For Categorical Data-4

1. The document discusses categorical data and chi-square tests. Chi-square tests are used to analyze categorical data, where individuals are classified into categories rather than given numerical values.
2. A one-way chi-square test, also called a goodness-of-fit test, compares observed category frequencies to the frequencies expected under the null hypothesis. The null hypothesis generally states equal proportions or no difference from a known distribution.
3. The chi-square statistic measures how well the observed frequencies fit the expected frequencies based on the null hypothesis. A larger chi-square value provides evidence to reject the null hypothesis in favor of the alternative.

Analysis on categorical data

Categorical data: nominal or ordinal
Categorical outcomes
• Parameters such as the mean and the standard
deviation are the most common way to describe a
population, but there are situations in which we have
questions about the proportions or relative frequencies
for a distribution.

• Examples:
• How does the number of female students compare with
the number of male students in IE?
• Among several local brands of fried chicken, which is the
most preferred by most students?

• In particular, we are not measuring a numerical score for
each individual.
• Instead, the individuals are simply classified into
categories, and we want to know what proportion of the
population is in each category
One-way chi-square
• The one-way chi square is used when data consist of the
frequencies with which participants belong to the different categories
of one variable.
Example: the observed variable is gender (male or female), and we test whether the category frequencies are equal.
Observed counts noted on the slide: 10, 20, 30.

• One-way chi square procedure is also called a goodness-of-fit test.


Hypotheses of the one-way chi-square
• Generally, H0 falls into one of the following categories:
1. No Preference, Equal Proportions
H0: there is no difference among the frequencies of the categories in the population
• H0: all $p_i$ are the same
Ha: not all frequencies in the population are equal
• Ha: at least one $p_i$ differs from the others
Example (gender): H0: $p_{male} = p_{female}$; Ha: $p_{male} \neq p_{female}$
2. No Difference from a Known Population
H0: the proportions for one population are not different from the proportions that are
known to exist for another population

Ha: the population proportions are not equal to the values specified by the null hypothesis

Example (historical data): a proportion is known from historical data for an older group, e.g. $p_{old} = 72\%$.
H0: $p_{young} = p_{old}$ (the proportion equals the proportion in the historical data)
Ha: $p_{young} \neq p_{old}$ (the proportion differs from the proportion in the historical data)
Assumptions of the one-way chi-square
1. Participants are categorized along one variable having two or more
categories, and we count the frequency in each category.
2. Each participant can be in only one category (for example, male or female; a participant cannot fall between two categories).
3. Category membership is independent.
4. We include the responses of all participants in the study


Observed and expected frequency
• Construct a hypothetical sample that represents how the sample
distribution would look if it were in perfect agreement with the
proportions stated in the null hypothesis.
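• The frequency values predicted from the null hypothesis are called expected
frequencies ($f_e$).
• For category $i$ with hypothesized proportion $p_i$ and sample size $n$: $f_{e,i} = p_i \times n$.
• Example: under H0: $p_1 = p_2 = p_3$ with $n = 60$, each of the three categories has $f_e = 20$.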
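The rule $f_e = p \times n$ can be sketched in a few lines of Python. This is only an illustration of the slide's example (n = 60, three equally likely categories); the function name `expected_frequencies` is an assumption introduced here, not part of the lecture.

```python
def expected_frequencies(proportions, n):
    """Return the expected frequency f_e = p * n for each category.

    `proportions` are the category proportions stated by the null hypothesis.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("null-hypothesis proportions must sum to 1")
    return [p * n for p in proportions]


# Slide example: n = 60, H0 says the three categories are equally likely.
n = 60
proportions = [1 / 3, 1 / 3, 1 / 3]
f_e = expected_frequencies(proportions, n)
print(f_e)  # about 20 in each category
```

The same function covers the "known population" form of H0: pass the known proportions instead of equal ones.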
Observed and expected frequency
• Select a sample of n individuals and count how many are in each
category.
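• The resulting values are called observed frequencies ($f_{obs}$).
• Example: a sample of $n = 60$ is classified into three categories (K, B, G), giving observed frequencies of 10, 20, and 30. The null hypothesis is H0: $p_K = p_B = p_G$.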

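The Chi-Square Statistic
• In the chi-square test for goodness of fit, the sample is
expressed as a set of observed frequencies ($f_{obs}$ values),
and the null hypothesis is used to generate a set of
expected frequencies ($f_e$ values).
• The chi-square statistic simply measures how well the
data ($f_{obs}$ values) fit the hypothesis ($f_e$ values):

$$\chi^2_{obs} = \sum \frac{(f_{obs} - f_e)^2}{f_e}$$

• We reject $H_0$ when $\chi^2_{obs} > \chi^2_{\alpha}$ (the critical value from the chi-square table).
• In a one-way chi square, df = k - 1, where k is the number of categories.

Worked example (categories K, B, G; n = 60):
H0: $p_K = p_B = p_G$; Ha: the categories are not equally distributed.
Observed frequencies: 10, 20, 30; expected frequencies: 20, 20, 20.

$$\chi^2 = \frac{(30-20)^2}{20} + \frac{(20-20)^2}{20} + \frac{(10-20)^2}{20} = 10$$

Since $\chi^2 > \chi^2_{table}$, reject H0.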
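As a minimal sketch, the goodness-of-fit statistic can be computed directly from the observed and expected frequencies. The counts below are the ones from the slide annotation (observed 10, 20, 30 against expected 20, 20, 20). The small table of critical values holds the standard chi-square table entries at α = 0.05; the significance level is an assumption, since the slides do not state it, and the names `chi_square_statistic` and `CRITICAL_05` are introduced here for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (f_obs - f_e)^2 / f_e over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Standard chi-square table values at alpha = 0.05 (assumed level).
CRITICAL_05 = {1: 3.84, 2: 5.99}

observed = [10, 20, 30]
expected = [20, 20, 20]

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1              # one-way test: df = k - 1
reject_h0 = chi2 > CRITICAL_05[df]  # reject when the statistic exceeds the table value

print(chi2, df, reject_h0)
```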
Example
Recently, there was a statement that the price of instant noodles
could possibly triple due to the Russia-Ukraine war, which prevents
these countries from exporting their wheat production. Among
800 Indonesian netizens, 600 are worried about this issue. Can you
conclude that the majority of Indonesian netizens are worried about
the instant noodle price increase?

Worked solution (n = 800):
H0: $p_{worry} = p_{no\,worry} = 0.5$
Ha: $p_{worry} > p_{no\,worry}$
Observed frequencies: worried = 600, not worried = 200.
Expected frequencies: worried = 400, not worried = 400.

$$\chi^2 = \frac{(600-400)^2}{400} + \frac{(200-400)^2}{400} = 100 + 100 = 200$$

With df = 1 the table value at α = 0.05 is 3.84, so $\chi^2 = 200 > 3.84$ and H0 is rejected.
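The instant-noodle example can be checked end to end with a short script. This is a sketch under two assumptions not stated in the slides: a significance level of α = 0.05, and the use of the identity that a chi-square variable with df = 1 is a squared standard normal, which gives the p-value as `erfc(sqrt(chi2 / 2))`. The function names are hypothetical.

```python
import math


def chi_square_statistic(observed, expected):
    """Sum of (f_obs - f_e)^2 / f_e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with df = 1.

    For df = 1, chi-square is the square of a standard normal, so
    P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


n = 800
observed = [600, 200]        # worried, not worried
expected = [0.5 * n] * 2     # H0: both proportions are 0.5

chi2 = chi_square_statistic(observed, expected)
p_value = p_value_df1(chi2)

print(chi2)            # 100 + 100
print(p_value < 0.05)  # far below the assumed 0.05 level
```

The chi-square statistic itself is two-sided; the direction of the difference (600 worried against 400 expected) is what supports the "majority" reading.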
Chi-square test for independence
• The chi-square statistic may also be used to test whether there is a relationship between two
categorical variables.
• For example, a group of students could be classified in terms of personality (introvert, extrovert) and in
terms of color preference (red, yellow, green, or blue). Here personality type is variable 1 and color preference is variable 2.

                          Color preference
                          Red       Yellow    Green     Blue
Personality   Introvert   f_obs     f_obs     f_obs     f_obs
type          Extrovert   f_obs     f_obs     f_obs     f_obs

• Is there any significant relationship between personality and color preference in the population of
students?

• The test is called the chi-square test for independence or the two-way chi-square test.
The Hypotheses
• The null hypothesis for the chi-square test for independence states that the two
measured variables are independent.
• Two variables are independent when there is no consistent, predictable relationship between
them.
• H0 falls into one of the following categories:
1. The data are viewed as a single sample, with each individual measured on two variables.
• The goal of this chi-square test is to evaluate the relationship between the two variables.
• For the example we are considering, the goal is to determine whether there is a consistent, predictable
relationship between the type of music (the IV) and whether a woman gives her phone number (the DV).
2. The data are viewed as two (or more) separate samples representing two (or more)
populations or treatment conditions.
• The goal of this chi-square test is to determine whether there are significant differences between the
populations.
• For the example we are considering, the goal is to determine whether the proportion of women giving phone numbers with romantic
music is significantly different from the proportion with neutral music.
Observed and Expected Frequencies
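• The frequencies in the sample distribution are observed frequencies
($f_{obs}$).
• The expected frequencies ($f_e$) define an ideal hypothetical
distribution that is in perfect agreement with the null hypothesis.
• For each cell, the expected frequency is computed from the row total, the column total (for example, the number of yes or no responses), and the sample size $n$:

$$f_e = \frac{(\text{row total}) \times (\text{column total})}{n}$$

• The worked calculation on the slide starts with the term $\frac{(27-21)^2}{21}$, and the resulting statistic is compared with the table value $\chi^2_{table} = 3.84$.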
The Chi-Square Statistic
• In the chi-square test for independence, the sample is
expressed as a set of observed frequencies ($f_{obs}$ values),
and the null hypothesis is used to generate a set of
expected frequencies ($f_e$ values).
• The chi-square statistic simply measures how well the
data ($f_{obs}$ values) fit the hypothesis ($f_e$ values):

$$\chi^2_{obs} = \sum \frac{(f_{obs} - f_e)^2}{f_e}$$

• We reject $H_0$ when $\chi^2_{obs} > \chi^2_{\alpha}$ (the critical value from the chi-square table).
• In a two-way chi square, df = (rows - 1)(columns - 1).
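For a two-way table, the statistic and its degrees of freedom, df = (rows - 1)(columns - 1), can be computed together. The sketch below uses a purely hypothetical 2×2 table invented for illustration (it is not data from the lecture), and the function name `chi_square_independence` is likewise an assumption.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for a two-way table of observed counts.

    Each expected frequency is (row total * column total) / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_obs in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n
            chi2 += (f_obs - f_e) ** 2 / f_e

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical observed counts, for illustration only.
hypothetical_table = [
    [10, 20],
    [20, 10],
]
chi2, df = chi_square_independence(hypothetical_table)
print(chi2, df)
```

In this toy table every expected frequency is 15, so each cell contributes 25/15 to the statistic.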
Example
A researcher would like to know which factors are most
important to people buying a new car. Each individual in
a sample of 200 customers is asked to identify the most
important factor in the decision process: Performance,
Reliability, or Style. The researcher would like to know
whether there is a difference between the factors
identified by younger adults (age 35 or younger)
compared to those identified by older adults (age
greater than 35). The data are as follows:

[Observed frequency table not recovered]

Perform a two-way chi-square test to compare the
factors influencing the decision-making process between
younger and older adults in buying a new car.

Worked solution:
H0 (form 1): there is no relationship between age and car choice.
H0 (form 2): there is no difference between younger and older adults in choosing a car.
Expected frequencies (rows: age groups; columns: factors):
f_e   16   40   24
      24   60   36
$\chi^2 = 4.92$ and $\chi^2_{table} = 5.99$. Since $\chi^2 < \chi^2_{table}$, we fail to reject H0.
Conclusion: there is no relationship between age and car choice; equivalently, there is no difference between younger and older adults in choosing a car.
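The expected frequencies noted for this example (16, 40, 24 and 24, 60, 36) can be reproduced from the table margins alone. The sketch below assumes row totals of 80 and 120 and column totals of 40, 100, and 60, which are simply the row and column sums of that expected-frequency table; the function name `expected_from_margins` is hypothetical.

```python
def expected_from_margins(row_totals, col_totals):
    """Expected cell frequencies: (row total * column total) / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must add up to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Margins implied by the expected-frequency table (n = 200).
row_totals = [80, 120]      # the two age groups
col_totals = [40, 100, 60]  # the three factors

expected = expected_from_margins(row_totals, col_totals)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

for row in expected:
    print(row)
print(df)
```

With df = 2, the table value at α = 0.05 is 5.99, which matches the value used in the annotated solution.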
