Association between Numerical and Categorical Variables

The document discusses methods for comparing numerical distributions across categorical variables, emphasizing the importance of identifying associations between them. It describes the use of back-to-back stem plots and parallel box plots for visual comparison, highlighting key statistics such as mean, median, range, and interquartile range (IQR). Examples from VCAA exams illustrate how these methods can reveal relationships between variables, such as smoking rates by gender and arm span by year level.

Uploaded by

zengyangru
Association between Numerical and Categorical Variables

Reporting on Categorical-Numerical Distributions


To compare numerical distributions, compare one appropriate summary statistic for the response variable
(a measure of centre or a measure of spread) across the categories of the explanatory variable. Use statistics
you have already calculated rather than finding new ones. Where the centre is compared, the spread may also be
compared and contrasted.

An association (relationship) exists if the numerical response variable changes systematically, and not just
randomly, as the categorical explanatory variable changes.
If there is no association, the numerical response variable will show only negligible or random changes as
the categorical explanatory variable changes.
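
As a rough illustration of comparing one summary statistic across categories, the sketch below groups a numerical response by a categorical explanatory variable and reports the median for each category. The function name `median_by_category` and the data are hypothetical, invented only for this example.

```python
from collections import defaultdict
from statistics import median

def median_by_category(records):
    """Group (category, value) pairs and return {category: median value}."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {category: median(values) for category, values in groups.items()}

# Hypothetical data, invented for illustration: (year level, height in cm).
records = [
    ("Year 6", 140), ("Year 6", 145), ("Year 6", 150),
    ("Year 8", 155), ("Year 8", 160), ("Year 8", 162),
]

print(median_by_category(records))
```

If the medians differ noticeably between categories, that suggests an association. If they are almost equal, it suggests there is none.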

Back-to-Back Stem Plot


Stem plots can be placed back-to-back to compare two categories. They are useful for comparing small to
medium-sized data sets. The categorical variable is the explanatory variable and the numerical variable is the
response variable.

Example VCAA 2009 Exam 1 Question 3


The back-to-back ordered stem plot
below shows the female and male
smoking rates, expressed as a
percentage, in 18 countries.
For these 18 countries, the smoking rates for females are generally lower (mean of 21.5%) and less variable
(range of 13%) than the smoking rates for males (mean of 28.1% and range of 30%).
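
A back-to-back stem plot can also be built in code. The following is a minimal sketch that assumes non-negative whole-number data, with the tens digit as the stem and the units digit as the leaf. The function name `back_to_back_stem_plot` and the data are hypothetical; the values are not the VCAA smoking rates.

```python
def back_to_back_stem_plot(left, right, left_label="Left", right_label="Right"):
    """Return the lines of a back-to-back stem plot (stem = tens, leaf = units)."""
    all_values = left + right
    rows = []
    for stem in range(min(all_values) // 10, max(all_values) // 10 + 1):
        # Left-hand leaves are read outwards from the stem, so sort them descending.
        left_leaves = sorted((v % 10 for v in left if v // 10 == stem), reverse=True)
        right_leaves = sorted(v % 10 for v in right if v // 10 == stem)
        rows.append((" ".join(map(str, left_leaves)), stem,
                     " ".join(map(str, right_leaves))))
    # Pad the left column so every stem lines up vertically.
    width = max([len(left_label)] + [len(leaves) for leaves, _, _ in rows])
    lines = [f"{left_label:>{width}} |   | {right_label}"]
    for left_text, stem, right_text in rows:
        lines.append(f"{left_text:>{width}} | {stem} | {right_text}".rstrip())
    return lines

# Hypothetical percentages, invented for illustration only.
female = [12, 15, 18, 21, 24, 25]
male = [17, 22, 28, 31, 36, 45]

for line in back_to_back_stem_plot(female, male, "Female", "Male"):
    print(line)
```

The shared stems sit in the middle column, so the two categories can be compared row by row for centre and spread.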

Parallel Box Plots


Box plots can be drawn one above the other on a common scale so that they can be compared.
They are useful for comparing medium to large data sets.
The categorical variable is the explanatory variable and the numerical variable is the response variable.

To compare the box plots, one appropriate summary statistic for the response variable has to be compared
across the explanatory variable categories. This could be the median, the range, or the IQR.
Remember that the range is affected by outliers, so it should only be used if the box plots contain no
outliers. When outliers are present, the IQR is a better measure of spread.
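
The effect of an outlier on the range and the IQR can be checked numerically. The sketch below makes two assumptions: quartiles are found as the medians of the lower and upper halves of the ordered data (other conventions give slightly different values), and outliers are values beyond the usual 1.5 × IQR fences. The function names and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # For an odd number of values, the middle value is left out of both halves.
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(data), median(upper)

def spread_summary(values):
    """Return the range, the IQR and any values beyond the 1.5 x IQR fences."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"range": max(values) - min(values), "iqr": iqr, "outliers": outliers}

# Hypothetical data: the second list is the first with one extreme value added.
without_outlier = [10, 12, 13, 15, 16, 18, 20]
with_outlier = without_outlier + [45]

print(spread_summary(without_outlier))
print(spread_summary(with_outlier))
```

With these values, adding a single extreme value increases the range from 10 to 35, while the IQR only moves from 6 to 6.5.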

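Each section of a box plot (each whisker and each half of the box) contains approximately 25% of the data, so
the percentage of one category lying above or below the median or a quartile of another category can also be
compared. For example, if the whole box for one category lies above the median of another, then about 75% or
more of the first category's values exceed that median.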
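
When the raw data are available, this kind of percentage comparison can be calculated directly and not just estimated from the plot. The sketch below uses a hypothetical helper, `percent_above`, and invented arm span values.

```python
from statistics import median

def percent_above(values, threshold):
    """Percentage of values strictly greater than the threshold."""
    return 100 * sum(v > threshold for v in values) / len(values)

# Hypothetical arm spans (cm), invented for illustration only.
group_a = [138, 142, 145, 150, 156]
group_b = [144, 152, 158, 160, 163, 166, 170, 175]

a_median = median(group_a)
print(f"{percent_above(group_b, a_median)}% of group B is above the group A median of {a_median}")
```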

Example VCAA 2008 Exam 2 Question 3


The arm spans (in cm) were also recorded for each of
the Years 6, 8 and 10 girls in the larger survey. The
results are summarised in the three parallel box plots
displayed below.

The three parallel box plots suggest that arm span and year level are associated. Any one of the following
comparisons can be used to support this:

The median arm span increases with year level: 144 cm, 160 cm, and 166 cm for Years 6, 8, and 10 respectively.

The IQR of arm span decreases with year level: approximately 22 cm, 12 cm, and 10 cm for Years 6, 8, and 10
respectively.

The range of arm span decreases with year level: approximately 55.5 cm, 32 cm, and 27 cm (excluding the
outlier) for Years 6, 8, and 10 respectively.
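
The direction of each trend can be confirmed from the values quoted above. This is a small sketch only; the helper names are hypothetical, and the numbers are the ones read from the box plots in the example.

```python
def strictly_increasing(values):
    """True if each value is larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

def strictly_decreasing(values):
    """True if each value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))

# Values quoted in the example (cm), in the order Year 6, Year 8, Year 10.
medians = [144, 160, 166]
iqrs = [22, 12, 10]
ranges = [55.5, 32, 27]  # excluding the outlier

print("Median increases with year level:", strictly_increasing(medians))
print("IQR decreases with year level:", strictly_decreasing(iqrs))
print("Range decreases with year level:", strictly_decreasing(ranges))
```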
