Association between Numerical and Categorical Variables
An association (relationship) exists if the numerical response variable changes in relation to changes in
the categorical explanatory variable, and not just randomly.
If there is no association, the numerical response variable will not change (or will change only negligibly)
in relation to changes in the categorical explanatory variable, or it will change only randomly.
To compare the box plots, compare one appropriate summary statistic of the response variable across the
categories of the explanatory variable, such as the median, the range, or the IQR.
Remember that the range is affected by outliers, so it should only be used if none of the box plots contain
outliers. When there are outliers, the IQR is a better measure of spread.
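A minimal Python sketch of this comparison, assuming hypothetical data for two categories (`A` and `B`) and the median-of-halves method for quartiles (statistical software often uses other interpolation rules, so its quartiles can differ slightly). Outliers are flagged with the common 1.5 × IQR rule, which is assumed here to match how the box plots are drawn. The helper names `quartiles`, `summarise`, and `spread_measure` are illustrative only.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) by the median-of-halves method (an assumed convention).

    Needs at least two values. For an odd count, the overall median is
    left out of both halves.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]                  # values below the median
    upper = xs[half + len(xs) % 2:]    # values above the median
    return median(lower), median(upper)


def summarise(values):
    """Median, range, IQR and 1.5 x IQR outliers for one category."""
    xs = sorted(values)
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": median(xs),
        "range": xs[-1] - xs[0],
        "iqr": iqr,
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }


def spread_measure(summaries):
    """Rule of thumb from the text: use the range only if no box plot has outliers."""
    if any(s["outliers"] for s in summaries.values()):
        return "iqr"
    return "range"


# Hypothetical response values for two categories of the explanatory variable.
groups = {
    "A": [10, 12, 14, 15, 18, 20, 25],
    "B": [11, 13, 14, 15, 16, 17, 40],
}
summaries = {name: summarise(vals) for name, vals in groups.items()}
measure = spread_measure(summaries)
for name, s in summaries.items():
    print(name, "median:", s["median"], measure + ":", s[measure])
```

Both groups have median 15, but group B's single outlier (40) inflates its range to 29 while its IQR stays at 4, so the sketch compares IQRs instead of ranges.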
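Percentages can also be compared across box plots, because the quartiles and median split each group into
four parts containing about 25% of the data each. For example, if the median of one box plot lies above the
upper quartile of another, then about 50% of the first group is above a value that only about 25% of the
second group exceeds.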
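A small Python sketch of reading percentages against another box plot's quartile, using hypothetical data for two groups (`low` and `high`). The threshold 20 is the upper quartile of `low` under the median-of-halves method, standing in for a value read off that group's box plot; `percent_above` is an illustrative helper, not a library function.

```python
def percent_above(values, threshold):
    """Percentage of values strictly greater than the threshold."""
    return 100 * sum(x > threshold for x in values) / len(values)


# Hypothetical data. The median of `high` (26) lies above the
# upper quartile of `low` (20).
low = [10, 12, 14, 15, 18, 20, 25]
high = [18, 21, 24, 26, 28, 30, 35]
q3_low = 20

print(f"low above its own Q3: {percent_above(low, q3_low):.1f}%")
print(f"high above low's Q3:  {percent_above(high, q3_low):.1f}%")
```

With only seven values per group the computed percentages (14.3% and 85.7%) are rough, which is why statements read from box plots are usually phrased as "about 25%" or "about 50%".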
The three parallel box plots suggest that arm span and year level are associated. This can be seen in
either of the following ways:
The IQR of arm span decreases as year level increases: approximately 22, 12, and 10 for years 6, 8, and 10
respectively.
OR
The range of arm span decreases as year level increases: approximately 55.5, 32, and 27 (excluding the
outlier) for years 6, 8, and 10 respectively.
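The reasoning in this example amounts to checking that one statistic changes consistently across the ordered year levels. A short Python sketch using the approximate values quoted above; `strictly_decreasing` is a hypothetical helper, and checking for a steady trend only makes sense because year level has a natural order.

```python
def strictly_decreasing(stat_by_level):
    """True if the statistic falls at every step up the ordered levels."""
    ordered = [stat_by_level[level] for level in sorted(stat_by_level)]
    return all(a > b for a, b in zip(ordered, ordered[1:]))


# Approximate values quoted in the example (the range excludes the outlier).
iqr_by_year = {6: 22, 8: 12, 10: 10}
range_by_year = {6: 55.5, 8: 32, 10: 27}

print("IQR decreases with year level:", strictly_decreasing(iqr_by_year))
print("Range decreases with year level:", strictly_decreasing(range_by_year))
```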