Using ANOVA in Project Report
Introduction
The sports industry in the United States is a booming business that generated over $213 billion last year and is projected to continue this
trend [1]. To succeed in this business, companies need a solid understanding of their target market. Unlike most sports market analysis
reports, which are based on private or long-term field surveys, we analyzed the sports market using a nationwide public survey, the "American
Time Use Survey", from the U.S. Bureau of Labor Statistics [2]. Using such a public survey lets us draw on a nationwide sample of the United
States population while saving the significant cost and time of field analysis. Our results show that even from a public survey we were able to
gain many useful insights and test interesting hypotheses about the sports market.
Data Description
The American Time Use Survey (ATUS) is an annual survey that "measures how people divide their time among life's activities". The ATUS
sample is a subset of the Current Population Survey (CPS) population [3]. Subjects are interviewed once about "how they spent their time on the
previous day, where they were, and whom they were with". The survey produces six data tables containing different information, of
which we used the activity summary data file for 2007 (atussum_2007.dat). It contains respondents' demographic information as
well as their time use on various activities. Some example data fields, with explanations:
Demographic (examples)      Time spent on different activities (we focus on sport activities)
TEAGE: age                  t130103: time spent on playing basketball
TESEX: gender               t130104: time spent on biking
…                           …
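To make the field layout concrete, here is a minimal Python sketch of reading such records and deriving a "did any sports" indicator. It assumes the file is comma-delimited with a header row, and it uses a made-up three-row excerpt with only the four fields named above; the function name `load_respondents` and the derived keys are hypothetical, not part of ATUS.

```python
import csv
import io

# Hypothetical excerpt: the real summary file has many more columns and rows.
SAMPLE = """TEAGE,TESEX,t130103,t130104
34,1,60,0
51,2,0,0
27,2,0,45
"""

# Only the two sport columns named in the text; a real analysis would list more.
SPORT_COLUMNS = ["t130103", "t130104"]


def load_respondents(handle):
    """Parse rows and add the total sports time plus a yes/no indicator."""
    rows = []
    for record in csv.DictReader(handle):
        sports_time = sum(int(record[col]) for col in SPORT_COLUMNS)
        rows.append({
            "age": int(record["TEAGE"]),
            "sex": int(record["TESEX"]),  # raw survey code, left uninterpreted
            "sports_time": sports_time,
            "did_sports": sports_time > 0,
        })
    return rows


respondents = load_respondents(io.StringIO(SAMPLE))
for person in respondents:
    print(person)
```

With the real data, `io.StringIO(SAMPLE)` would be replaced by an open file handle for atussum_2007.dat, provided the delimiter assumption holds.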
Hypotheses:
To get a better understanding of people's sporting behavior, we will test the following hypotheses, phrased as questions:
1. Do factors such as gender, age, employment status and income level affect the time people spend on sports, exercise and
recreation?
2. How do people make time for sports, exercise and recreation?
3. Are certain kinds of sports more popular among people with higher income?
4. As people grow older, do they keep exercising, and does this differ from sport to sport?
1.1. χ2 test of whether the employed population is more likely to take part in sports regularly (Figure 3)
Results: p-value = 0.2802
We want to test whether a person's decision to exercise regularly depends on employment status, so we applied a χ2 test of independence. The
two categorical variables are whether the person is employed and whether the person exercises regularly.
The bar plot (Figure 3) and the χ2 test give no evidence that the two variables are dependent. With p = 0.2802 we cannot reject the null
hypothesis that employment status has no effect on whether one spends time on sports. This suggests that people who are not employed
take part in sports at about the same rate as those who are.
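A minimal sketch of a χ2 test of independence on a 2x2 table, using only the Python standard library. The counts are hypothetical, not taken from the ATUS data, and the sketch uses the plain Pearson statistic without a continuity correction, so its p-values may differ slightly from software that applies one by default.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, where P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts.
# Rows: employed, not employed. Columns: exercises regularly, does not.
counts = [[30, 70], [20, 80]]
statistic, p_value = chi_square_2x2(counts)
print(f"chi-square = {statistic:.4f}, p-value = {p_value:.4f}")
```

A p-value above the chosen significance level (0.05 is common) means the null hypothesis of independence cannot be rejected, which is the situation reported for employment status.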
1.2. χ2 test of whether males are more likely to take part in sports regularly (Figure 4)
Results: p-value < 2.2e-16
As in the first test, a χ2 test applies here for the same reason: we are testing whether two categorical variables are independent.
The p-value is small enough to reject the null hypothesis, so the two variables are not independent of each other: gender is strongly
associated with whether one spends time on sports. As shown in Figure 4, the proportion of people who do no sports at all is much larger
in the female group than in the male group.
1.3. Pairwise t test of time use on sports for different income levels (Figure 5)
Pairwise p-values (rows and columns are income levels):
     0      1      2      3
1  0.236  -      -      -
2  0.989  0.989  -      -
3  0.679  0.037  0.310  -
4  0.740  0.047  0.366  0.989
(Note: income levels are 0: 0-500, 1: 500-1000, 2: 1000-1500, 3: 1500-2000, 4: >2000)
We tested whether the average time spent on sports is the same across income levels.
Most p-values are not statistically significant at the 0.05 level. The exceptions are the comparisons of level 1 with levels 3 and 4 (p = 0.037
and p = 0.047), which fall only slightly below that threshold. For the remaining pairs we cannot reject the null hypothesis that people of
different income levels spend the same amount of time on sports activities. Together with Figure 5, this suggests that income level has
little effect on time spent on sports.
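A minimal sketch of pairwise comparisons of group means, using only the Python standard library. It makes several assumptions: the data are synthetic values from `random.gauss`, not ATUS records; the p-value uses a normal approximation to the Welch t statistic, which is reasonable only for large groups; and no multiple-comparison adjustment is applied. The function names are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean, variance


def welch_p_value(a, b):
    """Two-sided p-value for a difference in means.

    Uses the Welch statistic with a normal approximation (large samples only).
    """
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = (mean(a) - mean(b)) / standard_error
    return math.erfc(abs(t) / math.sqrt(2))


def pairwise_p_values(groups):
    """Return {(level_i, level_j): p-value} for every pair of groups."""
    return {
        (i, j): welch_p_value(groups[i], groups[j])
        for i, j in combinations(sorted(groups), 2)
    }


# Synthetic data: five income levels, 500 made-up time values each.
random.seed(0)
groups = {level: [random.gauss(30, 20) for _ in range(500)] for level in range(5)}

pairwise = pairwise_p_values(groups)
for (i, j), p in pairwise.items():
    print(f"level {i} vs level {j}: p = {p:.3f}")
```

With five groups there are ten comparisons, so some adjustment for multiple testing is usually advisable before reading individual p-values against 0.05.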
2. One-way ANOVA and t-test of time spent on activities by people who exercise regularly and those who do not (Figure 6)
3. Two-way ANOVA test of how income and sport events affect sports' popularity (Figure 9)
To examine the individual and interaction effects, we used a two-way ANOVA test. Here the independent variables are income and sport, and
the dependent variable is the number of people who take part in each sport.
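A minimal sketch of the sums of squares behind a two-way ANOVA with interaction, using only the Python standard library. It assumes a balanced design with the same number of replicated observations (at least two) in every income-by-sport cell; the tiny data set and the function name `two_way_anova` are hypothetical. It returns F statistics only, since turning them into p-values requires the F distribution, which the standard library does not provide.

```python
from statistics import mean


def two_way_anova(cells):
    """F statistics for a balanced two-way layout with replication.

    cells maps (level_a, level_b) -> list of observations.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(levels_a) * len(levels_b):
        raise ValueError("every combination of levels needs a cell")
    if n < 2 or any(len(obs) != n for obs in cells.values()):
        raise ValueError("need a balanced design with >= 2 replicates per cell")

    grand = mean([y for obs in cells.values() for y in obs])
    mean_a = {a: mean([y for b in levels_b for y in cells[(a, b)]]) for a in levels_a}
    mean_b = {b: mean([y for a in levels_a for y in cells[(a, b)]]) for b in levels_b}
    mean_cell = {key: mean(obs) for key, obs in cells.items()}

    # Partition the variation into main effects, interaction, and error.
    ss_a = len(levels_b) * n * sum((m - grand) ** 2 for m in mean_a.values())
    ss_b = len(levels_a) * n * sum((m - grand) ** 2 for m in mean_b.values())
    ss_cells = n * sum((m - grand) ** 2 for m in mean_cell.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_error = sum((y - mean_cell[key]) ** 2 for key, obs in cells.items() for y in obs)

    df_a = len(levels_a) - 1
    df_b = len(levels_b) - 1
    df_ab = df_a * df_b
    df_error = len(levels_a) * len(levels_b) * (n - 1)
    ms_error = ss_error / df_error

    return {
        "factor_a": (ss_a / df_a) / ms_error,
        "factor_b": (ss_b / df_b) / ms_error,
        "interaction": (ss_ab / df_ab) / ms_error,
    }


# Hypothetical data: keys are (income level, sport), values are replicated counts.
data = {
    (0, "basketball"): [1, 3],
    (0, "biking"): [5, 7],
    (1, "basketball"): [2, 4],
    (1, "biking"): [10, 12],
}
f_stats = two_way_anova(data)
print(f_stats)
```

A large interaction F statistic relative to its reference distribution would indicate that the popularity gap between sports changes with income level, which is the question this section asks.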
4. Two-way ANOVA test of how age and sport events affect sports' popularity (Figure 8)
Figure 3 Employment status vs. whether time is spent on sports (bar plot)
Figure 4 Gender vs. whether time is spent on sports (bar plot)
Figure 5 Total time use on sports activities vs. income level
Figure 6 Time use of people who exercise daily vs. those who do not
Figure 7 Plot of time use on sports vs. age with confidence intervals