Part 2 Project-Basic Inferential Stat
Load the ToothGrowth data and perform some basic exploratory data analyses
library(ggplot2)  # needed for the ggplot() calls below
data("ToothGrowth")
We first inspect the structure and dimensions of the data and then plot the observations.
str(ToothGrowth)
## 'data.frame':    60 obs. of  3 variables:
## $ len : num 4.2 11.5 7.3 5.8 6.4 10 11.2 11.2 5.2 7 ...
## $ supp: Factor w/ 2 levels "OJ","VC": 2 2 2 2 2 2 2 2 2 2 ...
## $ dose: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
ggplot(ToothGrowth, aes(x = supp, y = len)) + geom_boxplot() +
labs(title="Boxplot of Tooth Length by Supplement Type",x="supplement type", y="tooth length")
[Figure: Boxplot of Tooth Length by Supplement Type; x-axis: supplement type (OJ, VC), y-axis: tooth length]
The data contain 60 observations of 3 variables: len, the tooth length; supp, the supplement type; and dose, the dose in mg/day. In the boxplot, the median tooth length for vitamin C (VC) is lower than for orange juice (OJ), but the VC lengths are more variable.
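The reading of the boxplot can be checked numerically. The following is a minimal sketch, assuming the built-in ToothGrowth data; the names tg and spread_by_supp are introduced here only for illustration and are not part of the original analysis.

```r
# Illustrative check of the boxplot reading (tg and spread_by_supp are
# hypothetical names). A separate copy leaves ToothGrowth itself untouched.
tg <- datasets::ToothGrowth

# Median, interquartile range, and standard deviation of len per supplement
spread_by_supp <- rbind(
  median = tapply(tg$len, tg$supp, median),
  IQR    = tapply(tg$len, tg$supp, IQR),
  sd     = tapply(tg$len, tg$supp, sd)
)
print(spread_by_supp)
```

If the boxplot reading holds, the VC column should show the lower median together with the larger IQR and standard deviation.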
Note that dose is stored as a numeric variable. Since we also want to compare tooth length across dose levels, we convert dose to a factor.
ToothGrowth$dose<-as.factor(ToothGrowth$dose)
ggplot(ToothGrowth, aes(x = dose, y = len)) + geom_boxplot() +
labs(title="Boxplot of Tooth Length by Dose Type",x="dose type", y="tooth length")
[Figure: Boxplot of Tooth Length by Dose Type; x-axis: dose type, y-axis: tooth length]
The boxplot shows a difference between doses: as the dose increases, the tooth length also increases.
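The trend across doses can also be checked with group means. This is a small sketch under the assumption that the built-in ToothGrowth data are used; tg, mean_by_dose, and is_increasing are illustrative names only.

```r
# Illustrative check of the dose trend (hypothetical variable names).
# A separate copy is used so the factor conversion above is not affected.
tg <- datasets::ToothGrowth

# Mean tooth length at each dose level; tapply sorts the groups by dose
mean_by_dose <- tapply(tg$len, tg$dose, mean)
print(mean_by_dose)

# TRUE when the means rise strictly from the lowest to the highest dose
is_increasing <- all(diff(as.vector(mean_by_dose)) > 0)
print(is_increasing)
```

This only compares point estimates; whether the differences are statistically significant is a separate question for the t-tests.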
summary(ToothGrowth)
##       len        supp     dose
##  Min.   : 4.20   OJ:30   0.5:20
##  1st Qu.:13.07   VC:30   1  :20
##  Median :19.25           2  :20
##  Mean   :18.81
##  3rd Qu.:25.27
##  Max.   :33.90
Overall, tooth length has a mean of 18.81 and ranges from 4.20 to 33.90. The supp variable is a two-level factor, while dose is now a three-level factor. We use the summary function again to summarize the length per group (consistent with the boxplots above).
tapply(ToothGrowth$len, ToothGrowth$supp, summary)
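## $OJ
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    8.20   15.52   22.70   20.66   25.72   30.90
##
## $VC
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    4.20   11.20   16.50     ...     ...   33.90
tapply(ToothGrowth$len, ToothGrowth$dose, summary)
## $`0.5`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     ...     ...   9.850     ...     ...  21.500
##
## $`1`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     ...     ...   19.25     ...     ...   27.30
##
## $`2`
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     ...     ...   25.95     ...     ...   33.90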
Now let's conduct t-tests on the three dose pairs: (0.5, 1.0), (0.5, 2.0), and (1.0, 2.0).
## first pair (0.5,1.0)
t.test(subset(ToothGrowth, dose==0.5)$len,subset(ToothGrowth, dose==1.0)$len,
alternative="two.sided", var.equal=FALSE)
## second pair
t.test(subset(ToothGrowth, dose==0.5)$len,subset(ToothGrowth, dose==2.0)$len,
alternative="two.sided", var.equal=FALSE)
## third pair
t.test(subset(ToothGrowth, dose==1.0)$len,subset(ToothGrowth, dose==2.0)$len,
alternative="two.sided", var.equal=FALSE)
## 95 percent confidence interval:
##  -8.996481 -3.733519
## sample estimates:
## mean of x mean of y
##    19.735    26.100
For dose, we apply the Bonferroni correction because we perform more than one test. We reject the null hypothesis of no difference in mean tooth length between two doses if the p-value is less than alpha/m, where m = 3 is the number of tests. With the conventional level of significance, alpha = 0.05, the threshold is 0.05/3 = 0.0166667.
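The decision rule above can be sketched in code. This is an illustration under stated assumptions, not the original analysis: it recomputes the three Welch t-tests on a fresh copy of the built-in data, and the names tg, dose_pairs, p_values, threshold, and reject are hypothetical.

```r
# Sketch of the Bonferroni decision rule (all variable names are illustrative).
tg <- datasets::ToothGrowth   # fresh copy; dose is numeric here

# The three dose pairs compared in the text
dose_pairs <- list(c(0.5, 1), c(0.5, 2), c(1, 2))

# Welch two-sample t-test p-value for each pair
p_values <- sapply(dose_pairs, function(pair) {
  t.test(tg$len[tg$dose == pair[1]],
         tg$len[tg$dose == pair[2]],
         alternative = "two.sided", var.equal = FALSE)$p.value
})
names(p_values) <- sapply(dose_pairs, paste, collapse = " vs ")

alpha <- 0.05
m <- length(p_values)       # number of tests, here 3
threshold <- alpha / m      # Bonferroni-adjusted significance level

# Reject the null for a pair when its p-value falls below alpha / m
reject <- p_values < threshold
print(data.frame(p_value = p_values, reject = reject))

# Equivalent view: scale the p-values by m and compare against alpha itself
print(p.adjust(p_values, method = "bonferroni") < alpha)
```

Comparing raw p-values against alpha/m and comparing p.adjust() output against alpha lead to the same decisions; the second form is convenient when adjusted p-values are to be reported directly.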