MANOVA
Visualize dataset
library(ggplot2)
library(gridExtra)
# Boxplots with jittered points for each dependent variable, by plant variety
p1 <- ggplot(df, aes(x = plant_var, y = height, fill = plant_var)) +
  geom_boxplot(outlier.shape = NA) + geom_jitter(width = 0.2) +
  theme(legend.position = "top")
p2 <- ggplot(df, aes(x = plant_var, y = canopy_vol, fill = plant_var)) +
  geom_boxplot(outlier.shape = NA) + geom_jitter(width = 0.2) +
  theme(legend.position = "top")
# Show both plots side by side
grid.arrange(p1, p2, ncol = 2)
Perform one-way MANOVA
dep_vars <- cbind(df$height, df$canopy_vol)
fit <- manova(dep_vars ~ plant_var, data = df)
summary(fit)
# output
          Df Pillai approx F num Df den Df    Pr(>F)
plant_var  3 1.0365   12.909      6     72 7.575e-10 ***
Residuals 36
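The Pillai column in the output is Pillai's trace, the default test statistic of summary() for a manova fit. As a minimal sketch of where that number comes from, the code below recomputes it from the hypothesis and error sums-of-squares-and-cross-products (SSCP) matrices. The data frame here is simulated as a stand-in for the article's df (an assumption, since the real data are not shown), so the value will differ from the output above.

```r
# Simulated stand-in for the article's `df` (hypothetical values, 10 plants per variety)
set.seed(1)
df <- data.frame(
  plant_var  = factor(rep(c("A", "B", "C", "D"), each = 10)),
  height     = rnorm(40, mean = rep(c(19, 16.5, 3, 9.5), each = 10), sd = 2),
  canopy_vol = rnorm(40, mean = rep(c(0.78, 0.61, 0.27, 0.47), each = 10), sd = 0.1)
)

dep_vars <- cbind(df$height, df$canopy_vol)
fit <- manova(dep_vars ~ plant_var, data = df)
res <- summary(fit)              # Pillai's trace is the default test

H <- res$SS$plant_var            # hypothesis (between-group) SSCP matrix
E <- res$SS$Residuals            # error (within-group) SSCP matrix

# Pillai's trace: sum of the diagonal of H (H + E)^-1
pillai_manual <- sum(diag(H %*% solve(H + E)))

pillai_manual                    # manual value
res$stats["plant_var", "Pillai"] # value reported by summary()
```

With two dependent variables the statistic lies between 0 and 2, and larger values indicate stronger group separation.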
Post-hoc test
Group means:
  dep_vars1 dep_vars2
A     18.90     0.784
B     16.54     0.608
C      3.05     0.272
D      9.35     0.474
Proportion of trace:
   LD1    LD2
0.9855 0.0145
# plot
# Name the grouping column explicitly so aes(colour = plant_var) can find it
plot_lda <- data.frame(plant_var = df$plant_var, lda = predict(post_hoc)$x)
ggplot(plot_lda) +
  geom_point(aes(x = lda.LD1, y = lda.LD2, colour = plant_var), size = 4)
The LDA scatter plot discriminates between the plant
varieties based on the two dependent variables. Varieties C
and D differ clearly (are well separated) from A and B,
whereas A and B are more similar to each other. Overall,
LDA discriminates between the plant varieties.
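The post_hoc object plotted above is not defined in the text. The "Group means" and "Proportion of trace" output is what a linear discriminant analysis fit prints, so the following is a minimal sketch assuming it came from lda() in the MASS package (an assumption, as the package is not named here). The data are simulated as a stand-in for df, so the printed numbers will not match the output above.

```r
library(MASS)  # provides lda(); assumed, not named in the text

# Simulated stand-in for the article's `df` (hypothetical values, 10 plants per variety)
set.seed(1)
df <- data.frame(
  plant_var  = factor(rep(c("A", "B", "C", "D"), each = 10)),
  height     = rnorm(40, mean = rep(c(19, 16.5, 3, 9.5), each = 10), sd = 2),
  canopy_vol = rnorm(40, mean = rep(c(0.78, 0.61, 0.27, 0.47), each = 10), sd = 0.1)
)

dep_vars <- cbind(df$height, df$canopy_vol)

# Fit LDA with plant variety as the grouping factor
post_hoc <- lda(plant_var ~ dep_vars, data = df)
post_hoc                         # prints group means and proportion of trace

# Discriminant scores, one row per plant, columns LD1 and LD2
scores <- predict(post_hoc)$x
head(scores)

# Proportion of trace: share of between-group variance carried by each LD
prop_trace <- post_hoc$svd^2 / sum(post_hoc$svd^2)
prop_trace
```

A proportion of trace close to 1 for LD1, as in the output above, means that almost all of the group separation lies along the first discriminant axis.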
Test MANOVA assumptions
Multivariate outliers
Linearity assumption
The linearity assumption can be checked by plotting pairwise
scatterplots of the dependent variables within each group.
The data points should fall roughly along a straight line to
meet the linearity assumption. Violating the linearity
assumption reduces statistical power.
library(dplyr)
library(ggplot2)
library(gridExtra)
# One height vs. canopy_vol scatterplot per plant variety
p1 <- df %>% filter(plant_var == "A") %>%
  ggplot(aes(x = height, y = canopy_vol)) + geom_point() + ggtitle("Variety: A")
p2 <- df %>% filter(plant_var == "B") %>%
  ggplot(aes(x = height, y = canopy_vol)) + geom_point() + ggtitle("Variety: B")
p3 <- df %>% filter(plant_var == "C") %>%
  ggplot(aes(x = height, y = canopy_vol)) + geom_point() + ggtitle("Variety: C")
p4 <- df %>% filter(plant_var == "D") %>%
  ggplot(aes(x = height, y = canopy_vol)) + geom_point() + ggtitle("Variety: D")
grid.arrange(p1, p2, p3, p4, ncol = 2)
The scatterplots indicate that the dependent variables have
a linear relationship within each group of the independent
variable.
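The same per-variety check can be drawn with a single faceted plot instead of four separate ones. The sketch below is one possible alternative, not the method used above: it uses facet_wrap() and adds a fitted straight line per panel as a visual reference. The data are simulated as a stand-in for df (an assumption).

```r
library(ggplot2)

# Simulated stand-in for the article's `df` (hypothetical values, 10 plants per variety)
set.seed(1)
df <- data.frame(
  plant_var  = factor(rep(c("A", "B", "C", "D"), each = 10)),
  height     = rnorm(40, mean = rep(c(19, 16.5, 3, 9.5), each = 10), sd = 2),
  canopy_vol = rnorm(40, mean = rep(c(0.78, 0.61, 0.27, 0.47), each = 10), sd = 0.1)
)

# One panel per variety, with a least-squares line to judge linearity by eye
p <- ggplot(df, aes(x = height, y = canopy_vol)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ plant_var, ncol = 2, labeller = label_both)
p
```

Points that curve systematically away from the fitted line in a panel would suggest a non-linear relationship for that variety.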
Multicollinearity assumption
Multicollinearity can be checked by the correlation between
the dependent variables. If you have more than two
dependent variables, you can use a correlation matrix
or the variance inflation factor to assess multicollinearity.
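As a minimal sketch of the correlation check, the code below computes the Pearson correlation between the two dependent variables and the corresponding correlation matrix using base R. The data are simulated as a stand-in for df (an assumption), and the 0.9 cutoff is a commonly cited rule of thumb rather than a threshold given in this text.

```r
# Simulated stand-in for the article's `df` (hypothetical values, 10 plants per variety)
set.seed(1)
df <- data.frame(
  plant_var  = factor(rep(c("A", "B", "C", "D"), each = 10)),
  height     = rnorm(40, mean = rep(c(19, 16.5, 3, 9.5), each = 10), sd = 2),
  canopy_vol = rnorm(40, mean = rep(c(0.78, 0.61, 0.27, 0.47), each = 10), sd = 0.1)
)

# Pearson correlation between the two dependent variables
r <- cor(df$height, df$canopy_vol)
cor.test(df$height, df$canopy_vol)   # adds a confidence interval and p-value

# With more than two dependent variables, inspect the full correlation matrix
cor_mat <- cor(df[, c("height", "canopy_vol")])
cor_mat

# Rule-of-thumb flag (assumed cutoff): very high correlation suggests multicollinearity
abs(r) > 0.9
```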
References