Alternate_Simulated_Practice_Exam_ETC1010
Analysis
Q1. [3 marks]
Answer:
Tidy data refers to a standard format where each variable forms a column, each observation forms a row, and
each type of observational unit forms a table. It ensures consistency and makes data easier to manipulate
and visualize in R.
Q2. [3 marks]
You have a data frame where dates are in column names (e.g., Jan_2023, Feb_2023). Is this tidy? Explain.
Answer:
No, it is not tidy. Dates should be in one column and values in another, not spread across column names.
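The reshaping described above can be sketched as follows. This is a minimal illustration that assumes the tidyr package is installed; the data frame `sales_wide`, the `store` column and all values are made up for the example.

```r
library(tidyr)

# Hypothetical wide data: one column per month (not tidy)
sales_wide <- data.frame(
  store    = c("A", "B"),
  Jan_2023 = c(100, 80),
  Feb_2023 = c(120, 90)
)

# Move the month names into one column and the values into another
sales_long <- pivot_longer(
  sales_wide,
  cols      = -store,
  names_to  = "month",
  values_to = "sales"
)

sales_long
```

Each row of `sales_long` is now one store-month observation, which is the tidy layout.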
Q3. [1 mark]
Answer:
True. Sharing code and data allows others to verify and replicate the analysis.
Q4. [5 marks]
Answer:
Q5. [3 marks]
What type of graph is most suitable to compare the distribution of test scores across three different schools?
Answer:
A boxplot is suitable as it shows medians, IQR, and potential outliers across multiple categories.
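A side-by-side boxplot of this kind might be drawn as below. The sketch uses base R graphics and simulated scores; the school names, means and spreads are assumptions chosen only for illustration.

```r
# Simulated scores for three hypothetical schools (illustrative data only)
set.seed(1)
scores <- data.frame(
  school = rep(c("School A", "School B", "School C"), each = 30),
  score  = c(rnorm(30, mean = 65, sd = 10),
             rnorm(30, mean = 70, sd = 8),
             rnorm(30, mean = 60, sd = 12))
)

# One box per school: median line, IQR box, points beyond the whiskers as outliers
bp <- boxplot(score ~ school, data = scores,
              xlab = "School", ylab = "Test score",
              main = "Test scores by school")
```

The returned object `bp` holds the five-number summary for each school in `bp$stats`.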
Q6. [3 marks]
Interpret the following: A line graph shows a steep increase in sales in December compared to November.
Answer:
It suggests a sharp rise in sales, possibly due to seasonal factors like holidays or promotions.
Q7. [6 marks]
What are two key visual elements that enhance graph readability? Provide examples.
Answer:
1. Labels and titles - clearly indicate what each axis and plot represents.
2. Appropriate use of color - e.g., different colors for product lines in a sales chart.
Q8. [3 marks]
Answer:
Pie charts become cluttered and hard to interpret with many slices. Bar charts offer better visual comparison.
PART C: Clustering
Q9. [1 mark]
Answer:
Answer:
Q11. [3 marks]
Answer:
A dendrogram is a tree-like diagram showing hierarchical relationships. The height at which branches merge
indicates the dissimilarity between the clusters being joined.
Explain the difference between complete and single linkage in hierarchical clustering.
Answer:
Complete linkage considers the maximum distance between observations in different clusters, leading to
compact clusters. Single linkage uses the minimum distance, which can lead to chaining effects.
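The two linkage rules can be compared on the same distance matrix. This is a rough sketch using base R's `hclust()`; the data are randomly generated and the choice of three clusters is arbitrary.

```r
# Small made-up data set: 20 observations on two numeric variables
set.seed(42)
x <- matrix(rnorm(40), ncol = 2)
d <- dist(x)  # Euclidean distances between observations

# Same distances, different linkage rules
hc_complete <- hclust(d, method = "complete")  # maximum distance between clusters
hc_single   <- hclust(d, method = "single")    # minimum distance between clusters

# Cut each tree into three clusters and cross-tabulate the assignments
groups_complete <- cutree(hc_complete, k = 3)
groups_single   <- cutree(hc_single, k = 3)
table(complete = groups_complete, single = groups_single)
```

Calling `plot(hc_complete)` and `plot(hc_single)` draws the two dendrograms, where chaining under single linkage is usually easy to see.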
Q13. [6 marks]
Answer:
Q14. [3 marks]
True or False: In clustering, the order of data points affects the final output.
Answer:
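True for k-means without a fixed seed, because the randomly chosen starting centroids can change the result.
Not for hierarchical clustering, which is deterministic for a given distance matrix and linkage method.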
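The effect of fixing the seed can be checked directly. The sketch below uses base R's `kmeans()` and `hclust()` on made-up data; the seed values and the number of clusters are arbitrary assumptions.

```r
# Made-up data: 50 observations on two numeric variables
set.seed(123)
x <- matrix(rnorm(100), ncol = 2)

# Setting the same seed before each call fixes the random starting centroids
set.seed(2024)
km1 <- kmeans(x, centers = 3)
set.seed(2024)
km2 <- kmeans(x, centers = 3)
identical(km1$cluster, km2$cluster)

# Hierarchical clustering has no random start, so repeated runs agree
hc1 <- cutree(hclust(dist(x), method = "complete"), k = 3)
hc2 <- cutree(hclust(dist(x), method = "complete"), k = 3)
identical(hc1, hc2)
```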
PART D: Regression
Answer:
True, but it may not improve model performance (adjusted R² is better for comparing models).
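The question text is missing here, so the following assumes the answer refers to R-squared rising whenever a predictor is added. It is a small sketch on the built-in `mtcars` data, where `noise` is a made-up predictor unrelated to the response.

```r
# Built-in mtcars data; `noise` is a made-up predictor unrelated to mpg
set.seed(1)
cars <- mtcars
cars$noise <- rnorm(nrow(cars))

fit_small <- lm(mpg ~ wt, data = cars)
fit_large <- lm(mpg ~ wt + noise, data = cars)

# R-squared cannot decrease when a predictor is added
r2 <- c(small = summary(fit_small)$r.squared,
        large = summary(fit_large)$r.squared)

# Adjusted R-squared penalises the extra parameter, so it may fall
adj_r2 <- c(small = summary(fit_small)$adj.r.squared,
            large = summary(fit_large)$adj.r.squared)

r2
adj_r2
```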
Q16. [1 mark]
Answer:
fitted()
Q17. [2 marks]
What does a negative coefficient for 'price' imply in a sales prediction model?
Answer:
Q18. [5 marks]
What are two assumptions of linear regression and how can you check them?
Answer:
1. Linearity - check with a scatterplot of residuals against fitted values; the points should show no systematic pattern.
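A residuals-versus-fitted plot of this kind could be produced as below. This is a minimal sketch with base R graphics; the model `mpg ~ wt` on the built-in `mtcars` data is an arbitrary choice for illustration.

```r
# Built-in mtcars data, used purely for illustration
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals against fitted values: a random scatter around zero supports linearity,
# while a clear curve suggests the relationship is not linear
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)  # reference line at zero
```

`plot(fit, which = 1)` gives R's built-in version of the same diagnostic.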
Q19. [4 marks]
Answer:
Q20. [2 marks]
Answer:
```r
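# The function header was lost from the source; the name `multiply` is hypothetical
multiply <- function(x, y) return(x * y)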
```
Q21. [3 marks]
Answer:
To compare coefficients directly and ensure variables on different scales don't bias the model.
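The question text is missing, so this assumes the answer is about standardising predictors before fitting. A rough sketch with base R's `scale()` on the built-in `mtcars` data; the choice of `wt` and `hp` as predictors is illustrative only.

```r
# Built-in mtcars data; wt and hp are measured on very different scales
vars <- mtcars[, c("wt", "hp")]

# scale() centres each column to mean 0 and rescales it to standard deviation 1
vars_std <- as.data.frame(scale(vars))
cars_std <- cbind(mpg = mtcars$mpg, vars_std)

# Each coefficient is now the change in mpg per one standard deviation of the predictor
fit_std <- lm(mpg ~ wt + hp, data = cars_std)
coef(fit_std)
```

Because both predictors share the same units after scaling, the sizes of their coefficients can be compared directly.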