Alternate_Simulated_Practice_Exam_ETC1010

The document is an alternate simulated practice exam for an introductory data analysis course, covering exploratory data analysis (EDA) concepts, data visualization, clustering, and regression. It includes questions on tidy data, reproducibility, graph types, clustering methods, and regression assumptions. Each section contains multiple-choice, true/false, and open-ended questions designed to assess understanding of key data analysis principles.

Uploaded by

Tani C
Copyright
© All Rights Reserved

Alternate Simulated Practice Exam - ETC1010/5510: Introduction to Data Analysis

PART A: EDA Concepts

Q1. [3 marks]

Explain the term 'tidy data'. Why is it important in data analysis?

Answer:

Tidy data refers to a standard format where each variable forms a column, each observation forms a row, and each type of observational unit forms a table. Because every dataset then has the same layout, the same tools work on all of them, which makes data easier to manipulate, model, and visualize in R.

Q2. [3 marks]

You have a data frame where dates are in column names (e.g., Jan_2023, Feb_2023). Is this tidy? Explain.

Answer:

No, it is not tidy. Column names such as Jan_2023 and Feb_2023 are values of a date variable, not variable names. The dates should be in one column and the measured values in another, so that each variable has its own column.
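
As a rough illustration of the reshape described above, the sketch below stacks the month columns of a small wide table into a tidy one. The data frame `wide`, its `store` column, and the `sales` values are made up for the example.

```r
# Hypothetical wide table: one column per month (not tidy)
wide <- data.frame(
  store    = c("A", "B"),
  Jan_2023 = c(100, 80),
  Feb_2023 = c(120, 90)
)

month_cols <- c("Jan_2023", "Feb_2023")

# Stack the month columns so there is one row per store-month combination
long <- data.frame(
  store = rep(wide$store, times = length(month_cols)),
  month = rep(month_cols, each = nrow(wide)),
  sales = unlist(wide[month_cols], use.names = FALSE)
)

print(long)
```

This version uses base R only so it runs without extra packages. In practice, `tidyr::pivot_longer()` is the usual tool for this kind of reshape.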

Q3. [1 mark]

True or False: Reproducibility includes sharing your code and data.

Answer:

True. Sharing code and data allows others to verify and replicate the analysis.

Q4. [5 marks]

Describe three ways to make your R analysis more reproducible.

Answer:

1. Use R Markdown to combine code, output, and text.

2. Set a seed for random operations with `set.seed()`.

3. Comment your code and use version control (e.g., Git).
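
A minimal sketch of point 2: resetting the seed before a random operation reproduces the same draw. The seed value 2024 is arbitrary.

```r
# Same seed, same "random" numbers
set.seed(2024)
first_draw <- sample(1:100, 5)

set.seed(2024)
second_draw <- sample(1:100, 5)

# TRUE: the two draws match because the seed was reset before each one
print(identical(first_draw, second_draw))
```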

PART B: Data Visualisation

Q5. [3 marks]

What type of graph is most suitable to compare the distribution of test scores across three different schools?

Answer:

A boxplot is suitable as it shows medians, IQR, and potential outliers across multiple categories.
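
One way such a comparison might be drawn in base R is sketched below. The scores are simulated and the school names are placeholders, not real data.

```r
# Simulated test scores for three hypothetical schools
set.seed(1)
scores <- data.frame(
  school = rep(c("School A", "School B", "School C"), each = 30),
  score  = c(rnorm(30, mean = 70, sd = 8),
             rnorm(30, mean = 65, sd = 12),
             rnorm(30, mean = 75, sd = 5))
)

# Side-by-side boxplots: one box per school on a shared score axis
bp <- boxplot(score ~ school, data = scores,
              xlab = "School", ylab = "Test score",
              main = "Test scores by school (simulated)")
```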

Q6. [3 marks]

Interpret the following: A line graph shows a steep increase in sales in December compared to November.

Answer:

It suggests a sharp rise in sales, possibly due to seasonal factors like holidays or promotions.

Q7. [6 marks]

What are two key visual elements that enhance graph readability? Provide examples.

Answer:

1. Labels and titles - clearly indicate what each axis and plot represents.

2. Appropriate use of color - e.g., different colors for product lines in a sales chart.

Q8. [3 marks]

Why should pie charts be avoided for comparing many categories?

Answer:

Pie charts become cluttered and hard to interpret with many slices. Bar charts offer better visual comparison.

PART C: Clustering

Q9. [1 mark]

True or False: k-means clustering is deterministic.

Answer:

False. Standard k-means starts from randomly chosen initial centroids, so the result can differ between runs unless the random seed is fixed.
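
The effect of the seed can be sketched with `kmeans()` on the built-in `iris` measurements. The choice of dataset, of 3 clusters, and of the seed value is only for illustration.

```r
# Standardize the four numeric columns of the built-in iris data
x <- scale(iris[, 1:4])

# Fixing the seed before each call gives the same starting centroids
set.seed(42)
fit_a <- kmeans(x, centers = 3)

set.seed(42)
fit_b <- kmeans(x, centers = 3)

# TRUE: identical seeds lead to identical cluster assignments
print(identical(fit_a$cluster, fit_b$cluster))

# A further call without resetting the seed starts from different centroids,
# so the labels (and sometimes the clusters themselves) may differ
fit_c <- kmeans(x, centers = 3)
print(table(fit_a$cluster, fit_c$cluster))
```

Setting `nstart` to a value above 1 runs several random starts and keeps the best one, which reduces (but does not remove) the dependence on the seed.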

Q10. [1 mark - MCQ]

Which statement is TRUE?

a) Clusters must be spherical

b) Clustering is always accurate

c) Distance metric affects clustering

d) Number of clusters is not important

Answer:

c) Distance metric affects clustering

Q11. [3 marks]

What is a dendrogram? How can it help in choosing the number of clusters?

Answer:

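A dendrogram is a tree-like diagram showing the hierarchical relationships between observations as they are merged into clusters. The height at which branches merge reflects how dissimilar the merged clusters are, so cutting the tree where there is a large vertical gap between merges helps determine an appropriate number of clusters.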
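
A short sketch of drawing a dendrogram and cutting it, using the built-in `USArrests` data. The dataset and the choice of 4 clusters are assumptions made for the example, not part of the exam.

```r
# Standardize the variables so no single one dominates the distances
x <- scale(USArrests)

# Hierarchical clustering on Euclidean distances
hc <- hclust(dist(x), method = "complete")

# Draw the dendrogram; look for a large vertical gap between merges
plot(hc, cex = 0.6, main = "Dendrogram of USArrests (complete linkage)")

# Cut the tree into 4 clusters and count the members of each
clusters <- cutree(hc, k = 4)
print(table(clusters))
```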


Q12. [4 marks]

Explain the difference between complete and single linkage in hierarchical clustering.

Answer:

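Complete linkage defines the distance between two clusters as the maximum distance between observations in different clusters, which tends to produce compact clusters. Single linkage uses the minimum distance, which can lead to chaining, where a cluster grows by absorbing one nearby observation at a time.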
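
The difference shows up in the merge heights. The toy example below uses five made-up points on a line, chosen so that no two distances tie.

```r
# Five hypothetical points on a line, with gaps that double each time
points <- c(0, 1, 3, 7, 15)
d <- dist(points)

single   <- hclust(d, method = "single")
complete <- hclust(d, method = "complete")

# Single linkage merges at the smallest gap between clusters: 1 2 4 8
print(single$height)

# Complete linkage merges at the largest within-pair distance: 1 3 7 15
print(complete$height)
```

Under single linkage each new point joins at the gap to its nearest neighbour, which is the chaining behaviour. Under complete linkage the merge height is the full span of the combined cluster.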

Q13. [6 marks]

What are two limitations of hierarchical clustering?

Answer:

1. Computationally expensive for large datasets, since it needs the distance between every pair of observations.

2. Sensitive to outliers and noise.

Q14. [3 marks]

True or False: In clustering, the order of data points affects the final output.

Answer:

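True for k-means, where the result depends on the randomly chosen starting centroids unless a seed is fixed, but not for hierarchical clustering, which is deterministic for a given distance matrix and linkage method (apart from tied distances).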

PART D: Regression

Q15. [1 mark - True/False]

Adding a variable to a regression model always increases R-squared.

Answer:

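True, in the sense that R-squared never decreases when a variable is added (strictly, it can stay unchanged). But a higher R-squared does not mean better model performance; adjusted R-squared is better for comparing models.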
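
This can be sketched by adding a pure-noise predictor to a model on the built-in `mtcars` data. The `noise` column is invented for the example, and the variable names are placeholders.

```r
# Copy mtcars and add a predictor that is unrelated to mpg by construction
set.seed(123)
cars <- mtcars
cars$noise <- rnorm(nrow(cars))

small <- lm(mpg ~ wt, data = cars)
big   <- lm(mpg ~ wt + noise, data = cars)

# R-squared of the larger model is never below that of the smaller one
r2 <- c(small = summary(small)$r.squared,
        big   = summary(big)$r.squared)
print(r2)

# Adjusted R-squared penalizes the extra term, so it can go down
adj_r2 <- c(small = summary(small)$adj.r.squared,
            big   = summary(big)$adj.r.squared)
print(adj_r2)
```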

Q16. [1 mark]

Which function extracts fitted values from a model?

Answer:

fitted()

Q17. [2 marks]

What does a negative coefficient for 'price' imply in a sales prediction model?

Answer:

It implies that as price increases, sales are expected to decrease.

Q18. [5 marks]

What are two assumptions of linear regression and how can you check them?

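Answer:

1. Linearity - plot the residuals against the fitted values; the points should scatter randomly around zero with no curved pattern.

2. Homoscedasticity - in the same plot, check that the residuals have a roughly constant spread across the fitted values.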
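
A minimal sketch of the residuals vs. fitted plot used for both checks, with a simple model on the built-in `mtcars` data chosen only as an example.

```r
# Fit a simple linear model
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals against fitted values: look for curvature (non-linearity)
# or a changing spread (heteroscedasticity)
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)  # reference line at zero
```

The built-in diagnostic `plot(fit, which = 1)` draws the same kind of plot with a smoothed trend line added.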

Q19. [4 marks]

What does a funnel shape in a residual vs. fitted plot indicate?

Answer:

It indicates heteroscedasticity - non-constant variance of errors.
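
To see what a funnel looks like, the sketch below simulates data whose error standard deviation grows with the predictor. All values are made up for illustration.

```r
# Simulate a response whose noise increases with x
set.seed(7)
x <- 1:200
y <- 2 + 3 * x + rnorm(200, sd = 0.5 * x)

fit_het <- lm(y ~ x)

# The residuals fan out as the fitted values increase
plot(fitted(fit_het), resid(fit_het),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Funnel shape from non-constant variance")
abline(h = 0, lty = 2)

# Compare residual spread in the lower and upper halves of x
sd_low  <- sd(resid(fit_het)[x <= 100])
sd_high <- sd(resid(fit_het)[x > 100])
print(c(sd_low = sd_low, sd_high = sd_high))
```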

Q20. [2 marks]

Write an R function to multiply two numbers.

Answer:

```r
# Multiply two numbers and return the product
multiply <- function(x, y) {
  return(x * y)
}
```

Q21. [3 marks]

In regression, why is it important to standardize predictors?

Answer:

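To compare coefficients directly. After standardizing, each coefficient is the expected change in the response for a one standard deviation change in that predictor, so variables measured on different scales don't distort the comparison.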
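
A small sketch of standardizing predictors before fitting, using `wt` and `hp` from the built-in `mtcars` data as example predictors.

```r
# Model on the original scales (wt in 1000 lbs, hp in horsepower)
raw <- lm(mpg ~ wt + hp, data = mtcars)

# Standardize each predictor to mean 0 and standard deviation 1
std_data <- mtcars
std_data$wt <- as.numeric(scale(mtcars$wt))
std_data$hp <- as.numeric(scale(mtcars$hp))
standardized <- lm(mpg ~ wt + hp, data = std_data)

# Raw coefficients are per unit; standardized ones are per standard deviation
print(coef(raw))
print(coef(standardized))

# The fitted values are unchanged: only the coefficient scale differs
print(all.equal(fitted(raw), fitted(standardized)))
```

For ordinary least squares, standardizing changes how the coefficients read, not how well the model fits.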
