DSC2608 - Assessment - 05 S1-2025
DSC2608
Assessment: 05
Total marks: 70
Assessment 05
Study material: learning units 1-4
Important:
1. This is a written assignment that must be answered and submitted on the DSC2608 module site under
Assessment 05.
3. For questions requiring R code implementation, include both the code and the resulting output as
part of your solution. Any code submitted without the corresponding output will result in a mark of
zero.
4. An example of what your final PDF submission should look like is available under Forum
Announcements. Please review this file before writing and submitting your written assignment.
5. The due date for this assignment is fixed. No extension can be granted because the solutions will be
posted on the DSC2608 module site after the closing date.
Question 1 [10 - marks]
Determine whether the statement is true or false and briefly explain your answer. You may
receive no credit for a correct answer with no explanation.
1.1 The function as.data.frame(matrix(1:6, nrow=2)) creates a dataframe with two
rows and three columns. [2]
1.2 In R, when assigning y <- 10.5, the output of typeof(y) is “integer”. [2]
1.3 In R, factors are used to store categorical data and can only contain numeric values. [2]
1.4 Lists in R can store elements of different data types, but all elements must have the
same length. [2]
1.5 The function cbind() is used to combine vectors or matrices by adding rows. [2]
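Statements like these can be checked empirically before writing the explanation. The sketch below is only one way to do that: it builds small objects and inspects them with dim(), typeof(), levels() and lengths(). The factor labels and list contents are hypothetical values chosen for illustration and do not come from the assignment.

```r
# Inspect each object rather than guessing its properties.
df <- as.data.frame(matrix(1:6, nrow = 2))
dim(df)        # number of rows, then number of columns

y <- 10.5
typeof(y)      # internal storage type of y

f <- factor(c("low", "high", "low"))   # hypothetical labels
levels(f)      # the categories the factor stores

lst <- list(a = 1:3, b = "text", c = c(TRUE, FALSE))   # hypothetical list
lengths(lst)   # length of each element

cb <- cbind(1:3, 4:6)
dim(cb)        # shape of the result of cbind()
```

Running the code only shows what R returns; the written answer still needs to explain why.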
Question 2 [45 - marks]
Write statements in R to perform the tasks described using the Carseats.csv file provided.
Remember to include both the code and the resulting output as part of your solution.
2.1 Importing packages and dataset, and viewing data.
2.1.1 Load the readr package in R and import the Carseats dataset, then assign it the
name Carseats. [1]
2.1.2 Check the structure of the Carseats dataset. [1]
2.1.3 What is the number of rows and columns, and the data type of each variable in
the dataset? [2]
2.1.4 Display summary statistics for the Carseats dataset and interpret each statistic. [3]
2.1.5 Check if there are any missing values in the Carseats dataset. [1]
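The import and inspection steps in 2.1 could be sketched as follows. This is a minimal illustration, not a model answer: it writes a few hypothetical rows to a temporary file so that it runs without the real Carseats.csv, and the object name toy is an assumption. The fallback to read.csv() is there only in case readr is not installed.

```r
# Hypothetical stand-in for Carseats.csv, written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c("Sales,Price,Urban",
             "9.5,120,Yes",
             "11.2,83,Yes",
             "4.1,128,No"), path)

# Import with readr when available; otherwise fall back to base R.
toy <- if (requireNamespace("readr", quietly = TRUE)) {
  readr::read_csv(path, show_col_types = FALSE)
} else {
  read.csv(path)
}

str(toy)               # structure: variable names and types
dim(toy)               # number of rows and columns
summary(toy)           # summary statistics per variable
colSums(is.na(toy))    # count of missing values in each column
```

For the assignment itself, the path would point to the provided Carseats.csv and the result would be assigned to Carseats.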
2.2 Data types and data structures.
2.2.1 Extract only the numeric variables from the Carseats dataset. [2]
2.2.2 Calculate the correlation matrix for the numeric variables selected in 2.2.1 and
determine which variables are most strongly correlated with Sales. [3]
2.2.3 Create a new column named IncomeCategory that classifies Income into three
groups: “Low” (below 40), “Medium” (40–80) and “High” (above 80), and provide
the number of entries in each category. [3]
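The tasks in 2.2 could be approached along the lines below. This is a sketch on a hypothetical six-row data frame named toy, not the Carseats data. It also assumes that the boundary values 40 and 80 belong to "Medium", which is one reading of the ranges given in 2.2.3.

```r
# Hypothetical data with a few Carseats-style columns.
toy <- data.frame(
  Sales  = c(9.5, 11.2, 4.1, 7.4, 10.8, 6.6),
  Income = c(73, 48, 35, 100, 64, 113),
  Price  = c(120, 83, 128, 97, 80, 124),
  Urban  = c("Yes", "Yes", "No", "Yes", "No", "Yes")
)

# Keep only the numeric columns.
num <- toy[sapply(toy, is.numeric)]

# Correlation matrix, then the other variables ranked by |correlation| with Sales.
cors <- cor(num)
sort(abs(cors["Sales", colnames(cors) != "Sales"]), decreasing = TRUE)

# Classify Income; 40 and 80 are treated as "Medium" (assumption).
toy$IncomeCategory <- factor(
  ifelse(toy$Income < 40, "Low",
         ifelse(toy$Income <= 80, "Medium", "High")),
  levels = c("Low", "Medium", "High")
)
table(toy$IncomeCategory)   # number of entries per category
```

Nested ifelse() is used here instead of cut() because cut() closes every interval on the same side, which cannot express "40 to 80 inclusive" together with "below 40" and "above 80".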
2.3 Data visualisations
2.3.1 Create a bar chart to visualise the distribution of IncomeCategory and interpret
the results. [4]
2.3.2 Generate a box plot of Sales categorised by ShelveLoc and describe any notable
patterns. [5]
2.3.3 Create a scatter plot of Sales versus Advertising and discuss any visible trends. [4]
2.3.4 Construct a density plot for Price and explain its distribution characteristics. [3]
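The four plot types in 2.3 could be produced with base R graphics roughly as shown below. The data frame toy and all of its values are hypothetical; ggplot2 would be an equally valid choice if the module materials use it.

```r
# Hypothetical data with the columns needed for the four plots.
toy <- data.frame(
  Sales       = c(9.5, 11.2, 4.1, 7.4, 10.8, 6.6, 8.2, 5.3),
  Advertising = c(11, 16, 0, 4, 13, 0, 9, 2),
  Price       = c(120, 83, 128, 97, 80, 124, 108, 131),
  ShelveLoc   = factor(c("Bad", "Good", "Bad", "Medium",
                         "Good", "Medium", "Medium", "Bad")),
  IncomeCategory = factor(c("Medium", "Medium", "Low", "High",
                            "Medium", "High", "Low", "Medium"),
                          levels = c("Low", "Medium", "High"))
)

# Bar chart of category counts.
counts <- table(toy$IncomeCategory)
barplot(counts, main = "Income category", xlab = "Category", ylab = "Count")

# Box plot of Sales for each shelf location.
boxplot(Sales ~ ShelveLoc, data = toy, main = "Sales by ShelveLoc")

# Scatter plot of Sales against Advertising.
plot(toy$Advertising, toy$Sales,
     xlab = "Advertising", ylab = "Sales", main = "Sales vs Advertising")

# Kernel density estimate of Price.
dens <- density(toy$Price)
plot(dens, main = "Density of Price")
```

Each plot still needs a written interpretation, for example the shape, centre and spread of the density curve.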
2.4 Sample hypothesis testing and percentiles for estimation
2.4.1 Perform a hypothesis test to check whether the mean Sales is significantly different
from 7 using a one-sample t-test. [4]
2.4.2 Perform a hypothesis test to check if the mean Price differs between urban and
rural stores using a two-sample t-test. Compare the results. [6]
2.4.3 Compute the 90th percentile of Advertising. [1]
2.4.4 Compute the 25th and 75th percentiles of Sales. [2]
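The tests and percentiles in 2.4 map onto t.test() and quantile(). The sketch below uses a hypothetical eight-row data frame named toy and assumes that urban and rural stores are distinguished by a column Urban with values "Yes" and "No"; check the actual coding in the provided file.

```r
# Hypothetical data; Urban = "Yes"/"No" is an assumed coding.
toy <- data.frame(
  Sales       = c(9.5, 11.2, 4.1, 7.4, 10.8, 6.6, 8.2, 5.3),
  Price       = c(120, 83, 128, 97, 80, 124, 108, 131),
  Advertising = c(11, 16, 0, 4, 13, 0, 9, 2),
  Urban       = c("Yes", "Yes", "No", "Yes", "No", "Yes", "No", "No")
)

# One-sample t-test: H0 is that the mean of Sales equals 7.
one_sample <- t.test(toy$Sales, mu = 7)
one_sample

# Two-sample t-test of Price by Urban (Welch version, the default in R).
two_sample <- t.test(Price ~ Urban, data = toy)
two_sample

# Percentiles.
p90 <- quantile(toy$Advertising, probs = 0.90)
quartiles <- quantile(toy$Sales, probs = c(0.25, 0.75))
p90
quartiles
```

In each test, the p-value is compared with the chosen significance level (0.05 is a common choice, assumed here) to decide whether to reject the null hypothesis.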
Question 3 [15 - marks]
Consider the following sample data representing annual inflation rates (in percentage) over
a 10-year period in a country:
2.1, 1.8, 2.3, 1.7, 2.5, 2.8, 3.0, 2.2, 1.9, 2.4
Answer the following empirical bootstrap questions based on the data provided:
3.1 Write R code to calculate the sample mean inflation rate, x̄, and assign it to a variable
called mean_inflation. [1]
3.2 Write R code to generate 1000 bootstrap samples, that is, an n × 1000 matrix of random
resamples from the original data with replacement. [2]
3.3 Compute the mean of each bootstrap sample and assign the results to bs_means. [2]
3.4 Calculate the difference (ν) between the sample mean inflation rate (mean_inflation)
and the mean of each bootstrap sample (bs_means). [2]
3.5 Calculate the descriptive statistics for the distribution of ν defined in question 3.4.
Specifically, calculate the summary statistics, standard deviation and mean for the
distribution of ν. [4]
3.6 Plot a histogram to visualise the distribution of ν and interpret it. [4]
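The empirical bootstrap steps in Question 3 could be sketched as follows. Several choices here are assumptions rather than requirements of the question: the seed, the helper names inflation, B, bs_samples and nu, and the sign convention for ν (sample mean minus bootstrap mean). Variable names are written with underscores because R names cannot contain spaces.

```r
set.seed(2608)   # arbitrary seed, used only to make the sketch reproducible

# Data from the question.
inflation <- c(2.1, 1.8, 2.3, 1.7, 2.5, 2.8, 3.0, 2.2, 1.9, 2.4)
n <- length(inflation)

# Sample mean.
mean_inflation <- mean(inflation)

# n x B matrix: each column is one resample drawn with replacement.
B <- 1000
bs_samples <- matrix(sample(inflation, size = n * B, replace = TRUE),
                     nrow = n, ncol = B)

# Mean of each bootstrap sample (one value per column).
bs_means <- colMeans(bs_samples)

# Difference between the sample mean and each bootstrap mean
# (sign convention is an assumption).
nu <- mean_inflation - bs_means

# Descriptive statistics of nu.
summary(nu)
sd(nu)
mean(nu)

# Histogram of nu.
hist(nu, main = "Bootstrap distribution of nu", xlab = "nu")
```

Because the resampling is random, the numerical output depends on the seed. The distribution of ν is expected to be centred close to zero.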