STAT 1000 Assignment - Solutions
Solutions
2023-02-06
Setup [3 marks]
Before you begin, set your name and student number in Line 3. [1 mark]
0. Import the FullCourse dataset, available on the UMLearn page. Make sure you have “Heading” set to
“Yes” when you import the data, and make sure you name the object FullCourse. [2 marks]
This dataset contains Midterm and Final Exam grades for an entire course of 500 students (two
sections of 250 students each). The Midterm and Final Exam grades are each marked out of 100.
Suppose that you are the instructor for one of the sections of this course. The block of code below will isolate
the grades of the students in your section only, and save them as a dataframe named MyClass. Also, it will
assign letter grades based on their weighted total grade (the midterm is worth 40% of the total grade, while
the final is worth 60%). This is the dataset you will use to answer the assignment questions.
After importing the data, replace 1111111 with your seven-digit student id number in the set.seed function
below, and click the green arrow at the top-right hand side of the code chunk. This part is not worth marks,
but you will receive a five-mark deduction on your assignment if it is not completed correctly.
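set.seed(1111111)  # replace 1111111 with your student id number
# Randomly select 250 rows of the full course as your section
MyClass = FullCourse[sample(1:NROW(FullCourse), 250), ]
# Weighted total (40% midterm, 60% final), binned into letter grades
MyClass$Letters = cut(0.4*MyClass$Midterm + 0.6*MyClass$Final,
                      c(0, 50, 60, 65, 70, 75, 80, 90, 100),
                      c("F", "D", "C", "C+", "B", "B+", "A", "A+"))
rm(FullCourse)  # remove the full dataset from the workspace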
Make sure you complete the setup steps before beginning your assignment!
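To see how the weighted total and the letter-grade cut-offs work together, here is a small worked sketch. The three students and their grades are made up for illustration (the names midterm, final, total and grade are not part of the assignment); the weights, breaks and labels are the ones used in the setup code.

```r
# Hypothetical grades for three students (not taken from FullCourse)
midterm <- c(40, 70, 95)
final   <- c(50, 80, 90)

# Weighted total: midterm worth 40%, final worth 60%
total <- 0.4 * midterm + 0.6 * final
total  # 46 76 92

# Same breaks and labels as the setup code; each interval is (lower, upper]
grade <- cut(total,
             c(0, 50, 60, 65, 70, 75, 80, 90, 100),
             c("F", "D", "C", "C+", "B", "B+", "A", "A+"))
as.character(grade)  # "F" "B+" "A+"
```

Because cut() closes each interval on the right by default, a total of exactly 50 falls in (0, 50] and is labelled F.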
For the following questions, use the MyClass dataset that you created in the setup stage.
1. For each of the variables in the MyClass dataset, what is the data type? (I.e., categorical and nominal,
categorical and ordinal, or quantitative?) [2 marks]
Midterms and Final Exams are quantitative, while the Letters are categorical and ordinal.
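One way to check how R stores each variable is to ask for its class. The sketch below uses a tiny made-up data frame (toy, lv and ord are hypothetical names) rather than MyClass. Note that a plain factor carries no order, so the storage type alone does not tell you that letter grades are ordinal; the order has to be declared explicitly.

```r
# Made-up data frame with the same three kinds of variable
toy <- data.frame(Midterms = c(40, 70, 95),
                  Finals   = c(50, 80, 90),
                  Letters  = factor(c("F", "B+", "A+")))

# Storage class of every column
sapply(toy, class)  # numeric, numeric, factor

# Declaring the order of the levels turns the factor into an ordinal variable
lv  <- c("F", "D", "C", "C+", "B", "B+", "A", "A+")
ord <- factor(toy$Letters, levels = lv, ordered = TRUE)
ord[1] < ord[2]  # TRUE: F ranks below B+
```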
2. Create a histogram of the Midterm grades and a histogram of the Final Exam grades. Set the breaks
to 10. Determine an appropriate title for each graph, and an appropriate label for the x-axes (do not
leave them at their default values). Make sure the two histograms are different colours. [3 marks]
hist(MyClass$Midterms, breaks = 10, main = "Midterm Grades", xlab = "Grades", col = "violetred1")
[Figure: histogram titled "Midterm Grades"; x-axis: Grades (20 to 100); y-axis: Frequency (0 to 60)]
hist(MyClass$Finals, breaks = 10, main = "Final Exam Grades", xlab = "Grades", col = "tomato2")
[Figure: histogram titled "Final Exam Grades"; x-axis: Grades (0 to 100); y-axis: Frequency (0 to 50)]
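3. Describe the shape of the distributions you see (in particular, the direction of the skewness). [1 mark]
Both distributions are left-skewed: most grades sit at the higher end, with a longer tail toward the lower grades.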
4. Based on your previous answer, do you expect the mean of each variable to be greater than, less than,
or approximately equal to the median? Why? [1 mark]
Since the distributions are left-skewed, I expect the mean of each variable to be less than the median: the
low grades in the left tail pull the mean down, while the median is largely unaffected.
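A small sketch of this effect, using made-up grades (the vector x is hypothetical, not taken from MyClass): most values are high and one is very low, so the data are left-skewed.

```r
# Hypothetical left-skewed grades: most are high, one is very low
x <- c(20, 60, 70, 75, 80, 85, 90)

mean(x)    # about 68.57, pulled down by the low value
median(x)  # 75, the middle value, unaffected by how low the smallest grade is
```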
5. Calculate the means of the Midterm grades and the Final Exam grades. [1 mark]
mean(MyClass$Midterms)
## [1] 71.668
mean(MyClass$Finals)
## [1] 66.108
6. Calculate the medians of the Midterm grades and the Final Exam grades. [1 mark]
median(MyClass$Midterms)
## [1] 74.5
median(MyClass$Finals)
## [1] 67
7. Do your results in Question 5 and Question 6 match your remarks in Question 4? [1 mark]
Yes. For each variable, the mean is less than the median (71.668 < 74.5 for the Midterm and 66.108 < 67
for the Final Exam), as expected for left-skewed data.
8. Calculate the five number summaries of the Midterm grades and the Final Exam grades. [1 mark]
fivenum(MyClass$Midterms)
fivenum(MyClass$Finals)
## [1] 5 54 67 81 99
9. Calculate the standard deviations of the Midterm grades and the Final Exam grades. [1 mark]
sd(MyClass$Midterms)
## [1] 17.96666
sd(MyClass$Finals)
## [1] 18.64938
10. Based on the shape of each histogram, would it be better to describe these distributions with the
mean and standard deviation, or with the five number summary? Why? [1 mark]
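Since the distributions are skewed, it is preferable to describe them with the five number summary rather
than the mean and standard deviation: the median and quartiles are resistant to skewness and outliers,
while the mean and standard deviation are not.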
11. Create a horizontal outlier boxplot of the Midterm grades. Determine an appropriate title for the
graph, and an appropriate label for the x-axis (it is okay to have the same name for the title and the
x-axis). [2 marks]
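One way to draw this plot, as a sketch: in base R, boxplot() produces an outlier boxplot by default (the whiskers stop at the most extreme points within 1.5 * IQR of the box, and points beyond them are plotted individually). The vector grades below is made-up data standing in for the Midterm grades.

```r
# Hypothetical grades standing in for the Midterm column
grades <- c(12, 48, 55, 61, 67, 70, 74, 78, 81, 85, 90, 96)

# Default range = 1.5 gives the outlier boxplot; horizontal = TRUE lays it flat
b <- boxplot(grades,
             horizontal = TRUE,
             main = "Midterm Grades",
             xlab = "Grades")

b$out  # 12, the only point beyond the fences in this made-up data
```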
[Figure: horizontal outlier boxplot titled "Midterm Grades"; x-axis: Grades (20 to 100)]
12. Create a horizontal quantile boxplot of the Midterm grades. Determine an appropriate title for the
graph, and an appropriate label for the x-axis (it is okay to have the same name for the title and the
x-axis). [1 mark]
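A sketch of one way to do this, assuming "quantile boxplot" means that the whiskers run all the way to the minimum and maximum: in base R, setting range = 0 in boxplot() does this. The vector grades is made-up data standing in for the Midterm grades.

```r
# Hypothetical grades standing in for the Midterm column
grades <- c(12, 48, 55, 61, 67, 70, 74, 78, 81, 85, 90, 96)

# range = 0 extends the whiskers to the data extremes, so no points are flagged
b <- boxplot(grades,
             range = 0,
             horizontal = TRUE,
             main = "Midterm Grades",
             xlab = "Grades")

b$stats[c(1, 5), 1]  # whisker ends: 12 and 96, the minimum and maximum
```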
[Figure: horizontal quantile boxplot titled "Midterm Grades"; x-axis: Grades (20 to 100)]
13. Create a side-by-side vertical outlier boxplot comparing the Midterm and Final Exam grades.
Determine an appropriate title for the graph, and an appropriate label for the y-axis (it is okay to have the
same name for the title and the y-axis). Use the names argument to set the names of the individual
boxplots. [2 marks]
boxplot(MyClass$Midterms, MyClass$Finals, ylab = "Grades", main = "Midterm vs Final Exam Grades", names
[Figure: side-by-side vertical boxplots titled "Midterm vs Final Exam Grades"; y-axis: Grades]
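The names argument takes one label per box, in the same order as the data vectors. A minimal sketch with made-up data follows; the vectors midterm and final and the two labels are assumptions for illustration, not the values used in the original call.

```r
# Hypothetical grades for the two exams
midterm <- c(35, 58, 66, 72, 75, 81, 88, 94)
final   <- c(28, 50, 61, 67, 70, 79, 85, 97)

# One box per vector; names labels the boxes along the x-axis
b <- boxplot(midterm, final,
             ylab = "Grades",
             main = "Midterm vs Final Exam Grades",
             names = c("Midterm", "Final Exam"))  # assumed labels
```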
14. Using the five number summary calculated in Question 8, calculate and print out the upper and lower
fences used in the construction of the outlier boxplot in the previous question. [2 marks]
LF.midterms
## [1] 23.5
UF.midterms
## [1] 118.5
LF.finals
## [1] 13.5
UF.finals
## [1] 121.5
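The fences are Q1 - 1.5 * IQR and Q3 + 1.5 * IQR, where IQR = Q3 - Q1. As a sketch, the calculation for the Final Exam grades can be reproduced from the five number summary printed in Question 8 (the names fn.finals and iqr.finals are illustrative):

```r
# Five number summary of the Final Exam grades, copied from the Question 8 output
fn.finals <- c(5, 54, 67, 81, 99)

# Interquartile range: Q3 - Q1
iqr.finals <- fn.finals[4] - fn.finals[2]  # 81 - 54 = 27

# Lower and upper fences
LF.finals <- fn.finals[2] - 1.5 * iqr.finals
UF.finals <- fn.finals[4] + 1.5 * iqr.finals

LF.finals  # 13.5
UF.finals  # 121.5
```

With MyClass loaded, fn.finals could instead be taken directly from fivenum(MyClass$Finals).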
15. Below is a frequency table of the letter grades in this class (knit the file to view):
##
## F D C C+ B B+ A A+
## 32 43 28 17 27 35 50 18
What is the average number of grade points received by students in this class?
Note: the table below displays the letter grade to grade point conversion. Knit this file to PDF and view the
output to see it.
Letter Grade   A+    A     B+    B     C+    C     D     F
Grade Point    4.5   4.0   3.5   3.0   2.5   2.0   1.0   0.0
## [1] 2.504
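The average is a weighted mean: each grade point value is multiplied by the number of students who received it, and the total is divided by the class size. A sketch of the calculation using the counts and conversions shown above (the vector names counts and grade.points are illustrative):

```r
# Frequencies from the table of letter grades
counts <- c("F" = 32, "D" = 43, "C" = 28, "C+" = 17,
            "B" = 27, "B+" = 35, "A" = 50, "A+" = 18)

# Grade points in the same order as the counts
grade.points <- c("F" = 0.0, "D" = 1.0, "C" = 2.0, "C+" = 2.5,
                  "B" = 3.0, "B+" = 3.5, "A" = 4.0, "A+" = 4.5)

# Total grade points (626) divided by the number of students (250)
sum(counts * grade.points) / sum(counts)  # 2.504
```

The same value is returned by weighted.mean(grade.points, counts).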