Practical 4
Goals: The objective of this practical is to review the descriptive statistics, plots and basic
R programming we have learned in the first three practicals. In addition, a new data
analysis is performed to consolidate what has been learned so far, while introducing a few
extra possibilities of R.
1 Revision
Revisit practicals 1, 2 and 3 (files Practical1.R, Practical2.R, Practical3.R are
provided in the corresponding folders in Chamilo). Make sure you understand what each
R command does.
• “Price” : Index of the cost of 112 goods and services excluding rent (Zurich = 100)
Since each summary statistic and plot we have learned so far is suited to a particular type
of variable, it is necessary to know what kind of data you have in your file before summarizing
or visualizing it. You can check the types of the variables in the Cities dataset with:
str(Cities)
or with:
class(Cities$...) # ... has to be replaced by a variable name
to get the type for each variable. Use
summary(Cities)
University of Geneva GSEM
Statistics I Fall 2017
Prof. Eva Cantoni Practical 4
1. Provide summary statistics for the variables of a suitable type in the dataset.
When appropriate, draw a kernel density plot to check whether their distributions are
symmetric or not.
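As a starting point, the following minimal sketch shows one possible way to combine a numerical summary with a kernel density plot. The vector salary_demo is simulated and purely hypothetical; it only stands in for a numeric column such as Cities$Salary, which should be used instead once the dataset is loaded.

```r
# Hypothetical stand-in for a numeric column such as Cities$Salary:
# simulated right-skewed values plus one missing entry (NOT the real data).
set.seed(1)
salary_demo <- c(rlnorm(45, meanlog = 3.5, sdlog = 0.6), NA)

# Numerical summary: quartiles, mean and the number of NA values
summary(salary_demo)

# Kernel density estimate; na.rm = TRUE drops the missing value
dens <- density(salary_demo, na.rm = TRUE)
plot(dens, main = "Kernel density (simulated data)")

# Mean (dashed) and median (dotted): a visible gap between the two
# lines is one informal sign of asymmetry
abline(v = mean(salary_demo, na.rm = TRUE), lty = 2)
abline(v = median(salary_demo, na.rm = TRUE), lty = 3)
```

Comparing the positions of the mean and the median on the plot is only an informal check; the shape of the density curve itself should also be examined.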
2. Draw boxplots of all the numerical (continuous) variables in a single graphical
window. You can use the par() function with the option mfrow=c(nrows, ncols)
to create a matrix of nrows by ncols plots that is filled in by row. For example, if
you need the plots to be arranged horizontally, set nrows=1.
par(mfrow = c(1, 3)) # 3 figures arranged in a row
boxplot(Cities$Work, col = "lightsalmon1")
# with the default color changed to lightsalmon
boxplot(Cities$Price, col = "mediumseagreen")
boxplot(Cities$Salary, col = "goldenrod2")
par(mfrow = c(1, 1)) # back to the default setting
What can you say about the distribution of each variable by looking only at the
boxplots?
3. Draw histograms of all the numerical (continuous) variables in a single graphical
window. Here as well, use the col parameter to change the default color.
Describe the distribution of the variables in light of this new information.
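One possible layout for the histograms is sketched below. The data frame demo_cities is simulated and hypothetical; its column names mirror Work, Price and Salary only for illustration, and the colors are arbitrary choices. With the real data, Cities would take its place.

```r
# Hypothetical stand-in for the three numeric columns of Cities
# (simulated values, NOT the real data).
set.seed(2)
demo_cities <- data.frame(
  Work   = rnorm(46, mean = 1880, sd = 170),
  Price  = rnorm(46, mean = 70, sd = 20),
  Salary = rlnorm(46, meanlog = 3.5, sdlog = 0.6)
)
cols <- c("lightsalmon1", "mediumseagreen", "goldenrod2")

# par() returns the previous settings, so they can be restored afterwards
op <- par(mfrow = c(1, 3))
for (i in seq_along(demo_cities)) {
  hist(demo_cities[[i]],
       col  = cols[i],
       main = names(demo_cities)[i],
       xlab = names(demo_cities)[i])
}
par(op) # back to the previous layout
```

Saving the old settings in op and restoring them with par(op) is an alternative to calling par(mfrow = c(1, 1)) by hand.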
4. Draw violin plots of all the numerical (continuous) variables in a single graphical
window, using the vioplot() function from the vioplot package. You have to use the
function na.omit() here to eliminate the missing values.
library(vioplot) # provides vioplot(); install it first if necessary
par(mfrow = c(1, 3)) # 3 figures arranged in a row
vioplot(na.omit(Cities$Work))
vioplot(na.omit(Cities$Price))
vioplot(na.omit(Cities$Salary))
par(mfrow = c(1, 1)) # back to the default setting
Describe the distribution of the variables in light of this new information. Try changing
the bandwidth manually with the parameter h. What do you observe?
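The effect of the bandwidth can also be previewed without the vioplot package. The sketch below uses base R's density() and its bw argument, assuming that bw plays a role comparable to h in vioplot(); the two are not guaranteed to be on the same scale. The vector x_demo is simulated and hypothetical.

```r
# Simulated bimodal sample (NOT the real data), used only to make the
# effect of the bandwidth visible.
set.seed(3)
x_demo <- c(rnorm(25, mean = 40, sd = 5), rnorm(20, mean = 70, sd = 8))

# Three kernel density estimates with increasing bandwidth
d_small  <- density(x_demo, bw = 1)   # wiggly: follows individual points
d_medium <- density(x_demo, bw = 5)   # shows the two groups
d_large  <- density(x_demo, bw = 20)  # oversmoothed: groups merge

op <- par(mfrow = c(1, 3))
plot(d_small,  main = "bw = 1")
plot(d_medium, main = "bw = 5")
plot(d_large,  main = "bw = 20")
par(op) # back to the previous layout
```

A small bandwidth tends to produce a rough curve, while a large one can hide features such as a second mode.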
5. Compare, via QQ-plots, the empirical distribution of variables Work, Price and Salary
separately with the Gaussian distribution and draw a reference line. Does the Gaussian
distribution fit well?
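A minimal sketch of a Gaussian QQ-plot with a reference line, using the base R functions qqnorm() and qqline(), is given below. The vector y_demo is simulated and hypothetical; it should be replaced by a column of Cities, with missing values removed if necessary.

```r
# Simulated right-skewed sample (NOT the real data)
set.seed(4)
y_demo <- rlnorm(46, meanlog = 3.5, sdlog = 0.6)

# Sample quantiles against theoretical Gaussian quantiles
qqnorm(y_demo, main = "Normal QQ-plot (simulated data)")

# Reference line through the first and third quartiles
qqline(y_demo, col = "red")
```

Points that stay close to the line suggest a reasonable Gaussian fit; systematic curvature at one or both ends suggests skewness or heavy tails.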