A2RIB_T4

This assignment focuses on data wrangling and basic statistics using R. Students are required to perform various tasks including loading datasets, cleaning data, calculating statistics, and interpreting results. Submissions must include a PDF and R script, with specific naming conventions and deadlines.

Uploaded by e.stephenson

Assignment 2

Instructions (please read carefully)

General information: This assignment focuses on data wrangling and basic univariate/bivariate statistics.

Submitting the assignment: Please use this Word file as your starting point. Add your
answers in the boxes below the questions, and copy-paste the R code you used whenever
the question asks you to do so. Once you have completed it, convert this Word document to
PDF and submit both the PDF and the R script you used to arrive at your answers in Canvas
under -> Assignments -> Assignment 2.

Remember to upload the files to Canvas on Thursday before 12h (noon).

Please name the PDF document and the R script “A2RIB_TeamName”. For example, Team A
would name its files A2RIB_TA.pdf and A2RIB_TA.R.

To check:

1. Set the working directory to your main folder (under Session -> Set Working
Directory).
2. Consult the R instructional videos and the Analysis: “Data-Wrangling A Key Skill”
chapter.
3. Make sure you download the data needed for this assignment (provided in Canvas).
4. Make sure you install the packages introduced in this session, namely
“skimr”, “janitor”, and “kableExtra”.

Questions

Methodology

1. From the master theses shared on Canvas (module: Master theses examples), identify
one particular research design that a student used and comment on how it was executed.
Note: you may pick any master thesis.

2. From the master theses shared on Canvas, take a screenshot of an example of summary
statistics (e.g., in text or table form). Explain what the summary statistics are
telling you.

R section

1. Load the vehicle_data.csv file into an object named d. Note that you will have to add a
few extra arguments to the import function to make it work. Explain why you needed to
add the extra arguments. Copy in the code you used to answer this question.
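A minimal sketch of how read.csv() can be told about a non-default file layout. The layout below is an assumption made purely for illustration (semicolon separators and decimal commas, as produced by some European-locale exports); it is not confirmed for vehicle_data.csv, so open the raw file first to see what it actually needs. The inline text = argument stands in for the real file path, and the column names are hypothetical.

```r
# Hypothetical file contents (assumption: semicolon-separated, decimal commas).
# With the real data you would pass the file path instead of text = raw.
raw <- "name;selling_price;km_driven
Car A;450000,5;70000
Car B;120000,0;30000"

# read.csv() assumes sep = "," and dec = "."; both defaults are overridden here
d <- read.csv(text = raw, sep = ";", dec = ",")
str(d)  # selling_price is parsed as numeric instead of character
```

read.csv2() bakes in exactly these two defaults; other layouts may instead call for arguments such as skip or header.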

Data cleaning

2. Remove the first 5 rows of d, rename “q3” to “Year”, and rename “q88” to
“Transmission”. Also remove the CNG cars from the dataset. How many rows does the
data have now, and how many of these rows are Petrol cars? Copy in the code you used
to answer this question.
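A base-R sketch of the three cleaning steps on a tiny invented data frame. Only q3 and q88 come from the question; the fuel column and all values are hypothetical stand-ins for the real vehicle data. dplyr's rename() and filter() would do the same job in tidyverse style.

```r
# Invented toy data standing in for d (fuel column and values are hypothetical)
d <- data.frame(
  q3   = c(2010, 2012, 2015, 2011, 2018, 2014, 2016, 2019, 2013),
  q88  = c("Manual", "Manual", "Automatic", "Manual", "Manual",
           "Automatic", "Manual", "Manual", "Automatic"),
  fuel = c("Diesel", "Petrol", "CNG", "Petrol", "Diesel",
           "Petrol", "CNG", "Petrol", "Diesel")
)

# Drop the first 5 rows with negative indexing
d <- d[-(1:5), ]

# Rename columns by matching their current names
names(d)[names(d) == "q3"]  <- "Year"
names(d)[names(d) == "q88"] <- "Transmission"

# Keep every row that is not a CNG car (subset() also drops rows where fuel is NA)
d <- subset(d, fuel != "CNG")

nrow(d)                  # rows remaining
sum(d$fuel == "Petrol")  # how many of them are Petrol cars
```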

3. In the d dataset, create a new variable called internal. Internal should have the
value “fast_sell” if the car has been driven more than 50,000 kilometers and is a
diesel car; otherwise it should be “slow_sell”. What percentage of cars are fast_sell
and slow_sell? Copy in the code you used to answer this question.
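A sketch of the conditional variable using ifelse(), where both conditions must hold at once (joined with &), followed by prop.table() to turn counts into shares. The km_driven and fuel column names and all values are invented; check the real names with names(d).

```r
# Invented toy data (column names and values are hypothetical)
d <- data.frame(
  km_driven = c(70000, 30000, 120000, 50000, 90000),
  fuel      = c("Diesel", "Diesel", "Petrol", "Diesel", "Diesel")
)

# fast_sell only when BOTH hold: strictly more than 50,000 km AND diesel
d$internal <- ifelse(d$km_driven > 50000 & d$fuel == "Diesel",
                     "fast_sell", "slow_sell")

# Share of each category, expressed as percentages
pct <- prop.table(table(d$internal)) * 100
pct
```

Because the question says "more than", the comparison is strict: a diesel car with exactly 50,000 km lands in slow_sell.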

Before you start with the next question, familiarize yourself with the function round(). Its
usage should be clear from the name, but you can find more info in the help documentation.

4. What percentage of cars are diesel cars sold by an individual? Round the numbers
to 2 decimal places. Copy in the code you used to answer this question.
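One way to get the share of rows that meet two conditions: the mean of a logical vector is the proportion of TRUE values, which round() then trims to 2 decimal places. The seller_type column, its labels, and the values are hypothetical.

```r
# Invented toy data (column names and labels are hypothetical)
d <- data.frame(
  fuel        = c("Diesel", "Diesel", "Petrol", "Diesel", "Petrol", "Diesel"),
  seller_type = c("Individual", "Dealer", "Individual", "Individual", "Dealer", "Dealer")
)

# mean() of a logical vector = proportion of rows where the condition is TRUE
share <- mean(d$fuel == "Diesel" & d$seller_type == "Individual")

# Convert to a percentage and round to 2 decimal places
pct_diesel_individual <- round(share * 100, 2)
pct_diesel_individual
```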

5. The last 4 variables in your dataset d, whose names start with “perauth…”, come from
an authenticity measure that was administered to the car owners. In essence, car owners
were asked how authentic the drive feels. What is the Cronbach’s alpha for these 4
variables? What does Cronbach’s alpha measure? Explain. Copy in the code you used
to answer this question.
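Cronbach's alpha estimates internal consistency: how closely a set of items meant to measure the same construct hang together. For k items it is alpha = k / (k - 1) * (1 - sum of the item variances / variance of the summed score). The sketch below implements that textbook formula in base R; the item names and ratings are invented stand-ins for the four perauth… variables. Package functions such as psych::alpha() compute the same statistic and add item-level diagnostics.

```r
# Cronbach's alpha from its textbook formula:
#   alpha = k / (k - 1) * (1 - sum(item variances) / var(total score))
cronbach_alpha <- function(items) {
  items <- as.matrix(na.omit(items))  # drop respondents with any missing item
  k <- ncol(items)
  item_vars <- apply(items, 2, var)   # variance of each item
  total_var <- var(rowSums(items))    # variance of each respondent's summed score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Invented 1-5 ratings from 8 respondents on 4 items (hypothetical names)
items <- data.frame(
  item1 = c(5, 4, 4, 2, 3, 5, 1, 4),
  item2 = c(4, 4, 5, 2, 3, 5, 2, 4),
  item3 = c(5, 3, 4, 1, 3, 4, 1, 5),
  item4 = c(4, 4, 4, 2, 2, 5, 2, 3)
)
alpha_hat <- cronbach_alpha(items)
round(alpha_hat, 2)  # 0.95 for these invented ratings
```

Values closer to 1 indicate that the items vary together; identical items give exactly 1.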

6. What are the average selling prices as a function of fuel type, seller type, and
transmission? Round the numbers to 2 decimals and make a nice table. Copy in the
code you used to answer this question. Take a screenshot of your table and paste it in as well.
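A sketch of grouped means using base R's aggregate() with a formula, rounded to 2 decimals. The selling_price, fuel, and seller_type column names and all values are invented; Transmission matches the rename from question 2. For the formatted table, knitr::kable() followed by kableExtra::kable_styling() is one common option.

```r
# Invented toy data (column names other than Transmission are hypothetical)
d <- data.frame(
  selling_price = c(450000, 300000, 600000, 250000, 500000, 350000),
  fuel          = c("Diesel", "Diesel", "Petrol", "Petrol", "Diesel", "Petrol"),
  seller_type   = c("Dealer", "Dealer", "Individual", "Individual", "Dealer", "Individual"),
  Transmission  = c("Manual", "Manual", "Automatic", "Manual", "Automatic", "Automatic")
)

# Mean selling price for every observed fuel x seller x transmission combination
avg_price <- aggregate(selling_price ~ fuel + seller_type + Transmission,
                       data = d, FUN = mean)
avg_price$selling_price <- round(avg_price$selling_price, 2)
avg_price
```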

7. What is the correlation between selling price and kilometers driven? Provide an
interpretation of the correlation. What is the p-value of the correlation? Copy in the
code you used to answer this question.
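cor.test() from base R's stats package returns both the correlation coefficient and its p-value in one call (Pearson by default). The data below are invented so that price falls as kilometers rise; the column names are hypothetical.

```r
# Invented toy data (column names and values are hypothetical)
d <- data.frame(
  selling_price = c(600000, 550000, 480000, 400000, 350000, 300000, 260000, 200000),
  km_driven     = c(10000, 25000, 30000, 52000, 60000, 80000, 95000, 120000)
)

# Pearson correlation plus a test of H0: true correlation = 0
res <- cor.test(d$selling_price, d$km_driven)
res$estimate  # correlation coefficient r
res$p.value   # p-value of the test
```

A negative r means that cars with more kilometers tend to sell for less; the p-value says how surprising a correlation this strong would be if the true correlation were zero.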

8. What is the correlation between selling price and car year (i.e., the year the car was made)?
Provide an interpretation of the results.

For this next part, download the housing.csv dataset. This dataset provides information on
median house prices for California districts derived from the 1990 census. The dataset
variables are the following:

longitude: A measure of how far west a house is; longitudes are negative (degrees west), so a lower (more negative) value is farther west
latitude: A measure of how far north a house is; a higher value is farther north
housingMedianAge: Median age of a house within a block; a lower number is a newer building
totalRooms: Total number of rooms within a block
totalBedrooms: Total number of bedrooms within a block
population: Total number of people residing within a block
households: Total number of households (groups of people residing within a home unit) for a block
medianIncome: Median income for households within a block of houses (measured in tens of thousands of US Dollars)
medianHouseValue: Median house value for households within a block (measured in US Dollars)
oceanProximity: Location of the house with respect to the ocean/sea

9. Load the housing.csv dataset into R and call it d1. Get a brief overview of the data
and describe it: what are its specifics, and what are the variable types? Use functions
such as skim() and summary(). Are there any particularities in the data that we should
attend to?
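A sketch of a quick first look using base R, on a tiny invented stand-in for d1. The column names follow the assignment's variable list, but the real housing.csv may spell them differently (check names(d1)); the values, including the missing bedroom count, are invented. skimr::skim(d1) gives a richer overview in one call.

```r
# Invented stand-in for d1 (names follow the assignment's list; values are hypothetical)
d1 <- data.frame(
  totalRooms     = c(900, 7100, 1500, 1300),
  totalBedrooms  = c(130, 1100, NA, 240),
  oceanProximity = c("NEAR BAY", "NEAR BAY", "INLAND", "INLAND")
)

str(d1)             # variable types: num, chr, ...
summary(d1)         # quartiles and NA counts for numeric columns
colSums(is.na(d1))  # number of missing values per column
```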

10. Remove all rows that have NAs. Once you have done that, what is the average median
house value for blocks where the total number of people residing within the block is higher
than 1,000? Copy in the code that you used.
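A sketch combining na.omit(), which drops every row with at least one NA, with logical indexing to average only the qualifying blocks. The data frame is an invented stand-in for d1.

```r
# Invented stand-in for d1 (values are hypothetical)
d1 <- data.frame(
  population       = c(800, 1500, NA, 2200, 1200),
  medianHouseValue = c(150000, 250000, 300000, 200000, NA)
)

d1 <- na.omit(d1)  # remove every row that contains at least one NA

# Average median house value where population is strictly above 1000
avg_value <- mean(d1$medianHouseValue[d1$population > 1000])
avg_value
```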

11. Create a rough estimate of price per square meter by creating a new variable that
divides median house value by total rooms (i.e., price per room as a proxy). Split this newly
created variable into a “low” category if the number is lower than 3 and a “high” category
otherwise. What is the average median house value for low? Copy in the code that you used.
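A sketch of the ratio variable and its two-level split with ifelse(). The names price_per_room and price_cat are hypothetical, and the values are invented so that some ratios fall below 3.

```r
# Invented stand-in for d1 (values are hypothetical)
d1 <- data.frame(
  medianHouseValue = c(6000, 9000, 20000, 5000),
  totalRooms       = c(3000, 2000, 4000, 2500)
)

# Rough price-per-room proxy (hypothetical variable name)
d1$price_per_room <- d1$medianHouseValue / d1$totalRooms

# "low" if strictly below 3, "high" otherwise (so exactly 3 counts as high)
d1$price_cat <- ifelse(d1$price_per_room < 3, "low", "high")

# Average median house value within the low group
avg_low <- mean(d1$medianHouseValue[d1$price_cat == "low"])
avg_low
```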

12. Does this value (the one you obtained in question 11) change depending on ocean
proximity? Copy in the code that you used.
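A sketch that restricts the data to the low group and then computes one mean per ocean-proximity category with tapply(). The price_cat column carries over the hypothetical name from the previous sketch; the values and category labels are illustrative. aggregate() would produce the same numbers as a data frame.

```r
# Invented stand-in for d1 (values and labels are illustrative)
d1 <- data.frame(
  medianHouseValue = c(6000, 5000, 8000, 7000, 20000),
  price_cat        = c("low", "low", "low", "low", "high"),
  oceanProximity   = c("INLAND", "INLAND", "NEAR BAY", "NEAR BAY", "INLAND")
)

low <- d1[d1$price_cat == "low", ]  # keep only the low group

# Mean median house value per ocean-proximity category
by_ocean <- tapply(low$medianHouseValue, low$oceanProximity, mean)
by_ocean
```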

13. What is the correlation between median income and median house value? Provide an
interpretation of the results. Copy in the code that you used.

14. What are the correlations among median income, median house value, total bedrooms,
and population? You can use the correlation() function from the correlation package
to help you create a correlation matrix: https://easystats.github.io/correlation/ Note:
there are now 4 variables you are looking at. Provide an interpretation of the results.
Copy in the code that you used and screenshot the correlation matrix you obtained.
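A base-R sketch of a 4 x 4 correlation matrix with cor(), rounded to 2 decimals, on invented data. Unlike cor(), which returns only the coefficients, correlation::correlation() from the package linked in the question also reports p-values and confidence intervals for each pair.

```r
# Invented stand-in for d1 (values are hypothetical)
d1 <- data.frame(
  medianIncome     = c(2.5, 3.1, 4.0, 5.2, 6.8, 8.1),
  medianHouseValue = c(120000, 150000, 190000, 240000, 310000, 400000),
  totalBedrooms    = c(300, 520, 410, 610, 380, 450),
  population       = c(900, 1600, 1200, 1800, 1000, 1300)
)

# Pairwise Pearson correlations between all four columns
cm <- round(cor(d1), 2)
cm
```

The matrix is symmetric with 1s on the diagonal, so each pair only needs to be read once.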
