Assign--1

The document provides a comprehensive guide on data manipulation using the DPLYR package in R, specifically focusing on the Carseats dataset. It covers various operations such as selecting, filtering, arranging, mutating, and summarizing data, along with examples and explanations of each function's purpose. Additionally, it introduces the creation of new variables for categorizing sales performance based on defined thresholds.

Uploaded by

Michaella Jaculina

Data Manipulation using DPLYR package

2024-08-22

Installing and loading the packages

#install.packages("ISLR", dependencies = TRUE)
#install.packages("dplyr", dependencies = TRUE)

library(ISLR)
library(dplyr)

##
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
##
##     filter, lag

## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union

Loading the Datasets

data("Carseats")

Exploring the Carseats Dataset

head(Carseats, n=2)

##   Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US
## 1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes
## 2 11.22       111     48          16        260    83      Good  65        10   Yes Yes

The Carseats dataset has 11 variables: Sales, CompPrice, Income, Advertising,
Population, Price, ShelveLoc, Age, Education, Urban, and US.

Select
1. Select the columns Sales, CompPrice, and Income from the Carseats dataset.

Selected_Car <- Carseats %>%
  select(Sales, CompPrice, Income) %>%
  slice_head(n = 10)

Selected_Car

##    Sales CompPrice Income
##  1  9.50       138     73
##  2 11.22       111     48
##  3 10.06       113     35
##  4  7.40       117    100
##  5  4.15       141     64
##  6 10.81       124    113
##  7  6.63       115    105
##  8 11.85       136     81
##  9  6.54       132    110
## 10  4.69       132    113

In this code, I used the select() function to keep only the columns Sales, CompPrice,
and Income, and slice_head(n = 10) to show the first 10 rows. Sales records unit sales
(in thousands) at each store and is the main measure of performance. CompPrice is the
price charged by a competitor at each location, which is useful for comparing pricing.
Income is the community income level (in thousands of dollars), which describes the
purchasing power of the local market.
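
As a small sketch of how select() can be written in more than one way, the example below uses a two-row sample copied from the head(Carseats, n = 2) output in this report instead of the full dataset. The object names sample_rows, by_name, by_exclusion, and by_range are made up for illustration.

```r
library(dplyr)

# Two-row sample copied from the head(Carseats, n = 2) output (first four columns only)
sample_rows <- data.frame(
  Sales       = c(9.50, 11.22),
  CompPrice   = c(138, 111),
  Income      = c(73, 48),
  Advertising = c(11, 16)
)

# Keep columns by listing their names
by_name <- sample_rows %>% select(Sales, CompPrice, Income)

# Drop a column instead of listing the ones to keep
by_exclusion <- sample_rows %>% select(-Advertising)

# Keep a contiguous range of columns
by_range <- sample_rows %>% select(Sales:Income)

# All three approaches keep the same three columns
names(by_name)
```

Listing names is the most explicit form; the exclusion and range forms are shorter when most columns are kept.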

Filter
2. Filter the Carseats dataset to include only observations where Sales is greater than
8000 units. Sales is recorded in thousands of units, so the threshold in the code is 8.

Filtered_Car <- Carseats %>%
  filter(Sales > 8) %>%
  slice_head(n = 10)

Filtered_Car

##    Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US
##  1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes
##  2 11.22       111     48          16        260    83      Good  65        10   Yes Yes
##  3 10.06       113     35          10        269    80    Medium  59        12   Yes Yes
##  6 10.81       124    113          13        501    72       Bad  78        16    No Yes
##  8 11.85       136     81          15        425   120      Good  67        10   Yes Yes
## 11  9.01       121     78           9        150   100       Bad  26        10    No Yes
## 12 11.96       117     94           4        503    94      Good  50        13   Yes Yes
## 14 10.96       115     28          11         29    86      Good  53        18   Yes Yes
## 15 11.17       107    117          11        148   118      Good  52        18   Yes Yes
## 16  8.71       149     95           5        400   144    Medium  76        18    No  No


After filtering, the first 10 rows of the result are kept using the slice_head function.
The filtered data shows several characteristics of stores with high car seat sales, including the
competitor's price (CompPrice), the community income level (Income), and the local advertising budget
(Advertising). In these 10 rows, Sales ranges from 8.71 to 11.96 (thousands of units), competitor prices
range from 107 to 149, advertising budgets range from 4 to 16, and the average age of the local
population (Age) ranges from 26 to 78 years. The ShelveLoc variable rates the quality of the shelving
location as "Bad," "Good," or "Medium," with "Good" being the most common (5 of 10 rows). Seven of the
ten stores are in urban areas (Urban) and nine are in the US (US); only one store (row 16) is neither.
This filtered dataset provides a snapshot of stores with relatively high sales, showing the diversity
in their market and demographic characteristics.
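
To show how filter() behaves with more than one condition, here is a minimal sketch on the ten Sales, CompPrice, and Income rows printed in the Select step. The extra Income conditions are assumptions added for illustration and are not part of the assignment.

```r
library(dplyr)

# First ten rows of Sales, CompPrice, and Income, copied from the Select output
first_ten <- data.frame(
  Sales     = c(9.50, 11.22, 10.06, 7.40, 4.15, 10.81, 6.63, 11.85, 6.54, 4.69),
  CompPrice = c(138, 111, 113, 117, 141, 124, 115, 136, 132, 132),
  Income    = c(73, 48, 35, 100, 64, 113, 105, 81, 110, 113)
)

# One condition: Sales above 8 (thousand units)
high_sales <- first_ten %>% filter(Sales > 8)

# Conditions separated by a comma must all be true (logical AND)
high_sales_high_income <- first_ten %>% filter(Sales > 8, Income > 70)

# The | operator keeps rows where at least one condition is true (logical OR)
high_sales_or_income <- first_ten %>% filter(Sales > 8 | Income > 100)

nrow(high_sales)              # 5 rows
nrow(high_sales_high_income)  # 3 rows
nrow(high_sales_or_income)    # 8 rows
```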

Arrange
3. Order the Carseats dataset by Sales in descending order.

Arranged_Car <- Carseats %>%
  arrange(desc(Sales)) %>%
  slice_head(n = 10)

Arranged_Car

##     Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US
## 377 16.27       141     60          19        319    92      Good  44        11   Yes Yes
## 317 15.63       122     36           5        369    72      Good  35        10   Yes Yes
##  26 14.90       139     32           0        176    82      Good  54        11    No  No
## 368 14.37        95    106           0        256    53      Good  52        17   Yes  No
##  19 13.91       110    110           0        408    68      Good  46        17    No Yes
##  31 13.55       125     94           0        447    89      Good  30        12   Yes  No
## 353 13.44       133    103          14        288   122      Good  61        17   Yes Yes
##  69 13.39       149     69          20        366   134      Good  60        13   Yes Yes
## 358 13.36       103     73           3        276    72    Medium  34        15   Yes Yes
## 194 13.28       139     70           7         71    96      Good  61        10   Yes Yes

First, the data is sorted in descending order based on the "Sales" column using the
arrange(desc(Sales)) function. Then, the slice_head(n=10) function is applied to extract the first 10 entries
from this sorted data. The resulting dataset contains the 10 Carseats records with the highest sales, along
with associated details such as competitor prices, income levels, advertising budgets, and other
attributes. Nine of these ten stores have a "Good" shelf location rating, eight are in urban areas, and
seven are in the United States. The shelf location pattern is the most striking and suggests that display
quality might contribute to higher sales, although ten rows alone cannot establish this.

Mutate
4. Create a new variable in the Carseats dataset called Profit calculated as Sales minus Price.

Mutated_Car <- Carseats %>%
  mutate(Profit = Sales - Price) %>%
  slice_head(n = 10)

Mutated_Car

##    Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US  Profit
##  1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes -110.50
##  2 11.22       111     48          16        260    83      Good  65        10   Yes Yes  -71.78
##  3 10.06       113     35          10        269    80    Medium  59        12   Yes Yes  -69.94
##  4  7.40       117    100           4        466    97    Medium  55        14   Yes Yes  -89.60
##  5  4.15       141     64           3        340   128       Bad  38        13   Yes  No -123.85
##  6 10.81       124    113          13        501    72       Bad  78        16    No Yes  -61.19
##  7  6.63       115    105           0         45   108    Medium  71        15   Yes  No -101.37
##  8 11.85       136     81          15        425   120      Good  67        10   Yes Yes -108.15
##  9  6.54       132    110           0        108   124    Medium  76        10    No  No -117.46
## 10  4.69       132    113           0        131   124    Medium  76        17    No Yes -119.31

The code provided adds a new variable called Profit to the Carseats dataset,
calculated as the difference between Sales and Price, and then extracts the first 10 rows of the
modified dataset using the slice_head function. For each of these 10 rows the Profit value is negative.
This does not mean that the car seats are being sold at a loss: Sales is unit sales in thousands and Price
is the price charged per car seat, so the two columns are measured in different units and their difference
is not a profit in the accounting sense. The variable is computed this way only because the exercise
defines it so. The dataset also includes other variables, such as CompPrice, Income, Advertising,
Population, and ShelveLoc, and demographic information like Age, Education, Urban, and US, which could
be analyzed further to understand what drives sales.
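
As a hedged illustration of derived variables whose units are consistent, the sketch below uses mutate() on the first two rows printed above. Revenue and PriceGap are hypothetical names that are not part of the assignment, and the calculation assumes that Sales is in thousands of units and that Price and CompPrice are prices per car seat.

```r
library(dplyr)

# First two rows of Carseats as printed above; only the columns needed here
sample_rows <- data.frame(
  Sales     = c(9.50, 11.22),  # unit sales, in thousands
  CompPrice = c(138, 111),     # competitor's price per car seat
  Price     = c(120, 83)       # store's price per car seat
)

derived <- sample_rows %>%
  mutate(
    Revenue  = Sales * 1000 * Price,  # hypothetical: units sold times price per unit
    PriceGap = Price - CompPrice      # hypothetical: negative means cheaper than the competitor
  )

derived
```

Both new columns combine quantities measured in compatible units, so their values can be interpreted directly, unlike Sales minus Price.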

Group_by and Summarize

5. Calculate the average Sales for each ShelveLoc in the Carseats dataset.

Summary_Car <- Carseats %>%
  count(ShelveLoc)

Summary_Car

## ShelveLoc n
## 1 Bad 96
## 2 Good 85
## 3 Medium 219

The task asks for the average Sales in each ShelveLoc category, but the code above uses the count
function from the dplyr package, which returns the number of observations in each category rather than a
mean. ShelveLoc represents the quality of the shelf location for car seats. The output shows that, of the
400 observations, 96 have a "Bad" shelf location, 85 have a "Good" shelf location, and 219 have a "Medium"
shelf location, so "Medium" is the most common shelf location in the dataset. Computing the average itself
requires group_by(ShelveLoc) followed by summarise() with mean(Sales).
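
A minimal sketch of the group_by() and summarise() pattern for averaging Sales by ShelveLoc is shown below. It runs on the first ten rows printed in this report, not on the full dataset, so the averages describe only that sample; first_ten and avg_sales are illustrative names.

```r
library(dplyr)

# First ten rows of Carseats (Sales and ShelveLoc only), copied from the outputs in this report
first_ten <- data.frame(
  Sales = c(9.50, 11.22, 10.06, 7.40, 4.15, 10.81, 6.63, 11.85, 6.54, 4.69),
  ShelveLoc = c("Bad", "Good", "Medium", "Medium", "Bad",
                "Bad", "Medium", "Good", "Medium", "Medium")
)

avg_sales <- first_ten %>%
  group_by(ShelveLoc) %>%          # one group per shelf location
  summarise(
    n        = n(),                # number of rows in the group
    AvgSales = mean(Sales),        # average Sales in the group
    .groups  = "drop"              # return an ungrouped result
  )

avg_sales
```

On the full dataset the same pipeline would start from Carseats instead of first_ten.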

Additional Challenges
6. Create a new variable in the Carseats dataset indicating whether sales are high,
medium, or low based on certain thresholds.

Carseats_with_new_column <- Carseats %>%
  mutate(SalesCategory = case_when(
    Sales > 8 ~ "High",
    Sales > 4 ~ "Medium",
    TRUE ~ "Low")) %>%
  slice_head(n = 10)

Carseats_with_new_column

##    Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US SalesCategory
##  1  9.50       138     73          11        276   120       Bad  42        17   Yes Yes          High
##  2 11.22       111     48          16        260    83      Good  65        10   Yes Yes          High
##  3 10.06       113     35          10        269    80    Medium  59        12   Yes Yes          High
##  4  7.40       117    100           4        466    97    Medium  55        14   Yes Yes        Medium
##  5  4.15       141     64           3        340   128       Bad  38        13   Yes  No        Medium
##  6 10.81       124    113          13        501    72       Bad  78        16    No Yes          High
##  7  6.63       115    105           0         45   108    Medium  71        15   Yes  No        Medium
##  8 11.85       136     81          15        425   120      Good  67        10   Yes Yes          High
##  9  6.54       132    110           0        108   124    Medium  76        10    No  No        Medium
## 10  4.69       132    113           0        131   124    Medium  76        17    No Yes        Medium

The code provided creates a new column called SalesCategory in the Carseats dataset by using
the mutate function together with case_when from the dplyr package. case_when checks its conditions in
order and uses the first one that is true: if Sales is greater than 8, the category is "High"; otherwise,
if Sales is greater than 4 (that is, above 4 and at most 8), it is "Medium"; and if Sales is 4 or below, it
is "Low." After adding this new column, the slice_head function is used to select the first 10 rows of the
modified dataset. The output displays these rows with the new SalesCategory column included. For example,
in the first row, the Sales value is 9.50, which results in a "High" classification. In contrast, the fifth
row, with a Sales value of 4.15, falls into the "Medium" category. None of the first 10 rows falls into the
"Low" category. This process enables quick categorization of sales performance within the dataset.
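
Because case_when() uses the first condition that is true, the order of the conditions matters. The sketch below shows this on three Sales values; 9.50 and 4.15 come from the output above, while 3.20 is a made-up value added so that the "Low" branch is exercised. The cut() version is an alternative I am suggesting, not part of the assignment.

```r
library(dplyr)

# 9.50 and 4.15 appear in the output above; 3.20 is a hypothetical value for the "Low" case
sales <- c(9.50, 4.15, 3.20)

# Thresholds checked from highest to lowest, as in the assignment code
correct_order <- case_when(
  sales > 8 ~ "High",
  sales > 4 ~ "Medium",
  TRUE      ~ "Low"
)

# Same conditions in the wrong order: "sales > 4" matches first, so "High" is never assigned
wrong_order <- case_when(
  sales > 4 ~ "Medium",
  sales > 8 ~ "High",
  TRUE      ~ "Low"
)

# Base R alternative: intervals (-Inf, 4], (4, 8], (8, Inf) give the same labels
with_cut <- as.character(
  cut(sales, breaks = c(-Inf, 4, 8, Inf), labels = c("Low", "Medium", "High"))
)

correct_order  # "High" "Medium" "Low"
wrong_order    # "Medium" "Medium" "Low"
with_cut       # "High" "Medium" "Low"
```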
