Assign--1
2024-08-22
library(ISLR)
library(dplyr)
The Carseats dataset from the ISLR package is a simulated dataset of child car seat sales at 400 stores. It has 11 variables: Sales, CompPrice, Income, Advertising, Population, Price, ShelveLoc, Age, Education, Urban, and US.
Select
1. Select the columns Sales, CompPrice, and Income from the Carseats dataset.
Selected_Car <- Carseats %>%
  select(Sales, CompPrice, Income) %>%
  slice_head(n = 10)
Selected_Car
In this code, I used the select() function to keep the columns Sales, CompPrice, and Income, and slice_head() to show the first 10 rows. Sales records unit sales (in thousands) at each store, so it is the main measure of performance and revenue. CompPrice is the price charged by the competitor at each location, which makes it useful for comparing the company's pricing against the competition. Income is the community income level (in thousands of dollars), which describes the purchasing power of each store's market.
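For comparison, the same column selection can be written in base R without dplyr. This is a sketch under one assumption: instead of loading ISLR, it builds a small hypothetical stand-in data frame, `cars_head`, from the first five rows of Carseats that are printed later in this assignment, so it runs on its own.

```r
# Hypothetical stand-in for Carseats: four columns from the first five rows
# printed later in this assignment (ISLR is not loaded here).
cars_head <- data.frame(
  Sales       = c(9.50, 11.22, 10.06, 7.40, 4.15),
  CompPrice   = c(138, 111, 113, 117, 141),
  Income      = c(73, 48, 35, 100, 64),
  Advertising = c(11, 16, 10, 4, 3)
)

# Base-R equivalent of select(Sales, CompPrice, Income) %>% slice_head(n = 10):
# keep the named columns by name, then take at most the first 10 rows.
selected <- head(cars_head[, c("Sales", "CompPrice", "Income")], n = 10)
print(selected)
```

Selecting by name rather than by position keeps the code correct even if the column order of the data changes.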
Filter
2. Filter the Carseats dataset to include only observations where Sales is greater than
8000.
Filtered_Car <- Carseats %>%
  filter(Sales > 8) %>%  # Sales is in thousands of units, so 8000 units = 8
  slice_head(n = 10)
Filtered_Car
After filtering, the slice_head() function keeps the first 10 rows of the result. The filtered data shows the other attributes of these high-selling stores, including the competitor's price (CompPrice), the community income level (Income), and the local advertising budget (Advertising). For example, the Sales values range from 8.71 to 11.96, competitor prices range from 107 to 149, advertising budgets range from 4 to 16, and the average age of the local population (Age) ranges from 26 to 78 years. The ShelveLoc variable rates the quality of the shelving location as "Bad," "Good," or "Medium," with "Good" being the most common rating among these rows. Most of these stores are in urban areas (Urban) and in the US (US), except for one store that is neither urban nor US-based. This filtered dataset gives a snapshot of stores with relatively high sales and shows how varied their market and demographic characteristics are.
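Because Sales is recorded in thousands of units, the 8000-unit threshold in the question corresponds to Sales > 8. The base-R sketch below applies that threshold with logical row indexing. It assumes a hypothetical stand-in data frame, `cars_head`, built from the first ten rows of Carseats printed later in this assignment, and the helper name `threshold_units` is also illustrative.

```r
# Hypothetical stand-in: Sales and ShelveLoc from the first ten rows of
# Carseats printed later in this assignment.
cars_head <- data.frame(
  Sales     = c(9.50, 11.22, 10.06, 7.40, 4.15, 10.81, 6.63, 11.85, 6.54, 4.69),
  ShelveLoc = c("Bad", "Good", "Medium", "Medium", "Bad",
                "Bad", "Medium", "Good", "Medium", "Medium")
)

# Sales is in thousands of units, so convert the threshold before comparing.
threshold_units <- 8000
high_sales <- cars_head[cars_head$Sales > threshold_units / 1000, ]
print(high_sales)
```

Writing the conversion explicitly (rather than a literal such as `8.000`, which R reads simply as 8) makes the unit assumption visible to the reader.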
Arrange
3. Order the Carseats dataset by Sales in descending order.
Arranged_Car <- Carseats %>%
  arrange(desc(Sales)) %>%
  slice_head(n = 10)
Arranged_Car
First, arrange(desc(Sales)) sorts the data by Sales in descending order. Then slice_head(n = 10) keeps the first 10 rows of the sorted data, which are the 10 stores with the highest sales, along with their competitor prices, income levels, advertising budgets, and other attributes. Most of these top-selling stores have a "Good" shelf location and are in urban areas in the United States, which suggests that these factors may contribute to higher sales.
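The sort-then-take pattern used with arrange(desc(Sales)) can be sketched in base R with order(). This assumes a hypothetical stand-in data frame, `cars_head`, built from the first ten printed rows of Carseats, and it keeps only the top 3 rows (`top3`, an illustrative name) because the stand-in is small.

```r
# Hypothetical stand-in: Sales and ShelveLoc from the first ten rows of
# Carseats printed later in this assignment.
cars_head <- data.frame(
  Sales     = c(9.50, 11.22, 10.06, 7.40, 4.15, 10.81, 6.63, 11.85, 6.54, 4.69),
  ShelveLoc = c("Bad", "Good", "Medium", "Medium", "Bad",
                "Bad", "Medium", "Good", "Medium", "Medium")
)

# order(decreasing = TRUE) returns row indices from largest to smallest Sales;
# head() then keeps the first rows, like slice_head() after arrange(desc(...)).
top3 <- head(cars_head[order(cars_head$Sales, decreasing = TRUE), ], n = 3)
print(top3)
```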
Mutate
4. Create a new variable in the Carseats dataset called Profit calculated as Sales minus
Price.
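The Profit variable requested here is an element-wise subtraction of two columns. Note that Sales is measured in thousands of units while Price is a price in dollars, so the difference follows the assignment's definition rather than an accounting profit. The base-R sketch below assumes a hypothetical stand-in data frame, `cars_head`, built from the first five printed rows of Carseats.

```r
# Hypothetical stand-in: Sales and Price from the first five rows of Carseats
# printed later in this assignment.
cars_head <- data.frame(
  Sales = c(9.50, 11.22, 10.06, 7.40, 4.15),
  Price = c(120, 83, 80, 97, 128)
)

# Base-R equivalent of mutate(Profit = Sales - Price): subtract row by row.
# Units differ (thousands of units vs. dollars), so the result is not a true profit.
cars_head$Profit <- cars_head$Sales - cars_head$Price
print(cars_head)
```

On these rows every Profit value is negative, which is a direct consequence of subtracting a dollar price from a sales count in thousands.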
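Mutated_Car <- Carseats %>%
  # Sales is in thousands of units and Price is in dollars, so this difference
  # follows the assignment's definition rather than an accounting profit
  mutate(Profit = Sales - Price) %>%
  slice_head(n = 10)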
Mutated_Car
## ShelveLoc n
## 1 Bad 96
## 2 Good 85
## 3 Medium 219
The code provided summarizes the Carseats dataset by the ShelveLoc variable, which rates the quality of the shelving location for the car seats at each store. The count() function from the dplyr package counts the number of observations in each ShelveLoc category. Of the 400 stores, 96 have a "Bad" shelf location, 85 have a "Good" shelf location, and 219 have a "Medium" shelf location, so "Medium" is by far the most common rating in the dataset.
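The counting step can be sketched in base R with table(). This sketch assumes a hypothetical vector, `shelve_loc`, holding the ShelveLoc values of the first ten printed rows of Carseats; the dplyr call on the full data is shown only as a comment because ISLR is not loaded here.

```r
# Hypothetical stand-in: ShelveLoc values from the first ten printed rows.
shelve_loc <- c("Bad", "Good", "Medium", "Medium", "Bad",
                "Bad", "Medium", "Good", "Medium", "Medium")

# Base-R equivalent of count(ShelveLoc): tabulate how often each level appears.
# With ISLR and dplyr loaded, the full-data version would be:
#   Carseats %>% count(ShelveLoc)
counts <- table(shelve_loc)
print(counts)

# The full-data counts reported above cover all 400 stores.
full_counts <- c(Bad = 96, Good = 85, Medium = 219)
print(sum(full_counts))
```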
Additional Challenges
6. Create a new variable in the Carseats dataset indicating whether sales are high,
medium, or low based on certain thresholds.
Carseats_with_new_column <- Carseats %>%
  mutate(SalesCategory = case_when(
    Sales > 8 ~ "High",
    Sales > 4 ~ "Medium",
    TRUE ~ "Low"
  )) %>%
  slice_head(n = 10)
Carseats_with_new_column
##    Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban  US SalesCategory
## 1   9.50       138     73          11        276   120       Bad  42        17   Yes Yes          High
## 2  11.22       111     48          16        260    83      Good  65        10   Yes Yes          High
## 3  10.06       113     35          10        269    80    Medium  59        12   Yes Yes          High
## 4   7.40       117    100           4        466    97    Medium  55        14   Yes Yes        Medium
## 5   4.15       141     64           3        340   128       Bad  38        13   Yes  No        Medium
## 6  10.81       124    113          13        501    72       Bad  78        16    No Yes          High
## 7   6.63       115    105           0         45   108    Medium  71        15   Yes  No        Medium
## 8  11.85       136     81          15        425   120      Good  67        10   Yes Yes          High
## 9   6.54       132    110           0        108   124    Medium  76        10    No  No        Medium
## 10  4.69       132    113           0        131   124    Medium  76        17    No Yes        Medium
The code provided creates a new column called SalesCategory in the Carseats dataset using the mutate() function from the dplyr package, with case_when() assigning a label from the Sales value: if Sales is greater than 8, the category is "High"; if Sales is greater than 4 but at most 8, it is "Medium"; and if Sales is 4 or below, it is "Low." Because case_when() checks its conditions in order, the "Medium" rule only needs Sales > 4, since any value above 8 has already been labeled "High." After adding this column, slice_head() keeps the first 10 rows of the modified dataset. For example, the first row has a Sales value of 9.50, so it is classified as "High," while the fifth row, with a Sales value of 4.15, falls into the "Medium" category. None of these first 10 rows is "Low." This categorization gives a quick, coarse view of sales performance that is easier to summarize and compare than the raw values.
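The same three-way split can be sketched in base R with cut(), which maps numeric values onto intervals. This sketch assumes a hypothetical vector, `sales`, holding the Sales values of the ten rows printed above; `sales_category` and `boundary` are illustrative names. With right = TRUE, cut() builds the intervals (-Inf, 4], (4, 8], and (8, Inf], which match the case_when() rules.

```r
# Hypothetical stand-in: Sales values of the ten rows printed above.
sales <- c(9.50, 11.22, 10.06, 7.40, 4.15, 10.81, 6.63, 11.85, 6.54, 4.69)

# Intervals (-Inf, 4], (4, 8], (8, Inf] -> "Low", "Medium", "High",
# the same rules as the case_when() version.
sales_category <- cut(sales,
                      breaks = c(-Inf, 4, 8, Inf),
                      labels = c("Low", "Medium", "High"),
                      right = TRUE)
print(data.frame(Sales = sales, SalesCategory = sales_category))

# Values exactly on a threshold fall into the lower category.
boundary <- cut(c(4, 8), breaks = c(-Inf, 4, 8, Inf),
                labels = c("Low", "Medium", "High"), right = TRUE)
print(boundary)
```

One design difference: cut() returns a factor whose levels are in the order Low, Medium, High, so tables and plots list the categories in that natural order, whereas case_when() returns a character vector that sorts alphabetically.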