R Program3

The document covers operations on R datasets such as iris and airquality, including printing the structure and summary, checking for missing values, sorting, and merging datasets.

Uploaded by

jefoli1651

4) Print the dataset iris

· Print the structure of the dataset iris

· Print the summary of all the variables of the dataset iris (Hint: Use function summary())

· How many variables (columns) are in the dataset iris?

· How many observations (rows) are in the dataset iris?

· Use the duplicated() function to print the logical vector indicating the duplicate values present in the dataset iris

· Extract duplicate elements from the dataset iris

· Extract unique elements from the dataset iris

· Print the indices of duplicate elements in the dataset iris

· Print the indices of unique elements in the dataset iris

· How many unique elements are in the dataset iris?

· How many duplicate elements are in the dataset iris?

Code:
Output:
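One possible solution to exercise 4 is sketched below in base R. It assumes that "duplicate elements" means duplicated rows, which is how duplicated() treats a data frame; the variable name dup is only illustrative.

```r
# iris ships with base R (package 'datasets'), so nothing has to be loaded.
print(iris)              # the whole dataset
str(iris)                # structure
summary(iris)            # summary of every variable

ncol(iris)               # number of variables (columns)
nrow(iris)               # number of observations (rows)

# TRUE for every row that repeats an earlier row.
dup <- duplicated(iris)
print(dup)

iris[dup, ]              # duplicate rows
iris[!dup, ]             # unique rows (same rows as unique(iris))

which(dup)               # indices of duplicate rows
which(!dup)              # indices of unique rows

sum(!dup)                # number of unique rows
sum(dup)                 # number of duplicate rows
```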
5) Print the dataset airquality
· Print the structure of the dataset airquality

· Print the summary of all the variables of the dataset airquality (Hint: Use function summary())

· How many variables (columns) are in the dataset airquality?

· How many observations (rows) are in the dataset airquality?

· Use the function is.na() to find whether any missing values are in the dataset airquality

· Print the indices of the missing values in the dataset airquality in column major representation

· Print the indices of the missing values in the dataset airquality in row major representation

· Print indices of the missing values in row and column number-wise (Hint: Use function which() and argument arr.ind = TRUE)

· How many missing values are in the dataset airquality?

· Which variables are the missing values concentrated in?

· How would you omit all rows containing missing values?

· Print the records without missing values in the dataset airquality using the function complete.cases()

· Print the records without missing values in the dataset airquality using the function na.omit()

· Print the records without missing values in the dataset airquality using the function na.exclude()

· Print the records containing missing values in the dataset airquality using the function complete.cases()
Code:

Output:
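A hedged sketch for exercise 5 follows. The names na_mask, clean_cc, clean_omit, clean_excl and with_na are illustrative, and reading "row major representation" as the linear indices of the transposed mask is an assumption about what the exercise intends.

```r
# airquality ships with base R (package 'datasets').
print(airquality)
str(airquality)
summary(airquality)
ncol(airquality)                      # number of variables
nrow(airquality)                      # number of observations

# Logical matrix with the same shape as the data: TRUE where a value is NA.
na_mask <- is.na(airquality)
any(na_mask)                          # is anything missing at all?

which(na_mask)                        # linear indices, column-major (R's native order)
which(t(na_mask))                     # linear indices, row-major (via the transpose)
which(na_mask, arr.ind = TRUE)        # (row, col) position of each missing value

sum(na_mask)                          # total number of missing values
colSums(na_mask)                      # missing values per variable

# Rows without missing values, three ways.
clean_cc   <- airquality[complete.cases(airquality), ]
clean_omit <- na.omit(airquality)
clean_excl <- na.exclude(airquality)
print(clean_cc); print(clean_omit); print(clean_excl)

# Rows that contain at least one missing value.
with_na <- airquality[!complete.cases(airquality), ]
print(with_na)
```

na.omit() and na.exclude() return the same rows here; they differ only in how later modelling functions pad residuals and predictions.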
6) Consider a numeric vector x <- c(3,4,5,6,7,8)
· Write a command to recode the values less than 6 with zero in the vector x

· Write a command to recode the values between 4 and 8 with 100

· Write a command to recode the values that are less than 5 or greater than 6 with 50

· Write a command to recode the values less than 6 with NA in the vector x

· Write a command to recode the values between 4 and 8 with NA

· Write a command to recode the values that are less than 5 or greater than 6 with NA

· Count number of NA values after each operation

· Find mean of x (Hint: exclude NA values)

· Find median of x (Hint: exclude NA values)

· Write a command to recode the values less than 6 with "NA" (enclose with double quotes) in the vector x

· Write a command to recode the values between 4 and 8 with "NA"

· Write a command to recode the values that are less than 5 or greater than 6 with "NA"

· Count number of NA values after each operation

· Find mean of x (Hint: exclude NA values)

· Find median of x (Hint: exclude NA values)

· What is the difference between NA and "NA"?

Code:
Output:
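The recodes in exercise 6 could be sketched as follows. Two assumptions are made: every recode starts from a fresh copy of x, and "between 4 and 8" is read as strictly between (x > 4 & x < 8). The result names x1, y1, z1 and so on are illustrative.

```r
x <- c(3, 4, 5, 6, 7, 8)

# Numeric recodes; replace() returns a modified copy and leaves x untouched.
x1 <- replace(x, x < 6, 0)             # 0 0 0 6 7 8
x2 <- replace(x, x > 4 & x < 8, 100)   # 3 4 100 100 100 8
x3 <- replace(x, x < 5 | x > 6, 50)    # 50 50 5 6 50 50

# The same conditions recoded to the missing-value marker NA.
y1 <- replace(x, x < 6, NA)
y2 <- replace(x, x > 4 & x < 8, NA)
y3 <- replace(x, x < 5 | x > 6, NA)
sum(is.na(y1)); sum(is.na(y2)); sum(is.na(y3))    # 3, 3, 4
mean(y1, na.rm = TRUE); median(y1, na.rm = TRUE)  # 7, 7

# Recoding to the string "NA" turns the whole vector into character.
z1 <- replace(x, x < 6, "NA")
z2 <- replace(x, x > 4 & x < 8, "NA")
z3 <- replace(x, x < 5 | x > 6, "NA")
class(z1)                              # "character"
sum(is.na(z1))                         # 0: "NA" is ordinary text, not a missing value

# mean() and median() need numbers, so convert back first;
# as.numeric() maps the text "NA" to a real NA.
z1_num <- as.numeric(z1)
mean(z1_num, na.rm = TRUE); median(z1_num, na.rm = TRUE)
```

The last part shows the difference the exercise asks about: NA is R's missing-value marker and is detected by is.na(), while "NA" is a two-letter character string that forces the vector to type character.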
7) Consider the given vectors:

A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)

B <- c(3, 2, NA, 5, 3, 7, NA, "NA", 5, 2, 6)

· Find the length of the vector A

· Find the length of the vector B

· Sort the values in vector A and put it in p (Hint: use function sort())

· Find the length of p

· Sort the values in vector B and put it in q

· Find the length of q

· What do you infer from the above results?

Code:
Output:
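Exercise 7 can be worked through with the short sketch below, which uses only base R and the vector definitions given in the exercise (with straight quotes around "NA").

```r
A <- c(3, 2, NA, 5, 3, 7, NA, NA, 5, 2, 6)
B <- c(3, 2, NA, 5, 3, 7, NA, "NA", 5, 2, 6)   # the string "NA" makes B a character vector

length(A)        # 11
length(B)        # 11

p <- sort(A)     # sort() drops missing values by default
length(p)        # 8: the three NA values are gone

q <- sort(B)     # sorted as text, because B is character
length(q)        # 9: only the two real NA values are dropped

print(p)
print(q)
```

The inference is that sort() silently removes true missing values, while the quoted "NA" in B is ordinary text, survives the sort, and also forces every other element of B to be stored as a string.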

8) Create the "buildings" and "surveydata" dataframes to merge:

buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))

surveydata <- data.frame(survey=c(1,1,1,2,2,2), location=c(1,2,3,2,3,1), efficiency=c(51,64,70,71,80,58))

· The dataframes buildings and surveydata have a common key variable called "location".

· Use the merge() function to merge the two dataframes by "location" into a new dataframe "buildingStats".

Code:
Output:
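A minimal sketch for exercise 8, using the two dataframes exactly as defined in the exercise and base R's merge():

```r
buildings <- data.frame(location = c(1, 2, 3),
                        name = c("building1", "building2", "building3"))
surveydata <- data.frame(survey = c(1, 1, 1, 2, 2, 2),
                         location = c(1, 2, 3, 2, 3, 1),
                         efficiency = c(51, 64, 70, 71, 80, 58))

# Join on the shared key column; each survey row picks up its building name.
buildingStats <- merge(buildings, surveydata, by = "location")
print(buildingStats)
```

Because both dataframes share exactly one column name, merge(buildings, surveydata) without by would give the same result; naming the key explicitly just makes the intent clear.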

9) Give the dataframes different key variable names:

buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))

surveydata <- data.frame(survey=c(1,1,1,2,2,2), LocationID=c(1,2,3,2,3,1), efficiency=c(51,64,70,71,80,58))

· The dataframes buildings and surveydata now have corresponding key variables called location and LocationID.

· Use the merge() function to merge the columns of the two dataframes by the corresponding variables.

· Perform an inner join, outer join, left outer join, right outer join and cross join, and write the outputs in all cases.

Code:

Output:
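The joins asked for in exercise 9 might be written as below. The result names inner, outer, left, right and cross are illustrative; the join types map onto merge()'s all, all.x and all.y arguments, and by = NULL produces the Cartesian product.

```r
buildings <- data.frame(location = c(1, 2, 3),
                        name = c("building1", "building2", "building3"))
surveydata <- data.frame(survey = c(1, 1, 1, 2, 2, 2),
                         LocationID = c(1, 2, 3, 2, 3, 1),
                         efficiency = c(51, 64, 70, 71, 80, 58))

# Key columns have different names, so name them separately.
inner <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID")
outer <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all = TRUE)
left  <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all.x = TRUE)
right <- merge(buildings, surveydata, by.x = "location", by.y = "LocationID", all.y = TRUE)

# Cross join: every row of buildings paired with every row of surveydata.
cross <- merge(buildings, surveydata, by = NULL)

print(inner); print(outer); print(left); print(right); print(cross)
```

With this data every location appears in both dataframes, so the inner, outer, left and right joins all return the same six rows; they would differ only if one side had a key the other lacked.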
10) Merge the rows of the following two dataframes:

buildings <- data.frame(location=c(1, 2, 3), name=c("building1", "building2", "building3"))

buildings2 <- data.frame(location=c(5, 4, 6), name=c("building5", "building4", "building6"))


Also, store the result in a new dataframe, "allBuildings".

Code:

Output:
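For exercise 10, stacking rows is usually done with rbind(), which requires both dataframes to have the same column names. A short sketch, with the combined dataframe here named allBuildings:

```r
buildings  <- data.frame(location = c(1, 2, 3),
                         name = c("building1", "building2", "building3"))
buildings2 <- data.frame(location = c(5, 4, 6),
                         name = c("building5", "building4", "building6"))

# Append the rows of buildings2 below those of buildings.
allBuildings <- rbind(buildings, buildings2)
print(allBuildings)
```

The rows keep their original order (1, 2, 3, 5, 4, 6); sorting by location would be a separate step, for example with order().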

Code:
Output:
12) Read in the cars.txt dataset and call it car1. Make sure you use the "header=F" option to
specify that there are no column names associated with the dataset. Next, assign "speed" and
"dist" to be the first and second column names to the car1 dataset. Find the dimension and
structure of the dataset car1.

Code:
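# Use forward slashes in the path: in an R string, "\U" and "\s" are invalid escape sequences.
# skip = 3 ignores the first three lines of the file.
car1 <- read.table("C:/Users/sanna/Documents/SEM-4/FDA/R_programs/cars.txt", header = FALSE, skip = 3)

# Name the first and second columns.
colnames(car1) <- c("speed", "dist")

print("Dimensions of car1:")

print(dim(car1))

print("Structure of car1:")

# str() prints its own output; wrapping it in print() would also print NULL.
str(car1)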

13) Create a dataframe (dtf) which contains data on store location, account rep, number of employees and monthly sales and obtain the following output.

a) Write R code to sort the data frame in descending order by monthsales.

b) Write R code to first sort the above data frame by salesrep as the primary sort in ascending
order and then by monthsales in descending order.

Code:
Output:
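The expected output for exercise 13 is not reproduced in the text, so the sketch below uses made-up data. Only the column names salesrep and monthsales come from the exercise; the columns store and numemployees, all values, and the result names by_sales and by_rep_sales are hypothetical.

```r
# Hypothetical sample data; replace with the values from the worksheet.
dtf <- data.frame(
  store        = c("north", "south", "east", "west", "central"),
  salesrep     = c("Ann", "Bob", "Ann", "Cy", "Bob"),
  numemployees = c(12, 8, 15, 6, 10),
  monthsales   = c(5200, 3100, 7400, 2800, 4600)
)

# a) Descending by monthsales: negate the numeric key inside order().
by_sales <- dtf[order(-dtf$monthsales), ]
print(by_sales)

# b) salesrep ascending as primary key, then monthsales descending.
by_rep_sales <- dtf[order(dtf$salesrep, -dtf$monthsales), ]
print(by_rep_sales)
```

Negating the key works only for numeric columns; order(dtf$monthsales, decreasing = TRUE) is an equivalent way to write part a).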

14) Create a matrix of 4 X 5 containing duplicate elements and print unique elements from it.

Code:

Output:
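A possible sketch for exercise 14 follows. The matrix values are made up, and since "unique elements" could mean distinct values or distinct rows, both readings are shown; the names m, unique_values and unique_rows are illustrative.

```r
# A 4 x 5 matrix with repeated values; row 3 repeats row 1.
m <- matrix(c(1, 2, 3, 4, 5,
              2, 3, 4, 5, 6,
              1, 2, 3, 4, 5,
              7, 7, 8, 8, 9),
            nrow = 4, ncol = 5, byrow = TRUE)
print(m)

# Distinct values: flatten the matrix to a vector first.
unique_values <- unique(as.vector(m))
print(unique_values)

# Distinct rows: unique() on a matrix compares whole rows by default.
unique_rows <- unique(m)
print(unique_rows)
```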
