Week2 1 Rev

This document summarizes a lecture on data structures, variables, and data types in R programming. It discusses matrices, lists, indexing, built-in data like mtcars, combining and sorting data frames, and using the data.table package for improved efficiency with large datasets. Functions covered include matrix(), colnames(), rownames(), [[ ]], $, sample(), cbind(), rbind(), and fread(). The key advantage of data.table over data frames is its speed when handling large datasets.

Uploaded by

Aaron Chan
ISOM 3390 Business Programming in R

Week 2 Data structures, variables, and data types


14 September 2022

Instructor: Hyungsoo Lim

Plan for today

• Basic arithmetic
• Matrix
• List
• Indexing
• Useful functions for organizing data structures
• data.table


Matrix

▪ Create a matrix with matrix()
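
A minimal sketch of matrix(), assuming made-up values 1 to 6. The object names m_col and m_row are illustrative and do not come from the lecture.

```r
# Fill a 2 x 3 matrix column by column (the default behaviour)
m_col <- matrix(1:6, nrow = 2, ncol = 3)
print(m_col)

# Fill the same values row by row instead
m_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(m_row)
```

The byrow argument only changes the order in which the values are placed; the dimensions stay the same.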


Matrix

▪ Dimension of a matrix
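
One way to inspect the dimensions of a matrix, sketched with a hypothetical 2 x 3 matrix m:

```r
# Hypothetical example matrix with 2 rows and 3 columns
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)    # number of rows, then number of columns
nrow(m)   # number of rows only
ncol(m)   # number of columns only
```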


Matrix

▪ Names of a matrix
• Use colnames() and rownames()
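
A short sketch of naming rows and columns. The matrix sales and its labels are invented for illustration.

```r
# Hypothetical sales figures: 2 stores (rows) by 2 quarters (columns)
sales <- matrix(c(10, 20, 30, 40), nrow = 2)

colnames(sales) <- c("Q1", "Q2")
rownames(sales) <- c("store_a", "store_b")
print(sales)

# Once names are set, they can be used for indexing
sales["store_b", "Q1"]
```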


Lists

▪ Each element of a list can be of any type

                               Current booking     Current booking
                               at hotel website    at OTA website
Last booking at hotel website  0.91                0.12
Last booking at OTA website    0.24                0.72
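
A possible illustration of a list holding elements of different types. The element names and the hotel name are hypothetical; the matrix reuses the four proportions from the booking table, on the assumption that the table was meant as an example of a matrix stored inside a list.

```r
# A list can mix a string, a numeric vector, a logical, and a matrix
booking <- list(
  hotel = "Example Hotel",              # character (made-up name)
  nights = c(2, 3, 1),                  # numeric vector
  is_member = TRUE,                     # logical
  rates = matrix(c(0.91, 0.24, 0.12, 0.72), nrow = 2)  # 2 x 2 matrix
)

# str() shows the type of every element
str(booking)
```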


Subset of a list

▪ Use [[ ]] to extract a single element of a list
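
The difference between [[ ]] and [ ] can be sketched as follows, using a hypothetical list named booking:

```r
# Hypothetical list with two named elements
booking <- list(hotel = "Example Hotel", nights = c(2, 3, 1))

booking[["nights"]]   # the numeric vector itself
booking[[2]]          # the same element, selected by position
booking["nights"]     # single brackets return a list of length 1
```

Double brackets return the element; single brackets return a smaller list that still wraps it.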


Global environment

• How do we clean the workspace?

• You can remove any object with rm()

• Ex)
✓ a = c(1,2,3)
✓ b = c("aa","bb")
✓ rm(a, b)
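
A small sketch of cleaning the workspace. The object names tmp_numbers and tmp_labels are made up so that they do not clash with anything already defined.

```r
tmp_numbers <- c(1, 2, 3)
tmp_labels <- c("aa", "bb")

ls()                          # lists the objects in the workspace

rm(tmp_numbers, tmp_labels)   # remove both objects
exists("tmp_numbers")         # checks whether the object is still defined

# rm(list = ls()) would remove every object in the workspace at once
```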


Built-in data

• List of built-in data

✓ Check the available data with data()
• Ex) mtcars
1. See str(mtcars)
2. Check missing values
• More details will be discussed in the lab session
3. Check head(mtcars) & tail(mtcars)
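
The three checks could look like the sketch below. Using anyNA() and colSums(is.na()) for the missing-value check is one common approach and an assumption here, since the slides do not show which function was used.

```r
# List the data sets shipped with the "datasets" package
available <- data(package = "datasets")
head(available$results[, "Item"])

# 1. Structure: column names, types, and the first few values
str(mtcars)

# 2. Missing values: overall check, then a count per column
anyNA(mtcars)
colSums(is.na(mtcars))

# 3. First and last rows
head(mtcars)          # first 6 rows by default
tail(mtcars, n = 3)   # last 3 rows
```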


Indexing

▪ How about checking randomly selected rows?

• Recall indexing with a vector
✓ a = c(1:5)
✓ a[c(1,3)]
• Similarly, we can index a data frame using [ ]
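
As a sketch, the same bracket notation applied to a vector and then to the rows of mtcars:

```r
a <- c(1:5)
a[c(1, 3)]           # first and third elements of the vector

# For a data frame, the index before the comma selects rows;
# leaving the part after the comma empty keeps all columns
mtcars[c(1, 3), ]
```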


Indexing

▪ How about checking randomly selected rows?

• Use the sample() function

• Is it necessary to check random rows?

✓ Not strictly, but it is recommended, especially after creating, adding, or adjusting variables
• e.g., a proportion (the denominator could be zero)
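
A sketch of drawing random rows with sample(), followed by a made-up example of the zero-denominator problem. The seed value, the sample size of 5, and the data frame visits are all assumptions for illustration.

```r
set.seed(3390)   # arbitrary seed so the draw can be reproduced

# Draw 5 distinct row numbers, then use them as a row index
random_rows <- sample(nrow(mtcars), size = 5)
mtcars[random_rows, ]

# Hypothetical data: a proportion whose denominator can be zero
visits <- data.frame(bookings = c(3, 0, 2), views = c(10, 0, 0))
visits$rate <- visits$bookings / visits$views

# Division by zero does not raise an error in R;
# it silently produces NaN (0/0) or Inf (2/0)
visits
```

Because R does not stop on a zero denominator, looking at a few rows is an easy way to catch such values before they affect later calculations.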


Indexing

▪ How about columns?

• To see the first and third columns
✓ mtcars[, c(1,3)] or mtcars[, c("mpg", "disp")]
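
The two forms can be compared directly. The names by_position and by_name are illustrative.

```r
# Select columns by position and by name
by_position <- mtcars[, c(1, 3)]
by_name <- mtcars[, c("mpg", "disp")]

# Both approaches return the same data frame
identical(by_position, by_name)

head(by_name, n = 3)
```

Selecting by name is usually safer, because it keeps working if the column order changes.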


Indexing

▪ Use $ to index
• What if we type mpg < cyl?
✓ It does not work, because mpg and cyl are columns of mtcars, not objects in the workspace
✓ R returns an error message → Error: object 'mpg' not found

• mtcars$mpg < mtcars$cyl

✓ It works, because $ looks up each column inside the data frame
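
A sketch of both cases. tryCatch() is used here only so that the failing expression does not stop the script; it is not part of the lecture material, and the example assumes no objects named mpg or cyl exist in the workspace.

```r
# Bare column names are not found, so the comparison fails;
# tryCatch() captures the error message instead of stopping
bare <- tryCatch(mpg < cyl, error = function(e) conditionMessage(e))
bare

# With $, each column is looked up inside mtcars
comparison <- mtcars$mpg < mtcars$cyl
head(comparison)   # a logical vector, one value per row
sum(comparison)    # number of rows where mpg is below cyl
```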


Subset of data frame

▪ Conditional subset
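
A conditional subset might be written as below. The threshold of 25 mpg and the object names are arbitrary choices for illustration.

```r
# Keep only the rows where the condition is TRUE
efficient <- mtcars[mtcars$mpg > 25, ]
efficient

# Combine conditions with & (and) or | (or), and pick columns too
efficient_4cyl <- mtcars[mtcars$mpg > 25 & mtcars$cyl == 4, c("mpg", "cyl")]
efficient_4cyl

# subset() is a base R alternative that accepts bare column names
same_rows <- subset(mtcars, mpg > 25)
```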


Data frame

▪ Combining data frames

• Use cbind() to combine by columns
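
A minimal sketch of cbind() with two hypothetical data frames that have the same number of rows:

```r
# Two made-up data frames describing the same three customers
customers <- data.frame(id = 1:3, name = c("Ann", "Ben", "Cho"))
spending <- data.frame(visits = c(4, 1, 7), total = c(120, 35, 260))

# cbind() places the columns side by side; row counts must match
combined <- cbind(customers, spending)
combined
dim(combined)
```

Note that cbind() matches rows purely by position, so both data frames must already be in the same row order.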


Data frame

▪ Combining data frames

• Use rbind() to combine by rows
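
A minimal sketch of rbind() with two hypothetical data frames that share the same column names:

```r
# Two made-up batches of records with identical columns
first_batch <- data.frame(id = 1:2, name = c("Ann", "Ben"))
second_batch <- data.frame(id = 3:4, name = c("Cho", "Dee"))

# rbind() stacks the rows; column names must match
stacked <- rbind(first_batch, second_batch)
stacked
```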


Data frame

▪ Add columns
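
One way to add columns is to assign to a new name with $. The copy my_cars, the new column names, and the weight threshold are assumptions for illustration; the factor 0.425144 converts US miles per gallon to kilometres per litre.

```r
my_cars <- mtcars   # work on a copy so the built-in data stays untouched

# New numeric column computed from an existing one
my_cars$kpl <- my_cars$mpg * 0.425144

# New logical column from a condition (wt is in units of 1000 lbs)
my_cars$heavy <- my_cars$wt > 3.5

head(my_cars[, c("mpg", "kpl", "wt", "heavy")])
```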


Data frame

▪ Sorting
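
Sorting a data frame is commonly done with order(), which returns the row positions in sorted order. This is a sketch of that approach; the slides do not show which function was used.

```r
# Ascending by mpg
ascending <- mtcars[order(mtcars$mpg), ]
head(ascending[, c("mpg", "cyl")])

# Descending by mpg (negate the numeric column)
descending <- mtcars[order(-mtcars$mpg), ]
head(descending[, c("mpg", "cyl")])

# Sort by cyl first, then by mpg from high to low within each cyl
by_cyl_then_mpg <- mtcars[order(mtcars$cyl, -mtcars$mpg), ]
```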


Unique function

• unique(x)
✓ unique() returns a vector, data frame, or array like x but with duplicate elements/rows removed
✓ Useful for discarding duplicate elements/rows
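
A brief sketch of unique() on a vector, on a column of mtcars, and on a made-up data frame with a repeated row:

```r
unique(c(1, 2, 2, 3, 3, 3))   # 1 2 3

unique(mtcars$cyl)            # distinct cylinder counts, in order of appearance

# Hypothetical data frame where the second and third rows are identical
orders <- data.frame(id = c(1, 2, 2), item = c("tea", "milk", "milk"))
unique(orders)                # the duplicated row appears only once
```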


Summary of data structures in R

• Homogeneous: all elements are of the same type

• Heterogeneous: elements can be of different types

               Homogeneous   Heterogeneous
1 dimension    Vector        List
2 dimensions   Matrix        Data frame
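
The distinction can be seen by mixing types on purpose. This is a sketch with made-up values.

```r
# A vector is homogeneous: mixed inputs are coerced to one type
mixed_vector <- c(1, "a", TRUE)
class(mixed_vector)          # "character"

# A list is heterogeneous: each element keeps its own type
mixed_list <- list(1, "a", TRUE)
sapply(mixed_list, class)

# A data frame is heterogeneous across columns
people <- data.frame(age = c(30, 41), name = c("Ann", "Ben"))
sapply(people, class)
```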


data.table

▪ When you handle a large dataset

• Use the package called data.table

• It works much like data.frame but is typically much faster on large datasets

• How to install the package?

1. Type install.packages("data.table")
2. Tools → Install packages… → type data.table

• How to load the installed package?

✓ library(data.table)
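
A minimal sketch of a first session with data.table, assuming the package is already installed. Converting mtcars and the column name model are illustrative choices, not part of the lecture.

```r
# install.packages("data.table")   # run once if the package is missing
library(data.table)

# Convert a data frame; keep the row names as a column called "model"
cars_dt <- as.data.table(mtcars, keep.rownames = "model")

class(cars_dt)   # a data.table is also a data.frame

# Inside [ ], columns can be referred to by bare name
cars_dt[mpg > 25, .(model, mpg)]
```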


data.table

▪ Comparison for importing a csv file

• To import data, first check your working directory with getwd()
✓ e.g., "C:/Users/hyung/Documents"
• setwd() changes the working directory
✓ e.g., setwd("C:/Users/ISOM3390")
• Import data as a data.frame
✓ read.csv('reddevils_comments_16_17.csv')
• Import data as a data.table
✓ fread('reddevils_comments_16_17.csv')
• system.time() returns the CPU and elapsed time used to evaluate an expression
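
The comparison can be sketched without the lecture's csv file by timing both readers on a temporary file. The file contents, the size of 100,000 rows, and all object names are assumptions, and the timings will differ from machine to machine.

```r
library(data.table)

getwd()   # shows the current working directory

# Write a made-up csv file to a temporary location
set.seed(1)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = rnorm(1e5), y = rnorm(1e5)), csv_path, row.names = FALSE)

# Time each import; system.time() reports user, system, and elapsed seconds
time_df <- system.time(df <- read.csv(csv_path))
time_dt <- system.time(dt <- fread(csv_path))

time_df["elapsed"]
time_dt["elapsed"]
```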


data.table

▪ Comparison for creating a csv file

• Generate a data.frame (smpl_db)
✓ Number of rows: 100,000 & Number of columns: 10
✓ Elements are randomly generated with mean zero and standard deviation one

• Create a csv file with write.csv() (base R)

✓ write.csv(smpl_db, 'smpl_db_data_frame.csv')
• Create a csv file with fwrite() (data.table)
✓ fwrite(smpl_db, 'smpl_db_data_table.csv')
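
A sketch of the write comparison. It assumes the elements are drawn from a standard normal distribution with rnorm(), which the slide does not state explicitly, and it writes to a temporary directory rather than the working directory.

```r
library(data.table)

set.seed(3390)   # arbitrary seed for reproducibility

# 100,000 rows by 10 columns of draws with mean 0 and sd 1
smpl_db <- as.data.frame(matrix(rnorm(100000 * 10), nrow = 100000, ncol = 10))

out_df <- file.path(tempdir(), "smpl_db_data_frame.csv")
out_dt <- file.path(tempdir(), "smpl_db_data_table.csv")

# Compare how long each writer takes
system.time(write.csv(smpl_db, out_df))
system.time(fwrite(smpl_db, out_dt))
```

One difference to keep in mind: write.csv() includes row names by default, while fwrite() does not, so the two files are not byte-for-byte identical.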
