R
notes.md 10/3/2022
R programming language
history
Ross Ihaka and Robert Gentleman developed R at the University of Auckland; it first appeared in 1993
R was inspired by another language, S [Statistics], which was developed at Bell Labs
developed for statistical computing and graphics
features
it is primarily a statistical language rather than a general purpose development language
is not typically used for
developing console applications
developing web applications (*)
developing mobile applications
can be used to develop applications involving the statistical calculations
provides the general purpose programming constructs
variable declarations
function declaration and invocation
loops
condition
data types
collections
provides built-in statistical functions
provides various types of packages
graphical packages (ggplot2)
machine learning packages
data analysis packages
data analytics packages
is a scripting language (there is no entry point function)
environment setup
install latest version of R programming language from
https://cran.r-project.org/bin/windows/base/
IDEs
RStudio [https://rstudio.com/products/rstudio/download/]
PyCharm [https://www.jetbrains.com/help/pycharm/r-plugin-support.html]
Editor
VSCode
update the system environment variable named PATH with the bin directories of Python and R
execute the code
shortcuts in RStudio
ctrl + l => clear the console
ctrl + shift + n => creates a new R script
ctrl + enter => run the current line or selected lines
ctrl + shift + s => runs the current entire document
? => shows help for a function, e.g. ?print
?? => searches the help pages for a topic, e.g. ??regression
identifier
used to identify an entity like variable or a function
rules to declare identifier
can not start an identifier with a number
e.g. 1name is invalid
can not start an identifier with an underscore
e.g. _varname is invalid
can not use special characters like space, symbols (#$%@)
e.g.
invalid identifier (has space within it): first name
invalid identifier: name@1, name#, name%
identifier can use a special character dot (.)
e.g.
valid identifier: first.name
valid identifier: can.vote
convention
to create an identifier for first name
first_name (preferred in python)
first.name (preferred in R)
variable
num = 100
print(num)
pre-defined values
NA
Not Available
Inf
stands for infinity
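A short sketch of how these pre-defined values behave in calculations. The variable name ages and its values are illustrative only.

```r
# NA marks a missing value; arithmetic that involves NA usually yields NA
ages = c(25, NA, 40)
# FALSE TRUE FALSE
print(is.na(ages))
# NA
print(mean(ages))
# 32.5 (na.rm = TRUE drops the missing value first)
print(mean(ages, na.rm = TRUE))

# Inf is produced by operations such as dividing a positive number by zero
# Inf
print(1 / 0)
# FALSE
print(is.finite(1 / 0))
```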
data types
in R, data type is always inferred
automatically decided by R by looking at the current value inside the variable
types
Vectors
Lists
Matrices
Arrays
Factors
Data Frames
Vectors
integer
represents a whole number
e.g.
# vector of integer
age = 30L
character
represents a string
e.g.
# vector of character
first.name = "steve"
# vector of character
last.name = 'jobs'
logical
represents boolean values like
TRUE or FALSE
T or F
e.g.
# vector of logical
can.vote = TRUE
raw
represents raw bytes
e.g.
# vector of raw
address = charToRaw('pune')
# vector of raw
hello = charToRaw('नमस्कार')
complex
represents a complex number
complex number contains
real part
imaginary part
e.g.
# vector of complex
complex.number = 10 + 20i
# vector of numeric
# v.1 = [10, 20, 30, 40, 50, 1]
v.1 = c(10, 20, 30, 40, 50, TRUE)
# vector of character
# v.2 = ["10", "20", "steve", "30", "40"]
v.2 = c(10, 20, "steve", 30, 40)
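The two vectors above are examples of coercion: a vector holds a single type, so R converts mixed values to the most general type present. A minimal sketch, with illustrative variable names:

```r
# coercion order: logical -> integer -> numeric (double) -> complex -> character
v.logical.numeric = c(10, TRUE)       # TRUE becomes 1
v.numeric.character = c(10, "steve")  # 10 becomes "10"

# "numeric"
print(class(v.logical.numeric))
# "character"
print(class(v.numeric.character))
```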
# vector of character
indexing
index starts at one (1) [does not start at zero (0)]
positive indexing
e.g.
numbers = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
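A minimal sketch of positive indexing on the same vector; the chosen positions are arbitrary.

```r
numbers = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# 10 (first position is 1)
print(numbers[1])
# 100
print(numbers[10])
# 20 40 (a vector of positions returns several values)
print(numbers[c(2, 4)])
```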
negative indexing
exclude the index and return remaining values
e.g.
numbers = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
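A minimal sketch of negative indexing; unlike Python, a negative index in R drops that position instead of counting from the end.

```r
numbers = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# 20 30 40 50 60 70 80 90 100 (everything except position 1)
print(numbers[-1])
# 20 30 40 50 60 70 80 90 (everything except positions 1 and 10)
print(numbers[c(-1, -10)])
```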
slicing
numbers = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
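A minimal sketch of slicing with the colon operator; in R both ends of the range are included.

```r
numbers = c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100)

# 20 30 40 50 (positions 2 to 5, both included)
print(numbers[2:5])
# 80 90 100 (from position 8 to the end)
print(numbers[8:length(numbers)])
# 40 50 60 70 80 90 100 (drop the first three positions)
print(numbers[-(1:3)])
```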
broadcast operator
performing operation on every member of the vector
mathematical operators 1
mathematical operators 2
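A minimal sketch of mathematical operators applied to every member of a vector. The names numbers.1 and numbers.2 and their values are illustrative assumptions.

```r
numbers.1 = c(10, 20, 30)
numbers.2 = c(1, 2, 3)

# vector with a single value: the value is applied to every member
# 15 25 35
print(numbers.1 + 5)
# 20 40 60
print(numbers.1 * 2)
# 1 2 0 (remainder after dividing by 3)
print(numbers.1 %% 3)

# vector with a vector of the same length: applied position by position
# 11 22 33
print(numbers.1 + numbers.2)
```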
logical operators
# broadcast
# FALSE FALSE FALSE TRUE TRUE FALSE TRUE
print(!numbers.3)
# non-broadcast operators
# && and || expect single (length-one) values; since R 4.3.0 a longer vector is an error
# TRUE
print(numbers.3[1] && TRUE)
# TRUE
print(numbers.4[1] || TRUE)
# broadcast operators
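A minimal sketch contrasting the element-wise operators & and | with the single-value operators && and ||. The vectors flags.1 and flags.2 are illustrative assumptions.

```r
flags.1 = c(TRUE, FALSE, TRUE)
flags.2 = c(TRUE, TRUE, FALSE)

# & and | work position by position
# TRUE FALSE FALSE
print(flags.1 & flags.2)
# TRUE TRUE TRUE
print(flags.1 | flags.2)

# && and || compare single values only
# TRUE
print(flags.1[1] && flags.2[1])

# any() and all() reduce a whole logical vector to one value
# TRUE
print(any(flags.1))
# FALSE
print(all(flags.1))
```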
Lists
collection whose elements can be vectors (or other objects) of different types and lengths
often made up of multiple vectors
to create a list, call the function list()
e.g.
list created without temporary names
list.1 = list(
c(10, 20, 30, 40, 50),
c(60, 70, 80, 90, 100)
)
print(list.1)
# [[1]]
# [1] 10 20 30 40 50
# [[2]]
# [1]  60  70  80  90 100
# 20
print(list.1[[1]][2])
# 90
print(list.1[[2]][4])
persons = list(
personNames = c("person1", "person2"),
personAddresses = c("pune", "mumbai")
)
# $personNames
# [1] "person1" "person2"
# $personAddresses
# [1] "pune" "mumbai"
# "person1" "person2"
print(persons$personNames)
# person2
print(persons$personNames[2])
# "pune" "mumbai"
print(persons$personAddresses)
# pune
print(persons$personAddresses[1])
Matrices
#      [,1] [,2]
# [1,]   10   30
# [2,]   20   40
#      [,1] [,2]
# [1,]   10   20
# [2,]   30   40
#      [,1] [,2]
# [1,]   10   30
# [2,]   20   40
# [1] 10 30
print(m.1[1, ])
# [1] 30 40
print(m.1[, 2])
# [1] 30
print(m.1[1, 2])
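The calls that create these matrices are not shown in the notes. A sketch that is consistent with the outputs above, assuming matrix() was used; the name m.byrow is hypothetical.

```r
# matrix() fills column by column unless byrow = TRUE is given
m.1 = matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)
m.byrow = matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2, byrow = TRUE)

# 10 30 (first row when filled by column)
print(m.1[1, ])
# 10 20 (first row when filled by row)
print(m.byrow[1, ])
# 2 2 (number of rows and columns)
print(dim(m.1))
```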
to make retrieval easier, a matrix can have named rows and columns
e.g.
# 7.5
print(cars["car1", "price"])
# i10
print(cars["car2", "model"])
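The definition of cars is not shown in the notes. A sketch of one way such a matrix could be built, assuming the dimnames argument of matrix(); only the values 7.5 (car1) and i10 (car2) come from the notes, the rest are made up.

```r
# hypothetical data: "nano" and "8.5" are placeholders
# a matrix holds one type, so the prices are stored as character values
cars = matrix(
  c("nano", "i10", "7.5", "8.5"),
  nrow = 2,
  dimnames = list(c("car1", "car2"), c("model", "price"))
)

# "7.5"
print(cars["car1", "price"])
# "i10"
print(cars["car2", "model"])
# "car1" "car2"
print(rownames(cars))
```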
# [,1] [,2]
# [1,] 10 10
# [2,] 20 20
# [3,] 30 30
# [,1] [,2]
# [1,] 10 20
# [2,] 30 10
# [3,] 20 30
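The calls behind the two outputs above are not shown. They are consistent with a three-value vector being recycled to fill a 3 x 2 matrix, which is the assumption in this sketch; the names m.cols and m.rows are hypothetical.

```r
# the data has 3 values but the matrix needs 6, so the values are reused
m.cols = matrix(c(10, 20, 30), nrow = 3, ncol = 2)
m.rows = matrix(c(10, 20, 30), nrow = 3, ncol = 2, byrow = TRUE)

# filled by column: both columns are 10 20 30
print(m.cols)
# filled by row: rows are (10, 20), (30, 10), (20, 30)
print(m.rows)
```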
Arrays
Factors
factor.2 = factor(
c('red', 'green', 'red', 'green', 'green'),
levels = c('red', 'green'),
labels = c(1, 2)
)
# 1 2 1 2 2
# Levels: 1 2
Data Frames
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
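The vectors names, ages, emails and addresses are not defined in the notes. A self-contained sketch with made-up values; only the four addresses appear in the notes, everything else is a placeholder.

```r
# hypothetical sample data (every vector must have the same length)
names = c("person1", "person2", "person3", "person4")
ages = c(30, 35, 40, 45)
emails = c("person1@example.com", "person2@example.com",
           "person3@example.com", "person4@example.com")
addresses = c("pune", "mumbai", "nashik", "satara")

# stringsAsFactors = FALSE keeps text columns as character (the default since R 4.0)
df.persons = data.frame(
  name = names,
  age = ages,
  email = emails,
  address = addresses,
  stringsAsFactors = FALSE
)

# 30 35 40 45 (one column, selected by name)
print(df.persons$age)
# "mumbai" (row 2 of the address column)
print(df.persons[2, "address"])
# 4 4 (rows and columns)
print(dim(df.persons))
```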
functions
class
used to get the type of data structure
e.g.
# numeric
print (class(v))
typeof
used to get the data type of data structure
e.g.
# double
print (typeof(v))
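A minimal sketch of the difference between class and typeof. The vector v is assumed to be numeric, as the outputs above suggest; df is a hypothetical one-column data frame.

```r
v = c(10, 20, 30)
# "numeric"
print(class(v))
# "double" (how the values are stored internally)
print(typeof(v))

df = data.frame(x = c(1, 2))
# "data.frame"
print(class(df))
# "list" (a data frame is stored as a list of columns)
print(typeof(df))
```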
names
used to set or get the temporary names for every position in the data structure
used with data.frames and lists
e.g.
# names ages
print(names(list.1))
str
similar to info() in pandas
used to get the basic information about the data structure
used with data frames and lists
the information includes
number of observations (rows)
number of variables (columns)
name of every variable
data type of every variable
preview of data in every column
e.g.
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
str(df.persons)
# $ email  : chr "[email protected]" ...
# $ address: chr "pune" "mumbai" "nashik" "satara"
summary
similar to describe() function in pandas
used to get statistical information about the data structure
can be used with data frames and lists
information includes
data type of column
for numeric column
minimum value
maximum value
median
mean
1st quartile
3rd quartile
e.g.
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
print(summary(df.persons))
head
used to retrieve first few records
e.g.
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
print(head(df.persons, 2))
tail
used to retrieve last few records
e.g.
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
print(tail(df.persons, 2))
nrow
used to get number of rows in a data frame
e.g.
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
# 4
print(nrow(df.persons))
ncol
used to get number of columns in a data frame
e.g.
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
# 4
print(ncol(df.persons))
colnames
used to get column names
e.g.
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
# "name" "age" "email" "address"
print(colnames(df.persons))
rownames
used to get row names
e.g.
df.persons = data.frame(
name = names,
age = ages,
email = emails,
address = addresses
)
# "1" "2" "3" "4"
print(rownames(df.persons))
Type inspection
in R type can be inspected by using is functions (functions start with is.*)
e.g.
num = 100
print(is.numeric(num)) # TRUE
print(is.character(num)) # FALSE
first.name = "steve"
print(is.numeric(first.name)) # FALSE
print(is.character(first.name)) # TRUE
Type conversion
used to convert data from one type to another
e.g.
print(as.numeric("10")) # 10
print(as.logical("TRUE")) # TRUE
print(as.character(10)) # "10"
print(as.character(FALSE)) # "FALSE"
print(as.logical(1)) # TRUE
print(as.logical(0)) # FALSE
print(as.logical(100)) # TRUE
print(as.logical(-100)) # TRUE
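A short sketch of what happens when a conversion is not possible: R returns NA, and for as.numeric it also raises a warning, which is suppressed here to keep the output clean.

```r
# "steve" is not a number, so the result is NA
result = suppressWarnings(as.numeric("steve"))
# NA
print(result)

# as.logical only understands strings such as "TRUE", "true", "T", "FALSE"
# NA
print(as.logical("yes"))

# as.integer drops the fractional part
# 10
print(as.integer(10.9))
# 1
print(as.numeric(TRUE))
```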
loops
for loop
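The example for this section is missing from the notes. A minimal sketch of a for loop over a vector; the names numbers, total, number and index are illustrative.

```r
numbers = c(10, 20, 30)

# iterate over the values
total = 0
for (number in numbers) {
  total = total + number
}
# 60
print(total)

# iterate over the positions (seq_along gives 1, 2, 3)
for (index in seq_along(numbers)) {
  print(numbers[index])
}
```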
functions
custom function
user-defined functions
a function can be created with the function keyword and assigned to a name
syntax
# function declaration
# empty function
function1 = function() {
# function body here
}
# function call
function1()
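Building on the empty function above, a sketch of a function with parameters, a default value and a return value. The name add and its parameters p1 and p2 are hypothetical.

```r
# function with two parameters; p2 has a default value
add = function(p1, p2 = 10) {
  return(p1 + p2)
}

# 50
print(add(20, 30))
# 30 (p2 falls back to its default)
print(add(20))
# 3 (arguments can be passed by name in any order)
print(add(p2 = 1, p1 = 2))
```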
built-in functions
getwd
used to get the current working directory
working directory is where R looks for files/documents when the application wants to read them
setwd
used to set the current working directory
working directory is where R looks for files/documents when the application wants to read them
e.g.
setwd('/Volumes/data/sunbeam/dataset/data1/')
install.packages
used to install a package or library and its dependencies
e.g.
install.packages('tidyverse')
library
used to load a library in the current environment
e.g.
library(tidyverse)
version numbers
version number: x.y.z.a
x: major version
y: minor version
z: build version
a: nightly build version
change in the version number
major number: breaking changes
minor number: enhancement / feature addition / bug fixing
build number: bug fixing / nightly build