FE418_RLectureNotes1

The document provides an extensive overview of R programming basics, including setting the working directory, performing basic math operations, and understanding data types such as numeric, character, and logical. It covers variable assignment, the use of vectors and factors, and introduces data structures like data frames. Additionally, it discusses handling missing data and advanced data structures, emphasizing the importance of data frames in R.

Uploaded by

erhanmutlu42
# This is a comment.

# Comment lines in R start with the '#' sign.
# Anything written on a comment line is NOT executed.

#Current working directory and environment


getwd()

#setwd("C:\Users\Erhan\Desktop\FE418")
#Error!

#You can copy and paste a path to a directory from your computer
#\path\to\a directory\...

#For example:
#C:\Desktop\newdirectory
#C:\Users\Erhan\Desktop\FE418
setwd("C:/Users/Erhan/Desktop/FE418")

#!!!IMPORTANT!!! In R, each "\" in a Windows path must be written as "/" (or doubled as "\\")


#For example:
#/path/to/a directory/...
#C:/Desktop/FE418
# create a new folder with the name FE418 and make it the current working directory

#setwd("/../FE418")
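
#A minimal sketch of the step described above. It assumes a throwaway location under
#tempdir(); the folder name FE418_demo and the variable names are made up for illustration.
demo_dir <- file.path(tempdir(), "FE418_demo")
dir.create(demo_dir, showWarnings = FALSE)  #create the folder if it does not exist yet
old_wd <- setwd(demo_dir)                   #setwd() returns the previous directory
basename(getwd())
setwd(old_wd)                               #go back to the original working directory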

#basic math
10*10

#calling a function
sqrt(4)

#basic math
#PEMDAS: Parentheses, Exponents, Multiplication, Division, Addition, Subtraction
4*6+5

#The expression above is equivalent to:
(4*6)+5

#but NOT to the following, because parentheses are evaluated first:
4*(6+5)

#view folders and files in the current directory


dir()

#listing of objects, functions, variables in the current working environment


ls() #alphabetic order

#variable Assignment
x <- 2
x

#Less preferred way:
#y=5
y = 5
y

#the arrow can also point the other way
3 -> z
z
a <- b <- 7
a
b

assign("j" , 4)
j

#variable names are case sensitive: j and J are different


#variable names can NOT start with a number or an underscore.
#prefer full noun names

#removing variables (frees up memory)


#rm()

j
rm(j)
j #Error: object 'j' not found, because j was removed

ls()

#to remove every object in this list (can NOT be undone!)
rm(list = ls())

#Data Types: 1-numeric, 2-character (string), 3-Date/POSIXct (time based), 4-logical (TRUE/FALSE)

x <- 2 #re-create x, since rm(list = ls()) removed it
#to check the type of data
class(x)
#testing whether a variable is numeric
is.numeric(x)

#assigning integer values (no decimals)


i <- 5L
i
is.integer(i)
is.numeric(i)

# R can promote integers to numeric when needed


class (4L)
class (2.8)
4L*2.8
class (4L*2.8)

class (5L)
class (2L)
5L/2L
class(5L/2L)
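
#As seen above, dividing two integers with / gives a numeric result. If a whole-number
#result is wanted, base R also has integer division (%/%) and remainder (%%) operators.
#A short sketch:
5L %/% 2L
class(5L %/% 2L)
5L %% 2L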

#Character Data
#R handles string data 2 ways: character and factor
x <- "data"
x
y <- factor("data")
y
#more on this in vector section

#characters are case sensitive "Data" is different from "data" or "DATA"


#to find the number of characters in a string
nchar(x)
nchar("hello")
nchar(3)
nchar(452)
#this will not work for factor data (nchar raises an error)
nchar(y)

#dates: Date stores just a date while POSIXct stores a date and time
#both objects are represented as the number of days (Date) or seconds (POSIXct)
#since January 1, 1970

Sys.Date( )
date()
x <-date()
x
class(x)
nchar(x)

date1 <- as.Date("2016-03-11")


date1

class(date1)
as.numeric(date1)

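date2 <- as.Date("2020-03-03")

#without a format argument the string is read as year-month-day, so this is NOT parsed as 3 March 2020
date3 <- as.Date("03-03-2020")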

date3
as.numeric(date2)
as.numeric(date3)

#as.Date("26-02-2019", format = "%d - %m - %Y")


#as.Date("26/02/2019", format = "%d / %m / %Y")

d1 <- as.Date("03-03-2020","%d - %m - %Y")


d2 <- as.Date("03/03/2020","%d / %m / %Y")
d1
d2

date2 <- as.POSIXct("2019-02-26 10:55:43")


date2
class (date2)
as.numeric(date2)

#To get the Date for a number of days in base R, supply an origin, e.g. as.Date(17953, origin = "1970-01-01").
#Older versions of R (before 4.3.0) raise an error when the origin is omitted; packages such as zoo also allow it.
#date4 <- as.Date(17953) #worked after installing the zoo package

as.Date(as.character(20200303), "%Y%m%d")
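
#A short sketch of converting a day count to a Date in base R. The origin is given
#explicitly so the call does not depend on the R version; the variable names are illustrative.
day_count <- 17953
date_from_number <- as.Date(day_count, origin = "1970-01-01")
date_from_number               #"2019-02-26"
as.numeric(date_from_number)   #round trip back to the day count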

#Logicals: logicals are a way of representing data that can be either TRUE or FALSE
#Numerically, TRUE is the same as 1 and FALSE as 0.
TRUE * 5
FALSE * 5

k <- TRUE
k
class(k)
is.logical(k)
l <- "TRUE"
class(l)
is.logical(l)

#logicals can result from comparing two numbers or characters


#does 2 equal 3
2 == 3

n <- (2==3)
n

#does 2 not equal 3


2 != 3
m <- (2!=3)
m

n+m
n*m
n-m

#is two less than 3


2 < 3
#is 2 less than or equal to 3
2 <= 3
n
#is 'data' equal to 'stats'
"data" == "stats"
"data"=="DATA"

#is 'data' less than 'stats'


"data" < "stats"
"data" > "stats"

"1234" < "123"


"1234" > "123"
"1234"=="5678"

#Vectors
# collection of elements, all of the SAME type: c(1, 3, 2, 5, 1)
#c("R", "Excel", "SAS", "minitab")
#different from mathematical vectors (no column or row vector)
# "c" stands for combine

x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)


x
class(x)
x*3
x+2
x-3
x/4
x**2
sqrt(x)

y <- c(1,"A",2)
y
class(y)
y+3 #Error: y is a character vector, so arithmetic is not defined
#shortcut: the : operator
1:10

seq(1, 10, 1)
seq(1, 10, 2)
seq(1,10, 0.5)

#seq(10,1,1)

seq(from=1,to=10,by=1)

seq(from=10,to=1,by=-1)

seq(10,1, -1)

10:1
-2:3
5:-7
x <- 1:10
y <- -5:4
x
y
x+y
x-y
x*y
x/y
x**y
length(x)
length(y)
length(x+y)

# operating on two vectors of unequal length


x + c(1, 2)
x + c(1, 2, 3)
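
#A sketch of what happens above, known as recycling: the shorter vector is repeated
#until it matches the length of the longer one. long_vec and short_vec are illustrative names.
long_vec <- 1:10
short_vec <- c(1, 2)
recycled <- rep(short_vec, length.out = length(long_vec))
recycled
identical(long_vec + short_vec, long_vec + recycled)
#When the longer length is not a multiple of the shorter one (as with c(1, 2, 3)),
#R still recycles but issues a warning.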

#Comparisons also work on vectors. Here the result


# is a vector of the same length containing TRUE or FALSE for each element.
x <= 5
x > y
x<y

any(x < y)
all(x < y)

any(x>y)
all(x>y)

q <- c("Hockey", "Football", "Baseball", "Curling", "Rugby", "Lacrosse",


"Basketball", "Tennis", "Cricket", "Soccer")

class(q)
q
nchar(q)
y
nchar(y)
length(q)

a <- c("a",
"b",
"c")

#Accessing individual elements of a vector is done using square brackets ([ ]).
#The first element of x is retrieved by typing x[1],
#the first two elements by x[1:2] and nonconsecutive elements by x[c(1, 4)].

x[1]
q[1]
x[1:2]
x[5:7]

x[c(1, 4)]
#x[1,4]

q[1:2]
q[c(1, 4)]
q[8:10]
q[c(1,5,7,9)]

#It is possible to give names to a vector either during creation or after the fact.
# provide a name for each element of a vector using a name-value pair
c(One = "a", Two = "y", Last = "r")

# create a vector
w <- 1:3
w
# name the elements
names(w) <- c("a", "b", "c")
w

class(w)
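
#Named elements can be retrieved by name as well as by position. A small sketch,
#using an illustrative vector named_w so that w above is left untouched:
named_w <- c(a = 1L, b = 2L, c = 3L)
named_w["b"]
named_w[c("a", "c")]
unname(named_w["b"])    #drop the name and keep only the value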

#Factor Vectors
#factors are an important concept in R, especially when building models (such as
#statistical, machine learning, etc.).
# Let's create a simple vector of text data that has a few repeats. We will
#start with the vector q we created earlier and add some elements to it.

q2 <- c(q, "Hockey", "Lacrosse", "Hockey", "Water Polo", "Hockey", "Lacrosse")


q2

#Converting this to a factor is easy with as.factor


q2Factor <- as.factor(q2)
class(q2)
class(q2Factor)

q2Factor

#Notice that after printing out every element of q2Factor, R also prints the
#levels of q2Factor. The levels of a factor are the unique values of that
#factor variable. Technically, R is giving each unique value of a factor a
#unique integer tying it back to the character representation. This can be
#seen with as.numeric.

as.numeric(q2Factor) #each number shows which level the element belongs to
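
#A sketch of how the integer codes relate to the levels, using a small illustrative
#factor (sport_factor is a made-up name). By default the levels are sorted alphabetically.
sport_factor <- factor(c("Hockey", "Curling", "Hockey", "Rugby"))
levels(sport_factor)
as.integer(sport_factor)
levels(sport_factor)[as.integer(sport_factor)]   #codes mapped back to their labels
table(sport_factor)                              #number of elements per level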


#In ordinary factors the order of the levels does not matter and one level is no
#different from another. Sometimes, however, it is important to understand the
#order of a factor, such as when coding education levels. Setting the ordered
#argument to TRUE creates an ordered factor with the order given in the levels
#argument.

factor(x=c("High School", "College", "Masters", "Doctorate"),


levels=c("High School", "College", "Masters", "Doctorate"),
ordered=TRUE)

factor(x=c("High School", "College", "Masters", "Doctorate"),


levels=c("High School", "College", "Masters", "Doctorate"),
ordered=FALSE)
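
#One practical consequence of an ordered factor is that its elements can be compared.
#A minimal sketch (edu is an illustrative name, not part of the notes above):
edu <- factor(c("College", "High School", "Doctorate"),
              levels = c("High School", "College", "Masters", "Doctorate"),
              ordered = TRUE)
edu < "Masters"    #TRUE TRUE FALSE
max(edu)           #the highest level present: Doctorate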

# Create a categorical vector


day_vector <- c('evening', 'morning', 'afternoon', 'midday', 'midnight', 'evening')

# Convert `day_vector` to a factor with ordered levels
factor_day <- factor(day_vector, ordered = TRUE, levels = c('morning', 'midday',
'afternoon', 'evening', 'midnight'))

# Print the new variable


factor_day
# Count the number of occurrences of each level
summary(factor_day) #reports how many elements fall in each level

#CALLING FUNCTIONS
mean(x)

#FUNCTION DOCUMENTATION
?`+`
?`*`
?`==`

#There are occasions when we have only a sense of the function we want to use.
#In that case we can look up the function by using part of the name with apropos.
apropos("mea")

##MISSING DATA
##Missing data plays a critical role in both statistics and computing, and R has
##two types of missing data, NA and NULL. While they are similar, they behave
##differently and that difference needs attention.

# NA concept
#Often we will have data that has missing values for any number of reasons.
#Statistical programs use varying techniques to represent missing data such as a
#dash, a period or even the number 99.
#R uses NA. NA will often be seen as just another element of a vector. is.na tests
#each element of a vector for missingness.

z <- c(1, 2, NA, 8, 3, NA, 3)


z
class(z)
length(z)

is.na(z)
zChar <- c("Hockey", NA, "Lacrosse")
zChar
is.na(zChar)
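
#Because NA propagates through most calculations, many functions offer an na.rm
#argument. A sketch with an illustrative vector z_demo:
z_demo <- c(1, 2, NA, 8, 3, NA, 3)
mean(z_demo)                  #NA, because some elements are missing
mean(z_demo, na.rm = TRUE)    #mean of the five non-missing elements: 3.4
sum(is.na(z_demo))            #number of missing elements: 2
z_demo[!is.na(z_demo)]        #keep only the non-missing elements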

# NULL concept
#NULL is the absence of anything.
#It is not exactly missingness, it is nothingness. Functions can sometimes return
#NULL and their arguments can be NULL.
#An important difference between NA and NULL is that NULL is atomic and cannot
#exist within a vector. If used inside a vector it simply disappears.

z <- c(1, NULL, 3)


z

#Even though it was entered into the vector z, it did not get stored in z.
#In fact, z is only two elements long.

#The test for a NULL value is is.null.

d <- NULL
d
is.null(d)

is.null(7)

#Since NULL cannot be a part of a vector, is.null is appropriately not vectorized.
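
#A short sketch contrasting the two: NA occupies a position in a vector, NULL does not.
length(c(1, NA, 3))       #3
length(c(1, NULL, 3))     #2
length(NULL)              #0
is.null(c(1, NULL, 3))    #a single FALSE, since is.null looks at the whole object
is.na(c(1, NA, 3))        #one result per element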

# Advanced Data Structures


#Sometimes data requires more complex storage than simple vectors and thankfully R
#provides a host of data structures.
#The most common are the data.frame, matrix and list followed by the array.
#Of these, the data.frame will be most familiar to anyone who has used a
#spreadsheet, the matrix to people familiar with matrix math and the list to
#programmers.

#DATA.FRAMES
#Perhaps one of the most useful features of R is the data.frame. It is one of the
#most often cited reasons for R's ease of use.

#On the surface a data.frame is just like an Excel spreadsheet in that it has
#columns and rows. In statistical terms, each column is a variable and each row is
#an observation.

#In terms of how R organizes data.frames, each column is actually a vector, each of
#which has the same length.

#That is very important because it lets each column hold a different type of data
#(see Section 4.3).

#This also implies that within a column each element must be of the same type, just
#like with vectors.

#There are numerous ways to construct a data.frame, the simplest being to use the
#data.frame function.

#Let's create a basic data.frame using some of the vectors we have already
#introduced, namely x, y and q.
x <- 10:1
y <- -4:5
q <- c("Hockey", "Football", "Baseball", "Curling", "Rugby", "Lacrosse",
"Basketball", "Tennis", "Cricket", "Soccer")

theDF <- data.frame(x, y, q)


theDF

x y q
1 10 -4 Hockey
2 9 -3 Football
3 8 -2 Baseball
4 7 -1 Curling
5 6 0 Rugby
6 5 1 Lacrosse
7 4 2 Basketball
8 3 3 Tennis
9 2 4 Cricket
10 1 5 Soccer

#This creates a 10x3 data.frame consisting of those three vectors. Notice the names
#of theDF are simply the variables.
#We could have assigned names during the creation process, which is generally a
#good idea.

class(theDF)
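
#To see that each column keeps its own type, a sketch on a small illustrative
#data.frame (demo_df and its column names are made up for this example).
#stringsAsFactors = FALSE is set explicitly so the result is the same on older R versions.
demo_df <- data.frame(count = 3:1,
                      score = c(1.5, 2.5, 3.5),
                      sport = c("Hockey", "Rugby", "Tennis"),
                      stringsAsFactors = FALSE)
sapply(demo_df, class)    #one class per column
str(demo_df)              #compact overview of the structure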

theDF <- data.frame(First = x, Second = y, Sport = q)

#wrong: theDF <- data.frame( x=First, Second = y, Sport = q)

theDF

First Second Sport


1 10 -4 Hockey
2 9 -3 Football
3 8 -2 Baseball
4 7 -1 Curling
5 6 0 Rugby
6 5 1 Lacrosse
7 4 2 Basketball
8 3 3 Tennis
9 2 4 Cricket
10 1 5 Soccer

t(theDF) #transposes the data.frame (the result is a matrix)

a <- 1:5
b <- -5:-1
c <- c(1,2,3,NA,5)
df2 <- data.frame (a,b,c)
df2

#data.frames are complex objects with many attributes. The most frequently checked
#attributes are the number of rows and columns. Of course there are functions to do
#this for us: nrow and ncol.
#And in case both are wanted at the same time there is the dim function.

nrow(theDF)

ncol(theDF)

dim(theDF)

#Checking the column names of a data.frame is as simple as using the names function.
#This returns a character vector listing the columns.
#Since it is a vector we can access individual elements of it just like any other
#vector.

names(theDF)

names(theDF)[3]

#We can also check and assign the row names of a data.frame.

rownames(theDF)

rownames(theDF) <- c("One", "Two", "Three", "Four", "Five", "Six",


"Seven", "Eight", "Nine", "Ten")
rownames(theDF)

[1] "One" "Two" "Three" "Four" "Five" "Six" "Seven" "Eight"


[9] "Nine" "Ten"

# set them back to the generic index


rownames(theDF) <- NULL
rownames(theDF)

[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"

#Usually a data.frame has far too many rows to print them all to the screen,
#so thankfully the head function prints out only the first few rows.

head(theDF) #prints the first 6 rows

First Second Sport


1 10 -4 Hockey
2 9 -3 Football
3 8 -2 Baseball
4 7 -1 Curling
5 6 0 Rugby
6 5 1 Lacrosse

head(theDF, n = 7)

First Second Sport


1 10 -4 Hockey
2 9 -3 Football
3 8 -2 Baseball
4 7 -1 Curling
5 6 0 Rugby
6 5 1 Lacrosse
7 4 2 Basketball

tail(theDF)

First Second Sport


5 6 0 Rugby
6 5 1 Lacrosse
7 4 2 Basketball
8 3 3 Tennis
9 2 4 Cricket
10 1 5 Soccer

tail(theDF, 2)

First Second Sport


9 2 4 Cricket
10 1 5 Soccer

#As we can with other variables, we can check the class of a data.frame using the
#class function.

class(theDF)

[1] "data.frame"

#Since each column of the data.frame is an individual vector, it can be accessed
#individually and each has its own class. Like many other aspects of R, there are
#multiple ways to access an individual column.
#There is the $ operator and also the square brackets. Running theDF$Sport will
#give the third column in theDF. That allows us to specify one particular column by
#name.

theDF$Sport

vector1 <- theDF$First


vector1

vector2 <- theDF$Second


vector2

class(theDF$Sport)

#Similar to vectors, data.frames allow us to access individual elements by their
#position using square brackets, but instead of having one position two are
#specified.
#The first is the row number and the second is the column number. So to get the
#third row from the second column we use theDF[3, 2].

theDF[3, 2]

[1] -2

theDF[10, 3]

[1] "Soccer"
#To specify more than one row or column use a vector of indices.
# row 3, columns 2 through 3

theDF[3, 2:3]

Second Sport
3 -2 Baseball

theDF[10, 1:2]

First Second
10 1 5

# rows 3 and 5, column 2
# since only one column was selected it was returned as a vector
# hence the column names will not be printed
theDF[c(3, 5), 2]

[1] -2 0

# rows 3 and 5, columns 2 through 3
theDF[c(3, 5), 2:3]

Second Sport
3 -2 Baseball
5 0 Rugby

#To access an entire row, specify that row while not specifying any column.
#Likewise, to access an entire column, specify that column while not specifying any row.

# all of column 3
# since it is only one column a vector is returned

theDF[, 3]

[1] "Hockey" "Football" "Baseball" "Curling" "Rugby" "Lacrosse"


[7] "Basketball" "Tennis" "Cricket" "Soccer"
# all of columns 2 through 3
theDF[, 2:3]

Second Sport
1 -4 Hockey
2 -3 Football
3 -2 Baseball
4 -1 Curling
5 0 Rugby
6 1 Lacrosse
7 2 Basketball
8 3 Tennis
9 4 Cricket
10 5 Soccer

# all of row 2
theDF[2, ]

First Second Sport


2 9 -3 Football

# all of rows 2 through 4
theDF[2:4, ]

First Second Sport


2 9 -3 Football
3 8 -2 Baseball
4 7 -1 Curling

class(theDF[2:4, ])

#To access multiple columns by name, make the column argument a character vector of
#the names.

theDF[, c("First", "Sport")]

First Sport
1 10 Hockey
2 9 Football
3 8 Baseball
4 7 Curling
5 6 Rugby
6 5 Lacrosse
7 4 Basketball
8 3 Tennis
9 2 Cricket
10 1 Soccer

#Yet another way to access a specific column is to use its column name (or its number)
#either as the second argument to the square brackets or as the only argument to
#either single or double square brackets.

# just the "Sport" column
# since it is one column it is returned as a (character) vector
theDF[, "Sport"]

[1] "Hockey" "Football" "Baseball" "Curling" "Rugby" "Lacrosse"


[7] "Basketball" "Tennis" "Cricket" "Soccer"

class(theDF[, "Sport"])

[1] "character"

# just the "Sport" column
# this returns a one column data.frame
theDF["Sport"]

Sport
1 Hockey
2 Football
3 Baseball
4 Curling
5 Rugby
6 Lacrosse
7 Basketball
8 Tennis
9 Cricket
10 Soccer

class(theDF["Sport"])

[1] "data.frame"

# just the "Sport" column
# this also returns a (character) vector

theDF[["Sport"]]

[1] "Hockey" "Football" "Baseball" "Curling" "Rugby" "Lacrosse"


[7] "Basketball" "Tennis" "Cricket" "Soccer"

class(theDF[["Sport"]])

[1] "character"

#All of these methods have differing outputs. Some return a vector, some return a
#single-column data.frame.
#To ensure a single-column data.frame while using single-square brackets, there is
#a third argument: drop=FALSE.
#This also works when specifying a single column by number.

theDF[, "Sport", drop = FALSE]

Sport
1 Hockey
2 Football
3 Baseball
4 Curling
5 Rugby
6 Lacrosse
7 Basketball
8 Tennis
9 Cricket
10 Soccer

class(theDF[, "Sport", drop = FALSE])

[1] "data.frame"

theDF[, 3, drop = FALSE]

Sport
1 Hockey
2 Football
3 Baseball
4 Curling
5 Rugby
6 Lacrosse
7 Basketball
8 Tennis
9 Cricket
10 Soccer

class(theDF[, 3, drop = FALSE])

[1] "data.frame"
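
#The access styles above can be compared side by side. A sketch on a small
#illustrative data.frame (sports_df is a made-up name, so theDF is left untouched):
sports_df <- data.frame(First = 3:1,
                        Sport = c("Hockey", "Rugby", "Tennis"),
                        stringsAsFactors = FALSE)
is.data.frame(sports_df[, "Sport"])                  #FALSE: simplified to a vector
is.data.frame(sports_df[["Sport"]])                  #FALSE: a vector
is.data.frame(sports_df["Sport"])                    #TRUE: one-column data.frame
is.data.frame(sports_df[, "Sport", drop = FALSE])    #TRUE: drop = FALSE prevents simplification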
