R-Basic Concepts
Discussion
• What is R?
• Installation of R
• R as calculator
• R objects
• Data type
• Operators
• Basic inbuilt functions
What is R?
• R is a programming language and software environment for statistical
analysis, graphical representation and reporting.
https://www.r-project.org/about.html
• 2+3*4
• 3^2
• exp(1)
• sqrt(10)
• pi
• 2*pi/10
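Typed at the R prompt one at a time, the expressions above behave like a calculator. The comments below show the values R prints at its default precision of seven significant digits; this is only an illustrative sketch of a console session.

```r
2 + 3 * 4    # 14: multiplication is done before addition
3^2          # 9
exp(1)       # 2.718282 (Euler's number e)
sqrt(10)     # 3.162278
pi           # 3.141593 (built-in constant)
2 * pi / 10  # 0.6283185
```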
R-Objects
• If you want to use previous numbers again, you’ll have to ask your
computer to save them somewhere
• You can do that by creating an R object
• What is an object?
• Just a name that you can use to call up stored data
• For example,
• a <- 1
• a
• a+2
Naming rule for objects
• You can name an object in R almost anything you want, but there are
a few rules.
• First, a name cannot start with a number.
• Second, a name cannot use some special symbols, such as ^, !, $, @, +, or -.
• R is case sensitive
• my.name is different from My.name.
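A small sketch of these naming rules follows. The object names (my.name, My.name, value_2) and their values are illustrative only; the failing assignments are left commented out so the block runs as written.

```r
my.name <- "lower case"   # a dot is allowed inside a name
My.name <- "upper case"   # a different object: R is case sensitive
value_2 <- 10             # digits and underscores are fine after the first character

my.name == My.name        # FALSE: the two objects hold different strings

# These assignments would fail, so they are commented out:
# 2value <- 10            # a name cannot start with a number
# my^name <- 10           # ^ is an operator, not part of a name
```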
Display or remove objects- ls(), rm()
• We can list which object names we have already used
• ls()
• We can remove an object from memory
• rm()
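As a minimal sketch of how ls() and rm() work together (the objects a and b are made up for the illustration):

```r
a <- 1
b <- "text"

ls()              # lists the object names in the workspace, including "a" and "b"
rm(a)             # remove the object a from memory
"a" %in% ls()     # FALSE: a is gone
"b" %in% ls()     # TRUE: b is still there

# rm(list = ls()) would remove every object in the workspace; use with care.
```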
Operators
• An operator is a symbol that tells R to perform a specific
mathematical or logical manipulation.
• We have the following types of operators in R programming −
• Arithmetic Operators (+,-,*, /, ^)
• Relational Operators (<, <=, >, >=, ==, !=)
• Logical Operators (&, |, !)
• Assignment Operators (<-, =, <<-)
• Miscellaneous Operators (:, %in% )
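One example of each operator family listed above is sketched below. The values and the names x and y are chosen purely for illustration.

```r
7 / 2                             # 3.5   (arithmetic)
2^3                               # 8
5 >= 3                            # TRUE  (relational)
c(1, 5, 9) != 5                   # TRUE FALSE TRUE, compared element by element
c(TRUE, FALSE) & c(TRUE, TRUE)    # TRUE FALSE  (logical AND)
c(TRUE, FALSE) | c(FALSE, FALSE)  # TRUE FALSE  (logical OR)
!TRUE                             # FALSE       (logical NOT)
x <- 10                           # assignment with <-
y = 20                            # = also assigns at the top level
1:5                               # 1 2 3 4 5   (colon operator)
3 %in% 1:5                        # TRUE: 3 is an element of 1:5
8 %in% 1:5                        # FALSE
```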
>X*Y*Z
Error: Object "X" not found
• Further assignment
• y <- c(x, 0, x)
Vector arithmetic
• Vectors can be used in arithmetic expressions (element by element
manipulation)
• Vectors occurring in the same expression need not all be of the same
length.
• Shorter vectors in the expression are recycled as often as needed
• Example:
• v <- 2*x + y + 1
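Recycling is easiest to see with two short vectors. In this sketch the vectors a and b are hypothetical stand-ins for x and y, chosen so that the longer length is a multiple of the shorter one.

```r
a <- c(1, 2)
b <- c(10, 20, 30, 40)

a + b               # 11 22 31 42: a is reused as 1 2 1 2
v <- 2 * a + b + 1  # the constant 1 is recycled as well
v                   # 13 25 33 45
```

If the longer length is not a multiple of the shorter one, R still recycles but issues a warning.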
• The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a
power.
• Common arithmetic functions are available.
• log, exp, sin, cos, tan, sqrt, and so on, all have their usual meaning.
• max and min select the largest and smallest elements of a vector respectively.
• range is a function whose value is a vector of length two, namely c(min(x), max(x)).
• length(x) is the number of elements in x
• sum(x) gives the total of the elements in x
• prod(x) gives their product.
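The functions above can be tried on a small vector. The values in x are an assumption made for this sketch.

```r
x <- c(2, 4, 6)

max(x)      # 6
min(x)      # 2
range(x)    # 2 6, the same as c(min(x), max(x))
length(x)   # 3
sum(x)      # 12
prod(x)     # 48
sqrt(x)     # applied element by element
```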
Complex numbers- more example
• To work with complex numbers, supply an explicit complex part.
• Thus sqrt(-17) will give NaN and a warning, but sqrt(-17+0i) will do the
computations as complex numbers.
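The difference can be checked directly. This is a minimal sketch; the name z is illustrative.

```r
sqrt(-17)             # NaN, with a warning that NaNs were produced

z <- sqrt(-17 + 0i)   # the explicit 0i makes the argument complex
z                     # 0+4.123106i
Re(z)                 # real part: 0
Im(z)                 # imaginary part: 4.123106
z * z                 # approximately -17+0i
```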
Generating regular sequences
• R has a number of facilities for generating commonly used sequences
of numbers.
• The colon operator (:) returns every integer between two integers. It
is an easy way to create a sequence of numbers
• For example
• 1:30 is the vector c(1, 2, ..., 29, 30).
• The colon operator has high priority within an expression, so, for example
2*1:15 is the vector c(2, 4, ..., 28, 30).
• Put n <- 10 and compare the sequences 1:n-1 and 1:(n-1).
• The construction 30:1 may be used to generate a sequence backwards.
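The comparison suggested above works out as follows. Because the colon operator binds more tightly than the arithmetic operators, 1:n-1 is read as (1:n) - 1.

```r
n <- 10

1:n - 1     # 0 1 2 3 4 5 6 7 8 9    i.e. (1:n) - 1
1:(n - 1)   # 1 2 3 4 5 6 7 8 9
2 * 1:15    # 2 4 6 ... 28 30        i.e. 2 * (1:15)
30:1        # 30 29 28 ... 2 1
```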
seq(), rep()- Generating sequence
• The function seq() is a more general facility for generating sequences.
• seq(2,10) is the same vector as 2:10
• More general syntax:
• s3 <- seq(-5, 5, by=.2)
• s4 <- seq(length=51, from=-5, by=.2)
• rep()
• > x=c(2,4,6)
• > rep(x, times=5)
• [1] 2 4 6 2 4 6 2 4 6 2 4 6 2 4 6
• > rep(x, each=5)
• [1] 2 2 2 2 2 4 4 4 4 4 6 6 6 6 6
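A short sketch confirming that the two seq() calls above describe the same sequence, plus a few more rep() variants. The full argument name is length.out; the slides' length= works through R's partial argument matching. The extra calls at the end are illustrative additions.

```r
s3 <- seq(-5, 5, by = 0.2)
s4 <- seq(from = -5, by = 0.2, length.out = 51)

length(s3)                          # 51
all.equal(s3, s4)                   # TRUE: both run from -5 to 5 in steps of 0.2

seq(2, 10, by = 2)                  # 2 4 6 8 10
rep(c(2, 4, 6), times = 2)          # 2 4 6 2 4 6
rep(c(2, 4, 6), each = 2)           # 2 2 4 4 6 6
rep(c("a", "b"), times = c(3, 1))   # "a" "a" "a" "b"
```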
Missing values
• In some cases the components of a vector may not be completely known.
• Such an element is “not available”, or a “missing value” in the statistical sense.
• It is marked by assigning the special value NA.
• The function is.na(x) gives a logical vector of the same size as x with value
TRUE if and only if the corresponding element in x is NA.
• > z <- c(1:3,NA)
• > z
• [1] 1 2 3 NA
• > is.na(z)
• [1] FALSE FALSE FALSE TRUE
• There is a second kind of “missing” value, produced by numerical
computation: the so-called Not a Number (NaN) values.
• > 0/0
• > Inf - Inf
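The sketch below shows how NA and NaN differ under is.na() and is.nan(), and how a missing value propagates through a sum. The na.rm argument and the vector w are additions for illustration and are not part of the slides.

```r
z <- c(1:3, NA)
sum(z)                  # NA: a missing value propagates through arithmetic
sum(z, na.rm = TRUE)    # 6: drop missing values before summing
mean(z, na.rm = TRUE)   # 2

w <- c(1, 0/0, NA)      # 0/0 is NaN
is.na(w)                # FALSE TRUE TRUE   (is.na is TRUE for NA and NaN)
is.nan(w)               # FALSE TRUE FALSE  (is.nan is TRUE only for NaN)
```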
Character vectors
• Character quantities and character vectors are used frequently in R
• Character strings are entered using either matching double (") or single (')
quotes, but are printed using double quotes (or sometimes without
quotes).
• char_vect <- c("hello friends")
• print(char_vect)
• Character vectors may be concatenated into a vector by the c() function
• The paste() function takes an arbitrary number of arguments and
concatenates them one by one into character strings.
• For example
• labs <- paste(c("X","Y"), 1:10, sep="")
• output
• c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")
• Note: the shorter vector is recycled; thus c("X", "Y") is repeated 5 times to match the
sequence 1:10
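A few more paste() calls show the effect of the sep and collapse arguments. The strings are illustrative; paste0() is the base R shorthand for paste(..., sep = "").

```r
labs <- paste(c("X", "Y"), 1:10, sep = "")
labs[1:4]                          # "X1" "Y2" "X3" "Y4"

paste("a", "b")                    # "a b": the default separator is a space
paste("a", "b", sep = "-")         # "a-b"
paste(c("a", "b"), collapse = "+") # "a+b": collapse joins the elements of one vector
paste0("X", 1:3)                   # "X1" "X2" "X3"
nchar("hello friends")             # 13 characters, counting the space
```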
Index vectors; selecting and modifying subsets
of a data set
• Subsets of the elements of a vector may be selected by appending to the
name of the vector an index vector in square brackets.
• A vector of positive integral quantities.
• In this case the values in the index vector must lie in the set {1, 2, . . . , length(x)}
• Example-
• x=c(1:10)
• x[6] # is the sixth component of x
• x[1:10] # extract the 1st to 10th element of vector x
• A vector of negative integral quantities.
• Such an index vector specifies the values to be excluded rather than included.
• y <- x[-(1:5)]
• This gives all elements of x except the first five.
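Positive and negative index vectors can be tried as follows. The last two lines illustrate modifying a subset, with a replacement value chosen only for the example.

```r
x <- 1:10

x[6]            # 6: the sixth component
x[c(1, 3, 5)]   # 1 3 5: several positions at once
x[-1]           # 2 3 ... 10: everything except the first element
y <- x[-(1:5)]  # 6 7 8 9 10: everything except the first five

x[2] <- 100     # an index on the left-hand side modifies that element
x[1:3]          # 1 100 3
```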
Index vectors; selecting and modifying subsets
of a data set-II
• A logical vector
• x=c(1,2,3,NA)
• y <- x[!is.na(x)]
• creates (or re-creates) an object y which will contain the non-missing values
of x, in the same order. Note that if x has missing values, y will be shorter than
x.
• Example:
• (x+1)[(!is.na(x)) & x>0] -> z
• creates an object z and places in it the values of the vector x+1 for which the
corresponding value in x was both non-missing and positive.
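Worked through on a small vector, the logical index behaves as sketched here. The values in x and the helper name keep are assumptions made for the illustration.

```r
x <- c(-1, 2, NA, 4)

keep <- !is.na(x) & x > 0   # FALSE TRUE FALSE TRUE
y <- x[!is.na(x)]           # -1 2 4: the non-missing values, in order
z <- (x + 1)[keep]          # 3 5: x + 1 where x is non-missing and positive

x[is.na(x)] <- 0            # a logical index can also select elements to replace
x                           # -1 2 0 4
```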
Inbuilt data-bases available in R
• R's built-in data sets are listed at
https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
• To list them in the R console, type:
• data()
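Once a data set name is known, it can be used like any other object. The sketch below uses mtcars, which ships with R in the datasets package.

```r
data()          # list the data sets available in the loaded packages

head(mtcars)    # first six rows of the built-in mtcars data frame
dim(mtcars)     # 32 11: 32 cars, 11 variables
names(mtcars)   # column names, including "mpg", "wt", "disp" and "cyl"
```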
Other types of objects
• matrices (or, more generally, arrays): multi-dimensional generalizations of vectors.
• They are vectors that can be indexed by two or more indices and will be printed in
special ways.
• factors: provide compact ways to handle categorical data.
• lists: a general form of vector in which the various elements need not be of
the same type, and are often themselves vectors or lists.
• Lists provide a convenient way to return the results of a statistical computation.
• data frames: matrix-like structures, in which the columns can be of
different types.
• Think of data frames as ‘data matrices’ with one row per observational unit but with
(possibly) both numerical and categorical variables.
• Many experiments are best described by data frames: the treatments are categorical
but the response is numeric.
• functions: objects in R which can be stored in the project’s workspace.
• This provides a simple and convenient way to extend R.
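Each object type above can be created in one line. All names and values in this sketch are made up for illustration.

```r
m <- matrix(1:6, nrow = 2)              # 2 x 3 matrix, filled column by column
m[2, 3]                                 # 6: row 2, column 3

f <- factor(c("low", "high", "low"))    # categorical data
levels(f)                               # "high" "low"

lst <- list(name = "trial", values = c(1.5, 2.5), ok = TRUE)  # mixed types
lst$values                              # 1.5 2.5

df <- data.frame(treatment = factor(c("A", "B", "A")),   # categorical column
                 response  = c(4.1, 5.3, 3.9))           # numeric column
nrow(df)                                # 3: one row per observational unit

square <- function(v) v^2               # a user-defined function is an object too
square(4)                               # 16
```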
Matrices
• var(y)
[1] 2.079305
• hist(x, col="lightblue")
• plot(x,y)
Charts and Graphs
• Pie chart
• Bar chart
• Box plots
• Histograms
• Line graphs
• Scatter plots
Pie charts
• The R programming language has numerous libraries for creating charts and
graphs.
• A pie chart is a representation of values as slices of a circle with
different colors.
• The slices are labeled, and the numbers corresponding to each slice are
also shown in the chart.
• In R the pie chart is created using the pie() function which takes
positive numbers as a vector input.
• The additional parameters are used to control labels, color, title etc.
Pie charts – Syntax
• The basic syntax for creating a pie-chart using the R is −
• pie(x, labels, radius, main, col, clockwise)
• Following is the description of the parameters used −
• – x is a vector containing the numeric values used in the pie chart.
• – labels is used to give descriptions to the slices.
• – radius indicates the radius of the circle of the pie chart.
• – main indicates the title of the chart.
• – col indicates the color palette.
• – clockwise is a logical value indicating whether the slices are drawn clockwise or
anticlockwise.
Pie charts – Example
• # Create data for the graph.
• x <- c(21, 62, 10, 53)
• labels <- c("London", "New York", "Singapore", "Mumbai")
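With the data and labels defined, the chart can be drawn with pie(). In this minimal sketch the title, the rainbow() palette and the percentage labels are illustrative choices, not part of the slides.

```r
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")

# Basic pie chart with a title and one color per slice.
pie(x, labels = labels, main = "City pie chart", col = rainbow(length(x)))

# Variant: show each slice's share of the total as a percentage.
pct <- round(100 * x / sum(x), 1)
pie(x, labels = paste0(labels, " (", pct, "%)"),
    main = "City pie chart (percentages)", col = rainbow(length(x)))
```

When the script is run non-interactively, R writes the plots to a file (by default Rplots.pdf) instead of opening a window.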
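• # Assumption: input holds the wt and mpg columns of the built-in mtcars data set.
• input <- mtcars[, c("wt", "mpg")]
• # Plot the chart for cars with weight between 2.5 and 5 and mileage between 15 and 30.
• plot(x = input$wt, y = input$mpg, xlab = "Weight", ylab = "Mileage",
xlim = c(2.5, 5), ylim = c(15, 30), main = "Weight vs Mileage")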
Scatter Plot matrices
• When we have more than two variables and want to see how each variable
relates to the others, we use a scatterplot matrix.
• We use the pairs() function to create matrices of scatterplots.
• – Syntax:
• pairs(formula, data)
• formula represents the series of variables used in pairs.
• data represents the data set from which the variables will be taken.
Scatter Plot matrices – Example
• # Plot the matrices between 4 variables giving 12 plots.
• # One variable with 3 others and total 4 variables.
• pairs(~wt+mpg+disp+cyl, data = mtcars, main ="Scatterplot Matrix")
• More detail: scatter plots in R
• https://youtu.be/LPeq9A1FCa0
Packages
• You are not the only person writing your own functions in R.
• Many professors, programmers, and statisticians use R to design tools
that can help people analyze data. They then make these tools free
for anyone to use.
• To use these tools, you just have to download them. They come as
preassembled collections of functions and objects called packages.
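Downloading and loading a package typically takes two steps, sketched here. The package name ggplot2 is only an example and is not mentioned in the slides; the install line is commented out because it needs an internet connection.

```r
# install.packages("ggplot2")   # download from CRAN; run once per machine
# library(ggplot2)              # load the package; run once per session

library(datasets)               # a package that ships with R, loaded the same way
search()                        # attached packages, including "package:datasets"
```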
Recorded video lectures available on YouTube
• Scatter plot in R
• https://youtu.be/LPeq9A1FCa0
• Functions in R (Differential Equations Part-1)
• https://youtu.be/FNyaoR1PUTo
• Differential Equations in R (Part-2)
• https://youtu.be/dbpvExzFooI
References:
• Zuur, Alain, Elena N. Ieno, and Erik Meesters. A Beginner's Guide to R.
Springer Science & Business Media, 2009.
• Tutorialspoint website:
• https://www.tutorialspoint.com/r/index.htm
• Dalgaard, Peter. Introductory Statistics with R. Springer Science+Business
Media, 2008.
• Grolemund, Garrett, and Hadley Wickham. R for Data Science. 2018.