
R-Basic Concepts

R is a programming language and software environment for statistical analysis and graphics. It supports calculation, graphical displays and programming. R objects can represent data, functions and code; common types of objects include vectors, matrices, data frames and lists. Basic functions and operators let you manipulate objects in R.

Uploaded by Himanshu Rawat
Copyright © All Rights Reserved

R Software-Basic Concepts

Discussion
• What is R?
• Installation of R
• R as calculator
• R objects
• Data type
• Operators
• Basic inbuilt functions
What is R?
• R is a programming language and software environment for statistical
analysis, graphics representation and reporting.
https://www.r-project.org/about.html
• R is available as Free Software under the terms of the Free Software
Foundation’s GNU General Public License in source code form.
Installation of R
• Download and install R from
https://cran.r-project.org/bin/windows/base/ (Windows; installers for macOS and Linux are linked from https://cran.r-project.org/)
• Download and install RStudio [an integrated development environment
(IDE) for R] from
https://rstudio.com/products/rstudio/download/
R as a Calculator
• 1+1

• 2+3*4

• 3^2

• exp(1)

• sqrt(10)

• pi

• 2*pi/10
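The calculator expressions above can be checked in one go. This is a minimal sketch: the name `results` is only for illustration. The extra `(2 + 3) * 4` shows that `*` is evaluated before `+` unless parentheses say otherwise.

```r
# Collect the value of each calculator expression in one vector
results <- c(
  1 + 1,        # 2
  2 + 3 * 4,    # 14: multiplication happens before addition
  (2 + 3) * 4,  # 20: parentheses override the usual precedence
  3^2,          # 9
  exp(1),       # e, about 2.718282
  sqrt(10),     # about 3.162278
  pi,           # about 3.141593
  2 * pi / 10   # about 0.6283185
)
print(results)
```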
R-Objects
• If you want to reuse a value later, you have to ask R to save it
somewhere in memory
• You can do that by creating an R object
• What is an object?
• Just a name that you can use to call up stored data
• For example,
• a <- 1
• a
• a+2
Naming rule for objects
• You can name an object in R almost anything you want, but there are
a few rules.
• First, a name cannot start with a number.
• Second, a name cannot use some special symbols, like ^, !, $, @, +, and -.
• R is case sensitive
• my.name is different from My.name.
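A short sketch of the naming rules in action. The object names are made up for illustration. `make.names()` is a base R function that turns arbitrary strings into syntactically valid names. Backticks let you use a non-syntactic name anyway, though this is rarely a good idea.

```r
my.name <- 1
My.name <- 2                 # a different object: R is case sensitive
print(c(my.name, My.name))   # 1 2

# make.names() repairs invalid names: a leading digit gets an "X" prefix,
# and forbidden symbols become "."
fixed <- make.names(c("1abc", "my-name", "a^b"))
print(fixed)                 # "X1abc" "my.name" "a.b"

# Backticks allow a non-syntactic name
`1abc` <- 5
print(`1abc`)                # 5
```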
Display or remove objects- ls(), rm()
• We can list which object names we have already used
• ls()
• We can remove an object from memory
• rm(a)   # removes the object a
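A minimal sketch of listing and removing objects. The objects `a` and `b` are only for illustration.

```r
a <- 1
b <- "hello"
print(ls())          # names in the current workspace, including "a" and "b"
rm(a)                # remove a single object
print(exists("a"))   # FALSE: a is gone
print(exists("b"))   # TRUE: b is untouched
# rm(list = ls()) would remove every object in the workspace, so use it with care
```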
Operators
• An operator is a symbol that tells the R interpreter to perform specific
mathematical or logical manipulations.
• We have the following types of operators in R programming:
• Arithmetic Operators (+,-,*, /, ^)
• Relational Operators (<, <=, >, >=, ==, !=)
• Logical Operators (&, |, !)
• Assignment Operators (<-, =, <<-)
• Miscellaneous Operators (:, %in% )

Link to different types of operators:
https://www.tutorialspoint.com/r/r_operators.htm
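One small example for each operator group listed above. This is a sketch: the values 7 and 2, and the `counter`/`increment` names used to show `<<-`, are made up for illustration.

```r
x <- 7
y <- 2
arith <- c(x + y, x - y, x * y, x / y, x^y)                 # 9 5 14 3.5 49
rel   <- c(x < y, x <= y, x > y, x >= y, x == y, x != y)    # FALSE FALSE TRUE TRUE FALSE TRUE
logi  <- c(TRUE & FALSE, TRUE | FALSE, !TRUE)               # FALSE TRUE FALSE

# Assignment: <- and = assign in the current environment,
# <<- assigns to a variable found in an enclosing environment
counter <- 0
increment <- function() counter <<- counter + 1
increment()
increment()
print(counter)                 # 2

# Miscellaneous: ":" builds a sequence, %in% tests membership
print(1:5)
print(c(2, 9) %in% 1:5)        # TRUE FALSE
```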
R as a Smart Calculator (assignment to a
variable/object)
> x <- 1
> y <- 3
> z <- 4
> x*y*z
[1] 12
> X*Y*Z
Error: object 'X' not found
> This.Year <- 2004
> This.Year
[1] 2004
Incomplete command (+) sign
• If you type an incomplete command and press Enter, R will display a +
prompt, which indicates R is waiting for you to type the rest of your
command.
• Example:
• > 5 -
• + 1
• [1] 4
• Either finish the command or press Escape to start over.
Hashtag character/Comment, #
• R ignores everything from # to the end of a line, so comments are read by
humans but skipped by the computer. The hash symbol (#) is known as the
commenting symbol in R.
• Example:
• x <- 1 # Can define variables
• y <- 3 # using "<-" operator to set values
• z <- 4
• x * y * z # This is desired multiplication
Types of objects in R
• Vectors
• Lists
• Matrices
• Arrays
• Factors
• Data Frames

• These objects are built from six basic data types: double/numeric, integer,
character, logical, complex and raw (stored as atomic vectors).
Atomic vector
• An atomic vector is just a simple vector of data.
• R recognizes six basic types of atomic vectors:
• double/numeric
• integer
• character
• logical
• complex
• raw
Example
• # Atomic vector of type character.
• print("abc")
• # Atomic vector of type double.
• print(12.5)
• # Atomic vector of type integer.
• print(63L)
• # Atomic vector of type logical.
• print(TRUE)
• # Atomic vector of type complex.
• print(2+3i)
• # Atomic vector of type raw.
• print(charToRaw('hello'))
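The type of each example value can be confirmed with `typeof()`. Note that `class(12.5)` reports "numeric", while `typeof(12.5)` reports the storage type "double". A minimal sketch:

```r
values <- list("abc", 12.5, 63L, TRUE, 2+3i, charToRaw("hello"))
types <- sapply(values, typeof)   # storage type of each element
print(types)
# "character" "double" "integer" "logical" "complex" "raw"
print(charToRaw("hello"))         # 68 65 6c 6c 6f (bytes in hexadecimal)
```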
Numerical Vectors
• R operates on named data structures.
• The simplest such structure is the numeric vector, which is a single
entity consisting of an ordered collection of numbers.
• Syntax:
• x <- c(1,2,3,4,5)
• assign("x", c(1,2,3,4,5))
• c(1,2,3,4,5) -> x
• x = c(1,2,3,4,5)

• Further assignment
• y <- c(x, 0, x)
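A quick check that the four assignment forms above produce the same vector. It also shows what `y <- c(x, 0, x)` contains. The names `x1` to `x4` are only for illustration.

```r
x1 <- c(1, 2, 3, 4, 5)
assign("x2", c(1, 2, 3, 4, 5))
c(1, 2, 3, 4, 5) -> x3
x4 = c(1, 2, 3, 4, 5)
print(identical(x1, x2) && identical(x2, x3) && identical(x3, x4))  # TRUE

x <- x1
y <- c(x, 0, x)   # concatenate x, a zero, and x again
print(y)          # 1 2 3 4 5 0 1 2 3 4 5
print(length(y))  # 11
```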
Vector arithmetic
• Vectors can be used in arithmetic expressions (element by element
manipulation)
• Vectors occurring in the same expression need not all be of the same
length.
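• Shorter vectors in the expression are recycled as often as needed (perhaps fractionally) until they match the length of the longest vector
• Example:
• v <- 2*x + y + 1
• With x of length 5 and y <- c(x, 0, x) of length 11, v has length 11; R also warns that 11 is not a multiple of 5.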
• The elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a
power.
• Common arithmetic functions are available.
• log, exp, sin, cos, tan, sqrt, and so on, all have their usual meaning.
• max and min select the largest and smallest elements of a vector respectively.
• range is a function whose value is a vector of length two, namely c(min(x), max(x)).
• length(x) is the number of elements in x
• sum(x) gives the total of the elements in x
• prod(x) gives their product.
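A worked version of the recycling example and the summary functions. With x of length 5, `2*x` is recycled to length 11 (2 4 6 8 10 2 4 6 8 10 2) before it is added to y and 1. R prints a warning here because 11 is not a multiple of 5.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(x, 0, x)                # length 11
v <- 2 * x + y + 1             # warning: longer length is not a multiple of shorter
print(v)                       # 4 7 10 13 16 3 6 9 12 15 8
print(c(min(x), max(x)))       # 1 5
print(range(x))                # 1 5, the same as c(min(x), max(x))
print(c(length(x), sum(x), prod(x)))  # 5 15 120
```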
Complex numbers- more example
• To work with complex numbers, supply an explicit complex part.
• Thus sqrt(-17) will give NaN and a warning, but sqrt(-17+0i) will do the
computations as complex numbers.
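The two `sqrt()` calls side by side. A minimal sketch; the first call returns NaN and emits the warning "NaNs produced".

```r
r1 <- sqrt(-17)        # NaN, with a warning: the result is not a real number
r2 <- sqrt(-17 + 0i)   # computed as a complex number: 0+4.123106i
print(r1)
print(r2)
print(Im(r2))          # imaginary part, equal to sqrt(17)
```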
Generating regular sequences
• R has a number of facilities for generating commonly used sequences
of numbers.
• The colon operator (:) returns every integer from one integer to another, inclusive. It
is an easy way to create a sequence of numbers
• For example
• 1:30 is the vector c(1, 2, ..., 29, 30).
• The colon operator has high priority within an expression, so, for example
2*1:15 is the vector c(2, 4, ..., 28, 30).
• Put n <- 10 and compare the sequences 1:n-1 and 1:(n-1).
• The construction 30:1 may be used to generate a sequence backwards.
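The comparison suggested above, worked out. Because `:` binds more tightly than `-`, `1:n-1` means `(1:n) - 1`.

```r
n <- 10
a <- 1:n-1       # (1:n) - 1  ->  0 1 2 ... 9
b <- 1:(n-1)     # 1 2 ... 9
print(a)
print(b)
print(2*1:15)    # 2 4 ... 30: the sequence is built first, then doubled
print(30:1)      # counts down from 30 to 1
```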
seq(), rep()- Generating sequence
• The function seq() is a more general facility for generating sequences.
• seq(2,10) is the same vector as 2:10
• More general syntax:
• s3 <- seq(-5, 5, by=.2)
• s4 <- seq(length=51, from=-5, by=.2)
• rep()
• > x=c(2,4,6)
• > rep(x, times=5)
• [1] 2 4 6 2 4 6 2 4 6 2 4 6 2 4 6
• > rep(x, each=5)
• [1] 2 2 2 2 2 4 4 4 4 4 6 6 6 6 6
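A sketch checking the `seq()` forms above and combining `times` with `each` in `rep()`. `length.out` is the full name of the argument that `length=` partially matches.

```r
s3 <- seq(-5, 5, by = 0.2)
s4 <- seq(length.out = 51, from = -5, by = 0.2)
print(length(s3))                 # 51
print(all.equal(s3, s4))          # TRUE: both run from -5 to 5 in steps of 0.2
print(seq(0, 1, length.out = 5))  # 0.00 0.25 0.50 0.75 1.00

x <- c(2, 4, 6)
print(rep(x, times = 2, each = 2))  # each is applied first: 2 2 4 4 6 6 2 2 4 4 6 6
```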
Missing values
• In some cases the components of a vector may not be completely known.
• Such a component is “not available”, a “missing value” in the statistical sense, and is represented by the special value NA.
• The function is.na(x) gives a logical vector of the same size as x with value
TRUE if and only if the corresponding element in x is NA.
• > z <- c(1:3,NA)
• > z
• [1] 1 2 3 NA
• > is.na(z)
• [1] FALSE FALSE FALSE TRUE
• A second kind of “missing” value is produced by numerical
computation: the so-called Not a Number, NaN, values.
• > 0/0
• [1] NaN
• > Inf - Inf
• [1] NaN
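How the two kinds of missing value behave. `is.na()` is TRUE for both NA and NaN, while `is.nan()` flags only NaN. `mean()` propagates NA unless `na.rm = TRUE` is given.

```r
z <- c(1:3, NA)
w <- c(0/0, Inf - Inf, NA)
na_flags  <- is.na(w)     # TRUE TRUE TRUE
nan_flags <- is.nan(w)    # TRUE TRUE FALSE
print(na_flags)
print(nan_flags)
m1 <- mean(z)                  # NA: the missing value propagates
m2 <- mean(z, na.rm = TRUE)    # 2: NA is dropped before averaging
print(c(m1, m2))
```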
Character vectors
• Character quantities and character vectors are used frequently in R
• Character strings are entered using either matching double (") or single (')
quotes, but are printed using double quotes (or sometimes without
quotes).
• char_vect <- c("hello friends")
• print(char_vect)
• Character vectors may be concatenated into a vector by the c() function
• The paste() function takes an arbitrary number of arguments and
concatenates them one by one into character strings.
• For example
• labs <- paste(c("X","Y"), 1:10, sep="")
• output
• c("X1", "Y2", "X3", "Y4", "X5", "Y6", "X7", "Y8", "X9", "Y10")
• Note: recycling of short vectors takes place; thus c("X", "Y") is repeated 5 times to match the
sequence 1:10
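The `paste()` example can be checked directly. `paste0()` is the base R shorthand for `paste(..., sep = "")`; the default separator of `paste()` is a single space.

```r
labs <- paste(c("X", "Y"), 1:10, sep = "")
print(labs)                          # "X1" "Y2" "X3" ... "X9" "Y10"
print(paste0(c("X", "Y"), 1:10))     # same result
greeting <- paste("hello", "friends")  # default sep = " "
print(greeting)                      # "hello friends"
chars <- c(c("a", "b"), "c")         # c() concatenates character vectors
print(chars)                         # "a" "b" "c"
```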
Index vectors; selecting and modifying subsets
of a data set
• Subsets of the elements of a vector may be selected by appending to the
name of the vector an index vector in square brackets.
• A vector of positive integral quantities.
• In this case the values in the index vector must lie in the set {1, 2, . . . , length(x)}
• Example-
• x <- 1:10
• x[6] # is the sixth component of x
• x[1:10] # extract the 1st to 10th element of vector x
• A vector of negative integral quantities.
• Such an index vector specifies the values to be excluded rather than included.
• y <- x[-(1:5)]
• This gives all elements of x except the first five.
Index vectors; selecting and modifying subsets
of a data set-II
• A logical vector
• x=c(1,2,3,NA)
• y <- x[!is.na(x)]
• creates (or re-creates) an object y which will contain the non-missing values
of x, in the same order. Note that if x has missing values, y will be shorter than
x.
• Example:
• (x+1)[(!is.na(x)) & x>0] -> z
• creates an object z and places in it the values of the vector x+1 for which the
corresponding value in x was both non-missing and positive.
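A small sketch of the three kinds of index vector, using a made-up vector with one missing value. The last step shows the "modifying" half of the heading: assigning through an index vector changes only the selected elements.

```r
x <- c(-2, NA, 3, 0, 5)
third      <- x[3]               # positive index: 3
dropped    <- x[-(1:2)]          # negative index: 3 0 5
nonmissing <- x[!is.na(x)]       # logical index: -2 3 0 5
z <- (x + 1)[!is.na(x) & x > 0]  # x + 1 where x is present and positive: 4 6
print(z)
x[is.na(x)] <- 0                 # replace the missing value by 0
print(x)                         # -2 0 3 0 5
```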
Built-in datasets available in R
• The datasets shipped with R are listed at https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html
• To see in R-console type:
• data()
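Two of these datasets, `mtcars` and `iris`, are used often, including later in these slides. They are available without loading anything. A minimal sketch:

```r
print(head(mtcars, 3))   # first three rows of the mtcars data frame
print(dim(mtcars))       # 32 11: 32 cars, 11 variables
print(dim(iris))         # 150 5
```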
Other types of objects
• matrices (or, more generally, arrays): multi-dimensional generalizations of vectors.
• They are vectors that can be indexed by two or more indices and are printed in
special ways.
• factors: provide compact ways to handle categorical data.
• lists: a general form of vector in which the various elements need not be of
the same type, and are often themselves vectors or lists.
• Lists provide a convenient way to return the results of a statistical computation.
• data frames: matrix-like structures, in which the columns can be of
different types.
• Think of data frames as ‘data matrices’ with one row per observational unit but with
(possibly) both numerical and categorical variables.
• Many experiments are best described by data frames: the treatments are categorical
but the response is numeric.
• functions: objects in R which can be stored in the project’s workspace.
• This provides a simple and convenient way to extend R.
Matrices

• A matrix is a two-dimensional rectangular data set. It can be created
by passing a vector to the matrix() function.
• # Create a matrix.
• M = matrix( c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
• print(M)
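Indexing the matrix above, plus a numeric matrix built without `byrow`. The default `byrow = FALSE` fills column by column. The matrix `N` is a made-up example.

```r
M <- matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M[1, ])       # first row: "a" "a" "b"
print(M[2, 1])      # row 2, column 1: "c"

N <- matrix(1:6, nrow = 2)   # filled by column: columns are (1,2), (3,4), (5,6)
print(N[, 3])       # third column: 5 6
print(dim(t(N)))    # transpose has 3 rows and 2 columns
```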
Factors
• Factors are R objects created from a vector.
• A factor stores the vector as integer codes together with its distinct values (the levels), which serve as labels.
• The labels are always character strings, whether the input vector is numeric, character, logical, etc.
Factors are useful in statistical modeling.
• Factors are created with the factor() function. The nlevels() function gives the number of
levels.
• # Create a vector.
• apple_colors <- c('green','green','yellow','red','red','red','green')
• # Create a factor object.
• factor_apple <- factor(apple_colors)
• # Print the factor.
• print(factor_apple)
• print(nlevels(factor_apple))
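A look inside the factor above. Levels are sorted alphabetically by default, each element is stored as the integer code of its level, and `table()` counts the elements per level.

```r
apple_colors <- c('green','green','yellow','red','red','red','green')
factor_apple <- factor(apple_colors)
print(levels(factor_apple))      # "green" "red" "yellow"
print(nlevels(factor_apple))     # 3
print(table(factor_apple))       # green 3, red 3, yellow 1
print(as.integer(factor_apple))  # codes: 1 1 3 2 2 2 1
```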
Lists

• A list is an R object that can contain elements of many different types,
such as vectors, functions and even other lists.
• # Create a list.
• mylist1 <- list(c(2,5,3),21.3,sin)
• # Print the list.
• print(mylist1)
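Extracting list elements with `[[ ]]`, and naming elements so they can be reached with `$`. A sketch; the `person` list is a made-up example.

```r
mylist1 <- list(c(2, 5, 3), 21.3, sin)
print(length(mylist1))        # 3
print(mylist1[[1]])           # 2 5 3
print(mylist1[[3]](pi / 2))   # 1: the third element is the function sin

person <- list(name = "Rick", scores = c(90, 85))  # named elements
print(person$scores)          # 90 85
```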
Data Frames
• Data frames are tabular data objects.
• Unlike in a matrix, each column of a data frame can contain a different mode of
data.
• The first column can be numeric while the second column can be character and third
column can be logical.
• It is a list of vectors of equal length.
• Data Frames are created using the data.frame() function.
# Create the data frame.
BMI <- data.frame(
gender = c("Male", "Male","Female"),
height = c(152, 171.5, 165),
weight = c(81,93, 78),
Age = c(42,38,26)
)
print(BMI)
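A few quick checks on the `BMI` data frame. `str()` lists the type of each column. In R 4.0 and later the `gender` column is character; older versions turned it into a factor by default.

```r
BMI <- data.frame(
  gender = c("Male", "Male", "Female"),
  height = c(152, 171.5, 165),
  weight = c(81, 93, 78),
  Age = c(42, 38, 26)
)
print(is.list(BMI))      # TRUE: a data frame is a list of equal-length columns
print(dim(BMI))          # 3 4: three rows, four columns
print(BMI$weight)        # one column, returned as an ordinary vector
print(mean(BMI$weight))  # 84
str(BMI)                 # the type of every column
```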
Summary of Data in Data Frame
• The statistical summary and nature of the data can be obtained by
applying summary() function.
• # Create the data frame.
• emp.data <- data.frame(
• emp_id = c (1:5),
• emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
• salary = c(623.3,515.2,611.0,729.0,843.25),

• start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
• "2015-03-27")),
• stringsAsFactors = FALSE
• )
• # Print the summary.
• print(summary(emp.data))
Extract Data from Data Frame
• Extract specific column from a data frame using column name.
• # Create the data frame.
• emp.data <- data.frame(
• emp_id = c (1:5),
• emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
• salary = c(623.3,515.2,611.0,729.0,843.25),

• start_date = as.Date(c("2012-01-01","2013-09-23","2014-11-15","2014-05-11",
• "2015-03-27")),
• stringsAsFactors = FALSE
• )
• # Extract Specific columns.
• result <- data.frame(emp.data$emp_name, emp.data$salary)
• print(result)
Continue…
• # Extract first two rows.
• result <- emp.data[1:2,]
• print(result)
• # Extract 3rd and 5th row with 2nd and 4th column.
• result <- emp.data[c(3,5),c(2,4)]
• print(result)
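Rows can also be selected by a condition rather than by position. This sketch rebuilds a shorter `emp.data` (without `start_date`) so it runs on its own. It selects columns by name, which keeps the original column names, and filters rows with a logical vector and with base R's `subset()`.

```r
emp.data <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  stringsAsFactors = FALSE
)
by_name <- emp.data[, c("emp_name", "salary")]  # columns by name
high <- emp.data[emp.data$salary > 600, ]       # rows where the condition is TRUE
print(high$emp_name)                            # "Rick" "Michelle" "Ryan" "Gary"
top <- subset(emp.data, salary > 700, select = c(emp_name, salary))
print(top)                                      # Ryan and Gary
```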
Basic Utility Functions
length() returns the number of elements
mean() returns the sample mean
median() returns the sample median
range() returns the smallest and largest values
unique() removes duplicate elements
summary() calculates descriptive statistics
diff() takes difference between consecutive elements
rev() reverses elements
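Each utility function applied to one small made-up vector:

```r
v <- c(4, 1, 4, 9, 2)
print(length(v))   # 5
print(mean(v))     # 4
print(median(v))   # 4 (middle of the sorted values 1 2 4 4 9)
print(range(v))    # 1 9
print(unique(v))   # 4 1 9 2 (first occurrences, original order)
print(diff(v))     # -3 3 5 -7
print(rev(v))      # 2 9 4 1 4
print(summary(v))  # minimum, quartiles, median, mean, maximum
```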
Part-II
• Graphical procedures (plotting of mathematical functions)
• plot()
• Random Generation of data
• Charts and Graphs
• Pie chart
• Bar chart
• Box plots
• Histograms
• Line graphs
• Scatter plots
• Packages
Graphical procedures
• Graphical facilities are an important and extremely versatile
component of the R environment.
• Plotting commands are divided into three basic groups:
• High-level plotting functions create a new plot on the graphics device,
possibly with axes, labels, titles and so on.
• Low-level plotting functions add more information to an existing plot, such as
extra points, lines and labels.
• Interactive graphics functions allow you to interactively add information to, or
extract information from, an existing plot, using a pointing device such as a
mouse.
plot() function
• plot(x)
• If x is a time series, this produces a time-series plot.
• If x is a numeric vector, it produces a plot of the values in the vector against
their index in the vector.
• If x is a complex vector, it produces a plot of imaginary versus real parts of the
vector elements.
• plot(x, y)
• If x and y are vectors, plot(x, y) produces a scatterplot of y against x.
• If f is a factor object and y is a numeric vector:
• plot(f) # generates a bar plot of f
• plot(f, y) # produces boxplots of y for each level of f
plot() function for data frames
• df is a data frame, y is any object, expr is a list of object names
separated by ‘+’ (e.g., a + b + c). The first two forms produce
distributional plots of the variables in a data frame (first form) or of a
number of named objects (second form). The third form plots y
against every object named in expr.
• plot(df)
• plot(~ expr)
• plot(y ~ expr)
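The three forms can be tried on the built-in `mtcars` data frame. This is a sketch: the choice of columns (mpg, wt, hp) is arbitrary. Passing `data = mtcars` to the formula forms is an assumption made here so that the variable names resolve.

```r
df <- mtcars[, c("mpg", "wt", "hp")]
plot(df)                            # first form: scatterplot of every pair of columns
plot(~ mpg + wt, data = mtcars)     # second form: the same idea for the named variables
plot(mpg ~ wt + hp, data = mtcars)  # third form: mpg against wt, then mpg against hp
```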
Arguments to high-level plotting functions
• add=TRUE
• Forces the function to act as a low-level graphics function, superimposing the plot on
the current plot (some functions only).
• axes=FALSE
• Suppresses generation of axes—useful for adding your own custom axes with the axis() function.
The default, axes=TRUE, means include axes.
• log="x"
log="y"
log="xy"
• Causes the x, y or both axes to be logarithmic. This will work for many, but not all,
types of plot.
Continue…
• type="p" #Plot individual points (the default)
• type="l" #Plot lines
• type="b" #Plot points connected by lines (both)
• type="o" #Plot points overlaid by lines
• type="h" #Plot vertical lines from points to the zero axis (high-density)
• type="s"
• type="S"
• Step-function plots. In the first form, the top of the vertical defines the point; in the second,
the bottom.
• type="n"
• No plotting at all. However axes are still drawn (by default) and the coordinate system is set
up according to the data. Ideal for creating plots with subsequent low-level graphics
functions.
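A sketch of the `type="n"` workflow. An empty plot fixes the axes and coordinates, then low-level functions (`lines()`, `points()`, `abline()`, `text()`) draw on it. The sine curve and the 0.9 threshold are made up for illustration.

```r
x <- seq(0, 2 * pi, length.out = 50)
y <- sin(x)
plot(x, y, type = "n", xlab = "x", ylab = "sin(x)",
     main = "Built up with low-level functions")   # axes only, no data drawn
lines(x, y, col = "blue")                          # add the curve
peaks <- y > 0.9
points(x[peaks], y[peaks], pch = 19, col = "red")  # highlight points near the peak
abline(h = 0, lty = 2)                             # dashed reference line at y = 0
text(pi, 0.5, "sin(x)")                            # label placed in data coordinates
```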
Continue…
• xlab=string
• ylab=string
• Axis labels for the x and y axes. Use these arguments to change the default
labels, usually the names of the objects used in the call to the high-level
plotting function.
• main=string
• Figure title, placed at the top of the plot in a large font.
• sub=string
• Sub-title, placed just below the x-axis in a smaller font.
plot() – Example
• # Create the data for the chart.
• v <- c(7,12,28,3,41)
• # Plot the line graph.
• plot(v, type = "o")
Example
# Create the data frame.
BMI <- data.frame(
gender = c('Male', 'Male', 'Female', 'Male', 'Male', 'Male', 'Female', 'Male'),
height = c(152, 171.5, 165, 167, 170, 171, 168, 166),
weight = c(81, 93, 78, 60, 66, 65, 80, 85),
Age = c(42,38,26, 29, 30, 31, 32, 33)
)
print(BMI)
Example
• x=seq(-6*pi,6*pi,by=0.1)
• y=sin(x)
• plot(x,y)
• # add axis labels
• plot(x,y, xlab='x', ylab='sin(x)')
• # change the function and plot it again
• y=sin(x+x^2)
• plot(x,y, xlab='x', ylab='sin(x+x^2)')
Random Generation in R
• R generates high-quality pseudo-random numbers (the default generator is the Mersenne-Twister)
• Examples:
runif(n, min = 0, max = 1) #Samples from Uniform distribution
rbinom(n, size, prob) #Samples from Binomial distribution
rnorm(n, mean = 0, sd = 1) # Samples from Normal distribution
rexp(n, rate = 1) #Samples from Exponential distribution
rt(n, df) #Samples from T-distribution
And others!
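Random draws differ on every run unless the generator is seeded. This sketch uses `set.seed()` to make results reproducible; the seed 42 and the sample sizes are arbitrary choices.

```r
set.seed(42)                 # fix the generator state
u <- runif(5)                # uniform on [0, 1]
set.seed(42)
u2 <- runif(5)
print(identical(u, u2))      # TRUE: same seed, same numbers

b <- rbinom(10, size = 4, prob = 0.5)   # counts between 0 and 4
n <- rnorm(10000, mean = 0, sd = 1)
print(c(mean(n), sd(n)))     # close to 0 and 1
```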
Example: Generation and visualization of
samples
• x <- rnorm(1000)
• y <- rnorm(1000) + x
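• summary(y)
• var(y)
[1] 2.079305
• The exact value changes from run to run; since y is the sum of two independent standard normal samples, its theoretical variance is 2.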
• hist(x, col="lightblue")
• plot(x,y)
Charts and Graphs
• Pie chart
• Bar chart
• Box plots
• Histograms
• Line graphs
• Scatter plots
Pie charts
• R Programming language has numerous libraries to create charts and
graphs.
• A pie-chart is a representation of values as slices of a circle with
different colors.
• The slices are labeled, and the numbers corresponding to each slice are
also represented in the chart.
• In R the pie chart is created using the pie() function which takes
positive numbers as a vector input.
• The additional parameters are used to control labels, color, title etc.
Pie charts – Syntax
• The basic syntax for creating a pie-chart using the R is −
• pie(x, labels, radius, main, col, clockwise)
• Following is the description of the parameters used −
• – x is a vector containing the numeric values used in the pie chart.
• – labels is used to give description to the slices.
• – radius indicates the radius of the circle of the pie chart.
• – main indicates the title of the chart.
• – col indicates the color palette.
• – clockwise is a logical value indicating whether the slices are drawn clockwise or anti-clockwise.
Pie charts – Example
• # Create data for the graph.
• x <- c(21, 62, 10, 53)
• labels <- c("London", "New York", "Singapore", "Mumbai")
• # Plot the chart.
• pie(x,labels)
Pie chart with colors and labels
• # Create data for the graph.
• x <- c(21, 62, 10,53)
• labels <- c("London","New York","Singapore","Mumbai")
• piepercent<- round(100*x/sum(x), 1)
• png(file = "city_percentage_legends.png")
• # Plot the chart.
• pie(x, labels = piepercent, main = "City pie chart", col = rainbow(length(x)))
• legend("topright", labels, cex = 0.8, fill = rainbow(length(x)))
• # Save the file.
• dev.off()
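A variant that puts the city name and its percentage on each slice, so no legend is needed. The label format built with `paste0()` is an illustrative choice.

```r
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
piepercent <- round(100 * x / sum(x), 1)
print(piepercent)   # 14.4 42.5 6.8 36.3
pie_labels <- paste0(labels, " (", piepercent, "%)")
pie(x, labels = pie_labels, col = rainbow(length(x)), main = "City pie chart")
```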
Bar charts
• A bar chart represents data in rectangular bars with length of the bar
proportional to the value of the variable.
• R uses the function barplot() to create bar charts.
• R can draw both vertical and horizontal bars in the bar chart.
• In bar chart each of the bars can be given different colors.
Bar charts – Syntax
• The basic syntax to create a bar-chart in R is −
• barplot(H, xlab, ylab, main, names.arg, col)
• Following is the description of the parameters used −
• – H is a vector or matrix containing numeric values used in bar chart.
• – xlab is the label for x axis.
• – ylab is the label for y axis.
• – main is the title of the bar chart.
• – names.arg is a vector of names appearing under each bar.
• – col is used to give colors to the bars in the graph.
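Horizontal bars use the same function with `horiz = TRUE`. `barplot()` invisibly returns the bar midpoints, which can be used to place labels. A sketch with made-up revenue figures; `las = 1` only turns the axis labels horizontal.

```r
H <- c(7, 12, 28, 3, 41)
M <- c("Mar", "Apr", "May", "Jun", "Jul")
mp <- barplot(H, names.arg = M, horiz = TRUE, las = 1, col = "blue",
              xlab = "Revenue", main = "Revenue chart (horizontal)")
text(H / 2, as.vector(mp), labels = H, col = "white")   # value inside each bar
print(mp)   # midpoint of each bar along the vertical axis
```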
Bar chart – Example
• # Create the data for the chart.
• H <- c(7,12,28,3,41)
• M <- c("Mar","Apr","May","Jun","Jul")
• # Plot the bar chart.
• barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue", col =
"blue", main = "Revenue chart", border= "red")
Boxplot
• Boxplots show how the data in a data set are distributed.
• The quartiles divide the data set into four parts. The graph represents the
minimum, maximum, median, first quartile and third quartile of the
data set (by default, points more than 1.5 times the interquartile range beyond the box are drawn individually as outliers).
• It is also useful in comparing the distribution of data across data sets
by drawing boxplots for each of them.
• Boxplots are created in R by using the boxplot() function.
Boxplot – Syntax
• The basic syntax to create a boxplot in R is −
• boxplot(x, data, notch, varwidth, names, main)
• Following is the description of the parameters used −
• – x is a vector or a formula
• – data is the data frame.
• – notch is a logical value. Set as TRUE to draw a notch.
• – varwidth is a logical value. Set as TRUE to draw the width of each box
proportional to the square root of the sample size.
• – names are the group labels which will be printed under each boxplot.
• – main is used to give a title to the graph.
boxplot() - example
• We use the data set "mtcars" available in the R environment to create
a basic boxplot. Let's look at the columns "mpg" and "cyl" in mtcars.
• input <- mtcars[,c('mpg','cyl')]
• print(head(input))
• # Plot the chart.
• boxplot(mpg ~ cyl, data = mtcars, xlab = "Number of Cylinders", ylab = "Miles Per Gallon", main = "Mileage Data")
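The numbers behind the boxes can be obtained without drawing, by passing `plot = FALSE`. The result's `stats` matrix holds, per group, the lower whisker, lower hinge, median, upper hinge and upper whisker.

```r
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)
print(b$names)   # "4" "6" "8": one box per number of cylinders
print(b$n)       # number of cars in each group
print(b$stats)   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```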
Histogram
• A histogram represents the frequencies of values of a variable
bucketed into ranges.
• A histogram is similar to a bar chart, but it groups the
values into continuous ranges.
• The height of each bar represents the number of values
in that range.
• R creates histogram using hist() function. This function takes a vector
as an input and uses some more parameters to plot histograms.
Histogram – Syntax
• The basic syntax for creating a histogram using R is −
• hist(v, main, xlab, xlim, ylim, breaks, col, border)
• Following is the description of the parameters used −
• v is a vector containing numeric values used in histogram.
• main indicates title of the chart.
• xlab is used to give description of x-axis.
• xlim is used to specify the range of values on the x-axis.
• ylim is used to specify the range of values on the y-axis.
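• breaks sets the bins: a single number is only a suggested number of bars, while a vector gives the exact break points (and so the width of each bar).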
• col is used to set color of the bars.
• border is used to set border color of each bar.
hist() – Example
• # Create data for the graph.
• v <- c(9,13,21,8,36,22,12,41,31,33,19)
• # Create the histogram.
• hist(v,xlab = "Weight",col = "green",border ="red", xlim = c(0,50), ylim =
c(0,5), breaks = 5)
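With a vector of break points, the bins are exactly those given. The object returned by `hist()` holds the bin edges and counts. A sketch using bins of width 10; bins are right-closed by default, so 10 falls in the first bin.

```r
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)
h <- hist(v, breaks = seq(0, 50, by = 10), col = "green", border = "red",
          xlab = "Weight", main = "Histogram of v")
print(h$breaks)   # 0 10 20 30 40 50
print(h$counts)   # 2 3 2 3 1
```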
Scatter Plot
• Scatterplots show many points plotted in the Cartesian plane.
• Each point represents the values of two variables.
• One variable is chosen in the horizontal axis and another in the
vertical axis.
• The simple scatterplot is created using the plot() function.
Scatter Plot – Example
• # Get the input values.
• input <- mtcars[,c('wt','mpg')]

• # Plot the chart, zooming in on weight between 2.5 and 5 and mileage
between 15 and 30 (xlim and ylim only set the visible range; points outside it are not drawn).
• plot(x = input$wt,y = input$mpg, xlab = "Weight", ylab = "Mileage",
xlim = c(2.5,5), ylim = c(15,30), main = "Weight vs Mileage" )
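If the goal is to keep only the cars inside the window, rather than to zoom in on it, the rows can be filtered first. A sketch using base R's `subset()`; the window limits are the ones from the comment above.

```r
input <- mtcars[, c("wt", "mpg")]
inside <- subset(input, wt >= 2.5 & wt <= 5 & mpg >= 15 & mpg <= 30)
plot(x = inside$wt, y = inside$mpg, xlab = "Weight", ylab = "Mileage",
     main = "Weight vs Mileage (cars inside the window only)")
print(nrow(inside))   # how many of the 32 cars fall inside the window
```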
Scatter Plot matrices
• When we have more than two variables and want to see how each variable
relates to each of the others, we use a
scatterplot matrix.
• We use pairs() function to create matrices of scatterplots.
• – Syntax:
• pairs(formula, data)
• formula represents the series of variables used in pairs.
• data represents the data set from which the variables will be taken.
Scatter Plot matrices – Example
• # Plot the matrices between 4 variables giving 12 plots.
• # One variable with 3 others and total 4 variables.
• pairs(~wt+mpg+disp+cyl, data = mtcars, main ="Scatterplot Matrix")
• More detailed: scatter plot in R
• https://youtu.be/LPeq9A1FCa0
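`pairs()` also accepts a data frame directly. `cor()` gives the correlation coefficients that the scatterplot matrix shows visually. A minimal sketch on the same four `mtcars` columns:

```r
vars <- mtcars[, c("wt", "mpg", "disp", "cyl")]
pairs(vars, main = "Scatterplot Matrix")   # same plot as the formula version
r <- cor(vars)                             # 4 x 4 matrix of pairwise correlations
print(round(r, 2))
```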
Packages
• You’re not the only person writing your own functions with R.
• Many professors, programmers, and statisticians use R to design tools
that can help people analyze data. They then make these tools free
for anyone to use.
• To use these tools, you just have to download them. They come as
preassembled collections of functions and objects called packages.
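The usual workflow is to install a package once and load it in every session that uses it. In this sketch `ggplot2` is only an example package name, and its lines are commented out because installing needs internet access. `splines` ships with R, so it can be loaded directly.

```r
# install.packages("ggplot2")   # download a CRAN package once (needs internet)
# library(ggplot2)              # then load it in every session that uses it
library(splines)                # ships with R, so no download is needed
print(search())                 # attached packages now include "package:splines"
print("stats" %in% rownames(installed.packages()))  # TRUE: stats comes with R
```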
Recorded video lectures available on YouTube
• Scatter plot in R
• https://youtu.be/LPeq9A1FCa0
• Functions in R (Differential Equations Part-1)
• https://youtu.be/FNyaoR1PUTo
• Differential Equations in R (Part-2)
• https://youtu.be/dbpvExzFooI
References:
• Zuur, Alain, Elena N. Ieno, and Erik Meesters. A Beginner's Guide to R. Springer Science & Business Media, 2009.
• Tutorialspoint. R Tutorial. https://www.tutorialspoint.com/r/index.htm
• Dalgaard, Peter. Introductory Statistics with R. Springer Science+Business Media, 2008.
• Grolemund, Garrett, and Hadley Wickham. R for Data Science. 2018.
