UNIT 2
A function in R is an object containing multiple interrelated statements that are run together in a
predefined order every time the function is called. Functions in R can be built-in or created by the
user (user-defined). The main purposes of creating a user-defined function are to streamline our program,
avoid repeating the same block of code for a task that is frequently performed in a particular project,
avoid the hard-to-debug errors that copy-paste operations tend to introduce, and make the code more
readable. A good practice is to create a function whenever we're supposed to run a certain set of
commands more than twice.
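As a minimal sketch of this practice, suppose the same temperature conversion is needed in several places. The function name to_fahrenheit and the sample values are hypothetical, chosen only for illustration:

```r
# Hypothetical helper: keep the conversion formula in one place
to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

# Reuse the function instead of copy-pasting the formula three times
morning <- to_fahrenheit(10)
noon <- to_fahrenheit(25)
evening <- to_fahrenheit(18)
print(c(morning, noon, evening))
```

If the formula ever needs to change, only the function body has to be edited.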
Built-in Functions in R
There are plenty of helpful built-in functions in R used for various purposes. Some of the most
popular ones are:
min(), max(), mean(), median() – return the minimum / maximum / mean / median value of
a numeric vector, correspondingly
exists() – returns TRUE or FALSE depending on whether or not a variable is defined in the R
environment
vector <- c(3, 5, 2, 3, 1, 4)
print(min(vector))
print(mean(vector))
print(median(vector))
print(sum(vector))
print(range(vector))
print(str(vector))
print(length(vector))
print(sort(vector, decreasing=TRUE))
print(exists('vector'))
[1] 1
[1] 3
[1] 3
[1] 18
[1] 1 5
num [1:6] 3 5 2 3 1 4
NULL
[1] 6
[1] 5 4 3 3 2 1
[1] TRUE
Creating a Function in R
While applying built-in functions facilitates many common tasks, often we need to create our own
function to automate the performance of a particular task. To declare a user-defined function in R,
we use the keyword function. The syntax is as follows:
function_name <- function(parameters){
  function body
}
Above, the main components of an R function are: function name, function parameters,
and function body. Let's take a look at each of them separately.
Function Name
This is the name of the function object that will be stored in the R environment after the function
definition and used for calling that function. It should be concise but clear and meaningful so that
the user who reads our code can easily understand what exactly this function does. For example, if
we need to create a function for calculating the circumference of a circle with a known radius, we'd
better call this function circumference rather than function_1 or circumference_of_a_circle. (Side
note: While commonly we use verbs in function names, it's ok to use just a noun if that noun is very
descriptive and unambiguous.)
Function Parameters
Sometimes, they are called formal arguments. Function parameters are the variables in the function
definition placed inside the parentheses and separated with a comma that will be set to actual values
(called arguments) each time we call the function. For example:
circumference <- function(r){
  2*pi*r
}
print(circumference(2))
[1] 12.56637
Above, we created a function to calculate the circumference of a circle with a known radius using the
formula C=2πr, so the function has the only parameter r. After defining the function, we called it with
the radius equal to 2 (hence, with the argument 2).
It's possible, even though rarely useful, for a function to have no parameters:
hello_world <- function(){
  'Hello, World!'
}
print(hello_world())
[1] "Hello, World!"
Also, some parameters can be set to default values (those related to a typical case) inside the
function definition, which then can be reset when calling the function. Returning to
our circumference function, we can set the default radius of a circle as 1, so if we call the function
with no argument passed, it will calculate the circumference of a unit circle (i.e., a circle with a radius
of 1). Otherwise, it will calculate the circumference of a circle with the provided radius:
circumference <- function(r=1){
  2*pi*r
}
print(circumference())
print(circumference(2))
[1] 6.283185
[1] 12.56637
Function Body
The function body is a set of commands inside the curly braces that are run in a predefined order
every time we call the function. In other words, in the function body, we place what exactly we need
the function to do:
sum_two_nums <- function(x, y){
  x + y
}
print(sum_two_nums(1, 2))
[1] 3
Note that the statements in the function body (in the above example – the only statement x + y)
should be indented by 2 or 4 spaces, depending on the IDE where we run the code, but the
important thing is to be consistent with the indentation throughout the program. While it doesn't
affect the code performance and isn't obligatory, it makes the code easier to read.
It's possible to drop the curly braces if the function body contains a single statement. For example:
sum_two_nums <- function(x, y) x + y
print(sum_two_nums(1, 2))
[1] 3
As we saw from all the above examples, in R, it usually isn't necessary to explicitly include the return
statement when defining a function since an R function automatically returns the last evaluated
expression in the function body. However, we can still add the return statement inside the function
body using the syntax return(expression_to_be_returned). This is useful when we want to exit a
function early or make explicit what is returned, for instance, when returning more than one result
from a function. For example:
mean_median <- function(vector){
  mean <- mean(vector)
  median <- median(vector)
  return(c(mean, median))
}
print(mean_median(c(1, 1, 1, 2, 3)))
[1] 1.6 1.0
Note that in the return statement above, we actually return a vector containing the necessary
results, and not just the variables separated by a comma (since the return() function can return only
a single R object). Instead of a vector, we could also return a list, especially if the results to be
returned are supposed to be of different data types.
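To illustrate the last remark, here is a minimal sketch of a function that returns a list. The function name describe_vector and the element names are hypothetical:

```r
# Hypothetical example: return results of different types in one list
describe_vector <- function(v) {
  return(list(
    mean = mean(v),                  # numeric
    is_sorted = !is.unsorted(v),     # logical
    label = paste("n =", length(v))  # character
  ))
}

result <- describe_vector(c(1, 1, 1, 2, 3))
print(result$mean)
print(result$is_sorted)
print(result$label)
```

Unlike a vector, the list keeps each result in its own type, and the elements can be retrieved by name with the $ operator.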
Calling a Function in R
In all the above examples, we have already called the created functions many times. To do so, we
just wrote the function name and added the necessary arguments inside the parentheses. In R, function
arguments can be passed by position, by name (so-called named arguments), by mixing position-based
and name-based matching, or omitted altogether when the parameters have default values.
If we pass the arguments by position, we need to follow the same sequence of arguments as defined
in the function:
subtract_two_nums <- function(x, y){
  x - y
}
print(subtract_two_nums(3, 1))
[1] 2
In the above example, x is equal to 3 and y – to 1, and not vice versa.
If we pass the arguments by name, i.e., explicitly specify what value each parameter defined in the
function takes, the order of the arguments doesn't matter:
subtract_two_nums <- function(x, y){
  x - y
}
print(subtract_two_nums(x=3, y=1))
print(subtract_two_nums(y=1, x=3))
[1] 2
[1] 2
Since we explicitly assigned x=3 and y=1, we can pass them either as x=3, y=1 or y=1, x=3 – the result
will be the same.
It's possible to mix position- and name-based matching of the arguments. Let's look at the example
of the function for calculating BMR (basal metabolic rate), or daily consumption of calories, for
women based on their weight (in kg), height (in cm), and age (in years). The formula that will be used
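in the function is the Mifflin-St Jeor equation for women:
BMR = (10 × weight) + (6.25 × height) − (5 × age) − 161
calculate_calories_women <- function(weight, height, age){
  (10 * weight) + (6.25 * height) - (5 * age) - 161
}
Now, let's calculate the calories for a woman 30 years old, with a weight of 60 kg and a height of 165
cm. However, for the age parameter, we'll pass the argument by name and for the other two
parameters, we'll pass the arguments by position:
print(calculate_calories_women(age=30, 60, 165))
[1] 1320.25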
In the case like above (when we mix matching by name and by position), the named arguments are
extracted from the whole succession of arguments and are matched first, while the rest of the
arguments are matched by position, i.e., in the same order as they appear in the function definition.
However, this practice isn't recommended and can lead to confusion.
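The matching order described above can be checked with a small sketch. The function show_args is hypothetical and exists only to display which value ends up in which parameter:

```r
# Hypothetical function used only to show how arguments are matched
show_args <- function(a, b, c) {
  paste("a =", a, "| b =", b, "| c =", c)
}

# c is matched by name first; 1 and 2 then fill a and b by position
print(show_args(c = 3, 1, 2))

# A fully named call gives the same result and is usually easier to read
print(show_args(a = 1, b = 2, c = 3))
```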
Finally, we can omit some (or all) of the arguments at all. This can happen if we set some (or all) of
the parameters to default values inside the function definition. Let's return to
our calculate_calories_women function and set the default age of a woman as 30 y.o.:
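calculate_calories_women <- function(weight, height, age=30){
  (10 * weight) + (6.25 * height) - (5 * age) - 161
}
print(calculate_calories_women(60, 165))
[1] 1320.25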
Control Statements in R
Control statements are expressions used to control the execution and flow of the program based on
the conditions provided in the statements. These structures are used to make a decision after
assessing a variable. In this section, we'll discuss the control statements with examples:
if condition
if-else condition
for loop
nested loops
while loop
repeat and break statement
return statement
next statement
if condition
This control structure checks whether the expression provided in parentheses is true. If it is, the
statements in braces {} are executed.
Syntax:
if(expression){
  statements
  ....
  ....
}
Example:
x <- 100
if(x > 10){
  print(paste(x, "is greater than 10"))
}
Output:
[1] "100 is greater than 10"
if-else condition
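It is similar to the if condition, but when the test expression in the if condition is false, the
statements in the else block are executed.
Syntax:
if(expression){
  statements
  ....
  ....
}else{
  statements
  ....
  ....
}
Example:
x <- 5
# Check whether the value is greater than 10
if(x > 10){
  print(paste(x, "is greater than 10"))
}else{
  print(paste(x, "is not greater than 10"))
}
Output:
[1] "5 is not greater than 10"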
for loop
It is a type of loop in which a sequence of statements is executed once for each element of a
vector or list.
Syntax:
for(value in vector){
  statements
  ....
  ....
}
Example:
x <- letters[4:10]
for(i in x){
  print(i)
}
Output:
[1] "d"
[1] "e"
[1] "f"
[1] "g"
[1] "h"
[1] "i"
[1] "j"
Nested loops
Nested loops are similar to simple loops. Nested means a loop placed inside another loop. Nested
loops are commonly used to traverse a matrix.
Example:
# Defining matrix
m <- matrix(2:15, 2)
for (r in seq(nrow(m))) {
  for (c in seq(ncol(m))) {
    print(m[r, c])
  }
}
Output:
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
[1] 12
[1] 14
[1] 3
[1] 5
[1] 7
[1] 9
[1] 11
[1] 13
[1] 15
while loop
The while loop is another kind of loop, which is repeated as long as a condition remains true. The
test expression is checked before each execution of the loop body.
Syntax:
while(expression){
  statement
  ....
  ....
}
Example:
x = 1
# Print 1 to 5
while(x <= 5){
  print(x)
  x = x + 1
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
repeat and break statement
repeat is a loop that runs indefinitely because it has no exit condition of its own, so a break
statement is used to exit from it. The break statement can be used in any type of loop to exit from
the loop.
Syntax:
repeat {
  statements
  ....
  ....
  if(expression) {
    break
  }
}
Example:
x = 1
# Print 1 to 5
repeat{
  print(x)
  x = x + 1
  if(x > 5){
    break
  }
}
Output:
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
return statement
The return statement is used to return the result of an executed function and hand control back to
the caller.
Syntax:
return(expression)
Example:
func <- function(x){
  if(x > 0){
    return("Positive")
  }else if(x < 0){
    return("Negative")
  }else{
    return("Zero")
  }
}
func(1)
func(0)
func(-1)
Output:
[1] "Positive"
[1] "Zero"
[1] "Negative"
next statement
The next statement is used to skip the rest of the current iteration and continue with the next
iteration without terminating the loop.
Example:
# Defining vector
x <- 1:10
# Print even numbers
for(i in x){
  if(i%%2 != 0){
    next # skip odd numbers
  }
  print(i)
}
Output:
[1] 2
[1] 4
[1] 6
[1] 8
[1] 10
Data Manipulation with dplyr
To manipulate data, R offers a package called dplyr, which consists of many functions for this
purpose. To use these data manipulation functions, we first need to load the dplyr package with
library(dplyr). Below is a list of a few data manipulation functions present in the dplyr
package.
filter() method
The filter() function is used to produce the subset of the data that satisfies the condition specified in
the filter() method. In the condition, we can use conditional operators, logical operators, NA values,
range operators etc. to filter out data. Syntax of filter() function is given below-
filter(dataframeName, condition)
Example:
In the below code we used filter() function to fetch the data of players who scored more than 100
runs from the “stats” data frame.
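library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, 19),
                    wickets=c(17, 20, NA, 5))
# fetch players who scored more than 100 runs
filter(stats, runs>100)
Output
  player runs wickets
1      B  200      20
2      C  408      NA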
distinct() method
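The distinct() method removes duplicate rows from a data frame, considering either all columns or
only the specified ones. The syntax of the distinct() method is given below-
distinct(dataframeName, col1, col2, ..., .keep_all=TRUE)
Example:
Here in this example, we used the distinct() method to remove the duplicate rows from the data frame
and also to remove duplicates based on a specified column.
library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D', 'A', 'A'),
                    runs=c(100, 200, 408, 19, 56, 100),
                    wickets=c(17, 20, NA, 5, 2, 17))
# remove duplicate rows
distinct(stats)
# remove duplicates based on the player column
distinct(stats, player, .keep_all = TRUE)
Output
  player runs wickets
1      A  100      17
2      B  200      20
3      C  408      NA
4      D   19       5
5      A   56       2
  player runs wickets
1      A  100      17
2      B  200      20
3      C  408      NA
4      D   19       5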
arrange() method
In R, the arrange() method is used to order the rows based on a specified column. The syntax of
arrange() method is specified below-
arrange(dataframeName, columnName)
Example:
In the below code we ordered the data based on the runs from low to high using arrange() function.
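# import dplyr package
library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, 19),
                    wickets=c(17, 20, NA, 5))
# order the rows by runs, from low to high
arrange(stats, runs)
Output
  player runs wickets
1      D   19       5
2      A  100      17
3      B  200      20
4      C  408      NA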
select() method
The select() method is used to extract the required columns as a table by specifying the required
column names in select() method. The syntax of select() method is mentioned below-
select(dataframeName, col1,col2,…)
Example:
Here in the below code we fetched the player, wickets column data only using select() method.
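library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, 19),
                    wickets=c(17, 20, NA, 5))
# fetch only the player and wickets columns
select(stats, player,wickets)
Output
  player wickets
1      A      17
2      B      20
3      C      NA
4      D       5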
rename() method
The rename() function is used to change the column names. This can be done by the below syntax-
rename(dataframeName, newName=oldName)
Example:
In this example, we change the column name “runs” to “runs_scored” in stats data frame.
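library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, 19),
                    wickets=c(17, 20, NA, 5))
# rename the runs column to runs_scored
rename(stats, runs_scored=runs)
Output
  player runs_scored wickets
1      A         100      17
2      B         200      20
3      C         408      NA
4      D          19       5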
mutate() and transmute() methods
These methods are used to create new variables. The mutate() function creates new variables
without dropping the old ones, whereas the transmute() function drops the old variables and keeps
only the new ones. The syntax of both methods is mentioned below-
mutate(dataframeName, newVariable=formula)
transmute(dataframeName, newVariable=formula)
Example:
In this example, we created a new column avg using mutate() and transmute() methods.
library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, 19),
                    wickets=c(17, 20, 7, 5))
# add the new column avg and keep the old columns
mutate(stats, avg=runs/4)
# create the new column avg and drop the old columns
transmute(stats, avg=runs/4)
Output
  player runs wickets    avg
1      A  100      17  25.00
2      B  200      20  50.00
3      C  408       7 102.00
4      D   19       5   4.75
     avg
1  25.00
2  50.00
3 102.00
4   4.75
Here the mutate() function adds a new column to the existing data frame without dropping the old
ones, whereas the transmute() function creates the new variable but drops all the old columns.
summarize() method
Using the summarize() method we can summarize the data in the data frame by using aggregate
functions like sum(), mean(), etc. The syntax of the summarize() method is specified below-
summarize(dataframeName, aggregate_function(columnName))
Example:
In the below code we presented the summarized data present in the runs column using the summarize()
method.
library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D'),
                    runs=c(100, 200, 408, 19),
                    wickets=c(17, 20, NA, 5))
# summarize method
summarize(stats, sum(runs), mean(runs))
Output
  sum(runs) mean(runs)
1       727     181.75
Data Reshaping in R
Generally, in the R programming language, data processing is done by taking data as input from a data
frame, where the data is organized into rows and columns. Data frames are mostly used because
extracting data from them is simple. But sometimes we need to reshape the data frame into a
format different from the one we receive. Hence, in R, we can split, merge and reshape the data frame
using various functions.
While doing an analysis, the data obtained from an experiment or study often arrives in a shape
that differs from the one the analysis needs. The obtained data usually has one or more columns that
identify a row, followed by a number of columns that represent the measured values. The columns
that identify a row play the role of a composite key in a database table.
Transpose of a Matrix
We can easily calculate the transpose of a matrix in R language with the help of the t() function. The
t() function takes a matrix or data frame as an input and gives the transpose of that matrix or data
frame as its output.
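Syntax:
t(Matrix/data frame)
Example:
first <- matrix(c(1:12), nrow=4, byrow=TRUE)
print("Original Matrix")
first
first <- t(first)
print("Transpose of the Matrix")
first
Output:
[1] "Original Matrix"
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[4,]   10   11   12
[1] "Transpose of the Matrix"
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12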
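Joining Rows and Columns in Data Frame
In R, we can join two vectors or merge two data frames using functions. There are basically two
functions that perform these tasks:
cbind():
We can combine vectors, matrices or data frames by columns using the cbind() function.
rbind():
We can combine vectors, matrices or data frames by rows using the rbind() function.
Example:
name <- c("shaoni", "esha", "soumi")
age <- c("24", "53", "29")
address <- c("puducherry", "kolkata", "delhi")
# Cbind function
info <- cbind(name, age, address)
print(info)
# creating a new data frame
newd <- data.frame(name=c("sounak", "bhabani"),
age=c("28", "87"),
address=c("bangalore", "kolkata"))
# Rbind function
new.info <- rbind(info, newd)
print(new.info)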
Output:
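Merging two Data Frames
In R, we can merge two data frames using the merge() function, provided the data frames share
at least one column name. The data frames are merged on a key value taken from the shared columns.
d1 <- data.frame(name=c("shaoni", "soumi", "arjun"),
ID=c("111", "112", "113"))
d2 <- data.frame(name=c("sounak", "esha"),
ID=c("114", "115"))
total <- merge(d1, d2, all=TRUE)
print(total)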
Output:
name ID
1 arjun 113
2 shaoni 111
3 soumi 112
4 esha 115
5 sounak 114
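When the key column should be stated explicitly, merge() accepts a by argument. The following is a minimal sketch; the data frames scores and cities and their values are made up for illustration:

```r
# Hypothetical data frames sharing the key column ID
scores <- data.frame(ID = c("111", "112", "113"),
                     score = c(80, 65, 92))
cities <- data.frame(ID = c("112", "113", "114"),
                     city = c("delhi", "kolkata", "pune"))

# Inner join: keep only the IDs present in both data frames
inner <- merge(scores, cities, by = "ID")
print(inner)

# Full outer join: keep every ID and fill the gaps with NA
outer <- merge(scores, cities, by = "ID", all = TRUE)
print(outer)
```

With all = TRUE, rows that have no partner in the other data frame are kept, and the missing values appear as NA.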
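Melting and Casting
Data reshaping involves many steps in order to obtain the desired format. One of the popular
methods is melting the data, which converts each row into unique id-variable combinations, and then
casting it. The two functions used for this process are:
melt():
Syntax: melt(data, ..., na.rm = FALSE, value.name = "value")
where,
data: data to be melted
...: further arguments passed to the melting method
na.rm: whether NA values should be removed from the molten data
value.name: name of the column that stores the values
dcast():
Syntax: dcast(data, formula, fun.aggregate)
where,
data: molten data to be cast
formula: formula that defines how to cast
fun.aggregate: aggregation function, used if the cast requires data aggregation
Example:
library(reshape2)
a <- data.frame(id = c("1", "1", "2", "2"),
                points = c("1", "2", "1", "2"),
                x1 = c(5, 3, 6, 2),
                x2 = c(6, 5, 1, 4))
print("Melting")
m <- melt(a, id = c("id", "points"))
print(m)
print("Casting")
idmn <- dcast(m, id ~ variable, mean)
print(idmn)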
Output:
[1] "Melting"
[1] "Casting"
id x1 x2
1 1 4 5.5
2 2 4 2.5
String Manipulation in R
String manipulation basically refers to the process of handling and analyzing strings. It involves
various operations concerned with modification and parsing of strings to use and change its data. R
offers a series of in-built functions to manipulate the contents of a string. In this article, we will study
different functions concerned with the manipulation of strings in R.
Concatenation of Strings
String concatenation is the technique of combining two or more strings. It can be done
in several ways:
paste() function Any number of strings can be concatenated together using the paste()
function, where sep specifies the separator placed between the strings. Syntax:
paste(..., sep = " ", collapse = NULL)
Example:
str <- paste("Learn", "Code")
print (str)
Output:
[1] "Learn Code"
In case no separator is specified, the default separator " " is inserted between individual
strings. Example:
str <- paste(c(1:3), "4", sep = ":")
print (str)
Output:
[1] "1:4" "2:4" "3:4"
Since the objects to be concatenated are of different lengths, the shorter one is repeated
to match the longer one. The first argument is the sequence 1, 2, 3, each element of
which is concatenated with the other string "4" using the separator ':'.
str <- paste(c(1:4), c(5:8), sep = "--")
print (str)
Output:
[1] "1--5" "2--6" "3--7" "4--8"
Since both vectors are of the same length, their corresponding elements are
concatenated, that is, the first element of the first vector is concatenated with the first
element of the second vector using the separator '--'.
cat() function Different types of strings can be concatenated together using the cat()
function in R, where sep specifies the separator placed between the strings and file gives the
file name, in case we wish to write the contents to a file. Syntax:
cat(..., file = "", sep = " ")
Example:
str <- cat("learn", "code", "tech", sep = ":")
print (str)
Output:
learn:code:techNULL
The output string is printed without any quotes, using the specified separator ':'. NULL is
appended at the end because cat() returns NULL, which print(str) then displays. Example:
cat(c(1:5), file = 'sample.txt')
Output:
1 2 3 4 5
The output is written to a text file sample.txt in the working directory.
length() function The length() function determines the number of strings in a character
vector. Example:
print (length(c("Learn to", "Code")))
Output:
[1] 2
nchar() function nchar() counts the number of characters in each of the strings passed to
the function. Example:
print (nchar(c("Learn", "Code")))
Output:
[1] 5 4
The output gives the length of "Learn" and then of "Code", separated by a space.
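Conversion to upper case All the characters of the strings specified are converted to upper
case with toupper(). Example:
print (toupper("Learn Code"))
Output:
[1] "LEARN CODE"
Conversion to lower case All the characters of the strings specified are converted to lower
case with tolower(). Example:
print (tolower("Learn Code"))
Output:
[1] "learn code"
casefold() function All the characters of the strings specified are converted to lowercase or
uppercase according to the upper argument in casefold(..., upper=TRUE). Examples:
print (casefold("Learn Code", upper = TRUE))
Output:
[1] "LEARN CODE"
print (casefold("Learn Code", upper = FALSE))
Output:
[1] "learn code"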
Character replacement
Characters can be translated using the chartr(old, new, x) function in R, where every
instance of each character in old is replaced by the corresponding character in new within the
strings x. Example 1:
print (chartr("a", "A", "An honest man gave that"))
Output:
[1] "An honest mAn gAve thAt"
Example 2:
print (chartr("is", "#@", "This is it"))
Output:
[1] "Th#@ #@ #t"
Every instance of each old character is replaced by the new character at the corresponding
position: "i" is replaced by "#" and "s" by "@". Example 3:
print (chartr("ate", "#@", "I hate ate"))
Output:
Error in chartr("ate", "#@", "I hate ate") : 'old' is longer than 'new'
Execution halted
The old string must not be longer than the new string.
A string can be split into individual substrings using the strsplit() function, here with " "
as the separator. Example:
strsplit("Learn Code Teach !", " ")
Output:
[[1]]
[1] "Learn" "Code"  "Teach" "!"
The substr(x, start, stop) or substring(x, first, last) function in R extracts a substring from a
string, beginning at the start index and ending at the end index. Used on the left-hand side of
an assignment, it also replaces the specified substring with a new set of characters. Example:
substr("Learn Code Tech", 1, 4)
Output:
[1] "Lear"
str <- c("with", "new", "language")
substr(str, 3, 3) <- "%"
print(str)
Output:
[1] "wi%h"     "ne%"      "la%guage"
Replaces the third character of every string with the % sign.
str <- c("with", "new", "language")
substr(str, 3, 3) <- c("%", "@")
print(str)
Output:
[1] "wi%h"     "ne@"      "la%guage"
Replaces the third character of each string alternately with the specified symbols.
Data Structures in R
A data structure is a particular way of organizing data in a computer so that it can be used effectively.
The idea is to reduce the space and time complexities of different tasks. Data structures in R
programming are tools for holding multiple values.
R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and whether
they’re homogeneous (all elements must be of the same type) or heterogeneous (the elements
can be of different types). This gives rise to the data structures that are most frequently used in
data analysis:
Vectors
Lists
Dataframes
Matrices
Arrays
Factors
Tibbles
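The difference between homogeneous and heterogeneous structures can be seen in a short sketch. The values are arbitrary; only the resulting types matter:

```r
# A vector is homogeneous: mixed inputs are coerced to one common type
v <- c(1, "a", TRUE)
print(class(v))

# A list is heterogeneous: each element keeps its own type
l <- list(1, "a", TRUE)
print(sapply(l, class))
```

In the vector all three values become character strings, while the list preserves the numeric, character and logical types.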
Vectors
A vector is an ordered collection of basic data types of a given length. The key point is that all
the elements of a vector must be of the same data type, i.e., vectors are homogeneous data
structures. Vectors are one-dimensional data structures.
Example:
X = c(1, 3, 5, 7, 8)
print(X)
Output:
[1] 1 3 5 7 8
Lists
A list is a generic object consisting of an ordered collection of objects. Lists are heterogeneous data
structures. These are also one-dimensional data structures. A list can be a list of vectors, list of
matrices, a list of characters and a list of functions and so on.
Example:
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(empId, empName, numberOfEmp)
print(empList)
Output:
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"
[[3]]
[1] 4
Dataframes
Dataframes are generic data objects of R which are used to store tabular data. Dataframes are
the most popular data objects in R programming because we are comfortable seeing data
in tabular form. They are two-dimensional, heterogeneous data structures. These are lists of
vectors of equal lengths.
A data frame must have column names and every row should have a unique name.
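A minimal sketch of creating a data frame from vectors of equal length follows. The column names and values are hypothetical and serve only as an illustration:

```r
# Hypothetical employee records; the names and values are illustrative only
Name <- c("Amiya", "Raj", "Asish")
Language <- c("R", "Python", "Java")
Age <- c(22, 25, 45)

# Each vector becomes a column; all columns must have the same length
df <- data.frame(Name, Language, Age)
print(df)

# Columns may hold different types, and rows get unique names by default
print(sapply(df, class))
print(rownames(df))
```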
Matrices
A matrix is a rectangular arrangement of numbers in rows and columns. In a matrix, as we know rows
are the ones that run horizontally and columns are the ones that run vertically. Matrices are two-
dimensional, homogeneous data structures.
Now, let’s see how to create a matrix in R. To create a matrix in R you need to use the function called
matrix(). Its first argument is the vector of elements. You also pass the number of rows and the
number of columns you want to have in your matrix. An important point to remember is that, by
default, matrices are filled in column-wise order.
Example:
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3, ncol = 3,
# fill row by row (the default is column-wise)
byrow = TRUE
)
print(A)
Output:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
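To make the fill order concrete, the following sketch builds the same values into a matrix twice, once with the default column-wise order and once with byrow = TRUE. The variable names are chosen for illustration:

```r
values <- 1:6

# Default (byrow = FALSE): values fill the first column, then the next
by_col <- matrix(values, nrow = 2, ncol = 3)
print(by_col)

# byrow = TRUE: values fill the first row, then the next
by_row <- matrix(values, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row)
```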
Arrays
Arrays are the R data objects which can store data in more than two dimensions. Arrays are
n-dimensional data structures. For example, if we create an array of dimensions (2, 3, 3) then it
creates 3 rectangular matrices each with 2 rows and 3 columns. They are homogeneous data structures.
Now, let’s see how to create arrays in R. To create an array in R you need to use the function called
array(). Its arguments are the vector of elements and a vector containing the dimensions of the
array.
Example:
A = array(
c(1, 2, 3, 4, 5, 6, 7, 8),
dim = c(2, 2, 2)
)
print(A)
Output:
,,1
[,1] [,2]
[1,] 1 3
[2,] 2 4
,,2
[,1] [,2]
[1,] 5 7
[2,] 6 8
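The (2, 3, 3) case mentioned in the text can be sketched as follows; the values 1 to 18 are arbitrary filler:

```r
# 18 values arranged as 3 matrices, each with 2 rows and 3 columns
arr <- array(1:18, dim = c(2, 3, 3))

print(dim(arr))       # the dimension vector
print(arr[, , 3])     # the third 2 x 3 matrix
print(arr[2, 3, 1])   # row 2, column 3 of the first matrix
```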
Factors
Factors are the data objects which are used to categorize the data and store it as levels. They are
useful for storing categorical data. They can store both strings and integers. They are useful to
categorize unique values in columns like “TRUE” or “FALSE”, or “MALE” or “FEMALE”, etc. They are
useful in data analysis for statistical modeling.
Now, let’s see how to create factors in R. To create a factor in R you need to use the function called
factor(). The argument to factor() is a vector.
Example:
fac = factor(c("Male", "Female", "Male", "Male", "Female", "Male", "Female"))
print(fac)
Output:
[1] Male   Female Male   Male   Female Male   Female
Levels: Female Male
Tibbles
Tibbles are an enhanced version of data frames in R, part of the tidyverse. They offer improved
printing, stricter column types, consistent subsetting behavior, and allow variables to be referred to
as objects. Tibbles provide a modern, user-friendly approach to tabular data in R.
Now, let’s see how we can create a tibble in R. To create tibbles in R we can use the tibble function
from the tibble package, which is part of the tidyverse.
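Example:
library(tibble)
my_data <- tibble(
name = c("Sandeep", "Amit", "Aman"),
age = c(25, 30, 35),
city = c("Pune", "Jaipur", "Delhi")
)
print(my_data)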
Output:
R – Matrices
Creating a Matrix in R
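To create a matrix in R, use the matrix() function. Its first argument is the vector of elements.
You also pass the number of rows and the number of columns you want to have in your matrix.
Syntax: matrix(data, nrow, ncol, byrow, dimnames)
Parameters:
data: the vector of values
nrow: no of rows
ncol: no of columns
byrow: if TRUE, the matrix is filled row by row instead of column by column
dimnames: names of the rows and columns
Example:
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
# No of rows
nrow = 3,
# No of columns
ncol = 3,
# Fill the matrix row by row
byrow = TRUE
)
# Naming rows
rownames(A) = c("a", "b", "c")
# Naming columns
colnames(A) = c("c", "d", "e")
print(A)
Output
  c d e
a 1 2 3
b 4 5 6
c 7 8 9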
R allows the creation of various different types of matrices with the use of arguments passed to the
matrix() function.
1. Matrix where all rows and columns are filled by a single constant ‘k’:
Syntax: matrix(k, m, n)
Parameters:
k: the constant
m: no of rows
n: no of columns
Example:
# R program to illustrate
# special matrices
print(matrix(5, 3, 3))
Output
[1,] 5 5 5
[2,] 5 5 5
[3,] 5 5 5
2. Diagonal matrix:
A diagonal matrix is a matrix in which the entries outside the main diagonal are all zero. To create
such a R matrix the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: the constants/array
m: no of rows
n: no of columns
Example:
# R program to illustrate
# special matrices
print(diag(c(5, 3, 3), 3, 3))
Output
[1,] 5 0 0
[2,] 0 3 0
[3,] 0 0 3
3. Identity matrix:
An identity matrix is a square matrix in which all the elements of the principal diagonal are ones and
all other elements are zeros. To create such an R matrix, the syntax is given below:
Syntax: diag(k, m, n)
Parameters:
k: 1
m: no of rows
n: no of columns
Example:
# R program to illustrate
# special matrices
print(diag(1, 3, 3))
Output
[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1
4. Matrix Metrics
Matrix metrics tell you about the Matrix you created. You might want to know the number of rows,
number of columns, dimensions of a Matrix.
Below Example will help you in answering following questions:
How can you know how many rows are there in the matrix?
Example:
# R program to illustrate
# matrix metrics
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
print(dim(A))
cat("Number of rows:\n")
print(nrow(A))
cat("Number of columns:\n")
print(ncol(A))
cat("Number of elements:\n")
print(length(A))
# OR
print(prod(dim(A)))
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1] 3 3
Number of rows:
[1] 3
Number of columns:
[1] 3
Number of elements:
[1] ...
We can access elements in R matrices using the same convention that is followed in data frames:
write the matrix name followed by square brackets that contain two indices separated by a comma.
The value before the comma selects rows and the value after the comma selects
columns. Let’s illustrate this with some simple R code.
Accessing rows:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
print(A[1:2, ])
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1,] 1 2 3
[2,] 4 5 6
Accessing columns:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
print(A[, 1:2])
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2]
[1,] 1 2
[2,] 4 5
[3,] 7 8
Accessing elements:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
# Accessing 2
print(A[1, 2])
# Accessing 6
print(A[2, 3])
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1] 2
[1] 6
Accessing Submatrices in R:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
cat("Accessing the first three rows and the first two columns\n")
print(A[1:3, 1:2])
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
Accessing the first three rows and the first two columns
[,1] [,2]
[1,] 1 2
[2,] 4 5
[3...
Modifying elements of an R-matrix by direct assignment:
Example:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
# Editing the element in the 3rd row and 3rd column
# from 9 to 30
# by direct assignment
A[3, 3] = 30
print(A)
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 30
R-Matrix Concatenation
Concatenation of a row:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
B = matrix(
c(10, 11, 12),
nrow = 1,
ncol = 3
)
print(B)
C = rbind(A, B)
print(C)
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1,] 10 11 12
Concatenation of a column:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
B = matrix(
c(10, 11, 12),
nrow = 3,
ncol = 1,
byrow = TRUE
)
print(B)
# Add a new column using cbind()
C = cbind(A, B)
print(C)
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1]
[1,] 10
[2,] 11
[3,] 12
Dimension inconsistency: Note that you have to make sure the dimensions of the matrices are
consistent before you concatenate them: rbind() requires the same number of columns and cbind()
requires the same number of rows.
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
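B = matrix(
c(10, 11, 12),
nrow = 1,
ncol = 3
)
print(B)
# A has 3 rows but B has only 1, so cbind() fails
C = cbind(A, B)
print(C)
Output:
Error in cbind(A, B) : number of rows of matrices must match (see arg 2)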
To add a row in R-matrix you can use rbind() function and to add a column to R-matrix you can
use cbind() function.
Adding Row
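Let’s see the example below on how to add a row to an R-matrix.
Example:
number <- matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(number)
# New row to be added
new_row <- c(10, 11, 12)
number <- rbind(number, new_row)
print(number)
Output
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
        [,1] [,2] [,3]
           1    2    3
           4    5    6
           7    8    9
new_row   10   11   12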
Adding Column
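number <- matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(number)
# New column to be added
new_column <- c(10, 11, 12)
number <- cbind(number, new_column)
cat("After adding a new column:\n")
print(number)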
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
After adding a new column:
new_column
[1,] 1 2 3 10
[2,] 4 5 6 1...
To delete a row or a column, index the matrix with a negative sign before the row or column
number. The negative index tells R to drop that row or column and keep the rest.
Row deletion:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
# 2nd-row deletion
A = A[-2, ]
print(A)
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[1,] 1 2 3
[2,] 7 8 9
Column deletion:
# R program to illustrate
A = matrix(
c(1, 2, 3, 4, 5, 6, 7, 8, 9),
nrow = 3,
ncol = 3,
byrow = TRUE
)
print(A)
# 2nd-column deletion
A = A[, -2]
print(A)
Output
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[,1] [,2]
[1,] 1 3
[2,] 4 6
[3,] 7 9
R – Array
Arrays are essential data storage structures defined by a fixed number of dimensions. Arrays are used
for the allocation of space at contiguous memory locations.
In the R programming language, uni-dimensional arrays are called vectors, with the length being their
only dimension. Two-dimensional arrays are called matrices, consisting of fixed numbers of rows and
columns. All elements of an R array are of the same data type. The array() function takes vectors as
input and arranges their values into an array with the given dimensions.
Creating an Array
An R array can be created with the array() function. A vector of elements is passed to the
array() function along with the required dimensions.
Syntax:
array(data, dim = c(nrow, ncol, nmat), dimnames = names)
where
nrow: number of rows
ncol: number of columns
nmat: number of matrices of dimensions nrow * ncol
dimnames: default value is NULL.
Otherwise, a list has to be specified which has a name for each component of the dimension. Each
component is either NULL or a vector of length equal to the dim value of that corresponding
dimension.
Uni-Dimensional Array
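A vector is a uni-dimensional array, which is specified by a single dimension, length. A vector can be
created using the c() function. A list of values is passed to the c() function to create a vector.
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print (vec1)
# cat() concatenates its arguments and prints them
cat ("Length of vector : ", length(vec1))
Output: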
[1] 1 2 3 4 5 6 7 8 9
Length of vector : 9
Multi-Dimensional Array
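A two-dimensional matrix is an array specified by a fixed number of rows and columns, each
containing the same data type. A multi-dimensional array is created by using the array() function,
to which the values and the dimensions are passed.
# arranges the values from 2 to 13
# in two matrices of dimensions 2x3
arr = array(2:13, dim = c(2, 3, 2))
print(arr)
Output: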
,,1
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 3 5 7
,,2
[,1] [,2] [,3]
[1,] 8 10 12
[2,] 9 11 13
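Vectors of different lengths can also be fed as input into the array() function. The total
number of elements in all the vectors combined should equal the number of elements in the
array (the product of its dimensions); otherwise R recycles or truncates the values. The elements are
arranged in the order in which they are specified in the function.
R
vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
vec2 <- c(10, 11, 12)
# 9 + 3 = 12 elements fill two 2 x 3 matrices
arr = array(c(vec1, vec2), dim = c(2, 3, 2))
print (arr)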
Output:
,,1
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
,,2
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
Use the dim() function to find the dimensions of an R array.
dim(arr)
Output:
[1] 2 3 2
The result gives the size of each dimension of the R array. In this case, arr is a 3D array with
dimensions 2x3x2. The first dimension has size 2, the second dimension has size 3, and the third
dimension has size 2.
Naming of Arrays
The row names, column names and matrix names are each specified as a character vector, with one
name per row, column and matrix respectively. By default, the rows, columns and matrices
are identified by their index values.
row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(2:13, dim = c(2, 3, 2),
  dimnames = list(row_names, col_names, mat_names))
print (arr)
Output:
, , Mat1
col1 col2 col3
row1 2 4 6
row2 3 5 7
, , Mat2
col1 col2 col3
row1 8 10 12
row2 9 11 13
Accessing arrays
The R arrays can be accessed by using indices for different dimensions separated by commas.
Different components can be specified by any combination of elements’ names or positions.
# accessing elements
Output:
Vector is : 1 2 3 4 5 6 7 8 9 10
Third element of vector is : 3
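A minimal sketch of the indexing that would produce output like the above, assuming a vector named vec (a hypothetical name) that holds the values 1 to 10:

```r
# vec is a hypothetical name; the values 1 to 10 match the output shown above
vec <- c(1:10)

# print the whole vector
cat("Vector is :", vec)

# R indexing starts at 1, so vec[3] is the third element
cat("\nThird element of vector is :", vec[3])
```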
Accessing entire matrices
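row_names <- c("row1", "row2")
col_names <- c("col1", "col2", "col3")
mat_names <- c("Mat1", "Mat2")
arr = array(1:12, dim = c(2, 3, 2),
  dimnames = list(row_names, col_names, mat_names))
arr
print ("Matrix 1")
print (arr[,,1])        # accessing matrix 1 by index value
print ("Matrix 2")
print(arr[,,"Mat2"])    # accessing matrix 2 by its name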
Output:
, , Mat1
col1 col2 col3
row1 1 3 5
row2 2 4 6
, , Mat2
col1 col2 col3
row1 7 9 11
row2 8 10 12
accessing matrix 1 by index value
[1] "Matrix 1"
col1 col2 col3
row1 1 3 5
row2 2 4 6
accessing matrix 2 by its name
[1] "Matrix 2"
col1 col2 col3
row1 7 9 11
row2 8 10 12
Rows and columns can also be accessed by both names as well as indices.
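arr = array(1:12, dim = c(2, 3, 2),
  dimnames = list(c("row1", "row2"),
  c("col1", "col2", "col3"), c("Mat1", "Mat2")))
arr
print ("1st column of matrix 1")
print (arr[, 1, 1])           # column selected by index
print ("2nd row of matrix 2")
print(arr["row2",,"Mat2"])    # row and matrix selected by name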
Output:
, , Mat1
col1 col2 col3
row1 1 3 5
row2 2 4 6
, , Mat2
col1 col2 col3
row1 7 9 11
row2 8 10 12
accessing matrix 1 by index value
[1] "1st column of matrix 1"
row1 row2
1 2
accessing matrix 2 by its name
[1] "2nd row of matrix 2"
col1 col2 col3
8 10 12
Elements can be accessed by using both the row and column numbers or names.
Output:
A smaller subset of the array elements can be accessed by defining a range of row or column limits.
arr
Output:
, , Mat1
col1 col2 col3 col4
row1 1 3 5 7
row2 2 4 6 8
, , Mat2
col1 col2 col3 col4
row1 9 11 13 15
row2 10 12 14 16
print elements of both the rows and columns 2 and 3 of matrix 1
col2 col3
row1 3 5
row2 4 6
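As a hedged sketch of element and range access, assume an array like the one printed above, holding the values 1 to 16 in two 2 x 4 matrices. The construction below is inferred from that output and is not taken from the original listing:

```r
# Assumed construction: values 1..16 in two 2 x 4 matrices, named as in the output above
arr <- array(1:16, dim = c(2, 4, 2),
             dimnames = list(c("row1", "row2"),
                             c("col1", "col2", "col3", "col4"),
                             c("Mat1", "Mat2")))

# single element: row 1, column 2 of matrix 1, first by index and then by name
print(arr[1, 2, 1])
print(arr["row1", "col2", "Mat1"])

# range: both rows, columns 2 and 3 of matrix 1
print(arr[, c(2, 3), 1])
```

Leaving a position empty, as in the first index of the last call, selects everything along that dimension.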
Elements can be appended at different positions in the array. The sequence of elements is
retained in order of their addition to the array. Adding new elements takes O(n) time, where n is
the length of the array. The length of the array increases by the number of elements
added. R has several built-in ways to add new values:
c(vector, values): the c() function appends values to the end of the array. Multiple
values can also be added together.
append(vector, values): this function allows the values to be inserted at any position in the
vector. By default, it adds the values at the end. append(vector, values,
after = i) inserts the new values after the i-th element; the default is
after = length(vector).
Using the length of the array: elements can be assigned at indices length+x, where
x>0.
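x <- c(1, 2, 3, 4, 5)
x <- c(x, 6)
print ("Array after 1st modification ")
print (x)
x <- append(x, 7)
print ("Array after 2nd modification ")
print (x)
# adding an element at index length + 1
len <- length(x)
x[len + 1] <- 8
print ("Array after 3rd modification ")
print (x)
# adding an element at index length + 3; the gap is filled with NA
x[len + 3]<-9
print ("Array after 4th modification ")
print (x)
# appending a vector of values at the end
x <- append(x, c(10, 11, 12))
print ("Array after 5th modification ")
print (x)
# inserting a vector of values after the 3rd position
x <- append(x, c(-1, -1), after = 3)
print ("Array after 6th modification ")
print (x)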
Output:
Before the third modification the array had length 7, and after it elements are present up to the 8th
index. At the fourth modification, element 9 is assigned at the tenth index, so
R automatically fills the missing ninth position with NA.
At the 5th modification, the elements [10, 11, 12] are added beginning from the 11th index.
At the 6th modification, [-1, -1] is inserted after the third position in the array.
Elements can be removed from arrays in R, either one at a time or several together. A logical
condition is placed inside the index brackets: the elements for which the condition is TRUE are
retained and the rest are removed. The comparison is based on the array values. Multiple conditions
can be combined to remove a range of elements. Another way to remove elements is the
%in% operator, which returns TRUE for each element that occurs in a given set of values; negating it
with ! retains only the elements that are not in the set.
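m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print("Original Array")
print(m)
# Removing the element equal to 3
m <- m[m != 3]
print(m)
# Removing elements based on a condition (keep those greater than 2 and less than or equal to 8)
m <- m[m > 2 & m <= 8]
print(m)
# Remove a sequence of elements using another array
remove <- c(4, 6, 8)
m <- m[!m %in% remove]
print(m)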
Output:
[1] 1 2 4 5 6 7 8 9
[1] 4 5 6 7 8
[1] 5 7
At the 1st modification, all the element values that are not equal to 3 are retained. At the 2nd
modification, the elements greater than 2 and at most 8 are retained and the rest are removed. At the
3rd modification, the elements for which %in% returns FALSE are retained, since the condition involves
the NOT operator.
The elements of the array can be updated by assigning a new value at the desired index.
The changes are retained in the original array. If the index to be
updated is within the length of the array, the value is changed; otherwise, the new element is
added at the specified index. Multiple elements can also be updated at once, either with the same
value or with different values when the new values are specified as a vector.
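m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
print (m)
# updating single element
m[1] <- 0
print (m)
# updating a range of elements with the same value
m[7:9] <- -1
print (m)
# updating two elements with different values
m[c(2, 5)] <- c(-1, -2)
print (m)
# assigning beyond the current length adds a new element
m[10] <- 10
print (m)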
Output:
At 2nd modification, the elements at indexes 7 to 9 are updated with -1 each. At 3rd modification,
the second element is replaced by -1 and fifth element by -2 respectively. At 4th modification, a new
element is added since 10th index is greater than the length of the array.
R – Data Frames
A data frame can be interpreted as a matrix in which each column can have a different
data type. An R data frame is made up of three principal components: the data, the rows, and the
columns.
As you can see in the image below, this is how a data frame is structured.
The data is presented in tabular form, which makes it easier to operate on and understand.
[Figure: R – Data Frames]
To create an R data frame, use the data.frame() function and pass each of the vectors you have
created as arguments to the function.
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
print(friend.data)
Output:
friend_id friend_name
1 1 Sachin
2 2 Sourav
3 3 Dravid
4 4 Sehwag
5 5 Dhoni
The structure of an R data frame can be displayed with the str() function.
str() can display even the internal structure of large nested lists. It provides one-line output
for the basic R objects, letting the user know about the object and its constituents.
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using str(); it prints the structure itself, so no print() is needed
str(friend.data)
Output:
The statistical summary and nature of the data in an R data frame can be obtained by
applying the summary() function.
It is a generic function that produces summaries of objects, including the results of various model
fitting functions. The function invokes particular methods which depend on the class of the first
argument.
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# using summary()
print(summary(friend.data))
Output:
friend_id friend_name
Min. :1 Length:5
1st Qu.:2 Class :character
Median :3 Mode :character
Mean :3
3rd Qu.:4
Max. :5
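# R program to extract a column from a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# extracting the friend_name column into a new data frame
result <- data.frame(friend.data$friend_name)
print(result)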
Output:
friend.data.friend_name
1 Sachin
2 Sourav
3 Dravid
4 Sehwag
5 Dhoni
A data frame in R can be expanded by adding new columns and rows to the already existing R data
frame.
# R program to expand a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
"Bangalore", "Hyderabad",
"Chennai")
print(resultant)
Output:
In R, one can perform various types of operations on a data frame, such as accessing rows and
columns, selecting a subset of the data frame, editing the data frame, and deleting rows and columns.
Any column of a data frame can be selected with single brackets [ ], double brackets [[ ]] or
the $ operator. Single brackets return a data frame, while [[ ]] and $ return the column itself as a
vector.
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
# Access Items using []
friend.data[1]
# Access Items using [[]]
friend.data[['friend_name']]
# Access Items using $
friend.data$friend_id
Output:
friend_id
1 1
2 2
3 3
4 4
5 5
Access Items using [[]]
[1] "Sachin" "Sourav" "Dravid" "Sehwag" "Dhoni"
Access Items using $
[1] 1 2 3 4 5
The dim() function shows how many rows and columns a data frame has.
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)
dim(friend.data)
Output:
[1] 5 2
You can easily add rows and columns to an R data frame. Insertion expands the
existing data frame, without needing a new one.
Let's look at how to add rows and columns to a data frame with an example:
To add rows in a Data Frame, you can use a built-in function rbind().
print(Products)
print(Products)
Output:
To add columns in a Data Frame, you can use a built-in function cbind().
print(Products)
print(Products)
Output:
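A minimal sketch of both insertions, using rbind() for a row and cbind() for a column. The Products data frame, its column names and all its values are hypothetical illustration data:

```r
# Products and its columns are hypothetical illustration data
Products <- data.frame(
  Product_ID = c(101, 102, 103),
  Product_Name = c("T-Shirt", "Jeans", "Shoes"),
  Price = c(15.99, 29.99, 49.99),
  stringsAsFactors = FALSE
)

# rbind() appends a row; the new row must have the same column names
New_Row <- data.frame(Product_ID = 104, Product_Name = "Hat",
                      Price = 9.99, stringsAsFactors = FALSE)
Products <- rbind(Products, New_Row)
print(Products)

# cbind() appends a column; it needs one value per existing row
Stock <- c(50, 30, 20, 75)
Products <- cbind(Products, Stock)
print(Products)
```

Both functions return a new data frame, so the result is assigned back to Products.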
Columns and rows can also be removed from an existing R data frame.
library(dplyr)
data
# Remove a row with friend_id = 3
data
Output:
In the above code, we first created a data frame called data with three
columns: friend_id, friend_name, and location. To remove a row with friend_id equal to 3, we used
the subset() function and specified the condition friend_id != 3. This removed the row
with friend_id equal to 3.
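A minimal sketch of the row removal described above. subset() is part of base R, so this sketch does not need dplyr. The location values are hypothetical placeholders:

```r
# data mirrors the three columns described above; the location values are hypothetical
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("City A", "City B", "City C", "City D", "City E"),
  stringsAsFactors = FALSE
)

# keep every row whose friend_id is not 3
data <- subset(data, friend_id != 3)
print(data)
```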
library(dplyr)
data
Output:
friend_id friend_name
1 1 Sachin
2 2 Sourav
3 3 Dravid
4 4 Sehwag
5 5 Dhoni
To remove the location column, we used the select() function and specified -location. The – sign
indicates that we want to remove the location column. The resulting data frame data will have only
two columns: friend_id and friend_name.
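The text above uses select() from dplyr. As a hedged alternative that runs without extra packages, the same column removal can be sketched with base R's subset() and its select argument. The location values are again hypothetical:

```r
# data mirrors the columns described above; the location values are hypothetical
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("City A", "City B", "City C", "City D", "City E"),
  stringsAsFactors = FALSE
)

# the minus sign drops the location column and keeps the rest
data <- subset(data, select = -location)
print(data)
```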
There are 2 ways to combine data frames in R. You can combine them either vertically or horizontally.
To combine 2 data frames vertically, use the rbind() function. This function works for
a combination of two or more data frames, provided they have the same columns.
cat("Dataframe 1:\n")
print(df1)
cat("\nDataframe 2:\n")
print(df2)
cat("\nCombined Dataframe:\n")
print(combined_df)
Output:
Dataframe 1:
Dataframe 2:
Combined Dataframe:
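A minimal sketch of a vertical combination with rbind(). The data frames df1 and df2 and all their values are hypothetical:

```r
# df1 and df2 are hypothetical; rbind() requires matching column names
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(80, 75))
df2 <- data.frame(Name = c("Carol", "Dan"), Age = c(28, 35), Score = c(90, 85))

# stack the rows of df2 below the rows of df1
combined_df <- rbind(df1, df2)
print(combined_df)
```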
If you want to combine 2 data frames horizontally, you can use cbind() function. This function works
for combination of two or more data frames.
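# data frames with the same number of rows but different columns
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(80, 75))
df2 <- data.frame(Height = c(160, 175), Weight = c(55, 70))
cat("Dataframe 1:\n")
print(df1)
cat("\nDataframe 2:\n")
print(df2)
# combine horizontally
combined_df <- cbind(df1, df2)
cat("\nCombined Dataframe:\n")
print(combined_df)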
Output:
Dataframe 1:
Name Age Score
1 Alice 25 80
2 Bob 30 75
Dataframe 2:
Height Weight
1 160 55
2 175 70
Combined Dataframe:
Name Age Score Height Weight
1 Alice 25 80 160 55
2 Bob 30 75 175 70
R Factors
Factors in the R programming language are data structures used to categorize data, that is, to
represent categorical data and store it as a set of levels.
They are stored as integers with a corresponding label for every unique integer. R factors may
look similar to character vectors, but they are integers, so care must be taken when using them as
strings. A factor accepts only a restricted number of distinct values. For example, a data field
such as gender may contain values only from female, male, or transgender.
In the above example, all the possible cases are known beforehand and are predefined. These
distinct values are known as levels. After a factor is created, it consists only of its levels, which are
by default sorted alphabetically.
The factor() function takes the following main arguments:
x: The input vector that is to be converted into a factor.
levels: The set of distinct values that the input vector x may take.
exclude: The values to be excluded when forming the levels.
ordered: A logical flag that decides whether the levels are ordered.
nmax: An upper limit on the maximum number of levels.
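A short sketch of how the levels and ordered arguments behave. The vector sizes and its values are hypothetical illustration data:

```r
# sizes is a hypothetical example vector
sizes <- c("small", "large", "medium", "small")

# levels fixes the allowed values and their order; ordered = TRUE makes them comparable
size_factor <- factor(sizes, levels = c("small", "medium", "large"), ordered = TRUE)
print(size_factor)

# a value that is not among the levels becomes NA
print(factor(c("small", "huge"), levels = c("small", "medium", "large")))

# ordered factors support comparisons between levels
print(size_factor[1] < size_factor[2])
```

Without the levels argument, the levels would be sorted alphabetically (large, medium, small), which is rarely the intended order for this kind of data.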
Creating a Factor in R Programming Language
The command used to create or modify a factor in R is factor(), with a vector as input.
The two steps to creating an R factor:
Creating a vector
Converting the vector into a factor using factor()
Examples: Let us create a factor gender with levels female, male and transgender.
R
# Creating a vector
x <- c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor
# named gender
gender <-factor(x)
print(gender)
Output
R
gender
Output
Further one can check the levels of a factor by using function levels().
The function is.factor() is used to check whether the variable is a factor and returns “TRUE” if it is a
factor.
R
print(is.factor(gender))
Output
[1] TRUE
Function class() is also used to check whether the variable is a factor and if true returns “factor”.
R
class(gender)
Output
[1] "factor"
The elements of a factor are accessed in the same way as the elements of a vector. If gender is a
factor, then gender[i] accesses the ith element of the factor.
Example
R
gender[3]
Output
[1] male
Example
R
gender[c(2, 4)]
Output
Example
R
gender[-3]
Output
Here, the square brackets with [-3] subset the factor and remove the third element,
which is “male”.
The output is the remaining elements of the gender factor, which are “female”, “male”, and
“female”. The output also shows the levels of the factor, which are “female” and “male”.
Modification of a Factor in R
After a factor is formed, its components can be modified, but the new values assigned
must be among the predefined levels.
Example
R
gender[2]<-"female"
gender
Output
To select all the elements of the factor gender except the ith element, use gender[-i]. If
you want to modify a factor and assign a value outside the predefined levels, first add that value to
the levels.
Example
R
gender
Output
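A minimal sketch of extending the levels before assigning a new value. The factor is rebuilt here so the sketch runs on its own, and the new level "other" is an illustrative assumption:

```r
# gender and the new level "other" are illustrative assumptions
gender <- factor(c("female", "male", "male", "female"))

# assigning a value outside the levels would yield NA with a warning,
# so the levels are extended first
levels(gender) <- c(levels(gender), "other")
gender[3] <- "other"
print(gender)
```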
The columns of a data frame commonly hold factor, numeric, or character data.
In versions of R before 4.0.0, character columns were treated as categorical data and converted to
factors automatically when a data frame was created. Since R 4.0.0 the default is
stringsAsFactors = FALSE, so the conversion has to be requested explicitly.
We can create a data frame and check if its column is a factor.
Example
R
print(employee)
print(is.factor(employee$gender))
Output
1 40 103200 male
2 49 106200 male
3 48 150200 transgender
4 40 10606 female
5 67 10390 male
6 52 14070 female
7 53 10220 transgender
[1] TRUE
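A hedged sketch that would reproduce output like the above on current versions of R. The values are taken from the output shown; the column names age and salary are assumptions, since only employee$gender appears in the text:

```r
# Values taken from the output above; the column names age and salary are assumed
age <- c(40, 49, 48, 40, 67, 52, 53)
salary <- c(103200, 106200, 150200, 10606, 10390, 14070, 10220)
gender <- c("male", "male", "transgender", "female", "male", "female", "transgender")

# stringsAsFactors = TRUE converts the character column into a factor
employee <- data.frame(age, salary, gender, stringsAsFactors = TRUE)
print(employee)
print(is.factor(employee$gender))
```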
R – Lists
A list is a vector with heterogeneous data elements. It can be a list of vectors, a list of matrices, a
list of characters, a list of functions, and so on. A list in R is created with the list() function.
R allows accessing elements of an R list by index value. In R, the indexing of a list
starts with 1 instead of 0.
Creating a List
In other words, a list is a generic vector containing other objects. To illustrate how a list looks, we
take an example here. We want to build a list of employees with their details. For this, we want
attributes such as ID, employee name, and the number of employees.
Example:
R
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(empId, empName, numberOfEmp)
print(empList)
Output
[[1]]
[1] 1 2 3 4
[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"
[[3]]
[1] 4
Example:
R
my_named_list <- list(name = "Sudheer", age = 25, city = "Delhi")
print(my_named_list)
Output
$name
[1] "Sudheer"
$age
[1] 25
$city
[1] "Delhi"
All the components of a list can be named, and we can use those names to access the components
of the R list with the $ operator.
Example:
R
# R program to access
# components of a list
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)
# accessing the Names component by its name
print(empList$Names)
Output
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
To access the top-level components of an R list, we use the double bracket operator “[[ ]]”. To
access the lower or inner-level components of a list, we follow the double brackets with a single
square bracket “[ ]”.
Example:
R
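# R program to access
# components of a list
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)
# accessing a top-level component
print(empList[[2]])
# accessing inner-level components
print(empList[[2]][2])
print(empList[[1]][4])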
Output
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
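The difference between single and double brackets can be sketched by checking the class of each result. The list is rebuilt here so that the sketch runs on its own:

```r
# empList is rebuilt here so the sketch runs on its own
empList <- list(
  "ID" = c(1, 2, 3, 4),
  "Names" = c("Debi", "Sandeep", "Subham", "Shiba"),
  "Total Staff" = 4
)

# single brackets return a sub-list that still wraps the component
print(class(empList[2]))

# double brackets return the component itself, here a character vector
print(class(empList[[2]]))

# so inner elements are reached by chaining [[ ]] and [ ]
print(empList[[2]][2])
```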
An R list can also be modified by accessing its components and replacing them with the values
you want.
Example:
R
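# R program to edit
# components of a list
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)
# adding a fifth ID and a fifth name
empList[[1]][5] = 5
empList[[2]][5] = "Kamala"
print(empList)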
Output
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
$ID
[1] 1 2 3 4 5
$Names
Concatenation of lists
Two R lists can be concatenated using the c() function. To concatenate two lists, pass both of
them to c().
Syntax:
list = c(list, list1)
list = the original list
list1 = the new list
Example:
R
# R program to concatenate
# two lists
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)
print(empList)
Output
$ID
[1] 1 2 3 4
$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"
$`Total Staff`
[1] 4
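A minimal sketch of list concatenation with c(). The second list, empAgeList, and its values are hypothetical illustration data:

```r
# empAgeList and its values are hypothetical illustration data
empList <- list("ID" = c(1, 2, 3, 4), "Total Staff" = 4)
empAgeList <- list("Age" = c(34, 23, 18, 45))

# c() joins the components of both lists into one list
empList <- c(empList, empAgeList)
print(names(empList))
```

The components of the second list are added after those of the first, and their names are kept.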
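R
# creating a numeric vector
my_numbers = c(1,5,6,3)
# append() returns a new vector with 45 added at the end
append(my_numbers, 45)
# printing the original vector, which is unchanged
# because the result of append() was not assigned back
my_numbers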
Output
[1] 1 5 6 3 45
[1] 1 5 6 3
To delete components of an R list, access those components and put a negative sign before the
index. The negative sign tells R to return the list without that component.
Example:
R
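# R program to delete
# components of a list
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4
empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)
# deleting the third top-level component
print(empList[-3])
# deleting the second element of the inner Names component
print(empList[[2]][-2])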
Output
$ID
[1] 1 2 3 4
$Names
[1] 4
$ID
[1] 1 2 3 4
$Names
Merging list
We can merge R lists by combining them into a single list with c().
R
new_list <- c(list(1, 2, 3), list("Sun", "Mon", "Tue"))
print(new_list)
Output:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] "Sun"
[[5]]
[1] "Mon"
[[6]]
[1] "Tue"
Here we convert an R list to a vector: we create a list first and then use unlist() to flatten the
list into a vector.
R
# Create lists.
lst <- list(1:5)
print(lst)
# Convert the list to a vector
vec <- unlist(lst)
print(vec)
Output
[[1]]
[1] 1 2 3 4 5
[1] 1 2 3 4 5
R List to matrix
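We will create matrices using the matrix() function. The unlist() function is used first to
convert the list into a vector.
R
# Defining list
lst1 <- list(list(1, 2, 3),
             list(4, 5, 6))
# Print list
cat("The list is:\n")
print(lst1)
cat("Class:", class(lst1), "\n")
# Convert the list to a matrix (nrow = 2 and byrow = TRUE are assumed here)
mat <- matrix(unlist(lst1), nrow = 2, byrow = TRUE)
# Print matrix
print(mat)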
Output
[[1]]
[[1]][[1]]
[1] 1
[[1]][[2]]
[1] 2
[[1]][[3]]
[1] 3
[[2]]
[[2]][[1]]
[1] 4
[[2]][[2]]
[1] 5
[[2]][[3]]
[1] 6
Class: list
[1,...