
UNIT 2

Uploaded by

ailurophileas24
Copyright
© All Rights Reserved

What Is a Function in R?

A function in R is an object containing multiple interrelated statements that are run together in a
predefined order every time the function is called. Functions in R can be built-in or created by the
user (user-defined). The main purposes of creating a user-defined function are to avoid repeating the
same block of code for a task that is performed frequently in a particular project, to prevent
hard-to-debug errors caused by copy-paste operations, and to make the code more readable. A good
rule of thumb is to create a function whenever we need to run the same set of commands more than
twice.

Built-in Functions in R

There are plenty of helpful built-in functions in R used for various purposes. Some of the most
popular ones are:

• min(), max(), mean(), median() – return the minimum / maximum / mean / median value of
a numeric vector, respectively

• sum() – returns the sum of a numeric vector

• range() – returns the minimum and maximum values of a numeric vector

• abs() – returns the absolute value of a number

• str() – shows the structure of an R object

• print() – displays an R object on the console

• ncol() – returns the number of columns of a matrix or a dataframe

• length() – returns the number of items in an R object (a vector, a list, etc.)

• nchar() – returns the number of characters in a character object

• sort() – sorts a vector in ascending or descending (decreasing=TRUE) order

• exists() – returns TRUE or FALSE depending on whether or not a variable is defined in the R
environment

Let's see some of the above functions in action:

vector <- c(3, 5, 2, 3, 1, 4)

print(min(vector))

print(mean(vector))

print(median(vector))

print(sum(vector))

print(range(vector))

print(str(vector))  ## str() prints the structure itself and returns NULL, hence the NULL below

print(length(vector))
print(sort(vector, decreasing=TRUE))

print(exists('vector')) ## note the quotation marks

[1] 1

[1] 3

[1] 3

[1] 18

[1] 1 5

num [1:6] 3 5 2 3 1 4

NULL

[1] 6

[1] 5 4 3 3 2 1

[1] TRUE
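The example above leaves out a few functions from the list. A short sketch applying max(), abs(), nchar(), ncol() and exists() to small illustrative inputs (the variable names are arbitrary, and the name passed to exists() is assumed not to be defined) might look like this:

```r
# Remaining built-ins from the list, applied to illustrative inputs
nums <- c(3, 5, 2, 3, 1, 4)
print(max(nums))          # largest value: 5
print(abs(-7.5))          # absolute value: 7.5
print(nchar("function"))  # number of characters: 8

scores <- data.frame(a = 1:3, b = c("x", "y", "z"))
print(ncol(scores))       # number of columns: 2

# exists() looks a name up in the environment; this name is assumed undefined
print(exists("no_such_variable_xyz"))  # FALSE
```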

Creating a Function in R

While applying built-in functions facilitates many common tasks, often we need to create our own
function to automate the performance of a particular task. To declare a user-defined function in R,
we use the keyword function. The syntax is as follows:

function_name <- function(parameters){
  function body
}

Above, the main components of an R function are: function name, function parameters,
and function body. Let's take a look at each of them separately.

Function Name

This is the name of the function object that will be stored in the R environment after the function
definition and used for calling that function. It should be concise but clear and meaningful so that
the user who reads our code can easily understand what exactly this function does. For example, if
we need to create a function for calculating the circumference of a circle with a known radius, we'd
better call this function circumference rather than function_1 or circumference_of_a_circle. (Side
note: While commonly we use verbs in function names, it's ok to use just a noun if that noun is very
descriptive and unambiguous.)

Function Parameters

Function parameters (sometimes called formal arguments) are the variables listed inside the
parentheses of the function definition and separated by commas. Each time we call the function, they
are set to actual values, called arguments. For example:

circumference <- function(r){

2*pi*r
}

print(circumference(2))

[1] 12.56637

Above, we created a function to calculate the circumference of a circle with a known radius using the
formula C=2πr, so the function has a single parameter, r. After defining the function, we called it with
the radius equal to 2 (hence, with the argument 2).

It's possible, even though rarely useful, for a function to have no parameters:

hello_world <- function(){
  'Hello, World!'
}

print(hello_world())

[1] "Hello, World!"

Also, some parameters can be given default values (those corresponding to a typical case) inside the
function definition; these defaults can then be overridden when calling the function. Returning to
our circumference function, we can set the default radius of a circle to 1, so if we call the function
with no argument passed, it will calculate the circumference of a unit circle (i.e., a circle with a radius
of 1). Otherwise, it will calculate the circumference of a circle with the provided radius:

circumference <- function(r=1){
  2*pi*r
}

print(circumference())

print(circumference(2))

[1] 6.283185

[1] 12.56637

Function Body

The function body is a set of commands inside the curly braces that are run in a predefined order
every time we call the function. In other words, in the function body, we place what exactly we need
the function to do:

sum_two_nums <- function(x, y){
  x+y
}

print(sum_two_nums(1, 2))

[1] 3
Note that the statements in the function body (in the above example, the only statement x + y)
are conventionally indented, commonly by 2 or 4 spaces. The exact amount matters less than being
consistent with the indentation throughout the program. Indentation doesn't affect how the code
runs and isn't obligatory, but it makes the code easier to read.

It's possible to drop the curly braces if the function body contains a single statement. For example:

sum_two_nums <- function(x, y) x + y

print(sum_two_nums(1, 2))

[1] 3

As we saw from all the above examples, in R, it usually isn't necessary to explicitly include the return
statement when defining a function since an R function automatically returns the last evaluated
expression in the function body. However, we can still add a return statement inside the function
body using the syntax return(expression_to_be_returned). It is required when we need to leave the
function early (for example, from inside an if branch), and it is often used to make the result explicit
when a function returns more than one value. For example:

mean_median <- function(vector){
  avg <- mean(vector)
  med <- median(vector)
  return(c(avg, med))
}

print(mean_median(c(1, 1, 1, 2, 3)))

[1] 1.6 1.0

Note that in the return statement above, we actually return a vector containing the necessary
results, and not just the variables separated by a comma (since the return() function can return only
a single R object). Instead of a vector, we could also return a list, especially if the results to be
returned are supposed to be of different data types.
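As a sketch of the list alternative just mentioned, the hypothetical function below (describe_vector and its element names are illustrative, not from the original) returns results of different types (numeric, logical and character) in a named list, so each result can be retrieved by name:

```r
# Hypothetical helper: returns several results of different types in one list
describe_vector <- function(vec){
  result <- list(
    mean = mean(vec),                      # numeric
    median = median(vec),                  # numeric
    has_na = anyNA(vec),                   # logical
    label = paste(length(vec), "values")   # character
  )
  return(result)
}

summary_info <- describe_vector(c(1, 1, 1, 2, 3))
print(summary_info$mean)    # 1.6
print(summary_info$has_na)  # FALSE
print(summary_info$label)   # "5 values"
```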

Calling a Function in R

In all the above examples, we actually already called the created functions many times. To do so, we
just put the function name and added the necessary arguments inside the parentheses. In R, function
arguments can be passed by position, by name (so-called named arguments), by mixing position-based
and name-based matching, or omitted altogether.

If we pass the arguments by position, we need to follow the same sequence of arguments as defined
in the function:

subtract_two_nums <- function(x, y){
  x-y
}

print(subtract_two_nums(3, 1))

[1] 2
In the above example, x is equal to 3 and y is equal to 1, not vice versa.

If we pass the arguments by name, i.e., explicitly specify what value each parameter defined in the
function takes, the order of the arguments doesn't matter:

subtract_two_nums <- function(x, y){
  x-y
}

print(subtract_two_nums(x=3, y=1))

print(subtract_two_nums(y=1, x=3))

[1] 2

[1] 2

Since we explicitly assigned x=3 and y=1, we can pass them either as x=3, y=1 or y=1, x=3 – the result
will be the same.

It's possible to mix position- and name-based matching of the arguments. Let's look at the example
of a function for calculating the BMR (basal metabolic rate), i.e., the number of calories the body
burns per day at rest, for women based on their weight (in kg), height (in cm), and age (in years).
The formula used in the function is the Mifflin-St Jeor equation:

calculate_calories_women <- function(weight, height, age){
  (10 * weight) + (6.25 * height) - (5 * age) - 161
}

Now, let's calculate the calories for a woman 30 years old, with a weight of 60 kg and a height of 165
cm. However, for the age parameter, we'll pass the argument by name and for the other two
parameters, we'll pass the arguments by position:

print(calculate_calories_women(age=30, 60, 165))

[1] 1320.25

In cases like the one above (when we mix matching by name and by position), the named arguments
are matched first, and the remaining arguments are then matched by position, i.e., in the same order
as the remaining parameters appear in the function definition. However, this practice isn't
recommended because it can lead to confusion.

Finally, we can omit some (or all) of the arguments altogether. This is possible if we set some (or all)
of the parameters to default values inside the function definition. Let's return to
our calculate_calories_women function and set the default age of a woman to 30 years:

calculate_calories_women <- function(weight, height, age=30){
  (10 * weight) + (6.25 * height) - (5 * age) - 161
}

print(calculate_calories_women(60, 165))
[1] 1320.25
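A default is only used when the argument is missing. The short sketch below redefines the same function so the block runs on its own, then compares the default age with an explicitly passed one (age 40 is an arbitrary choice); the difference follows directly from the -5 * age term of the formula:

```r
calculate_calories_women <- function(weight, height, age=30){
  (10 * weight) + (6.25 * height) - (5 * age) - 161
}

default_age <- calculate_calories_women(60, 165)       # age falls back to 30
older <- calculate_calories_women(60, 165, age = 40)   # default overridden
print(default_age)          # 1320.25
print(older)                # 1270.25
print(default_age - older)  # 50: ten extra years times 5 kcal per year
```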

Control Statements in R Programming

Last Updated : 01 Jun, 2020

Control statements are expressions used to control the execution and flow of the program based on
the conditions provided in the statements. These structures are used to make decisions after
evaluating variables or expressions. In this article, we’ll discuss all the control statements with examples.

In R programming, there are 8 types of control statements as follows:

• if condition

• if-else condition

• for loop

• nested loops

• while loop

• repeat and break statement

• return statement

• next statement

if condition

This control structure checks whether the expression provided in parentheses is true. If it is, the
statements inside the braces {} are executed.

Syntax:

if(expression){
  statements
  ....
  ....
}

Example:

x <- 100

if(x > 10){
  print(paste(x, "is greater than 10"))
}

Output:

[1] "100 is greater than 10"

if-else condition

It is similar to the if condition, but when the test expression in the if condition is false, the
statements in the else block are executed.

Syntax:

if(expression){
  statements
  ....
  ....
} else {
  statements
  ....
  ....
}

Example:

x <- 5

# Check whether the value is greater than 10

if(x > 10){
  print(paste(x, "is greater than 10"))
} else {
  print(paste(x, "is less than 10"))
}

Output:

[1] "5 is less than 10"


for loop

A for loop executes a sequence of statements repeatedly, once for each element of a vector.

Syntax:

for(value in vector){
  statements
  ....
  ....
}

Example:

x <- letters[4:10]

for(i in x){
  print(i)
}

Output:

[1] "d"

[1] "e"

[1] "f"

[1] "g"

[1] "h"

[1] "i"

[1] "j"
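When a loop also needs the position of each element, a common idiom is to loop over seq_along() instead of over the values themselves. A minimal sketch with an illustrative vector:

```r
fruits <- c("apple", "banana", "cherry")
for (i in seq_along(fruits)) {
  print(paste(i, fruits[i]))   # "1 apple", "2 banana", "3 cherry"
}

# seq_along() is also safe for empty vectors: the body never runs,
# whereas 1:length(empty) would be c(1, 0) and run twice
empty <- character(0)
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1
}
print(count)  # 0
```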

Nested loops

Nested loops are loops placed inside other loops. They work like simple loops, and they are
commonly used to iterate over the rows and columns of a matrix.

Example:

# Defining a matrix with 2 rows (filled column by column)

m <- matrix(2:15, nrow = 2)
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    print(m[i, j])
  }
}

Output:

[1] 2

[1] 4

[1] 6

[1] 8

[1] 10

[1] 12

[1] 14

[1] 3

[1] 5

[1] 7

[1] 9

[1] 11

[1] 13

[1] 15
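Nested loops can also fill a matrix, not only read one. The sketch below builds a small multiplication table, with the outer loop visiting the rows and the inner loop the columns; the 3 x 4 size is arbitrary:

```r
tbl <- matrix(0, nrow = 3, ncol = 4)   # 3 x 4 matrix of zeros to fill
for (i in seq_len(nrow(tbl))) {
  for (j in seq_len(ncol(tbl))) {
    tbl[i, j] <- i * j                 # cell = row index times column index
  }
}
print(tbl)
# For this particular task, base R's outer(1:3, 1:4) gives the same values
# without explicit loops
```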

while loop

A while loop is another kind of loop: it keeps iterating as long as a condition remains true. The test
expression is checked before each execution of the loop body.

Syntax:

while(expression){
  statement
  ....
  ....
}

Example:
x=1

# Print 1 to 5

while(x <= 5){
  print(x)
  x=x+1
}

Output:

[1] 1

[1] 2

[1] 3

[1] 4

[1] 5
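A while loop is most useful when the number of iterations isn't known in advance. As an illustrative sketch (the threshold of 100 is arbitrary), the loop below counts how many doublings it takes for 1 to exceed 100:

```r
value <- 1
steps <- 0
while (value <= 100) {   # condition is checked before every pass
  value <- value * 2
  steps <- steps + 1
}
print(steps)  # 7, because 2^7 = 128 is the first power of two above 100
print(value)  # 128
```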

repeat loop and break statement

repeat is a loop that executes its body over and over; it has no built-in exit condition. So, a break
statement is used to exit from the loop. break can also be used inside for and while loops to exit
them early.

Syntax:

repeat {
  statements
  ....
  ....
  if(expression) {
    break
  }
}

Example:

x=1
# Print 1 to 5

repeat{
  print(x)
  x=x+1
  if(x > 5){
    break
  }
}

Output:

[1] 1

[1] 2

[1] 3

[1] 4

[1] 5
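As noted above, break also works in other loops. A minimal sketch, using an illustrative vector of readings, stops a for loop at the first negative value:

```r
readings <- c(4, 8, 15, -1, 23, 42)   # illustrative data
first_negative <- NA
for (i in seq_along(readings)) {
  if (readings[i] < 0) {
    first_negative <- i   # remember the position
    break                 # leave the for loop immediately
  }
}
print(first_negative)  # 4
```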

return statement

The return statement ends the execution of a function, returns its result, and passes control back to
the calling function.

Syntax:

return(expression)

Example:

# Checks value is either positive, negative or zero

func <- function(x){
  if(x > 0){
    return("Positive")
  }else if(x < 0){
    return("Negative")
  }else{
    return("Zero")
  }
}

func(1)

func(0)

func(-1)

Output:

[1] "Positive"

[1] "Zero"

[1] "Negative"

next statement

The next statement skips the remaining statements of the current iteration and continues with the
next iteration, without terminating the loop.

Example:

# Defining vector

x <- 1:10

# Print even numbers

for(i in x){
  if(i%%2 != 0){
    next  # Jumps to the next iteration
  }
  print(i)
}

Output:

[1] 2

[1] 4

[1] 6

[1] 8
[1] 10

Data Manipulation in R with Dplyr Package

Last Updated : 22 Aug, 2022

In this article let’s discuss manipulating data in the R programming language.

In order to manipulate data, we can use the dplyr package, which provides many functions for this
purpose. dplyr is not part of base R: it has to be installed once with install.packages("dplyr") and then
loaded with library(dplyr) before its functions can be used. Below is a list of a few data manipulation
functions present in the dplyr package.

Function Name   Description
filter()        Produces a subset of a data frame
distinct()      Removes duplicate rows from a data frame
arrange()       Reorders the rows of a data frame
select()        Keeps only the required columns of a data frame
rename()        Renames variables (columns)
mutate()        Creates new variables without dropping the old ones
transmute()     Creates new variables and drops the old ones
summarize()     Gives summarized data such as the average, sum, etc.

filter() method

The filter() function is used to produce the subset of the data that satisfies the condition specified in
the filter() method. In the condition, we can use comparison operators, logical operators, NA checks,
range checks, etc. to filter out data. The syntax of the filter() function is given below:
filter(dataframeName, condition)

Example:

In the below code we used filter() function to fetch the data of players who scored more than 100
runs from the “stats” data frame.


# import dplyr package

library(dplyr)

# create a data frame

stats <- data.frame(player=c('A', 'B', 'C', 'D'),

runs=c(100, 200, 408, 19),

wickets=c(17, 20, NA, 5))

# fetch players who scored more

# than 100 runs

filter(stats, runs>100)

Output

player runs wickets

1 B 200 20

2 C 408 NA

distinct() method

The distinct() method removes duplicate rows from data frame or based on the specified columns.
The syntax of distinct() method is given below-

distinct(dataframeName, col1, col2,.., .keep_all=TRUE)

Example:

Here in this example, we used distinct() method to remove the duplicate rows from the data frame
and also remove duplicates based on a specified column.


# import dplyr package

library(dplyr)
# create a data frame

stats <- data.frame(player=c('A', 'B', 'C', 'D', 'A', 'A'),

runs=c(100, 200, 408, 19, 56, 100),

wickets=c(17, 20, NA, 5, 2, 17))

# removes duplicate rows

distinct(stats)

#remove duplicates based on a column

distinct(stats, player, .keep_all = TRUE)

Output

player runs wickets

1 A 100 17

2 B 200 20

3 C 408 NA

4 D 19 5

5 A 56 2

player runs wickets

1 A 100 17

2 B 200 20

3 C 408 NA

4 D 19 5
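The same two results can be sketched in base R with duplicated(), which flags rows (or values) that already appeared earlier. This is shown only for comparison; it is not how dplyr is documented to implement distinct():

```r
stats <- data.frame(player=c('A', 'B', 'C', 'D', 'A', 'A'),
                    runs=c(100, 200, 408, 19, 56, 100),
                    wickets=c(17, 20, NA, 5, 2, 17))

# Like distinct(stats): keep rows that are not exact repeats of an earlier row
unique_rows <- stats[!duplicated(stats), ]
print(nrow(unique_rows))   # 5 (row 6 repeats row 1)

# Like distinct(stats, player, .keep_all = TRUE): first row for each player
first_per_player <- stats[!duplicated(stats$player), ]
print(first_per_player$runs)  # 100 200 408 19
```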

arrange() method

In R, the arrange() method is used to order the rows based on a specified column. The syntax of
arrange() method is specified below-

arrange(dataframeName, columnName)

Example:

In the below code we ordered the data based on the runs from low to high using arrange() function.

# import dplyr package

library(dplyr)

# create a data frame

stats <- data.frame(player=c('A', 'B', 'C', 'D'),

runs=c(100, 200, 408, 19),

wickets=c(17, 20, NA, 5))

# ordered data based on runs

arrange(stats, runs)

Output

player runs wickets

1 D 19 5

2 A 100 17

3 B 200 20

4 C 408 NA

select() method

The select() method is used to extract the required columns as a table by specifying the required
column names in select() method. The syntax of select() method is mentioned below-

select(dataframeName, col1,col2,…)

Example:

Here, in the code below, we fetch only the player and wickets columns using the select() method.


# import dplyr package

library(dplyr)

# create a data frame

stats <- data.frame(player=c('A', 'B', 'C', 'D'),

runs=c(100, 200, 408, 19),


wickets=c(17, 20, NA, 5))

# fetch required column data

select(stats, player,wickets)

Output

player wickets

1 A 17

2 B 20

3 C NA

4 D 5

rename() method

The rename() function is used to change the column names. This can be done by the below syntax-

rename(dataframeName, newName=oldName)

Example:

In this example, we change the column name “runs” to “runs_scored” in stats data frame.


# import dplyr package

library(dplyr)

# create a data frame

stats <- data.frame(player=c('A', 'B', 'C', 'D'),

runs=c(100, 200, 408, 19),

wickets=c(17, 20, NA, 5))

# renaming the column

rename(stats, runs_scored=runs)

Output

player runs_scored wickets

1 A 100 17
2 B 200 20

3 C 408 NA

4 D 19 5

mutate() & transmute() methods

These methods are used to create new variables. The mutate() function creates new variables
without dropping the old ones but transmute() function drops the old variables and creates new
variables. The syntax of both methods is mentioned below-

mutate(dataframeName, newVariable=formula)

transmute(dataframeName, newVariable=formula)

Example:

In this example, we created a new column avg using mutate() and transmute() methods.


# import dplyr package

library(dplyr)

# create a data frame

stats <- data.frame(player=c('A', 'B', 'C', 'D'),

runs=c(100, 200, 408, 19),

wickets=c(17, 20, 7, 5))

# add new column avg

mutate(stats, avg=runs/4)

# drop all and create a new column

transmute(stats, avg=runs/4)

Output

player runs wickets avg

1 A 100 17 25.00

2 B 200 20 50.00

3 C 408 7 102.00

4 D 19 5 4.75
avg

1 25.00

2 50.00

3 102.00

4 4.75

Here, the mutate() function adds a new column to the existing data frame without dropping the old
ones, whereas the transmute() function creates the new variable but drops all the old columns.

summarize() method

Using the summarize method we can summarize the data in the data frame by using aggregate
functions like sum(), mean(), etc. The syntax of summarize() method is specified below-

summarize(dataframeName, aggregate_function(columnName))

Example:

In the below code we presented the summarized data present in the runs column using summarize()
method.


# import dplyr package

library(dplyr)

# create a data frame

stats <- data.frame(player=c('A', 'B', 'C', 'D'),

runs=c(100, 200, 408, 19),

wickets=c(17, 20, 7, 5))

# summarize method

summarize(stats, sum(runs), mean(runs))

Output

sum(runs) mean(runs)

1 727 181.75

Data Reshaping in R Programming

Last Updated : 01 Aug, 2023


Generally, in R Programming Language, data processing is done by taking data as input from a data
frame where the data is organized into rows and columns. Data frames are widely used because they
make extracting data simple. But sometimes we need to change the format of the data frame we
receive. Hence, in R, we can split, merge and reshape data frames using various functions.

The various forms of reshaping data in a data frame are:

• Transpose of a Matrix

• Joining Rows and Columns

• Merging of Data Frames

• Melting and Casting

Why Is Data Reshaping Important in R?

The data obtained from an experiment or study is often not in the format that an analysis or an
analytic function expects. Such data usually has one or more columns that identify a row, followed
by a number of columns that represent the measured values. Together, the identifying columns act
like a composite key in a database table.

Transpose of a Matrix

We can easily calculate the transpose of a matrix in R language with the help of the t() function. The
t() function takes a matrix or data frame as an input and gives the transpose of that matrix or data
frame as its output.

Syntax:

t(Matrix/ Data frame)

Example:


# R program to find the transpose of a matrix

first <- matrix(c(1:12), nrow=4, byrow=TRUE)

print("Original Matrix")

first

first <- t(first)


print("Transpose of the Matrix")

first

Output:

[1] "Original Matrix"


[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
[4,] 10 11 12

[1] "Transpose of the Matrix"

[,1] [,2] [,3] [,4]


[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
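Because t() also accepts a data frame, it is worth seeing what comes back. In this sketch (the small data frames are illustrative), t() returns a matrix rather than a data frame, so columns of mixed types are coerced to a single common type, here character:

```r
df <- data.frame(name = c("a", "b"), score = c(90, 85))
tdf <- t(df)
print(class(tdf))    # "matrix" "array" (R >= 4.0)
print(typeof(tdf))   # "character": the numbers were coerced to strings

# With all-numeric columns the values stay numeric; wrap the result in
# as.data.frame() if a data frame is needed afterwards
num_df <- data.frame(x = 1:3, y = 4:6)
print(as.data.frame(t(num_df)))
```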

Joining Rows and Columns in Data Frame

In R, we can combine vectors, matrices, or data frames by columns or by rows. Two functions perform
these tasks:

cbind():

We can combine vectors, matrices or data frames by columns using the cbind() function.

Syntax: cbind(x1, x2, x3)

where x1, x2 and x3 can be vectors, matrices or data frames.

rbind():

We can combine vectors, matrices or data frames by rows using the rbind() function.

Syntax: rbind(x1, x2, x3)

where x1, x2 and x3 can be vectors, matrices or data frames.

Example:


# cbind() and rbind() functions in R

name <- c("Shaoni", "esha", "soumitra", "soumi")
age <- c(24, 53, 62, 29)
address <- c("puducherry", "kolkata", "delhi", "bangalore")

# cbind() on vectors returns a matrix; since the vectors have mixed types,
# every value is coerced to character
info <- cbind(name, age, address)
print("Combining vectors into a matrix using cbind ")
print(info)

# creating a new data frame
newd <- data.frame(name=c("sounak", "bhabani"),
                   age=c("28", "87"),
                   address=c("bangalore", "kolkata"))

# rbind() uses its data frame method because newd is a data frame,
# so the result is a data frame
new.info <- rbind(info, newd)
print("Combining data frames using rbind ")
print(new.info)

Output:

[1] "Combining vectors into a matrix using cbind "

name age address
[1,] "Shaoni" "24" "puducherry"
[2,] "esha" "53" "kolkata"
[3,] "soumitra" "62" "delhi"
[4,] "soumi" "29" "bangalore"

[1] "Combining data frames using rbind "

name age address


1 Shaoni 24 puducherry
2 esha 53 kolkata
3 soumitra 62 delhi
4 soumi 29 bangalore
5 sounak 28 bangalore
6 bhabani 87 kolkata
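Since cbind() on plain vectors produces a matrix, mixed types collapse to character, as the quoted numbers above show. A sketch of keeping each column's own type (reusing the example's vectors) is to build a data frame first; cbind() then uses its data frame method:

```r
name <- c("Shaoni", "esha", "soumitra", "soumi")
age <- c(24, 53, 62, 29)

as_matrix <- cbind(name, age)
print(class(as_matrix[, "age"]))   # "character": age was coerced

info_df <- data.frame(name, age)
info_df <- cbind(info_df, address = c("puducherry", "kolkata", "delhi", "bangalore"))
print(sapply(info_df, class))      # age stays numeric
```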

Merging two Data Frames

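In R, we can merge two data frames using the merge() function. By default, merge() joins the data
frames on the columns whose names they have in common; the by, by.x and by.y arguments let us
choose the key columns explicitly. Setting all=TRUE also keeps the rows that have no match in the
other data frame (a full outer join).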

Syntax: merge(dfA, dfB, …)


Example:


# Merging two data frames in R

d1 <- data.frame(name=c("shaoni", "soumi", "arjun"),

ID=c("111", "112", "113"))

d2 <- data.frame(name=c("sounak", "esha"),

ID=c("114", "115"))

total <- merge(d1, d2, all=TRUE)

print(total)

Output:

name ID
1 arjun 113
2 shaoni 111
3 soumi 112
4 esha 115
5 sounak 114
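The example above has no rows in common, so merge(..., all = TRUE) simply keeps every row of both data frames. When the data frames share a key column, merge() performs a proper join. A sketch with made-up data, joining explicitly on ID with by = "ID":

```r
employees <- data.frame(ID = c(111, 112, 113),
                        name = c("shaoni", "soumi", "arjun"))
cities <- data.frame(ID = c(111, 113, 114),
                     city = c("puducherry", "kolkata", "delhi"))

# Inner join (the default): only IDs present in both data frames
inner <- merge(employees, cities, by = "ID")
print(inner)   # IDs 111 and 113

# Left join: keep every employee, with NA where no city matches
left <- merge(employees, cities, by = "ID", all.x = TRUE)
print(left)    # ID 112 gets city NA
```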

Melting and Casting

Data reshaping involves many steps in order to obtain the desired format. One popular method is
melting the data, which converts each row into a unique id-variable combination, and then casting
it. The two functions used for this process come from the reshape2 package:

melt():

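It is used to convert a data frame into a molten (long-format) data frame.

Syntax: melt(data, …, na.rm=FALSE, value.name="value")

where,

data: data to be melted
… : further arguments passed to other methods, such as id.vars and measure.vars
na.rm: whether NA values should be removed (this converts explicit missings into implicit missings)
value.name: name of the column used to store the values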

dcast():

It is used to cast a molten data frame into a new (wide) form, aggregating values if needed.

Syntax: dcast(data, formula, fun.aggregate)

where,
data: molten data frame to be cast
formula: formula that defines how to cast (row variables ~ column variables)
fun.aggregate: aggregation function, used if there is more than one value per cell

Example:


library(reshape2)

a <- data.frame(id = c("1", "1", "2", "2"),

points = c("1", "2", "1", "2"),

x1 = c("5", "3", "6", "2"),

x2 = c("6", "5", "1", "4"))

# Convert numeric columns to actual numeric values

a$x1 <- as.numeric(as.character(a$x1))

a$x2 <- as.numeric(as.character(a$x2))

print("Melting")

m <- melt(a, id = c("id", "points"))

print(m)

print("Casting")

idmn <- dcast(m, id ~ variable, mean)

print(idmn)

Output:

[1] "Melting"

id points variable value


1 1 1 x1 5
2 1 2 x1 3
3 2 1 x1 6
4 2 2 x1 2
5 1 1 x2 6
6 1 2 x2 5
7 2 1 x2 1
8 2 2 x2 4

[1] "Casting"

id x1 x2
1 1 4 5.5
2 2 4 2.5
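To see what the casting step computes, it can be reproduced with base R alone. In this sketch the long table is typed in by hand to mirror the melted output above, and tapply() groups value by id (rows) and variable (columns) and applies mean(); this is a cross-check, not how reshape2 works internally:

```r
long <- data.frame(
  id       = c("1", "1", "2", "2", "1", "1", "2", "2"),
  points   = c("1", "2", "1", "2", "1", "2", "1", "2"),
  variable = c("x1", "x1", "x1", "x1", "x2", "x2", "x2", "x2"),
  value    = c(5, 3, 6, 2, 6, 5, 1, 4)
)

# Matrix of means: rows are ids, columns are variables
wide <- tapply(long$value, list(long$id, long$variable), mean)
print(wide)   # same means as the dcast() output: x1 = 4, 4; x2 = 5.5, 2.5
```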

String Manipulation in R

Last Updated : 12 Jan, 2023

String manipulation basically refers to the process of handling and analyzing strings. It involves
various operations concerned with modification and parsing of strings to use and change its data. R
offers a series of in-built functions to manipulate the contents of a string. In this article, we will study
different functions concerned with the manipulation of strings in R.

Concatenation of Strings

String concatenation is the technique of combining two or more strings. It can be done in several
ways:

• paste() function: Any number of strings can be concatenated together using the paste() function
to form a larger string. The sep argument sets the separator placed between the individual string
elements, and the collapse argument, when set to a string, joins all the results into a single larger
string separated by that value. By default, the value of collapse is NULL. Syntax:

paste(..., sep=" ", collapse = NULL)

Example:

# R program for String concatenation

# Concatenation using paste() function

str <- paste("Learn", "Code")

print(str)

Output:

"Learn Code"

If no separator is specified, the default separator " " (a single space) is inserted between the
individual strings, as above. In the next example, the separator is set explicitly:

str <- paste(c(1:3), "4", sep = ":")

print(str)

Output:

"1:4" "2:4" "3:4"

Since the objects to be concatenated have different lengths, the shorter one is recycled. The first
argument is the sequence 1, 2, 3, each element of which is concatenated with the string "4" using
the separator ":".

str <- paste(c(1:4), c(5:8), sep = "--")

print(str)

Output:

"1--5" "2--6" "3--7" "4--8"

Since both arguments have the same length, the corresponding elements are concatenated: the first
element of the first vector is concatenated with the first element of the second vector, and so on,
using the separator "--".
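The collapse argument described above has no example of its own. A short sketch of it, together with paste0() (paste() with no separator, sep = ""), using illustrative strings:

```r
words <- c("Learn", "Code", "Teach")
joined <- paste(words, collapse = " + ")
print(joined)   # "Learn + Code + Teach"

ids <- paste0("id_", 1:3)
print(ids)      # "id_1" "id_2" "id_3"

# sep and collapse together: pair the elements first, then join the pairs
pairs <- paste(c("a", "b"), c(1, 2), sep = "=", collapse = "&")
print(pairs)    # "a=1&b=2"
```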

• cat() function: Different types of strings can be concatenated together using the cat() function in
R, where sep specifies the separator placed between the strings and file gives a file name, in case
we wish to write the contents to a file. Syntax:

cat(..., file = "", sep = " ")

Example:

# R program for string concatenation

# Concatenation using cat() function

str <- cat("learn", "code", "tech", sep = ":")

print(str)

Output:

learn:code:techNULL

The output is printed without any quotes, using the separator ":" given by sep (the default separator
is " "). cat() prints its result directly and returns NULL, so print(str) then displays NULL, right after
the text because cat() does not add a newline. Example:

cat(c(1:5), file ='sample.txt')

Contents of sample.txt:

1 2 3 4 5

The output is written to a text file sample.txt in the current working directory, using the default
separator " ".

Calculating Length of strings

• length() function: The length() function returns the number of elements of a vector, i.e., here, the
number of strings passed to it. Example:

# R program to calculate length

print(length(c("Learn to", "Code")))

Output:

[1] 2

There are two strings specified in the function.

• nchar() function: nchar() counts the number of characters in each of the strings passed to it
individually. Example:

print(nchar(c("Learn", "Code")))

Output:

[1] 5 4

The output gives the length of "Learn" and then of "Code", separated by a space.

Case Conversion of strings

• Conversion to upper case: All the characters of the specified strings are converted to upper case.
Example:

print(toupper(c("Learn Code", "hI")))

Output:

"LEARN CODE" "HI"

• Conversion to lower case: All the characters of the specified strings are converted to lower case.
Example:

print(tolower(c("Learn Code", "hI")))

Output:

"learn code" "hi"

• casefold() function: All the characters of the specified strings are converted to lowercase or
uppercase according to the upper argument of casefold(…, upper = FALSE). Examples:

print(casefold(c("Learn Code", "hI")))

Output:

"learn code" "hi"

By default, the strings get converted to lower case.

print(casefold(c("Learn Code", "hI"), upper = TRUE))

Output:

"LEARN CODE" "HI"

Character replacement

Characters can be translated using the chartr(old, new, x) function in R, where every instance of a
character of old is replaced by the corresponding character of new in the specified set of strings.
Example 1:

chartr("a", "A", "An honest man gave that")

Output:

"An honest mAn gAve thAt"

Every instance of 'a' is replaced by 'A'. Example 2:

chartr("is", "#@", c("This is it", "It is great"))

Output:

"Th#@ #@ #t" "It #@ great"

chartr() works character by character: each character of the old string is replaced by the character
at the same position in the new string, so "i" is replaced by "#" and "s" by "@". Example 3:

chartr("ate", "#@", "I hate ate")

Output:

Error in chartr("ate", "#@", "I hate ate") : 'old' is longer than 'new'

Execution halted

The new string must be at least as long as the old one; here "ate" has three characters, but "#@" has
only two.
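Because chartr() maps single characters, it cannot replace a whole substring such as "ate" with one symbol. For that, base R's sub() (first match) and gsub() (all matches) can be used; fixed = TRUE treats the pattern as literal text rather than a regular expression. A short sketch:

```r
s <- "I hate ate"
# gsub() replaces every occurrence of the whole substring "ate"
print(gsub("ate", "#", s, fixed = TRUE))  # "I h# #"
# sub() replaces only the first occurrence
print(sub("ate", "#", s, fixed = TRUE))   # "I h# ate"
```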

Splitting the string

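The strsplit() function splits a string into substrings at every occurrence of the separator given as its
second argument (here a single space " "). It returns a list with one element per input string.
Example:

strsplit("Learn Code Teach !", " ")

Output:

[[1]]
[1] "Learn" "Code" "Teach" "!"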
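Since strsplit() always returns a list, it helps to see it on several strings at once. In this sketch (the comma-separated records are illustrative), unlist() flattens the pieces into one character vector, and fixed = TRUE makes the separator literal, because by default it is treated as a regular expression:

```r
records <- c("red,green", "blue,yellow,black")
parts <- strsplit(records, ",")
print(length(parts))   # 2: one list element per input string
print(parts[[2]])      # "blue" "yellow" "black"
print(unlist(parts))   # "red" "green" "blue" "yellow" "black"

# "." is a regex wildcard, so split on a literal dot with fixed = TRUE
print(strsplit("a.b.c", ".", fixed = TRUE)[[1]])   # "a" "b" "c"
```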

Working with substrings

The substr(x, start, stop) and substring(text, first, last) functions in R extract substrings out of a
string, beginning at the start position and ending at the stop position (both inclusive, counting from
1). Used in replacement form, e.g. substr(x, start, stop) <- value, they replace the specified substring
with a new set of characters. Example:


substr("Learn Code Tech", 1, 4)

Output:

"Lear"

Extracts the first four characters from the string.


str <- c("program", "with", "new", "language")

substr(str, 3, 3) <- "%"

print(str)

Output:

"pr%gram" "wi%h" "ne%" "la%guage"

Replaces the third character of every string with % sign.


str <- c("program", "with", "new", "language")

substr(str, 3, 3) <- c("%", "@")

print(str)

Output:

"pr%gram" "wi@h" "ne%" "la@guage"

Replaces the third character of each string alternately with the specified symbols: the replacement
vector c("%", "@") is recycled across the four strings.
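substring() is vectorised over its start and end positions, which makes it handy for pulling several pieces out of one string at once. A short sketch with illustrative strings:

```r
word <- "Learn"
print(substring(word, 1:3, 3:5))   # "Lea" "ear" "arn"

# substr() simply stops at the end of the string when stop is too large
print(substr("Code", 2, 100))      # "ode"

# Every single character, using the string's length from nchar()
print(substring(word, 1:nchar(word), 1:nchar(word)))  # "L" "e" "a" "r" "n"
```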

Data Structures in R Programming

Last Updated : 26 Mar, 2024


A data structure is a particular way of organizing data in a computer so that it can be used effectively.
The idea is to reduce the space and time complexities of different tasks. Data structures in R
programming are tools for holding multiple values.

R’s base data structures are often organized by their dimensionality (1D, 2D, or nD) and whether
they’re homogeneous (all elements must be of the same type) or heterogeneous (the elements can
be of different types). This classification covers the data structures most frequently used in data
analysis.

The most essential data structures used in R include:

• Vectors

• Lists

• Dataframes

• Matrices

• Arrays

• Factors

• Tibbles (provided by the tibble package rather than base R)

Vectors

A vector is an ordered collection of elements of a basic data type. The key point is that all the
elements of a vector must be of the same data type, i.e., vectors are homogeneous data structures.
Vectors are one-dimensional data structures.

Example:

# R program to illustrate Vector

# Vectors(ordered collection of same data type)

X = c(1, 3, 5, 7, 8)

# Printing those elements in console

print(X)

Output:

[1] 1 3 5 7 8
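To see what homogeneity means in practice: when c() receives values of different types, R coerces every element to the most flexible common type (logical, then integer, then double, then character). A minimal sketch with illustrative values:

```r
mixed <- c(1, "two", TRUE)
print(mixed)          # "1" "two" "TRUE": everything became character
print(class(mixed))   # "character"

nums <- c(1L, 2.5, TRUE)
print(nums)           # 1.0 2.5 1.0: integer and logical became double
print(class(nums))    # "numeric"
```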

Lists

A list is a generic object consisting of an ordered collection of objects. Lists are heterogeneous data
structures. These are also one-dimensional data structures. A list can be a list of vectors, list of
matrices, a list of characters and a list of functions and so on.
Example:

# R program to illustrate a List

# The first attribute is a numeric vector

# containing the employee IDs which is

# created using the 'c' command here

empId = c(1, 2, 3, 4)

# The second attribute is the employee name

# which is created using this line of code here

# which is the character vector

empName = c("Debi", "Sandeep", "Subham", "Shiba")

# The third attribute is the number of employees

# which is a single numeric variable.

numberOfEmp = 4

# We can combine all these three different

# data types into a list

# containing the details of employees

# which can be done using a list command

empList = list(empId, empName, numberOfEmp)

print(empList)

Output:

[[1]]
[1] 1 2 3 4

[[2]]
[1] "Debi" "Sandeep" "Subham" "Shiba"

[[3]]
[1] 4
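The list above is unnamed, so its elements can only be reached by position. A sketch of the same data with named elements (the names id, name and total are chosen here for illustration) shows the two usual ways to retrieve them.

```r
# Giving the list elements names makes them easier to retrieve
empList <- list(id = c(1, 2, 3, 4),
                name = c("Debi", "Sandeep", "Subham", "Shiba"),
                total = 4)

print(empList$name)             # access by name with $
print(empList[["total"]])       # [[ ]] returns the element itself
print(class(empList["total"]))  # [ ] returns a sub-list: "list"
```

In short, [[ ]] and $ extract an element, while single brackets keep the result wrapped in a list.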

Dataframes

Data frames are generic data objects in R that are used to store tabular data. They are among the most popular data objects in R programming because data in tabular form is easy to read and work with. Data frames are two-dimensional, heterogeneous data structures: lists of vectors of equal length.

Data frames have the following constraints placed upon them:

• A data frame must have column names, and every row should have a unique name.

• Each column must have the same number of items.

• Each item in a single column must be of the same data type.

• Different columns may have different data types.

To create a data frame we use the data.frame() function.

Example:

# R program to illustrate dataframe

# A vector which is a character vector

Name = c("Amiya", "Raj", "Asish")

# A vector which is a character vector

Language = c("R", "Python", "Java")

# A vector which is a numeric vector

Age = c(22, 25, 45)

# To create dataframe use data.frame command

# and then pass each of the vectors

# we have created as arguments

# to the function data.frame()

df = data.frame(Name, Language, Age)


print(df)

Output:

   Name Language Age
1 Amiya        R  22
2   Raj   Python  25
3 Asish     Java  45
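The equal-length constraint listed above is enforced when the data frame is built. A minimal sketch (the column values are illustrative): vectors of lengths 3 and 2 cannot be reconciled, so data.frame() stops with an error. Note that data.frame() does recycle a shorter vector when the longer length is an exact multiple of it, so the error only appears when the lengths are incompatible.

```r
# Columns of a data frame must end up with the same length.
# Vectors of lengths 3 and 2 cannot be reconciled, so this fails.
result <- tryCatch(
  data.frame(Name = c("Amiya", "Raj", "Asish"), Age = c(22, 25)),
  error = function(e) conditionMessage(e)
)
print(result)   # the error message, caught as a string

# With equal lengths the data frame is built normally
df <- data.frame(Name = c("Amiya", "Raj", "Asish"), Age = c(22, 25, 45))
print(dim(df))  # 3 rows, 2 columns
```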

Matrices

A matrix is a rectangular arrangement of numbers in rows and columns: rows run horizontally and columns run vertically. Matrices are two-dimensional, homogeneous data structures.
Now, let’s see how to create a matrix in R. To create a matrix, use the matrix() function. Its arguments are the vector of elements and the number of rows and columns you want the matrix to have. An important point to remember is that, by default, matrices are filled in column-wise order.

Example:

# R program to illustrate a matrix

A = matrix(

# Taking sequence of elements

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

# No of rows and columns

nrow = 3, ncol = 3,

# By default matrices are

# in column-wise order

# So this parameter decides

# how to arrange the matrix

byrow = TRUE
)

print(A)

Output:
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
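For comparison, a short sketch of the default column-wise filling: the same nine values without byrow = TRUE produce the transpose of the matrix above.

```r
# Without byrow = TRUE, matrix() fills the values column by column
B <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)
print(B)
#      [,1] [,2] [,3]
# [1,]    1    4    7
# [2,]    2    5    8
# [3,]    3    6    9

# The column-wise matrix is the transpose of the row-wise one
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
print(all(B == t(A)))   # TRUE
```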

Arrays

Arrays are R data objects that can store data in more than two dimensions; they are n-dimensional data structures. For example, an array of dimensions (2, 3, 3) consists of 3 rectangular matrices, each with 2 rows and 3 columns. Arrays are homogeneous data structures.

Now, let’s see how to create arrays in R. To create an array, use the array() function. Its arguments are the vector of elements and a vector containing the dimensions of the array.

Example:


# R program to illustrate an array

A = array(

# Taking sequence of elements

c(1, 2, 3, 4, 5, 6, 7, 8),

# Creating two rectangular matrices

# each with two rows and two columns

dim = c(2, 2, 2)

)

print(A)

Output:

, , 1

     [,1] [,2]
[1,]    1    3
[2,]    2    4

, , 2

     [,1] [,2]
[1,]    5    7
[2,]    6    8

Factors

Factors are data objects used to categorize data and store it as levels. They are useful for storing categorical data and can hold both strings and integers. They are useful for categorizing the unique values in columns such as “TRUE” or “FALSE”, or “MALE” or “FEMALE”, and they are widely used in statistical modeling.

Now, let’s see how to create factors in R. To create a factor, use the factor() function and pass it a vector.

Example:

# R program to illustrate factors

# Creating factor using factor()

fac = factor(c("Male", "Female", "Male",

"Male", "Female", "Male", "Female"))

print(fac)

Output:

[1] Male Female Male Male Female Male Female
Levels: Female Male
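A brief sketch of how the factor above is stored: levels are sorted alphabetically by default, each value is kept as an integer code pointing into the levels, and table() counts how often each level occurs.

```r
fac <- factor(c("Male", "Female", "Male",
                "Male", "Female", "Male", "Female"))

# Levels are sorted alphabetically by default
print(levels(fac))       # "Female" "Male"

# Internally each value is an integer code into the levels
print(as.integer(fac))   # 2 1 2 2 1 2 1

# Count how often each level occurs (Female: 3, Male: 4)
print(table(fac))
```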

Tibbles
Tibbles are an enhanced version of data frames in R, part of the tidyverse. They offer improved
printing, stricter column types, consistent subsetting behavior, and allow variables to be referred to
as objects. Tibbles provide a modern, user-friendly approach to tabular data in R.

Now, let’s see how we can create a tibble in R. To create tibbles in R we can use the tibble function
from the tibble package, which is part of the tidyverse.

Example:

# Load the tibble package

library(tibble)

# Create a tibble with three columns: name, age, and city

my_data <- tibble(

name = c("Sandeep", "Amit", "Aman"),

age = c(25, 30, 35),

city = c("Pune", "Jaipur", "Delhi")
)

# Print the tibble

print(my_data)

Output:

  name      age city
  <chr>   <dbl> <chr>
1 Sandeep    25 Pune
2 Amit       30 Jaipur
3 Aman       35 Delhi

R – Matrices

Last Updated : 23 Jul, 2024

An R matrix is a two-dimensional arrangement of data in rows and columns.


In a matrix, rows are the ones that run horizontally and columns are the ones that run vertically. In R
programming, matrices are two-dimensional, homogeneous data structures. These are some
examples of matrices:

[Figure: R – Matrices]

Creating a Matrix in R

To create a matrix in R, use the matrix() function.

Its arguments are the vector of elements and the number of rows and columns you want the matrix to have.

Note: By default, matrices are filled in column-wise order.

Syntax to Create R-Matrix

matrix(data, nrow, ncol, byrow, dimnames)

Parameters:

• data – the values you want to enter

• nrow – number of rows

• ncol – number of columns

• byrow – logical; if TRUE, the values are filled in by rows

• dimnames – names of the rows and columns

Example:

# R program to create a matrix

A = matrix(

# Taking sequence of elements

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

# No of rows
nrow = 3,

# No of columns

ncol = 3,

# By default matrices are in column-wise order

# So this parameter decides how to arrange the matrix

byrow = TRUE
)

# Naming rows

rownames(A) = c("a", "b", "c")

# Naming columns

colnames(A) = c("c", "d", "e")

cat("The 3x3 matrix:\n")

print(A)

Output

The 3x3 matrix:
  c d e
a 1 2 3
b 4 5 6
c 7 8 9
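The dimnames parameter listed in the syntax can set the same row and column names at creation time instead of calling rownames() and colnames() afterwards. A sketch, reusing the names from the example above:

```r
# Row and column names set at creation time through dimnames:
# a list holding the row names, then the column names
A <- matrix(
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE,
  dimnames = list(c("a", "b", "c"), c("c", "d", "e"))
)
print(A)

# Named rows and columns can be used for indexing
print(A["b", "e"])   # row "b", column "e": 6
```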

Creating Special Matrices in R

R can create several special types of matrices through the arguments passed to the matrix() function, or, for diagonal matrices, to the related diag() function.

1. Matrix where all rows and columns are filled by a single constant ‘k’:

The syntax to create such an R matrix is:

Syntax: matrix(k, m, n)
Parameters:
k: the constant
m: no of rows
n: no of columns

Example:

# R program to illustrate

# special matrices

# Matrix having 3 rows and 3 columns

# filled by a single constant 5

print(matrix(5, 3, 3))

Output

     [,1] [,2] [,3]
[1,]    5    5    5
[2,]    5    5    5
[3,]    5    5    5

2. Diagonal matrix:

A diagonal matrix is a matrix in which the entries outside the main diagonal are all zero. The syntax to create such an R matrix is:

Syntax: diag(k, m, n)
Parameters:
k: the constant or vector of diagonal values
m: no of rows
n: no of columns

Example:

# R program to illustrate

# special matrices

# Diagonal matrix having 3 rows and 3 columns

# filled by array of elements (5, 3, 3)


print(diag(c(5, 3, 3), 3, 3))

Output

     [,1] [,2] [,3]
[1,]    5    0    0
[2,]    0    3    0
[3,]    0    0    3

3. Identity matrix:

An identity matrix is a matrix in which all the elements of the principal diagonal are ones and all other elements are zeros. The syntax to create such an R matrix is:

Syntax: diag(k, m, n)
Parameters:
k: 1
m: no of rows
n: no of columns

Example:

# R program to illustrate

# special matrices

# Identity matrix having

# 3 rows and 3 columns

print(diag(1, 3, 3))

Output

     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
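A short sketch of why the identity matrix matters: matrix-multiplying any 3x3 matrix by it with %*% returns the same matrix. The example also shows that diag() applied to an existing matrix extracts its main diagonal (the name I3 is illustrative).

```r
A  <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)
I3 <- diag(1, 3, 3)

# %*% is matrix multiplication (not element-wise *)
AI <- A %*% I3
print(AI)

# Multiplying by the identity matrix leaves A unchanged
print(all(AI == A))   # TRUE

# diag() applied to an existing matrix extracts its main diagonal
print(diag(A))        # 1 5 9
```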

4. Matrix Metrics

Matrix metrics tell you about the matrix you created: you might want to know its number of rows, number of columns, or dimensions.
The example below answers the following questions:

• What are the dimensions of the matrix?

• How many rows are there in the matrix?

• How many columns are there in the matrix?

• How many elements are there in the matrix?

Example:

# R program to illustrate

# matrix metrics

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

cat("Dimension of the matrix:\n")

print(dim(A))

cat("Number of rows:\n")

print(nrow(A))

cat("Number of columns:\n")

print(ncol(A))

cat("Number of elements:\n")

print(length(A))
# OR

print(prod(dim(A)))

Output

The 3x3 matrix:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Dimension of the matrix:

[1] 3 3

Number of rows:

[1] 3

Number of columns:

[1] 3

Number of elements:

[1] ...

Accessing Elements of an R-Matrix

We can access elements of R matrices using the same convention that is followed for data frames: write the matrix name followed by square brackets with a comma inside.

The value before the comma selects rows, and the value after the comma selects columns. Let’s illustrate this with some simple R code.

Accessing rows:

# R program to illustrate

# accessing rows of a matrix

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,
byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

# Accessing first and second row

cat("Accessing first and second row\n")

print(A[1:2, ])

Output

The 3x3 matrix:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Accessing first and second row
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

Accessing columns:

# R program to illustrate

# accessing columns of a matrix

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

# Accessing first and second column

cat("Accessing first and second column\n")

print(A[, 1:2])

Output

The 3x3 matrix:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Accessing first and second column
     [,1] [,2]
[1,]    1    2
[2,]    4    5
[3,]    7    8

More Examples of Accessing Elements of an R-Matrix:

# R program to illustrate

# accessing an entry of a matrix

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

# Accessing 2

print(A[1, 2])

# Accessing 6

print(A[2, 3])

Output

The 3x3 matrix:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
[1] 2
[1] 6

Accessing Submatrices in R:

We can access a submatrix by using the colon (:) operator to specify a range of rows and a range of columns.

# R program to illustrate

# access submatrices in a matrix

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

cat("Accessing the first three rows and the first two columns\n")

print(A[1:3, 1:2])

Output

The 3x3 matrix:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Accessing the first three rows and the first two columns
     [,1] [,2]
[1,]    1    2
[2,]    4    5
[3...
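The colon operator only selects contiguous ranges. As a sketch, a vector of indices built with c() selects rows and columns that are not next to each other, here the four corner elements.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)

# Rows 1 and 3, columns 1 and 3 (the four corner elements)
corners <- A[c(1, 3), c(1, 3)]
print(corners)
#      [,1] [,2]
# [1,]    1    3
# [2,]    7    9
```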

Modifying Elements of an R-Matrix

In R, you can modify the elements of a matrix by direct assignment.

Example:

# R program to illustrate

# editing elements of a matrix

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

# Editing the element in the 3rd row and 3rd column

# from 9 to 30

# by direct assignment

A[3, 3] = 30

cat("After editing the matrix\n")

print(A)

Output

The 3x3 matrix:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
After editing the matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8   30
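Direct assignment is not limited to single elements. A sketch of two common variants: a logical condition inside the brackets updates every matching element at once, and an empty column index replaces a whole row.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)

# Replace every element greater than 5 with 0
A[A > 5] <- 0

# Replace the whole first row at once
A[1, ] <- c(-1, -2, -3)
print(A)
#      [,1] [,2] [,3]
# [1,]   -1   -2   -3
# [2,]    4    5    0
# [3,]    0    0    0
```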

R-Matrix Concatenation

Matrix concatenation refers to the merging of rows or columns of an existing R matrix.

Concatenation of a row:

The concatenation of a row to a matrix is done using rbind().

# R program to illustrate

# concatenating a row to a matrix


# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

# Creating another 1x3 matrix

B = matrix(

c(10, 11, 12),

nrow = 1,

ncol = 3
)

cat("The 1x3 matrix:\n")

print(B)

# Add a new row using rbind()

C = rbind(A, B)

cat("After concatenation of a row:\n")

print(C)

Output

The 3x3 matrix:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

The 1x3 matrix:
     [,1] [,2] [,3]
[1,]   10   11   12

After concatenation of a row:
[,1] [,2] [,3...

Concatenation of a column:

The concatenation of a column to a matrix is done using cbind().

# R program to illustrate

# concatenating a column to a matrix

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

# Creating another 3x1 matrix

B = matrix(

c(10, 11, 12),

nrow = 3,

ncol = 1,

byrow = TRUE
)

cat("The 3x1 matrix:\n")

print(B)
# Add a new column using cbind()

C = cbind(A, B)

cat("After concatenation of a column:\n")

print(C)

Output

The 3x3 matrix:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

The 3x1 matrix:
     [,1]
[1,]   10
[2,]   11
[3,]   12

After concatenation of a column:
[,1] [,2] ...

Dimension inconsistency: make sure the dimensions of the matrices are consistent before concatenating them. rbind() requires the matrices to have the same number of columns, and cbind() requires the same number of rows.

# R program to illustrate

# dimension inconsistency in matrix concatenation

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("The 3x3 matrix:\n")

print(A)

# Creating another 1x3 matrix

B = matrix(

c(10, 11, 12),

nrow = 1,

ncol = 3
)

cat("The 1x3 matrix:\n")

print(B)

# This will give an error

# because of dimension inconsistency

C = cbind(A, B)

cat("After concatenation of a column:\n")

print(C)

Output:

The 3x3 matrix:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
The 1x3 matrix:
     [,1] [,2] [,3]
[1,]   10   11   12
Error in cbind(A, B) : number of rows of matrices must match (see arg 2)
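One way to make the failure easier to understand is to check the dimensions yourself before binding. The helper below, safe_cbind(), is hypothetical (it is not part of base R) and only sketches the idea. It also shows that the same 1x3 matrix B fits rbind(), and that transposing it with t() gives a 3x1 matrix that cbind() accepts.

```r
# Hypothetical helper: check the row counts before calling cbind()
safe_cbind <- function(x, y) {
  if (nrow(x) != nrow(y)) {
    stop(sprintf("cannot cbind: %d rows vs %d rows", nrow(x), nrow(y)))
  }
  cbind(x, y)
}

A <- matrix(1:9, nrow = 3, byrow = TRUE)
B <- matrix(c(10, 11, 12), nrow = 1)

# B has 1 row and 3 columns, so it fits rbind() but not cbind()
print(rbind(A, B))
msg <- tryCatch(safe_cbind(A, B), error = function(e) conditionMessage(e))
print(msg)

# Transposing B gives a 3x1 matrix that cbind() accepts
print(safe_cbind(A, t(B)))
```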

Adding Rows and Columns in an R-Matrix

To add a row to an R matrix, use the rbind() function; to add a column, use the cbind() function.

Adding Row
The example below shows how to add a row to an R matrix.

Example:

# Create a 3x3 matrix

number <- matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("Before inserting a new row:\n")

print(number)

# New row to be inserted

new_row <- c(10, 11, 12) # Define the new row

# Inserting the new row at the second position:
# row 1, then new_row, then the remaining rows.
# rbind() labels the new row with the variable name new_row.

number <- rbind(number[1, ], new_row, number[-1, ])

cat("\nAfter inserting a new row:\n")

print(number)

Output

Before inserting a new row:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

After inserting a new row:
        [,1] [,2] [,3]
           1    2    3
new_row   10   11   12
           4    5    6
           7    8    9

Adding Column

The example below shows how to add a column to an R matrix.

# Create a 3x3 matrix

number <- matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("Before adding a new column:\n")

print(number)

# New column to be added

new_column <- c(10, 11, 12) # Define the new column

# Adding the new column at the end

number <- cbind(number, new_column)

cat("\nAfter adding a new column:\n")

print(number)

Output

Before adding a new column:
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

After adding a new column:
           new_column
[1,] 1 2 3         10
[2,] 4 5 6         1...

Deleting Rows and Columns of an R-Matrix

To delete a row or a column, access it with a negative sign in front of its index. The negative index tells R to return the matrix without that row or column.

Row deletion:

# R program to illustrate

# deleting a row of a matrix

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("Before deleting the 2nd row\n")

print(A)

# 2nd-row deletion

A = A[-2, ]

cat("After deleting the 2nd row\n")

print(A)

Output

Before deleting the 2nd row
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
After deleting the 2nd row
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    7    8    9

Column deletion:

# R program to illustrate

# deleting a column of a matrix

# Create a 3x3 matrix

A = matrix(

c(1, 2, 3, 4, 5, 6, 7, 8, 9),

nrow = 3,

ncol = 3,

byrow = TRUE
)

cat("Before deleting the 2nd column\n")

print(A)

# 2nd-column deletion

A = A[, -2]

cat("After deleting the 2nd column\n")

print(A)

Output

Before deleting the 2nd column
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
After deleting the 2nd column
     [,1] [,2]
[1,]    1    3
[2,]    4    6
[3,]    7    9
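Several rows or columns can be deleted at once with a vector of negative indices. A sketch of one pitfall: when only a single row remains, R drops the result to a plain vector unless drop = FALSE is passed.

```r
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3, byrow = TRUE)

# Delete rows 1 and 3 at once
r <- A[-c(1, 3), ]
print(r)             # 4 5 6, returned as a plain vector
print(is.matrix(r))  # FALSE

# drop = FALSE keeps the result as a 1x3 matrix
m <- A[-c(1, 3), , drop = FALSE]
print(dim(m))        # 1 3
```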

R – Array

Last Updated : 08 Aug, 2024

Arrays are essential data storage structures defined by a fixed number of dimensions. Arrays are used for the allocation of space at contiguous memory locations.

In the R programming language, one-dimensional arrays are called vectors, with their length being their only dimension. Two-dimensional arrays are called matrices and consist of a fixed number of rows and columns. All elements of an R array are of the same data type. The array() function takes a vector as input and arranges its elements into an array with the specified number of dimensions.

Creating an Array

An R array can be created with the array() function. A vector of elements is passed to array() along with the required dimensions.

Syntax:

array(data, dim = c(nrow, ncol, nmat), dimnames = names)

where

nrow: Number of rows

ncol : Number of columns

nmat: Number of matrices of dimensions nrow * ncol

dimnames : Default value = NULL.

Otherwise, a list has to be specified with a name for each component of the dimension. Each component is either NULL or a vector whose length equals the dim value of the corresponding dimension.
Uni-Dimensional Array

A vector is a one-dimensional array, specified by a single dimension: its length. A vector can be created using the c() function, to which a list of values is passed.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

print (vec1)

# cat is used to concatenate

# strings and print it.

cat ("Length of vector : ", length(vec1))

Output:

[1] 1 2 3 4 5 6 7 8 9
Length of vector : 9

Multi-Dimensional Array

A two-dimensional matrix is an array specified by a fixed number of rows and columns, all of the same data type. Matrices, and arrays with more dimensions, are created by passing the values and the dimensions to the array() function.

# arranges data from 2 to 13

# in two matrices of dimensions 2x3

arr = array(2:13, dim = c(2, 3, 2))

print(arr)

Output:

, , 1

     [,1] [,2] [,3]
[1,]    2    4    6
[2,]    3    5    7

, , 2

     [,1] [,2] [,3]
[1,]    8   10   12
[2,]    9   11   13

Vectors of different lengths can also be combined with c() and fed as input into the array() function. However, the total number of elements in all the vectors combined should equal the number of elements in the array. The elements are arranged in the order in which they are specified in the function.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

vec2 <- c(10, 11, 12)

# elements are combined into a single vector,

# vec1 elements followed by vec2 elements.

arr = array(c(vec1, vec2), dim = c(2, 3, 2))

print (arr)

Output:

, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12
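If the element count does not match the dimensions, array() does not raise an error. A sketch of what happens instead, consistent with the 1:15 example later in this section: too few values are recycled from the start, and extra values are silently dropped.

```r
# Only 4 values for 2 x 3 x 1 = 6 positions:
# array() recycles the data from the start to fill the array
arr <- array(1:4, dim = c(2, 3, 1))
print(arr)

# Too many values: the extra ones are silently dropped
arr2 <- array(1:8, dim = c(2, 2, 1))
print(arr2)
```

Because no warning is printed in either case, checking that length(data) equals prod(dim) yourself is a cheap safeguard.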

Dimension of the Array

Use the dim() function to find the dimensions of an R array.

# for multi dimension array

arr = array(2:13, dim = c(2, 3, 2))

dim(arr)

Output:

[1] 2 3 2

This specifies the dimensions of the R array. In this case, we are creating a 3D array with dimensions
2x3x2. The first dimension has size 2, the second dimension has size 3, and the third dimension has
size 2.
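dim() can also be assigned to. As a sketch, giving a plain vector a dim attribute reshapes it into an array holding the same values as the array(2:13, dim = c(2, 3, 2)) example above.

```r
# Assigning to dim() reshapes a vector into an array
v <- 2:13
dim(v) <- c(2, 3, 2)
print(dim(v))        # 2 3 2

# The result matches array(2:13, dim = c(2, 3, 2))
print(all(v == array(2:13, dim = c(2, 3, 2))))   # TRUE
```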

Naming of Arrays

The row names, column names and matrix names are specified as vectors whose lengths equal the number of rows, columns and matrices, respectively. By default, the rows, columns and matrices are named by their index values.

row_names <- c("row1", "row2")

col_names <- c("col1", "col2", "col3")


mat_names <- c("Mat1", "Mat2")

# the naming of the various elements

# is specified in a list and

# fed to the function

arr = array(2:13, dim = c(2, 3, 2),

dimnames = list(row_names,

col_names, mat_names))

print (arr)

Output:

, , Mat1
col1 col2 col3
row1 2 4 6
row2 3 5 7
, , Mat2
col1 col2 col3
row1 8 10 12
row2 9 11 13

Accessing arrays

The R arrays can be accessed by using indices for different dimensions separated by commas.
Different components can be specified by any combination of elements’ names or positions.

Accessing Uni-Dimensional Array

The elements can be accessed by using indexes of the corresponding elements.

vec <- c(1:10)

# accessing entire vector

cat ("Vector is : ", vec, "\n")

# accessing elements

cat ("Third element of vector is : ", vec[3], "\n")

Output:

Vector is : 1 2 3 4 5 6 7 8 9 10
Third element of vector is : 3

Accessing entire matrices

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

vec2 <- c(10, 11, 12)

row_names <- c("row1", "row2")

col_names <- c("col1", "col2", "col3")

mat_names <- c("Mat1", "Mat2")

arr = array(c(vec1, vec2), dim = c(2, 3, 2),

dimnames = list(row_names,

col_names, mat_names))

arr

# accessing matrix 1 by index value

print ("Matrix 1")

print (arr[,,1])

# accessing matrix 2 by its name

print ("Matrix 2")

print(arr[,,"Mat2"])

Output:

, , Mat1

     col1 col2 col3
row1    1    3    5
row2    2    4    6

, , Mat2

     col1 col2 col3
row1    7    9   11
row2    8   10   12

[1] "Matrix 1"
     col1 col2 col3
row1    1    3    5
row2    2    4    6
[1] "Matrix 2"
     col1 col2 col3
row1    7    9   11
row2    8   10   12

Accessing specific rows and columns of matrices

Rows and columns can also be accessed by both names as well as indices.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

vec2 <- c(10, 11, 12)

row_names <- c("row1", "row2")

col_names <- c("col1", "col2", "col3")

mat_names <- c("Mat1", "Mat2")

arr = array(c(vec1, vec2), dim = c(2, 3, 2),

dimnames = list(row_names,

col_names, mat_names))

arr

# 1st column of matrix 1, by index

print ("1st column of matrix 1")

print (arr[, 1, 1])

# 2nd row of matrix 2, by names

print ("2nd row of matrix 2")

print(arr["row2",,"Mat2"])

Output:

, , Mat1

     col1 col2 col3
row1    1    3    5
row2    2    4    6

, , Mat2

     col1 col2 col3
row1    7    9   11
row2    8   10   12

[1] "1st column of matrix 1"
row1 row2
   1    2
[1] "2nd row of matrix 2"
col1 col2 col3
   8   10   12

Accessing elements individually

Elements can be accessed by using both the row and column numbers or names.

vec1 <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

vec2 <- c(10, 11, 12)

row_names <- c("row1", "row2")

col_names <- c("col1", "col2", "col3")

mat_names <- c("Mat1", "Mat2")

arr = array(c(vec1, vec2), dim = c(2, 3, 2),

dimnames = list(row_names, col_names, mat_names))

# element of matrix 1, mixing indices and a column name

print ("2nd row 3rd column matrix 1 element")

print (arr[2, "col3", 1])

# element of matrix 2, using names only

print ("2nd row 1st column element of matrix 2")

print(arr["row2", "col1", "Mat2"])

Output:

[1] "2nd row 3rd column matrix 1 element"
[1] 6
[1] "2nd row 1st column element of matrix 2"
[1] 8

Accessing subset of array elements

A smaller subset of the array elements can be accessed by defining a range of row or column limits.

row_names <- c("row1", "row2")

col_names <- c("col1", "col2", "col3", "col4")

mat_names <- c("Mat1", "Mat2")

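# 15 values fill 16 positions, so array() recycles the first value (1)
# into the last position of Mat2

arr = array(1:15, dim = c(2, 4, 2),

dimnames = list(row_names, col_names, mat_names))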

arr

# print elements of both the rows and columns 2 and 3 of matrix 1

print (arr[, c(2, 3), 1])

Output:

, , Mat1

     col1 col2 col3 col4
row1    1    3    5    7
row2    2    4    6    8

, , Mat2

     col1 col2 col3 col4
row1    9   11   13   15
row2   10   12   14    1

     col2 col3
row1    3    5
row2    4    6

Adding elements to array

Elements can be appended at different positions in the array. The elements keep the order in which they are added. The time complexity of adding new elements is O(n), where n is the length of the array, and the length of the array increases by the number of elements added. There are various built-in ways to add new values in R:

• c(vector, values): the c() function appends values to the end of the array. Multiple values can be added at once.

• append(vector, values): this function inserts the values at any position in the vector; by default, it adds them at the end. append(vector, values, after = k) inserts the new values after position k.

• Indexing beyond the current length: assigning to index length + x, where x > 0, adds a new element at that position.

# creating a uni-dimensional array

x <- c(1, 2, 3, 4, 5)

# addition of element using c() function

x <- c(x, 6)
print ("Array after 1st modification ")

print (x)

# addition of element using append function

x <- append(x, 7)

print ("Array after 2nd modification ")

print (x)

# adding elements after computing the length

len <- length(x)

x[len + 1] <- 8

print ("Array after 3rd modification ")

print (x)

# adding on length + 3 index

x[len + 3] <- 9

print ("Array after 4th modification ")

print (x)

# after = length(x) + 3 exceeds the length, so the values are appended at the end

print ("Array after 5th modification")

x <- append(x, c(10, 11, 12), after = length(x)+3)

print (x)

# adds new elements after 3rd index

print ("Array after 6th modification")

x <- append(x, c(-1, -1), after = 3)

print (x)

Output:

[1] "Array after 1st modification "
[1] 1 2 3 4 5 6
[1] "Array after 2nd modification "
[1] 1 2 3 4 5 6 7
[1] "Array after 3rd modification "
[1] 1 2 3 4 5 6 7 8
[1] "Array after 4th modification "
[1] 1 2 3 4 5 6 7 8 NA 9
[1] "Array after 5th modification"
[1] 1 2 3 4 5 6 7 8 NA 9 10 11 12
[1] "Array after 6th modification"
[1] 1 2 3 -1 -1 4 5 6 7 8 NA 9 10 11 12

Before the third modification, the array has length 7 (stored in len); after it, elements occupy indices 1 to 8. At the fourth modification, 9 is assigned to index 10 (len + 3), so R automatically fills the missing position 9 with NA.
At the 5th modification, after = length(x) + 3 is larger than the array's length, so append() simply adds [10, 11, 12] at the end, starting from the 11th index.
At the 6th modification, [-1, -1] is inserted after the third position in the array.
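A small sketch of the two ends of the after argument (the vector x is illustrative): after = 0 inserts the values before the first element, and any k between 1 and the length inserts right after position k.

```r
x <- c(1, 2, 3)

# after = 0 inserts the values before the first element
x <- append(x, 0, after = 0)
print(x)   # 0 1 2 3

# after = k inserts right after position k
x <- append(x, 1.5, after = 2)
print(x)   # 0 1 1.5 2 3
```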

Removing Elements from Array

Elements can be removed from arrays in R, either one at a time or several together. The elements to keep are selected by using a condition on the array values as the index: values that satisfy the condition are retained, and the rest are removed. Multiple conditions can be combined to remove a range of elements. Another way to remove elements is the %in% operator: it marks which values belong to a given set, and negating it with ! keeps only the values that are not in that set.

# Creating an array of length 9

m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

print("Original Array")

print(m)

# Remove a single value element (3) from the array

m <- m[m != 3]

print("After 1st modification")

print(m)

# Removing elements based on a condition (greater than 2 and less than or equal to 8)

m <- m[m > 2 & m <= 8]

print("After 2nd modification")

print(m)
# Remove a sequence of elements using another array

remove <- c(4, 6, 8)

# Check which elements satisfy the remove property

print(m %in% remove)

print("After 3rd modification")

print(m[!m %in% remove])

Output:

[1] "Original Array"
[1] 1 2 3 4 5 6 7 8 9
[1] "After 1st modification"
[1] 1 2 4 5 6 7 8 9
[1] "After 2nd modification"
[1] 4 5 6 7 8
[1] TRUE FALSE TRUE FALSE TRUE
[1] "After 3rd modification"
[1] 5 7

At the 1st modification, all element values that are not equal to 3 are retained. At the 2nd modification, the elements greater than 2 and less than or equal to 8 are retained, and the rest are removed. At the 3rd modification, the elements for which m %in% remove is FALSE are printed, because the condition is negated with the ! operator; m itself is not modified by this step.
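The examples above remove elements by value. As a sketch, negative indices remove elements by position instead, and which() can turn a condition into positions first.

```r
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Negative indices remove elements by position instead of by value
print(m[-1])          # drop the first element
print(m[-c(2, 4)])    # drop the 2nd and 4th elements

# which() converts a condition into positions,
# which can then be removed with a negative index
pos <- which(m %% 2 == 0)   # positions of the even values
odd <- m[-pos]
print(odd)                  # 1 3 5 7 9
```

One caution on this design: if which() finds no matches, pos is empty and m[-pos] returns an empty vector rather than m, so logical indexing such as m[!(m %% 2 == 0)] is the safer choice when the condition might match nothing.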

Updating Existing Elements of Array

The elements of an array can be updated by assigning a new value to the desired index. The changes are retained in the original array. If the index to be updated is within the length of the array, the value is changed; otherwise, a new element is added at the specified index. Multiple elements can be updated at once, either with a single value or with a vector of new values.

# creating an array of length 9

m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

print ("Original Array")

print (m)
# updating single element

m[1] <- 0

print ("After 1st modification")

print (m)

# updating sequence of elements

m[7:9] <- -1

print ("After 2nd modification")

print (m)

# updating two indices with two different values

m[c(2, 5)] <- c(-1, -2)

print ("After 3rd modification")

print (m)

# index 10 is beyond the length, so this adds a new element to the array

m[10] <- 10

print ("After 4th modification")

print (m)

Output:

[1] "Original Array"
[1] 1 2 3 4 5 6 7 8 9
[1] "After 1st modification"
[1] 0 2 3 4 5 6 7 8 9
[1] "After 2nd modification"
[1] 0 2 3 4 5 6 -1 -1 -1
[1] "After 3rd modification"
[1] 0 -1 3 4 -2 6 -1 -1 -1
[1] "After 4th modification"
[1] 0 -1 3 4 -2 6 -1 -1 -1 10

At the 2nd modification, the elements at indexes 7 to 9 are each updated to -1. At the 3rd modification, the second element is replaced by -1 and the fifth element by -2. At the 4th modification, a new element is added, since index 10 is greater than the length of the array (9).
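As an alternative to assigning in place, base R's replace() returns an updated copy and leaves the original vector untouched. A sketch, together with a condition used to pick the elements to update:

```r
m <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# replace() returns an updated copy; m itself is unchanged
m2 <- replace(m, c(2, 5), c(-1, -2))
print(m2)   # 1 -1 3 4 -2 6 7 8 9
print(m)    # still 1 2 3 4 5 6 7 8 9

# A logical condition can also select the elements to update
m[m > 6] <- 0
print(m)    # 1 2 3 4 5 6 0 0 0
```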

R – Data Frames
Last Updated : 10 Dec, 2024

R Programming Language is an open-source programming language that is widely used as statistical software and a data analysis tool. Data frames in R are generic data objects used to store tabular data.

Data frames can also be interpreted as matrices in which each column can be of a different data type. An R data frame is made up of three principal components: the data, the rows, and the columns.

R Data Frames Structure

As you can see in the image below, this is how a data frame is structured.

The data is presented in tabular form, which makes it easier to operate and understand.

[Figure: R – Data Frames]

Create Dataframe in R Programming Language

To create an R data frame use data.frame() function and then pass each of the vectors you have
created as arguments to the function.

# R program to create dataframe

# creating a data frame


friend.data <- data.frame(

friend_id = c(1:5),

friend_name = c("Sachin", "Sourav",

"Dravid", "Sehwag",

"Dhoni"),

stringsAsFactors = FALSE
)

# print the data frame

print(friend.data)

Output:

friend_id friend_name
1 1 Sachin
2 2 Sourav
3 3 Dravid
4 4 Sehwag
5 5 Dhoni

Get the Structure of the R Data Frame

You can get the structure of an R data frame with the str() function.

It can display even the internal structure of large nested lists, and it provides a compact one-line summary for each basic R object, telling the user about the object and its constituents.

# R program to get the

# structure of the data frame

# creating a data frame

friend.data <- data.frame(

friend_id = c(1:5),

friend_name = c("Sachin", "Sourav",

"Dravid", "Sehwag",

"Dhoni"),

stringsAsFactors = FALSE
)

# using str()
print(str(friend.data))

Output:

'data.frame': 5 obs. of 2 variables:
 $ friend_id : int 1 2 3 4 5
 $ friend_name: chr "Sachin" "Sourav" "Dravid" "Sehwag" ...
NULL

Summary of Data in the R data frame

In an R data frame, the statistical summary and nature of the data can be obtained by
applying the summary() function.

summary() is a generic function used to produce result summaries of various objects, including the
results of model-fitting functions. It invokes particular methods that depend on the class of its first
argument; for a data frame, it summarizes each column.

# R program to get the
# summary of the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# using summary()
print(summary(friend.data))

Output:

friend_id friend_name
Min. :1 Length:5
1st Qu.:2 Class :character
Median :3 Mode :character
Mean :3
3rd Qu.:4
Max. :5

Extract Data from Data Frame in R

Extracting data from an R data frame means accessing its rows or columns. One can extract a
specific column from an R data frame using its column name.

# R program to extract
# data from the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# Extracting the friend_name column
result <- data.frame(friend.data$friend_name)

print(result)

Output:

friend.data.friend_name
1 Sachin
2 Sourav
3 Dravid
4 Sehwag
5 Dhoni
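The example above extracts a column only, but rows can be extracted as well, by position or by a logical condition, using the [row, column] form of indexing. The sketch below rebuilds the same friend.data data frame; the variable names second_row, later_friends and first_names are illustrative only.

```r
# Rebuild the example data frame
friend.data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# Extract the second row (all columns): leave the column index empty
second_row <- friend.data[2, ]
print(second_row)

# Extract the rows that satisfy a condition
later_friends <- friend.data[friend.data$friend_id > 3, ]
print(later_friends)

# Extract one column by name, restricted to rows 1 to 3 (returns a vector)
first_names <- friend.data[1:3, "friend_name"]
print(first_names)
```

Leaving the column position empty keeps every column, while a single column name in the second position returns a plain vector rather than a data frame.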

Expand Data Frame in R Language

A data frame in R can be expanded by adding new columns and rows to the already existing R data
frame.

# R program to expand
# the data frame

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# Expanding the data frame by adding a new column
friend.data$location <- c("Kolkata", "Delhi",
                          "Bangalore", "Hyderabad",
                          "Chennai")

resultant <- friend.data

# print the modified data frame
print(resultant)

Output:

friend_id friend_name location
1 1 Sachin Kolkata
2 2 Sourav Delhi
3 3 Dravid Bangalore
4 4 Sehwag Hyderabad
5 5 Dhoni Chennai

In R, one can perform various types of operations on a data frame, such as accessing rows and columns,
selecting a subset of the data frame, editing data frames, and deleting rows and columns.

Please refer to DataFrame Operations in R to learn about all the types of operations that can be
performed on a data frame.

Access Items in R Data Frame

We can select and access the columns of a data frame using the $ operator, single brackets [ ], or
double brackets [[ ]].

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# Access items using [] (returns a data frame)
friend.data[1]

# Access items using [[]] (returns a vector)
friend.data[['friend_name']]

# Access items using $ (returns a vector)
friend.data$friend_id

Output:

friend_id
1 1
2 2
3 3
4 4
5 5
[1] "Sachin" "Sourav" "Dravid" "Sehwag" "Dhoni"
[1] 1 2 3 4 5
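To reach a single element rather than a whole column, the same operators can be combined with a row index. The sketch below shows three equivalent ways to get the second friend's name, plus the difference in what [ ] and [[ ]] return; the variable names are illustrative only.

```r
friend.data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# [row, column] selects a single cell
by_row_col <- friend.data[2, "friend_name"]

# Extract the column as a vector first, then index into it
by_column_first <- friend.data[["friend_name"]][2]
by_dollar <- friend.data$friend_name[2]

print(c(by_row_col, by_column_first, by_dollar))

# [ ] keeps the data frame wrapper; [[ ]] and $ return the column vector
print(class(friend.data[1]))    # "data.frame"
print(class(friend.data[[1]]))  # "integer"
```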

Number of Rows and Columns

We can find out how many rows and columns are present in a data frame by using the dim() function,
which returns the number of rows followed by the number of columns.

# creating a data frame
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav",
                  "Dravid", "Sehwag",
                  "Dhoni"),
  stringsAsFactors = FALSE
)

# find out the number of rows and columns
dim(friend.data)

Output:

[1] 5 2
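When only one of the two numbers is needed, base R also provides nrow() and ncol(), which return the first and second elements of dim() respectively. A minimal sketch on the same data frame (n_rows and n_cols are illustrative names):

```r
friend.data <- data.frame(
  friend_id = 1:5,
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

n_rows <- nrow(friend.data)   # same as dim(friend.data)[1]
n_cols <- ncol(friend.data)   # same as dim(friend.data)[2]
cat("Rows:", n_rows, "Columns:", n_cols, "\n")
```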

Add Rows and Columns in R Data Frame

You can easily add rows and columns to an R data frame. Insertion expands the existing data frame
without needing to create a new one.

Let’s look at how to add rows and columns to a data frame with an example:

Add Rows in R Data Frame

To add rows to a data frame, you can use the built-in function rbind().

The following example demonstrates how rbind() works on a data frame.

# Creating a data frame representing products in a store
Products <- data.frame(
  Product_ID = c(101, 102, 103),
  Product_Name = c("T-Shirt", "Jeans", "Shoes"),
  Price = c(15.99, 29.99, 49.99),
  Stock = c(50, 30, 25)
)

# Print the existing data frame
cat("Existing dataframe (Products):\n")
print(Products)

# Adding a new row for a new product, built as a one-row
# data frame so that every column keeps its original type
New_Product <- data.frame(Product_ID = 104, Product_Name = "Sunglasses",
                          Price = 39.99, Stock = 40)
Products <- rbind(Products, New_Product)

# Print the updated data frame after adding the new product
cat("\nUpdated dataframe after adding a new product:\n")
print(Products)

Output:

Existing dataframe (Products):

Product_ID Product_Name Price Stock
1 101 T-Shirt 15.99 50
2 102 Jeans 29.99 30
3 103 Shoes 49.99 25

Updated dataframe after adding a new product:

Product_ID Product_Name Price Stock
1 101 T-Shirt 15.99 50
2 102 Jeans 29.99 30
3 103 Shoes 49.99 25
4 104 Sunglasses 39.99 40
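A common pitfall when adding a row is to build it with c(). Because a vector can hold only one type, c(104, "Sunglasses", 39.99, 40) turns every value into a character string, and rbind() then converts the numeric columns to character as well. The sketch below compares the two approaches, assuming R 4.0 or later (where character columns are not turned into factors by default); the object names are illustrative only.

```r
Products <- data.frame(
  Product_ID = c(101, 102, 103),
  Product_Name = c("T-Shirt", "Jeans", "Shoes"),
  Price = c(15.99, 29.99, 49.99),
  Stock = c(50, 30, 25)
)

# c() cannot mix types, so every value becomes a character string
row_as_vector <- c(104, "Sunglasses", 39.99, 40)
with_vector <- rbind(Products, row_as_vector)
str(with_vector)   # Product_ID, Price and Stock are now chr columns

# A one-row data frame keeps each column's type
row_as_df <- data.frame(Product_ID = 104, Product_Name = "Sunglasses",
                        Price = 39.99, Stock = 40)
with_df <- rbind(Products, row_as_df)
str(with_df)       # Product_ID, Price and Stock stay numeric
```

With the character version, arithmetic such as sum(with_vector$Stock) fails, which is why the one-row data frame is the safer choice.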

Add Columns in R Data Frame

To add columns to a data frame, you can use the built-in function cbind().

The following example demonstrates how cbind() works on a data frame.

# Existing data frame representing products in a store
Products <- data.frame(
  Product_ID = c(101, 102, 103),
  Product_Name = c("T-Shirt", "Jeans", "Shoes"),
  Price = c(15.99, 29.99, 49.99),
  Stock = c(50, 30, 25)
)

# Print the existing data frame
cat("Existing dataframe (Products):\n")
print(Products)

# Adding a new column for 'Discount' to the data frame
Discount <- c(5, 10, 8) # New column values for discount
Products <- cbind(Products, Discount)

# Rename the added column (cbind() already names it 'Discount' here;
# this line shows how to give the last column a different name)
colnames(Products)[ncol(Products)] <- "Discount" # Renaming the last column

# Print the updated data frame after adding the new column
cat("\nUpdated dataframe after adding a new column 'Discount':\n")
print(Products)

Output:

Existing dataframe (Products):

Product_ID Product_Name Price Stock
1 101 T-Shirt 15.99 50
2 102 Jeans 29.99 30
3 103 Shoes 49.99 25

Updated dataframe after adding a new column 'Discount':

Product_ID Product_Name Price Stock Discount
1 101 T-Shirt 15.99 50 5
2 102 Jeans 29.99 30 10
3 103 Shoes 49.99 25 8

Remove Rows and Columns

Rows and columns can also be removed from an existing R data frame.

Remove Row in R DataFrame

# Create a data frame
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)
data

# Remove the row with friend_id = 3
# (subset() is part of base R, so no extra package is needed)
data <- subset(data, friend_id != 3)
data

Output:

friend_id friend_name location
1 1 Sachin Kolkata
2 2 Sourav Delhi
3 3 Dravid Bangalore
4 4 Sehwag Hyderabad
5 5 Dhoni Chennai

friend_id friend_name location
1 1 Sachin Kolkata
2 2 Sourav Delhi
4 4 Sehwag Hyderabad
5 5 Dhoni Chennai

In the above code, we first created a data frame called data with three
columns: friend_id, friend_name, and location. To remove the row with friend_id equal to 3, we used
the subset() function with the condition friend_id != 3. subset() keeps only the rows for which the
condition is TRUE, so every row except the one with friend_id equal to 3 is kept.

Remove Column in R DataFrame

# select() comes from the dplyr package
library(dplyr)

# Create a data frame
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)
data

# Remove the 'location' column
data <- select(data, -location)

data

Output:

friend_id friend_name location
1 1 Sachin Kolkata
2 2 Sourav Delhi
3 3 Dravid Bangalore
4 4 Sehwag Hyderabad
5 5 Dhoni Chennai

friend_id friend_name
1 1 Sachin
2 2 Sourav
3 3 Dravid
4 4 Sehwag
5 5 Dhoni

To remove the location column, we used the select() function from the dplyr package and specified
-location. The minus sign (-) indicates that we want to drop the location column. The resulting data frame data will have only
two columns: friend_id and friend_name.
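If dplyr is not available, a column can also be removed with base R alone. The sketch below shows two common options: keeping every column whose name differs from "location", or assigning NULL to the column. The name without_location is illustrative only.

```r
data <- data.frame(
  friend_id = c(1, 2, 3, 4, 5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  location = c("Kolkata", "Delhi", "Bangalore", "Hyderabad", "Chennai")
)

# Option 1: keep every column whose name is not "location" (returns a copy)
without_location <- data[, names(data) != "location"]

# Option 2: assigning NULL to a column deletes it from the data frame
data$location <- NULL

print(data)
```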

Combining Data Frames in R

There are two ways to combine data frames in R: vertically or horizontally.

Let’s look at both cases with examples:

Combine R Data Frame Vertically

To combine data frames vertically (stacking their rows), use the rbind() function. It works for two or
more data frames, which must have the same column names.

# Creating two sample data frames
df1 <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Score = c(80, 75)
)

df2 <- data.frame(
  Name = c("Charlie", "David"),
  Age = c(28, 35),
  Score = c(90, 85)
)

# Print the existing data frames
cat("Dataframe 1:\n")
print(df1)

cat("\nDataframe 2:\n")
print(df2)

# Combining the data frames row-wise using rbind()
combined_df <- rbind(df1, df2)

# Print the combined data frame
cat("\nCombined Dataframe:\n")
print(combined_df)

Output:

Dataframe 1:

Name Age Score
1 Alice 25 80
2 Bob 30 75

Dataframe 2:

Name Age Score
1 Charlie 28 90
2 David 35 85

Combined Dataframe:

Name Age Score
1 Alice 25 80
2 Bob 30 75
3 Charlie 28 90
4 David 35 85
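rbind() on data frames matches columns by name, not by position, so the second data frame may list its columns in a different order. If the names differ, rbind() stops with an error instead. The sketch below illustrates both cases; df3, df4 and result are illustrative names.

```r
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30), Score = c(80, 75))

# Same columns, but in a different order: rbind() aligns them by name
df3 <- data.frame(Score = c(70, 95), Name = c("Eve", "Frank"), Age = c(22, 41))
combined <- rbind(df1, df3)
print(combined)

# A column with a different name ("Years" instead of "Age") cannot be matched
df4 <- data.frame(Name = "Grace", Years = 30, Score = 88)
result <- tryCatch(rbind(df1, df4), error = function(e) conditionMessage(e))
print(result)  # an error message about the names not matching
```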

Combine R Data Frame Horizontally

To combine data frames horizontally (placing their columns side by side), use the cbind() function. It
works for two or more data frames, which should have the same number of rows.

# Creating two sample data frames
df1 <- data.frame(
  Name = c("Alice", "Bob"),
  Age = c(25, 30),
  Score = c(80, 75)
)

df2 <- data.frame(
  Height = c(160, 175),
  Weight = c(55, 70)
)

# Print the existing data frames
cat("Dataframe 1:\n")
print(df1)

cat("\nDataframe 2:\n")
print(df2)

# Combining the data frames column-wise using cbind()
combined_df <- cbind(df1, df2)

# Print the combined data frame
cat("\nCombined Dataframe:\n")
print(combined_df)

Output:

Dataframe 1:
Name Age Score
1 Alice 25 80
2 Bob 30 75

Dataframe 2:

Height Weight
1 160 55
2 175 70

Combined Dataframe:

Name Age Score Height Weight
1 Alice 25 80 160 55
2 Bob 30 75 175 70

R Factors

Last Updated : 10 May, 2023

Factors in R Programming Language are data structures used to categorize data, that is, to represent
categorical data and store it as a set of levels.

Internally, a factor is stored as an integer vector, with a label (level) attached to every unique integer.
Although R factors may look similar to character vectors, they are integers underneath, so care must be
taken when using them as strings. A factor accepts only a restricted set of distinct values. For example, a data field
such as gender may contain values only from female, male, or transgender.
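The integer storage can be inspected directly. The sketch below shows the codes behind a factor and a classic pitfall: converting a factor of numbers with as.numeric() returns the level codes, not the numbers, so it should go through as.character() first. The variable name amounts is illustrative only.

```r
gender <- factor(c("female", "male", "male", "female"))

print(typeof(gender))       # "integer": the underlying storage
print(as.integer(gender))   # 1 2 2 1: positions in levels(gender)
print(levels(gender))       # "female" "male"

# Pitfall: numbers stored as a factor
amounts <- factor(c("10", "5", "20"))
print(levels(amounts))                    # sorted as text: "10" "20" "5"
print(as.numeric(amounts))                # level codes (1 3 2), not the values
print(as.numeric(as.character(amounts)))  # 10 5 20
```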

In the above example, all the possible values are known beforehand and are predefined. These
distinct values are known as levels. When a factor is created without explicit levels, its levels are the
distinct values of the input, sorted alphabetically by default.

Arguments of the factor() Function in R Language

• x: the vector that needs to be converted into a factor.

• levels: the set of distinct values that x may take.

• labels: an optional character vector of labels, one for each level.

• exclude: the values to be excluded when forming the set of levels.

• ordered: a logical flag that decides whether the levels are ordered.

• nmax: an upper bound on the number of levels.

Creating a Factor in R Programming Language

The factor() function, which takes a vector as input, is used to create or modify a factor in R.
The two steps to create an R factor are:

• Creating a vector

• Converting the vector into a factor using the factor() function

Example: Let us create a factor gender from a vector of female and male values.


# Creating a vector
x <- c("female", "male", "male", "female")
print(x)

# Converting the vector x into a factor
# named gender
gender <- factor(x)
print(gender)

Output

[1] "female" "male" "male" "female"

[1] female male male female

Levels: female male

Levels can also be predefined by the programmer.


# Creating a factor with levels defined by programmer

gender <- factor(c("female", "male", "male", "female"),
                 levels = c("female", "transgender", "male"))

gender

Output

[1] female male male female

Levels: female transgender male
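Beyond choosing the levels, factor() can rename them with labels and make them ordered, which allows comparisons such as <. The levels can then be listed with levels() and counted with nlevels(). The sketch below uses a hypothetical clothing-size vector to show these arguments together.

```r
sizes <- c("small", "large", "medium", "small")

# levels fixes the set and order of categories, labels renames them,
# and ordered = TRUE makes comparisons such as < meaningful
size_f <- factor(sizes,
                 levels = c("small", "medium", "large"),
                 labels = c("S", "M", "L"),
                 ordered = TRUE)
print(size_f)

print(levels(size_f))   # "S" "M" "L"
print(nlevels(size_f))  # 3
print(size_f < "L")     # TRUE FALSE TRUE TRUE
```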

You can check the levels of a factor by using the levels() function.

Checking for a Factor in R

The function is.factor() is used to check whether a variable is a factor; it returns TRUE if it is a
factor and FALSE otherwise.


gender <- factor(c("female", "male", "male", "female"))

print(is.factor(gender))

Output

[1] TRUE

Function class() is also used to check whether the variable is a factor and if true returns “factor”.


gender <- factor(c("female", "male", "male", "female"))

class(gender)

Output

[1] "factor"

Accessing elements of a Factor in R

Elements of a factor are accessed the same way as elements of a vector. If gender is a
factor, then gender[i] accesses the ith element of the factor.

Example


gender <- factor(c("female", "male", "male", "female"))

gender[3]

Output

[1] male

Levels: female male

More than one element can be accessed at a time.

Example


gender <- factor(c("female", "male", "male", "female"))

gender[c(2, 4)]

Output

[1] male female

Levels: female male

Exclude one element at a time using a negative index.

Example


gender <- factor(c("female", "male", "male", "female"))

gender[-3]

Output

[1] female male female

Levels: female male


• First, we create a factor vector gender with four elements: “female”, “male”, “male”, and
“female”.

• Then, we use the square brackets [-3] to subset the vector and remove the third element,
which is “male”.

• The output is the remaining elements of the gender vector, which are “female”, “male”, and
“female”. The output also shows the levels of the factor, which are “female” and “male”.

Modification of a Factor in R

After a factor is formed, its components can be modified, but the new values that are assigned
must be among the predefined levels.

Example


gender <- factor(c("female", "male", "male", "female"))

gender[2]<-"female"

gender

Output

[1] female female male female

Levels: female male
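If the assigned value is not one of the levels, the assignment does not fail outright: R issues a warning about an invalid factor level and stores NA in that position. The sketch below reproduces this with the same gender factor.

```r
gender <- factor(c("female", "male", "male", "female"))

# "other" is not one of the levels, so R warns about an
# invalid factor level and stores NA instead of "other"
gender[2] <- "other"
print(gender)
print(is.na(gender[2]))  # TRUE

# The set of levels itself is unchanged
print(levels(gender))    # "female" "male"
```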

As shown earlier, gender[-i] selects all the elements of the factor gender except the ith element. To
assign a value that lies outside the predefined levels, first add it to the levels.

Example


gender <- factor(c("female", "male", "male", "female"))

# add new level

levels(gender) <- c(levels(gender), "other")

gender[3] <- "other"

gender

Output

[1] female male other female

Levels: female male other

Factors in Data Frame

A data frame is similar to a 2D array in which each column contains all the values of one variable and
each row contains one set of values from every column. There are four things to remember about data
frames:

• Column names should be non-empty.

• Row names should be unique.

• Different columns can hold different data types, such as numeric, character, logical, or factor.

• Each column must contain the same number of data items.

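In R versions before 4.0.0, data.frame() converted character columns into factors automatically,
because stringsAsFactors defaulted to TRUE. Since R 4.0.0 the default is FALSE, so pass
stringsAsFactors = TRUE (or apply factor() to the column) when you want a factor column.
We can create a data frame and check whether its column is a factor.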

Example


age <- c(40, 49, 48, 40, 67, 52, 53)

salary <- c(103200, 106200, 150200,

10606, 10390, 14070, 10220)

gender <- c("male", "male", "transgender",

"female", "male", "female", "transgender")

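# stringsAsFactors = TRUE turns the character column gender into a factor
employee <- data.frame(age, salary, gender, stringsAsFactors = TRUE)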

print(employee)

print(is.factor(employee$gender))

Output

age salary gender
1 40 103200 male
2 49 106200 male
3 48 150200 transgender
4 40 10606 female
5 67 10390 male
6 52 14070 female
7 53 10220 transgender
[1] TRUE
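Because the default for stringsAsFactors changed in R 4.0.0, it is safest to state it explicitly or to convert the column afterwards. The sketch below, assuming R 4.0.0 or later, shows both ways; df_chr and df_fct are illustrative names.

```r
gender <- c("male", "male", "transgender", "female")

# Explicitly keep character columns
df_chr <- data.frame(gender, stringsAsFactors = FALSE)
print(class(df_chr$gender))   # "character"

# Explicitly request factor columns
df_fct <- data.frame(gender, stringsAsFactors = TRUE)
print(class(df_fct$gender))   # "factor"

# Or convert a single column after the data frame is created
df_chr$gender <- factor(df_chr$gender)
print(levels(df_chr$gender))  # "female" "male" "transgender"
```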

R – Lists
Last Updated : 11 Mar, 2024

A list in R programming is a generic object consisting of an ordered collection of objects. Lists
are one-dimensional, heterogeneous data structures.

The list can be a list of vectors, a list of matrices, a list of characters, a list of functions, and so on.

A list is a vector but with heterogeneous data elements. A list in R is created with the use of the list()
function.

R allows accessing elements of an R list with the use of the index value. In R, the indexing of a list
starts with 1 instead of 0.

Creating a List

To create a list in R, use the list() function.

In other words, a list is a generic vector containing other objects. To illustrate how a list looks, we
take an example. We want to build a list of employees with their details, so we need
attributes such as ID, employee name, and the number of employees.

Example:


# R program to create a list

# The first attribute is a numeric vector
# containing the employee IDs
empId = c(1, 2, 3, 4)

# The second attribute is a character vector
# containing the employee names
empName = c("Debi", "Sandeep", "Subham", "Shiba")

# The third attribute is the number of employees,
# which is a single numeric value
numberOfEmp = 4

# Combine these three different data types into
# one list containing the details of the employees
empList = list(empId, empName, numberOfEmp)

print(empList)

Output

[[1]]

[1] 1 2 3 4

[[2]]

[1] "Debi" "Sandeep" "Subham" "Shiba"

[[3]]

[1] 4

Naming List Components

Naming list components makes it easier to access them.

Example:


# Creating a named list

my_named_list <- list(name = "Sudheer", age = 25, city = "Delhi")


# Printing the named list

print(my_named_list)

Output

$name

[1] "Sudheer"

$age

[1] 25

$city

[1] "Delhi"
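Names can also be given to the components of a list that already exists, by assigning a character vector to names(). A minimal sketch with a hypothetical unnamed list:

```r
# A list created without names
person <- list("Sudheer", 25, "Delhi")

# Assign names to the components after creation
names(person) <- c("name", "age", "city")

print(names(person))  # "name" "age" "city"
print(person$city)    # "Delhi"
```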

Accessing R List Components

We can access components of an R list in two ways.

1. Access components by names:

All the components of a list can be named, and we can use those names to access the components of
the R list with the $ operator.

Example:


# R program to access
# components of a list

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4

empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)

# Accessing components by names
cat("Accessing name components using $ command\n")
print(empList$Names)

Output

$ID

[1] 1 2 3 4

$Names

[1] "Debi" "Sandeep" "Subham" "Shiba"

$`Total Staff`

[1] 4

Accessing name components using $ command

[1] "Debi" "Sandeep" "Subham" "Shiba"

2. Access components by indices:

We can also access the components of the R list using indices.

To access the top-level components of an R list, use the double slicing operator “[[ ]]” (two square
brackets). To access lower or inner-level components, add a single bracket “[ ]” after the double
slicing operator, for example empList[[2]][2].

Example:

# R program to access
# components of a list

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4

empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)
print(empList)

# Accessing a top-level component by index
cat("Accessing name components using indices\n")
print(empList[[2]])

# Accessing an inner-level component by indices
cat("Accessing Sandeep from name using indices\n")
print(empList[[2]][2])

# Accessing another inner-level component by indices
cat("Accessing 4 from ID using indices\n")
print(empList[[1]][4])

Output

$ID

[1] 1 2 3 4

$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"

$`Total Staff`

[1] 4

Accessing name components using indices

[1] "Debi" "Sandeep" "Subham" "Shiba"

Accessing Sandeep from na...
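The difference between single and double brackets matters with lists: [ ] returns a smaller list that still wraps the component, while [[ ]] returns the component itself. The sketch below shows this with the same empList; sub_list and element are illustrative names.

```r
empList <- list(
  "ID" = c(1, 2, 3, 4),
  "Names" = c("Debi", "Sandeep", "Subham", "Shiba"),
  "Total Staff" = 4
)

sub_list <- empList[2]     # single brackets: a list holding one component
element  <- empList[[2]]   # double brackets: the component itself

print(class(sub_list))     # "list"
print(class(element))      # "character"

# [[ ]] also accepts a name, which is handy for names containing spaces
print(empList[["Total Staff"]])  # 4
```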

Modifying Components of a List

An R list can also be modified by accessing its components and replacing them with the new values
you want.

Example:


# R program to edit
# components of a list

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4

empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)

cat("Before modifying the list\n")
print(empList)

# Modifying the top-level component
empList$`Total Staff` = 5

# Modifying inner-level components
empList[[1]][5] = 5
empList[[2]][5] = "Kamala"

cat("After modified the list\n")
print(empList)

Output

Before modifying the list

$ID

[1] 1 2 3 4

$Names

[1] "Debi" "Sandeep" "Subham" "Shiba"

$`Total Staff`

[1] 4

After modified the list

$ID

[1] 1 2 3 4 5

$Names

[1] "Debi" "Sandeep" "Subham" ...

Concatenation of lists

Two R lists can be concatenated using the concatenation function c(), which joins the components of
both lists into a single list.

Syntax:
list = c(list, list1)
list = the original list
list1 = the new list

Example:


# R program to concatenate
# two lists

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4

empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)

cat("Before concatenation of the new list\n")
print(empList)

# Creating another list
empAge = list("Age" = c(34, 23, 18, 45))

# Concatenating the two lists using c()
empList = c(empList, empAge)

cat("After concatenation of the new list\n")
print(empList)

Output

Before concatenation of the new list
$ID
[1] 1 2 3 4

$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"

$`Total Staff`
[1] 4

After concatenation of the new list
$ID
[1] 1 2 3 4

$Names
[1] "Debi" "Sandeep" "Subham" "Shiba"

$`Total Staff`
[1] 4

$Age
[1] 34 23 18 45

Adding Item to List

To add an item to the end of a list, we can use the append() function. append() returns a new object instead of modifying the original, so assign the result back.


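# creating a list
my_numbers = list(1, 5, 6, 3)

# append() returns a new list rather than modifying
# my_numbers in place, so assign the result back
my_numbers = append(my_numbers, 45)

# printing the list, flattened to a vector for compact output
print(unlist(my_numbers))

Output

[1]  1  5  6  3 45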

Deleting Components of a List

To delete components of an R list, access them with a negative index, that is, put a minus sign before
the index of each component to be removed. The result is a new list without those components; assign
it back if you want to keep the change.

Example:

# R program to delete
# components of a list

# Creating a list by naming all its components
empId = c(1, 2, 3, 4)
empName = c("Debi", "Sandeep", "Subham", "Shiba")
numberOfEmp = 4

empList = list(
  "ID" = empId,
  "Names" = empName,
  "Total Staff" = numberOfEmp
)

cat("Before deletion the list is\n")
print(empList)

# Deleting a top-level component
cat("After Deleting Total staff components\n")
print(empList[-3])

# Deleting an inner-level component
cat("After Deleting sandeep from name\n")
print(empList[[2]][-2])

Output

Before deletion the list is

$ID

[1] 1 2 3 4

$Names

[1] "Debi" "Sandeep" "Subham" "Shiba"


$`Total Staff`

[1] 4

After Deleting Total staff components

$ID

[1] 1 2 3 4

$Names

[1] "Debi" "Sand...
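The example above only prints lists with components removed; empList itself is unchanged. To make a deletion stick, either assign the negatively indexed result back or assign NULL to the component. A minimal sketch of both approaches:

```r
empList <- list(
  "ID" = c(1, 2, 3, 4),
  "Names" = c("Debi", "Sandeep", "Subham", "Shiba"),
  "Total Staff" = 4
)

# Negative indexing returns a new list; assign it back to keep the change
empList <- empList[-3]

# Assigning NULL to a component removes it from the list
empList$ID <- NULL

# Removing an element inside a component works as it does for vectors
empList$Names <- empList$Names[-2]

print(empList)
```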

Merging list

We can merge R lists by combining them into a single list with c().


# Create two lists.

lst1 <- list(1,2,3)

lst2 <- list("Sun","Mon","Tue")

# Merge the two lists.

new_list <- c(lst1,lst2)

# Print the merged list.

print(new_list)

Output:

[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[4]]
[1] "Sun"
[[5]]
[1] "Mon"
[[6]]
[1] "Tue"

Converting List to Vector

To convert an R list to a vector, use the unlist() function. Here we first create a list and then unlist
it into a vector.


# Create lists.

lst <- list(1:5)

print(lst)

# Convert the lists to vectors.

vec <- unlist(lst)

print(vec)

Output

[[1]]

[1] 1 2 3 4 5

[1] 1 2 3 4 5
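Since a vector holds a single type, unlist() coerces mixed components to a common type, and it keeps component names as element names unless told otherwise. A short sketch with a hypothetical mixed list:

```r
mixed <- list(a = 1, b = "two", c = TRUE)

# Everything is coerced to character, the most general type present;
# the component names become the names of the vector elements
vec <- unlist(mixed)
print(vec)
print(class(vec))  # "character"

# use.names = FALSE drops the names
plain <- unlist(list(x = 1, y = 2), use.names = FALSE)
print(plain)       # 1 2
```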

R List to matrix

To convert a list to a matrix, we first use the unlist() function to flatten the list into a vector, and
then reshape that vector with the matrix() function.


# Defining list

lst1 <- list(list(1, 2, 3),

list(4, 5, 6))

# Print list
cat("The list is:\n")

print(lst1)

cat("Class:", class(lst1), "\n")

# Convert list to matrix

mat <- matrix(unlist(lst1), nrow = 2, byrow = TRUE)

# Print matrix

cat("\nAfter conversion to matrix:\n")

print(mat)

cat("Class:", class(mat), "\n")

Output

The list is:

[[1]]

[[1]][[1]]

[1] 1

[[1]][[2]]

[1] 2

[[1]][[3]]

[1] 3

[[2]]

[[2]][[1]]

[1] 4

[[2]][[2]]

[1] 5
[[2]][[3]]

[1] 6

Class: list

After conversion to matrix:

[,1] [,2] [,3]

[1,...
