Introduction To R: Pavan Kumar A

R is a statistical programming language widely used for data analysis and visualization. It provides many built-in statistical functions and graphical capabilities, and contributed packages extend it further. Key features of R include data handling, statistical analyses and tests, graphical representation of data, and programmability for advanced applications through functions and packages.

What is R?

 The R statistical programming language is a free, open-source package based on the S language (developed at Bell Labs).
 R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand.
 The language is powerful enough for writing complete programs, not just one-off commands.
 Many statistical functions are already built in.
 Contributed packages extend its functionality to cutting-edge research.
BASIC FEATURES OF R
 R is a data analysis and data visualization tool.
 Visualization takes the form of charts, plots and graphs.
 It supports a wide range of graphical and statistical techniques.
 Several GUI editors are available for R, of which RGui and RStudio are the most commonly used.
 Common characteristics of R
 Effective and powerful data handling
 Array and matrix operations
 Graphical representations of the analysis
Basic Features of R – Statistical Features
 R provides various statistical and graphical techniques, such as
 Linear and non-linear modeling,
 Classical statistical tests,
 Time-series analysis,
 Classification, clustering, etc.
 R comes with various predefined packages, and users can install additional ones.
 Base R generates static graphs. To generate dynamic and interactive graphics, users have to install additional packages.
BASIC FEATURES OF R - PROGRAMMING FEATURES
 R supports the following:
 Basic Math operations
 Vector Operations
 Matrix Operations
 Some other data structures like data frames and lists.

 It can be used together with other programming languages such as Python, Perl, Ruby and Julia, and on platforms such as Hadoop and Spark.
Basic Features of R - Packages

 A package is a collection of functions and datasets.
 To access the contents of a package, you first have to install it (if it is not built in) and then load it.
 R provides 2 types of packages
 Standard packages (built in), which are part of the R source code
 Contributed packages (contributed by users)
 CRAN (Comprehensive R Archive Network) – the main collection of R packages.
 These packages are widely used in finance, genetics, HPC, machine learning, medical imaging, social sciences and spatial statistics.
BASIC FEATURES OF R – GRAPHICAL USER INTERFACE
 Some popular text editors and Integrated Development Environments (IDEs)
that support R programming are
 ConTEXT
 Eclipse
 Emacs (Emacs Speaks Statistics)
 Vim editor
 jEdit
 RStudio
 WinEdt
GETTING STARTED
 Where to get R?
 Go to www.r-project.org
 Downloads: CRAN (The Comprehensive R Archive Network)
 Set your mirror: any of the mirror sites can be selected.
GETTING STARTED
 Opening a script.
 This gives you a script window.
Getting Started
 Basic assignment and operations.
 Arithmetic Operations:
 +, -, *, /, ^ are the standard arithmetic operators.

 Matrix Arithmetic.
 * is element-wise multiplication
 %*% is matrix multiplication
 Assignment
 To assign a value to a variable use "<-" or "="
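A minimal sketch of these operators, using a small 2 x 2 matrix built with matrix() (the values are chosen only for illustration). matrix() fills its values column by column, which the comments take into account:

```r
# Standard arithmetic operators and the two assignment forms
a <- 7 + 3      # assignment with <-
b = 2 ^ 3       # assignment with = also works; ^ is exponentiation
a / b           # 1.25

# A 2 x 2 matrix filled column-wise: its rows are (1, 3) and (2, 4)
A <- matrix(1:4, nrow = 2)

# * multiplies element by element: rows become (1, 9) and (4, 16)
A * A

# %*% is true matrix multiplication: rows become (7, 15) and (10, 22)
A %*% A
```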
Getting Started
 How to use help in R?
 R has a very good built-in help system.
 If you know which function you want help with, simply use ?_______ with the function name in the blank.
 Ex: ?hist
 If you don't know which function to use, then use help.search("_______").
 Ex: help.search("histogram")
Packages
 Packages are collections of
 R functions,
 Data and
 compiled code in a well-defined format.
 The directory where packages are stored is called the library.

 To access and use a package, it has to be loaded first.
Packages
 R comes with a standard set of packages. Others are available for download and installation. Once installed, they have to be loaded into the session to be used.
 To install or add new R packages
 install.packages("package_name")
 To load the package
 library(package_name)
 To list all packages available in your library
 library()
 To see installed packages on R
 installed.packages()
 You can create your own package
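The install-then-load steps above are often combined into one check. A minimal sketch, assuming the package name is held in a variable (pkg is illustrative; "stats" is used only because it ships with R, so nothing is downloaded and install.packages() is never reached):

```r
# Install a package only if it is missing, then load it
pkg <- "stats"
if (!(pkg %in% rownames(installed.packages()))) {
  install.packages(pkg)
}
library(pkg, character.only = TRUE)  # character.only lets library() take a variable

# Loaded (attached) packages appear on the search path
"package:stats" %in% search()        # TRUE
```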
CRAN
 CRAN, the Comprehensive R Archive Network, hosts many packages used in domains such as
 Genetics, Bioinformatics
 Finance
 HPC (High Performance Computing)
 Machine Learning
 Medical Imaging
 Big data
R CONSOLE
 After installing R on a Linux machine, just type R on the command line.
 When the R console opens, it shows some basic information about R, such as the R version, release date and licensing.
R CONSOLE
 Notice the ">" symbol at the start of the console line.
 This is called the R prompt: type a command after it and press the ENTER key to execute it.
 To get more information about the console, go to Help->Console.
DEVELOPING A SIMPLE PROGRAM
 Sample program for printing
 Here, we are using the print() function to display "Hello World" on the R console
> print("Hello World")
[1] "Hello World"
Here, we are doing simple math
> 2+3
[1] 5
 Code begins with the ">" prompt and output begins with [1], the index of the first element printed on that line.
QUITTING R
 You can quit an active session of R by entering q() command
 After executing the q() command, R asks whether to save the workspace image.
HANDLING BASIC EXPRESSIONS
 Anything you type at the R console is executed as soon as you press the ENTER key.
 Basic Arithmetic in R
>12+45+9-7
[1] 59
R executes the expression in the following order
12+45+9=66
66-7=59
Let's look at a more complex mathematical expression
>18+23/2-5/4*3.5
[1] 25.125
HANDLING BASIC EXPRESSIONS
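 To evaluate such expressions, R follows the usual order of operations, often remembered as BODMAS (Brackets, Orders, Division/Multiplication, Addition/Subtraction); operators with the same precedence are evaluated from left to right.
>18+22/2-4/4*3.5
[1] 25.5
>(18+22/2-4/4)*3.5
[1] 98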
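The evaluation order can be checked directly in R. A sketch that repeats the two expressions above and adds the power operator, which binds tighter than * and / and groups from right to left:

```r
# Division and multiplication before addition and subtraction
x1 <- 18 + 22/2 - 4/4*3.5     # 18 + 11 - (1 * 3.5)
x2 <- (18 + 22/2 - 4/4)*3.5   # (18 + 11 - 1) * 3.5 = 28 * 3.5
x1   # 25.5
x2   # 98

# Powers come first and group right to left
2^3^2    # 2^(3^2) = 512
-2^2     # -(2^2) = -4
```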
HANDLING BASIC EXPRESSIONS
 Mathematical Operators in R
 +, -, *, /, () - Simple mathematical operations
 pi - The constant π (3.141593...)
 X^Y - X raised to the power Y
 sqrt(x) - Square root of x
 abs(x) - Absolute value of x
 factorial(x) - Factorial of x
 log(x) - Natural logarithm of x (use log(x, base) for other bases)
 cos(x), sin(x), tan(x) - Trigonometric functions (x in radians)
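A quick sketch calling each of the functions above (the argument values are arbitrary):

```r
pi                     # 3.141593
2^10                   # 1024
sqrt(16)               # 4
abs(-7.5)              # 7.5
factorial(5)           # 120
log(exp(2))            # 2: log() is the natural logarithm by default
log(100, base = 10)    # 2
log10(1000)            # 3
cos(0); sin(pi / 2)    # 1 and 1 (angles in radians)
```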
DECLARING VARIABLES IN R
 Variables are symbols used to store values.
 There are two common ways to assign values
 Using the "=" symbol
>MyVar=10
 Using the "<-" symbol
>MyVar<-10
Here, MyVar is an object and it is assigned the value 10.
Either form can be used to assign values.
VARIABLE TYPES IN R
 Numbers
 Real numbers
 R organizes numbers in 3 formats
 Scalar: represents a single number (0-dimensional)
 Vector: represents a row of numbers (1-dimensional)
 Matrix: represents a table-like layout (2-dimensional)
 Working with Vectors
 A vector is an ordered collection of numbers or strings
 Numeric vector
 String/character vector
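In R a single number is simply a vector of length 1, so the three formats can be told apart with length() and dim(). A sketch with illustrative values:

```r
s <- 42                        # scalar: really a vector of length 1
v <- c(4, 8, 15)               # vector: one dimension
m <- matrix(1:6, nrow = 2)     # matrix: 2 rows x 3 columns

length(s)    # 1
length(v)    # 3
dim(m)       # 2 3

ch <- c("red", "green")        # a character (string) vector
```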
VARIABLE TYPES IN R
 Numeric Vector:
 Vector of numeric values.
 A scalar number is the simplest numeric vector.
 Example:
1.5
## [1] 1.5
 To store it for future use,
X<-1.5
VARIABLE TYPES IN R - VECTORS
 Constructing the numeric and character vectors in R
 numeric() creates a zero vector of the given length
 c() combines values into a vector (numeric, character, etc.)
 c(10,20,20,30,40)
It is a numeric (double) vector
 c("Hello2", 20, "Hello4", 30)
 Mixing numbers and strings coerces every element to character, so this is a character vector: "Hello2" "20" "Hello4" "30"
 c("Hello1", "Hello2", "Hello3")
 It is a character vector
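The following sketch checks these points with class(); note that mixing numbers and strings in c() turns every element into a string, and that whole numbers are only stored as integers when written with an L suffix:

```r
numeric(4)                      # 0 0 0 0: a zero vector of length 4

v <- c(10, 20, 20, 30, 40)
class(v)                        # "numeric" (stored as double)
class(c(10L, 20L))              # "integer": the L suffix makes integers

mixed <- c("Hello2", 20, "Hello4", 30)
mixed                           # "Hello2" "20" "Hello4" "30"
class(mixed)                    # "character": the numbers became strings
```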
VARIABLE TYPES IN R - VECTORS
 Constructing the numeric and character vectors in R
 We can also combine a mixture of single-element and multi-element vectors; the result is a single flat vector containing all the elements.
 Example
c(1, 2, c(3, 4, 5))
[1] 1 2 3 4 5
VARIABLE TYPES IN R - VECTORS
Creating the vector using the (:) operator
> 1:15 ## generates the numbers from 1 to 15
> c(1:15)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> sum(1:15) ## sums the numbers from 1 to 15
[1] 120
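The colon operator only steps by 1; seq() is the base R function for other step sizes. A sketch:

```r
1:15                           # same values as seq(1, 15)
all(1:15 == seq(1, 15))        # TRUE
15:1                           # the colon operator can also count down
seq(2, 10, by = 2)             # 2 4 6 8 10
sum(1:15) == 15 * 16 / 2       # TRUE: the sum of 1..n is n(n+1)/2
```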
VARIABLE TYPES IN R
 Strings (characters)
 A string is specified using quotes; both single and double quotes work
a <- "hello" ## Assigning a string to variable a
a ## Printing variable a
[1] "hello" ## Output of variable a
b <- c("hello","there") ## Assigning two strings to variable b
b ## Printing variable b
[1] "hello" "there" ## Output of variable b
b[1] ## Printing first element of variable b
[1] "hello" ## Output of variable b[1]
VARIABLE TYPES IN R
 Factors
 Another important way R can store data is in the form of factors
 Factors store categorical data, for example Yes/No, Male/Female or A/B/C/D
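A minimal sketch of a factor built from Yes/No answers (the data are made up for illustration):

```r
answers <- factor(c("Yes", "No", "Yes", "Yes"))

levels(answers)     # "No" "Yes": levels are sorted alphabetically by default
nlevels(answers)    # 2
table(answers)      # counts per level: No 1, Yes 3
```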
VARIABLE TYPES IN R
 Logical Vectors
 In contrast to numeric vectors, a logical vector stores a group of TRUE
or FALSE values.
 The simplest logical vectors are TRUE and FALSE themselves
 A more usual way to obtain a logical vector is to ask logical questions about
R objects.
 For example, we can ask R whether 1 is greater than 2:
1>2
## [1] FALSE
c(1, 2) > 2
## [1] FALSE FALSE
VARIABLE TYPES IN R
 Logical Vectors
 Examples
c(1, 2) > c(2, 1)
## [1] FALSE TRUE
Evaluated as c(1 > 2, 2 > 1)
c(2, 3) > c(1, 2, -1, 3)
## [1] TRUE TRUE TRUE FALSE
Evaluated as c(2 > 1, 3 > 2, 2 > -1, 3 > 3), because the shorter vector c(2, 3) is recycled
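Two follow-ups on logical vectors: recycling, which explains the second example, and counting with sum(), any() and all(), which works because TRUE counts as 1 and FALSE as 0. A sketch with an illustrative vector x:

```r
# The shorter vector is reused: c(2, 3) acts like c(2, 3, 2, 3)
c(2, 3) > c(1, 2, -1, 3)    # TRUE TRUE TRUE FALSE

x <- c(5, 12, 7, 20)
x > 6                       # FALSE TRUE TRUE TRUE
sum(x > 6)                  # 3 elements are greater than 6
any(x > 15)                 # TRUE
all(x > 6)                  # FALSE
```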
VARIABLE TYPES IN R
 Named Vectors
 It is a vector with names corresponding to the elements.
 We can give names to a vector when we create it
x <- c(a = 1, b = 2, c = 3)
x
## a b c
## 1 2 3
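Names can be read with names() and also assigned after a vector is created. A short sketch (the vector y and its names are illustrative):

```r
x <- c(a = 1, b = 2, c = 3)
names(x)                         # "a" "b" "c"

# Add (or replace) names on an existing vector
y <- c(10, 20, 30)
names(y) <- c("low", "mid", "high")
y["mid"]                         # mid: 20
```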
EXTRACTING AN ELEMENT
 While [ ] creates a subset of a vector, [[]] extracts an element from a vector.
 Example: A vector is like ten boxes of candy, [] gets you three boxes of candy,
but [[]] opens a box and gets you a candy from it.
 For simple vectors, using [] and [[]] to get one element will produce the same
result.
x <- c(a = 1, b = 2, c = 3)
x["a"]
## a
## 1
x[["a"]]
## [1] 1
EXTRACTING AN ELEMENT
 Example: Extract elements which are greater than certain value
input <- c(21, 44, 69, 9, 12, 16, 19, 224, 261, 300)
input > 220
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE
input[input > 220]
[1] 224 261 300
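Besides logical conditions, [ ] accepts positions and negative positions (which drop elements). A short sketch reusing the input vector from the slide; which() is a base R function that returns the positions of the TRUE values:

```r
input <- c(21, 44, 69, 9, 12, 16, 19, 224, 261, 300)

input[2]                          # 44: R counts positions from 1
input[c(1, 3)]                    # 21 69
input[-1]                         # every element except the first
which(input > 220)                # 8 9 10
input[input > 20 & input < 100]   # 21 44 69: conditions combined with &
```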
VARIABLE TYPES IN R – DATA FRAMES
 Data Frames
 A data frame is a collection of equal-length vectors (its columns), possibly of different types, stored in a single variable
> a<-c(1,2,3,4)
> b<-c(2,4,6,8)
> levels <- factor(c("A","B","A","B"))
> MyDataFrame<-data.frame(a, b, levels)
> MyDataFrame
  a b levels
1 1 2      A
2 2 4      B
3 3 6      A
4 4 8      B
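Columns and rows of a data frame can then be accessed with $ and [row, column]. A sketch that rebuilds MyDataFrame so it runs on its own:

```r
a <- c(1, 2, 3, 4)
b <- c(2, 4, 6, 8)
levels <- factor(c("A", "B", "A", "B"))
MyDataFrame <- data.frame(a, b, levels)

MyDataFrame$b                              # one column as a vector: 2 4 6 8
nrow(MyDataFrame)                          # 4
ncol(MyDataFrame)                          # 3
MyDataFrame[2, "a"]                        # row 2, column "a": 2
MyDataFrame[MyDataFrame$levels == "A", ]   # only the rows where levels is "A"
```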
TELLING THE CLASS OF VECTORS
 Sometimes we need to tell which kind of vector we are dealing with before
taking an action.
 The class() function tells us the class of any R object:
class(c(1, 2, 3))
## [1] "numeric"
class(c(TRUE, TRUE, FALSE))
## [1] "logical"
class(c("Hello", "World"))
## [1] "character"
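class() works on any object, not only on the three vector types shown. A few more examples with base R objects:

```r
class(1:3)                        # "integer"
class(factor(c("Yes", "No")))     # "factor"
class(data.frame(a = 1:2))        # "data.frame"
class(list(1, "a", TRUE))         # "list"
```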
TELLING THE CLASS OF VECTORS
 If we need to ensure that an object is indeed a vector of a specific class, we
can use is.numeric, is.logical, is.character, and some other
functions with similar names:

is.numeric(c(1, 2, 3))
## [1] TRUE
is.numeric(c(TRUE, TRUE, FALSE))
## [1] FALSE
is.numeric(c("Hello", "World"))
## [1] FALSE
CONVERTING VECTORS
 Different classes of vectors can be coerced to a specific class of vector.
 For example, some data are string representations of numbers, such as 1 and 20.
 We need to convert them to a numeric representation in order to apply numeric functions.
strings <- c("1", "2", "3")
class(strings)
## [1] "character"
strings + 10
## Error in strings + 10: non-numeric argument to binary operator
numbers <- as.numeric(strings)
numbers
## [1] 1 2 3
class(numbers)
## [1] "numeric"
numbers + 10
## [1] 11 12 13
CONVERTING VECTORS
 Values that cannot be converted become NA (with a warning); when numbers are converted to logical, 0 becomes FALSE and any other number becomes TRUE.
as.numeric(c("1", "2", "3", "a"))
## Warning: NAs introduced by coercion
## [1]  1  2  3 NA
as.logical(c(-1, 0, 1, 2))
## [1]  TRUE FALSE  TRUE  TRUE
as.character(c(1, 2, 3))
## [1] "1" "2" "3"
as.character(c(TRUE, FALSE))
## [1] "TRUE"  "FALSE"
CALLING FUNCTIONS IN R
 R has many predefined functions.
 To call one, type its name followed by its arguments in parentheses
 For example
> sum(10,20,30)
[1] 60
> rep("Hello",3)
[1] "Hello" "Hello" "Hello"
> sqrt(100)
[1] 10
> substr("example",2,4)
[1] "xam"
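Arguments can also be passed by name, which makes calls easier to read and means their order no longer matters. A sketch using the same functions plus round(); times, each, digits, x, start and stop are the real argument names of these base functions:

```r
rep("Hello", times = 3)                      # same as rep("Hello", 3)
rep(c(1, 2), each = 2)                       # 1 1 2 2
round(pi, digits = 2)                        # 3.14
substr(x = "example", start = 2, stop = 4)   # "xam"
```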
CREATING AND USING OBJECTS
R uses objects to store the results of a computation
> myobj<-25+12/2-16+(7*pi/2)   # assigns a mathematical expression to an object called myobj
> myobj                        # invokes (prints) the myobj object
[1] 25.99557
 R is case sensitive – that is, it treats data15 and Data15 as completely different objects.
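A quick check of case sensitivity (the object names come from the text; the values are arbitrary):

```r
data15 <- 1
Data15 <- 2
data15 == Data15     # FALSE: two separate objects
exists("DATA15")     # FALSE, unless an object with exactly that name was created
```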
CREATING AND USING OBJECTS
 An object can be assigned a set of numbers, as for example:
> x12 <- c(10,6,8)
> x12
[1] 10 6 8

 Operations can then be performed on the whole set of numbers.
 For example, for the object x12 created above, check the results of the following:
> x12 * 10
[1] 100 60 80
READING DATASETS
 Using the c() command:
 The c() function combines (concatenates) two or more values, for example two numeric vectors.
 Syntax: c(...), where ... are the values or vectors to combine
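A minimal sketch of concatenating two numeric vectors with c() (the vector names are illustrative):

```r
first  <- c(1, 2, 3)
second <- c(10, 20)

combined <- c(first, second)
combined             # 1 2 3 10 20
length(combined)     # 5
```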
HANDLING DATA IN R WORKSPACE
 Handling Workspace includes following
 Using the working directory
 Inspecting the working environment
 Modifying global options
 Managing the library of packages
HANDLING DATA IN R WORKSPACE
 Using the working directory
 The directory in which R is running is called the working directory of the R session.
 When you access other files on your hard drive, you can use either absolute paths (for example, D:\Workspaces\test-project\data\2015.csv) or paths relative to the working directory (for example, data\2015.csv when the working directory is D:\Workspaces\test-project).
 In an R terminal, you can get the current working directory of the running R session using getwd()
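getwd() has a companion, setwd(), for changing the working directory, and file.path() builds relative paths. A sketch that switches to R's temporary directory and back (the data/2015.csv name is only illustrative and does not need to exist):

```r
old <- getwd()             # remember the current working directory
setwd(tempdir())           # switch to R's per-session temporary directory
getwd()                    # now reports the temporary directory

# A relative path is resolved against the working directory
rel <- file.path("data", "2015.csv")
rel                        # "data/2015.csv"

setwd(old)                 # restore the original working directory
```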
INSPECTING THE ENVIRONMENT
 In R, every expression is evaluated within a specific environment.
 An environment is a collection of symbols and their bindings.

 If you type commands in the RStudio console, your commands are evaluated in the Global Environment.
 Example:
 If we run x <- c(1, 2, 3), the numeric vector c(1, 2, 3) is bound to the symbol x in the global environment.
 The global environment then has a binding that maps x to the numeric vector c(1, 2, 3)
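An environment can also be created and inspected explicitly. A sketch using new.env(), assign() and get() to show the same kind of binding that x <- c(1, 2, 3) makes in the global environment (the environment e is illustrative):

```r
e <- new.env()                       # a fresh, empty environment
assign("x", c(1, 2, 3), envir = e)   # bind the symbol x to a numeric vector in e
ls(e)                                # "x"
get("x", envir = e)                  # 1 2 3

# At the console, bindings are created in the global environment
environmentName(globalenv())         # "R_GlobalEnv"
```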
HANDLING DATA IN R WORKSPACE
The ls() or objects() function returns the list of objects in the workspace
> ls()
[1] "a" "b" "bubba" "fun" "levels" "msg"
[7] "myobj" "n" "x12" "yourname"
The rm() function removes variables that are no longer needed in a session
> rm(a)
> ls()
[1] "b" "bubba" "fun" "levels" "msg" "myobj" "n"
[8] "x12" "yourname"
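rm() can also remove several objects at once through its list argument. A sketch with illustrative object names tmp_a and tmp_b, assuming it runs at the top level so that ls() looks at the global environment:

```r
tmp_a <- 1
tmp_b <- 2
ls(pattern = "^tmp_")            # "tmp_a" "tmp_b"
rm(list = c("tmp_a", "tmp_b"))   # remove both objects by name
exists("tmp_a")                  # FALSE
```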
HANDLING DATA IN R WORKSPACE
getwd() function: displays the current working directory of the user
> getwd()
[1] "/home/bioinfo/pavank/rstudio-0.99.489/bin"

save() function: saves objects created in the active session.
> save(x12, file="x12.rda")
 This saves x12 in the current working directory in a file named "x12.rda"
 You can also save the entire workspace image with save.image()
HANDLING DATA IN R WORKSPACE
load() function: retrieves saved data
> yourname<-"mary"
> ls()
[1] "b" "fun" "levels" "msg" "myobj" "n" "x12" "yourname"
> save(yourname, file="yourname.rda")
> rm(yourname)
> ls()
[1] "b" "fun" "levels" "msg" "myobj" "n" "x12"
> load("yourname.rda")
> ls()
[1] "b" "fun" "levels" "msg" "myobj" "n" "x12" "yourname"
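The same save, rm and load round trip can be sketched without touching the working directory by writing to a file under tempdir() (the file location is an assumption made for the demo):

```r
yourname <- "mary"
f <- file.path(tempdir(), "yourname.rda")   # temporary location for the demo

save(yourname, file = f)   # write the object to disk
rm(yourname)               # remove it from the workspace
exists("yourname")         # FALSE
load(f)                    # restore it under its original name
yourname                   # "mary"
```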
Executing R Scripts
 Creating and executing an R script on Windows:
 Open Notepad and write R commands
 Save the file as "filename.R"
 In RGui, choose File -> Open script. This opens a window for browsing to the R script
 Click Open
EXECUTING R SCRIPTS
 Creating and executing an R script on Linux:
 An R script is a series of commands saved in a file with the .R extension
 To run the script "/home/bioinfo/pavank/R/use1.R", you may either use:
 From the R shell
source("/home/bioinfo/pavank/R/use1.R")
 On the Linux shell
R CMD BATCH /home/bioinfo/pavank/R/use1.R (OR)
Rscript use1.R
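The source() route can be tried end to end by writing a small script first. A sketch, assuming a temporary file stands in for use1.R (its two lines of content are made up for illustration):

```r
# Write a tiny script to a temporary file
script <- file.path(tempdir(), "use1.R")
writeLines(c("total <- sum(1:15)", "print(total)"), script)

source(script)   # runs every line of the script; prints [1] 120
total            # 120: objects created by the script remain in the workspace
```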
ACCESSING HELP AND DOCUMENTATION IN R
 The functions used to get the help pages of built-in functions are help() and example()
> help(ls) or ?ls
> example(ls) – shows the examples for the ls function
 Sample datasets: R has many built-in datasets
> data()
USING BUILT-IN DATASETS IN R
 There are many built-in datasets, which can be listed with the data() command.
>data() ##Generates the list of built-in datasets
Using Built-in Datasets in R
 To list the datasets in all installed packages, including contributed packages, use:
data(package = .packages(all.available = TRUE))
 To list the datasets of a single package, for example boot:
data(package='boot')
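Once listed, a built-in dataset can be used directly by name. A minimal sketch with mtcars and iris, two datasets from the datasets package that R attaches by default:

```r
head(mtcars, 3)                # first 3 rows of the mtcars data frame
dim(mtcars)                    # 32 11
dim(iris)                      # 150 5
summary(iris$Sepal.Length)     # min, quartiles, mean and max of one column
```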
THANK YOU !!!
