
Lec 1

The document discusses features of the R programming language including that it is used for statistical computing and graphical presentation, its common uses include analyzing and visualizing data, and it provides many statistical techniques. It also discusses how to run R in interactive and batch modes and introduces some basic functions and operations in R.

CS251: STATISTICAL FOUNDATIONS OF COMPUTER SCIENCE
Introduction

What is R?
• R is a popular programming language used for statistical computing and graphical presentation.
• Its most common use is to analyse and visualize data.
• R is a scripting language: R code is interpreted rather than compiled.
• It was inspired by, and is mostly compatible with, the statistical language S developed at AT&T.
• R was designed by Ross Ihaka and Robert Gentleman and is now developed by the R Core Team.
Why Use R?
• It is a great resource for data analysis, data visualization, data science and machine learning.
• It provides many statistical techniques (such as statistical tests, classification, clustering and data reduction).
• It is easy to draw graphs in R, such as pie charts, histograms, box plots and scatter plots.
• It works on different platforms (Windows, Mac, Linux).
• It is open-source and free.
• It has large community support.
• It has many packages (libraries of functions) that can be used to solve different problems.
R Features
• Programming language for graphics and statistical computations
• Freely available under the GNU General Public License
• Used in data mining and statistical analysis
• Includes time series analysis and linear and nonlinear modeling, among others
• Very active community and package contributions
• Very little programming language knowledge necessary
• Open source; can be downloaded from http://www.r-project.org/
Free tools of R
• RStudio
• StatET
• ESS (Emacs Speaks Statistics)
• R Commander
• JGR (Java GUI for R)
What is R used for?

• Statistical inference
• Data analysis
• Machine learning algorithms
How to Run R
• R operates in two modes: interactive and batch.
1. Interactive mode: An interactive session prompts the user for input, as data or commands. Typically, software running on the computer accepts input from a human. This is the simplest way to work on any system: you log on, run whatever commands you need, whether on the command line or in a graphical environment, and log out when you have finished.
2. Batch mode: Batch processing is the execution of a series of programs, or of a single task, without manual intervention. All data and commands are preselected through scripts or command-line parameters, so the job runs to completion without human interaction. It is called "batch processing" or "batch mode" because the input data are collected into batches of files and are processed in batches by the program. In many cases batch jobs are submitted to a job scheduler and run on the first available compute node(s).
• R code saved in a script file can be run automatically, without entering R's interactive mode, by invoking R with an operating system shell command (such as at the $ prompt commonly used in Linux systems), for example R CMD BATCH or Rscript followed by the name of the script.
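As a minimal sketch of batch mode, suppose a few commands are saved in a script file. The file name z.R and its contents are assumptions made for illustration; R CMD BATCH and Rscript are the command-line front ends that ship with R.

```r
# Contents of a hypothetical script file named z.R (the name is an assumption).
# From an operating system shell, outside R, it could be run with either of:
#   $ Rscript z.R        # output is written to the terminal
#   $ R CMD BATCH z.R    # output is written to the file z.Rout
x <- c(2, 4, 6, 8)          # some data
m <- mean(x)                # a computation that needs no user input
cat("Mean of x:", m, "\n")  # cat() prints the result in batch mode
```

The same lines could also be typed one at a time at the > prompt, which is the interactive mode described above.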
What is CRAN?
• CRAN stands for Comprehensive R Archive Network. It provides binary files for installing R: download the binary for your platform, follow the installation instructions and accept all defaults.
• Download R from http://cran.r-project.org/. After installation, the R Console window appears inside the RGui (graphical user interface).
[Figure: sample R GUI]
RStudio: RStudio is an Integrated Development Environment (IDE) for the R language with an advanced and more user-friendly GUI. It is open-source (i.e., free) and available at http://www.rstudio.com/.
The figure shows the GUI of RStudio. The RStudio screen has four windows:
1. Console.
2. Workspace and history.
3. Files, plots, packages and help.
4. The R script(s) and data view.

The R script is where you keep a record of your work. To create a new R script file, use any of the following:
1) File -> New -> R Script
2) Click on the icon with the "+" sign and select "R Script"
3) Use the shortcut Ctrl+Shift+N
• Console: The console is where you can type commands and see
output.
• Workspace tab: The workspace tab shows all the active objects.
The workspace tab stores any object, value, function or anything
you create during your R session.
• History tab: The history tab shows a list of commands used so
far. The history tab keeps a record of all previous commands. It
helps when testing and running processes. Here you can either
save the whole list or you can select the commands you want and
send them to an R script to keep track of your work.
• Files tab: The files tab shows all the files and folders in your default workspace, as if you were in a PC/Mac file window.
• Plots tab: The plots tab shows all your graphs.
• Packages tab: The packages tab lists the packages (add-ons) needed to run certain processes.
• Changing the working directory:
• To show the present working directory (wd):
> getwd()
[1] "C:/mydocuments" # the default working directory is mydocuments
• To change the working directory:
> setwd("C:/myfolder/data")
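The folder names above exist only on the author's machine. As a sketch that should run anywhere, the example below switches to R's temporary folder and back again; using tempdir() as the target is an assumption made so that the folder is guaranteed to exist.

```r
old_wd <- getwd()     # remember the current working directory
setwd(tempdir())      # switch to the per-session temporary folder
print(getwd())        # show the new working directory
setwd(old_wd)         # switch back to where we started
```

Saving the old directory first makes it easy to restore, since setwd() changes the directory for the rest of the session.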
First R program: Using R as a calculator
R commands can be run in two ways:
1) Type the command at the console and press Enter; the output appears in the console.
2) Open a new R Script file and write the command, keep the cursor on the same line and press Ctrl+Enter or click on Run. The output then appears in the console along with the command.

• At the console:
To use R as a calculator, type commands directly into the R Console. Launch R and type an expression at the console, pressing <Enter> after each command.
• R Sessions:-
• R is a case-sensitive, interpreted language. You can enter commands one at a time at the command prompt (>) or run a set of commands from a source file.
• There is a wide variety of data types, including vectors, matrices, data frames (similar to datasets), and lists (collections of objects).
• The standard assignment operator in R is <-. The = operator can also be used, but this is discouraged, as it does not work in some special situations.
• A variable can be printed without any print statement by typing its name.
> y <- 5
> y # print out y
[1] 5
• Comments (#) are especially valuable for documenting program code.
• Functions:- A function is a group of
instructions that takes inputs, uses them
to compute other values, and returns a
result.
Mathematical functions:-
• Definition of round R function: The round function
rounds a numeric input to a specified number of decimal
places.
• Definition of ceiling R function: The ceiling function
rounds a numeric input up to the next higher integer.
• Definition of floor R function: The floor function rounds
a numeric input down to the next lower integer.
• Definition of trunc R function: The trunc function
truncates (i.e. cuts off) the decimal places of a numeric
input.
• Definition of signif R function: The signif function rounds a numeric input to a specified number of significant digits.
• Note: The difference between round and signif is that round lets you specify the number of decimal places, whereas signif lets you specify the number of significant digits.
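The five functions defined above can be compared on a single value. This is a small sketch; the value 123.456 and the variable names are chosen only for illustration.

```r
x <- 123.456
r1 <- round(x, 1)    # 123.5 : one decimal place
c1 <- ceiling(x)     # 124   : next higher integer
f1 <- floor(-x)      # -124  : next lower integer (note the negative input)
t1 <- trunc(-x)      # -123  : decimal places are simply cut off
s1 <- signif(x, 2)   # 120   : two significant digits
print(c(r1, c1, f1, t1, s1))
```

The negative input shows that floor and trunc differ: floor moves down to -124, while trunc drops the decimals and gives -123.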
Statistical function:-
Basic Math:- R is a powerful tool for all manner of calculations, data manipulation and scientific computations. R can certainly be used to do basic math.
Examples:-
> 1+1
[1] 2

> 4/3
[1] 1.333333
• R follows the basic order of operations: Parentheses, Exponents, Multiplication, Division, Addition and Subtraction (PEMDAS). This means that operations inside parentheses take priority over other operations.
• Next on the priority list is exponentiation. After that, multiplication and division are performed, followed by addition and subtraction.

Example:-
> 4 * (6 + 5)
[1] 44
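A few more expressions make the order of operations concrete. This is an illustrative sketch; the variable names are arbitrary.

```r
a <- 4 * 6 + 5      # 29 : multiplication before addition
b <- 4 * (6 + 5)    # 44 : parentheses first
p <- 2 * 3^2        # 18 : exponentiation before multiplication
q <- (2 * 3)^2      # 36 : parentheses change the result
print(c(a, b, p, q))
```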
• Variables:- Variables are an integral part of any programming language. R does not require variable types to be declared. A variable can take on any available data type and can hold any R object, such as a function, the result of an analysis or a plot. A single variable can at one point hold a number, later hold a character string, and then later hold a number again.
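The following sketch shows one variable holding different kinds of objects in turn. The name v and the recorded classes are for illustration only.

```r
v <- 42                 # v holds a number
k1 <- class(v)          # "numeric"
v <- "forty-two"        # the same name now holds a character string
k2 <- class(v)          # "character"
v <- mean               # it can even hold a function
k3 <- class(v)          # "function"
v <- 42                 # and a number again
print(c(k1, k2, k3, class(v)))
```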
• Variable Assignment:- There are a number of ways to assign a value to a variable, and they do not depend on the type of value being assigned. There is no need to declare the variable first.
Example:-
• > x <- 6 # assignment operator: a less-than character (<) and a hyphen (-) with no space between them
> x
[1] 6
• > y = 3 # the assignment operator = is used
> y
[1] 3
• > z <<- 9 # assigns to a global variable rather than a local variable
> z
[1] 9
• > 5 -> fun # the rightward assignment operator (->) assigns the value on its left to the name on its right
> fun
[1] 5
• > a <- b <- 7 # the same value can be assigned to several variables at once
> a
[1] 7
> b
[1] 7
• > assign("k", 12) # the assign function can also be used
> k
[1] 12
Removing Variables:- The rm() function is used to remove variables. This frees up memory so that R can store more objects, although it does not necessarily free up memory for the operating system.
There is no "undo": once a variable is removed, it is gone.
Variable names are case sensitive.

> x <- 2*pi
> x
[1] 6.283185
> rm(x) # x variable is removed
> x
Error: object 'x' not found
> rm(a, y) # removing multiple variables
> a
Error: object 'a' not found
> y
Error: object 'y' not found
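Modifying an existing variable: A variable (column) of a data frame can be renamed with the rename() function. rename() is not part of base R; it is provided by add-on packages such as plyr and reshape, so one of them must be loaded first. For example:
mydata <- rename(mydata, c(oldname="newname"))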
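Because rename() comes from an add-on package, a base-R alternative is sketched here. It swaps in the names() function, which is not the approach shown in the lecture; the data frame mydata and its columns are made up for illustration.

```r
# A small made-up data frame with a column called "oldname"
mydata <- data.frame(oldname = c(1, 2, 3), other = c("a", "b", "c"))

# Replace the column name wherever it equals "oldname"
names(mydata)[names(mydata) == "oldname"] <- "newname"

print(names(mydata))
```

Only the label changes; the values stored in the column are untouched.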
Variable (Object) Names: Certain names are already used by R for particular purposes and are best avoided as variable names. Some of these symbols are: c q t C D F I T
### meaning of c q t C D F I T
? ## put ? before a name to see its help document
?c ## c means Combine Values into a Vector or List
?q ## q means Terminate an R Session
?t ## t means Matrix Transpose
?C ## C sets contrasts for a factor
?D ## D means Symbolic and Algorithmic Derivatives of Simple Expressions
?F ## F is a predefined variable holding the logical value FALSE
> F ## [1] FALSE
?I ## I means Inhibit Interpretation/Conversion of Objects
?T ## T is a predefined variable holding the logical value TRUE
When character strings are converted to logical values (for example with as.logical()), c("T", "TRUE", "True", "true") are regarded as true, c("F", "FALSE", "False", "false") as false, and all others as NA.
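The conversion rule for character strings can be checked with as.logical(). This is a small sketch; the input strings are chosen for illustration.

```r
s <- c("T", "true", "F", "False", "yes", "0")
l <- as.logical(s)   # strings outside the recognised spellings become NA
print(l)
```

Here "yes" and "0" are not among the recognised spellings, so both are converted to NA.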
Data Types:- There are numerous data types in R that store various kinds of data. The four main types of data are:
1) numeric,
2) character,
3) Date/POSIXct (time-based) and
4) logical (TRUE/FALSE).
The type of data contained in a variable is checked with the class function.
Example:
> x <- 8
> class(x)
[1] "numeric"
Numeric Data:- The most commonly used type of numeric data is numeric. This is similar to float or double in other languages. It handles integers and decimals, both positive and negative, and also zero.

> i <- 5L # To assign an integer to a variable, append an 'L' to the value.
> i
[1] 5
> is.integer(i) # Testing whether a variable is integer or not
[1] TRUE
> is.numeric(i)
[1] TRUE
• R promotes integers to numeric when needed.
• Multiplying an integer by a numeric results in a decimal number.
• Dividing an integer by a numeric results in a decimal number.
• Dividing an integer by an integer results in a decimal number.

> class(4L)
[1] "integer"
> class(2.8)
[1] "numeric"
> x <- (4L*2.8)
> x
[1] 11.2
> class(x)
[1] "numeric"
> k <- (5L/2L)
> k
[1] 2.5
> class(k)
[1] "numeric"
Character Data:- The character data type is used to store character strings and is widely used in statistical analysis. Here x contains the word "data" enclosed in quotes.
> x <- "data"
> x
[1] "data"
Characters are case sensitive. The nchar function can be used to find the number of characters in a string (or in a number).
> x <- "Vishnu"
> nchar(x)
[1] 6
> nchar(567)
[1] 3
> nchar("hello")
[1] 5
Dates:- R has numerous different types of dates. The most useful are Date and POSIXct. Date stores just a date, while POSIXct stores a date and time.
> date1 <- as.Date("2017-06-23")
> date1
[1] "2017-06-23"
> class(date1)
[1] "Date"
> date2 <- as.POSIXct("2017-06-23 17:42")
> date2
[1] "2017-06-23 17:42:00 IST" # the time zone shown (IST here) depends on the system settings


• Logical:- Logicals are a way of representing data that can be either TRUE or FALSE. Numerically, TRUE is the same as 1 and FALSE is the same as 0, so TRUE*5 equals 5 while FALSE*5 equals 0.

> TRUE
[1] TRUE

> TRUE*5
[1] 5
Mode vs Class:
'mode' is a mutually exclusive classification of objects according to their basic structure. The 'atomic' modes are numeric, complex, character and logical. An object has one and only one mode.
'class' is a property assigned to an object that determines how generic functions operate with it.
> x <- 1:16
> mode(x)
[1] "numeric"
> is.numeric(x)
[1] TRUE
> mode(x) <- "character"
> mode(x)
[1] "character"
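To see that mode and class can differ for the same object, the sketch below compares them for an integer vector and for a Date. The object names are chosen only for illustration.

```r
x <- 1:16
mx <- mode(x)     # "numeric" : the basic structure
cx <- class(x)    # "integer" : a more specific label

d <- as.Date("2017-06-23")
md <- mode(d)     # "numeric" : a Date is stored as a number
cd <- class(d)    # "Date"    : the class tells print() and other generics how to treat it

print(c(mx, cx, md, cd))
```

The Date case shows why class matters: the underlying data are numeric, but generic functions such as print display it as a calendar date.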
• Advanced Data Structures:- R has
a wide variety of objects for holding
data, including scalars, vectors,
matrices, arrays, data frames, and
lists. They differ in terms of the type
of data they can hold.
Vector:- Vectors must be homogeneous i.e, the type of data
in a given vector must all be the same. Vectors are one-
dimensional arrays that can hold numeric data, character
data, or logical data. The combine function c() is used to
form the vector. Here are examples of each type of vector:
> a <- c(1, 2, 5, 3, 6, -2, 4)
> a
[1] 1 2 5 3 6 -2 4
> b <- c("one", "two", "three")
> b
[1] "one" "two" "three"
> c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
> c
[1] TRUE TRUE TRUE FALSE TRUE FALSE
Here, a is numeric vector, b is a character vector, and c is a
logical vector. Note that the data in a vector must only be one
type or mode (numeric, character, or logical). You can’t mix
modes in the same vector. Following are some other
possibilities to create vectors
> x <- 1:10
> x
[1] 1 2 3 4 5 6 7 8 9 10
> y <- seq(10) # Create a sequence
> y
[1] 1 2 3 4 5 6 7 8 9 10
> z <- rep(1,10) # Create a repetitive pattern
> z
[1] 1 1 1 1 1 1 1 1 1 1

The colon operator is used to generate a sequence of numbers. For example, a <- c(2:6) is equivalent to a <- c(2, 3, 4, 5, 6).
Vectors cannot contain a mix of data types, such as numbers
and strings. If you create a vector from mixed elements, R will
try to accommodate you by converting one of them:
> v1 <- c(1,2,3)
> v3 <- c("A","B","C")
> c(v1,v3)
[1] "1" "2" "3" "A" "B" "C"

Similarly, to make a vector from 3.1415 and "foo", R converts 3.1415 to character mode so that it is compatible with "foo":
> c(3.1415, "foo")
[1] "3.1415" "foo"
> mode(c(3.1415, "foo"))
[1] "character"
• Matrices: A matrix is a two-dimensional array in which each element has the same mode (numeric, character, or logical). Matrices are created with the matrix function. The general format is
mymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns, byrow=logical_value, dimnames=list(char_vector_rownames, char_vector_colnames))
• where vector contains the elements for the matrix, nrow and ncol specify the row and column dimensions, and dimnames contains optional row and column labels stored in character vectors. The option byrow indicates whether the matrix should be filled in by row (byrow=TRUE) or by column (byrow=FALSE). The default is by column.
A matrix can be created in three ways:
• matrix(): a vector is passed as input to the matrix function.
• Using the rbind() and cbind() functions.
• Using dim() on an existing vector.
Creating a matrix using matrix():
# Create a character matrix, filled by row.
> M = matrix( c('a','a','b','c','b','a'), nrow=2, ncol=3, byrow = TRUE)
> print(M)
     [,1] [,2] [,3]
[1,] "a"  "a"  "b"
[2,] "c"  "b"  "a"
# Create a numeric matrix, filled by column (the default).
> y <- matrix(1:20, nrow=5, ncol=4)
> y
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20
Creating a matrix using rbind() or cbind(): First create two vectors and then create a matrix using rbind(), which binds the two vectors into the two rows of a matrix.
Example:
Create two vectors, xr1 and xr2:
> xr1 <- c(6, 2, 10)
> xr2 <- c(1, 3, -2)
> x <- rbind(xr1, xr2) ## binds the vectors into rows of a matrix (2x3)
> x
    [,1] [,2] [,3]
xr1    6    2   10
xr2    1    3   -2
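The heading also mentions cbind(), which works the same way but binds vectors as columns. A minimal sketch reusing the two vectors from the rbind() example (the name xc is chosen for illustration):

```r
xr1 <- c(6, 2, 10)
xr2 <- c(1, 3, -2)
xc <- cbind(xr1, xr2)   # binds the vectors into columns of a matrix (3x2)
print(xc)
print(dim(xc))          # number of rows, then number of columns
```

The result is the transpose of the matrix produced by rbind(xr1, xr2), with the vector names used as column names.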
Creating a matrix using dim(): Create a vector and then set its dimensions with the dim() function. This is especially useful if your data are already in a vector.
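A short sketch of the dim() approach, using a made-up vector of twelve values:

```r
v <- 1:12             # data already stored in a vector
dim(v) <- c(3, 4)     # give it 3 rows and 4 columns; values fill column by column
print(v)
print(is.matrix(v))   # the vector is now treated as a matrix
```

The number of elements must equal the product of the dimensions (here 3 * 4 = 12), otherwise R reports an error.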

Using matrix subscripts:- You can identify rows, columns, or elements of a matrix by using subscripts and brackets. X[i,] refers to the ith row of matrix X, X[,j] refers to the jth column, and X[i, j] refers to the element in row i and column j.
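The subscript forms can be tried on the 5x4 matrix used earlier. This is an illustrative sketch; the names r2, c3, e42 and sub are arbitrary.

```r
X <- matrix(1:20, nrow = 5, ncol = 4)
r2  <- X[2, ]                # second row: 2 7 12 17
c3  <- X[, 3]                # third column: 11 12 13 14 15
e42 <- X[4, 2]               # element in row 4, column 2: 9
sub <- X[c(1, 5), c(2, 4)]   # rows 1 and 5, columns 2 and 4
print(sub)
```

Vectors of indices, as in the last line, select several rows and columns at once and return a smaller matrix.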
Apply functions on matrices:- The apply() function instructs R to call a user-specified function on each of the rows or each of the columns of a matrix.
Using the apply() Function
This is the general form of apply for matrices:
apply(m,dimcode,f,fargs) where the arguments are:
• m is the matrix.
• dimcode is the dimension, equal to 1 if the function applies to rows or 2 for columns.
• f is the function to be applied.
• fargs is an optional set of arguments to be supplied to f.
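A minimal sketch of apply() on a small matrix. The helper count_above and its threshold argument are hypothetical, introduced only to show how fargs is passed on to f.

```r
m <- matrix(1:6, nrow = 2)            # rows: (1, 3, 5) and (2, 4, 6)

row_sums  <- apply(m, 1, sum)         # dimcode 1: apply sum to each row
col_means <- apply(m, 2, mean)        # dimcode 2: apply mean to each column

# A user-defined function with an extra argument (hypothetical helper)
count_above <- function(x, threshold) sum(x > threshold)
counts <- apply(m, 1, count_above, 3) # the 3 is passed to f as fargs

print(row_sums)    # 9 12
print(col_means)   # 1.5 3.5 5.5
print(counts)      # 1 2
```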
