Introduction To R

R is a programming language and software environment for statistical analysis, graphics, and reporting. It provides tools for data manipulation, analysis, and visualization. R has a simple programming language and is widely used in academia and industry, including companies like Google, Bank of America, and ANZ Bank. RStudio is a popular integrated development environment for R that provides a graphical user interface and tools to help develop R code and projects.

Introduction to R

What is covered…
• Basic introduction of R
• Features of R
• Programming features of R
• Exploring RGUI and RStudio
• Basic concepts in R
• Working with R environment
• Handling data in R workspace.
• Reading datasets and exporting data from R
BIG DATA WITH 5 V’S
BIG DATA AND HADOOP
• Big data is processed with the tools and techniques of Hadoop.
• Hadoop is an open-source, Java-based programming framework.
• Hadoop supports the processing and storage of extremely large data sets.
• Processing uses a framework called MapReduce.
• Storage uses a distributed file system called HDFS.
• The stored data still needs analysis and visualization.
• Both are possible with R.
• R is a programming language and software environment for statistical analysis, graphical representation and reporting.
• R was written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand.
• The name R is derived from the first letter of its authors' first names.
• R made its first appearance in 1993.
• R is developed by the R Development Core Team (R Core Team), whose members can modify the R source code archive.
• R is freely available under the GNU General Public License.
• Pre-compiled binary versions are provided for various operating systems.
• R can be compiled and run on a variety of platforms, including Linux, Windows and macOS.
• R is a data analysis and visualization tool.
• It allows you to represent complex data in the form of charts, plots and graphs.
• It provides a variety of statistical and graphical techniques for data analysis and visualization.
• RGUI and RStudio are the commonly used GUI editors for the R language.
• To handle R projects, developers use a command-line interface and several graphical front-ends.
Exploring features of R
• R is a programming language and software environment for statistical analysis, graphical representation and reporting.
• R is a well-developed language.
• R is a simple and effective programming language that includes conditionals, loops, functions, and input and output facilities.
• R has an effective and powerful data handling and storage facility.
Exploring features of R
• R provides a suite of operators for calculations on
arrays, lists, vectors and matrices.
• R provides a large, coherent and integrated collection
of tools for data analysis.
• R provides data analysis and graphical facilities.
• R is extended through various predefined and user-
defined functions and packages.

• R has strong support for object-oriented programming.


Programming features of R
• R is an interpreted language.
• It uses a command-line interpreter to execute commands.
• R supports matrix arithmetic.
• R includes data structures such as scalars, vectors, matrices, data frames and lists.
• R provides an extensible object system.
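The data structures and matrix arithmetic listed above can be sketched as follows. This is a minimal illustration only; every variable name and value is made up for the example.

```r
# A scalar is simply a vector of length one.
s <- 5

# Vectors hold an ordered collection of values of one type.
v <- c(1, 2, 3)

# Matrices are two-dimensional; %*% is matrix multiplication.
m <- matrix(1:4, nrow = 2)   # filled column by column
i2 <- diag(2)                # 2 x 2 identity matrix
prod_m <- m %*% i2           # multiplying by the identity gives m back

# Element-wise arithmetic uses the ordinary operators.
doubled <- m * 2

# A data frame stores columns of equal length but possibly different types.
df <- data.frame(id = 1:3, name = c("a", "b", "c"))

# A list can mix objects of any kind.
lst <- list(number = s, vector = v, table = df)
```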


Programming features of R
• R supports procedural programming with functions.
• R supports object-oriented programming with generic functions.
• R can be used together with scripting languages such as Python, Perl and Ruby.
• R development is supported by several text editors and integrated development environments (IDEs).
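As a rough sketch of object-oriented programming with generic functions, the example below uses R's S3 system. The "student" class, the describe() generic and the field names are hypothetical and exist only for this illustration.

```r
# A constructor for a hypothetical "student" class (S3).
new_student <- function(name, roll) {
  structure(list(name = name, roll = roll), class = "student")
}

# describe() is a generic: it dispatches on the class of its argument.
describe <- function(x, ...) UseMethod("describe")

describe.default <- function(x, ...) "an R object"
describe.student <- function(x, ...) {
  paste("student", x$name, "with roll number", x$roll)
}

s <- new_student("Amar", 20)
describe(s)    # uses describe.student
describe(42)   # falls back to describe.default
```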
Three things in one single tool:
Data Manipulation:
• R allows the data scientist to shape the data set into a format suited to analysis.
Data Analysis:
• Almost any kind of statistical data analysis can be done in R. More than 4000 packages implement various statistical analysis tools.
Data Visualization:
• R is "the package" for data visualization.
WHY IS R IMPORTANT FOR DATA SCIENCE?
• You can run your code without a compiler:
-R is an interpreted language, so code runs without a separate compilation step.
-R interprets the code directly, which makes development easier.
• Many calculations are done with vectors:
-R is a vectorized language, so a function can be applied to a whole vector without writing a loop.
-Vectorized operations are powerful and often faster than equivalent explicit loops.
• Statistical language:
-R is used in biology and genetics as well as in statistics.
-R is a Turing-complete language, so any computable task can be expressed in it.
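The point about working on a whole vector without a loop can be illustrated with a small sketch. The data values are arbitrary.

```r
x <- c(2, 4, 6, 8)

# Loop version: square each element one at a time.
squares_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squares_loop[i] <- x[i]^2
}

# Vectorized version: the operator is applied to every element at once.
squares_vec <- x^2

# Both approaches give the same result.
identical(squares_loop, squares_vec)
```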
WHY IS R GOOD FOR BUSINESS?
R is good for business:
• It is open source.
• R is great for visualization.
• As per new research, R has far more capabilities than earlier tools.
• For data-driven businesses, the shortage of data science talent is a very big problem.
• Companies use R as their platform and recruit trained R users.
GROWTH AND SALARY IN R
• Average salaries of people with R skills, by role:
-Data Scientist: Rs 805,656
-Data Analyst: Rs 495,199
-Data Scientist, IT: Rs 723,362
-Analyst Manager: Rs 1,552,127
-Senior Data Analyst: Rs 734,851
-Business Analyst: Rs 580,006
-Analyst Consultant: Rs 785,051


WHO USES R?
• The Consumer Financial Protection Bureau uses R for data analysis.
• Statisticians at John Deere use R for time series modelling and geospatial analysis in a reliable and reproducible way.
• Bank of America uses R for reporting.
• ANZ, the fourth largest bank in Australia, uses R for credit risk analysis.
• Google uses R to predict economic activity.
• Mozilla (Firefox web browser) uses R to visualize Web activity.
WHERE IS R USED?
Packages
• A package is a collection of functions and datasets.

• R provides two types of packages:


• Standard packages:
-Base packages, in-built packages
-It contains basic functions and datasets of R.

• Contributed packages:
-User-defined packages.
-Packages written by various users.
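A minimal sketch of working with both kinds of packages is shown below. The stats and datasets packages ship with R; "ggplot2" is used only as an example of a contributed package name, and its installation is left commented out because it needs network access.

```r
# Standard (base) packages ship with every R installation.
library(stats)      # already attached by default; loading again is harmless
library(datasets)   # example datasets such as mtcars

nrow(mtcars)        # number of rows in a built-in dataset

# Contributed packages are installed once from a repository such as CRAN
# and then loaded in each session. "ggplot2" is only an example name.
# install.packages("ggplot2")
# library(ggplot2)
```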
GUI-Graphical User Interface
• The R language has several editors and front-ends.
• RGUI
• RStudio
• Deducer
• RKWard
• RWeka
• Eclipse StatET
• Emacs Speaks Statistics
• jEdit
Common GUI for R

• RGUI
• The GUI bundled with the pre-compiled version of R for Microsoft Windows.

• RStudio
• A cross-platform, open-source IDE for R development.
Exploring RGUI

• RGUI- R Graphical User Interface

• RGUI consists of
-R Console
-Development of program
-Quitting R
Exploring RStudio
• RStudio is a code editor and development environment.
• It provides integrated tools to develop productive R programs.
• Download and install RStudio from http://www.rstudio.org
Exploring RStudio
The RStudio window contains the following panes:
• Script pane: top left corner
• Console pane: bottom left corner
• Workspace/History pane: top right corner
• Files, Plots, Packages and Help pane: bottom right corner
-Files: browse folders and files
-Plots: display the user's plots
-Packages: view all installed packages
-Help: browse the built-in help system
R STUDIO
Handling Basic expressions in R
• The symbol ">" is the R prompt.
• Characters after # are treated as a comment.
• To view information about handling the console through the keyboard, select Help → Console.
• Basic data types.
• Mathematical operators.
• Perform basic arithmetic in R.
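The operators and data types mentioned above can be tried directly at the prompt. The sketch below lists R's arithmetic operators and uses class() to inspect a few basic types; the numbers are arbitrary.

```r
# Arithmetic operators
7 + 2     # addition
7 - 2     # subtraction
7 * 2     # multiplication
7 / 2     # division
7 %/% 2   # integer division
7 %% 2    # remainder (modulo)
7 ^ 2     # exponentiation

# Basic data types, inspected with class()
class(3.14)      # "numeric"
class(5L)        # "integer"
class("text")    # "character"
class(TRUE)      # "logical"
class(2+3i)      # "complex"
```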
Variables in R
• Variables are symbols that are used to contain and
store values.

• Two common ways to assign a value to a variable in R:
• Using the "=" symbol
>Num=20
>a=20
• Using the "<-" symbol
>Num<-20
>a<-20
Calling functions in R
• To invoke a predefined function in R, type its name on the R console.
• Pass comma-separated arguments within parentheses.
• Examples of predefined functions in R:
>sum(10,12,13)
[1] 35
>rep("Hello", 5)
[1] "Hello" "Hello" "Hello" "Hello" "Hello"
>sqrt(100)
[1] 10
Working with Vectors
• A vector can be defined as a single entity consisting of an ordered collection of values.
• A numerical vector consists of multiple numbers, like an array.
• A vector can also hold logical or character values; all elements share one type.
• Lists (generic vectors) can store objects or collections of objects.
• The size of a vector can easily be increased or decreased.


Working with vector
• Constructing a vector in R.
• Using the function c() to create a vector.
>c(10,20,30,40,50)
[1] 10 20 30 40 50
• Combining text values
>hw<-c("Hello", "World")
>hw
[1] "Hello" "World"
• Using the ':' operator between a range of numbers.
>1:6
[1] 1 2 3 4 5 6
• Calculation with a vector
>sum(1:6)
[1] 21
Storing & calculating values in R
• R allows you to store intermediate results for use in other calculations.
• Store values in R:
• Assign values to variables
>x=1:6
>x
[1] 1 2 3 4 5 6
>y=30
• Addition of variables in R
>z=x+y
>z
[1] 31 32 33 34 35 36
• Combining text values in R
>hw<-c("Hello", "World")
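Adding a single number to a vector works because R recycles the shorter operand across the longer one. The sketch below repeats the addition above and adds one further recycling case; the vector c(10, 20) is an arbitrary choice for illustration.

```r
x <- 1:6
y <- 30

# y has length one, so it is recycled across every element of x.
z <- x + y

# A longer vector is recycled too: 10, 20, 10, 20, 10, 20.
w <- x + c(10, 20)

# Most arithmetic and math functions work element by element.
x * 2
sqrt(x)
mean(z)
```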
Creating and using objects
• R uses objects to store the results of computations.
• Object creation in R:
object.name = mathematical.expression
>myobj=20
>myobj
[1] 20
>myobj=25+12/2-16
>myobj
[1] 15
Interacting with users
• We can write an R script to interact with users.
• The readline() function asks the user a question and returns the answer.
• The paste() function concatenates the text values saved in variables.
>msg="Welcome"
>yourname=readline("What is your name? ")
What is your name? CSE
>paste(msg, yourname)
[1] "Welcome CSE"
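Because readline() waits for keyboard input, the sketch below assigns the name directly so that it also runs non-interactively. It shows how paste() inserts a separator and how paste0(), a related base R function not covered in the slides, joins text without one.

```r
msg <- "Welcome"
yourname <- "CSE"   # stands in for the value readline() would return

greeting <- paste(msg, yourname)               # default separator is a single space
custom <- paste(msg, yourname, sep = ", ")     # custom separator
shout <- paste0(msg, "!")                      # paste0() joins without any separator

greeting
custom
shout
```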
Handling data in R workspace
• R working environment includes variables, functions,
vectors, matrices, data frames and lists.

• R uses various functions to handle data in the R workspace.

• Functions in R:
-The ls() function
-The rm() function
-The getwd() function
-The save() function
-The load() function
The ls() function
• The ls() function lists all variables created in the currently active R workspace.
>msg="Hello"
>myobj=25+12/2-16
>hw=c("Hello", "World")
>ls()
[1] "hw"    "msg"   "myobj"
The rm() function
• The rm() function removes variables that are no longer required in a session.
>msg="Hello"
>myobj=25+12/2-16
>hw=c("Hello", "World")
>ls()
[1] "hw"    "msg"   "myobj"
>rm(hw)
>ls()
[1] "msg"   "myobj"
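The same ls() and rm() calls can be tried without touching the real workspace. The sketch below works inside a separate environment, which is an assumption made only to keep the example self-contained; `ws` is a made-up name.

```r
# Work inside a separate environment so the real workspace is untouched.
ws <- new.env()
assign("msg", "Hello", envir = ws)
assign("myobj", 25 + 12 / 2 - 16, envir = ws)
assign("hw", c("Hello", "World"), envir = ws)

before <- ls(ws)                # names are listed alphabetically
before

rm("hw", envir = ws)            # remove one variable by name
exists("hw", envir = ws, inherits = FALSE)

rm(list = ls(ws), envir = ws)   # remove everything that is left
length(ls(ws))
```

The `rm(list = ls())` form, without the `envir` argument, is the usual way to clear the whole workspace in one step.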
The getwd() function
• getwd()- Get working directory

• The getwd() function displays the user's current working directory.
>getwd()
[1] "D:/Myfiles"
The save() function
• save(): save created variables.
• The save() function saves variables created in the active session.
• The values of the variables are saved as well.
>msg="Hello"
>save(msg, file="name.rda")
• To check the saved file in RGUI:
File → Display file(s)
The load() function
• load(): retrieve saved data.
• The load() function retrieves data saved with save().
• Create the variable msg, save it, remove it, then load it back:
>msg="Hello"
>save(msg, file="msg.rda")
>msg
[1] "Hello"
>rm(msg)
>ls()
character(0)
>load("msg.rda")
>ls()
[1] "msg"
>msg
[1] "Hello"
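The save and load round trip can be sketched as a short script. A temporary file created with tempfile() replaces the fixed file name here, which is an assumption made so the example leaves nothing behind.

```r
msg <- "Hello"
path <- tempfile(fileext = ".rda")   # temporary file instead of a fixed name

save(msg, file = path)
rm(msg)
exists("msg", inherits = FALSE)      # the variable is gone

restored <- load(path)               # load() returns the names it restored
restored
msg
unlink(path)                         # delete the temporary file
```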
Executing Scripts
• Create a script and execute it.
• Two ways:
• Create the script using the R Editor directly in RGUI and execute it.
-In RGUI, open the R Editor: select File → New script
• Create the script in Notepad, open it in the R Editor in RGUI and execute it.
-Open Notepad and write the script.
-Save it as script.txt
-Open the RGUI tool → File → Open script
-To execute all commands listed in script.txt, select Edit → Run all
script.txt

print("Hello World")
p<-1:5
p
q<-20
r<-p+q
r
ls()
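A script file can also be run from the console with the base R function source(), which the slides do not cover. The sketch below writes a similar script to a temporary file (standing in for script.txt) and runs it; print() is used inside the script because source() does not echo bare values by default.

```r
# Write the example script to a temporary file (stands in for script.txt).
script <- tempfile(fileext = ".R")
writeLines(c(
  'print("Hello World")',
  "p <- 1:5",
  "q <- 20",
  "r <- p + q",
  "print(r)"
), script)

# source() runs every command in the file, much like Edit -> Run all.
source(script)
unlink(script)
```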
Reading & writing data from R
• Analysis cannot be performed without data.
• Big data analytics includes creating complex samples, examining lengthy sets of data and putting data into R for computation.
• Reading data: importing data from another source into R.
• Writing data: moving processed data out of R, that is, exporting data from R to another destination.
Reading & writing data from R
R commands to import data into R:
• Using c() command
• Using scan() command
• Using read.csv()
• Using read.table()

R commands to export data from R:


• Using write.csv()
• Using write.table()
Using c() command
• The c() command is used to combine or concatenate two or more values.
• Syntax for the c() command:
>c(item1, item2, item3, item4)
• Assign combined values to a variable:
>items=c(item1, item2, item3, item4)
• The c() command is convenient for entering the values of a small dataset.
• Data can be numeric as well as text.
Reading & combining Numeric data
• Numerical values are passed within the parentheses of the c() command.
• Multiple values are separated with commas.
>c(10,11,12,13,14,15,16,17,18,19,20)
>items= c(11,12,13,14,15,16,17,18,19,20)
>items
>sample=c(21,22,23,24,25,26,27,28,29,30)
>sample
>set=c(items, 31,32,33,34,35)
>set
>dataset=c(items, sample, set, 36,37,38,39,40)
>dataset
Reading & combining text data
• Text values are passed in quotes within the parentheses of the c() command.
• Multiple text values are separated with commas.
• We can use single or double quotes for text values.
• R displays all text values in double quotes.
• Syntax for the c() command:
>c("item1", "item2", "item3", "item4")
>c('item1', 'item2', 'item3', 'item4')
• Assign combined values to a variable:
>textdata= c("item1", "item2", "item3", "item4")
>c("Amar", "BE", "CSE", "Rollno-20", "Per-78")
>data=c("Amar", "BE", "CSE", "Rollno-20", "Per-78")
>data
>mydata=c(data, "Ram", "TE", "CSE", "Rollno-22", "Per-80")
>mydata
• Combining both numerical and text values in R (the numbers are converted to text)
>combine=c(items, data)
>combine
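When numbers and text are combined with c(), the result is a single character vector, because a vector holds only one type. The sketch below illustrates this with made-up values and shows a list as the alternative that keeps each type.

```r
nums <- c(11, 12, 13)
text <- c("Amar", "BE")

mixed <- c(nums, text)
class(nums)    # "numeric"
class(mixed)   # "character": the numbers were converted to text
mixed

# A list keeps each value's own type instead of converting.
kept <- list(11, "Amar", TRUE)
sapply(kept, class)
```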
Using scan() command
• The c() command uses commas to separate values.
• The scan() command reads values separated by spaces or new lines, without commas.
• The scan() command is used to enter numeric or text data into a dataset.
• Called with empty parentheses, scan() reads from the console.
• After executing it, you are prompted to enter the desired data; a blank line ends the input.
• Syntax:
>scan() #Asks the user to enter data.
Reading numerical values using scan() command
>scan()
12
13
14
15
16

Read 5 items
[1] 12 13 14 15 16

• Read and assign values to a variable:
>emp=scan()
>emp
Reading text data using scan() command
• Use the scan() command to enter text into a dataset.
• A plain scan() expects numbers, so entering text items, even in quotes, generates an error.
• Syntax:
>scan(what = 'character')
• There is no need to put the text data in quotes.
>names=scan(what='character')
SE
TE
BE
>names
Using clipboard to create data
• If data in a text file is separated by simple spaces, it can be copied and pasted at the scan() prompt to read it into R.
• If data is separated by some other character, like a comma (,) or a period (.), then specify the separator in R.
• Use sep=','
• Use sep='.'
>empdata=scan()
10
20
30
>empdata=scan(sep = ',')
Reading data of file from Disk
• To read data from a file on disk using the scan() command, add file='filename'
>readdata=scan(file= 'sample.txt')
• To get the current working directory
>getwd()
• To change the working directory to the one that contains the file and list its contents
>setwd('E:/')
>getwd()
>dir()
>list.files()
>dir('E:/DA')
• To pick the file interactively
>scan(file.choose())
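The scan() variants above can be tried without typing at the prompt by using the function's `text` argument. The sketch below assumes made-up input values and uses a temporary file in place of sample.txt; `quiet = TRUE` only suppresses the "Read n items" message.

```r
# Whitespace-separated numbers, supplied as text instead of typed at the prompt.
nums <- scan(text = "12 13 14 15 16", quiet = TRUE)

# Comma-separated values need sep.
vals <- scan(text = "10,20,30", sep = ",", quiet = TRUE)

# Text items need what = "character".
years <- scan(text = "SE TE BE", what = "character", quiet = TRUE)

# Reading from a file works the same way; a temporary file stands in for sample.txt.
path <- tempfile(fileext = ".txt")
writeLines("1 2 3", path)
from_file <- scan(file = path, quiet = TRUE)
unlink(path)
```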
Using read.csv() command

• The read.csv() command is used for:
-Reading multiple data values from large files.
-Reading large amounts of data from complicated files containing multiple items.
• The syntax:
>read.csv(file)
• Reads the entire CSV file; if the result is not assigned to a variable, the data is displayed on the console.
Using read.csv() command
• Use various arguments with the read.csv() command:
-file: specifies the file name.
-sep: provides the separator.
-header: set to TRUE to use the first row of the CSV file as column names.
-row.names: specifies the row names for the data. Set row.names=n, where n is the number of the column that holds the row names.
>read.csv(file.choose(), header=TRUE, row.names=1)


Using read.table() command
• The read.table() command is used to read plain-text data that is in the form of a table.
>read.table(file.choose(), header=TRUE, row.names=1)
>var=read.table(file.choose(), header=TRUE, row.names=1, sep="\t")
>var
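A minimal sketch of read.csv() and read.table() with the header and row.names arguments follows. Since file.choose() needs an interactive session, the example first writes a small file to a temporary location; the column names and values are made up.

```r
# Build a small CSV file to read; the column names and values are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,class,percent",
  "Amar,BE,78",
  "Ram,TE,80"
), csv_path)

# header = TRUE takes column names from the first row;
# row.names = 1 uses the first column as row names.
students <- read.csv(csv_path, header = TRUE, row.names = 1)
students
students["Ram", "percent"]

# read.table() reads the same file when told the separator.
same <- read.table(csv_path, header = TRUE, row.names = 1, sep = ",")
unlink(csv_path)
```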
Exporting data from R
• Writing data: moving processed data out of R, that is, exporting data from R to another destination.
• R commands used to export data:
-Using write.csv()
-Using write.table()
Using write.table() command
• The write.table() command is used to write data stored in a vector, matrix or data frame into a file.
• The data is saved using a delimiter such as a space or a tab.
• By default the values are separated by a space; set sep="\t" to separate them with tabs.
• Pass the variable that holds the data, followed by the name of the text file.
• The syntax is:
>write.table(x, "D:/mydata.txt")
>para=scan()
>write.table(para, "D:/mydata.txt", sep="\t")
Using write.csv() command

• We can store comma-separated values in files.
• Write data into a CSV file using the write.csv() command.
• The syntax is:
>write.csv(x, "D:/file.txt")
>my=scan(what='character')
>write.csv(my, "E:/file.txt")
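The two export commands can be compared with a short round trip. This is a sketch under assumptions: the data frame is made up, temporary files replace the drive paths used above, and row.names = FALSE and quote = FALSE are optional arguments chosen here to keep the output files plain.

```r
scores <- data.frame(name = c("Amar", "Ram"), percent = c(78, 80))

csv_path <- tempfile(fileext = ".csv")
tab_path <- tempfile(fileext = ".txt")

# row.names = FALSE leaves out the automatic 1, 2, ... row labels.
write.csv(scores, csv_path, row.names = FALSE)
write.table(scores, tab_path, sep = "\t", row.names = FALSE, quote = FALSE)

csv_lines <- readLines(csv_path)   # inspect what was written
tab_lines <- readLines(tab_path)

back <- read.csv(csv_path)         # read the CSV file back in
unlink(c(csv_path, tab_path))
```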
PIE CHART IN R
• In R, a pie chart is created using the pie() function.
• It takes a vector of non-negative numbers as input.
pie(x, labels, radius, main, col, clockwise)
• The parameters used:
-x is a vector containing the numeric values used in the pie chart.
-labels gives descriptions to the slices.
-radius indicates the radius of the circle of the pie chart.
-main indicates the title of the chart.
-col indicates the color palette.
-clockwise is a logical value indicating whether the slices are drawn clockwise or anticlockwise.
PIE CHART EXAMPLE
# Create data for the graph.
x <- c(21, 62, 10, 53)
labels <- c("London", "New York", "Singapore", "Mumbai")
# Give the chart file a name.
png(file = "city.png")
# Plot the chart.
pie(x, labels = labels)
# Save the file.
dev.off()
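The remaining pie() parameters can be sketched as follows. The title, the rainbow() palette and the percentage labels are illustrative choices, and the chart is written to a temporary PDF file with pdf() rather than png(), an assumption made because the PDF device is available in every R installation.

```r
x <- c(21, 62, 10, 53)
cities <- c("London", "New York", "Singapore", "Mumbai")

# Show each slice's share of the total in its label.
pct <- round(100 * x / sum(x))
slice_labels <- paste0(cities, " (", pct, "%)")

out <- tempfile(fileext = ".pdf")   # temporary output file
pdf(file = out)
pie(x, labels = slice_labels, main = "City pie chart",
    col = rainbow(length(x)), clockwise = TRUE)
dev.off()
```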
BAR CHART IN R
• R uses the function barplot() to create bar charts.
• R can draw both vertical and horizontal bars in a bar chart.
barplot(H, xlab, ylab, main, names.arg, col)
• The parameters used:
-H is a vector or matrix containing the numeric values used in the bar chart.
-xlab is the label for the x axis.
-ylab is the label for the y axis.
-main is the title of the bar chart.
-names.arg is a vector of names appearing under each bar.
-col is used to give colors to the bars in the graph.
BAR CHART EXAMPLE
# Create the data for the chart
H <- c(7,12,28,3,41)
# Give the chart file a name
png(file = "barchart.png")
# Plot the bar chart
barplot(H)
# Save the file
dev.off()
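A sketch that uses the other barplot() parameters is shown below. The month names, axis labels and title are made up, `horiz = TRUE` is the barplot() argument for horizontal bars, and a temporary PDF file is used as the output device as an assumption for portability.

```r
H <- c(7, 12, 28, 3, 41)
M <- c("Mar", "Apr", "May", "Jun", "Jul")   # made-up bar labels

out <- tempfile(fileext = ".pdf")
pdf(file = out)
# barplot() returns the x positions of the bar midpoints.
mids <- barplot(H, names.arg = M, xlab = "Month", ylab = "Revenue",
                main = "Revenue chart", col = "blue")
# horiz = TRUE draws the same bars horizontally.
barplot(H, names.arg = M, horiz = TRUE, col = "blue")
dev.off()
```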
Thank You…..
