Lab 01

This document introduces the R programming language and its use for statistical analysis and data science. It covers downloading and installing R and RStudio, and it explains variables, comments, basic operations and data types in R. It also discusses key features of R, such as its support for statistics, graphics and probability distributions, and advantages such as its open-source nature.

Uploaded by

Ahmad Khan
Copyright
© All Rights Reserved

Lab 01

Probability and Statistics

The focus of this lab is to introduce you to R and the R Commander (a graphical user interface to
R). To use R to analyze data, you will need to become familiar with the technical components of
this software package. This lab will help familiarize you with the R software, including how to
access data files, what its base components are, how to define new variables and how to enter
data.
Introduction to R
R is an open-source programming language that is widely used as statistical software and as a data
analysis tool. R generally comes with a command-line interface. R is available on widely used
platforms such as Windows, Linux and macOS, and it continues to be actively developed.
Defining and Downloading R
The R system for statistical computing is an environment for data analysis and graphics. The root
of R is the S language, developed by John Chambers and colleagues (Becker et al., 1988; Chambers
and Hastie, 1992; Chambers, 1998) at Bell Laboratories (then part of AT&T, later of Lucent
Technologies) starting in the 1970s.
The S language was designed and developed as a programming language for data analysis tasks,
but in its current implementations it is in fact a full-featured programming language. The
development of the R system for statistical computing is heavily influenced by the open source
idea: all scientists, including those working in developing countries, have access to
state-of-the-art tools for statistical data analysis at no additional cost. With the help of the R
system for statistical computing, research becomes truly reproducible when both the data and the
results of all data analysis steps reported in a paper are available to readers through an R
transcript file.
R is widely used for teaching undergraduate and graduate statistics classes at universities all
over the world because students can use its statistical computing tools for free. The base
distribution of R is maintained by a small group of statisticians, the R Development Core Team. A
huge amount of additional functionality is implemented in add-on packages authored and maintained
by a large group of volunteers. The main source of information about the R system is the World
Wide Web, and the official home page of the R project is http://www.R-project.org. R can also be
downloaded from CRAN mirrors such as https://mirror.las.iastate.edu/CRAN/.
Why the R Programming Language?

• R is a leading tool for machine learning, statistics and data analysis, and users can easily
create their own objects, functions and packages in R.
• R is platform-independent: it runs on all major operating systems, including Windows, macOS
and Linux.
• R is free and open source, so anyone can install it in any organization without purchasing a
license.
• R is not only a statistics package; it can also be integrated with other languages such as C
and C++, and it can interact with many data sources and statistical packages.
• R has a large and growing community of users.
• R is currently one of the most requested programming languages in the data science job
market.

Features of R Programming Language


Statistical Features of R:

• Basic statistics: The most common basic statistics are the mean, mode and median. These are
all known as “measures of central tendency”, and R can compute them easily.
• Static graphics: R has rich facilities for creating static graphics, with functionality for
many plot types, including maps, mosaic plots, biplots and many more.
• Probability distributions: Probability distributions play a vital role in statistics, and R
handles many of them easily, such as the binomial, normal and chi-squared distributions.
• Data analysis: R provides a large, coherent and integrated collection of tools for data
analysis.
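As a minimal sketch of the basic statistics and probability distribution points, the code below computes the mean and median with base R's mean() and median(), and evaluates a few distributions with the d/p/q functions from the stats package. Base R has no built-in function for the statistical mode (the function mode() returns an object's storage mode instead), so stat_mode() is a hypothetical helper written only for this illustration, and the data vector x is made up.

```r
# Made-up sample data for illustration
x <- c(2, 3, 3, 5, 7, 3, 8)

# Measures of central tendency
mean(x)    # arithmetic mean: 31 / 7
median(x)  # middle value of the sorted data: 3

# Hypothetical helper: base R has no statistical-mode function,
# so count each distinct value and return the most frequent one(s)
stat_mode <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}
stat_mode(x)  # 3

# Probability distributions:
# d* = probability mass/density, p* = cumulative probability, q* = quantile
dbinom(2, size = 4, prob = 0.5)  # P(X = 2) for X ~ Binomial(4, 0.5): 0.375
pnorm(0)                         # P(Z <= 0) for a standard normal Z: 0.5
qchisq(0.95, df = 1)             # 95th percentile of chi-squared(1), about 3.84
```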

Advantages of R:
• R is one of the most comprehensive statistical analysis packages, and new statistical
techniques and concepts often appear first in R.
• Because R is open source, you can run it anywhere and at any time.
• R is cross-platform: it runs on Windows, macOS and Linux.
• In R, everyone is welcome to provide new packages, bug fixes and code enhancements.
Disadvantages of R:
• The quality of some R packages is less than perfect.
• R commands give little consideration to memory management, and R keeps the objects it works
with in memory, so it may consume all available memory.
• For some tasks, R can be much slower than other programming languages such as Python and
MATLAB.
Applications of R:
1. R is used for data science. It offers a broad variety of libraries related to statistics and
provides an environment for statistical computing and design.
2. Many quantitative analysts use R as their programming tool, for example for importing and
cleaning data.
3. R is widely used by data analysts and research programmers, and it is a fundamental tool in
finance.
4. Companies such as Google, Facebook, Bing, Twitter, Accenture and Wipro use R.
R and Python both play a major role in data science, and newcomers often find it confusing to
choose the more suitable of the two.

Installing R
The R system for statistical computing consists of two major parts: the base system and a collection
of user-contributed add-on packages. The R language is implemented in the base system.
Implementations of statistical and graphical procedures are separated from the base system and are
organized in the form of packages. A package is a collection of functions, examples and
documentation. The functionality of a package is often focused on a special statistical
methodology. Both the base system and packages are distributed via the Comprehensive R Archive
Network (CRAN), accessible at http://CRAN.R-project.org.
The first step is to download and install R, the programming language on which RStudio is based.
After installing R, proceed to the next step, which is to download RStudio. RStudio is an
integrated development environment (IDE) for the programming language R. In RStudio, users can
write and run R code, create interactive graphs and charts, and visualise data.
Link to download RStudio (for Windows 10 and Windows 11): https://posit.co/download/rstudio-des...
Installing RStudio is a simple process that should not take long to finish.

One can change the appearance of the prompt by typing options(prompt = "R> ") at the > prompt,
and we will use the prompt R> for the display of some code examples in this manual. Essentially,
the R system evaluates commands typed at the R prompt and returns the results of the computations.
The end of a command is indicated by the return key. Virtually all introductory texts on R start
with an example using R as a pocket calculator, and so do we:
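The text above mentions using R as a pocket calculator. A minimal sketch: change the prompt, evaluate an arithmetic expression, and then restore the default prompt. The variable name x is arbitrary, and the prompt change is only visible in an interactive session.

```r
# Change the console prompt from ">" to "R> " (visible in interactive sessions)
options(prompt = "R> ")

# Use R as a pocket calculator: the square root of 25, plus 2
x <- sqrt(25) + 2
x  # 7

# Restore the default prompt
options(prompt = "> ")
```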
R Print Output
print("Hello World!")

At the top level of the console, R prints the value of each expression automatically. Inside a for
loop, however, nothing is printed automatically, so you must call the print() function explicitly:
for (x in 1:10) {
print(x)
}
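To see why print() is needed, the sketch below compares a loop that merely evaluates x with one that calls print(x). It uses base R's capture.output() to collect whatever each loop writes to the console; the names silent and loud are arbitrary.

```r
# Without print(): the loop evaluates x but displays nothing
silent <- capture.output(for (x in 1:3) x)

# With print(): each value is written to the console
loud <- capture.output(for (x in 1:3) print(x))

length(silent)  # 0 lines of output
loud            # "[1] 1" "[1] 2" "[1] 3"
```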

Comments

Comments can be used to explain R code and to make it more readable. They can also be used to
prevent execution when testing alternative code.

Comments start with a #. When executing code, R ignores everything from the # to the end of the line.

# This is a comment
"Hello World!"

"Hello World!" # This is a comment


Comments do not have to be text that explains the code; they can also be used to prevent R from
executing the code:
# "Good morning!"
"Good night!"

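R has no block-comment syntax, so to disable several lines at once you either put # in front of each line or, as a common workaround, wrap them in if (FALSE) { }. R still parses the wrapped code, so it must be valid R syntax, but it never runs it. A minimal sketch; the variable names morning and night are arbitrary:

```r
# Disable a line by putting # at the start of it
# "Good morning!"

# Workaround for several lines: code inside if (FALSE) { } is parsed
# but never executed, so it must still be valid R syntax
if (FALSE) {
  morning <- "Good morning!"
  print(morning)
}

night <- "Good night!"
night
exists("morning")  # FALSE: the assignment inside if (FALSE) never ran
```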
Creating Variables in R
Variables are containers for storing data values.
R does not have a command for declaring a variable. A variable is created the moment you first
assign a value to it. To assign a value to a variable, use the assignment operator <-. To output
(or print) the variable's value, just type the variable name:

name <- "Ali"


age <- 40

name # output "Ali"


age # output 40
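Besides <-, R also accepts = and the rightward operator -> for assignment, and assigning to an existing variable replaces its value. The sketch below uses all three; <- remains the conventional choice in R code, and next_age is an arbitrary name chosen for this example.

```r
# Leftward assignment: the conventional style in R
name <- "Ali"

# = also assigns at the top level
age = 40

# Rightward assignment: the value comes first, the variable last
age + 1 -> next_age

# Reassigning replaces the old value
age <- 41

name      # "Ali"
next_age  # 41
age       # 41
```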
Concatenate Elements

You can also concatenate, or join, two or more elements by using the paste() function.

To combine text and a variable, separate them with a comma (,) inside paste():

text <- "awesome"

paste("R is", text)

You can also use a comma to join one variable to another:

text1 <- "R is"


text2 <- "awesome"

paste(text1, text2)
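By default, paste() separates its arguments with a single space. Its sep argument changes the separator, and the related base function paste0() joins with no separator at all. Both are vectorised, as the last line shows. A short sketch:

```r
text1 <- "R is"
text2 <- "awesome"

paste(text1, text2)              # "R is awesome": default sep = " "
paste(text1, text2, sep = ", ")  # "R is, awesome"
paste0(text1, text2)             # "R isawesome": no separator
paste0("lab", 1:3)               # vectorised: "lab1" "lab2" "lab3"
```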

Multiple Variables

R allows you to assign the same value to multiple variables in one line:

# Assign the same value to multiple variables in one line


var1 <- var2 <- var3 <- "Orange"

# Print variable values


var1
var2
var3
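Although the three variables start with the same value, each one holds its own copy, so changing one of them later does not affect the others. A minimal sketch:

```r
var1 <- var2 <- var3 <- "Orange"

# Reassign only var1
var1 <- "Apple"

var1  # "Apple"
var2  # still "Orange"
var3  # still "Orange"
```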

Variable Names
A variable can have a short name (like x and y) or a more descriptive name (age, carname,
total_volume). Rules for R variables are:

• A variable name must start with a letter or a period (.) and can be a combination of letters,
digits, periods (.) and underscores (_). If it starts with a period, it cannot be followed by a digit.
• A variable name cannot start with a number or an underscore (_).
• Variable names are case-sensitive (age, Age and AGE are three different variables).
• Reserved words cannot be used as variable names (TRUE, FALSE, NULL, if, ...).

# Legal variable names:


myvar <- "Saim"
my_var <- "Saim"
myVar <- "Saim"
MYVAR <- "Saim"
myvar2 <- "Saim"
.myvar <- "Saim"

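# Illegal variable names (commented out, because each line would cause an error):

# 2myvar <- "Saim"    # starts with a digit
# my-var <- "Saim"    # "-" is read as the minus operator
# my var <- "Saim"    # spaces are not allowed
# _my_var <- "Saim"   # starts with an underscore
# my_v@ar <- "Saim"   # "@" is not allowed in names
# TRUE <- "Saim"      # TRUE is a reserved word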
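Base R's make.names() converts a string into a syntactically valid name, so comparing a name with its converted form is one way to check the rules above. is_valid_name() is a hypothetical helper written for this sketch, not a built-in function.

```r
# Hypothetical helper: a name is syntactically valid if make.names()
# leaves it unchanged (make.names() prepends "X" when the start is invalid,
# replaces invalid characters with ".", and appends "." to reserved words)
is_valid_name <- function(x) make.names(x) == x

candidates <- c("myvar", "my_var", ".myvar", "myvar2",
                "2myvar", "my-var", "my var", "_my_var", "TRUE")
is_valid_name(candidates)  # TRUE for the first four, FALSE for the rest

make.names("my-var")  # "my.var"
make.names("2myvar")  # "X2myvar"
```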

R Data Types
Let’s now explore what R can do. R is really just a big fancy calculator. For example, type in the
following mathematical expression next to the > in the R console (left window)

1+1
## [1] 2

Note that spacing does not matter: 1+1 will generate the same answer as 1 + 1. Can you
say hello to the world?

hello world
## Error: <text>:1:7: unexpected symbol
## 1: hello world
## ^

Nope. What is the problem here? We need to put quotes around it.

"hello world"
## [1] "hello world"

“hello world” is a character string, and R recognizes characters only if there are quotes around
them. This brings us to the topic of basic data types in R.
In this lab we use four basic data types in R: character, logical, numeric and factor (strictly
speaking, a factor is stored as an integer vector with labels). There are two other basic types,
complex and raw, but we won’t cover them because they are rarely used.
Characters
Characters are used to represent words or letters in R. We saw this above with “hello world”.
Character values are also known as strings. You might think that the value "1" is a number.
Well, with quotes around, it isn’t! Anything with quotes will be interpreted as a character. No ifs,
ands or buts about it.
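A quick way to see the difference between "1" and 1 is base R's class() function, which reports an object's class. The sketch below also uses as.numeric() to convert a string that looks like a number before doing arithmetic with it.

```r
class("1")            # "character": quotes make it a string
class(1)              # "numeric"
class("hello world")  # "character"

# "1" + 1 would fail with "non-numeric argument to binary operator",
# so convert the string to a number first
as.numeric("1") + 1   # 2
```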
Numeric
Numerics are separated into two types: integer and double. The distinction between integers and
doubles is usually not important. R treats numerics as doubles by default because it is a less
restrictive data type. You can do any mathematical operation on numeric values. We added one
and one above. We can also multiply using the * operator

2*3
## [1] 6

Divide

4/2
## [1] 2

And even take the logarithm!

log(1)
## [1] 0

log(0)
## [1] -Inf

Uh oh. What is -Inf? The logarithm of 0 is undefined: as x approaches 0 from above, log(x)
decreases without bound, so R returns -Inf, a special numeric value that represents negative infinity.
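The integer/double distinction and the special value -Inf can both be checked with base R's typeof(), is.numeric() and is.infinite(). Appending L to a whole number creates an integer. A short sketch:

```r
typeof(1)       # "double": numbers are doubles by default
typeof(1L)      # "integer": the L suffix creates an integer
is.numeric(1L)  # TRUE: both integers and doubles are numeric

# -Inf is still a numeric value; it represents negative infinity
is.numeric(log(0))   # TRUE
is.infinite(log(0))  # TRUE
```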

Addition
a <- c(1, 0.1)
b <- c(2.33, 4)
print(a + b)   # [1] 3.33 4.10

Subtraction
a <- 6
b <- 8.4
print(a - b)   # [1] -2.4

Multiplication
x <- c(4, 4)
y <- c(5, 5)
print(x * y)   # [1] 20 20

Division
a <- 10
b <- 5
print(a / b)   # [1] 2

Modulo Operation
x <- c(2, 22)
y <- c(2, 4)
print(x %% y)  # [1] 0 2: remainders of 2 / 2 and 22 / 4

# R program to illustrate
# the use of arithmetic operators
vec1 <- c(0, 2)
vec2 <- c(2, 3)

# Performing operations on operands
cat("Addition of vectors :", vec1 + vec2, "\n")
cat("Subtraction of vectors :", vec1 - vec2, "\n")
cat("Multiplication of vectors :", vec1 * vec2, "\n")
cat("Division of vectors :", vec1 / vec2, "\n")
cat("Modulo of vectors :", vec1 %% vec2, "\n")
cat("Power operator :", vec1 ^ vec2, "\n")
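The arithmetic operators work element by element on vectors of equal length. When one vector is shorter, R recycles it (and warns if the longer length is not a multiple of the shorter one). The sketch below checks the results of the vector example and also shows integer division with %/%, which pairs with %%.

```r
vec1 <- c(0, 2)
vec2 <- c(2, 3)

vec1 / vec2   # 0.0000000 0.6666667
vec1 %% vec2  # 0 2: remainders
vec1 ^ vec2   # 0 8

# Recycling: the shorter vector c(10, 20) is reused as c(10, 20, 10, 20)
c(1, 2, 3, 4) + c(10, 20)  # 11 22 13 24

# Integer division and remainder together: 7 = 2 * 3 + 1
7 %/% 2  # 3
7 %% 2   # 1
```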
Tasks:
Question 1: Practice
R> 2+3
R> 14/6
R> 14/6+5
R> 14/(6+5)
R> 3^2
R> 2^3

Question 2:
Question 3:
