R Manual

R is an open-source programming language used for statistics and graphics. It can be used along with RStudio, a popular IDE. The document provides an introduction to R including how to download and set up R and RStudio, the basic syntax and structure of R programs, different data types like vectors, matrices and data frames, functions, packages, reading external data, statistical analysis, distributions and fitting distributions to data. Key functions and concepts are explained through examples.

An Introduction to R

1. R introduction and download source


• R is an open-source programming language developed for statistics and
graphics. Data miners and statisticians use it for data manipulation,
calculation, and graphical display.

• R is usually used along with RStudio, a popular free IDE for R. The
steps you can follow to set it up on your computer are:

1. Download R from this link: https://cran.rstudio.com/bin/windows/base/

2. Download RStudio from this link:
https://rstudio.com/products/rstudio/download/
3. Familiarize yourself with the RStudio interface. You can use the following
resources:
a) Cheat sheets for a variety of purposes and use cases:
https://rstudio.com/resources/cheatsheets/
b) List of keyboard shortcuts:
https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts

2. Getting Started:
The following steps set up and run an elementary R script within RStudio.
• Set the current working directory of RStudio to the folder where you
want to store all your code and data files. This folder acts as the root of
any relative paths you supply. Do this by going to Session > Set Working
Directory > Choose Directory. (Verify the directory by running getwd()
in the console within the IDE.)

• Create a script window by selecting File > New File > R Script. From
here on, write all commands into this script file rather than directly into
the console, so that the code can be saved and reused. Some useful shortcuts are:
a. Ctrl + Enter: Run the current line or the selected fragment of code
(Ctrl + R does the same in the plain R GUI on Windows)
b. Ctrl + S/O: Save/Open a file
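
The working directory can also be checked and changed from code with getwd() and setwd(). The following is a minimal sketch, not part of the original steps; the folder name r_manual_demo and the file name first_script.R are hypothetical, and the folder is created under the session's temporary directory so nothing else on disk is touched.

```r
# Remember the current working directory so it can be restored afterwards.
old_dir <- getwd()

# Hypothetical project folder, created under the session's temporary directory.
project_dir <- file.path(tempdir(), "r_manual_demo")
dir.create(project_dir, showWarnings = FALSE)

# Make it the working directory; relative paths are now resolved against it.
setwd(project_dir)
writeLines("x <- 1", "first_script.R")
script_found <- file.exists("first_script.R")
print(script_found)

# Restore the original working directory.
setwd(old_dir)
```

Restoring the previous directory at the end keeps the sketch from affecting the rest of the session.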

3. Basic syntax
Depending on your needs, you can either type commands at the R command
prompt or write your program in an R script file. For example,
• At the R prompt, the program looks as follows:
> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"

• In an R script file, you can write the same code as:

# My first program in R Programming
myString <- "Hello, World!"
print(myString)

4. Data Types
Elements in R belong to different classes, for example,
Logical, Numeric, Integer, Complex, Character, and Raw.
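
As an illustration of the classes just listed, the sketch below stores one value of each class and asks R to report it with class(). The variable names and the particular values are chosen only for this example.

```r
# One value of each basic class; class() reports the class of an object.
values <- list(
  logical   = TRUE,
  numeric   = 23.5,
  integer   = 2L,                 # the L suffix makes an integer
  complex   = 2 + 5i,
  character = "TRUE",             # quotes make it text, not a logical
  raw       = charToRaw("Hello")
)

# Apply class() to every element and collect the results.
classes <- sapply(values, class)
print(classes)
```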

• Vectors: Use the c() function, which combines its elements into a
vector. For example,
# Create a vector.
apple <- c('red', 'green', 'yellow')
print(apple)

# Get the class of the vector.
print(class(apple))

• Lists: Can contain many different types of elements (vectors, functions,
even another list). For example,
# Create a list.
list1 <- list(c(2,5,3), 21.3, sin)

# Print the list.
print(list1)

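• Matrices: Two-dimensional data sets created with the matrix() function.
# Create a matrix.
M <- matrix(c('a','a','b','c','b','a'), nrow = 2, ncol = 3, byrow = TRUE)
print(M)

• Arrays: Can have any number of dimensions, given by the dim argument.
# Create an array. The two values are recycled to fill all 3 x 3 x 2 cells.
a <- array(c('green','yellow'), dim = c(3,3,2))
print(a)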

• Factors: Created with the factor() function. nlevels() gives the count of
levels.
# Create a vector.
apple_colors <- c('green','green','yellow','red','red','red','green')

# Create a factor object.
factor_apple <- factor(apple_colors)

# Print the factor.
print(factor_apple)
print(nlevels(factor_apple))

• Data Frames: Tabular data objects created with the data.frame() function.
# Create the data frame.
emp.data <- data.frame(
  emp_id = c(1:5),
  emp_name = c("Rick","Dan","Michelle","Ryan","Gary"),
  salary = c(623.3,515.2,611.0,729.0,843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                         "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)
# Print the data frame.
print(emp.data)

# Get the structure of the data frame.
str(emp.data)

# Print the summary.
print(summary(emp.data))
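
Once a data frame exists, its columns and rows can be picked out individually. The sketch below is an illustration only: it uses a smaller stand-in data frame named emp, and the dept column is a hypothetical field added for the example.

```r
# Small stand-in data frame (same style of columns as emp.data above).
emp <- data.frame(
  emp_id   = 1:3,
  emp_name = c("Rick", "Dan", "Michelle"),
  salary   = c(623.3, 515.2, 611.0),
  stringsAsFactors = FALSE
)

# Extract one column as a vector with $.
salaries <- emp$salary

# Select rows and columns with [rows, columns].
first_two <- emp[1:2, c("emp_name", "salary")]

# Add a new column (dept is a hypothetical field used only for illustration).
emp$dept <- c("IT", "Operations", "IT")

print(salaries)
print(first_two)
print(emp)
```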
• Functions: Created with the keyword function. The basic syntax of an R
function definition is as follows.
function_name <- function(arg_1, arg_2, ...) {
  # Function body
}

1. Built-in Function: Simple examples of built-in functions are
seq(), mean(), max(), sum(x) and paste(...).

# Create a sequence of numbers from 32 to 44.
print(seq(32,44))

2. User-defined Function
# Create a function to print squares of numbers in sequence.
new.function <- function(a) {
  for(i in 1:a) {
    b <- i^2
    print(b)
  }
}
# Call the function, supplying the argument by position.
new.function(6)
# Call the function, supplying the argument by name.
new.function(a = 9)
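
A function can also return a value instead of printing it, and an argument can be given a default so that the call works with no argument at all. The following is a minimal sketch of both ideas; the name squares_up_to is hypothetical and not part of the original examples.

```r
# Return the squares of 1..n as a vector instead of printing them.
# n has a default value, so the function can be called with no argument.
squares_up_to <- function(n = 5) {
  (seq_len(n))^2
}

default_result <- squares_up_to()        # uses the default n = 5
named_result   <- squares_up_to(n = 3)   # argument supplied by name

print(default_result)
print(named_result)
```

The value of the last expression in the body is what the function returns, so no explicit return() call is needed here.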

5. R Packages:
• Check available R packages: Get the library locations containing R
packages
.libPaths()

• Get the list of all installed packages
library()

• Get the packages currently loaded in the R environment
search()

• Install a new package directly from CRAN
The syntax is
install.packages("Package Name")

# Install the package named "XML".
install.packages("XML")
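
Calling install.packages() every time a script runs downloads the package again. One common pattern, sketched here under the assumption that a CRAN mirror is reachable when an install is actually needed, is to install only when the package is missing. The helper name install_if_missing is hypothetical.

```r
# Install a package only when it is not already available.
# The function name install_if_missing is hypothetical, chosen for this sketch.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report whether the package can be loaded after the attempt.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with every R installation, so nothing is downloaded here.
ok <- install_if_missing("stats")
print(ok)
```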
6. R Data Interfaces

I. CSV file (file name: input.csv)

# Read the file from the current working directory.
data <- read.csv("input.csv")
print(data)
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))

# Example
# Get the max value (salary) from the data frame.
sal <- max(data$salary)
print(sal)
# Get the details of the person having the max salary.
retval <- subset(data, salary == max(salary))
print(retval)
# Get all the people working in the IT department.
retval <- subset(data, dept == "IT")
print(retval)
# Get the people in the IT department whose salary is greater than 600.
info <- subset(data, salary > 600 & dept == "IT")
print(info)
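
The CSV example assumes that a file with salary and dept columns already exists. To try it without such a file, the sketch below first writes a small data frame to a temporary CSV file and then reads it back. The sample rows are made up for illustration and are not the contents of the original input.csv.

```r
# Hypothetical sample data with the columns the examples above rely on.
sample_data <- data.frame(
  id     = 1:4,
  name   = c("Rick", "Dan", "Michelle", "Ryan"),
  salary = c(623.3, 515.2, 611.0, 729.0),
  dept   = c("IT", "Operations", "IT", "HR"),
  stringsAsFactors = FALSE
)

# Write it to a temporary CSV file, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, csv_path, row.names = FALSE)
data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Rows in the IT department with a salary above 600.
info <- subset(data, salary > 600 & dept == "IT")
print(info)
```

Passing row.names = FALSE to write.csv() keeps the row numbers out of the file, so the data frame read back has the same columns as the one written.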

II. Excel file (file name: input.xlsx): First install the xlsx package.

install.packages("xlsx")

# Verify the package is installed.
any(grepl("xlsx",installed.packages()))

# Load the library into R workspace.
library("xlsx")

# Read the first worksheet in the file input.xlsx.
data <- read.xlsx("input.xlsx", sheetIndex = 1)
print(data)

Similarly, you can read other formats such as binary files and XML files.
7. R Statistics: Statistical analysis in R is performed using many built-in
functions. Most of these functions are part of the R base package.

Start with the mean and median, computed by the functions mean() and median().
The syntax for the mean is
mean(x, trim = 0, na.rm = FALSE, ...)
where
o x is the input vector.
o trim is the fraction of observations to drop from each end of the sorted
vector before the mean is computed.
o na.rm is used to remove the missing values from the input vector.
o Example:
# Create a vector.
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# Find the mean.
result.mean <- mean(x)
print(result.mean)

# Find the trimmed mean, dropping 30% of the values from each end.
result.mean <- mean(x, trim = 0.3)
print(result.mean)

# Create a vector containing a missing value.
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5, NA)

# Find the mean. The result is NA because of the missing value.
result.mean <- mean(x)
print(result.mean)

# Find the mean, dropping NA values.
result.mean <- mean(x, na.rm = TRUE)
print(result.mean)
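
To see what trim does, the trimmed mean can be reproduced by hand. The sketch below sorts the same ten values, drops three from each end (30% of ten), and averages the remaining four. The variable names are chosen for this illustration only.

```r
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5)

# Sort the values and work out how many to drop from each end.
sorted_x <- sort(x)
n_drop   <- floor(length(x) * 0.3)   # 3 values from each end

# Keep the middle values (3, 4.2, 7, 8) and average them.
kept <- sorted_x[(n_drop + 1):(length(x) - n_drop)]
manual_trimmed <- mean(kept)

print(kept)
print(manual_trimmed)
print(mean(x, trim = 0.3))   # same value as manual_trimmed
print(median(x))             # middle of the sorted values
```

With these values the trimmed mean is 5.55 and the median is 5.6, both well below the plain mean of 8.22, which is pulled upward by the single large value 54.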
8. R Distributions
Normal distribution: R has four built-in functions for working with the normal
distribution. They are described below, where x is a vector of numbers, p is a
vector of probabilities, and n is the number of observations.
• dnorm(x, mean, sd): gives the height of the probability density at each point
for a given mean and standard deviation.
• pnorm(x, mean, sd): gives the probability that a normally distributed random
number is less than a given value. It is also called the
"Cumulative Distribution Function".
• qnorm(p, mean, sd): takes a probability value and gives the number
whose cumulative value matches that probability.
• rnorm(n, mean, sd): generates random numbers whose
distribution is normal.
• Example

# Create a sequence of numbers between -10 and 10 incrementing by 0.1.
x <- seq(-10, 10, by = .1)

# Choose the mean as 2.5 and standard deviation as 0.5.
y <- dnorm(x, mean = 2.5, sd = 0.5)

# Give the chart file a name.
png(file = "dnorm.png")
# Plot the graph.
plot(x,y)
# Save the file.
dev.off()

# Cumulative probabilities for the same x values.
y <- pnorm(x, mean = 2.5, sd = 0.5)

# Give the chart file a name.
png(file = "pnorm.png")
# Plot the graph.
plot(x,y)
# Save the file.
dev.off()

# qnorm() takes probabilities, so create a sequence between 0 and 1.
p <- seq(0, 1, by = 0.02)
y <- qnorm(p, mean = 2.5, sd = 0.5)

# Give the chart file a name.
png(file = "qnorm.png")
# Plot the graph.
plot(p,y)
# Save the file.
dev.off()

# Create a sample of 50 numbers which are normally distributed.
y <- rnorm(50)
# Give the chart file a name.
png(file = "rnorm.png")

# Plot the histogram for this sample.
hist(y, main = "Normal Distribution")
# Save the file.
dev.off()
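
The four functions are related to one another, which can be checked numerically without any plots. The sketch below reuses the mean of 2.5 and standard deviation of 0.5 from the example. The seed value and the sample size of 1000 are arbitrary choices made for this illustration.

```r
m <- 2.5
s <- 0.5

# Probability that a value is at most one standard deviation above the mean.
p_upper <- pnorm(m + s, mean = m, sd = s)

# qnorm() inverts pnorm(): feeding the probability back returns the value.
value_back <- qnorm(p_upper, mean = m, sd = s)

# The density is highest at the mean, where it equals 1 / (sd * sqrt(2 * pi)).
peak <- dnorm(m, mean = m, sd = s)

# set.seed() makes the random draws reproducible.
set.seed(42)
draws <- rnorm(1000, mean = m, sd = s)

print(p_upper)
print(value_back)
print(peak)
print(mean(draws))
```

The sample mean of the random draws should land close to 2.5, though it will not match exactly because the draws are random.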

9. R – Linear Regression: The general mathematical equation
for multiple regression is
$y = a + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$
Simple linear regression is the special case with a single predictor x.
Syntax: the lm() function
for linear regression

x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131) # height in cm
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48) # weight in kg

# Apply the lm() function to model weight as a function of height.
relation <- lm(y~x)

print(relation)
print(summary(relation))
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
# Give the chart file a name.
png(file = "linearregression.png")

# Plot the chart with weight on the horizontal axis and height on the vertical axis.
plot(y, x, col = "blue", main = "Height & Weight Regression",
     xlab = "Weight in Kg", ylab = "Height in cm")
# Add the fitted line of height on weight, matching the axes of the plot.
abline(lm(x~y))
# Save the file.
dev.off()

Similarly, you can build programs for multiple regression and logistic
regression.
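
As one possible illustration of multiple regression, the sketch below fits a model with three predictors using mtcars, a data set bundled with R. The choice of data set, of the predictors disp, hp and wt, and of the values for the hypothetical new car are assumptions made for this example and do not come from the original text.

```r
# mtcars is a data set bundled with R; mpg, disp, hp and wt are its columns.
model <- lm(mpg ~ disp + hp + wt, data = mtcars)

# One intercept (a) plus one coefficient (b1, b2, b3) per predictor.
coefs <- coef(model)
print(coefs)

# Predict mpg for a hypothetical car.
new_car <- data.frame(disp = 200, hp = 120, wt = 3)
predicted <- predict(model, new_car)
print(predicted)
```

The prediction is the intercept plus each coefficient multiplied by the matching value in new_car, which is the multiple regression equation evaluated for that car.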
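10. Distribution fit: Install and load the "fitdistrplus" package.
install.packages("fitdistrplus")
library(fitdistrplus)

# Earthquake interevent time
Data <- c(5,6,9,10,11,13,14,23,38,49,50,65,66,66,76,90,105,109,132,175)

# Fit distributions on the data.
A <- fitdist(Data, 'weibull')
B <- fitdist(Data, 'lnorm')
C <- fitdist(Data, 'gamma')
D <- fitdist(Data, 'exp')
# Histogram of the data
hist(Data)

# Summary of the results
summary(A)
summary(B)
summary(C)
summary(D)

# Give the chart file a name.
png(file = "A.png")
plot(A)
# Save the file.
dev.off()

# Plot the data against the fitted distributions in a 2 x 2 grid
# (mfrow must hold whole numbers of rows and columns).
par(mfrow = c(2,2))
a <- c("Weibull","lognormal","gamma","exponential")
denscomp(list(A,B,C,D), legendtext = a)
cdfcomp(list(A,B,C,D), legendtext = a)
qqcomp(list(A,B,C,D), legendtext = a)
qqcomp(C)

# Kolmogorov-Smirnov tests using the parameters estimated by each fit.
ks.test(Data, "pweibull", shape = A$estimate["shape"], scale = A$estimate["scale"])
ks.test(Data, "plnorm", meanlog = B$estimate["meanlog"], sdlog = B$estimate["sdlog"])
ks.test(Data, "pgamma", shape = C$estimate["shape"], rate = C$estimate["rate"])
ks.test(Data, "pexp", rate = D$estimate["rate"])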
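
For the exponential distribution the fit can be cross-checked in base R without any extra package, because the maximum likelihood estimate of the rate has a closed form: one divided by the sample mean. The sketch below is an illustration under that fact, with variable names chosen for this example. It does not reproduce the output of fitdist().

```r
# Same interevent times as in the example above.
Data <- c(5,6,9,10,11,13,14,23,38,49,50,65,66,66,76,90,105,109,132,175)

# For the exponential distribution the maximum likelihood estimate
# of the rate is 1 / mean of the data.
rate_hat <- 1 / mean(Data)

# Log-likelihood of the data at the estimated rate.
loglik <- sum(dexp(Data, rate = rate_hat, log = TRUE))

# Kolmogorov-Smirnov test against the fitted exponential distribution.
# The data contain a tie (66 appears twice), so R may warn about ties;
# the warning is suppressed here to keep the output short.
ks_result <- suppressWarnings(ks.test(Data, "pexp", rate = rate_hat))

print(rate_hat)
print(loglik)
print(ks_result$p.value)
```

A small p-value would suggest that the exponential distribution does not describe the data well, which can then be weighed against the plots and summaries of the other fitted distributions.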
