Debarghya Das (Ba-1), 18021141033

This document provides an analysis of the Old Faithful geyser dataset using RStudio. It begins with an overview of the dataset and its variables. Descriptive statistics are then calculated, including the mean, standard deviation, range, and percentiles of the eruption and waiting time variables. The mean eruption time is 3.49 minutes and mean waiting time is 70.90 minutes. The minimum and maximum values are also reported to indicate the range for each variable. Percentiles at various probabilities are computed to gain insight into the distributions. This document demonstrates how to effectively summarize and explore the key attributes of a dataset using R.

DATA ANALYSIS ON RSTUDIO

NAME: DEBARGHYA DAS

PRN NO.: 18021141033

Asus
Fri Dec 07 20:21:46 2018
FAITHFUL DATA SET:
Description
Waiting time between eruptions and the duration of the eruption for the Old Faithful geyser in
Yellowstone National Park, Wyoming, USA.

Usage
faithful

Format

A data frame with 272 observations on 2 variables.

[,1] eruptions numeric Eruption time in mins

[,2] waiting numeric Waiting time to next eruption (in mins)

Details

A closer look at faithful$eruptions reveals that these are heavily rounded times originally in seconds,
where multiples of 5 are more frequent than expected under non-human measurement. For a better
version of the eruption times, see the example below.

There are many versions of this dataset around: Azzalini and Bowman (1990) use a more complete
version.

Source

W. Härdle.

plot(faithful, col = "darkblue", cex = 2)

A. SUMMARY and DESCRIPTIVE STATISTICS

Summary (or descriptive) statistics are the first figures used to represent nearly every dataset.
They also form the foundations for more complicated computations and analyses. Therefore,
they are essential to the analysis process. In this paper, we will explore the ways in which R can
be used to calculate summary statistics, including mean, standard deviation, range, and
percentile. Also included here is the summary function, which is one of the most useful tools in
the R set of commands.
First, let us inspect the FAITHFUL dataset.

# Load the data viz package (ggplot2). The faithful dataset ships with base R's
# datasets package; MASS is loaded for completeness but is not needed to access it
library(ggplot2)
library(MASS)
# Display or print the first 100 observations of our dataset
print(head(faithful, n = 100))
## eruptions waiting
## 1 3.600 79
## 2 1.800 54
## 3 3.333 74
## 4 2.283 62
## 5 4.533 85
## 6 2.883 55
## 7 4.700 88
## 8 3.600 85
## 9 1.950 51
## 10 4.350 85
## 11 1.833 54
## 12 3.917 84
## 13 4.200 78
## 14 1.750 47
## 15 4.700 83
## 16 2.167 52
## 17 1.750 62
## 18 4.800 84
## 19 1.600 52
## 20 4.250 79
## 21 1.800 51
## 22 1.750 47
## 23 3.450 78
## 24 3.067 69
## 25 4.533 74
## 26 3.600 83
## 27 1.967 55
## 28 4.083 76
## 29 3.850 78
## 30 4.433 79
## 31 4.300 73
## 32 4.467 77
## 33 3.367 66
## 34 4.033 80
## 35 3.833 74
## 36 2.017 52
## 37 1.867 48
## 38 4.833 80
## 39 1.833 59
## 40 4.783 90
## 41 4.350 80
## 42 1.883 58
## 43 4.567 84
## 44 1.750 58
## 45 4.533 73
## 46 3.317 83
## 47 3.833 64
## 48 2.100 53
## 49 4.633 82
## 50 2.000 59
## 51 4.800 75
## 52 4.716 90
## 53 1.833 54
## 54 4.833 80
## 55 1.733 54
## 56 4.883 83
## 57 3.717 71
## 58 1.667 64
## 59 4.567 77
## 60 4.317 81
## 61 2.233 59
## 62 4.500 84
## 63 1.750 48
## 64 4.800 82
## 65 1.817 60
## 66 4.400 92
## 67 4.167 78
## 68 4.700 78
## 69 2.067 65
## 70 4.700 73
## 71 4.033 82
## 72 1.967 56
## 73 4.500 79
## 74 4.000 71
## 75 1.983 62
## 76 5.067 76
## 77 2.017 60
## 78 4.567 78
## 79 3.883 76
## 80 3.600 83
## 81 4.133 75
## 82 4.333 82
## 83 4.100 70
## 84 2.633 65
## 85 4.067 73
## 86 4.933 88
## 87 3.950 76
## 88 4.517 80
## 89 2.167 48
## 90 4.000 86
## 91 2.200 60
## 92 4.333 90
## 93 1.867 50
## 94 4.817 78
## 95 1.833 63
## 96 4.300 72
## 97 4.667 84
## 98 3.750 75
## 99 1.867 51
## 100 4.900 82
# To see the variables in the dataset, use names(dataset) or ls(dataset)
ls(faithful)
## [1] "eruptions" "waiting"
# To see the number of columns and number of rows in the FAITHFUL dataset,
# use ncol(dataset) and nrow(dataset)
ncol(faithful)
## [1] 2
nrow(faithful)
## [1] 272

From the result output, the FAITHFUL dataset contains 2 variables and 272 rows. A more
advanced technique to see the structure of the dataset is to use the str(DATAVAR) function.
# A more advanced and complete way to see the structure of our dataset
str(faithful)
## 'data.frame': 272 obs. of 2 variables:
## $ eruptions: num 3.6 1.8 3.33 2.28 4.53 ...
## $ waiting : num 79 54 74 62 85 55 88 85 51 85 ...

FAITHFUL is a data.frame, a data type that holds rows and columns. FAITHFUL includes
2 numeric variables. From here, we can now compute summary and descriptive statistics for our
dataset.
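
One detail about the inspection commands above is worth a quick sketch: names() keeps the columns in their stored order, while ls() returns them sorted alphabetically. For FAITHFUL the two orders happen to coincide, so the example below uses a small hypothetical data frame called toy (not part of the original analysis) to make the difference visible. It also shows dim(), which returns the row and column counts in one call.

```r
# faithful ships with base R (datasets package), so no library() call is needed
names(faithful)   # column names in their stored order
dim(faithful)     # number of rows and columns in a single call

# Hypothetical toy data frame, used only to illustrate the ordering difference
toy <- data.frame(waiting = 1:3, eruptions = c(2.0, 3.5, 4.1))
names(toy)        # stored column order: waiting first
ls(toy)           # alphabetical order: eruptions first
```

When column order matters, for example when selecting columns by position, names() is the safer of the two.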
MEAN of EACH VARIABLE
In R, the mean of a single variable is calculated with the mean(VAR) command, where VAR is the
name of the variable whose mean we want to compute. Calling mean(DATAVAR) on a whole data frame
does not give per-variable means in current versions of R (it returns NA with a warning); use
colMeans(DATAVAR) or sapply(DATAVAR, mean) instead, where DATAVAR is the name of the dataset
containing the variables. FAITHFUL has only two variables, eruptions and waiting, and both are
kept for the descriptive statistics. To select a subset of a dataset, use the
subset(DATAVAR, select = c("VAR1", "VAR2", "VAR3", ..., "VARi")) command. You can also type
?subset in your R console followed by ENTER to learn how to subset vectors, matrices and
data frames. The code below demonstrates how to select a subset of the FAITHFUL dataset.

# Subsetting eruptions and waiting from the dataset FAITHFUL

subset.data <- subset(faithful, select = c("eruptions","waiting"))
# Checking subset.data to make sure we have the needed subset
str(subset.data)
## 'data.frame': 272 obs. of 2 variables:
## $ eruptions: num 3.6 1.8 3.33 2.28 4.53 ...
## $ waiting : num 79 54 74 62 85 55 88 85 51 85 ...

From the above output, we see that we got the subset we need. Let’s find the mean of each
variable in the selected subset.

# Calculate the mean of a variable with mean(DATAVAR$VAR); mean of variable eruptions
mean(subset.data$eruptions)
## [1] 3.487783
THE AVERAGE ERUPTION TIME IS 3.49 MINS
# Calculate the mean of a variable with mean(DATAVAR$VAR); mean of variable waiting
mean(subset.data$waiting)
## [1] 70.89706
THE AVERAGE WAITING TIME IS 70.90 MINS
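
The means of all variables can also be obtained in one step. The following is a minimal sketch that works directly on the built-in faithful data frame; the names col_means and col_means_alt are chosen here for illustration only.

```r
# Mean of every column at once; both calls return a named numeric vector
col_means <- colMeans(faithful)
col_means_alt <- sapply(faithful, mean)

# The two approaches agree
all.equal(col_means, col_means_alt)

# Rounded to two decimals for reporting
round(col_means, 2)
```

colMeans() only accepts numeric data, while sapply(DATAVAR, mean) applies mean() column by column, so the second form is the one to adapt when other statistics are needed.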

STANDARD DEVIATION OF EACH VARIABLE

Standard deviations are calculated in the same way as means within R. The standard deviation
of a single variable can be computed using the function sd(VAR), where VAR is the name of the
variable whose standard deviation you want to retrieve. Standard deviation measures how
spread out your data are. The code below demonstrates the use of the standard deviation function.

# What is the standard deviation of eruptions?

sd(subset.data$eruptions)
## [1] 1.141371
# What is the standard deviation of waiting?
sd(subset.data$waiting)
## [1] 13.59497
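
As a cross-check, sd() computes the sample standard deviation, which divides the sum of squared deviations by n - 1. The sketch below reproduces the value for eruptions by hand and then applies sd() to both columns at once; the variable names x, n and sd_manual are illustrative.

```r
# Sample standard deviation computed step by step
x <- faithful$eruptions
n <- length(x)
sd_manual <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Agrees with the built-in function
all.equal(sd_manual, sd(x))

# Standard deviation of every column in one call
sd_all <- sapply(faithful, sd)
sd_all
```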

RANGE of EACH VARIABLE : MINIMUM & MAXIMUM

Continuing in the same way, the minimum can be computed on a single variable using the
min(VAR) function. By the same token, max(VAR) operates similarly. Minimum and maximum give
the smallest and largest values of individual variables in the dataset. The code below shows how
to calculate minimums and maximums.
# Minimum and maximum of eruption time
min(subset.data$eruptions);max(subset.data$eruptions)
## [1] 1.6
## [1] 5.1

From the output, the minimum eruption time is 1.6 mins and the maximum eruption time is
5.1 mins.
# Minimum and maximum of waiting time
min(subset.data$waiting);max(subset.data$waiting)
## [1] 43
## [1] 96

From the output, the minimum waiting time is 43 mins and the maximum waiting time is 96 mins.

RANGE
The range of a particular variable, that is, its minimum and maximum, can be retrieved using the
range(VAR) command. As with the min and max functions, using range(dataset) is not very useful
since it considers the entire dataset, rather than each individual variable. Consequently, it is
recommended that ranges be computed on individual variables. These computations are
demonstrated in the following code:

# Calculate the range of a variable with range(VAR)

# Range of variable eruptions
range(subset.data$eruptions)
## [1] 1.6 5.1
# Range of variable waiting
range(subset.data$waiting)
## [1] 43 96
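
To get the per-variable ranges without typing one call per column, range() can be applied column by column. This is a sketch on the built-in faithful data frame; ranges and widths are illustrative names.

```r
# Apply range() to each column; the result is a 2 x 2 matrix
ranges <- sapply(faithful, range)
rownames(ranges) <- c("min", "max")
ranges

# Width of each range (max minus min), one value per variable
widths <- apply(ranges, 2, diff)
widths
```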

PERCENTILES : VALUES from PERCENTILES (QUANTILE)

You can get more insight into the distribution of a set of observations by examining quantiles. A
quantile is a value computed from a collection of numeric measurements that indicates an
observation's rank when compared to all other present observations. A quantile can also
be expressed as a percentile, which is identical but on a percent scale of 0 to 100.
Quantiles and percentiles are obtained in R with the quantile() function. The command is
written quantile(VAR, c(PROB1, PROB2, PROB3, ..., PROBi)) or, naming the argument,
quantile(VAR, probs = c(PROB1, PROB2, PROB3, ..., PROBi)). The argument's full name is probs;
the shorter prob also works because R matches argument names partially.

# Calculate the minimum (0th), 25th, 50th, 75th, and 95th percentiles for eruptions
# in the subset dataset
quantile(subset.data$eruptions, probs = c(0, .25, .50, .75, .95))
## 0% 25% 50% 75% 95%
## 1.60000 2.16275 4.00000 4.45425 4.81700
# Calculate the minimum (0th), 25th, 50th, 75th, and 95th percentiles for waiting
# in the subset dataset
quantile(subset.data$waiting, probs = c(0, .25, .50, .75, .95))
## 0% 25% 50% 75% 95%
## 43 58 76 82 89
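
The same quantiles can be computed for both variables in one call, and the 50th percentile can be checked against median(). The following is a sketch on the built-in faithful data frame; q_all is an illustrative name.

```r
# Quartiles for every column; extra arguments to sapply() are passed on to quantile()
q_all <- sapply(faithful, quantile, probs = c(0.25, 0.50, 0.75))
q_all

# The 50th percentile is the median
q_all["50%", "waiting"] == median(faithful$waiting)
q_all["50%", "eruptions"] == median(faithful$eruptions)
```

The result is a matrix with one row per probability and one column per variable, so individual values can be picked out by row and column name.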

PERCENTILES FROM VALUES (PERCENTILE RANK)

In the opposite situation, where a percentile rank corresponding to a given value is needed, one
has to devise a custom method. Here are the steps involved in computing a percentile rank.

1. Count the number of data points that are at or below the given value
2. Divide by the total number of data points
3. Multiply by 100

The formula for calculating a percentile rank can be written as this command: percentile rank
= length(VAR[VAR <= VALUE]) / length(VAR) * 100. length(VAR[VAR <= VALUE]) counts the
number of data points in a variable that are at or below the given value. The '<=' can be replaced
with other comparison operators, such as '<', '>', and '=='. length(VAR) counts the number of data
points in the variable. The final step is to multiply the result by 100 to transform the decimal value
into a percentage. Let's apply these steps in the following examples:

# In the sample, 3mins of eruptions time is at what percentile rank?

length(subset.data$eruptions[subset.data$eruptions <= 3]) /
length(subset.data$eruptions) * 100
## [1] 35.66176
# In the sample, 70mins of waiting time is at what percentile rank?
length(subset.data$waiting[subset.data$waiting <= 70]) /
length(subset.data$waiting) * 100
## [1] 39.33824
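
The three steps can be wrapped in a small helper so the expression does not have to be retyped for each variable. This is a sketch only: percentile_rank is a hypothetical name, not a built-in R function. It relies on the fact that the mean of a logical vector is the proportion of TRUE values, and it compares the result with base R's ecdf(), which computes the same proportion.

```r
# Hypothetical helper: percentage of observations at or below `value`
percentile_rank <- function(x, value) {
  mean(x <= value) * 100
}

percentile_rank(faithful$eruptions, 3)
percentile_rank(faithful$waiting, 70)

# ecdf() returns a function giving the proportion of data at or below a value
ecdf(faithful$waiting)(70) * 100
```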

SUMMARY
A very useful function in R is summary(x), where x can be one of any number of objects,
including datasets, variables, and linear models. When used, summary(x) provides summary
data related to the individual object that is passed to it. Thus, the summary function has a
different output depending on what kind of object it takes as an argument. This method is
valuable because it often sums up what we previously did and provides exactly what is needed in
summary statistics. Let's apply this command to the sample dataset.

# Summarize a variable using summary(VAR). Summary statistics of eruptions

print(summary(subset.data$eruptions))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.600 2.163 4.000 3.488 4.454 5.100
# Summarize a variable using summary(VAR). Summary statistics of waiting
print(summary(subset.data$waiting))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 43.0 58.0 76.0 70.9 82.0 96.0

We get the minimum, 1st quartile, median, mean, 3rd quartile, and maximum of the waiting time.
This step provides more summary statistics information. We will now apply the same command
to the whole dataset.
# Summarize the subset.data sample using the command summary(subset.data)
print(summary(subset.data))
## eruptions waiting
## Min. :1.600 Min. :43.0
## 1st Qu.:2.163 1st Qu.:58.0
## Median :4.000 Median :76.0
## Mean :3.488 Mean :70.9
## 3rd Qu.:4.454 3rd Qu.:82.0
## Max. :5.100 Max. :96.0

The output of the preceding summary provides the descriptive statistics of all variables in the
sample data set. Under each variable, we have its summary statistics. Now that we know how to
compute summary statistics, we can delve into the visual part of the analysis to see the relation
between the variables.

DATA VISUALIZATION

1. MATRIX of PLOTS

The single type of planar scatterplot is really useful only when comparing two numeric-continuous
variables. When there are more continuous variables of interest, it is not possible to
display this information satisfactorily on a single plot. A simple and common solution is to
generate a two-variable scatterplot for each pair of variables and show them together in a
structured way; this is referred to as a scatterplot matrix. We have two continuous/numeric
variables in the subset dataset we have selected, so the matrix has only two off-diagonal panels.
Working with base R graphics, use the pairs function.

# Drawing a scatterplot matrix of eruptions and waiting using the pairs function
pairs(subset.data, pch = 16, col = "blue",
main = "Matrix Scatterplot of eruptions, waiting")
Our matrix scatterplot may be too big to fit on the screen. However, if you run the above code in
your console, you will get the matrix plot in RStudio's graphics area. The interpretation of the
above plots depends upon the labeling of the diagonal panels, running from the top left to the
bottom right. They appear in the same order as the columns of the data frame given as the first
argument. These "label panels" allow you to determine which individual plot in the matrix
corresponds to each pair of variables. With only two variables the matrix is 2 x 2: the first column
of the scatterplot matrix corresponds to an x-axis variable of eruptions; the second row of the
matrix corresponds to a y-axis variable of waiting, and each row and column displays a scale that
is constant moving left/right or up/down, respectively. The plot of waiting (y) on eruptions (x) at
position row 2 and column 1 displays the same data as the scatterplot at position row 1 and
column 2, but with the axes swapped. Scatterplot matrices therefore allow for
easier comparison of pairwise relationships formed by observations made on multiple continuous
variables. Instead of viewing all the pairwise panels at once, let's use a simple scatterplot to
visualize the relationship between the two variables.

2. SCATTERPLOT

A scatterplot is a useful way to visualize the relationship between two variables displayed as x-y
coordinate plots. Similar to correlations, scatterplots are often used to make initial diagnoses
before any statistical analyses are performed.
The simplest way to create a scatterplot is to directly graph two variables using the default
settings. In R, this can be achieved using the plot(VARX, VARY) function, where VARX
is the variable to plot along the x-axis and VARY is the variable to plot along the y-axis. The code
below uses the ggplot2 version of the scatterplot. Let's look at the relationship between eruptions
and waiting.

# Plot of the relationship between eruption and waiting

ggplot(subset.data) +
geom_point(aes(x = eruptions, y = waiting), col = 'blue', size = 3) +
ggtitle("Eruptions vs. waiting Scatterplot") +
theme(plot.title = element_text(hjust = 0.5))
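
For comparison, the base R plot(VARX, VARY) form described above needs no extra packages. The following is a minimal sketch; the axis labels and point style are choices made here for illustration, not settings from the original analysis.

```r
# Base R scatterplot of waiting time against eruption duration.
# faithful ships with base R, so no packages are required.
plot(faithful$eruptions, faithful$waiting,
     pch = 16, col = "blue",
     xlab = "eruptions (mins)", ylab = "waiting (mins)",
     main = "Eruptions vs. waiting Scatterplot")
```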

The scatterplot also displays the presence of potential outliers. We can further expand our visual
analysis by calculating the slope and intercept of line of best fit.

# Calculate slope and intercept of line of best fit

coef(lm(waiting ~ eruptions, data = subset.data))
## (Intercept) eruptions
## 33.47440 10.72964
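
The coefficients reported by lm() for a single predictor can be reproduced from basic summary statistics: the least-squares slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below checks this on the built-in faithful data frame; the variable names are illustrative.

```r
x <- faithful$eruptions
y <- faithful$waiting

# Least-squares slope and intercept for simple linear regression
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)
c(intercept = intercept, slope = slope)

# Same values as the fitted model
fit <- coef(lm(waiting ~ eruptions, data = faithful))
all.equal(unname(fit), c(intercept, slope))
```

Read in context, the slope says that each additional minute of eruption time is associated with roughly 10.7 more minutes of waiting until the next eruption.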

We will use the geom_abline() function to add the intercept value and the estimated coefficient
of eruptions to our plot.

# Adding a line of best fit; intercept and slope taken from the coef() output above
ggplot(subset.data) +
geom_point(aes(x = eruptions, y = waiting), col = "blue", size = 3) +
geom_abline(intercept = 33.47440, slope = 10.72964, col = "darkred") +
ggtitle("Eruptions vs. Waiting Scatterplot With The Best Fit Line") +
theme(plot.title = element_text(hjust = 0.5))
