STAT 7000: Lab 1.

Descriptive Statistics, Distributions

Yasin Fatemi, JooChul Lee

8/28/2024

1. Working with Data Frames

Let us begin with a simple data set. In 1609 Galileo proved mathematically that the trajectory of a body
falling with a horizontal velocity component is a parabola. His search for an experimental setting in which
horizontal motion was not affected appreciably (to study inertia) led him to construct a certain apparatus.
The data come from one of his experiments.

Location <- c("A", "A", "A", "B", "B", "B", "C")

Height <- c(100,200,300,450,600,800,1000)
Distance <- c(253,337,395,451,495,534,573)

Create a data frame called Galileo with the three variables

Galileo <- data.frame(Location, Height, Distance)

Check contents of the data frame

Galileo # display the content of the dataframe

head(Galileo) # display the first 6 rows
tail(Galileo) # display the last 6 rows
str(Galileo) # display the structure
dim(Galileo) # the number of rows and columns
names(Galileo) # the names of the variables

Index the data frame

Galileo$Height # output a vector

Galileo[[2]] # output a vector (same as previous)
Galileo[2] # output a data frame
Galileo["Height"] # output a data frame (same as previous)
Galileo[c(1,3)] # output a data frame
Galileo[-2] # output a data frame (same as previous)

Galileo[1,2] # value in row 1, column 2

Galileo[ ,2] # all values in column 2
Galileo[1, ] # all values in row 1
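
The difference between single and double brackets is easier to see by checking the class of each result. The sketch below rebuilds the Galileo data frame so it runs on its own; the final line, which selects rows with a logical condition, is an extra illustration not used elsewhere in this section.

```r
# Rebuild the Galileo data frame so this block is self-contained
Galileo <- data.frame(
  Location = c("A", "A", "A", "B", "B", "B", "C"),
  Height   = c(100, 200, 300, 450, 600, 800, 1000),
  Distance = c(253, 337, 395, 451, 495, 534, 573)
)

is.data.frame(Galileo["Height"])          # single brackets keep a data frame
is.numeric(Galileo[["Height"]])           # double brackets extract the column vector
identical(Galileo$Height, Galileo[[2]])   # $ and [[ ]] return the same vector
identical(Galileo[c(1, 3)], Galileo[-2])  # keeping columns 1 and 3 = dropping column 2

# Rows can also be selected with a logical condition (illustrative extra)
rows_B <- Galileo[Galileo$Location == "B", ]
rows_B
```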

Summary statistics

summary(Galileo) # summary statistics for a data frame

summary(Galileo$Distance) # summary statistics for a variable

length(Galileo$Distance) # count the number of components

mean(Galileo$Distance) # mean
sd(Galileo$Distance) # standard deviation
var(Galileo$Distance) # variance
min(Galileo$Distance) # minimum
max(Galileo$Distance) # maximum
median(Galileo$Distance) # median
IQR(Galileo$Distance) # IQR
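
As a quick check on what these functions compute, the sketch below reproduces the variance, standard deviation, and IQR by hand for the Distance values. The names n, manual_var, and q are introduced here only for illustration.

```r
Distance <- c(253, 337, 395, 451, 495, 534, 573)

n <- length(Distance)

# var() is the sample variance, so the divisor is n - 1
manual_var <- sum((Distance - mean(Distance))^2) / (n - 1)
all.equal(manual_var, var(Distance))

# sd() is the square root of the sample variance
all.equal(sqrt(manual_var), sd(Distance))

# IQR() is the third quartile minus the first quartile
q <- quantile(Distance, c(.25, .75))
all.equal(unname(q[2] - q[1]), IQR(Distance))
```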

Create and add variables to data frames

Create a variable for estimated distance D.Hat = 200 + .708 Height − .000344 Height^2 and add it to the data
frame Galileo.

Galileo$D.Hat <- 200 + .708*Galileo$Height - .000344*Galileo$Height^2

Create a new variable LO that takes a value of TRUE when the estimated distance is lower than the measured
distance (D.Hat < Distance) and a value of FALSE otherwise and add it to the data frame Galileo. Use
this to get a subset of the Galileo data frame removing the observations for which the estimated distance
is lower than the measured distance.

# Create the variable LO

Galileo$LO <- Galileo$D.Hat < Galileo$Distance

# Remove cases whose estimated distance is lower than the measured distance
Galileo[!Galileo$LO, ]
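
The same row filter can also be written with subset(), which some readers find easier to read. This is a sketch of an alternative, not part of the original lab; kept_index and kept_subset are names chosen here for illustration.

```r
Height   <- c(100, 200, 300, 450, 600, 800, 1000)
Distance <- c(253, 337, 395, 451, 495, 534, 573)
Galileo  <- data.frame(Height, Distance)

# Estimated distance and the flag for "estimate is lower than measurement"
Galileo$D.Hat <- 200 + .708 * Galileo$Height - .000344 * Galileo$Height^2
Galileo$LO    <- Galileo$D.Hat < Galileo$Distance

# Two ways to keep only the rows where LO is FALSE
kept_index  <- Galileo[!Galileo$LO, ]
kept_subset <- subset(Galileo, !LO)

all.equal(kept_index, kept_subset)
kept_index
```

Note that the comma in Galileo[!Galileo$LO, ] matters: without it, the logical vector would be used to select columns rather than rows.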

2. Motivation and Creativity

For Case Study 1: Motivation and Creativity from the textbook, the following questions are posed: Do grading
systems promote creativity in students? Do ranking systems and incentive awards increase productivity
among employees? Do rewards and praise stimulate children to learn?
The data come from an experiment concerning the effects of intrinsic and extrinsic motivation on creativity
(page 2 of the textbook). Subjects with considerable experience in creative writing were randomly assigned
to one of two treatment groups.
Install the package associated with the textbook data. You only need to do this once.

install.packages("Sleuth3")

Load the library and look at the summary of the data.

library(Sleuth3)
summary(case0101)

Obtain summary statistics of the scores for the two treatment groups.

# save scores of intrinsic

int.score <- case0101$Score[case0101$Treatment == "Intrinsic"]

# save scores of extrinsic

ext.score <- case0101$Score[case0101$Treatment == "Extrinsic"]

# get summary statistics of the two

summary(int.score)
summary(ext.score)

Plot side-by-side histograms of scores for the two treatments.

par(mfrow=c(1,2))
hist(int.score)
hist(ext.score)

Obtain stem-and-leaf plots of scores for the two treatments.

stem(int.score)
stem(ext.score)

Find the average score difference between the two treatment groups.

mean(int.score) - mean(ext.score)

Compare the variances of the scores in the two treatments.

var(int.score)
var(ext.score)

Draw a comparison boxplot of the scores for the two treatments.

boxplot(Score ~ Treatment, data = case0101)
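
The per-group steps in this section can also be done without saving each group's scores separately, by using tapply(). The sketch below uses a small made-up data frame called toy with the same column names as case0101, so that it runs without the Sleuth3 package; its values are hypothetical and are not the textbook data.

```r
# Hypothetical toy data standing in for case0101 (NOT the textbook values)
toy <- data.frame(
  Score     = c(12, 15, 19, 22, 17, 20, 24, 26),
  Treatment = factor(rep(c("Extrinsic", "Intrinsic"), each = 4))
)

# Apply a summary function to Score within each level of Treatment
group_means <- tapply(toy$Score, toy$Treatment, mean)
group_vars  <- tapply(toy$Score, toy$Treatment, var)

group_means
group_vars

# Difference in average score, Intrinsic minus Extrinsic
mean_diff <- unname(group_means["Intrinsic"] - group_means["Extrinsic"])
mean_diff
```

Replacing toy with case0101 should give the same quantities as the separate int.score and ext.score calculations above.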

3. Gross Domestic Product (GDP) per Capita

The data file ex0116 contains the gross domestic product per capita for 228 countries in 2010 on the
following 3 variables: Rank: rank order of country from highest to lowest GDP; Country: name of country;
PerCapitaGDP: per capita GDP in $US.
Obtain the summary statistics for the data.

summary(ex0116)

Draw a histogram of per capita GDPs with a bin width of $5,000.

hist(ex0116$PerCapitaGDP, breaks = seq(0, max(ex0116$PerCapitaGDP)+5000, 5000))
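
The breaks argument works because seq(0, max + 5000, 5000) always produces a last break at or above the largest value, so every observation falls in some bin. The sketch below shows this on a made-up vector gdp (hypothetical values, not the ex0116 data), using plot = FALSE to inspect the bins instead of drawing them.

```r
# Hypothetical per capita GDP values (NOT the ex0116 data)
gdp <- c(500, 1200, 2500, 4000, 6000, 9000, 15000, 30000, 90000)

# Bin edges every $5,000, extended past the maximum
brks <- seq(0, max(gdp) + 5000, 5000)

# plot = FALSE returns the histogram object without drawing it
h <- hist(gdp, breaks = brks, plot = FALSE)

length(h$counts)   # number of bins
h$counts[1:3]      # counts in the first three bins
sum(h$counts)      # every observation is counted once
```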

Make a boxplot of per capita GDPs.

boxplot(ex0116$PerCapitaGDP)

Flag potential outliers.

# Compute the inner fences

LIF <- quantile(ex0116$PerCapitaGDP, .25) - 1.5*IQR(ex0116$PerCapitaGDP)
UIF <- quantile(ex0116$PerCapitaGDP, .75) + 1.5*IQR(ex0116$PerCapitaGDP)

# Get a list of countries with GDP < LIF or GDP > UIF
ex0116[ex0116$PerCapitaGDP < LIF | ex0116$PerCapitaGDP > UIF, ]
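
The fence rule can be checked on a small example. The sketch below applies it to a made-up vector gdp (hypothetical values, not the ex0116 data) and compares the result with boxplot.stats(), which is the function boxplot() relies on to pick the points it draws individually. boxplot.stats() uses hinges rather than quantile(), so the two approaches may disagree slightly on other data.

```r
# Hypothetical per capita GDP values (NOT the ex0116 data)
gdp <- c(500, 1200, 2500, 4000, 6000, 9000, 15000, 30000, 90000)

# Inner fences: 1.5 IQRs below the first quartile and above the third
LIF <- quantile(gdp, .25) - 1.5 * IQR(gdp)
UIF <- quantile(gdp, .75) + 1.5 * IQR(gdp)

# Values outside the fences are flagged as potential outliers
flagged <- gdp[gdp < LIF | gdp > UIF]
flagged

# Points that a boxplot of gdp would draw individually
boxplot.stats(gdp)$out
```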

4. Exercise
1. Download the baseball data set baseball.csv given on the Canvas module for this lab. It contains
data from the back-side of 59 baseball cards. The file has 59 observations on the following 6 variables:
height: Height in inches; weight: Weight in pounds; bat: a factor with levels L R S; throw: a factor
with levels L R; field: a factor with levels 0 1; average: ERA if the player is a pitcher and his batting
average if the player is a fielder.
2. Create a data frame.

3. Calculate the mean and standard deviation of the ERA of pitchers.

4. Calculate the mean and standard deviation of the batting average of fielders.
5. Define a new variable bmi by
bmi = (weight × 703) / height^2
and add it to the data frame.

6. Sort the observations in increasing BMI order.

7. Draw a comparison boxplot of the BMIs of pitchers (field = 0) and fielders (field = 1).
8. Calculate the mean and standard deviation of the heights, weights, and BMIs of fielders.

9. Calculate the difference between the mean ERA of pitchers who are classified as overweight (BMI ≥ 25)
and the mean ERA of pitchers with BMI < 25.
10. Create a new data frame owbb that contains baseball players classified as overweight according to their
BMI.
