R Notes
Statistical inference
1. Load packages
library(statsr)
library(dplyr)
library(ggplot2)
str(dataname): shows the structure of an object (each variable, its type and its first values).
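2. Side-by-side box plots
ggplot(data, aes(x = habit, y = weight)) + geom_boxplot()
sum(x, na.rm = TRUE): na.rm = TRUE removes NA values before summing.
To compare the mean of a variable across groups:
data %>%
  group_by(habit) %>%
  summarise(mean_weight = mean(weight))
  habit  mean_weight
1        1.23
2        4.56
3 NA     7.84
An NA in the grouping variable becomes its own group; use mean(weight, na.rm = TRUE) if weight itself has missing values.
To compare the groups with a full summary for each:
by(data$weight, data$habit, summary)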
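A base-R sketch of the same group comparison, useful when dplyr is not loaded. The `births` data frame below is hypothetical toy data (not the course data set); it shows why a group mean comes out as NA and how `na.rm = TRUE` changes that.

```r
# Hypothetical toy data: birth weight by mother's smoking habit
births <- data.frame(
  habit  = c("nonsmoker", "nonsmoker", "smoker", "smoker", NA),
  weight = c(7.5, NA, 6.0, 6.5, 8.0)
)

# mean() returns NA for any group that contains a missing weight
plain_means <- tapply(births$weight, births$habit, mean)

# na.rm = TRUE drops missing weights before averaging
clean_means <- tapply(births$weight, births$habit, mean, na.rm = TRUE)

print(plain_means)
print(clean_means)

# summary() of weight for each habit group
# (tapply() and by() skip rows whose habit is NA, unlike dplyr's group_by(),
#  which keeps NA as its own group)
by(births$weight, births$habit, summary)
```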
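3. inference() (statsr): conduct a hypothesis test or estimate a confidence interval for a mean, median or proportion, based on the sampling distribution.
inference(y = weight, x = habit, data = data, statistic = "mean", type = "ht", null = 0, alternative = "twosided", method = "theoretical")
Arguments:
statistic: "mean", "median" or "proportion"
type: "ci" (confidence interval) or "ht" (hypothesis test)
null: null value, e.g. 0
alternative: "less", "greater" or "twosided"
method: "theoretical" or "simulation" (simulated null distribution)
conf_level: 0.95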
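For a mean, a theoretical test or interval is typically based on the t distribution. As a base-R cross-check (not the statsr implementation itself), the sketch below runs a two-sample t test with `t.test()` on R's built-in `sleep` data, an assumed stand-in with the two drug groups treated as independent as in the `t.test` help page, and rebuilds a one-sample 95% CI from the formula x̄ ± t* · s/√n.

```r
# Assumed example data: built-in sleep (extra = extra hours of sleep, group = drug 1 or 2)
res <- t.test(extra ~ group, data = sleep, conf.level = 0.95)

print(res$p.value)   # compare with 0.05 to decide on H0: mu1 - mu2 = 0
print(res$conf.int)  # 95% CI for mu1 - mu2

# One-sample 95% CI by hand for drug group 1: xbar +/- t* * s / sqrt(n)
x      <- sleep$extra[sleep$group == "1"]
n      <- length(x)
se     <- sd(x) / sqrt(n)
t_star <- qt(0.975, df = n - 1)
ci_manual <- mean(x) + c(-1, 1) * t_star * se

# t.test() gives the same interval
ci_ttest <- t.test(x, conf.level = 0.95)$conf.int
print(ci_manual)
```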
tidyverse
Install: install.packages("dplyr")
Load: library(dplyr)
1. Load packages, then load the data
library(statsr)
library(dplyr)
library(ggplot2)
a. Summarize the data: summary(dataname)
1. Min: minimum value
2. 1st Qu.: 25th percentile
3. Median: median value
4. Mean: mean value
5. 3rd Qu.: 75th percentile
6. Max: maximum value
b. Visualize the data using ggplot (e.g. to check skewness).
Create a histogram of a variable:
ggplot(dataname, aes(x = variable)) + geom_histogram(binwidth = 10)
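`summary()` reports the percentiles computed by `quantile()` (R's default type 7 interpolation), plus the mean. A small check on a made-up vector:

```r
# Hypothetical small sample
x <- c(2, 4, 4, 5, 7, 9, 10)

print(summary(x))                     # Min, 1st Qu., Median, Mean, 3rd Qu., Max
q   <- quantile(x, c(0.25, 0.5, 0.75)) # the same percentiles summary() shows
iqr <- IQR(x)                          # 3rd Qu. minus 1st Qu.
print(q)
print(iqr)
```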
Create a scatterplot of two variables:
ggplot(data = dataname) + geom_point(mapping = aes(x = variable1, y = variable2))
Create a bar plot: + geom_bar()
Create a box plot: + geom_boxplot()
## dplyr
d. arrange: orders the rows
e. rename: renames variables (new_name = old_name)
f. group_by: groups the data so that summarise() is computed per group
Example with nycflights (flights to RDU, grouped by origin airport):
rdu_flights <- nycflights %>% filter(dest == "RDU")
rdu_flights %>%
  group_by(origin) %>%
  summarise(mean_dd = mean(dep_delay), sd_dd = sd(dep_delay), n = n())
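A base-R analogue of `group_by()` + `summarise()`, `arrange()` and `rename()`, using a hypothetical six-row stand-in for `rdu_flights` (made-up delays, not real nycflights data). Note that `sd()` of a group with a single flight is NA.

```r
# Hypothetical toy data: origin airport and departure delay (minutes)
flights <- data.frame(
  origin    = c("EWR", "EWR", "JFK", "JFK", "JFK", "LGA"),
  dep_delay = c(10, 20, 0, 5, 10, 30)
)

# group_by(origin) %>% summarise(mean_dd, sd_dd, n), base-R style
by_origin <- data.frame(
  origin  = sort(unique(flights$origin)),
  mean_dd = as.vector(tapply(flights$dep_delay, flights$origin, mean)),
  sd_dd   = as.vector(tapply(flights$dep_delay, flights$origin, sd)),
  n       = as.vector(tapply(flights$dep_delay, flights$origin, length))
)

# arrange(desc(mean_dd)): largest mean delay first
by_origin <- by_origin[order(-by_origin$mean_dd), ]

# rename(n_flights = n)
names(by_origin)[names(by_origin) == "n"] <- "n_flights"
print(by_origin)
```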
Measures of center and spread:
1. median
2. min and max
3. percentiles
4. IQR
R codes: Regression
1. Scatterplot of the dependent variable (y) against an independent variable (x), useful when you want to compare which variable is the more significant predictor:
ggplot(data = dataname, aes(x = x, y = y)) + geom_point()
2. Regression
m_xx <- lm(y ~ x, data = dataname)
Then show the model output:
summary(m_xx)
Intercept: value of y when x is 0.
Slope: estimated change in y for a one-unit increase in the predictor variable in the model.
3. Add the regression line to the scatterplot:
+ geom_smooth(method = "lm", se = FALSE)
se = TRUE also draws the standard error band of the regression line.
In base R: plot(y ~ x, data = dataname), then abline(m_xx)
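A minimal simple-regression example on R's built-in `mtcars` data (chosen here only as an assumption; any data frame works): fit, read the coefficients, confirm that the intercept is the prediction at x = 0, and draw the fitted line in base R.

```r
# Assumed example data: built-in mtcars (mpg = response, wt = predictor in 1000 lb)
m_wt <- lm(mpg ~ wt, data = mtcars)

# Coefficient table: estimates, standard errors, t values, p-values, plus R-squared
print(summary(m_wt))

b <- coef(m_wt)
# Intercept: predicted mpg when wt = 0 (an extrapolation here, no car weighs 0)
pred_at_zero <- predict(m_wt, newdata = data.frame(wt = 0))
# Slope: change in predicted mpg for each extra 1000 lb of weight
print(b)

# Base-R scatterplot with the fitted line, like geom_smooth(method = "lm", se = FALSE)
plot(mpg ~ wt, data = mtcars)
abline(m_wt)
```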
4. Jitter plot (scatterplot that spreads out overlapping points):
ggplot(data = dataname, aes(x = x, y = y)) +
  geom_point(position = position_jitter(width = 0.1, height = 0.1)) +
  xlab("x label") +
  ylab("y label")
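`position_jitter()` adds small random noise so overlapping points become visible. Base R's `jitter(x, amount = a)` applies the same idea with uniform noise in [-a, a]; the sketch below uses hypothetical discrete data and `amount = 0.1`, mirroring a width and height of 0.1.

```r
# Hypothetical discrete data where several points sit on top of each other
x <- c(1, 1, 1, 2, 2, 3)
y <- c(5, 5, 5, 4, 4, 3)

set.seed(1)                     # make the random noise reproducible
jx <- jitter(x, amount = 0.1)   # x + uniform noise in [-0.1, 0.1]
jy <- jitter(y, amount = 0.1)   # y + uniform noise in [-0.1, 0.1]

plot(jx, jy, xlab = "x", ylab = "y")
```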
5. View the data together with the fitted values and residuals:
new_df_name <- cbind(df, model_name$fitted.values, model_name$residuals)
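Residual = observed - fitted, so the two added columns always add back up to the response. A quick check with an assumed `lm(mpg ~ wt, data = mtcars)` model:

```r
# Assumed example model on built-in mtcars
m <- lm(mpg ~ wt, data = mtcars)

# Attach fitted values and residuals next to the original columns
aug <- cbind(mtcars[, c("mpg", "wt")],
             fitted   = m$fitted.values,
             residual = m$residuals)
print(head(aug))
```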
9. Checking multicollinearity of the potential predictors (corrplot package):
corrplot(cor(potential_predictors), method = "number")
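The numbers `corrplot(..., method = "number")` displays are just `cor()` of the predictors, so the matrix can also be inspected without corrplot. The predictor set below (`wt`, `disp`, `hp` from `mtcars`) and the |r| > 0.8 cutoff are illustrative assumptions, not a fixed rule.

```r
# Assumed potential predictors from built-in mtcars
predictors <- mtcars[, c("wt", "disp", "hp")]
r <- round(cor(predictors), 2)
print(r)

# Flag each predictor pair (upper triangle only) above the assumed cutoff |r| > 0.8
high <- which(abs(r) > 0.8 & upper.tri(r), arr.ind = TRUE)
print(data.frame(var1 = rownames(r)[high[, "row"]],
                 var2 = colnames(r)[high[, "col"]]))
```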
10. Run multiple linear regression (y against several independent variables):
model_name <- lm(y ~ x1 + x2 + x3, data = dataname)
summary(model_name)
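A sketch of a multiple regression with two assumed predictors from `mtcars`. R-squared never decreases when a predictor is added, which is why adjusted R-squared is the fairer number when comparing models of different size.

```r
# Assumed example: mpg explained by weight and horsepower (built-in mtcars)
m_multi <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(m_multi)

print(s$coefficients)   # estimate, std. error, t value, p-value per term
print(s$r.squared)      # R-squared
print(s$adj.r.squared)  # adjusted R-squared, penalizes extra predictors
```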
1. Filter on a variable:
subset_name <- dataname %>% filter(variable == "...")
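`filter()` keeps the rows where the condition is TRUE. In base R the same subset comes from `subset()` or logical indexing; here with `mtcars` and `cyl == 4` as an assumed example:

```r
# dplyr version: mtcars %>% filter(cyl == 4)
four_cyl <- subset(mtcars, cyl == 4)

# Same rows with logical indexing
four_cyl2 <- mtcars[mtcars$cyl == 4, ]
print(nrow(four_cyl))
```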
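Histogram: ggplot(data = dataname, aes(x = variable)) + geom_histogram()
Scatterplot: ggplot(data = dataname) + geom_point(mapping = aes(x = variable1, y = variable2))
Bar chart: ggplot(data = dataname) + geom_bar(mapping = aes(x = variable1, fill = variable2))
geom_col(position = "stack") also creates a bar chart, but takes the bar heights from a y variable.
To split the plot into separate panels for each value of another variable, use facet_wrap(~ variable).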
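`geom_bar()` counts rows per category, whereas `geom_col()` uses a y value you supply. The counts that `geom_bar(aes(x = factor(cyl), fill = factor(am)))` would stack can be previewed with `table()`; the `mtcars` columns are an assumed example, and the base-R stacked `barplot()` is an analogue, not ggplot2 itself (`facet_wrap()` has no one-line base-R equivalent, so it is not shown).

```r
# Row counts per category: what geom_bar() would draw for x = cyl
counts <- table(mtcars$cyl)
print(counts)

# Counts per cyl, split by am (0 = automatic, 1 = manual): the stacked segments
stacked <- table(cyl = mtcars$cyl, am = mtcars$am)
print(stacked)

# Base-R stacked bar chart: one bar per cyl, am groups stacked inside it
barplot(t(stacked), legend.text = TRUE, xlab = "cyl", ylab = "count")
```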
3. Variance test
var(abcd$...)
var.test(abcd$..., abcd$...)
p < 0.05: reject the null, accept H1.
p > 0.05: fail to reject the null.
If the CI for the ratio of variances contains 1: no difference between the arms of the study.
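`var.test()` is an F test of H0: σ²₁/σ²₂ = 1. Its statistic is the ratio of the two sample variances, and for the two-sided test p < 0.05 happens exactly when the 95% CI for the ratio excludes 1. The arm values below are made up.

```r
# Hypothetical measurements from two study arms
arm_a <- c(12.1, 11.8, 13.0, 12.6, 12.2, 11.9)
arm_b <- c(10.5, 14.2, 12.9, 9.8, 15.1, 13.3)

res <- var.test(arm_a, arm_b)   # two-sided F test, H0: var ratio = 1
print(res$statistic)            # F = var(arm_a) / var(arm_b)
print(res$conf.int)             # 95% CI for the variance ratio

reject_h0 <- res$p.value < 0.05
print(reject_h0)
```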
4. Residual vs fitted plot
plot(fitted(modelname), residuals(modelname))
abline(0, 0)
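A base-R version of the residual vs fitted plot, with an assumed `mtcars` model. `abline(0, 0)` (intercept 0, slope 0) and `abline(h = 0)` both draw the zero line; a patternless cloud around it is what you hope to see.

```r
# Assumed example model on built-in mtcars
m <- lm(mpg ~ wt, data = mtcars)

plot(fitted(m), resid(m), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)   # horizontal reference line at 0
```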
2. ANOVA
One-way:
one_way <- aov(Y ~ A, data = dataname)
summary(one_way)
Two-way:
two_way <- aov(Y ~ A + B, data = dataname)
summary(two_way)
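Concrete one-way and two-way ANOVA calls on R's built-in `PlantGrowth` and `warpbreaks` data (chosen here for illustration). The Df column is a quick sanity check: k - 1 for a factor with k levels, and n - 1 minus the factors' Df for the residuals.

```r
# One-way: plant weight across 3 treatment groups (30 plants)
one_way <- aov(weight ~ group, data = PlantGrowth)
print(summary(one_way))

# Two-way (additive): number of breaks by wool type (2 levels) and tension (3 levels), 54 rows
two_way <- aov(breaks ~ wool + tension, data = warpbreaks)
print(summary(two_way))

# The ANOVA table is the first element of the summary
tab1 <- summary(one_way)[[1]]
tab2 <- summary(two_way)[[1]]
```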
Correlation coefficient
round(cor(dataname[, c("abc", "bcd", "def")]), 2)
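Pearson's r is cov(x, y) / (s_x · s_y). A check that this formula matches `cor()`, using `mtcars` columns as assumed stand-ins for the placeholder column names in the notes:

```r
# Assumed example columns from built-in mtcars
x <- mtcars$mpg
y <- mtcars$wt

r_manual  <- cov(x, y) / (sd(x) * sd(y))   # Pearson r from its definition
r_builtin <- cor(x, y)

# Correlation matrix for several columns, rounded to 2 decimals
print(round(cor(mtcars[, c("mpg", "wt", "hp")]), 2))
```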