Introduction To Statistics Using R - Session 2
Seminar series - session 2
Session 2 – Learning objectives
• Understand the concept of probability distribution
• A very short intro to R and RStudio
But first,
Distribution
The concept of distribution is the soul of data analysis
Frequency and probability distribution
Frequency:
• Empirical
• Summarizes observed data
Probability:
• Theoretical
• Assigns probabilities to possible observations
An example
• We captured 5 sandgrouse, 7 ravens and 3 falcons (15 birds in total)
• The frequency distribution of species is 5, 7 and 3
• The probability distribution of species is 5/15, 7/15 and 3/15
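In R, one way to get from the frequency distribution to the probability distribution is to divide each count by the total. The sketch below uses the counts from the example. The object names `captures` and `probs` are just illustrative.

```r
# Counts of captured birds, taken from the example above
captures <- c(sandgrouse = 5, raven = 7, falcon = 3)

# Frequency distribution: the observed counts
captures

# Probability distribution: each count divided by the total (15),
# i.e. c(5, 7, 3) / 15
probs <- prop.table(captures)
probs

# The probabilities of all outcomes sum to 1
sum(probs)
```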
Why use probability?
• Data come from experiments or complex systems
• Data show variability and randomness
• We need to measure the uncertainty in the data
• Probability measures uncertainty
Some definitions
• Random variable: a variable whose outcome varies from measurement to measurement
• Probability distribution: the possible values a random variable can take, and how likely they are
• The behavior of random variables must be defined using probability
Do not confuse a random variable with a variable
Some more definitions
• Event: a set of outcomes from a chance experiment
What was the probability?
1/20 × 1/20 × 1/20 = 1/8000 = 0.000125
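The slide does not say what the three events were. This sketch assumes three independent events, each with probability 1/20, and multiplies their probabilities. As an optional check, it simulates a hypothetical 20-sided die rolled three times, with face 1 as the target. The simulated value should be close to, but not exactly, 1/8000.

```r
# Assumption: three independent events, each with probability 1/20
p_single <- 1 / 20

# Multiplication rule for independent events
p_all_three <- p_single^3
p_all_three   # 0.000125, i.e. 1/8000

# Optional check by simulation (hypothetical 20-sided die, target face = 1)
set.seed(42)
n_trials <- 1e6
rolls <- matrix(sample(1:20, 3 * n_trials, replace = TRUE), ncol = 3)
hits <- rowSums(rolls == 1) == 3   # TRUE when all three rolls show a 1
mean(hits)                         # close to 0.000125, but not exact
```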
Probability distributions
• Continuous variables: a continuum of values within a range
• Discrete variables: only take distinct values
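One practical consequence of this split can be shown in R. For a discrete variable, each distinct value has its own probability, and these probabilities sum to 1. For a continuous variable, probabilities are areas under a density curve, and the total area is 1. The example below uses a Poisson count (λ = 3, chosen arbitrarily) and a standard normal variable.

```r
# Discrete: a Poisson count with a hypothetical mean of 3
# Each distinct value 0, 1, 2, ... has its own probability
p_counts <- dpois(0:100, lambda = 3)
sum(p_counts)   # 1 (the tail beyond 100 is negligible)

# Continuous: a standard normal variable
# Probabilities are areas under the density; the total area is 1
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value
total_area

# Probability of falling in an interval, e.g. P(0 < X < 1), about 0.341
pnorm(1) - pnorm(0)
```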
Most common distributions for continuous variables:
• Normal (a.k.a. Gaussian)
• Uniform (a.k.a. rectangular)
• Beta
• Gamma
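Base R has the same four functions for each of these distributions. The `d` prefix gives the density, `p` the cumulative probability, `q` the quantile and `r` random draws. The parameter values below are arbitrary examples, not values from the seminar.

```r
# Density at x = 0.5 (parameter values are arbitrary examples)
dnorm(0.5, mean = 0, sd = 1)
dunif(0.5, min = 0, max = 1)          # 1
dbeta(0.5, shape1 = 2, shape2 = 2)    # 1.5
dgamma(0.5, shape = 2, rate = 1)

# Cumulative probability P(X <= 0.5)
pnorm(0.5); punif(0.5); pbeta(0.5, 2, 2); pgamma(0.5, shape = 2, rate = 1)

# Quantile: the value below which 50% of the distribution lies
qnorm(0.5)   # 0 for the standard normal

# Random draws (5 values from each distribution)
set.seed(1)
x_norm  <- rnorm(5, mean = 0, sd = 1)
x_unif  <- runif(5, min = 0, max = 1)
x_beta  <- rbeta(5, shape1 = 2, shape2 = 2)
x_gamma <- rgamma(5, shape = 2, rate = 1)
```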
Gaussian or normal distribution
• Symmetrical
• Defined by its mean μ and standard deviation σ
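The two parameters and the symmetry can be checked numerically. This sketch uses a hypothetical mean of 10 and standard deviation of 2. The density is the same at equal distances either side of μ. About 68% of values fall within one σ of the mean, and about 95% within two.

```r
mu    <- 10   # hypothetical mean
sigma <- 2    # hypothetical standard deviation

# Symmetry: same density 1.5 units below and above the mean
left  <- dnorm(mu - 1.5, mean = mu, sd = sigma)
right <- dnorm(mu + 1.5, mean = mu, sd = sigma)
all.equal(left, right)   # TRUE

# Half of the distribution lies below the mean
pnorm(mu, mean = mu, sd = sigma)   # 0.5

# Proportion within one and two standard deviations of the mean
within_1sd <- pnorm(mu + sigma, mu, sigma) - pnorm(mu - sigma, mu, sigma)
within_2sd <- pnorm(mu + 2 * sigma, mu, sigma) - pnorm(mu - 2 * sigma, mu, sigma)
within_1sd   # about 0.683
within_2sd   # about 0.954
```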
Uniform or rectangular distribution
[Figure: uniform distribution]
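For a continuous uniform distribution on a range from a to b, the density is flat at 1/(b − a) inside the range and 0 outside it. That flat shape is why it is also called "rectangular". The range 2 to 6 below is a hypothetical example.

```r
a <- 2; b <- 6   # hypothetical range

# Inside the range, the density is constant: 1 / (b - a) = 0.25
dunif(c(2.5, 4, 5.9), min = a, max = b)

# Outside the range, the density is 0
dunif(7, min = a, max = b)

# P(X <= 3) = (3 - a) / (b - a) = 0.25
punif(3, min = a, max = b)
```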
The Poisson distribution
• A count distribution
• Defined by one parameter, λ: the mean number of events
• λ = μ = σ² (lambda = mean = variance)
• As the mean increases, Poisson distributions approximate the normal distribution
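The property λ = μ = σ² can be checked by simulation. This sketch draws many counts from a Poisson distribution with a hypothetical λ of 4. It then compares the sample mean and variance, which should both be close to 4.

```r
lambda <- 4   # hypothetical mean number of events

set.seed(123)
counts <- rpois(100000, lambda = lambda)

mean(counts)   # close to 4
var(counts)    # also close to 4

# Probabilities of observing 0, 1, 2 and 3 events
dpois(0:3, lambda = lambda)
```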
• Count data with a large mean can often be modelled as continuous
[Figure: Poisson distributions; y-axis: expected relative frequency]
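One way to see the normal approximation is to compare the Poisson probabilities with a normal density that has the same mean and variance (μ = λ, σ = √λ). The helper `max_gap` below is hypothetical. It returns the largest absolute difference between the two curves, and that gap shrinks as λ grows.

```r
# Hypothetical helper: largest gap between Poisson(lambda) probabilities
# and a normal density with mean lambda and sd sqrt(lambda)
max_gap <- function(lambda) {
  x <- 0:(3 * lambda + 20)   # covers essentially all of the probability mass
  max(abs(dpois(x, lambda) - dnorm(x, mean = lambda, sd = sqrt(lambda))))
}

gaps <- sapply(c(2, 10, 100), max_gap)
gaps   # the gap gets smaller as lambda increases
```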
What is R?
• A programming language
• A calculator
• An environment: an integrated suite of software for
  • data handling
  • computation
  • graphics
• Open source
• Lets you get under the hood, unlike most software: you know what you’re doing and what’s going on
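A few lines are enough to see R both as a calculator and as a programming language. The last part reuses the bird counts from the earlier example. The object names are illustrative.

```r
# R as a calculator
2 * (3 + 4)
sqrt(16)
log(100, base = 10)

# R as a programming language: store values in objects
# and work on whole vectors at once
birds <- c(5, 7, 3)
total <- sum(birds)
birds / total   # the probability distribution from the bird example
```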
What is RStudio?
• An integrated development environment (IDE) for R
• A tool to make your life easier with R
• Tools for:
  • plotting
  • history
  • debugging
  • workspace management
R resources
• https://r4ds.had.co.nz/ for data wrangling and graphs
• The list of sites I provided during the last seminar
• I’m happy to help you troubleshoot your code, but from now on, I expect you to try these resources first
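[Figure: the RStudio window: 1 Source editor (top left), 2 R Console (bottom left), 3 Environment pane (top right), 4 Output pane (bottom right)]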
1 Source editor
Syntax colors:
• Black: objects and anything you define or call
• Blue: commands
• Green: “characters” (text strings)
• #: anything after it on a line is a comment, not code, so it won’t run
But you can change the theme!
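Here is a short script you could type into the source editor. It contains an object, a text string in quotes, and comments. The exact colors depend on the theme you choose. The last line is commented out, so R skips it.

```r
# This whole line is a comment: R ignores it
species <- c("sandgrouse", "raven", "falcon")   # text in quotes is a character string
counts <- c(5, 7, 3)                             # a numeric vector
names(counts) <- species                         # label each count with its species
counts
# counts <- c(0, 0, 0)   # commented out, so this line does not run
```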
2 R Console
> 3 + 2
[1] 5
3 Environment pane
Very useful: shows all your objects
Data frames, matrices, arrays and lists are listed under “Data”
4 tabs of interest
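If you run the lines below, the objects they create should appear in the Environment pane. `ls()` lists the same object names from the console. The object names are only examples.

```r
x <- 42                                   # a single value
birds <- data.frame(species = c("sandgrouse", "raven", "falcon"),
                    count   = c(5, 7, 3)) # a data frame
m <- matrix(1:6, nrow = 2)                # a matrix
l <- list(a = 1, b = "text")              # a list

ls()          # names of the objects in your workspace
str(birds)    # a compact summary of the data frame's structure
class(birds); class(m); class(l)
```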
4 Output pane
The Packages tab also has an Install button, from which you can install packages directly
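You can also install packages from the console with `install.packages()`. The helper `install_if_missing` below is hypothetical. It installs a package only if `requireNamespace()` cannot find it, then reports whether the package is available. Installing a package needs an internet connection.

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# Usage (needs an internet connection the first time):
# install_if_missing("ggplot2")
```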