
Stat and Prob

Statistics is the science of collecting, analyzing, and drawing conclusions from data. It involves descriptive statistics, which describe the properties of data; inferential statistics, which test hypotheses and draw conclusions; and probability, which quantifies the likelihood of events. Important distributions in statistics include the binomial and Poisson distributions, which model count data, and the normal distribution, which models continuous variables. Random sampling techniques are used to select representative samples from populations.
Copyright © All Rights Reserved

01 HANDOUT AND PPT

Statistics

• A science that studies data in order to make decisions
• It is a tool in the decision-making process
• It involves the methods of collecting, processing, summarizing, and analyzing data in order to
provide answers or solutions to an inquiry

FAMOUS STATISTICIANS
• Gertrude Cox
• William Sealy Gosset
• Florence Nightingale
• Ronald A. Fisher
• J. Stuart Hunter
• George E.P. Box
• Johann Carl Friedrich Gauss
• Thomas Bayes

Areas of Statistics
Descriptive Statistics
- Describes the properties of sample and population data
- Includes the mean (average), variance, skewness, and kurtosis
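The four descriptive measures named above can be computed directly from their definitions. This is a minimal Python sketch assuming the moment-based (population) definitions; the function name `describe` and the sample data are illustrative only, and statistical software often applies small-sample corrections, so its skewness and kurtosis values may differ slightly.

```python
import math
from statistics import mean, pvariance

def describe(data):
    """Population (moment-based) descriptive statistics for a list of numbers."""
    n = len(data)
    mu = mean(data)
    var = pvariance(data, mu)        # population variance: divide by n
    sd = math.sqrt(var)
    # Skewness: average cubed deviation, in units of sd**3 (0 for symmetric data)
    skew = sum((x - mu) ** 3 for x in data) / (n * sd ** 3)
    # Excess kurtosis: average fourth-power deviation / sd**4, minus 3 (0 for a normal curve)
    kurt = sum((x - mu) ** 4 for x in data) / (n * sd ** 4) - 3
    return {"mean": mu, "variance": var, "skewness": skew, "excess_kurtosis": kurt}

stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in stats.items():
    print(name, float(value))
# mean 5.0
# variance 4.0
# skewness 0.65625
# excess_kurtosis -0.21875
```

The positive skewness reflects the long right tail created by the values 7 and 9.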

Inferential Statistics
- Uses sample data to test hypotheses and draw conclusions about a population
- Includes linear regression analysis, analysis of variance (ANOVA), and null hypothesis testing
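As a first step in linear regression analysis, a straight line is fitted to paired data. The sketch below, in Python, fits y = intercept + slope·x by ordinary least squares; the function name and data are hypothetical. It covers only the fitting step: the inferential part of regression (for example, testing whether the slope differs from zero) is not shown.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products and sum of squares of x about their means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar   # the fitted line passes through (x_bar, y_bar)
    return slope, intercept

slope, intercept = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {intercept:.1f} + {slope:.1f}x")   # y = 2.2 + 0.6x
```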

Sources of Data
Primary Data – the researcher gathers the data him/herself
Secondary Data – the researcher uses data gathered by someone else

Data Science
- The center of data science is data, especially Big Data
- The purpose of data science is to obtain information or knowledge from data that helps people
make better decisions and better understand how nature and society develop and change
- Data science is a multidisciplinary field that applies theories and technologies from several
disciplines
R is a language and environment for statistical computing and graphics. It was created by Ross Ihaka
and Robert Gentleman and is closely modeled on the S language, which John Chambers and colleagues
developed at Bell Laboratories.
Python is an object-oriented, interpreted, and interactive programming language developed by Guido
van Rossum.
The SAS language is a programming language developed by Anthony James Barr as a statistical
analysis tool.

Probability
- A number that reflects the chance or likelihood that a particular event will occur
- Ranges from 0 (impossible) to 1 (certain), or from 0% to 100%

Interpretation of Probability
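Classical – all outcomes are equally likely, so P(A) = number of favorable outcomes / total number of outcomes
Frequentist – the long-run relative frequency of an event over many repetitions of an experiment
Subjective – a probability derived from an individual’s personal judgement or own experience
Bayesian – measures a degree of belief, which is updated as new evidence is observed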
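The classical and frequentist interpretations can be compared on a simple event. This Python sketch assumes a fair six-sided die and the event "roll an even number"; the rolls are simulated with a seeded pseudo-random generator, so the frequentist estimate only approaches the classical value as the number of trials grows. The subjective and Bayesian interpretations depend on personal judgement or prior beliefs and are not illustrated here.

```python
import random
from fractions import Fraction

# Event of interest: rolling an even number with a fair six-sided die (assumed)
sample_space = [1, 2, 3, 4, 5, 6]
event = [s for s in sample_space if s % 2 == 0]

# Classical: favorable outcomes / total outcomes, all outcomes equally likely
classical = Fraction(len(event), len(sample_space))

# Frequentist: relative frequency over many repeated (simulated) rolls
random.seed(1)
trials = 100_000
hits = sum(random.choice(sample_space) % 2 == 0 for _ in range(trials))
frequentist = hits / trials

print(classical)               # 1/2
print(round(frequentist, 3))   # a value close to 0.5
```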

Sample Space – the collection of all possible outcomes


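Tree Diagram – a diagram that organizes the possible outcomes of two or more events, with one branch for each outcome at each stage
Events – sets of outcomes from an experiment (subsets of the sample space)
• Union (A ∪ B) – combines the elements of the 2 sets; an outcome is in the union if it is in either set
• Intersection (A ∩ B) – the outcomes that are in BOTH sets
• Mutually Exclusive/Disjoint – 2 events have no elements in common (A ∩ B = ∅)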
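Python's built-in sets map directly onto these definitions. The sketch below assumes one roll of a fair die; the events A, B, and C are illustrative. The last line also checks the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B), which follows from the union and intersection definitions but is not stated in the handout.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3
C = {1, 3}      # event: the roll is 1 or 3

print(sorted(A | B))     # union A ∪ B: [2, 4, 5, 6]
print(sorted(A & B))     # intersection A ∩ B: [4, 6]
print(A.isdisjoint(C))   # True: A and C are mutually exclusive

def prob(event):
    """Classical probability, assuming all outcomes in S are equally likely."""
    return Fraction(len(event & S), len(S))

# Addition rule check: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
print(prob(A | B), prob(A) + prob(B) - prob(A & B))   # 2/3 2/3
```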
02 HANDOUT AND PPT
Random Variable – a variable that assigns a numerical value to each possible outcome of a random experiment

Types of Random Variable


Discrete – takes countable values, such as whole-number counts (0, 1, 2, …)
Continuous – takes any value within an interval, including decimals (e.g., height or time)

Discrete Probability Distribution – is a table, graph, or a formula listing all possible values that a
discrete random variable can take on, along with the associated probabilities
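A discrete probability distribution can be built by listing every outcome and counting. This Python sketch assumes a fair coin tossed twice, where each outcome corresponds to one path of a tree diagram, and defines the random variable X as the number of heads. The result is the distribution table, and its probabilities sum to 1 as any valid distribution must.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for tossing a fair coin twice: HH, HT, TH, TT
outcomes = list(product("HT", repeat=2))

# Discrete random variable X = number of heads in each outcome
counts = Counter(o.count("H") for o in outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
# 0 1/4
# 1 1/2
# 2 1/4

assert sum(distribution.values()) == 1   # probabilities of a valid distribution sum to 1
```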
03 HANDOUT AND PPT
Binomial Distribution

• In an experiment consisting of n trials, each trial has two (2) possible outcomes: success or failure.
• The probability of success, p, is the same on every trial.
• The trials are independent, meaning the result of one trial does not affect the result of the
next.
• The process is called a binomial experiment, and each trial that has two (2) possible
outcomes is called a Bernoulli trial.

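Binomial Distribution Formula

$$P(x) = \binom{n}{x} p^{x} q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}$$

where:
n = the number of (Bernoulli) trials
x = the number of successes (x = 0, 1, 2, …, n)
p = the probability of success on each trial
q = the probability of failure on each trial (q = 1 − p)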
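The binomial formula translates line for line into code. In this Python sketch, `binomial_pmf` is a hypothetical helper name; `math.comb(n, x)` supplies the n!/(x!(n − x)!) factor. The example assumes a fair coin (p = 0.5) tossed 5 times.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent Bernoulli trials."""
    q = 1 - p                                   # probability of failure
    return comb(n, x) * p ** x * q ** (n - x)   # nCx * p^x * q^(n - x)

# Probability of exactly 3 heads in 5 tosses of a fair coin
print(binomial_pmf(3, 5, 0.5))   # 0.3125
```

Here comb(5, 3) = 10 and 0.5^3 × 0.5^2 = 0.03125, giving 10 × 0.03125 = 0.3125.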

Poisson Distribution – counts the number of rare events or successes that occur in a specified time
interval or region

Poisson Distribution Formula

$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

where:
x = the number of successes (occurrences) of interest (x = 0, 1, 2, …)
e ≈ 2.71828, the base of the natural logarithm, also known as Euler’s number
λ = the average number of successes occurring in an interval
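The Poisson formula is just as direct to compute. This Python sketch uses a hypothetical helper `poisson_pmf`; the rate of 3 calls per hour in the example is an assumed value, not taken from the handout.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson count with average lam successes per interval."""
    return lam ** x * exp(-lam) / factorial(x)

# Hypothetical: a help desk averages 3 calls per hour.
# Probability of exactly 2 calls in a given hour:
print(round(poisson_pmf(2, 3), 4))   # 0.224
```

The calculation is 3² × e⁻³ / 2! = 4.5 × 0.0498 ≈ 0.224.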
04 NORMAL DISTRIBUTION
The Normal Curve
The most important of all continuous probability distributions is the normal distribution. Its graph, called
the normal curve, is a bell-shaped curve. It lies entirely above the horizontal axis. It is symmetrical,
unimodal, and asymptotic to the horizontal axis.

Properties of the Normal Curve


• The entire family of the normal probability distributions is differentiated by two (2) parameters: the
mean 𝜇 and the standard deviation 𝜎.
• The highest point on the normal curve is at the mean, which is also the median and mode of the
distribution.
• The mean of the distribution can be any numerical value: negative, zero, or positive.
• The normal distribution is symmetric, with the shape of the normal curve to the left of the mean a
mirror image of the shape of the normal curve to the right of the mean.
• The standard deviation determines how flat and wide the normal curve is: the larger 𝜎 is, the flatter
and wider the curve.
• The total area under the curve for the normal distribution is 1.
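The properties above can be checked numerically using the normal density formula f(x) = e^(−(x − 𝜇)²/(2𝜎²)) / (𝜎√(2π)), which the handout does not state. In this Python sketch, the parameters 𝜇 = 50 and 𝜎 = 10 are arbitrary illustrative choices, and the area is approximated with a midpoint-rule sum rather than an exact integral.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

mu, sigma = 50, 10   # illustrative parameters, not from the handout

# Symmetric: the height at mu - d equals the height at mu + d
print(math.isclose(normal_pdf(mu - 7, mu, sigma), normal_pdf(mu + 7, mu, sigma)))  # True

# The highest point is at the mean
print(normal_pdf(mu, mu, sigma) > normal_pdf(mu + 1, mu, sigma))  # True

# A larger standard deviation gives a lower, flatter peak
print(normal_pdf(mu, mu, 2 * sigma) < normal_pdf(mu, mu, sigma))  # True

# Total area is 1: midpoint-rule sum over mu +/- 10 sigma (the tails beyond are negligible)
steps = 20_000
width = 20 * sigma / steps
area = sum(normal_pdf(mu - 10 * sigma + (i + 0.5) * width, mu, sigma) * width
           for i in range(steps))
print(round(area, 6))  # 1.0
```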

The Empirical Rule


The empirical rule, also known as the three-sigma rule or the 68-95-99.7 rule, provides a quick estimate
of the spread of data in a normal distribution given the mean and standard deviation. For a distribution
that is symmetrical and bell-shaped (in particular, for a normal distribution):
• Approximately 68% of the data values will lie within 1 standard deviation on each side of the mean.
• Approximately 95% of the data values will lie within 2 standard deviations on each side of the
mean.
• Approximately 99.7% (or almost all) of the data values will lie within 3 standard deviations on each
side of the mean.
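The 68-95-99.7 percentages can be reproduced from the standard normal curve. This sketch uses `statistics.NormalDist` from Python's standard library (Python 3.8 or later). Because any normal distribution can be converted to the standard normal, the same three proportions hold for every value of 𝜇 and 𝜎.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mu = 0, sigma = 1
for k in (1, 2, 3):
    within = z.cdf(k) - z.cdf(-k)   # area between -k and +k standard deviations
    print(f"within {k} SD: {within:.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
# within 3 SD: 99.7%
```

The exact areas show why the rule says "approximately": the 2-SD figure is closer to 95.4% than to 95%.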

Formula for 𝒛-scores

$$z = \frac{x - \mu}{\sigma}$$

The 𝑧 value or 𝑧 score gives the number of standard deviations between a measurement 𝑥 and the
mean 𝜇 of the 𝑥 distribution.
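A z-score computation is a single line of arithmetic. In this Python sketch, `z_score` is a hypothetical helper, and the exam mean of 75 and standard deviation of 5 are assumed example values.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations between a measurement x and the mean mu."""
    return (x - mu) / sigma

# Hypothetical exam with mean 75 and standard deviation 5
print(z_score(85, 75, 5))   # 2.0, two standard deviations above the mean
print(z_score(70, 75, 5))   # -1.0, one standard deviation below the mean
```

A positive z lies above the mean and a negative z lies below it.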

Standard Normal Distribution


The standard normal distribution is a normal distribution with mean 𝜇=0 and standard deviation 𝜎=1.
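Areas under the standard normal curve are usually read from a z-table. As an alternative, they can be computed with the error function using the identity P(Z < z) = ½(1 + erf(z/√2)). The Python sketch below assumes this approach; `phi` is a hypothetical helper name, and `math.erf` is in Python's standard library.

```python
import math

def phi(z):
    """Area under the standard normal curve to the left of z, i.e. P(Z < z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(phi(0), 4))                # 0.5, half the area lies left of the mean
print(round(phi(1.96), 4))             # 0.975
print(round(phi(2.0) - phi(-1.0), 4))  # 0.8186, area between z = -1 and z = 2
```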
04 NORMAL DISTRIBUTION
Random Sampling is a method of selecting a sample (random sample) from a statistical population.

Types of Random Sampling


1. Simple Random Sampling is a sampling technique in which every element of the population
has the same probability of being selected for inclusion in the sample
2. Systematic Sampling is a random sampling technique in which a list of elements of the
population is used as a sampling frame, and the elements to be included in the desired sample
are selected by skipping through the list at regular intervals
3. Stratified Sampling is a random sampling technique in which the population is first divided into
strata and then samples are randomly selected separately from each stratum.
4. Cluster or Area Sampling is a random sampling technique in which the entire population is
broken into small groups, or clusters, and then some of the clusters are randomly selected.
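The four techniques can be sketched with Python's `random` module. The population of 100 student IDs, the grade-level strata, and the five sections used as clusters are all hypothetical. Two further assumptions apply: the stratified sampler takes an equal number from each stratum (proportional allocation is a common alternative), and the cluster sampler includes every member of each chosen cluster (one-stage cluster sampling).

```python
import random

def simple_random_sample(population, n, rng):
    # Every element has the same chance of being selected
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    # Random start, then skip through the list at a fixed interval k
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of members; sample each separately
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters; all members of a chosen cluster are included
    chosen = rng.sample(list(clusters), n_clusters)
    return [x for name in chosen for x in clusters[name]]

rng = random.Random(42)          # seeded so the demo is reproducible
students = list(range(1, 101))   # hypothetical population of 100 student IDs

print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 5, rng))

strata = {"Grade 11": students[:60], "Grade 12": students[60:]}
print(stratified_sample(strata, 3, rng))

sections = {f"Section {i + 1}": students[i * 20:(i + 1) * 20] for i in range(5)}
print(cluster_sample(sections, 2, rng))
```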

Parameter – is a measure that describes a population


Statistic – is a measure that describes a sample
Sampling Distribution (of the sample mean) – describes the probability of each possible value of the sample
mean over all samples of the same sample size n
Central Limit Theorem – If samples of size 𝑛, where 𝑛 is sufficiently large, are drawn from any
population with a mean 𝜇 and a standard deviation 𝜎, then the sampling distribution of sample means
approximates a normal distribution with mean 𝜇 and standard deviation 𝜎/√𝑛.
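A small simulation makes the sampling distribution and the Central Limit Theorem concrete. This Python sketch assumes a strongly skewed, clearly non-normal population (exponential with 𝜇 = 1 and 𝜎 = 1), a sample size of n = 40, and 5,000 repeated samples. All of these values are illustrative choices, and the results vary slightly with the random seed.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)   # seeded for reproducibility

# Assumed population: exponential with mean 1 and standard deviation 1 (right-skewed)
mu, sigma = 1.0, 1.0
n = 40          # size of each sample
reps = 5_000    # number of samples drawn

# Approximate sampling distribution of the sample mean
sample_means = [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

print(round(fmean(sample_means), 3))    # close to mu = 1
print(round(pstdev(sample_means), 3))   # close to sigma / sqrt(n), about 0.158

# If the sample means are approximately normal, about 95% lie within 1.96 * sigma / sqrt(n) of mu
se = sigma / n ** 0.5
share = sum(abs(m - mu) < 1.96 * se for m in sample_means) / reps
print(round(share, 3))                  # close to 0.95
```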
