lec # 2
Descriptive Statistics
• Descriptive statistics summarizes or describes the
characteristics of a data set.
• Descriptive statistics consists of three basic categories
of measures: measures of central tendency, measures of
variability (or spread), and frequency distribution.
• Measures of central tendency describe the center of the
data set (mean, median, mode).
• Measures of variability describe the dispersion of the
data set (variance, standard deviation). Skewness and
kurtosis are often listed with them, although strictly they
describe the shape of the distribution.
• Measures of frequency distribution describe how often
each value occurs within the data set (count).
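As a minimal sketch, the measures listed above can be computed with Python's standard `statistics` module. The data set is made up for illustration. Skewness and kurtosis are computed with the simple population (moment) formulas, which are only one of several conventions in use.

```python
import statistics
from collections import Counter

# Hypothetical data set, used only for illustration
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # middle value (average of the two middle values here)
mode = statistics.mode(data)      # most frequent value

# Measures of variability (population versions)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Shape: moment-based skewness and kurtosis (population formulas)
n = len(data)
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
m4 = sum((x - mean) ** 4 for x in data) / n
skewness = m3 / m2 ** 1.5
kurtosis = m4 / m2 ** 2  # a normal distribution has kurtosis 3 under this convention

# Frequency distribution: count of each distinct value
counts = Counter(data)

print(mean, median, mode, variance, std_dev)
print(round(skewness, 3), round(kurtosis, 3))
print(sorted(counts.items()))
```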
Types of Variables
• A variable is a characteristic that can be measured and
that can assume different values.
▫ Height, age, income, province or country of birth,
grades obtained at school and type of housing are all
examples of variables.
• Variables may be classified into two main categories:
Categorical and Numeric.
• Each category is then classified into two subcategories:
nominal or ordinal for categorical variables, discrete
or continuous for numeric variables.
• Categorical variables
A categorical variable (also called qualitative
variable) refers to a characteristic that cannot be
quantified. Categorical variables can be either
nominal or ordinal.
1) A nominal variable describes a name, label or
category with no natural order. Sex and type of
dwelling are examples of nominal variables.
2) Ordinal variables
An ordinal variable is a variable whose values are defined
by an order relation between the different categories.
In Table 1, the variable “behaviour” is ordinal because the
category “Excellent” is better than the category “Very
good,” which is better than the category “Good,” etc.
Table 1
Behaviour      Number of students
Excellent       5
Very good      12
Good           10
Bad             2
Very bad        1
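Because the categories of an ordinal variable can be ranked, an order-based summary such as the median category is meaningful, while an arithmetic mean of the labels is not. The sketch below uses the counts from Table 1 and assumes the categories are ordered from best to worst as listed. The helper `median_category` is a hypothetical name chosen for illustration.

```python
# Categories of the ordinal variable "behaviour", ordered best to worst,
# with the number of students from Table 1
table = [
    ("Excellent", 5),
    ("Very good", 12),
    ("Good", 10),
    ("Bad", 2),
    ("Very bad", 1),
]

def median_category(freq_table):
    """Return the category that contains the median observation(s)."""
    total = sum(count for _, count in freq_table)
    # 1-based positions of the middle observation(s); equal when total is odd
    lower = (total + 1) // 2
    upper = total // 2 + 1
    cumulative = 0
    lower_cat = upper_cat = None
    for category, count in freq_table:
        cumulative += count
        if lower_cat is None and cumulative >= lower:
            lower_cat = category
        if upper_cat is None and cumulative >= upper:
            upper_cat = category
    # If the two middle positions fall in different categories, report both
    return lower_cat if lower_cat == upper_cat else (lower_cat, upper_cat)

print(sum(count for _, count in table))  # total number of students
print(median_category(table))
```

With 30 students, the 15th and 16th observations both fall in the cumulative range of "Very good" (positions 6 to 17), so that is the median behaviour category.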
• A Numeric Variable
(also called quantitative variable) is a quantifiable
characteristic whose values are numbers (except
numbers that are codes standing for categories).
Numeric variables may be either continuous or discrete.
Examples include height, weight, the number of cars in the
parking lot and the number of students in the class.
1) Continuous Variable
A variable is said to be continuous if it can assume an infinite
number of real values within a given interval. For instance,
consider the height of a student.
2) Discrete Variable
A variable is said to be discrete if it can take on only a finite
(or countable) number of values in a given interval. For instance,
the number of children in a family.
Types of Probability Distributions
• There are two types of probability distributions: discrete and continuous.
Types of Discrete Probability Distributions
– Binomial Distribution
▫ This distribution arises from trials with only two possible
outcomes. Let p denote the probability that a trial is a success,
so 1 – p is the probability that it is a failure. If the trial is
repeated n times independently, the number of successes follows
the Binomial distribution.
▫ The most common example given for Binomial distribution is
that of flipping a coin n times and calculating the
probabilities of getting a particular number of heads.
▫ More real-world examples include the number of successful
sales calls for a company or the number of patients for whom
a drug works.
▫ The binomial distribution is used in options pricing models
that rely on binomial trees.
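The probability of exactly k successes in n independent trials is P(X = k) = C(n, k) p^k (1 – p)^(n – k). A minimal Python sketch of the coin-flipping example, assuming a fair coin (p = 0.5) and n = 10 flips, values chosen only for illustration:

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Flipping a fair coin 10 times: probability of each possible number of heads
n, p = 10, 0.5
dist = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(round(binomial_pmf(5, n, p), 4))  # exactly 5 heads
print(round(sum(dist), 10))             # the probabilities sum to 1
```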
– Poisson Distribution
▫ This distribution describes the number of events that occur
in a fixed interval of time or space.
▫ An example might make this clear. Consider the case of the
number of calls received by a customer care center per hour.
▫ We can estimate the average number of calls per hour but we
cannot determine the exact number and the exact time at
which there is a call.
▫ Each occurrence of an event is independent of the other
occurrences.
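If events occur independently at an average rate λ per interval, the probability of exactly k events in one interval is P(X = k) = λ^k e^(–λ) / k!. A minimal sketch for the call-center example, assuming a hypothetical average of 4 calls per hour (the source gives no rate):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) events in one interval when the average count is lam."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical average: 4 calls per hour at the customer care center
lam = 4.0

p_no_calls = poisson_pmf(0, lam)                          # P(X = 0)
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # P(X <= 2)

print(round(p_no_calls, 4), round(p_at_most_2, 4))
```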
– Geometric Distribution
▫ The geometric distribution is the probability
distribution used to represent the chances of
experiencing a certain number of failures before
encountering the first success of an event.
▫ The trials are Bernoulli trials, i.e., each trial has success
and failure as the only two possible outcomes, and the trials
are independent with the same probability of success p.
▫ Applications of the geometric distribution include:
1. Cost-Benefit Analysis
2. Sports Applications
3. Tossing a Coin
4. Feedback from Customers
5. Number of Supporters of a Law
6. Number of Faulty Products Manufactured in a Factory
7. Number of Bugs in Code
8. A Teacher Examining Test Records
9. Playing a Game
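Counting the failures before the first success, as defined above, gives P(X = k) = (1 – p)^k p, with mean (1 – p)/p. (Some texts instead count the trials up to and including the first success, which shifts k by one.) A minimal sketch, assuming a hypothetical probability p = 0.2 that an inspected product is faulty, where "success" means finding the first faulty product:

```python
def geometric_pmf(k, p):
    """P(exactly k failures before the first success), success probability p."""
    return (1 - p) ** k * p

# Hypothetical: each inspected product is faulty with probability 0.2
p = 0.2

print(round(geometric_pmf(3, p), 4))  # three good products, then a faulty one

# Mean number of failures before the first success is (1 - p) / p;
# check it numerically with a long truncated sum
approx_mean = sum(k * geometric_pmf(k, p) for k in range(1000))
print(round(approx_mean, 6), (1 - p) / p)
```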
Types of Continuous Probability Distributions
– Normal Distribution
• The normal distribution (sometimes referred to as the Gaussian
distribution) is the most common continuous distribution used in
statistics.
• The normal distribution is vitally important in statistics for three main
reasons:
1) Numerous continuous variables common in business have distributions that
closely resemble the normal distribution.
2) The normal distribution can be used to approximate various discrete
probability distributions.
3) The normal distribution provides the basis for classical statistical inference
because of its relationship to the central limit theorem.
Example: The final examination grades in an introductory statistics
course are normally distributed, with a mean of 73 and a standard
deviation of 8.
a. What is the probability that a student scored below 91 on this
exam?
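Part (a) can be answered by standardizing the score, z = (91 - 73) / 8 = 2.25, and reading the cumulative probability Φ(2.25), which is about 0.9878. A minimal sketch using Python's `statistics.NormalDist` (available from Python 3.8):

```python
from statistics import NormalDist

# Final examination grades: normal with mean 73 and standard deviation 8
grades = NormalDist(mu=73, sigma=8)

# Standardize the score: z = (x - mu) / sigma
z = (91 - grades.mean) / grades.stdev
p_below_91 = grades.cdf(91)  # P(X < 91) = Phi(z)

print(z)  # 2.25
print(round(p_below_91, 4))
```

So roughly 98.8% of students scored below 91.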
– Exponential Distribution
• The exponential distribution is
a continuous probability distribution that models how long
we must wait until a particular event occurs.
• It assumes that events occur continuously, independently,
and at a constant average rate during this process.
• The crucial characteristic of the exponential
distribution is that it is memoryless: the probability of
waiting a further time t does not depend on how long we
have already waited.
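For an exponential waiting time T with rate λ, P(T > t) = e^(–λt), and the memoryless property states that P(T > s + t | T > s) = P(T > t). A minimal numerical check, assuming a hypothetical rate of 0.5 events per minute (a mean wait of 2 minutes):

```python
from math import exp

def exp_survival(t, rate):
    """P(T > t) for an exponential waiting time with the given rate."""
    return exp(-rate * t)

# Hypothetical rate: on average 0.5 events per minute (mean wait 2 minutes)
rate = 0.5
s, t = 3.0, 2.0  # already waited s minutes; ask about t more minutes

# Memoryless property: P(T > s + t | T > s) equals P(T > t)
conditional = exp_survival(s + t, rate) / exp_survival(s, rate)
unconditional = exp_survival(t, rate)

print(round(conditional, 6), round(unconditional, 6))
```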
• The exponential distribution is used to model problems in
fields such as the following:
• Determining the distance between mutations on a DNA strand.
• Figuring out how long it will take a radioactive particle to
decay.
• Determining the heights of molecules in a gas at a constant
temperature and pressure in a uniform gravitational field.
• Computing the highest monthly and annual amounts of
rainfall and river outflow volumes.
Inferential Statistics
• Most of the time, you can only acquire data from
samples, because it is too difficult or expensive to
collect data from the whole population that you’re
interested in.
• While descriptive statistics can only summarize a
sample’s characteristics, inferential statistics use your
sample to make reasonable guesses about the larger
population.
• With inferential statistics, it’s important to use
random and unbiased sampling methods. If your
sample isn’t representative of your population, then
you can’t make valid statistical inferences or
generalize.
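The idea of using a sample to make a reasonable guess about a population can be sketched by drawing a simple random sample from a made-up population and comparing the sample mean with the population mean. The population (uniform ages from 18 to 80), the sample size of 500 and the seed are all hypothetical choices; the standard error s / √n indicates roughly how far the estimate is likely to be from the true mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 ages between 18 and 80
population = [random.randint(18, 80) for _ in range(100_000)]

# Simple random sample without replacement
sample = random.sample(population, 500)

# Estimate the population mean from the sample, with its standard error
estimate = statistics.fmean(sample)
std_error = statistics.stdev(sample) / len(sample) ** 0.5

print(round(estimate, 2), round(std_error, 2))
print(round(statistics.fmean(population), 2))  # normally unknown in practice
```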
Types of Inferential Statistics
• The types of inferential statistics include the
following:
• Regression Analysis: this includes linear
regression, nominal regression, ordinal regression,
etc.
• Hypothesis tests: these include the Z-test, F-test,
t-test, Analysis of Variance (ANOVA), etc.
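As an illustration of the simplest of these tests, a one-sample two-sided Z-test computes z = (x̄ – μ0) / (σ / √n) and the p-value 2(1 – Φ(|z|)). The sketch below assumes the population standard deviation σ is known, as the Z-test requires. The numbers (H0: μ = 50, σ = 10, n = 36, x̄ = 53.5, α = 0.05) and the helper `one_sample_z_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: population mean == mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: H0 says mu = 50; a sample of 36 has mean 53.5,
# and the population standard deviation is assumed known to be 10
z, p = one_sample_z_test(53.5, 50, 10, 36)

alpha = 0.05
print(round(z, 2), round(p, 4), "reject H0" if p < alpha else "fail to reject H0")
```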
Regression Analysis
• Regression analysis is a common statistical method used in
finance and investing.
• Linear regression is one of the most common techniques of
regression analysis when there are only two variables.
• Multiple regression is a broader class of regressions that
encompasses linear and nonlinear regressions with
multiple explanatory variables.
• Whereas simple linear regression has only one independent
variable, which determines the slope of the fitted line, multiple
regression incorporates multiple independent variables.
• Each independent variable in multiple regression has its
own coefficient to ensure each variable is weighted
appropriately.
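To show how each independent variable receives its own coefficient, here is a minimal ordinary least squares sketch in pure Python that solves the normal equations (XᵀX)b = Xᵀy. The helpers `solve` and `fit_multiple_regression` are hypothetical names. The data are made up and generated exactly from y = 1 + 2x₁ + 3x₂, so the fit should recover those coefficients. The normal-equation approach is fine for tiny examples but is less numerically stable than the QR-based solvers used by statistical libraries. Simple linear regression is the special case with one column of x values.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot into place
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from the rows below
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_multiple_regression(xs, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    rows = [[1.0] + list(x) for x in xs]  # design matrix with an intercept column
    k = len(rows[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
xs = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 4)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

b0, b1, b2 = fit_multiple_regression(xs, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```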
Regression Analysis
Case Example
• Consider an analyst who wishes to establish a relationship
between the daily change in a company's stock prices and
the daily change in trading volume.
• Using linear regression, the analyst can attempt to
determine the relationship between the two variables.
• However, the analyst realizes there are several other factors
to consider including the company's P/E ratio, dividends,
and prevailing inflation rate.
• The analyst can perform multiple regression to determine
which of these variables affect the stock price, and how
strongly each one does.
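The two models in this case example are usually written in the standard forms below. The symbols are assumptions for this sketch: Y is the daily change in the stock price, X is the daily change in trading volume in the simple model, X_1, ..., X_t are the explanatory variables (such as trading volume, P/E ratio, dividends and inflation rate) in the multiple model, a is the intercept, b and b_1, ..., b_t are the slope coefficients, and u is the error term.

```latex
\begin{aligned}
\text{Linear regression:}\quad   & Y = a + bX + u \\
\text{Multiple regression:}\quad & Y = a + b_1 X_1 + b_2 X_2 + \cdots + b_t X_t + u
\end{aligned}
```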
Difference between Regression and Correlation
• The most commonly used techniques for investigating the
relationship between two quantitative variables are
correlation and linear regression.
• Correlation quantifies the strength of the linear relationship
between a pair of variables, whereas regression expresses
the relationship in the form of an equation.
• For example, in patients attending an accident and
emergency unit (A&E), we could use correlation and
regression to determine whether there is a relationship
between age and urea level, and whether the level of urea
can be predicted for a given age.