Chapter 1_Introduction to Principles of Statistics

PRINCIPLES OF STATISTICS

FOR ECONOMICS

(TOAE301)

Nguyen Thu Hang

Assessment

 Attendance: 10%
 Mid-term test: 30%
 Final exam: 60%
Course outline
 Chapter 1: Introduction to Statistics
 Chapter 2: Summarizing Data
 Chapter 3: Numerical Descriptive Techniques
 Chapter 4: Sampling Distributions
 Chapter 5: Inferences Based on Two Samples:
Confidence Intervals and Tests of Hypothesis
 Chapter 6: ANOVA Analysis
 Chapter 7: Regression Analysis
 Chapter 8: Time series analysis
Textbooks

 Gerald Keller (2018), Statistics for Management and Economics, Cengage Learning
 James T. McClave, P. George Benson & Terry Sincich (2018), Statistics for Business and Economics, Pearson Education
 Levine, Stephan, Krehbiel & Berenson (2017), Statistics for Managers Using Microsoft Excel, 8e, Pearson Prentice-Hall, Inc.
Chapter 1

Introduction to Statistics
Learning Objectives
In this chapter you learn:
 1. Statistics Definition and Objectives
 2. Statistical Concepts
 3. Types of data and variable
measurements
 4. Statistical Analysis Process
 5. Sources of Data
 6. Questionnaire design
Business Statistics Marks
 A student enrolled in a business program is attending the first class of the required statistics course. The student is somewhat apprehensive because he believes the myth that the course is difficult. To alleviate his anxiety, the student asks the professor about last year’s marks. The professor obliges and provides a list of the final marks, which is composed of term work plus the final exam. What information can the student obtain from the list?
Case: Pepsi’s Agreement
1. What is Statistics?

1. Collecting Data
e.g., Survey
2. Presenting Data
e.g., Charts & Tables
3. Characterizing Data
e.g., Average

Why? Data Analysis -> Decision-Making
1. What is statistics?

 A branch of mathematics taking and transforming numbers into useful information for decision makers. Statistics is a way to get information from data.
 Methods for processing & analyzing numbers
 Methods for helping reduce the uncertainty
inherent in decision making.
1. What Is Statistics?
Statistics is the science of data.
It involves
collecting,
classifying,
summarizing,
organizing,
analyzing,
interpreting
numerical information.
Application Areas

 Economics
   Forecasting
   Demographics
 Sports
   Individual & Team Performance
 Engineering
   Construction
   Materials
 Business
   Consumer Preferences
   Financial Trends
Objectives of Statistics

Decision Makers Use Statistics To:


 Present and describe business data and information
properly.
 Draw conclusions about large groups of individuals or
items, using information collected from subsets of the
individuals or items.
 Make reliable forecasts about a business activity.
 Improve business/production processes.
 Improve product quality.
 Manage risk.
Statistics: Two Processes

A. Describing sets of data

B. Drawing conclusions (making estimates, decisions, predictions, etc.) about sets of data based on sampling
Types of Statistics

 Statistics
 The branch of mathematics that transforms data into
useful information for decision makers.

Descriptive Statistics: Collecting, summarizing, and describing data

Inferential Statistics: Drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics

 Collect data
 e.g., Survey
 Present data
 e.g., Tables and graphs
 Characterize data

e.g., Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
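As a minimal sketch of the sample mean formula, the snippet below adds up the observations and divides by n. The marks are made-up illustrative values, not data from the course.

```python
from statistics import mean

# Hypothetical final marks for a small sample of students (illustrative only).
marks = [65, 71, 92, 77, 85]

n = len(marks)
sample_mean = sum(marks) / n  # X-bar = (sum of X_i) / n

# The standard library gives the same value.
print(sample_mean, mean(marks))
```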
Descriptive Statistics

Descriptive statistics
utilizes numerical and graphical methods to
explore data,
i.e., to look for patterns in a data set,
to summarize the information revealed in a
data set,
to present the information in a convenient
form.
Inferential Statistics
 Estimation
   e.g., Estimate the population mean weight using the sample mean weight
 Hypothesis testing
   e.g., Test the claim that the population mean weight is 120 pounds

Drawing conclusions about a large group of individuals based on a subset of the large group.
Inferential Statistics

 Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger set of data.
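The two inferential tasks named on these slides, estimation and hypothesis testing, can be sketched in a few lines. The weights below are hypothetical, and the interval uses a normal critical value as a simplifying assumption; for a sample this small a t critical value would be the more usual choice.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of weights in pounds (illustrative values only).
weights = [118, 125, 121, 130, 117, 122, 126, 119, 124, 128]

n = len(weights)
x_bar = mean(weights)           # point estimate of the population mean weight
se = stdev(weights) / sqrt(n)   # estimated standard error of the sample mean

# Estimation: approximate 95% confidence interval (normal approximation).
z = NormalDist().inv_cdf(0.975)
ci = (x_bar - z * se, x_bar + z * se)

# Hypothesis testing: test statistic for the claim that the mean is 120 pounds.
test_stat = (x_bar - 120) / se

print(f"estimate = {x_bar}, CI = ({ci[0]:.1f}, {ci[1]:.1f}), test statistic = {test_stat:.2f}")
```

A test statistic far from zero suggests the sample is hard to reconcile with the claimed value of 120 pounds.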
Example- Inferential statistics
2. Statistical Concepts
Experimental unit
Object upon which we collect data
Population
The totality of objects under consideration
Variable
Characteristic of an individual experimental unit
Measurement
The process we use to assign numbers to variables of individual population units
Sample
Subset of the units of a population that is selected for analysis

• P in Population & Parameter
• S in Sample & Statistic
Measurement
 Numerical representations are not often readily available
for some variables, so the process of measurement
plays an important supporting role in statistical studies.
 Measurement is the process we use to assign numbers
to variables of individual population units.
 Measure the preference for a food product by asking a
consumer to rate the product’s taste on a scale from 1 to
10.
 Measure workforce age by simply asking each worker,
“How old are you?”.
 Measure gender by assigning 0 to female and 1 to male.
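A small sketch of measurement as "assigning numbers to variables" follows. The three respondents and the field names are hypothetical; only the 1 to 10 rating scale and the 0/1 gender coding come from the slide.

```python
# Hypothetical survey answers from three workers (illustrative only).
responses = [
    {"taste_rating": 8, "age": 34, "gender": "female"},
    {"taste_rating": 5, "age": 51, "gender": "male"},
    {"taste_rating": 9, "age": 27, "gender": "female"},
]

GENDER_CODES = {"female": 0, "male": 1}  # coding described on the slide

# Each unit is now represented by numbers only.
measurements = [
    (r["taste_rating"], r["age"], GENDER_CODES[r["gender"]]) for r in responses
]
print(measurements)
```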
2. Statistical Concepts
 Data
   facts or information that is relevant or appropriate to a decision maker
 Parameter
   a summary measure (e.g., mean) that is computed to describe a characteristic of the population
 Statistic
   a summary measure (e.g., mean) that is computed to describe a characteristic of the sample
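To make the parameter/statistic distinction concrete, the sketch below simulates a hypothetical population of ages, computes its mean (a parameter), and then computes the mean of a random sample of 200 (a statistic). The population is simulated purely for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 viewers (simulated, not real data).
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: summary measure computed from the whole population.
population_mean = mean(population)

# Statistic: the same summary measure computed from a sample.
sample = random.sample(population, 200)
sample_mean = mean(sample)

print(population_mean, sample_mean)
```

In practice the population mean is usually unknown, which is why the sample mean is used to estimate it.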


Example

 According to a report in the Washington Post (Sep. 5, 2014), the average age of viewers of television programs broadcast on CBS, NBC, and ABC is 54 years. Suppose a rival network (e.g., FOX) executive hypothesizes that the average age of FOX viewers is less than 54. To test her hypothesis, she samples 200 FOX viewers and determines the age of each.
 a. Describe the population. FOX viewers
 b. Describe the variable of interest. Age
 c. Describe the sample. 200 FOX viewers
 d. Describe the inference.
2. Statistical Concepts
 Measure of Reliability
• Statement (usually qualified) about the degree of
uncertainty associated with a statistical inference
Four Elements of Descriptive
Statistical Problems
1. The population or sample of interest
2. One or more variables (characteristics of the
population or sample units) that are to be
investigated
3. Tables, graphs, or numerical summary tools
4. Identification of patterns in the data
Five Elements of Inferential
Statistical Problems
1. The population of interest
2. One or more variables (characteristics of the
population units) that are to be investigated
3. The sample of population units
4. The inference about the population based on
information contained in the sample
5. A measure of reliability for the inference
Example

Population: Cola consumers
Variable: Taste (binary assignment for preference)
Sample: 1000 cola consumers
Inference: Proportion of those who prefer Pepsi
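A rough sketch of the inference in this example: estimate the proportion who prefer Pepsi from the 1000 sampled consumers and attach a measure of reliability. The count of 560 is a made-up value, and the interval relies on the usual large-sample normal approximation.

```python
from math import sqrt
from statistics import NormalDist

n = 1000            # sample size from the example
prefer_pepsi = 560  # hypothetical number of consumers coded 1 (prefer Pepsi)

p_hat = prefer_pepsi / n             # sample proportion
se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
z = NormalDist().inv_cdf(0.975)      # critical value for 95% confidence
ci = (p_hat - z * se, p_hat + z * se)

print(f"p_hat = {p_hat:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```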
Process

A process is a series of actions or operations that transforms inputs to outputs. A process produces or generates output over time.
Process

A process whose operations or actions are unknown or unspecified is called a black box.

Any set of output (objects or numbers) produced by a process is called a sample.
Example
 A particular fast-food restaurant chain has 6,289 outlets with
drive-through windows. To attract more customers to its
drive-through services, the company is considering offering
a 50% discount to customers who wait more than a
specified number of minutes to receive their order. To help
determine what the time limit should be, the company
decided to estimate the average waiting time at a particular
drive-through window in Dallas, Texas. For 7 consecutive
days, the worker taking customers’ orders recorded the time
that every order was placed. The worker who handed the
order to the customer recorded the time of delivery. In both
cases, workers used synchronized digital clocks that
reported the time to the nearest second. At the end of the 7-
day period, 2,109 orders had been timed.
Example (cont)
 a. Describe the process of interest at the Dallas
restaurant. Take order -> Prep food -> Deliver order
 b. Describe the variable of interest. Waiting time = Time of delivery – Time of order
 c. Describe the sample. 2109 orders
 d. Describe the inference of interest. 95% confident that the waiting
time is between …
 e. Describe how the reliability of the inference could be
measured.
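The variable of interest here, waiting time = time of delivery minus time of order, can be computed directly from the two recorded clock times. The three order records below are hypothetical; the real study timed 2,109 orders.

```python
from datetime import datetime
from statistics import mean

# Hypothetical (order time, delivery time) pairs, recorded to the nearest second.
orders = [
    ("12:01:05", "12:04:35"),
    ("12:03:40", "12:06:10"),
    ("12:07:15", "12:12:15"),
]

def waiting_seconds(order_time: str, delivery_time: str) -> float:
    """Waiting time = time of delivery - time of order, in seconds."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(delivery_time, fmt) - datetime.strptime(order_time, fmt)
    return delta.total_seconds()

waits = [waiting_seconds(order, delivery) for order, delivery in orders]
average_wait = mean(waits)  # sample estimate of the average waiting time

print(waits, average_wait)
```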
A manufacturer of industrial wheels suspects that profitable
orders are being lost because of the long time the firm takes
to develop price quotes for potential customers. To
investigate this possibility, 50 requests for price quotes were
randomly selected from the set of all quotes made last year,
and the processing time (in days) was determined for each
quote. Each quote was classified according to whether the
order was “lost” or not (i.e., whether or not the customer
placed an order after receiving a price quote).
A. Describe the process studied.
B. Describe the variables of interest.
C. Describe the measurement of the variables.
D. Describe the population and sample.


3. Types of Data and variable
measurements

Quantitative data are measurements that are recorded on a naturally occurring numerical scale.

Qualitative data are measurements that cannot be measured on a natural numerical scale; they can only be classified into one of a group of categories.
3. Types of Data

Types of Data:
 Quantitative Data
 Qualitative Data
Quantitative Data
Measured on a numeric scale.
 Number of defective items in a lot.
 Salaries of CEOs of oil companies.
 Ages of employees at a company.
Qualitative Data
Classified into categories.
 College major of each student in a class.
 Gender of each employee at a company.
 Method of payment (cash, check, credit card).
Example
 Chemical and manufacturing plants sometimes
discharge toxic-waste materials such as DDT
into nearby rivers and streams. These toxins
can adversely affect the plants and animals
inhabiting the river and the riverbank. The U.S.
Army Corps of Engineers conducted a study of
fish in the Tennessee River (in Alabama) and its
three tributary creeks: Flint Creek, Limestone
Creek, and Spring Creek. A total of 144 fish
were captured, and the following variables were
measured for each: (continued on next slide)
Example (cont)
 1. River/creek where each fish was captured
 2. Species (channel catfish, largemouth bass,
or smallmouth buffalo fish)
 3. Length (centimeters)
 4. Weight (grams)
 5. DDT concentration (parts per million)

 These data are saved in the DDT file. Classify each of the five variables measured as quantitative or qualitative.
Types of Variables

 Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.”

 Numerical (quantitative) variables have values that represent quantities.
Types of Variables
Data
 Categorical (defined categories)
   Examples: Marital Status, Political Party, Eye Color
 Numerical
   Discrete (counted items)
     Examples: Number of Children, Defects per hour
   Continuous (measured characteristics)
     Examples: Weight, Voltage
Levels of Measurement

 A nominal scale classifies data into distinct categories in which no ranking is implied.

Categorical variable: categories
Personal Computer Ownership: Yes / No
Type of Stocks Owned: Growth / Value / Other
Internet Provider: Microsoft Network / AOL / Other


Levels of Measurement

 An ordinal scale classifies data into distinct categories in which ranking is implied.

Categorical variable: ordered categories
Student class designation: Freshman, Sophomore, Junior, Senior
Product satisfaction: Satisfied, Neutral, Unsatisfied
Faculty rank: Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings: AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades: A, B, C, D, F
Levels of Measurement
 An interval scale is an ordered scale in which the
difference between measurements is a meaningful
quantity but the measurements do not have a true zero
point.

 A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.
Interval and Ratio Scales
Difference between interval and
ordinal scales

 The critical difference between them is that the intervals or differences between values of interval data are consistent and meaningful (which is why this type of data is called interval).
 For example, the difference between marks of 85
and 80 is the same five-mark difference that
exists between 75 and 70—that is, we can
calculate the difference and interpret the results.
Difference between interval and
ordinal scales

 Because the codes representing ordinal data are arbitrarily assigned except for the order, we cannot calculate and interpret differences.
 Using a 1-2-3-4-5 coding system to represent poor, fair,
good, very good, and excellent, we note that the
difference between excellent and very good is identical
to the difference between good and fair. With a 6-18-23-
45-88 coding, the difference between excellent and very
good is 43, and the difference between good and fair is
5.
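The coding argument above can be checked with a short sketch. Both codings are the ones given on the slide; the helper name `diff` is introduced here only for illustration.

```python
ratings = ["poor", "fair", "good", "very good", "excellent"]

# Two codings that preserve the same order.
coding_a = dict(zip(ratings, [1, 2, 3, 4, 5]))
coding_b = dict(zip(ratings, [6, 18, 23, 45, 88]))

def diff(coding, high, low):
    """Difference between the codes of two ordinal categories."""
    return coding[high] - coding[low]

for coding in (coding_a, coding_b):
    # The ranking of the categories is the same under both codings...
    assert sorted(ratings, key=coding.get) == ratings
    # ...but the differences between codes depend on the arbitrary coding.
    print(diff(coding, "excellent", "very good"), diff(coding, "good", "fair"))
```

Because the differences change with the coding while the order does not, only the order carries information for ordinal data.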
4. Statistical Analysis Process

 Identify research goals
 Identify variables of interest and measuring methods
 Data collection
 Data summarization
 Data analysis
 Forecasting
 Decision making
The role of statistics in business
analytics

Source: From The American Statistician by George
Discussion
 Monitoring product quality. The Wallace Company of Houston is a
distributor of pipes, valves, and fittings to the refining, chemical, and
petrochemical industries. The company was a recent winner of the
Malcolm Baldrige National Quality Award. One of the steps the company
takes to monitor the quality of its distribution process is to send out a
survey twice a year to a subset of its current customers, asking the
customers to rate the speed of deliveries, the accuracy of invoices, and
the quality of the packaging of the products they have received from
Wallace.
a. Describe the process studied.
b. Describe the variables of interest.
c. Describe the sample.
d. Describe the inferences of interest.
e. What are some of the factors that are likely to affect the reliability of the
inferences?
5. Sources of Data

1. Data from a published source
2. Data from a designed experiment
3. Data from an observational study
5. Sources of Data

 Primary Sources: The data collector is the one using the data
for analysis
 Data from a political survey
 Data collected from an experiment
 Observed data
 Secondary Sources: The person performing data analysis is
not the data collector
 Analyzing census data
 Examining data from print journals or data published on the internet.
5. Sources of Data
Published source:
book, journal, newspaper, Web site (https://www.wider.unu.edu/data, https://data.worldbank.org/)
Designed experiment:
researcher exerts strict control over the units
Survey:
a group of people are surveyed and their responses are recorded
Observational study:
units are observed in their natural setting and variables of interest are recorded
Designed Experiment

 A designed experiment is a data-collection method where the researcher exerts full control over the characteristics of the experimental units sampled. These experiments typically involve a group of experimental units that are assigned the treatment and an untreated (or control) group.
Observational Study

 An observational study is a data-collection method where the experimental units sampled are observed in their natural setting. No attempt is made to control the characteristics of the experimental units sampled. (Examples include opinion polls and surveys.)
Samples
A representative sample exhibits characteristics
typical of those possessed by the population of
interest.

A simple random sample of n experimental units is a sample selected from the population in such a way that every different sample of size n has an equal chance of selection.
Random Number Generators

Most researchers rely on random number generators to automatically generate the random sample.
Random number generators are available in table form, and they are built into most statistical software packages.
Example

 Suppose you wish to assess the feasibility of building a new high school. As part of your study, you would like to gauge the opinions of people living close to the proposed building site. The neighborhood adjacent to the site has 711 homes. Use a random number generator to select a simple random sample of 20 households from the neighborhood to participate in the study.
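One possible way to carry out this selection is with Python's built-in random number generator. The sketch assumes the 711 homes have been numbered 1 to 711; the numbering itself is an assumption, since the example does not say how homes are listed.

```python
import random

N_HOMES = 711      # homes in the neighborhood
SAMPLE_SIZE = 20   # households to select

# Number the homes 1..711, then draw 20 distinct numbers without replacement,
# so every possible set of 20 homes is equally likely.
homes = range(1, N_HOMES + 1)
selected = sorted(random.sample(homes, SAMPLE_SIZE))

print(selected)
```

The output differs from run to run; calling `random.seed` with a fixed value first would make the selection reproducible.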
Importance of Selection

How a sample is selected from a population is of vital importance in statistical inference because the probability of an observed sample will be used to infer the characteristics of the sampled population.
Random Sampling

Stratified random sampling is used when the experimental units associated with the population can be separated into two or more groups of units (strata). A random sample is then drawn from each group.

Cluster sampling: sample natural groupings (clusters) of experimental units and collect data from all experimental units within each selected cluster.
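The difference between the two designs can be sketched as follows. The twelve units, the "urban"/"rural" group labels, and both function names are hypothetical and chosen only for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 12 units, each tagged with a group label.
units = [(f"unit{i}", "urban" if i % 3 else "rural") for i in range(12)]

def group_units(units):
    """Collect unit names under their group label."""
    groups = {}
    for name, group in units:
        groups.setdefault(group, []).append(name)
    return groups

def stratified_sample(units, per_stratum):
    """Draw a simple random sample of fixed size from every stratum."""
    return {g: random.sample(members, per_stratum)
            for g, members in group_units(units).items()}

def cluster_sample(units, n_clusters):
    """Randomly pick whole clusters and keep every unit inside them."""
    clusters = group_units(units)
    chosen = random.sample(sorted(clusters), n_clusters)
    return {g: clusters[g] for g in chosen}

print(stratified_sample(units, per_stratum=2))
print(cluster_sample(units, n_clusters=1))
```

Stratified sampling takes some units from every group, while cluster sampling takes every unit from some groups.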
Random Sampling

Systematic sampling: systematically select every kth experimental unit from a list of all experimental units.

Randomized response sampling: useful when the questions of a pollster are likely to elicit false answers.
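Systematic sampling is simple to sketch. The version below picks a random starting point among the first k units and then takes every kth unit; the random start is a common convention assumed here, not something stated on the slide, and the list of 100 units is hypothetical.

```python
import random

def systematic_sample(units, k):
    """Pick a random start among the first k units, then take every kth unit."""
    start = random.randrange(k)
    return units[start::k]

units = list(range(1, 101))              # hypothetical list of 100 experimental units
sample = systematic_sample(units, k=10)  # every 10th unit

print(sample)
```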
Nonrandom Sample Errors
Selection bias results when a subset of the
experimental units in the population is excluded so
that these units have no chance of being selected
for the sample.
Nonresponse bias results when the researchers
conducting a survey or study are unable to obtain
data on all experimental units selected for the
sample.
Measurement error refers to inaccuracies in the
values of the data recorded. In surveys, the error
may be due to ambiguous or leading questions and
the interviewer’s effect on the respondent.
Example
 How do consumers feel about using the Internet
for online shopping? To find out, United Parcel
Service (UPS) commissioned a nationwide
survey of 5,118 U.S. adults who had conducted
at least two online transactions in 2015. One
finding from the study is that 74% of online
shoppers have used a smartphone to do their
shopping.
 a. Identify the data-collection method.
 b. Identify the target population.
 c. Are the sample data representative of the
population?
Measurement error

 Refers to inaccuracies in the values of the data collected. In surveys, the error may be due to ambiguous or leading questions and the interviewer’s effect on the respondent.
Questions

 What are some of the factors that are likely to lead to a selection bias problem in:
- A survey of customers’ satisfaction towards
digital banking service?
- A survey of customers’ satisfaction towards
bancassurance service?
Questionnaire Design

Questionnaires

 The validity of the results depends on the quality of these instruments.
 Good questionnaires are difficult to construct; bad
questionnaires are difficult to analyze.
 Difficult to design for several reasons:
 Each question must provide a valid and reliable
measure.
 The questions must clearly communicate the research
intention to the survey respondent.
 The questions must be assembled into a logical, clear
instrument that flows naturally and will keep the
respondent sufficiently interested to continue to
cooperate.
Quality aims in survey research

Goal is to collect information that is:
 Valid: measures the quantity or concept that is supposed to be measured
 Reliable: measures the quantity or concept in a consistent or reproducible manner
 Unbiased: measures the quantity or concept in a way that does not systematically under- or overestimate the true value
 Discriminating: can distinguish adequately between respondents for whom the underlying level of the quantity or concept is different
Steps to design a questionnaire:
Step 1: Write out the primary and secondary aims
of your study.
Step 2: Write out concepts/information to be
collected that relates to these aims.
Step 3: Review the current literature to identify
already validated questionnaires that measure
your specific area of interest.
Step 4: Compose a draft of your questionnaire.
Step 5: Revise the draft.
Step 6: Assemble the final questionnaire.
Step 1: Define the aims of the study

 Write out the problem and primary and secondary aims using one sentence per aim.
Formulate a plan for the statistical analysis of
each aim.
 Make sure to define the target population in
your aim(s).
Step 2: Define the variables to be collected

 Write a detailed list of the information to be collected and the concepts to be measured in the study. Are you trying to identify:
 Attitudes
 Needs
 Behavior
 Demographics
 Some combination of these concepts
 Translate these concepts into variables that can be measured.
 Define the role of each variable in the statistical analysis:

Step 3: Review the literature


 Review current literature to identify related
surveys and data collection instruments that
have measured concepts similar to those
related to your study’s aims.

Step 4: Compose a draft


 Determine the mode of survey administration:
face-to-face interviews, telephone interviews, self-
completed questionnaires, computer-assisted
approaches.
 Format the draft as if it were the final version with
appropriate white space to get an accurate
estimate as to its length – longer questionnaires
reduce the response rate.
 Make sure questions flow naturally from one to
another.
Compose a draft

 Question: How many cups of coffee or tea do you drink in a day?
 Principle: Ask for an answer in only one
dimension.
 Solution: Separate the question into two –
 (1) How many cups of coffee do you drink during a
typical day?
 (2) How many cups of tea do you drink during a
typical day?
Compose a draft
 Question: What brand of computer do you own?
   (A) IBM PC
   (B) Apple
 Principle: Avoid hidden assumptions. Make sure to accommodate all possible answers.
 Solution:
   (1) Make each response a separate dichotomous item
       Do you own an IBM PC? (Circle: Yes or No)
       Do you own an Apple computer? (Circle: Yes or No)
   (2) Add necessary response categories and allow for multiple responses.
       What brand of computer do you own? (Circle all that apply)
       Do not own computer
       IBM PC
       Apple
       Other
Compose a draft

 Question: Have you had pain in the last week?
[ ] Never [ ] Seldom [ ] Often [ ] Very often
 Principle: Make sure question and answer
options match.
 Solution: Reword either question or answer to
match.
 How often have you had pain in the last week?
[ ] Never [ ] Seldom [ ] Often [ ] Very Often
Compose a draft

 Question: Are you against drug abuse? (Circle: Yes or No)
 Principle: Write questions that will produce
variability in the responses.
 Solution: Eliminate the question.

Compose a draft
 Question: Which one of the following do you think increases
a person’s chance of having a heart attack the most?
(Check one.)
[ ] Smoking [ ] Being overweight [ ] Stress
 Principle: Encourage the respondent to consider each
possible response to avoid the uncertainty of whether a
missing item may represent either an answer that does not
apply or an overlooked item.
 Solution: Which of the following increases the chance of
having a heart attack?
 Smoking: [ ] Yes [ ] No [ ] Don’t know
 Being overweight: [ ] Yes [ ] No [ ] Don’t know
 Stress: [ ] Yes [ ] No [ ] Don’t know

Compose a draft

 Question:
 (1) Do you currently have a life insurance policy?
(Circle: Yes or No)
 If no, go to question 3.
 (2) How much is your annual life insurance premium?
 Principle: Avoid branching as much as possible
to avoid confusing respondents.
 Solution: If possible, write as one question.
 How much did you spend last year for life insurance?
(Write 0 if none).
Step 5: Revise

 Shorten the set of questions for the study. If a question does not address one of your aims, discard it.
 Refine the questions included and their wording
by testing them with a variety of respondents.
 Ensure the flow is natural.
 Verify that terms and concepts are familiar and easy
to understand for your target audience.
Step 6: Assemble the final questionnaire

 Decide whether you will format the questionnaire yourself or use computer-based programs for assistance:
 SurveyMonkey.com
 Google form
 At the top, clearly state:
 The purpose of the study
 How the data will be used
 Instructions on how to fill out the questionnaire
 Your policy on confidentiality
Assemble the final questionnaire

 Group questions concerning major subject areas together and introduce them by heading or short descriptive statements.
 Order and format questions to ensure unbiased
and balanced results.
Assemble the final questionnaire
 Include white space to make answers clear and
to help increase response rate.
 Space response scales widely enough so that it
is easy to circle or check the correct answer
without the mark accidentally including the
answer above or below.
 Open-ended questions: the space for the response
should be big enough to allow respondents with large
handwriting to write comfortably in the space.
 Closed-ended questions: line up answers vertically
and precede them with boxes or brackets to check, or
by numbers to circle, rather than open blanks.
Non-responders

 Understanding the characteristics of those who did not respond to the survey is important to quantify what, if any, bias exists in the results.
 To quantify the characteristics of the non-
responders to postal surveys, Moser and Kalton
suggest tracking the length of time it takes for
surveys to be returned. Those who take the
longest to return the survey are most like the
non-responders. This result may be situation-
dependent.
Conclusions

 You need plenty of time!
 Design your questionnaire from research hypotheses that have been carefully studied and thought out.
 Discussing the research problem with colleagues and subject matter experts is critical to developing good questions.
 Review, revise and test the questions on an iterative
basis.
 Examine the questionnaire as a whole for flow and
presentation.
 End of Chapter 1
