Chapter1 2023

This chapter introduces key concepts in defining variables, measurement scales, data collection, and sample selection. It discusses why statistics are important for modern businesses to make data-driven decisions. Specifically, it provides examples of how Disney used statistical analysis of ticket sales data to increase gross revenues by 67% by adjusting ticket pricing dynamically. The chapter emphasizes that statistics are a tool to obtain information from data to support fact-based decision making.

Uploaded by lonaarbilly
Copyright © All Rights Reserved

Chapter 1

Defining and Collecting Data
Objectives
In this chapter you learn:
• To understand issues that arise when defining variables.
• How to define variables.
• To understand the different measurement scales.
• How to collect data.
• To identify different ways to collect a sample.
• To understand the issues involved in data preparation.
• To understand the types of survey errors.
Why Use Statistics?
• Imagine you are a promoter of a theatrical production in the 1900s. How do you promote and price tickets?
  • Print flyers and place advertisements in local papers.
  • Set a price based on past experience.
  • If the tickets sell out quickly, increase the price next time.
  • If the tickets don’t sell, decrease the price next time.
• Jump ahead about 85 years. You start using computer systems to:
  • Sell more categories of tickets, such as premium-priced seats.
  • Monitor sales with scheduled reports as customers buy tickets, so that you can add or remove performance dates and customise seat pricing.
Why Use Statistics?
• Jump ahead to today. Your fully online ticketing system allows you to:
  • Update seat inventory automatically.
  • Use dynamic pricing to alter seat prices automatically based on factors such as peak demand.
  • Gain insights into your customers from sales data, such as where they live or what other tickets they buy.
• Knowing your customers allows you to aim advertising and publicity at the correct target market.
• Using social media, you can determine who is viewing or reacting to your advertising.
• So how effective is this modern approach to advertising?
Why Use Statistics?
• Early 2014 financial results showed that The Lion King, from Disney Theatrical Productions, which opened on Broadway in 1997, was the top-grossing show on Broadway for 2013.
  • This came after the show's grosses had declined by 25% in 2009.
  • Four years later, grosses were up 67%, and weekly grosses typically exceeded those of the show's opening weeks, even after adjusting for inflation!
• How did Disney increase grosses by 67%? By combining business domain knowledge with business statistics and analytics to sell tickets.
• As one Broadway musical producer said: “We make educated predictions on price, Disney, on the other hand, has turned it into a science.”
Why Use Statistics?
• Disney followed the plan of action presented in this course. Disney:
  • Collected and summarised daily and weekly data.
  • Performed tests and experiments on the data to analyse it.
• Disney used the insights from these analyses to develop a new interactive seating map. The map allowed customers to buy tickets for specific seats and let Disney adjust the ticket price for each seat and each performance.
• As a result, The Lion King still achieves weekly grosses of around $2 million after more than 20 years.
Why Use Statistics?

Link to data (Playbill)


Think Differently About Statistics
• Modern information technology has allowed businesses to apply statistics in ways that were not possible years ago.
• This course and its prescribed textbook focus on business statistics and data science.
• Business statistics emphasises business problem solving and shows a preference for using software to perform calculations.
• All statistical methods require data. Data are the facts about the world that one seeks to study and explore.
  • Data can be summarised, such as weekly or monthly sales.
  • Data can be raw (unsummarised), such as the amount or time of a transaction.
What is Statistics?
• Statistics is the collection of methods that allow one to work with data effectively.
• Statistics is a TOOL to obtain INFORMATION from DATA.
• It provides a formal basis to summarise and visualise data, reach conclusions about the data, make reliable predictions about business activities, and improve business processes.
• Statistics must be applied correctly. Many professionals make errors by misusing statistical methods. Others mistake statistics for a substitute for the decision-making process, when statistics should only enhance that process.
A Framework for Statistics
• To minimise errors, we use the DCOVA framework, which organises a set of tasks for applying statistics correctly:
  • Define the data that you want to study to meet an objective.
  • Collect the data from appropriate sources.
  • Organise the data collected by developing tables.
  • Visualise the data by developing charts.
  • Analyse the data collected, reach conclusions, and present the results.
• Note that the Define and Collect steps must always be done first. The remaining three steps can be done in varying orders.
Business Analytics
• Business analytics combines statistical methods with management science and information systems to form an interdisciplinary tool that supports fact-based decision making. This includes:
  • statistical methods to analyse and explore data, which can uncover previously unknown or unforeseen relationships.
  • information systems methods to collect and process datasets of all sizes, including very large datasets that would otherwise be hard to use efficiently.
  • management science methods to develop optimisation models that support all levels of management, from strategic planning to daily operations.
Data Science
• Data science is the field of study that combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract meaningful insights from data. Data science practitioners:
  • Use a wide range of tools and techniques for evaluating and preparing data.
  • Extract insights from data using predictive analytics and artificial intelligence (AI), including machine learning and deep learning models.
  • Write applications that automate data processing and calculations.
  • Tell and illustrate stories that clearly convey the meaning of results to decision-makers and stakeholders at every level of technical knowledge and understanding.
  • Explain how these results can be used to solve business problems.
Big Data
• In modern statistics, business analytics, and data science, big data plays a vital role.
• Big data refers to a collection of data that cannot easily be browsed or analysed using traditional methods.
• Big data are collected in massive volumes, at very high speed (often in real time), and in a variety of forms.
  • Big data might consist of large datasets of structured data stored in files or worksheets.
  • Big data might be unstructured. This means the data have an irregular pattern and contain values that cannot be understood without further interpretation.
  • Unstructured data could be text, pictures, videos, or audio.
Definitions and Terminology
• A variable is a characteristic or property of an item that can vary among the occurrences of those items.
• Using this definition, data are the set of values associated with one or more variables.
  • Notice that each value for a variable is a single fact, not a list of facts.
• Statistics can be defined as the methods that analyse the data of the variables of interest.
  • Descriptive statistics are the methods of organising, summarising, and presenting data in an informative and convenient way.
  • Inferential statistics are the methods used to draw conclusions about a characteristic of a population, based on a smaller sample drawn from that population.
Classifying Variables By Type
DCOVA
• Categorical (qualitative) variables take categories as their values, such as “yes” and “no”, or “blue”, “brown”, and “green”.
• Numerical (quantitative) variables have values that represent a counted or measured quantity.
  • Discrete variables arise from a counting process. Values are countable over a finite range.
  • Continuous variables arise from a measuring process. Values are uncountable over a finite range.
Examples of Types of Variables
DCOVA

Question                                                       Responses         Variable Type
Do you have a Facebook profile?                                Yes or No         Categorical
How many text messages have you sent in the past three days?   ---------------   Numerical (discrete)
How long did the mobile app update take to download?           ---------------   Numerical (continuous)
Measurement Scales
DCOVA
A nominal scale classifies categorical data into distinct categories in which no ranking is implied.

Categorical Variable               Categories
Do you have a Facebook profile?    Yes, No
Type of investment                 Growth, Value, Other
Cellular provider                  Vodacom, MTN, Cell C, Other, None
Measurement Scales (cont.)
DCOVA
An ordinal scale classifies categorical data into distinct categories in which ranking is implied.

Categorical Variable               Ordered Categories
Uber rating for a driver           1 star, 2 stars, …, 5 stars
Product satisfaction               Very unsatisfied, Fairly unsatisfied, Neutral, Fairly satisfied, Very satisfied
Faculty rank                       Associate Professor, Senior Lecturer, Lecturer, Tutor
Standard & Poor’s bond ratings     AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student grades                     A, B, C, D, F
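The practical difference between the two categorical scales can be sketched in Python. This is an illustration only. It assumes the "Product satisfaction" categories from the ordinal table are stored as an ordered list. Because ranking is implied, the question "is a more satisfied than b?" makes sense. The gaps between adjacent ranks are not measured quantities, however, so the rank positions support comparison only, not arithmetic. The nominal "Cellular provider" categories have no order, so the sketch only tests them for membership. The names `rank` and `more_satisfied` are hypothetical.

```python
# Ordinal scale: categories for "Product satisfaction", listed from lowest to highest.
SATISFACTION = [
    "Very unsatisfied",
    "Fairly unsatisfied",
    "Neutral",
    "Fairly satisfied",
    "Very satisfied",
]

# Nominal scale: categories for "Cellular provider"; no ranking is implied.
PROVIDERS = {"Vodacom", "MTN", "Cell C", "Other", "None"}


def rank(category: str) -> int:
    """Return the position of a category on the ordinal scale (0 = lowest)."""
    return SATISFACTION.index(category)


def more_satisfied(a: str, b: str) -> bool:
    """Ranking is meaningful on an ordinal scale, so 'a ranks above b' can be answered."""
    return rank(a) > rank(b)


print(more_satisfied("Fairly satisfied", "Neutral"))  # True
print("MTN" in PROVIDERS)  # True: the only meaningful question for nominal data
```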
Measurement Scales (cont.)
DCOVA
Numerical variables use an interval scale or a ratio scale.
• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
• A ratio scale is an ordered scale in which the difference between measurements is a meaningful quantity and the measurements have a true zero point.
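Temperature is a commonly used illustration of the two scales. We chose it for this sketch; the slide's own examples were not recoverable. Degrees Celsius form an interval scale because 0 °C is not an absence of temperature. Kelvin forms a ratio scale because 0 K is a true zero. The sketch shows that a difference means the same thing on both scales. A ratio, however, only makes sense on the scale with a true zero.

```python
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale, arbitrary zero) to kelvin (ratio scale, true zero)."""
    return c + 273.15


low_c, high_c = 10.0, 20.0

# Differences are meaningful on both scales: a 10-degree gap is also a 10-kelvin gap.
diff_c = high_c - low_c
diff_k = celsius_to_kelvin(high_c) - celsius_to_kelvin(low_c)

# Ratios are only meaningful when there is a true zero point.
naive_ratio = high_c / low_c                                        # 2.0, yet 20 °C is not "twice as hot"
true_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)   # roughly 1.035

print(f"difference: {diff_c:.2f} °C = {diff_k:.2f} K")
print(f"Celsius ratio {naive_ratio:.3f} vs kelvin ratio {true_ratio:.3f}")
```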
Interval and Ratio Scales
DCOVA
Types of Variables
DCOVA
Variables
• Categorical
  • Nominal (defined categories). Examples: marital status, political party, eye colour.
  • Ordinal (ordered categories). Examples: ratings such as good, better, best; low, medium, high.
• Numerical
  • Discrete (counted items). Examples: number of children, defects per hour.
  • Continuous (measured characteristics). Examples: weight, time.
Data Is Collected From Either A Population or A Sample
DCOVA

POPULATION
A population contains all the items or individuals of interest that you seek to study.

SAMPLE
A sample contains only a portion of a population of interest.
Population vs. Sample DCOVA

Population: All the items or individuals about which you want to reach conclusion(s).
Sample: A portion of the population of items or individuals.

[Figure: a population of size 40 and a sample of size 4]

Why Use Sampling?
DCOVA

• Less time consuming than selecting every item in the population.
• Less costly than selecting every item in the population.
• Less cumbersome and more practical than analysing the entire population.
A Sample Is Analysed To Estimate Characteristics Of An Entire Population
DCOVA
• A population parameter summarises the value of a specific variable for a population.
• A sample statistic summarises the value of a specific variable for sample data.
• Sample statistics are used to estimate population parameters.
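A short Python sketch shows the distinction between a parameter and a statistic. It assumes a hypothetical population of 40 invoice amounts, generated at random purely for illustration. The mean computed from all 40 values is a population parameter. The mean of a simple random sample of 4 values is a sample statistic that estimates it.

```python
import random
import statistics

# Hypothetical population of N = 40 invoice amounts (illustrative values only).
random.seed(1)
population = [round(random.uniform(100, 1000), 2) for _ in range(40)]

# Population parameter: computed from every item in the population.
mu = statistics.mean(population)

# Sample statistic: computed from a simple random sample of n = 4 items.
sample = random.sample(population, k=4)
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

Rerunning the script with a different seed gives a different sample mean, while the population mean stays the same for a given population.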
Sources Of Data Arise From The Following Activities DCOVA
• Capturing data generated by ongoing business activities.
• Distributing data compiled by an organisation or individual.
• Compiling the responses from a survey.
• Conducting a designed experiment and recording the outcomes.
• Conducting an observational study and recording the results.
Examples of Data Collected From Ongoing Business Activities
DCOVA
• A bank studies years of financial transactions to help it identify patterns of fraud.
• Economists use data on searches done via Google to help forecast future economic conditions.
• Marketing companies use tracking data to evaluate the effectiveness of a website.
Examples Of Data Distributed By An Organisation or Individual
DCOVA
• Financial data on a company provided by investment services.
• Industry or market data from market research firms and trade associations.
• Stock prices, weather conditions, and sports statistics in daily newspapers.
Examples of Survey Data
DCOVA
• A survey asking people which laundry detergent has the best stain-removing abilities.
• Political polls of registered voters during political campaigns.
• People being surveyed to determine their satisfaction with a recent product or service experience.
Examples of Data From A Designed Experiment
DCOVA
• Consumer testing of different versions of a product to help determine which product should be pursued further.
• Material testing to determine which supplier’s material should be used in a product.
• Market testing of alternative product promotions to determine which promotion to use more broadly.
Examples of Data Collected From Observational Studies
DCOVA
• Market researchers using focus groups to elicit unstructured responses to open-ended questions.
• Measuring the time it takes for customers to be served in a fast-food establishment.
• Measuring the volume of traffic through an intersection to determine whether some form of advertising at the intersection is justified.
Observational Studies & Designed Experiments Have A Common Objective
DCOVA
• Both attempt to quantify the effect that a process change (called a treatment) has on a variable of interest.
• In an observational study, there is no direct control over which items receive the treatment.
• In a designed experiment, there is direct control over which items receive the treatment.
Sources of Data DCOVA

• Primary sources: the data collector is the one using the data for analysis.
  • Data from a political survey.
  • Data collected from an experiment.
  • Observed data.
• Secondary sources: the person performing the data analysis is not the data collector.
  • Analysing census data.
  • Examining data from print journals or data published on the Internet.
A Sampling Process Begins With A Sampling Frame
DCOVA
• The sampling frame is a listing of the items that make up the population.
• Frames are data sources such as population lists, directories, or maps.
• Inaccurate or biased results can arise if a frame excludes certain groups or portions of the population.
• Using different frames to generate data can lead to dissimilar conclusions.
Types of Samples DCOVA

Samples
• Nonprobability samples
  • Judgment
  • Convenience
• Probability samples
  • Simple random
  • Systematic
  • Stratified
  • Cluster
Types of Samples: Nonprobability Sample DCOVA
• In a nonprobability sample, items are included without regard to their probability of occurrence.
• In convenience sampling, items are selected based only on the fact that they are easy, inexpensive, or convenient to sample.
• In a judgment sample, you get the opinions of pre-selected experts on the subject matter.
Types of Samples: Probability Sample DCOVA
• In a probability sample, items in the sample are chosen on the basis of known probabilities.

Probability Samples
• Simple random
• Systematic
• Stratified
• Cluster
Probability Sample: Simple Random Sample DCOVA
• Every individual or item from the frame has an equal chance of being selected.
• Selection may be with replacement (a selected individual is returned to the frame for possible reselection) or without replacement (a selected individual is not returned to the frame).
• Samples are obtained from a table of random numbers or from computer random number generators.
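In Python's standard `random` module, the two selection rules correspond to two existing methods. `Random.sample` draws without replacement. `Random.choices` draws with replacement. The sketch below is illustrative only. It assumes a frame of item numbers 1 to 850, like the frame used in the random number table example, and a fixed seed so that the output can be reproduced.

```python
import random

# Hypothetical sampling frame: item numbers 1..850.
frame = list(range(1, 851))
rng = random.Random(2023)  # seeded so the sketch is reproducible

# Without replacement: a selected item is not returned to the frame, so no repeats.
without_replacement = rng.sample(frame, k=5)

# With replacement: every draw uses the full frame, so an item may appear twice.
with_replacement = rng.choices(frame, k=5)

print(without_replacement)
print(with_replacement)
```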
Selecting a Simple Random Sample Using A Random Number Table DCOVA

Sampling frame for a population with 850 items:
Item Name    Item #
Bev R.       001
Ulan X.      002
.            .
.            .
.            .
Joann P.     849
Paul F.      850

Portion of a random number table:
49280 88924 35779 00283 81163 07275
11100 02340 12860 74697 96644 89439
09893 23997 20048 49420 88872 08401

Select each item by reading three consecutive digits of the random number table as its item number.

The first 5 items in a simple random sample:
Item # 492
Item # 808
Item # 892 (does not exist, so ignore)
Item # 435
Item # 779
Item # 002
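The table-reading procedure can be sketched in Python. The sketch reads the first row of the table portion as consecutive 3-digit groups. It skips numbers that do not match an item in the frame of 850, and it also skips 000. It assumes sampling without replacement, so a repeated number is skipped as well; the slide does not state this rule. The function name `select_from_table` is hypothetical. Run on the first row, the sketch reproduces the five items listed above.

```python
TABLE_ROW = "49280 88924 35779 00283 81163 07275"  # first row of the table portion above


def select_from_table(digits: str, population_size: int, n: int) -> list:
    """Read consecutive 3-digit groups; keep valid item numbers not already selected."""
    stream = digits.replace(" ", "")  # spaces only group the digits for readability
    selected = []
    for start in range(0, len(stream) - 2, 3):
        item = int(stream[start:start + 3])
        if item < 1 or item > population_size:
            continue  # e.g. 892: no such item in a frame of 850, so ignore it
        if item in selected:
            continue  # assumed sampling without replacement: skip repeats
        selected.append(item)
        if len(selected) == n:
            break
    return selected


print(select_from_table(TABLE_ROW, population_size=850, n=5))  # [492, 808, 435, 779, 2]
```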
Probability Sample: Systematic Sample DCOVA
• Decide on the sample size: n.
• Divide the frame of N individuals into groups of k individuals: k = N/n.
• Randomly select one individual from the first group, i.e. choose a starting value between 1 and k.
• Select every kth individual thereafter.

[Figure: N = 40, n = 4, k = 10; the first value is chosen at random from the first group]
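The steps above can be sketched in Python for the slide's case of N = 40, n = 4 and k = 10. The sketch assumes that N is a multiple of n. If it is not, `N // n` rounds k down, which is a simplifying choice of ours. The frame of individuals numbered 1 to 40 and the function name `systematic_sample` are hypothetical.

```python
import random


def systematic_sample(frame: list, n: int, rng: random.Random) -> list:
    """Pick a random start within the first group of k = N // n items, then every kth item."""
    N = len(frame)
    k = N // n                      # group size; N = 40 and n = 4 give k = 10
    start = rng.randrange(k)        # 0-based position of the first value, inside group 1
    return frame[start::k][:n]


frame = list(range(1, 41))          # hypothetical frame of N = 40 individuals
sample = systematic_sample(frame, n=4, rng=random.Random(7))
print(sample)                       # four values exactly 10 apart, the first between 1 and 10
```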
Probability Sample: Stratified Sample DCOVA
• Divide the population into two or more subgroups (called strata) according to some common characteristic.
• A simple random sample is selected from each subgroup, with sample sizes proportional to the strata sizes.
• The samples from the subgroups are combined into one.
• This is a common technique when sampling populations of voters, stratifying across provincial or socio-economic lines.
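Proportional allocation can be sketched in Python. The three provinces and their voter lists below are hypothetical and exist only for illustration. The sketch rounds each stratum's share of n. With other stratum sizes, the rounded sizes may not add up exactly to n, and a real design would then need a rule to adjust them. Such a rule is beyond this sketch.

```python
import random


def stratified_sample(strata: dict, n: int, rng: random.Random) -> list:
    """Proportional allocation: each stratum receives a share of n matching its share of N."""
    N = sum(len(items) for items in strata.values())
    sample = []
    for items in strata.values():
        n_h = round(n * len(items) / N)          # stratum sample size, proportional to stratum size
        sample.extend(rng.sample(items, n_h))    # simple random sample within the stratum
    return sample


# Hypothetical population of 100 voters stratified by province.
strata = {
    "Gauteng": [f"Gauteng-{i}" for i in range(1, 61)],
    "Western Cape": [f"Western Cape-{i}" for i in range(1, 31)],
    "Free State": [f"Free State-{i}" for i in range(1, 11)],
}
sample = stratified_sample(strata, n=10, rng=random.Random(5))
print(sample)  # 6 Gauteng, 3 Western Cape and 1 Free State voter
```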
Probability Sample: Cluster Sample DCOVA
• The population is divided into several “clusters”, each representative of the population.
• A simple random sample of clusters is selected.
  • All items in the selected clusters can be used, or items can be chosen from a cluster using another probability sampling technique.
• A common application of cluster sampling involves election exit polls, where certain election districts are selected and sampled.

[Figure: population divided into 16 clusters, with randomly selected clusters for the sample]
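A Python sketch shows both options for using the selected clusters. It assumes 16 hypothetical election districts with 25 voters each. Choosing 4 clusters and subsampling 5 voters per cluster are illustrative values, not values from the slides.

```python
import random

# Hypothetical population: 16 clusters (election districts), each a list of 25 voter IDs.
clusters = {c: [f"district{c}-voter{v}" for v in range(1, 26)] for c in range(1, 17)}

rng = random.Random(11)
chosen = rng.sample(sorted(clusters), k=4)  # simple random sample of clusters

# Option 1: use every item in the selected clusters.
sample_all = [voter for c in chosen for voter in clusters[c]]

# Option 2: subsample within each selected cluster (here, a simple random sample of 5).
sample_sub = [voter for c in chosen for voter in rng.sample(clusters[c], k=5)]

print(sorted(chosen), len(sample_all), len(sample_sub))  # 4 districts, 100 voters, 20 voters
```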
Probability Sample: Comparing Sampling Methods
DCOVA
• Simple random sample and systematic sample:
  • Simple to use.
  • May not be a good representation of the population’s underlying characteristics.
• Stratified sample:
  • Ensures representation of individuals across the entire population.
• Cluster sample:
  • More cost effective.
  • Less efficient (needs a larger sample to achieve the same level of precision).
Probability Sample: Selection with Probability Proportionate To Size DCOVA
• In a simple random sample, elements of the population (e.g. invoices) are selected without the monetary value on the invoice playing a role.
• For example, if we consider sales, an invoice with an entry of R10 has the same probability of being selected as an invoice with an entry of R1 000.
• If the correctness of monetary values must be verified, the magnitude of each monetary value becomes important.
Selection with PPS
DCOVA

• In such a case, a selection process that takes into account the magnitude of the monetary value on each invoice is preferred.
• We refer to this type of selection process as selection proportional to size, where size refers to the monetary value on each invoice.
• Suppose several invoices must be selected from N invoices via the probability proportional to size (PPS) selection process.

Copyright reserved 45
Selection with PPS
DCOVA

• Let T denote the total monetary value (measured in rand) of the N invoices.
• According to the PPS selection process, each of the T rand units has an equal probability of being selected.
• This implies that an invoice with a R4 000 entry has a selection probability four times as large as that of an invoice with a R1 000 entry.

Selection with PPS
DCOVA

• Hence the phrase PROBABILITY PROPORTIONAL TO SIZE.
• It is important to note that in this case we are dealing with two types of elements: invoices and rand units.
• With PPS selection, an invoice is selected indirectly. A rand unit is selected first, and then the invoice on which it occurs is selected.

Selection with PPS
DCOVA

Note that each rand unit has the same chance of selection, but the chance of selection for each invoice is proportionate to the number of rand units that appear on it.

Selection with PPS: Example
DCOVA
Suppose the total monetary value of sales on N = 1 500 invoices is R225 000 and that n = 20 invoices must be selected according to a PPS selection process. First, generate twenty 6-digit random numbers between 000001 and 225000 and arrange these numbers in ascending order.
Then use the following table to determine which invoices to associate with the random numbers. There could be fewer than 20 invoices if some invoices are selected more than once.

Invoice   Monetary value (R)   Accumulated value (R)   Interval values   6-digit random numbers
1         10
2         112
3         78
4         150
5         5872
6         613
7         114
8         14
etc.      etc.                 etc.                    etc.              etc.
Selection with PPS
DCOVA

Suppose the first four 6-digit random numbers are 000098, 000517, 000972 and 006863.

The number 000098 lies between 11 and 122, so invoice 2 is selected. The number 000517 lies between 351 and 6 222, so invoice 5 is selected.

Selection with PPS
DCOVA
The number 000972 also indicates invoice 5, but note that no invoice may be selected twice.

Moreover, 006863 lies between 6 836 and 6 949, so invoice 7 is selected.

Note that on any single draw, the selection probability of invoice 1 is 10/225 000, while the selection probability of invoice 5 is 5 872/225 000, and so on.

Invoice   Monetary value (R)   Accumulated value (R)   Interval values   6-digit random numbers
1         10                   10                      1-10
2         112                  122                     11-122            000098
3         78                   200                     123-200
4         150                  350                     201-350
5         5872                 6222                    351-6222          000517 & 000972
6         613                  6835                    6223-6835
7         114                  6949                    6836-6949         006863
8         14                   6963                    6950-6963
etc.      etc.                 etc.                    etc.              etc.
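The table lookup can be sketched in Python with the standard `bisect` and `itertools.accumulate` functions. The sketch uses only the eight invoices shown. Because the table is truncated ("etc."), a random number above 6 963 would point to an invoice that is not listed, and the sketch does not handle that case. The function names `invoice_for` and `pps_select` are hypothetical. With the four random numbers from the example, the sketch returns invoices 2, 5 and 7.

```python
from bisect import bisect_left
from itertools import accumulate

# Monetary values (R) of invoices 1 to 8 from the table above.
values = [10, 112, 78, 150, 5872, 613, 114, 14]
upper = list(accumulate(values))  # accumulated values: [10, 122, 200, 350, 6222, 6835, 6949, 6963]


def invoice_for(rand_unit: int) -> int:
    """Return the 1-based invoice whose interval of rand units contains rand_unit."""
    return bisect_left(upper, rand_unit) + 1


def pps_select(random_numbers: list) -> list:
    """Map sorted random rand units to invoices; no invoice is selected twice."""
    selected = []
    for r in sorted(random_numbers):
        inv = invoice_for(r)
        if inv not in selected:
            selected.append(inv)
    return selected


print(pps_select([98, 517, 972, 6863]))  # [2, 5, 7]

# Per-draw selection probability: an invoice's rand units divided by the total T.
T = 225_000
print(values[0] / T, values[4] / T)  # invoice 1: 10/225000, invoice 5: 5872/225000
```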
Data Cleaning Is An Important Data Preprocessing Task Prior To Analysis DCOVA
Data cleaning corrects irregularities in the data:
• Invalid variable values, including:
  • Non-numerical data for a numerical variable.
  • Invalid categorical values for a categorical variable.
  • Numeric values outside a defined range.
• Coding errors, including:
  • Inconsistent categorical values.
  • Inconsistent case for categorical values.
  • Unrelated or unwanted characters.
• Data integration errors, including:
  • Redundant columns.
  • Duplicated rows.
  • Differing column lengths.
  • Different units of measure or scale for numerical variables.
Cleaning Invalid Variable Values Can Be Semi-Automated DCOVA
• Invalid variable values can be identified by simple scanning techniques, for example:
  • Non-numeric entries for numerical variables.
  • Values for categorical variables that don’t match a pre-defined category.
  • Values for a numeric variable outside a pre-defined explicit range.
• Features exist in Excel to assist in this task.
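The three scanning checks can be sketched in Python. The records, the valid age range of 18 to 99 and the categories for "plan" are hypothetical assumptions made for illustration. The sketch only flags problems and leaves correcting them to the analyst. This matches the point that cleaning can be semi-automated but not fully automated.

```python
# Hypothetical records: "age" is numerical (assumed valid range 18 to 99),
# "plan" is categorical (assumed valid categories below).
records = [
    {"id": 1, "age": "34", "plan": "Basic"},
    {"id": 2, "age": "n/a", "plan": "Premium"},
    {"id": 3, "age": "250", "plan": "Basic"},
    {"id": 4, "age": "41", "plan": "Gold"},
]
VALID_PLANS = {"Basic", "Premium"}
AGE_RANGE = (18, 99)


def scan(record: dict) -> list:
    """Return the problems found in one record (an empty list means none were found)."""
    problems = []
    try:
        age = float(record["age"])
    except ValueError:
        problems.append("non-numeric value for numerical variable 'age'")
    else:
        if not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
            problems.append("'age' outside the defined range")
    if record["plan"] not in VALID_PLANS:
        problems.append("invalid category for 'plan'")
    return problems


for r in records:
    for problem in scan(r):
        print(r["id"], problem)
```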
Examples Of Coding Errors
DCOVA
Copy-and-paste or data import can result in poor recording or entry of data.

Categorical variable: Gender. Correct coding: F or M.
• Correctable error: Female.
• Invalid data: New York.
• Correctable or software tolerated: m.
• Irregular and nonprintable (hidden) characters:
  • Leading or trailing space(s): _F or F_.
  • Other nonprintable characters may also be leading or trailing.
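A Python sketch shows how the Gender examples above might be handled. It uses the standard `unicodedata.category` function to drop control and format characters, such as a zero-width space. It then strips leading and trailing spaces and normalises the case. The mapping from "Female"/"Male" to "F"/"M" is our assumption of what counts as correctable. Any value that cannot be corrected is returned as `None` so that a person can review it.

```python
import unicodedata
from typing import Optional

VALID = {"F", "M"}
CORRECTIONS = {"FEMALE": "F", "MALE": "M"}  # assumed mapping for correctable errors


def clean_gender(raw: str) -> Optional[str]:
    """Return 'F' or 'M' for valid or correctable entries, or None for invalid data."""
    # Drop nonprintable characters (Unicode categories starting with 'C'), then outer spaces.
    text = "".join(ch for ch in raw if unicodedata.category(ch)[0] != "C").strip()
    text = text.upper()                     # 'm' -> 'M' (inconsistent case)
    text = CORRECTIONS.get(text, text)      # 'FEMALE' -> 'F' (correctable error)
    return text if text in VALID else None  # 'NEW YORK' -> None (invalid data)


print([clean_gender(x) for x in ["Female", "New York", "m", " F", "F "]])
# ['F', None, 'M', 'F', 'F']
```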
Data Integration Errors From Combining Two Different Computerised Data Sources
DCOVA
• Data integration errors often require time-consuming manual effort.
• Some examples:
  • Variable names or definitions may differ.
  • Duplicated rows (observations) may also occur.
  • Different units of measurement (or scale) may not be obvious without human interpretation.
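A Python sketch of combining two sources is given below. Both sources are hypothetical. The sketch assumes a person has already confirmed that source B records sales in thousands of rand under a different variable name. That is exactly the kind of knowledge that needs human interpretation. Once names and units are aligned, duplicated rows can be removed mechanically.

```python
# Two hypothetical sources describing the same customers. Source B is assumed
# (after human review) to record sales in thousands of rand under a different name.
source_a = [{"customer": "C01", "sales_rand": 12500.0},
            {"customer": "C02", "sales_rand": 8000.0}]
source_b = [{"customer": "C02", "sales_thousands": 8.0},
            {"customer": "C03", "sales_thousands": 4.2}]

# Align the variable name and the unit of measure before combining.
converted_b = [{"customer": r["customer"], "sales_rand": r["sales_thousands"] * 1000}
               for r in source_b]

# Combine, then drop duplicated rows (same customer and same value).
combined = []
seen = set()
for row in source_a + converted_b:
    key = (row["customer"], row["sales_rand"])
    if key not in seen:
        seen.add(key)
        combined.append(row)

print(combined)
```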
Data Can Be Formatted and / or Encoded In More Than One Way
DCOVA
• Some electronic formats are more readily usable than others.
• Different encodings can affect the precision of numerical variables and can also affect data compatibility.
• As you identify and choose sources of data, you need to consider and deal with these issues.
Missing Values Are Values Not Collected For A Variable
DCOVA
• Survey data may include questions to which the survey taker gave no response.
• Missing values can also result from integrating two data sources.
• Do not confuse missing values with miscoded values.
Data Cleaning Cannot Be A Fully Automated Process DCOVA
• Excel and Tableau have functionality to lessen the burden of data cleaning.
• The software guides in the course textbook explain this functionality.
• When performing data cleaning, always preserve a copy of the original data for later reference.
Other Data Preprocessing Tasks
DCOVA
• Data formatting
  • Rearranging the data structure, changing the electronic encoding of the data, or both.
• Stacking and unstacking data
  • Analysis of a numerical variable may require subdividing the data into two or more groups.
  • Unstacking involves creating separate numerical variables for the different groups.
  • Stacking involves pairing the single numerical variable with a second, categorical variable that identifies the group.
• Recoding variables
  • Redefining categories for a categorical variable.
  • Transforming a numerical variable into a categorical variable.
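Stacking and unstacking can be sketched in Python with plain lists and dictionaries. The delivery times for two stores are hypothetical. In the stacked form, each value is paired with a categorical store label. In the unstacked form, each store has its own numerical variable. Re-stacking groups the rows by store, so the original row order is not kept.

```python
from collections import defaultdict

# Stacked form: one numerical variable paired with a categorical group variable.
# (Hypothetical delivery times in minutes for two stores.)
stacked = [("Store A", 22.5), ("Store B", 30.1), ("Store A", 18.0), ("Store B", 27.4)]


def unstack(rows):
    """Create a separate numerical variable (a list) for each group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return dict(groups)


def stack(groups):
    """Pair each value with the categorical variable that names its group."""
    return [(group, value) for group, values in groups.items() for value in values]


unstacked = unstack(stacked)
print(unstacked)         # {'Store A': [22.5, 18.0], 'Store B': [30.1, 27.4]}
print(stack(unstacked))  # [('Store A', 22.5), ('Store A', 18.0), ('Store B', 30.1), ('Store B', 27.4)]
```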
Recoding Of Variables
DCOVA
• Recoding a variable can either supplement or replace the original variable.
• Recoding a categorical variable involves redefining categories.
• Recoding a numerical variable involves changing it into a categorical variable.
• When recoding, be sure that the new categories are mutually exclusive (categories do not overlap) and collectively exhaustive (categories cover all possible values).
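Recoding a numerical variable can be sketched in Python. The age groups and their boundaries are hypothetical. The sketch uses half-open intervals [lower, upper), so a boundary value such as 25 falls into exactly one group and the categories are mutually exclusive. The first and last groups are open-ended, so every finite age is covered and the categories are collectively exhaustive. The recoded list supplements the original `ages` list rather than replacing it.

```python
# Hypothetical recoding of age (numerical) into age groups (categorical).
BINS = [
    (float("-inf"), 25, "Under 25"),
    (25, 45, "25 to under 45"),
    (45, 65, "45 to under 65"),
    (65, float("inf"), "65 and over"),
]


def recode_age(age: float) -> str:
    """Return the single age group whose half-open interval [lower, upper) contains age."""
    for lower, upper, label in BINS:
        if lower <= age < upper:
            return label
    raise ValueError(f"no category for {age!r}")  # only reached for NaN


ages = [19, 25, 44.9, 64, 65, 80]
age_groups = [recode_age(a) for a in ages]  # supplements, rather than replaces, ages
print(age_groups)
```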
Evaluating Survey Worthiness
DCOVA
• What is the purpose of the survey?
• Is the survey based on a probability sample?
  • Coverage error: is the frame appropriate?
  • Nonresponse error: follow up.
  • Measurement error: good questions elicit good responses.
  • Sampling error: always exists.
Types of Survey Errors DCOVA

• Coverage error or selection bias:
  • Exists if some groups are excluded from the frame and have no chance of being selected.
• Nonresponse error or bias:
  • People who do not respond may be different from those who do respond.
• Sampling error:
  • Variation from sample to sample will always exist.
• Measurement error:
  • Due to weaknesses in question design and/or respondent error.
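Sampling error can be made visible with a small simulation in Python. The population of 1 000 customer spend values is generated at random and is purely hypothetical. The sample size of 30 is also an illustrative choice. Each simple random sample gives a slightly different mean, and the gap between that mean and the population mean is the sampling error. No coverage, nonresponse or measurement error is involved here.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of 1 000 customer spend values (rand).
population = [rng.gauss(500, 120) for _ in range(1000)]
mu = statistics.mean(population)

# Draw five simple random samples of n = 30 and record each sample mean.
sample_means = []
for _ in range(5):
    sample = rng.sample(population, k=30)
    sample_means.append(statistics.mean(sample))

for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {x_bar:7.2f}, sampling error = {x_bar - mu:+.2f}")
```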
Types of Survey Errors (continued)
DCOVA

• Coverage error: excluded from the frame.
• Nonresponse error: follow up on nonresponses.
• Sampling error: random differences from sample to sample.
• Measurement error: bad or leading question.
Ethical Issues About Surveys
DCOVA
• Coverage error and nonresponse error can be exploited by survey designers to bias survey results on purpose.
• Sampling error can be an ethical issue if the findings are purposely reported without the associated margin of error.
• Measurement error can be an ethical issue when:
  • The survey sponsor chooses leading questions.
  • The interviewer purposely leads respondents in a particular direction.
  • Respondents wilfully provide false information.
Excel: Sampling
• We use the Data Analysis tool in Excel to obtain a simple random sample with replacement or a systematic sample.
• We do not cover Excel implementations of sampling without replacement, stratified sampling or cluster sampling.
• The population data are structured in a table where each column is a variable and each row is an observation of these variables.
• Notice that we need a column that specifies the observation number (index) in the population data frame.
Excel: Sampling

We collect data on two variables, X1 and X2.

The population consists of six observations.

X1 is a numerical variable and X2 is a categorical variable.
Excel: Sampling

The following steps are used to draw a simple random sample, with replacement, of the row index:

1. Make sure the Data Analysis tool is installed.
2. Click the Data ribbon and select Data Analysis.
3. Select Sampling and click OK.
4. Set the Input Range equal to A3:A8.
5. Since the input range does not contain a column name, make sure Labels is unchecked.
6. Set the Sampling Method to Random, and type 4 in the Number of Samples box.
7. Under Output options, click the Output Range option and type E3.
8. Click OK.
Excel: Sampling
• After running the Sampling tool, Excel returns four random values selected from the Row Index column in cells E3 to E6.
• We must now return the values for X1 and X2 corresponding to these row indices.
• The output should look as follows:
Excel: Sampling
• Notice that the Sampling tool selected the row indices 6, 3, 2, and 3 from the population data frame.
• To return the corresponding values for X1 and X2, we use the INDEX() function.
• For example, the first observation sampled from the population is row 6. Here X1 = 340 and X2 = A.
• To return the 340 in cell F3, type the following in cell F3: =INDEX($B$3:$B$8, E3).
• This function first looks at the value of E3, which is 6. It then returns the 6th value in the range B3:B8. The dollar signs “$” ensure that this range remains fixed if we copy the formula.
Excel: Sampling
• Do the same in each following row of column F by dragging the formula down to cell F6. Notice that the second argument of the INDEX function changes with each row (E3, E4, E5, E6).
• Next, we apply the same function to column G. Notice that we change the range to column C, i.e. =INDEX($C$3:$C$8, E3).
Excel: Sampling
Sampled data:

Excel formulas:
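The Excel workflow can be mirrored in Python for readers who want to check the logic. Only row 6 (340 in column B and A in column C) comes from the slides. The other five population values are made up for illustration. The helper `index` is a hypothetical stand-in for Excel's INDEX with a single column: it looks up a value by a 1-based row number. Drawing indices with `random.choices` plays the role of the Sampling tool's Random method, which can pick the same row more than once.

```python
import random

# Hypothetical six-row population mirroring the worksheet layout:
# column A = row index, column B = numerical variable, column C = categorical variable.
# Only row 6 (340, "A") comes from the slides; the other values are made up.
row_index = [1, 2, 3, 4, 5, 6]
col_b = [120, 275, 410, 95, 510, 340]
col_c = ["B", "A", "C", "B", "C", "A"]


def index(array, row_num):
    """Mimic a one-column INDEX(array, row_num): 1-based lookup."""
    return array[row_num - 1]


# Like the Sampling tool's Random method: 4 draws from the row index, with replacement.
rng = random.Random(3)
sampled_rows = rng.choices(row_index, k=4)

sampled = [(r, index(col_b, r), index(col_c, r)) for r in sampled_rows]
for r, b, c in sampled:
    print(r, b, c)
```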
Excel: Sampling
• The Sampling tool in the Data Analysis tool can also be used to draw a systematic sample.
• Go to Data -> Data Analysis -> Sampling and click OK.
• Choose k = 2 for a systematic sample by selecting Periodic under Sampling Method and setting the Period equal to 2.
• For the Output Range, click cell A10.
• This produces a systematic sample of the row indices of the population data frame.
• As before, we can return the corresponding values for X1 and X2 using INDEX().
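As we understand Microsoft's description of the Sampling tool, the Periodic method copies the period-th value in the input range and every period-th value after it. Unlike the textbook systematic sample, it therefore has no random starting point. The Python sketch below reproduces that behaviour. The function name `periodic_sample` is hypothetical. With six row indices and a period of 2, it returns rows 2, 4 and 6.

```python
def periodic_sample(values: list, period: int) -> list:
    """Return the period-th value and every period-th value after it (fixed start)."""
    return values[period - 1::period]


row_index = [1, 2, 3, 4, 5, 6]
print(periodic_sample(row_index, period=2))  # [2, 4, 6]
```

To follow the textbook procedure exactly, choose the starting position at random within the first k rows. The systematic-sample sketch earlier in this chapter does this with `rng.randrange(k)`.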
Excel: Sampling
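Note that we can use INDEX(array, row_num) or VLOOKUP(lookup_value, table_array, col_index_num) to return X1 and X2. With VLOOKUP, setting the optional range_lookup argument to FALSE forces an exact match on the row index.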
Chapter Summary
In this chapter we have discussed:
• Understanding issues that arise when defining variables.
• How to define variables.
• Understanding the different measurement scales.
• How to collect data.
• Identifying different ways to collect a sample.
• Understanding the issues involved in data preparation.
• Understanding the types of survey errors.
