
Contents

1 Statistics, Data, and Statistical Thinking

1.1 The Science of Statistics
1.2 Types of Statistical Applications
1.3 Fundamental Elements of Statistics
1.4 Types of Data
1.5 Collecting Data: Sampling and Related Issues
1.6 The Role of Statistics in Critical Thinking and Ethics

Chapter 1

Statistics, Data, and Statistical Thinking

Learning Objectives

• Definition of Statistics
• Real-world Applications of Statistics
• Statistical Terminologies
• Population versus Sample Data
• Descriptive Statistics and Inferential Statistics
• Data Collection Methods

1.1 The Science of Statistics

What is statistics? Statistics is used in many aspects of life and has a wide range of applications. Its methodology is used in almost all applied sciences, including medicine, psychology, economics, and actuarial science.

Statistics is the science of data. This involves collecting, classifying, summarizing, organizing, analyzing, presenting, and

interpreting numerical and categorical information.

1.2 Types of Statistical Applications

Often the data are selected from some larger set of data, the population, whose characteristics we wish to estimate. We call this selection process sampling.

For example, you might collect the ages of a sample of customers who shop for a particular product online to estimate the

average age of all customers who shop online for the product. Then you could use your estimate to target the Web site’s

advertisements to the appropriate age group. Notice that statistics involves two different processes: (1) describing sets of data

and (2) drawing conclusions (making estimates, decisions, predictions, etc.) about the sets of data on the basis of sampling.

So, the applications of statistics can be divided into two broad areas: descriptive statistics and inferential statistics.

Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the

information revealed in a data set, and to present that information in a convenient form.


Inferential statistics utilizes sample data to make estimates, decisions, predictions, or other generalizations about a larger

set of data.

Study 1.1 “Best-Selling Girl Scout Cookies” (Source: girlscouts.org.) Since 1917, the Girl Scouts of America have been

selling boxes of cookies. Currently, there are 12 varieties for sale: Thin Mints, Samoas, Lemonades, Tagalongs, Do-si-dos,

Trefoils, Savannah Smiles, Thanks-A-Lot, Dulce de Leche, Cranberry Citrus Crisps, Chocolate Chip, and Thank U Berry

Much. Each of the approximately 150 million boxes of Girl Scout cookies sold each year is classified by variety. The results

are summarized in Figure 1.1. From the graph, you can clearly see that the best-selling variety is Thin Mints (25%), followed

by Samoas (19%) and Tagalongs (13%). Since the figure describes the various categories of boxes of Girl Scout cookies sold,

the graphic is an example of descriptive statistics.

Figure 1.1: MINITAB graph of best-selling Girl Scout cookies (Based on girlscouts.org, 2011–12 sales.)
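
The descriptive summary in Study 1.1 can be sketched in a few lines of Python. This is only an illustration: it uses the three shares quoted in the text and the approximate total of 150 million boxes, and it groups the varieties that the text does not itemize into a single "Other" row. The names `shares`, `summarize`, and `TOTAL_BOXES` are chosen here for the example.

```python
# Shares quoted in Study 1.1; the remaining varieties are not itemized in the text.
TOTAL_BOXES = 150_000_000  # approximate annual sales quoted in the text

shares = {"Thin Mints": 0.25, "Samoas": 0.19, "Tagalongs": 0.13}


def summarize(shares, total):
    """Return (variety, share, estimated boxes) rows, largest share first."""
    rows = [(name, share, round(share * total)) for name, share in shares.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


rows = summarize(shares, TOTAL_BOXES)
other_share = 1 - sum(shares.values())  # everything not itemized in the text

for name, share, boxes in rows:
    print(f"{name:<12}{share:>5.0%}{boxes:>14,}")
print(f"{'Other':<12}{other_share:>5.0%}{round(other_share * TOTAL_BOXES):>14,}")
```

The code only organizes and presents the data it is given. It draws no conclusion about any larger set of data, which is what makes it descriptive rather than inferential.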

1.3 Fundamental Elements of Statistics

An experimental (or observational) unit is an object (e.g., person, thing, transaction, or event) about which we collect

data.

A population is a set of all units (usually people, objects, transactions, or events) that we are interested in studying.

For example, populations may include all Canadians who were aged 65 or older, all UFV students, and all MacBook Air

models. Notice also that each set includes all the units in the population.

In studying a population, we focus on one or more characteristics or properties of the units in the population. We call such

characteristics variables. For example, we may be interested in the variables age, gender, and number of children.

A variable is a characteristic or property of an individual experimental (or observational) unit in the population.

The name variable is derived from the fact that any particular characteristic may vary among the units in a population.

Often, numerical representations for variables are not readily available, so measurement plays an important supporting role

in statistical studies.

Measurement is the process we use to assign numbers to variables of individual population units.

Data are values obtained after the process of measurement. A collection of data is called a data set. When we measure a variable for every unit of a population, the result is called a census of the population. If the population you wish to study is large, conducting a census would be prohibitively time-consuming or costly. A reasonable alternative is to select and study a subset (or portion) of the units in the population.

A sample is a subset of the units of a population.

For example, instead of polling all 145 million registered voters in the United States during a presidential election year,

a pollster might select and question a sample of just 1,500 voters. (See Figure 1.2.) If he is interested in the variable

“presidential preference,” he would record (measure) the preference of each voter sampled.

Figure 1.2: A sample of voter registration cards for all registered voters

A statistical inference is an estimate, prediction, or some other generalization about a population based on information

contained in a sample.

Example 1.1 According to Variety (Jan. 10, 2014), the average age of Broadway ticketbuyers is 42.5 years. Suppose a

Broadway theatre executive hypothesizes that the average age of ticketbuyers to her theatre’s plays is less than 42.5 years.

To test her hypothesis, she samples 200 ticketbuyers to her theatre’s plays and determines the age of each.

a. Describe the population.

b. Describe the variable of interest.

c. Describe the sample.

d. Describe the inference.

After making the inference, we also need to know its reliability, that is, how good the inference is. The only way we can be certain that an inference about a population is correct is to include the entire population in our sample. However, because of resource constraints (i.e., insufficient time or money), we usually can’t work with whole populations, so we base our inferences on just a portion of the population (a sample). Thus, we introduce an element of uncertainty into our inferences. Consequently, whenever possible, it is important to determine and report the reliability of each inference made. Reliability, then, is the fifth element of inferential statistical problems.



A measure of reliability is a statement (usually quantitative) about the degree of uncertainty associated with a statistical

inference.
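
One common measure of reliability for a poll such as the 1,500-voter example is a margin of error. The sketch below is an illustration and is not taken from the text: it assumes the usual normal approximation for a sample proportion, a 95% confidence level (z = 1.96), and a hypothetical sample proportion of 0.52.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


n = 1500       # sample size from the polling example
p_hat = 0.52   # hypothetical sample proportion, not from the text
moe = margin_of_error(p_hat, n)
print(f"estimate: {p_hat:.1%} +/- {moe:.1%}")
```

Under these assumptions the margin of error is roughly 2.5 percentage points, so the inference would be reported as an estimate together with that statement of uncertainty.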

Four Elements of Descriptive Statistical Problems

1. The population or sample of interest

2. One or more variables (characteristics of the population or sample units) that are to be investigated

3. Tables, graphs, or numerical summary tools

4. Identification of patterns in the data

Five Elements of Inferential Statistical Problems

1. The population of interest

2. One or more variables (characteristics of the population units) that are to be investigated

3. The sample of population units

4. The inference about the population based on information contained in the sample

5. A measure of the reliability of the inference

1.4 Types of Data

All data (and hence the variables we measure) can be classified as one of two general types: quantitative data and

qualitative data.

Quantitative data are measurements that are recorded on a naturally occurring numerical scale.

Often, we assign arbitrary numerical values to qualitative data for ease of computer entry and analysis. But these assigned

numerical values are simply codes: They cannot be meaningfully added, subtracted, multiplied, or divided. For example, we

might code Democrat = 1, Republican = 2, and Independent = 3. Similarly, a taste tester might rank four barbecue sauces

from 1 (best) to 4 (worst). These are simply arbitrarily selected numerical codes for the categories and have no utility beyond

that.

Qualitative (or categorical) data are measurements that cannot be measured on a natural numerical scale; they can only

be classified into one of a group of categories.
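
The distinction matters in practice because software will happily do arithmetic on category codes. The following sketch uses hypothetical responses with the coding scheme from the text (Democrat = 1, Republican = 2, Independent = 3) to contrast a meaningful average of a quantitative variable with a count of categories for a qualitative one.

```python
from collections import Counter
from statistics import mean

# Coding scheme from the text; the responses themselves are hypothetical.
PARTY_CODES = {1: "Democrat", 2: "Republican", 3: "Independent"}
coded_parties = [1, 3, 2, 1, 1, 3, 2, 1]  # qualitative data stored as codes
ages = [34, 51, 29, 42, 67, 38, 45, 30]   # quantitative data (years)

# Arithmetic on a quantitative variable is meaningful.
average_age = mean(ages)

# For a qualitative variable, count category membership instead of averaging codes.
party_counts = Counter(PARTY_CODES[code] for code in coded_parties)

print(f"average age: {average_age}")
print(f"party counts: {dict(party_counts)}")
# mean(coded_parties) would run without error, but its value depends
# entirely on the arbitrary coding and has no interpretation.
```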

1.5 Collecting Data: Sampling and Related Issues

Generally, you can obtain data in three different ways:

1. From a published source

2. From a designed experiment

3. From an observational study (e.g., a survey, which is the most common type)



A designed experiment is a data collection method where the researcher exerts full control over the characteristics of

the experimental units sampled. These experiments typically involve a group of experimental units that are assigned the

treatment and an untreated (or control) group.
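
One way a researcher might form the two groups is sketched below. Note the assumption: the text does not say how units are assigned, so random assignment is used here as a common choice, and the function name `assign_groups` and the unit labels are hypothetical.

```python
import random


def assign_groups(units, seed=None):
    """Randomly split units into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(units)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical experimental units labelled by ID.
units = [f"unit-{i:02d}" for i in range(1, 21)]
treatment, control = assign_groups(units, seed=1)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```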

An observational study is a data collection method where the experimental units sampled are observed in their natural

setting. No attempt is made to control the characteristics of the experimental units sampled. (Examples include opinion

polls and surveys.)

Regardless of which data collection method is employed, it is likely that the data will be a sample from some population.

And if we wish to apply inferential statistics, we must obtain a representative sample.

A representative sample exhibits characteristics typical of those possessed by the target population.

The most common way to satisfy the representative sample requirement is to select a random sample. A simple random

sample ensures that every subset of fixed size in the population has the same chance of being selected as the sample.

A simple random sample of n experimental units is a sample selected from the population in such a way that every

different sample of size n has an equal chance of selection.

The procedure for selecting a simple random sample typically relies on a random number generator. Random number

generators are available in table form online, and they are built into most statistical software packages. In addition to simple

random samples, there are more complex random sampling designs that can be employed. These include (but are not limited

to) stratified random sampling, cluster sampling, systematic sampling, and randomized response sampling. No matter what

type of sampling design you employ to collect the data for your study, be careful to avoid selection bias.
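
As a minimal sketch of selecting a simple random sample with a software random number generator, the example below uses Python's standard `random` module. The sampling frame (ID numbers 1 to 10,000), the sample size of 25, and the seed are all hypothetical choices made for illustration.

```python
import random


def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)    # a seeded generator makes the draw reproducible
    return rng.sample(units, n)  # sampling without replacement


# Hypothetical sampling frame: ID numbers for a population of 10,000 units.
population_ids = list(range(1, 10_001))
sample_ids = simple_random_sample(population_ids, n=25, seed=2024)
print(sorted(sample_ids))
```

Fixing the seed is optional. It is useful when the selection needs to be documented or repeated.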

Selection bias results when a subset of experimental units in the population has little or no chance of being selected for

the sample.

This results in samples that are not representative of the population.
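
The effect of selection bias can be seen in a small simulation. Everything in this sketch is hypothetical: the population is 10,000 ages drawn uniformly between 18 and 90, and the biased design gives units aged 65 or older no chance of selection.

```python
import random
from statistics import mean

rng = random.Random(7)

# Hypothetical population: 10,000 ages drawn uniformly from 18 to 90.
population = [rng.randint(18, 90) for _ in range(10_000)]

# Simple random sample: every unit can be selected.
srs = rng.sample(population, 500)

# Biased design: units aged 65 or older have no chance of selection.
reachable = [age for age in population if age < 65]
biased = rng.sample(reachable, 500)

print(f"population mean age:  {mean(population):.1f}")
print(f"simple random sample: {mean(srs):.1f}")
print(f"biased sample:        {mean(biased):.1f}")
```

The simple random sample should give a mean close to the population mean, while the biased sample underestimates it no matter how many units are drawn, because older units never enter the sample.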

Nonresponse bias is a type of selection bias that results when data are not obtained for every experimental unit in a sample, because certain questions are not answered in a meaningful way.

Finally, even if your sample is representative of the population, the data collected may suffer from measurement error.

Measurement error refers to inaccuracies in the values of the data collected. In surveys, the error may be due to ambiguous

or leading questions and the interviewer’s effect on the respondent.

1.6 The Role of Statistics in Critical Thinking and Ethics

The growth in data collection associated with scientific phenomena, business operations, and government activities (quality

control, statistical auditing, forecasting, etc.) has been remarkable in the past several decades. Consequently, each of us has to

develop a discerning sense—an ability to use rational thought to interpret and understand the meaning of data. Quantitative

literacy can help you make intelligent decisions, inferences, and generalizations; that is, it helps you think critically using

statistics.

Statistical thinking involves applying rational thought and the science of statistics to critically assess data and inferences.

Fundamental to the thought process is that variation exists in populations of data.

Exercises

1- Extinct birds. Biologists at the University of California (Riverside) are studying the patterns of extinction in the New

Zealand bird population. (Evolutionary Ecology Research, July 2003.) At the time of the Maori colonization of New Zealand

(prior to European contact), the following variables were measured for each bird species:

a. Flight capability (volant or flightless)

b. Type of habitat (aquatic, ground terrestrial, or aerial terrestrial)

c. Nesting site (ground, cavity within ground, tree, cavity above ground)

d. Nest density (high or low)

e. Diet (fish, vertebrates, vegetables, or invertebrates)

f. Body mass (grams)

g. Egg length (millimeters)

h. Extinct status (extinct, absent from island, present)

Identify each variable as quantitative or qualitative.

2- Insomnia and education. Is insomnia related to education status? Researchers at the Universities of Memphis, Alabama

at Birmingham, and Tennessee investigated this question in the Journal of Abnormal Psychology (Feb. 2005). Adults living

in Tennessee were selected to participate in the study, which used a random-digit telephone dialing procedure. Two of the

many variables measured for each of the 575 study participants were number of years of education and insomnia status

(normal sleeper or chronic insomniac). The researchers discovered that the fewer the years of education, the more likely the

person was to have chronic insomnia.

a. Identify the population and sample of interest to the researchers.

b. Identify the data collection method. Are there any potential biases in the method used?

c. Describe the variables measured in the study as quantitative or qualitative.

d. What inference did the researchers make?



3- Drafting NFL quarterbacks. The Journal of Productivity Analysis (Vol. 35, 2011) published a study of how successful

National Football League (NFL) teams are in drafting productive quarterbacks. Data were collected for all 331 quarterbacks

drafted over a 38-year period. Several variables were measured for each QB, including draft position (one of the top 10

players picked, selection between picks 11–50, or selected after pick 50), NFL winning ratio (percentage of games won), and

QB production score (higher scores indicate more productive QBs). The researchers discovered that draft position was only

weakly related to a quarterback’s performance in the NFL. They concluded that “quarterbacks taken higher [in the draft] do

not appear to perform any better.”

a. What is the experimental unit for this study?

b. Identify the type (quantitative or qualitative) of each variable measured.

c. Is the study an application of descriptive or inferential statistics? Explain.

Acknowledgement

The core content of these slides is from the textbook of this course:

Statistics (13th Edition) by James McClave and Terry Sincich
