
Ch1 Basic Terms

The document outlines basic terms and concepts in statistics, focusing on the role of data, individuals, variables, and the importance of transforming raw data into usable formats for analysis. It emphasizes the distinction between populations and samples, the significance of statistical techniques, and the challenges associated with conducting a census. Additionally, it categorizes variables into qualitative and quantitative types, detailing their measurement scales.

Uploaded by

Angie R
Copyright
© All Rights Reserved

Basic Terms

STA 215: Introductory Applied Statistics

Grand Valley State University


Department of Statistics

Winter 2025

STA 215 Basic Terms Winter 2025
1.1 Data and Its Role in Statistics

Introduction to Data and Statistics


Core concepts:
• Raw Data
• Data
• Individual
• Variable
• Observation
• Distribution of a Variable
• Census
• Population and Sample
• Statistic and Parameter
• Categorical Variable
• Quantitative Variable

Statistics

• The science of decision-making in the face of uncertainty.

• The science of transforming data into information for making decisions.

• The science of understanding variability and transforming data into information to make better decisions.

• The science of
  ◦ collecting,
  ◦ organizing,
  ◦ summarizing, and
  ◦ interpreting
  data for making decisions.

Statistics is about using the information in data to help guide decisions.
Definition: Raw Data

Raw data can come in various forms, depending on the source and purpose.

Raw Data

• Example (Figure 1.1): Play-by-play results of a baseball game before any analysis is done.
• Depicts play-by-play results of the May 6, 2010, baseball game between the Los Angeles
Angels and Boston Red Sox.
• Requires familiarity with baseball and the Box Score Explanation on the website for
interpretation.
• Details such as pitch locations are omitted, but the major characteristics of the game are captured.

Raw Data

Challenges with Figure 1.1 data:


• Not formatted for practical analysis.
• Example: Determining the number of base hits for the Red Sox from the raw
data would be complex.

Raw Data

• Example (Figure 1.2):


• Shows raw baseball data in a different format, used by a GVSU statistics major,
Jordan Jahnke, to analyze May 2009 games.
• Appears unintelligible ("gobbledygook") without processing.

Conclusion:
• In both examples, raw data must be transformed into a usable format to enable meaningful data analysis.
Definition: Data

Example: Processed baseball game data ready for statistical analysis.

Data

• Preparing raw data for analysis is a key part of a statistician's job.


• This text provides cleaned and ready-to-analyze data.
• Data is organized into user-friendly formats like spreadsheets or
software data windows.
• This allows students to focus on analysis rather than data
preparation.

Definition: Individual

Individual

• Individual: The objects, things, or entities on which we collect data.


• Examples: Students in a class, counties in a state, or bridges in
Michigan (not necessarily people).

Definition: Variable

Example: Weight, age, and white blood cell count in a medical experiment.

Definition: Observation

Example: The weight, age, and white blood cell count of a mouse in an experiment.

Summary

Individuals in the experiment: Mice are the individuals being studied in a medical
experiment.
Variables collected: Data includes weight, age, white blood cell count before
treatment, and white blood cell count after one week of treatment with a new
cancer medication.
Observation: An observation consists of all the variables (weight, age, and white
blood cell counts) recorded for a particular mouse.
Multiple observations: Data is typically collected on multiple mice, resulting in more
than one observation.
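To make the individual / variable / observation vocabulary concrete, here is a minimal Python sketch in which each mouse is an individual, each field is a variable, and each record is one observation. The class name `MouseObservation`, the field names, and all numeric values are hypothetical, made up purely for illustration; they are not data from the experiment described above.

```python
from dataclasses import dataclass, fields

# Hypothetical record type: one instance = one observation (all variables for one mouse).
@dataclass
class MouseObservation:
    mouse_id: str         # identifies the individual (the mouse)
    weight: float         # variable: weight
    age: int              # variable: age
    wbc_before: float     # variable: white blood cell count before treatment
    wbc_after_1wk: float  # variable: white blood cell count after one week of treatment

# Made-up values for illustration only; they are not real experimental data.
data = [
    MouseObservation("M1", 24.1, 10, 7.2, 6.5),
    MouseObservation("M2", 22.8, 9, 8.0, 7.1),
    MouseObservation("M3", 25.3, 11, 6.9, 6.8),
]

# The variables are the fields, excluding the ID that labels each individual.
variable_names = [f.name for f in fields(MouseObservation)][1:]

print("Variables:", variable_names)
print("Number of observations:", len(data))
print("First observation:", data[0])
```

Collecting data on three mice yields three observations, each holding values of the same four variables.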
Example 1.1: Structural Health of Bridges in Michigan
Scenario: A state report on the structural health of bridges.
Questions:
1. What is an individual in this data set?
Individual:
2. Name each variable included in the data set.
Variables:
3. Describe the data connected to the first individual in the data set.
Data:

Example 1.1: Structural Health of Bridges in Michigan
Scenario: A state report on the structural health of bridges.
Questions:
1. What is an individual in this data set?
Individual: A Michigan bridge.
2. Name each variable included in the data set.
Variables: Bridge, County, Route, NHS, Age, Inspection, Deficient,
Obsolete.
3. Describe the data connected to the first individual in the data set.
Data: B1 (Bridge), Alcona (County), US-23 (Route), Yes (NHS), 77
(Age), 9/22/2010 (Inspection), No (Deficient), No (Obsolete).
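The same structure applies to Example 1.1. The sketch below stores the first individual's values exactly as listed in the answer, in a Python dictionary keyed by variable name. Representing the whole data set as a list of such dictionaries (one per bridge) is an assumption about layout for illustration, not something the state report prescribes; only the first bridge is included because it is the only one described in the text.

```python
# The data set as a list of rows; each row (dict) is one observation on one bridge.
# Only the first individual from Example 1.1 is included here.
bridges = [
    {
        "Bridge": "B1",
        "County": "Alcona",
        "Route": "US-23",
        "NHS": "Yes",
        "Age": 77,
        "Inspection": "9/22/2010",
        "Deficient": "No",
        "Obsolete": "No",
    },
]

variables = list(bridges[0].keys())  # the variable names (columns), in insertion order
first = bridges[0]                   # the observation for the first individual

print("Variables:", ", ".join(variables))
for name in variables:
    print(f"{name}: {first[name]}")
```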
Example 1.1: Structural Health of Bridges in Michigan

Data and Its Role in Statistics
• Data is essential for informed decisions: Statistical tools help analyze data and
provide insights beyond anecdotes.
• Data beats anecdotes: Anecdotal evidence, like exceptional stories, is unreliable
for decision-making. Statistics focuses on both the typical and exceptional to
provide a balanced understanding.
• Statistical techniques promote objectivity: They reduce subjectivity by relying on data rather than personal opinions, though subjectivity still exists in selecting individuals, variables, and methods.
• Statistics accounts for uncertainty: Statistical thinking quantifies uncertainty in outcomes, helping us make decisions even when results vary among individuals.
Data and Its Role in Statistics

Data and Its Role in Statistics

• Perception of difficulty: Some believe statistics is hard to do and understand, but this text and STA 215 aim to ease those concerns.
• More than formulas: While statistics involves formulas, its essence lies in
designing quality data collection plans and answering well-defined research
questions.
• Role of humans: Computers excel at number crunching, but humans are essential
for interpreting results and understanding their implications.
• Critical thinking in statistics: The art of statistics includes analyzing what the
results reveal and identifying their limitations.
1.2 Core Concepts

Introduction to Data and Statistics


Core concepts:
• Raw Data
• Data
• Individual
• Variable
• Observation
• Distribution of a Variable
• Census
• Population and Sample
• Statistic and Parameter
• Categorical Variable
• Quantitative Variable

Core Concepts

• Purpose of statistics: Focuses on analyzing data to make informed decisions in the presence of uncertainty.
• Data structure: Comprises values collected from multiple individuals across one or
more variables.
• Focus of analysis: Revolves around examining variables and their relationships.
• Key concept: Understanding a variable's distribution is central to analyzing its
values.

Definition: Distribution of a Variable

Distribution of a Variable

• Every variable has a distribution, and understanding distributions is fundamental in statistics.
• Some distributions are so common they have specific names, like the bell-shaped
distribution for numeric data.
• The bell-shaped distribution is called the normal distribution, a key concept
explored in later chapters.

Example 1.2: High School GPA Data for Incoming Freshmen
Scenario: High school GPAs of all incoming freshmen at GVSU for fall 2020 (made-up data).
Questions:
1. What are the individuals in the data set?
Individuals:
2. What does the distribution of the variable "First Name" mean?
Distribution of the variable “First Name”:

3. What does the distribution of the variable "HSGPA" mean?


Distribution of the variable “HSGPA”:

Example 1.2: High School GPA Data for Incoming Freshmen
Scenario: High school GPAs of all incoming freshmen at GVSU for fall 2020 (made-up data).
Questions:
1. What are the individuals in the data set?
Individuals: The incoming freshmen.
2. What does the distribution of the variable "First Name" mean?
Distribution of the variable “First Name”: All the different first names in the data
set and how many students have each first name.
3. What does the distribution of the variable "HSGPA" mean?
Distribution of the variable “HSGPA”: All the different high school GPA values in
the data set and how many students have each GPA value.
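Describing a distribution as "all the different values and how many individuals have each value" translates directly into a frequency count. A minimal sketch using Python's `collections.Counter`; the names and GPA values below are invented for illustration (the example's own data are also made up, and are not reproduced here).

```python
from collections import Counter

# Hypothetical incoming freshmen: (first name, high school GPA). Values are made up.
students = [
    ("Ava", 3.6), ("Liam", 3.2), ("Ava", 3.9), ("Noah", 3.2),
    ("Emma", 3.6), ("Liam", 3.6), ("Mia", 2.8),
]

# Distribution of "First Name": each distinct name and how many students have it.
name_dist = Counter(name for name, _ in students)

# Distribution of "HSGPA": each distinct GPA value and how many students have it.
gpa_dist = Counter(gpa for _, gpa in students)

print(name_dist.most_common())

# A crude text bar chart of the HSGPA distribution, one row per distinct value.
for gpa, count in sorted(gpa_dist.items()):
    print(f"{gpa:.1f}: {'*' * count} ({count})")
```

The counts always add up to the number of individuals, since every student contributes exactly one value to each variable's distribution.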
Population and Sample

• Statistical techniques fall into two broad types: descriptive statistics and inference about the population.

Population and Sample

• Descriptive Statistics deals with the numerical summarization and graphical display of data.
• Inferential Statistics is the use of
sample data to draw conclusions
about the population from which
the sample was taken.

Definition: Population

Example 1.3: All 2005 used GM cars.

Definition: Sample

Example 1.3: A representative sample of over 800 2005 used GM cars.

Population and Sample

• Representative samples: A sample must reflect the population's characteristics for conclusions to be accurate.
• Distribution similarity: The distribution of each variable in the sample needs to be similar to that variable's distribution in the population.
• Impact of non-representative samples: Conclusions drawn from non-representative samples can be misleading, which gives statistics bad publicity.
• Sample importance: Inferences depend on the quality of the sample, because studying an entire population is often impractical. Occasionally, however, data on the complete population can be collected.
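Whether a sample's distribution resembles the population's can be explored with a small simulation. The sketch below builds a hypothetical population of a numeric variable, then compares a simple random sample (one common way to aim for representativeness; this section does not name a specific method) with a deliberately non-representative sample made of only the largest values. The population values, its size, and the sample size of 800 are assumptions for illustration.

```python
import random
import statistics

random.seed(215)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 values of some numeric variable (made-up).
population = [random.gauss(60, 15) for _ in range(10_000)]

# Simple random sample: every individual has the same chance of selection.
srs = random.sample(population, 800)

# Non-representative sample: only the 800 largest values.
biased = sorted(population)[-800:]

print(f"Population mean:         {statistics.mean(population):6.1f}")
print(f"Random sample mean:      {statistics.mean(srs):6.1f}")
print(f"Non-representative mean: {statistics.mean(biased):6.1f}")
```

The random sample's mean lands close to the population mean, while the non-representative sample overstates it badly, even though both samples have the same size.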
Definition: Census

Census

• Census advantages: A census provides complete data on the entire population, eliminating the need for inferences about variable distributions.
• Census challenges: Conducting a census is costly, time-consuming, and often unnecessary, and it can still undercount (e.g., the 2000 U.S. Census undercounted more than 3 million people).
• Undercount impact: Even a small discrepancy (e.g., 1.18%) can have significant consequences, such as billions of dollars in lost federal funding.
• Statistics focus: Typically, statistics uses sample data to make inferences about population distributions, relying on sample-based statistics to estimate population parameters.
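The undercount bullets rest on a simple proportion. As a back-of-the-envelope check using only the figures quoted above: if about 3 million missed people correspond to a 1.18% undercount, the population base is roughly 3,000,000 / 0.0118, about 254 million. Treating "more than 3 million" as exactly 3 million is an assumption, so this is a lower-end figure; the helper `people_missed` is hypothetical, introduced only for illustration.

```python
# Back-of-the-envelope check using only the figures quoted on the slide.
# Assumption: "more than 3 million" is treated as exactly 3,000,000, and the
# 1.18% rate is treated as (people missed) / (population base).
missed = 3_000_000
undercount_rate = 0.0118

implied_population = missed / undercount_rate
print(f"Implied population base: about {implied_population / 1e6:.0f} million")

def people_missed(population: int, rate: float) -> float:
    """Hypothetical helper: individuals missed when a proportion `rate` of `population` is undercounted."""
    return population * rate

# A small percentage of a large population is still millions of people.
print(f"{people_missed(250_000_000, 0.0118):,.0f} people missed")
```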
Definition: Statistic

Example: The average high school GPA of students in a sample.

Definition: Parameter

Example: The average high school GPA of all students in a population.

Summary

• Census – data collection from every individual in the population.
• Population – the set of units we are interested in. A population consists of all objects of a particular type.
• Sample – a subset of units from the population.
• Statistic – a quantity that is computed from a sample.
• Parameter – a characteristic of the population, such as the population mean or standard deviation.

Note: In practice, the value of a parameter is not known because we can rarely examine the entire population. We often use a statistic to estimate an unknown parameter.
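The statistic/parameter distinction can be seen by simulating a population whose parameter we happen to know. In this minimal sketch the population of high school GPAs is hypothetical (generated values clipped to a 0 to 4 scale), and the population size of 5,000 and sample size of 100 are assumptions for illustration. The population mean plays the role of the parameter; the mean of a random sample is the statistic used to estimate it.

```python
import random
import statistics

random.seed(2025)  # repeatable run

# Hypothetical population: high school GPAs of 5,000 students (made-up values, 0-4 scale).
population_gpas = [round(min(4.0, max(0.0, random.gauss(3.2, 0.4))), 2) for _ in range(5_000)]

# Parameter: computed from every individual in the population (usually unknown in practice).
parameter = statistics.mean(population_gpas)

# Statistic: computed from a random sample of 100 students; used to estimate the parameter.
sample = random.sample(population_gpas, 100)
statistic = statistics.mean(sample)

print(f"Population mean GPA (parameter): {parameter:.3f}")
print(f"Sample mean GPA (statistic):     {statistic:.3f}")
```

Drawing a second sample with another `random.sample(population_gpas, 100)` call typically gives a different statistic, while the parameter stays the same because the population has not changed.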

Types of Variables

• Categorical (qualitative)
• Quantitative (numeric)

Types of Variables

A categorical variable produces categorical data, and a quantitative variable produces quantitative data.

Definition: Categorical Variable

Definition: Quantitative Variable

Measurement scale

Variables by measurement scale

• Categorical
  ◦ Nominal: categories have no natural ordering from least to greatest (e.g., on-campus / off-campus residence, academic major).
  ◦ Ordinal: categories have a natural ordering, but differencing (i.e., subtraction) makes no sense (e.g., course grade).
• Quantitative
  ◦ Ratio: values have a natural ordering from least to greatest, and both differencing and multiplication make sense (e.g., weight, height, income, age).
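A small sketch of how the three scales behave in code. The nominal variable (residence) is a plain set of labels that supports only equality checks, the ordinal variable (course grade) keeps its categories in order so comparisons are meaningful, and the ratio variable (weight) supports both differences and ratios. The particular grade list and the two weights are assumptions for illustration.

```python
# Nominal: categories with no natural ordering; only equality/membership checks make sense.
residence_levels = {"on-campus", "off-campus"}

# Ordinal: categories with a meaningful order (hypothetical grade list, lowest to highest).
grade_order = ["F", "D", "C", "B", "A"]

def grade_rank(grade: str) -> int:
    """Position of a grade in the ordering; meant for comparisons, not arithmetic."""
    return grade_order.index(grade)

print(grade_rank("B") > grade_rank("C"))       # ordering is meaningful: B is above C
print(sorted(["A", "C", "B"], key=grade_rank))  # sort grades from lowest to highest
# Subtracting ranks (e.g., A minus C = 2) is NOT meaningful: the "gap" between
# ordinal categories has no defined size.

# Ratio: numeric values where differences and multiplication both make sense.
weight_a_lb, weight_b_lb = 180.0, 120.0
print(weight_a_lb - weight_b_lb)  # 60 pounds heavier
print(weight_a_lb / weight_b_lb)  # 1.5 times as heavy
```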
Measurement scale

Example 1.6: Library's Survey
Sometimes researchers convert categorical measurements into numerical measurements.
Scenario: A local library wants to gauge patrons' attitudes using a survey. The survey
employs a Likert scale with the responses: 1 = Strongly Disagree, 2 = Disagree, 3 = Neither
Agree nor Disagree, 4 = Agree, and 5 = Strongly Agree.
Questions:
1. Should we consider this to be quantitative data?
Answer:

2. What measurement scale is it?
Answer:
Example 1.6: Library's Survey
Sometimes researchers convert categorical measurements into numerical measurements.
Scenario: A local library wants to gauge patrons' attitudes using a survey. The survey
employs a Likert scale with the responses: 1 = Strongly Disagree, 2 = Disagree, 3 = Neither
Agree nor Disagree, 4 = Agree, and 5 = Strongly Agree.
Questions:
1. Should we consider this to be quantitative data?
Answer: No. It is categorical data with numbers assigned to categories to make
data entry easier.
2. What measurement scale is it?
Answer: The data has an ordinal scale of measurement.
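Because the 1 to 5 codes are only labels for ordered categories, a sensible first step is to map them back to their wording and tabulate the counts in scale order. A minimal sketch; the list of responses is made up for illustration and is not survey data from the library.

```python
from collections import Counter

# Codes from the library survey's Likert scale (as defined in Example 1.6).
LIKERT = {
    1: "Strongly Disagree",
    2: "Disagree",
    3: "Neither Agree nor Disagree",
    4: "Agree",
    5: "Strongly Agree",
}

# Hypothetical responses, entered as numbers only to make data entry easier.
responses = [4, 5, 3, 4, 2, 4, 5, 1, 3, 4]

counts = Counter(responses)

# Report the distribution in the scale's natural (ordinal) order, using the labels.
for code in sorted(LIKERT):
    print(f"{LIKERT[code]:<27} {counts.get(code, 0)}")
```

Averaging the codes would implicitly treat the distance from Disagree to Neither as equal to the distance from Agree to Strongly Agree, which an ordinal scale does not guarantee; tabulating the categories avoids that assumption.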
Homework Assignment

Reading Assignment
Chapter 1: Basic Terms, pp. 9–23
Textbook: STA 215 @ GVSU: Introductory Applied Statistics, 2023, by John Gabrosek
and Diann Reischman.

