
Exploring Data: Descriptive Data

Chapter 2 of Math 1530 covers categorical and quantitative variables, detailing their definitions, examples, and types including nominal, ordinal, interval, and ratio scales. It discusses methods for summarizing data graphically, such as pie charts, bar graphs, and histograms, as well as measures of central tendency and spread, including mean, median, mode, range, and standard deviation. The chapter also introduces concepts like quartiles, interquartile range, and boxplots for comparing distributions.

Uploaded by Marla Basa
Copyright © All Rights Reserved

Math 1530 - Chapter 2

Exploring Data
Descriptive Data

Content
• Types of Variables
• Describing data using graphical summaries
• Describing the Center of Quantitative Data
• Describing the Spread of Quantitative Data
• How Measures of Position Describe Spread

Variable
• A variable is any characteristic that is recorded for the subjects in a study
• Examples: Marital status, Height, Weight, IQ
• A variable can be classified as either
– Categorical (e.g. Male / Female)
– Quantitative (e.g. Age), which is either
• Discrete (number of children in family) or
• Continuous (weight: 70.25 kg)

Categorical Variable
• A variable is categorical if each observation belongs to one of a set of categories.
• Examples:
1. Gender (Male or Female)
2. Religion (Catholic, Jewish, …)
3. Type of residence (Apartment, House, …)
4. Belief in life after death (Yes or No)

Quantitative Variable
• A variable is called quantitative if observations take numerical values for different magnitudes of the variable.
• Examples:
1. Age
2. Number of brothers/sisters
3. Annual Income

Categorical vs. Quantitative
• Categorical variables
– the percentage of observations in each category is important
– e.g. % Male, % Female
• Quantitative variables
– the center (a representative value) and the spread (variability) are important
– e.g. the average age, and the variation around the average age


Discrete Quantitative Variable
• A quantitative variable is discrete if its possible values form a set of separate numbers: 0, 1, 2, 3, ….
• Examples:
1. Number of pets in a household
2. Number of children in a family
3. Number of foreign languages spoken by an individual

Continuous Quantitative Variable
• A quantitative variable is continuous if its possible values form an interval, so it has an infinite number of possible values
• Typically arises from measurements
• Examples:
1. Height/Weight
2. Age
3. Blood pressure

4 types of scale
• Nominal
• Ordinal
• Interval
• Ratio

Nominal Scale
• The nominal scale is the simplest scale.
• Its values are numbers or letters assigned to objects
– they serve as labels for identification or classification
• e.g. names and gender are categorical variables; gender might be coded as
– ‘M’ for Male and ‘F’ for Female,
– or ‘1’ for male and ‘2’ for female,
– or ‘1’ for female and ‘2’ for male.
• Other examples include marital status, religion, race, colour and employment status, and so forth.

Ordinal Scale
• A subset of the nominal scale
– where the scale follows an order
• The ordinal scale creates an ordered (ranked) relationship
• Typical ordinal scales
– (i) result of examination: first, second, third and fail;
– (ii) quality of products: ‘excellent’, ‘good’, ‘fair’ or ‘poor’;
– (iii) social class: upper, middle, lower class


Interval Scale
• Indicates order and distance in units.
• Differences (intervals) between values are meaningful and can be measured
• But the zero point is arbitrary
• Example: a price index
– the number for the base year (say year 2010) is usually set to 100
– Price of bread is 40 kn (= 100) in year 2010
– Price of bread is 50 kn (= 125) in year 2015
– We then know the price of bread is 25% higher in 2015
• Another example of an interval scale
– temperature, where the initial point is always arbitrary
– 0 degrees is the freezing point in Celsius (used in Europe)
– 32 degrees is the freezing point in Fahrenheit (used in the US)

Ratio Scale
• Ratio scales are absolute rather than relative
• If an interval scale also has an absolute zero
– then it is really a ratio scale.
• Absolute zero
– a point on the scale where the attribute is zero
• Examples
– age, money and weight are ratio scales
– because they possess an absolute zero and interval properties
– A person can’t have a negative weight or negative age

Describing data using graphical summaries

Frequency Table
• Frequency table
– a listing of possible values for a variable
– together with the number of observations
– or relative frequencies (%) for each value

Proportions & Percentages (Rel. Freq.)
• Be careful to distinguish proportions from percentages
– proportion: the number of observations in a category divided by the total number of observations
– percentage: the proportion multiplied by 100
• Proportions and percentages are also called relative frequencies.

Graphs for Categorical Variables
• Use pie charts and bar graphs to summarize categorical variables
1. Pie Chart: A circle having a “slice of pie” for each category
2. Bar Graph: A graph that displays a vertical bar for each category

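A frequency table with relative frequencies can be sketched in a few lines of Python. This is a minimal illustration, assuming a categorical variable stored as a list of labels; the function name `frequency_table` and the residence responses (including the "Dorm" category) are hypothetical, not taken from the slides.

```python
from collections import Counter


def frequency_table(observations):
    """Return {category: (count, proportion, percentage)}, most frequent first."""
    n = len(observations)
    table = {}
    for category, count in Counter(observations).most_common():
        proportion = count / n  # relative frequency as a fraction of 1
        table[category] = (count, proportion, 100 * proportion)
    return table


# Hypothetical responses for "type of residence" (illustrative only)
residence = ["Apartment", "House", "House", "Apartment",
             "House", "Dorm", "House", "Apartment"]

for category, (count, prop, pct) in frequency_table(residence).items():
    print(f"{category:<10} {count:>3} {prop:>6.3f} {pct:>6.1f}%")
```

Listing the categories from most to least frequent also gives the ordering that a Pareto chart uses.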
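The bread-price index from the Interval Scale slide can be checked with a short sketch, assuming the usual simple index 100 × (price in a given year) / (price in the base year). The helper `price_index` is a hypothetical name used only for illustration.

```python
def price_index(price, base_price, base_value=100):
    """Price expressed relative to the base-year price, with the base year set to base_value."""
    return base_value * price / base_price


bread_2010 = 40  # kn, base year
bread_2015 = 50  # kn

index_2010 = price_index(bread_2010, bread_2010)  # 100.0 by construction
index_2015 = price_index(bread_2015, bread_2010)  # 100 * 50 / 40 = 125.0
pct_change = index_2015 - index_2010              # percentage change from the base year

print(index_2010, index_2015, pct_change)
```

Because the base year is fixed at 100, subtracting 100 reads directly as the percentage change from 2010; comparing two non-base years would instead require dividing their index values.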

Pie Charts
• Summarize a categorical variable
• Drawn as a circle where each category is a slice
• The size of each slice is proportional to the percentage in that category

Bar Graphs
• Summarize a categorical variable
• Vertical bars for each category
• Height of each bar represents either counts or percentages
• Easier to compare categories with a bar graph than with a pie chart
• Called Pareto Charts when ordered from tallest to shortest

Histograms
• Graph that uses bars to portray frequencies or relative frequencies for a quantitative variable
• Frequency is always on the vertical axis
• Intervals are always on the horizontal axis

Constructing a Histogram
1. Divide the range of the data into intervals of equal width
2. Count # of observations in each interval
3. Label endpoints of intervals on the horizontal axis
4. Draw a bar over each value or interval with height equal to its frequency (or percentage)
5. Label and title
[Figure: histogram of Sodium in Cereals]

Interpreting Histograms
• Assess where a distribution is centered by finding the median
• Assess the spread of a distribution
• Shape of a distribution: roughly symmetric (left and right sides are mirror images), skewed to the right, or skewed to the left

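Steps 1 to 3 of constructing a histogram (equal-width intervals, counts, interval endpoints) can be sketched as follows, with a text version of step 4. The sodium values are hypothetical, not the cereal data from the slides, and treating each interval as closed on the left and open on the right is an assumption; conventions differ between textbooks and software. The text bars are drawn sideways, so frequency runs horizontally here rather than on the vertical axis.

```python
def histogram_counts(data, start, width, n_bins):
    """Split [start, start + n_bins * width) into equal-width intervals and count
    the observations in each. Intervals include the left endpoint, exclude the right."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)  # index of the interval containing x
        if not 0 <= k < n_bins:
            raise ValueError(f"{x} lies outside the chosen intervals")
        counts[k] += 1
    edges = [start + i * width for i in range(n_bins + 1)]  # interval endpoints
    return edges, counts


# Hypothetical sodium values in mg (illustrative only)
sodium = [0, 70, 125, 140, 180, 200, 200, 210, 220, 230, 250, 260, 290, 320]
edges, counts = histogram_counts(sodium, start=0, width=80, n_bins=5)

# One bar per interval, with length equal to its frequency
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"[{lo:3d}, {hi:3d})  {'#' * c}")
```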

Examples of Skewness

Shape: Type of Mound
[Figure examples: height of 10 year olds; electricity demand or demand for seats in a restaurant at different times of day]

Outlier
• An outlier falls far from the rest of the data

Time Plots
• Display a time series, data collected over time
• Plot each observation on the vertical axis against time on the horizontal axis
• Points are usually connected
[Figure: time plot from 1995–2001 of the number of people globally who use the Internet]

Describing the Center of Quantitative Data

Mean
• The mean is the sum of the observations divided by the number of observations
• It is the center of mass

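A minimal sketch of the mean as the sum of the observations divided by their number, using the ten ordered observations that appear on the Median slide. It also checks the center-of-mass idea: the deviations from the mean balance out and sum to zero, apart from floating-point rounding. The helper name `mean` is illustrative only.

```python
def mean(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)


# The ten ordered observations listed on the Median slide
data = [78, 91, 94, 98, 99, 101, 103, 105, 114, 121]

x_bar = mean(data)                      # 1004 / 10 = 100.4
deviations = [x - x_bar for x in data]  # deviation x - x̄ for each observation
balance = sum(deviations)               # zero up to floating-point rounding

print(x_bar, abs(balance) < 1e-9)
```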

Median
• Midpoint of the observations when ordered from least to greatest
1. Order observations
2. If the number of observations is:
a) Odd, the median is the middle observation
b) Even, the median is the average of the two middle observations
• Ordered data (n = 9): 78, 91, 94, 98, 99, 101, 103, 105, 114 → median = 99
• Ordered data (n = 10): 78, 91, 94, 98, 99, 101, 103, 105, 114, 121 → median = (99 + 101)/2 = 100

Comparing the Mean and Median
• The mean and median of a symmetric distribution are close
– The mean is often preferred because it uses all the data
• But in a skewed distribution, the mean is farther out in the skewed tail than is the median
– The median is preferred because it better represents a typical observation

Mode
• Value that occurs most often
• Highest bar in the histogram
• Mode is most often used with categorical data

Resistant Measures
• A measure is resistant if extreme observations (outliers) have little, if any, influence on its value
– Median is resistant to outliers
– Mean is not resistant to outliers
• Example: 75 people in class
– 72 people absent for 1 day each during the year
– 2 people absent for 50 days each
– 1 person absent for 100 days
– Median = 1 day
– Mean = (72 × 1 + 2 × 50 + 1 × 100)/75 = 272/75 ≈ 3.63 days
– Mode = 1 day

Describing the Spread of Quantitative Data

Range
• Range = max − min
• The range is strongly affected by outliers.
• Two teams with the same average (mean) height = 2.0 m
– Team 1: 2.1 m, 2.0 m, 1.9 m, 1.8 m, 2.2 m
– Team 2: 1.5 m, 1.8 m, 2.1 m, 2.5 m, 2.1 m

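The median recipe (order the data, then take the middle value or average the two middle values) and the resistance example can be sketched in Python. The data are the ordered observations and absence counts listed on the slides; `median` is a hypothetical helper written to follow the slide's steps, while the mean and mode come from Python's standard `statistics` module. With the counts as listed (72 people × 1 day, 2 × 50 days, 1 × 100 days), the mean works out to 272/75.

```python
import statistics


def median(xs):
    """Order the data, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


odd = [78, 91, 94, 98, 99, 101, 103, 105, 114]
even = odd + [121]
print(median(odd), median(even))  # 99 and (99 + 101) / 2 = 100.0

# Absence example: 75 people in class
absences = [1] * 72 + [50] * 2 + [100]
print(median(absences))                     # 1: the outliers do not move the median
print(round(statistics.mean(absences), 2))  # 272 / 75, about 3.63: pulled up by outliers
print(statistics.mode(absences))            # 1
```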
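The Range slide's point, same mean but different spread, can be checked with a short sketch. The ten heights are the ones printed on the slide; splitting them into two teams of five is an assumption reconstructed from the slide layout (it is a split under which both means equal 2.0 m), and `value_range` is a hypothetical helper name.

```python
from statistics import mean


def value_range(xs):
    """Range = max - min."""
    return max(xs) - min(xs)


# Heights in metres; the split into two teams is assumed from the slide layout
team_1 = [2.1, 2.0, 1.9, 1.8, 2.2]
team_2 = [1.5, 1.8, 2.1, 2.5, 2.1]

for name, team in [("Team 1", team_1), ("Team 2", team_2)]:
    # Same mean (2.0 m), but very different ranges
    print(name, round(mean(team), 2), round(value_range(team), 2))
```

Rounding is applied only for display, since subtracting decimal heights stored as binary floats leaves tiny representation errors.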

Standard Deviation
• Each data value has an associated deviation from the mean, x − x̄
• A deviation is positive if it falls above the mean and negative if it falls below the mean
• The sum of the deviations is always zero

Standard Deviation
• Standard deviation gives a measure of variation by summarizing the deviations of each observation from the mean and calculating an adjusted average of these deviations:
1. Find mean
2. Find each deviation
3. Square deviations
4. Sum squared deviations
5. Divide sum by n − 1
6. Take square root
• In symbols: s = √[Σ(x − x̄)² / (n − 1)]

Properties of Sample Standard Deviation
1. Measures spread of data
2. Only zero when all observations are the same; otherwise, s > 0
3. As the spread increases, s gets larger
4. Same units as observations
5. Not resistant
6. Strong skewness or outliers greatly increase s

Empirical Rule: Magnitude of s

How Measures of Position Describe Spread

Percentile
• The pth percentile is a value such that p percent of the observations fall below or at that value
[Figure: 70th percentile]

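The six steps for the sample standard deviation translate directly into code. This is a minimal sketch on hypothetical data (not from the slides); `sample_sd` is an illustrative name, and Python's `statistics.stdev` is printed alongside as a cross-check because it also divides by n − 1. It needs at least two observations.

```python
import math
import statistics


def sample_sd(xs):
    """Sample standard deviation s, following the six steps."""
    n = len(xs)
    x_bar = sum(xs) / n                    # 1. find the mean
    deviations = [x - x_bar for x in xs]   # 2. find each deviation x - x̄
    squared = [d * d for d in deviations]  # 3. square the deviations
    total = sum(squared)                   # 4. sum the squared deviations
    variance = total / (n - 1)             # 5. divide the sum by n - 1
    return math.sqrt(variance)             # 6. take the square root


# Hypothetical data (illustrative only)
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))
print(statistics.stdev(data))  # standard-library sample standard deviation
```

Here the mean is 5 and the squared deviations sum to 32, so s = √(32/7). Appending one extreme value such as 100 to the list makes s jump, which illustrates that s is not resistant.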
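The percentile definition (p percent of the observations at or below the value) can be sketched with the nearest-rank convention, which is an assumption on our part: it is one of several conventions in use, and software may report slightly different values. The helper name is hypothetical; the data are the ten ordered observations from the Median slide.

```python
import math


def percentile_nearest_rank(xs, p):
    """Smallest observation with at least p percent of the data at or below it.

    This is the nearest-rank convention; statistical software offers several
    other conventions that can give slightly different values.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    s = sorted(xs)
    rank = math.ceil(p * len(s) / 100)  # 1-based position in the ordered data
    return s[rank - 1]


data = [78, 91, 94, 98, 99, 101, 103, 105, 114, 121]
p70 = percentile_nearest_rank(data, 70)
share_at_or_below = sum(x <= p70 for x in data) / len(data)
print(p70, share_at_or_below)  # 103 0.7
```

Under this convention the 50th percentile of these ten values is 99, whereas the median rule for an even number of observations averages the two middle values to get 100; the difference is only a matter of convention.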

Finding Quartiles
• Quartiles split the data into four parts with the same number of observations in each part
1. Arrange data in order
2. The median is the second quartile, Q2
3. Q1 is the median of the lower half of the observations
4. Q3 is the median of the upper half of the observations

Measure of Spread: Quartiles
• Quartiles divide a ranked data set into four equal parts:
1. 25% of the data at or below Q1 and 75% above
2. 50% of the observations are above the median and 50% are below
3. 75% of the data at or below Q3 and 25% above
[Figure: Q1 = first quartile = 2.2, M = median = 3.4, Q3 = third quartile = 4.35]

Calculating Interquartile Range
• The interquartile range is the distance between the third and first quartile, giving the spread of the middle 50% of the data: IQR = Q3 − Q1

Criteria for Identifying an Outlier
• An observation is a potential outlier if:
– it falls more than 1.5 × IQR below the first quartile or
– more than 1.5 × IQR above the third quartile.
• Example: Q1 = 25, median = 50, Q3 = 75
– IQR = 75 − 25 = 50, so 1.5 × IQR = 75
– Outlier < 25 − 75 = −50
– Outlier > 75 + 75 = 150

5 Number Summary
• The five-number summary of a dataset consists of:
1. Minimum value
2. First Quartile
3. Median
4. Third Quartile
5. Maximum value

Boxplot
1. Box goes from Q1 to Q3 (the IQR)
2. Line is drawn inside the box at the median (the middle value)
3. Lines (whiskers) go
– from the lower end of the box to the smallest observation that’s not a potential outlier
– from the upper end of the box to the largest observation that’s not a potential outlier
4. Potential outliers are shown separately, often with * or +

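The quartile recipe, the IQR and the 1.5 × IQR outlier rule can be combined into a five-number-summary sketch. It assumes one common convention in which the overall median is left out of both halves when n is odd; other textbooks and software (for example the `method` options of Python's `statistics.quantiles`) can give slightly different quartiles. The helper names and the data list are hypothetical.

```python
def median(sorted_xs):
    """Median of an already-sorted list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2


def five_number_summary(xs):
    """(min, Q1, median, Q3, max); Q1 and Q3 are medians of the lower and upper
    halves, excluding the overall median from both halves when n is odd."""
    s = sorted(xs)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2:]
    return s[0], median(lower), median(s), median(upper), s[-1]


def outlier_fences(q1, q3):
    """1.5 x IQR rule: values below the lower fence or above the upper fence
    are potential outliers."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


print(outlier_fences(25, 75))  # (-50.0, 150.0) for Q1 = 25, Q3 = 75

# Hypothetical data with one suspiciously large value
data = [2, 3, 3, 4, 5, 6, 7, 8, 30]
mn, q1, med, q3, mx = five_number_summary(data)
low, high = outlier_fences(q1, q3)
print((mn, q1, med, q3, mx), q3 - q1, [x for x in data if x < low or x > high])
```

In a boxplot of these hypothetical data, the upper whisker would stop at 8 and the value 30 would be drawn separately as a potential outlier.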

Comparing Distributions using Boxplots
• Boxplots do not display the shape of the distribution as clearly as histograms
• but they are useful for making graphical comparisons of two or more distributions
[Figure: side-by-side boxplots on an axis in metres]
