
Introduction To Statistics

This document provides an introduction to statistics and key concepts. It defines statistics as the science of collecting, organizing, analyzing, and interpreting data to make decisions. It distinguishes between descriptive and inferential statistics, with descriptive focusing on summarizing data and inferential on drawing conclusions from samples. Variables are defined as characteristics that can assume different values, and are classified as quantitative or qualitative. The document also discusses populations, samples, parameters, and statistics.

Uploaded by

Laiba Zahir
Copyright
© All Rights Reserved

Fourth Edition

1
Introduction to Statistics & Graphs

Chapter Outline
• Introduction:
Definition of statistics.
• Descriptive and Inferential Statistics:
Differentiate between the two branches of statistics,
Statistical terms.
• Variables and Types of Data:
Identify types of data.

Rizwan Yusuf Khan
Associate Professor
What is STATISTICS?
Statistics is the science of collecting, organizing,
analyzing, and interpreting data in order to make
decisions.
- Data are the values that the variables can assume.
- Variables. A characteristic that varies from one individual or
object to another is called a variable. For example, age is a variable
because it varies from person to person. A variable can assume a number
of values. Variables may be classified as quantitative or qualitative
(attribute) according to the form of the characteristic of interest.
When a characteristic can be expressed numerically, such as age,
weight, income or number of children, it is called a quantitative
variable. On the other hand, if the characteristic is non-numerical,
such as education, gender, quality or intelligence, the variable is
referred to as a qualitative variable.
- Branches of Statistics
The study of statistics can be categorized into two main branches.
These branches are descriptive statistics and inferential statistics.
1. Descriptive Statistics
Collecting, organizing, summarizing and displaying data.
2. Inferential Statistics
Drawing conclusions about a population based on sample data.
(A basic tool in the study of inferential statistics is probability.)

To collect data for any statistical study, a population must first be
defined. Researchers gather data from a sample. They use this
information to make inferences about the population that the sample
represents. A population consists of all subjects (human or otherwise)
that are being studied. A sample is a group of subjects selected from a
population. Thus, sample and population are relative terms.
A population is a whole and a sample is a fraction or segment of that
whole.

A specific characteristic of a population is called a parameter.
A specific characteristic of a sample is called a statistic.

Example # 1: A survey conducted among 1017 men and women by
Opinion Research Corporation found that 76% of women and 60% of
men had a physical examination within the previous year. (Source: Men’s Health)
(a) Identify the descriptive aspect of the survey.
(b) What inferences could be drawn from this survey.

Descriptive Statistics: 76% of women and 60% of men had a physical
examination within the previous year.

Inferential Statistics: A higher percentage of women had a physical
examination within the previous year.

PRACTICE QUESTION

Example # 2: A large sample of men, aged 48, was studied for 18
years. For unmarried men, 60% to 70% were alive at age 65. For
married men, 90% were alive at age 65. Which part of the study
represents the descriptive branch of statistics? What conclusions might
be drawn from this study using inferential statistics? (Source: The Journal of Family Issues)

Descriptive Statistics:
Married men: 90% were alive at age 65.
Unmarried men: 60% to 70% were alive at age 65.

Inferential Statistics:
Being married is associated with a longer life for men.

PRACTICE QUESTIONS
Identify the population and the sample. Identify a parameter and a statistic.
• In a recent survey, 3002 American adults were asked if they read news on
the Internet at least once a week. Six hundred of the adults said yes.
• The annual salary for each employee at a company.
• The speed of every fifth car passing a police speed trap.
• A survey of 1420 U.S. undergraduate English majors asked which
Shakespearean play was most relevant in the year 2004.
• A survey of 500 students from a university with 2000 students.
• A recent survey of a sample of MPhil’s reported that the average
starting salary for an MPhil is less than $65,000.
• Starting salaries for the 667 MS graduates from the University of
Chicago School of Business increased 8.5% from the previous year.
• In 2007, the interest category for 12% of all new magazines was sports.
• The average annual salary for 35 of a company’s 1200 accountants
is $57,000.

Types of Data
Data
• Categorical
• Numerical
   - Discrete
   - Continuous

Data Sources
• Primary (Data Collection): Observation, Survey, Experimentation
• Secondary (Data Compilation): Print or Electronic

PRACTICE
(Basic Skills & Concepts)
• How is a sample related to a population?
  A sample is a subset of a population.
• Why is a sample used more often than a population?
  It is usually impractical (too expensive & time consuming) to obtain all the population data.
• What is probability? Name two areas where probability is used.
  Probability deals with events that occur by chance. It is used in insurance and gambling.
• Give three reasons why samples are used in statistics.
  1. Saves time 2. Saves money 3. Can be used when the population is infinite.

PRACTICE
True or False
1. A statistic is a measure that describes a population characteristic. (False)
2. A sample is a subset of a population. (True)
3. Probability is used as a basis for descriptive statistics. (False)
4. The number of birds in a tree is an example of a continuous variable. (False)

Question # 1
One airline claims that less than 1% of its scheduled
flights out of Orlando International Airport depart late.
From a random sample of 200 flights, 1.5% were
found to depart later than the scheduled time.
(i) What is the population?
(ii) What is the sample?
(iii) What is the statistic?
(iv) Is 1.5% a parameter or a statistic?
Question # 2
Your university surveyed its students to determine
average weekly time spent surfing the Internet. From a
random sample of 174 students the average time was
computed to be 6.1 hours.
(i) What is the population?
(ii) What is the sample?
(iii) What is the statistic?
(iv) Is the value 6.1 hours a parameter or a statistic?

Question # 3
Determine if descriptive or inferential statistics should be used
to obtain the following information.
a. A graph that shows the number of defective bottles
produced during the day shift over one week’s time.
• Descriptive -- To describe information about a one-week sample.
b. An estimate of the percentage of employees who arrive to
work late.
• Inferential -- To estimate the true percentage of all employees who arrive to work late.
c. An indication of the relationship between years of
employee experience and pay scale.
• Inferential – To predict the relationship between years of experience and pay scale.

Constant.
A quantity which can assume only one value is called a constant, for example e = 2.71828, π = 3.14159.
Variable.
A measurable quantity which changes from one individual to another is called variable.
Discrete Variable.
A variable which can take some specific values within a given range is called discrete variable.
Continuous Variable.
A variable which can take any value within a given range is called continuous variable.
Attribute.
A characteristic which cannot be measured numerically but only its presence or absence can be described is
called an attribute.
Frequency Distribution.
A frequency distribution is a table in which the values of a variable are grouped into classes and observed
frequencies are recorded.
Observation.
Any numerical value or reading recorded in the course of a study.
Data.
A single observation is known as datum and more than one observation is called data.
Ungrouped Data/Raw Data.
Data which have not been condensed in the form of frequency distribution are called ungrouped data or raw
data.
Grouped Data.
The data which have been condensed in the form of frequency distribution are called grouped data.
Primary Data.
Data obtained from the original source and by direct observation is called primary data.
Secondary Data.
A sequence of observations which have undergone any sort of statistical treatment at least once is called
secondary data.
Permutations.
An arrangement of ‘r’ objects taken from ‘n’ distinct objects in a particular order is called a permutation and is
denoted by $^{n}P_{r}$.
Combinations.
A selection of ‘r’ objects taken from ‘n’ distinct objects without regard to order is called a
combination and is denoted by $^{n}C_{r}$.
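The two counts can be checked numerically. The sketch below is only an illustration: the values n = 5 and r = 2 are hypothetical, and it relies on the standard formulas n!/(n − r)! for permutations and n!/(r!(n − r)!) for combinations, which these slides do not state explicitly.

```python
import math

n, r = 5, 2  # hypothetical values, chosen only for illustration

# Ordered arrangements of r objects taken from n distinct objects.
permutations = math.perm(n, r)   # n! / (n - r)!

# Unordered selections of r objects taken from n distinct objects.
combinations = math.comb(n, r)   # n! / (r! * (n - r)!)

print(permutations)  # 20
print(combinations)  # 10
```

Each unordered selection of r objects can be arranged in r! ways, which is why the permutation count is r! times the combination count.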
Presentation of Data.
(i) Classification of data (ii) Tabulation (iii) Graphical Display
The process of arranging observations into homogeneous groups is known as classification. A table is a systematic
arrangement of data into columns and rows, and the process of arranging data into rows and columns is
called tabulation.
Aims of Classification. The main aims of classification are:
(i) To convert large sets of data into an easily understood summary.
(ii) To provide the ground for comparison and inference.
(iii) To delete the unimportant details.
(iv) To show the points of similarity and dissimilarity, and
(v) To reflect the important aspects of the data.
Diagrammatic and Graphic Representation.
Diagrammatic and graphic representation is the pictorial representation of numerical data. Diagrams and
graphs make a more effective and longer-lasting impression than simple figures do. They help the reader in
understanding the shape of the distribution of the data.
Graph means the drawing of geometrical curves in conformity with the given data. It is a representation of data by
a continuous curve. Diagram means the translation of statistical figures into geometrical figures. It is a one-, two- or
three-dimensional form of visual representation.
Important diagrams are bar diagrams, rectangles and pie diagrams. Similarly, important graphs of a frequency
distribution are the histogram, Pareto chart, frequency polygon, frequency curve, ogive or cumulative frequency
polygon and historigram.
Model.
A model is a relationship between variables which is intended to represent a real-life process, situation or
problem.
Decimal Places.
The figures to the right of the decimal point that are required to express the magnitude of a number to a specified
degree of accuracy. For example, 3.1415927 written to four decimal places is 3.1416; i.e., there are four
digits after the decimal point, namely 1, 4, 1 and 6 (where the final digit represents a rounding up or down
of any subsequent digits).
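The rounding in the example above can be reproduced as a quick check. This is a minimal sketch; using Python's decimal module with round-half-up is an assumption made here so that the rule matches ordinary hand rounding, not something the slides prescribe.

```python
from decimal import Decimal, ROUND_HALF_UP

value = Decimal("3.1415927")

# Keep four digits after the decimal point; the dropped digits (927)
# push the fourth digit up from 5 to 6.
rounded = value.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

print(rounded)  # 3.1416
```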
Organizing Numerical Data

Numerical Data: 41, 24, 32, 26, 27, 27, 30, 24, 38, 21

Ordered Array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Stem and Leaf Display:
2 | 1 4 4 6 7 7
3 | 0 2 8
4 | 1

Frequency Distributions: Tables, Histograms, Polygons, Ogive
Steps for constructing a frequency distribution:
• Find the highest value (H) and the lowest value (L).
• Find the range R: R = H – L
• Select the number of classes desired (usually between 5 and 20).
• Find the class width (h): h = R / (no of classes),
rounding the result up to the nearest whole number.
• Select a starting point for the lowest class limit. This can be the
smallest data value. Add the width to the lowest data value to
get the lower limit of the next class. Keep adding until you have
the desired no of classes. Subtract one unit from the lower limit
of the second class to get the upper limit of the first class. Then
add the width to each upper limit to get all the upper limits.
• Tally the data
• Find the numerical frequencies from the tallies
• Find the class boundaries
• Find the cumulative frequencies
• Find the class mark (X)
• Find the relative frequency

Example # 3:
(a) The following data represent the record high temperatures for each of the 50 states. Construct a
frequency distribution table with 7 classes.
(b) Find the class marks, relative frequencies, class boundaries & cumulative frequencies.
112 100 127 120 134 118 105 110 109 112 110 118 117 116
118 122 114 114 105 109 107 112 114 115 118 117 118 122
106 110 116 108 110 121 113 120 119 111 104 111 120 113
120 117 105 110 118 112 114 114

Solution
H = 134, L = 100, R = H – L = 134 – 100 = 34
No of classes = 7
h = R / (no of classes) = 34 / 7 = 4.9 ≈ 5 (rounded up).
Select the starting point for the lowest class limit. In this case 100 is used. Add
the width (h) to the starting point, keep adding until there are 7 classes.
Frequency Distribution Table

Classes     Tally                  f    X     Relative     Class            C
                                              Frequency    Boundaries
100 - 104   ∕∕                     2    102   0.04         99.5 – 104.5     2
105 - 109   ∕∕∕∕ ∕∕∕               8    107   0.16         104.5 – 109.5    10
110 - 114   ∕∕∕∕ ∕∕∕∕ ∕∕∕∕ ∕∕∕     18   112   0.36         109.5 – 114.5    28
115 - 119   ∕∕∕∕ ∕∕∕∕ ∕∕∕          13   117   0.26         114.5 – 119.5    41
120 - 124   ∕∕∕∕ ∕∕                7    122   0.14         119.5 – 124.5    48
125 - 129   ∕                      1    127   0.02         124.5 – 129.5    49
130 - 134   ∕                      1    132   0.02         129.5 – 134.5    50
Totals                             50   -     1.00         -                -
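The Example # 3 table can be rebuilt programmatically by following the construction steps. The sketch below is one possible implementation, not the method used to prepare the slides; the variable names are illustrative, and it assumes integer data, so that "one unit" is 1 and the class boundaries lie 0.5 beyond the class limits.

```python
import math

# Record high temperatures from Example # 3 (50 values).
temps = [
    112, 100, 127, 120, 134, 118, 105, 110, 109, 112, 110, 118, 117, 116,
    118, 122, 114, 114, 105, 109, 107, 112, 114, 115, 118, 117, 118, 122,
    106, 110, 116, 108, 110, 121, 113, 120, 119, 111, 104, 111, 120, 113,
    120, 117, 105, 110, 118, 112, 114, 114,
]
num_classes = 7

low, high = min(temps), max(temps)
width = math.ceil((high - low) / num_classes)  # 34 / 7 = 4.86, rounded up to 5

rows = []
cumulative = 0
for k in range(num_classes):
    lower = low + k * width
    upper = lower + width - 1              # one unit below the next lower limit
    f = sum(lower <= t <= upper for t in temps)
    cumulative += f
    rows.append({
        "limits": (lower, upper),
        "f": f,
        "mark": (lower + upper) / 2,       # class mark X
        "rel_f": f / len(temps),           # relative frequency
        "boundaries": (lower - 0.5, upper + 0.5),
        "cum_f": cumulative,               # cumulative frequency C
    })

for row in rows:
    print(row)
```

The printed rows should agree with the f, X, relative frequency, class boundary and C columns of the table.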
Example # 4: From the following data construct a frequency distribution
using 6 classes. Indicate the class boundaries and cumulative frequencies.
54.6 59.1 70.5 68.1
68.5 60.4 64.0 59.2
60.2 62.1 59.1 59.2
67.0 57.1 55.1 55.9
48.5 59.3 64.0 63.0
Solution
H = 70.5, L = 48.5, R = H – L = 70.5 – 48.5 = 22
No of classes = 6
h = R / (no of classes) = 22 / 6 = 3.7 ≈ 4 (rounded up).
Select the starting value for the lowest limit, 48.5, and keep adding 4
until there are 6 classes.
Subtract one unit (here 0.1, since the data are recorded to one decimal
place) from the lower limit of the second class, 52.5, to get the
upper class limit of the first class, 52.4; then add h to each upper limit
to get all the upper limits. Tally the data. Find the numerical
frequencies from the tallies.

Class limits    Tally       f    Class Boundaries    C
48.5 – 52.4     I           1    48.45 – 52.45       1
52.5 – 56.4     III         3    52.45 – 56.45       4
56.5 – 60.4     IIII III    8    56.45 – 60.45       12
60.5 – 64.4     IIII        4    60.45 – 64.45       16
64.5 – 68.4     II          2    64.45 – 68.45       18
68.5 – 72.4     II          2    68.45 – 72.45       20
Total           –           20   –                   –

Question # 4.
From the following data construct frequency distribution table with six classes.
Also find the Class marks, relative frequencies, class boundaries & cumulative
frequencies. 13, 6, 9, 3, 19, 8, 10, 11, 12, 8, 14, 16, 17, 10, 2, 5, 9, 7, 6, 10.

Exercises
Question # 5.
Find the class boundaries, class marks and class widths for the
following intervals.
(i) 7 – 13 (ii) 10.4 – 18.7 (iii) (-5) – (-1)
(iv) (-2.75) – 1.35 (v) 0.346 – 0.418 (vi) 78.49 – 86.72

Question # 6.
In a music competition, students are asked to rate the music on
five points scale A, B, C, D, E, where A represents the maximum
enjoyment and E represents minimum enjoyment. The ratings are
A, D, A, D, E, B, C, D, A, B, B, C, E, A, C, E, C, A, B, E, D, E, B,
A, B, E, E, C, B, A. Construct a frequency distribution for the
above rating.

Question # 7.
From the following data construct frequency distribution table with
five classes. Also find the Class marks, relative frequencies, class
boundaries & cumulative frequencies.
17, 14, 15, 14, 13, 10, 14, 7, 8, 10, 6, 25, 18, 21.
Answer: Here range is 25 – 6 = 19 & no of classes = 5, so h = 19/5 = 3.8 ≈ 4. Classes 6 – 9, 10 – 13, 14 – 17, 18 – 21, 22 – 25. Frequency f = 3, 3, 5, 2, 1
(total = 14). X = 7.5, 11.5, 15.5, 19.5, 23.5. R.f = 3/14, 3/14, 5/14, 2/14, 1/14. C.B = 5.5 – 9.5, 9.5 – 13.5, 13.5 – 17.5, 17.5 – 21.5, 21.5 – 25.5.
1. Histogram. A histogram is a bar graph of a frequency distribution.
2. Frequency Polygon. A frequency polygon is a line graph of a
frequency distribution in which the frequencies are plotted against the
midpoints of the classes.
3. Ogive. A graph showing the cumulative frequencies against the upper
class boundaries is called an ogive or cumulative frequency polygon.
4. Pie Chart. A pie chart is a circle that is divided into sectors
according to the percentage of frequencies in each category of the
distribution.
5. Time Series Graph. A time series graph represents data over a specific
period of time.
6. Pareto Chart. This chart is used to represent a frequency
distribution for a categorical variable; the frequencies are displayed by
the heights of vertical bars, which are arranged in order from
highest to lowest.
Example # 5: Using example # 3 construct Histogram, frequency polygon,
& ogive.

[Figure: Histogram. x-axis: Temperature (class boundaries 99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5); y-axis: Frequency (0 to 18).]

[Figure: Frequency Polygon. x-axis: Temperature (class marks 102, 107, 112, 117, 122, 127, 132); y-axis: Frequency (0 to 18).]

[Figure: Ogive. x-axis: Temperature (class boundaries 99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5); y-axis: Cumulative Frequency (0 to 50).]
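The coordinates behind the three graphs of Example # 5 follow directly from the Example # 3 table. The sketch below only computes the points to plot; the variable names are illustrative, and starting the ogive at (99.5, 0) is an assumed convention read off the figure's axis, not something the definitions state.

```python
from itertools import accumulate

# Class frequencies and class boundaries from the Example # 3 table.
freqs = [2, 8, 18, 13, 7, 1, 1]
boundaries = [99.5 + 5 * k for k in range(8)]   # 99.5, 104.5, ..., 134.5

# Histogram: one bar per class, spanning its boundaries, height = frequency.
histogram_bars = list(zip(boundaries[:-1], boundaries[1:], freqs))

# Frequency polygon: frequency plotted against the class midpoint.
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
polygon_points = list(zip(midpoints, freqs))

# Ogive: cumulative frequency plotted against the upper class boundary,
# starting from zero at the lowest boundary (assumed convention).
ogive_points = list(zip(boundaries, [0] + list(accumulate(freqs))))

print(histogram_bars)
print(polygon_points)
print(ogive_points)
```

Any plotting tool can then draw bars, a line through the polygon points, and a line through the ogive points.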
Question # 8.
Using the following histogram:
(i) Construct a frequency distribution that includes class limits, class
frequencies, midpoints and cumulative frequencies.
(ii) How many values are in the class 27.5 – 30.5?
(iii) How many values fall between 24.5 and 36.5?
(iv) How many values are below 33.5?
(v) How many values are above 30.5?

Example # 6: The following table lists the number of cellular telephone
subscribers in millions. Construct a time series graph for the number of cellular
subscribers. What can you conclude?

Year    Subscribers (in millions)
2001    5.3
2002    7.6
2003    11.0
2004    16.0
2005    24.1
2006    33.8
2007    44.0

[Figure: Time series graph. x-axis: Year (2001 to 2007); y-axis: Subscribers in millions (0 to 45).]

From the graph we can see that the number of subscribers has been increasing since 2001. Recent years
show greater increases.
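The conclusion of Example # 6 can be checked by computing the year-over-year increases. This is a small sketch using the table's values; the rounding to one decimal place is only there to suppress floating-point noise.

```python
# Cellular subscribers in millions, from the Example # 6 table.
subscribers = {
    2001: 5.3, 2002: 7.6, 2003: 11.0, 2004: 16.0,
    2005: 24.1, 2006: 33.8, 2007: 44.0,
}

years = sorted(subscribers)

# Increase from each year to the next, in millions of subscribers.
increases = {
    year: round(subscribers[year] - subscribers[prev], 1)
    for prev, year in zip(years, years[1:])
}

for year, gain in increases.items():
    print(year, gain)
```

Every increase is positive and each one is larger than the one before it, which is what "recent years show greater increases" describes.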
Example # 7: Construct a Pareto chart & pie chart for the total investment in the various categories
during the year 2005.

Investment Category    Investor A    Investor B    Investor C    Total
CD                     15.5          20            13.5          49
Stocks                 46.5          55            27.5          129
Bonds                  32            44            19            95
Savings                16            28            7             51
Total                  -             -             -             -

Solution
Pie Table
Investment    Investor A    Investor B    Investor C    Total         Percentages        Angle of Sectors      Cumulative
Category                                                Investment                       (Degrees)             Angles
Bonds         32            44            19            95            95 / 324 = 0.29    0.29 x 360 = 104.4    104.4
Stocks        46.5          55            27.5          129           129 / 324 = 0.40   0.40 x 360 = 144      248.4
Savings       16            28            7             51            51 / 324 = 0.16    0.16 x 360 = 57.6     306
CD            15.5          20            13.5          49            49 / 324 = 0.15    0.15 x 360 = 54       360
Total         -             -             -             324           1.00               360

[Figure: Pie Chart. Stocks 40%, Bonds 29%, Savings 16%, CD 15%.]
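The sector angles and the bar order for Example # 7 can be sketched as follows. The totals come from the table; rounding each share to two decimals before multiplying by 360 mirrors the arithmetic shown in the Pie Table, and the variable names are illustrative.

```python
# Total investment per category, from the Example # 7 table.
totals = {"CD": 49, "Stocks": 129, "Bonds": 95, "Savings": 51}
grand_total = sum(totals.values())  # 324

# Pareto chart: bars arranged from the highest total to the lowest.
pareto_order = sorted(totals, key=totals.get, reverse=True)

# Pie chart: share of the whole, then the sector angle in degrees.
sectors = {}
for category, amount in totals.items():
    share = round(amount / grand_total, 2)
    sectors[category] = (share, round(share * 360, 1))

print(pareto_order)
for category, (share, angle) in sectors.items():
    print(category, share, angle)
```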
Question # 9 Complete the following frequency distribution table and find class
boundaries & class mark.

Classes            Frequency    Relative Frequency    Cumulative Frequency
0.7312 – 0.7313    -            -                     -
0.7314 – 0.7315    23           -                     29
0.7316 – 0.7317    -            0.34                  -
0.7318 – 0.7319    17           0.17                  -
0.7320 – 0.7321    -            -                     92
0.7322 – 0.7323    -            -                     -
Total              100          1.00

Question # 10 Find the missing entries in the following frequency distribution table.
Class Limits    f    Relative Frequency    Cumulative Frequency    Cumulative Percentage
8 to –          –    –                     –                       25
– to –          –    0.05                  –                       –
– to –          –    –                     9                       –
– to –          –    0.30                  15                      –
– to 32         –    –                     –                       –
                –    –
Answer: Here range is 32 – 8 = 24 & no of classes = 5, so h = 24/5 = 4.8 ≈ 5. Thus limits are 8 to 12, 13 to 17, …, 28 to 32. The f against the 4th class is
15 – 9 = 6. Let the total frequency be X. Then 0.30X = 6 gives X = 20. Relative frequency 0.05 gives f = 1. f = 5, 1, 3, 6, 5. Rf = 0.25, 0.05, 0.15, 0.30, 0.25.
C = 5, 6, 9, 15, 20. C%age = 25, 30, 45, 75, 100.
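The answer to Question # 10 can be verified by recomputing the other columns from the frequencies. This is a minimal check that takes the frequencies from the stated answer and rebuilds the relative, cumulative and cumulative-percentage columns.

```python
from itertools import accumulate

# Frequencies from the answer to Question # 10.
frequencies = [5, 1, 3, 6, 5]
total = sum(frequencies)

relative = [f / total for f in frequencies]
cumulative = list(accumulate(frequencies))
cumulative_pct = [100 * c / total for c in cumulative]

print(total)
print(relative)
print(cumulative)
print(cumulative_pct)
```

The results match the entries given in the table: a relative frequency of 0.05 and 0.30 in the second and fourth classes, cumulative frequencies of 9 and 15, and a first cumulative percentage of 25.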
Question # 11
Shown here are four frequency distributions. Each is incorrectly constructed. State the
reason why.
(a) Class       Frequency
    27 – 32     1
    33 – 38     0
    39 – 44     6
    45 – 49     4
    50 – 55     2

(b) Class       Frequency
    5 – 9       1
    9 – 13      2
    13 – 17     5
    17 – 20     6
    20 – 24     3

(c) Class       Frequency
    123 – 127   3
    128 – 132   7
    138 – 142   2
    143 – 147   19

(d) Class       Frequency
    9 – 13      1
    14 – 19     6
    20 – 25     2
    26 – 28     5
    29 – 32     9

Answer: (a) Class width is not uniform. (b) Class limits overlap, and class width is not uniform. (c) A class has been omitted.
(d) Class width is not uniform.
Stem and Leaf Display
A stem and leaf display is a table used to display data. The 'stem', on the
left, displays the first digit or digits. The 'leaf', on the right, displays the
last digit. In EDA (exploratory data analysis) the stem and leaf display is the
most useful technique which gives us the rank order of the items in the
data set and the shape of the distribution. It is a convenient method of displaying
every piece of data by showing the digits of each number.
• In a stem-and-leaf plot, the greatest common place value of the
data is used to form stems (leading digits).
• The numbers in the next greatest place-value position are then
used to form the leaves (trailing digits).
• Rearrange the leaves in numerical order from least to greatest.
The stem and leaf plot, which is similar to a histogram, is a method of
organizing data and is a combination of sorting and graphing. It has the
advantage over a grouped frequency distribution of retaining the actual
data while showing them in graphic form.

Example # 8
At an outpatient testing center, a sample of 20 days showed the following
number of cardiograms done each day. Construct a stem and leaf plot for
the data and draw a conclusion. 25, 14, 36, 32, 31, 43, 32, 52, 20, 2, 33, 44,
32, 57, 32, 51, 13, 23, 44, 45.
Solution
Arrange the data in order: 02, 13, 14, 20, 23, 25, 31, 32, 32, 32, 32, 33, 36, 43, 44,
44, 45, 51, 52, 57. A display can be made by using the leading and trailing digit. For
example, for the value 32, 3 is the leading digit and 2 is the trailing digit. Now a plot
can be constructed as shown below.
Leading digit (stem)    Trailing digit (leaf)    Frequency
0                       2                        1
1                       3 4                      2
2                       0 3 5                    3
3                       1 2 2 2 2 3 6            7
4                       3 4 4 5                  4
5                       1 2 7                    3
From the above plot we can conclude that the distribution peaks in the center and
that there are no gaps in the data. For 7 of the 20 days, the number of patients
receiving cardiograms was between 31 and 36. The plot also shows that the testing
center treated from a minimum of 2 patients to a maximum of 57 patients in
any one day.
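The plot for Example # 8 can be generated mechanically. The sketch below assumes two-digit (or smaller) whole numbers, so the stem is the tens digit and the leaf is the ones digit; the function name `stem_and_leaf` is illustrative.

```python
# Cardiograms per day, from Example # 8.
cardiograms = [25, 14, 36, 32, 31, 43, 32, 52, 20, 2, 33, 44,
               32, 57, 32, 51, 13, 23, 44, 45]


def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted ones digits (leaves)."""
    stems = range(min(values) // 10, max(values) // 10 + 1)
    plot = {stem: [] for stem in stems}   # keep empty stems so gaps show
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return plot


plot = stem_and_leaf(cardiograms)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(map(str, leaves))}   ({len(leaves)})")
```

Sorting before splitting means the leaves come out already ordered, and the number in parentheses is the frequency column of the table.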
Question # 12
Circle the correct option i.e. A / B / C / D.
(1) If there is no gap between consecutive classes, the limits are called:
(A) Class limits (B) Class boundaries
(C) Class intervals (D) Class marks
(2) Data arranged in ascending or descending order of magnitude is called:
(A) Ungrouped data (B) Arrayed data
(C) Grouped data (D) Discrete frequency distribution
(3) The class interval (h) is the difference between:
(A) Two extreme values (B) Two successive frequencies
(C) Two largest values (D) Two successive upper limits
(4) The number of tally marks counted for each value or group is called:
(A) Class width (B) Frequency
(C) Class boundary (D) Class limit
Answer: 1. B 2. B 3. D 4. B
Example # 9
Here are the scores from two periods of math class. Students
took the same test. Prepare stem and leaf plot for the data.
Period 1: 76 79 85 58 97 94 82 81 75 63 60 92 75 98 83 58
72 57 70 81
Period 2: 57 60 88 85 79 70 65 98 97 59 58 65 62 77 77 75
73 69 82 81
Stem and Leaf Display
Solution

Notice that the data (numerical facts) are numbers between 57-98.
Create the stem by listing numbers from 5-9.

Period 1: 76 79 85 58 97 94 82 81 75 63 60 92 75 98 83 58 72 57 70 81

Rearrange the leaves in numerical order from least to greatest. A key
should be included when making a stem-and-leaf plot.

Unordered              Ordered
Stem | Leaf            Stem | Leaf
5 | 8 8 7              5 | 7 8 8
6 | 3 0                6 | 0 3
7 | 6 9 5 5 2 0        7 | 0 2 5 5 6 9
8 | 5 2 1 3 1          8 | 1 1 2 3 5
9 | 7 4 2 8            9 | 2 4 7 8
Key: 7 | 9 means 79

Match up the data to the stem-and-leaf. The last digit in 76 will match up with the stem 7. Then the last
digit in 79 will match up with the stem 7. Then the last digit in 85 will match up with the stem 8 and this
pattern will continue until all data have been recorded in the stem-and-leaf.
Stem and Leaf Display
Notice that the data (numerical facts) are numbers between 57-98. Create the
stem by listing numbers from 5-9.

Period 2: 57 60 88 85 79 70 65 98 97 59 58 65 62 77 77 75 73 69 82 81

Rearrange the leaves in numerical order from least to greatest.

Unordered              Ordered
Stem | Leaf            Stem | Leaf
5 | 7 9 8              5 | 7 8 9
6 | 0 5 5 2 9          6 | 0 2 5 5 9
7 | 9 0 7 7 5 3        7 | 0 3 5 7 7 9
8 | 8 5 2 1            8 | 1 2 5 8
9 | 8 7                9 | 7 8
Key: 7 | 9 means 79

Match up the data to the stem-and-leaf. The last digit in 57 will match up with the stem 5. Then the
last digit in 60 will match up with the stem 6. Then the last digit in 88 will match up with the stem 8
and this pattern will continue until all data have been recorded in the stem-and-leaf.
Example # 10
The table lists the maximum running speeds of various animals. Display
the speeds of the animals in an ordered stem-and-leaf plot.

SOLUTION

STEP 1 Choose the stems and leaves. The numbers range from 32 to 70, so let the
stems be the tens’ digits from 3 to 7. Let the leaves be the ones’ digits.

STEP 2 Write the stems first. Draw a vertical line segment next to the stems. Then
record each speed by writing its ones’ digit on the same line as its
corresponding tens’ digit.

STEP 3 Make an ordered stem-and-leaf plot. Include a key to
show what the stems and leaves represent.

Unordered Plot       Ordered Plot
3 | 9 2              3 | 2 9
4 | 5 7 0 3          4 | 0 3 5 7
5 | 0                5 | 0
6 |                  6 |
7 | 0                7 | 0

Key: 4 | 7 = 47
Example # 11
Bicycle Stunt Competition
The point totals (rounded to the nearest tenth) for the 20 participants in a
bicycle stunt competition are listed below. The rider with the greatest point
total out of 100 points wins. Display an ordered stem-and-leaf plot. Draw a
conclusion about the data.

89.4 90 87.5 84.3 89.7 90.3 91.1 91 86 84.1
89.2 86 89.1 88.2 89.5 85.6 90.5 90.2 91.1 88.9
Solution

Begin by making an unordered stem-and-leaf plot. Because the point totals
range from 84.1 to 91.1, the stems are the digits in the tens’ and ones’
places. The leaves are the digits in the tenths’ place.

Then make an ordered stem-and-leaf plot.

Unordered Plot         Ordered Plot
84 | 3 1               84 | 1 3
85 | 6                 85 | 6
86 | 0 0               86 | 0 0
87 | 5                 87 | 5
88 | 2 9               88 | 2 9
89 | 4 7 2 1 5         89 | 1 2 4 5 7
90 | 0 3 5 2           90 | 0 2 3 5
91 | 1 0 1             91 | 0 1 1
Key: 87 | 5 = 87.5
More than half of the participants finished near the top of the range, with
12 of the 20 participants having point totals greater than or equal to 89.

Related distributions can be compared using a back-to-back
stem and leaf plot. The back-to-back stem and leaf plot uses
the same digits for the stems of both distributions, but the
digits that are used for the leaves are arranged in order out
from the stems on both sides. The next example shows a
back-to-back stem and leaf plot.
Example # 12
The number of stories in two selected samples of tall buildings in Atlanta
and Philadelphia are shown. Construct a back-to-back stem and leaf plot
and compare the distribution.
Atlanta               Philadelphia
55 70 44 36 40        61 40 38 32 30
63 40 44 34 38        58 40 40 25 30
60 47 52 32 32        54 40 36 30 30
50 53 32 28 31        53 39 36 34 33
52 32 34 32 50        50 38 36 39 32
26 29
Solution
Arrange the data for both the data sets in order. Construct a stem and leaf
plot using the same digits as stems. Place the digits for the leaves for Atlanta
on the left side of the stem and the digits for the leaves for Philadelphia on
the right side as shown.
     Atlanta |   | Philadelphia

         986 | 2 | 5
  8644222221 | 3 | 000022346668899
       74400 | 4 | 0000
      532200 | 5 | 0348
          30 | 6 | 1
           0 | 7 |
Compare the distribution. The buildings in Atlanta have a larger variation in
the number of stories per building. Although both distributions are peaked in
the 30- to 39- story class, Philadelphia has more buildings in this class.
Atlanta has more buildings that have 40 or more stories than Philadelphia.
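The back-to-back plot of Example # 12 can be rebuilt from the two samples. This sketch assumes two-digit values and prints Atlanta's leaves in reverse so that they read outward from the shared stem; the helper name `leaves` is illustrative.

```python
# Numbers of stories, from Example # 12.
atlanta = [55, 70, 44, 36, 40, 63, 40, 44, 34, 38, 60, 47, 52, 32, 32,
           50, 53, 32, 28, 31, 52, 32, 34, 32, 50, 26, 29]
philadelphia = [61, 40, 38, 32, 30, 58, 40, 40, 25, 30, 54, 40, 36, 30, 30,
                53, 39, 36, 34, 33, 50, 38, 36, 39, 32]


def leaves(values, stem):
    """Sorted ones digits of the values whose tens digit equals stem."""
    return sorted(v % 10 for v in values if v // 10 == stem)


rows = {}
for stem in range(2, 8):
    # Left side is reversed so the smallest leaf sits next to the stem.
    left = "".join(map(str, reversed(leaves(atlanta, stem))))
    right = "".join(map(str, leaves(philadelphia, stem)))
    rows[stem] = (left, right)
    print(f"{left:>12} | {stem} | {right}")

# Counts behind the comparison: buildings with 40 or more stories.
atlanta_tall = sum(v >= 40 for v in atlanta)
philadelphia_tall = sum(v >= 40 for v in philadelphia)
print(atlanta_tall, philadelphia_tall)
```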

Question # 13
A teacher asked 10 of her students how many books they had read in the
last 12 months. Their answers were as follows:
12, 23, 19, 6, 10, 7, 15, 25, 21, 12. Prepare a stem and leaf plot.
Answer: Tip: The number 6 can be written as 06, which means that it has a stem of 0 and a leaf of 6. (stem: 0 1 2; leaf: 6 7, 0 2 2 5 9, 1 3 5)

Question # 14
The weights (to the nearest tenth of a kilogram) of 30 students were
measured and recorded as follows:
59.2, 61.5, 62.3, 61.4, 60.9, 59.8, 60.5, 59.0, 61.1, 60.7, 61.6, 56.3, 61.9,
65.7, 60.4, 58.9, 59.0, 61.2, 62.1, 61.4, 58.4, 60.8, 60.2, 62.7, 60.0, 59.3,
61.9, 61.7, 58.4, 62.2
Prepare an ordered stem and leaf plot for the data. Answer: (stem: 56 58 59 60 61 62 65)
Question # 15
The data shown represent the percentage of unemployed males and
females in 2005 for a sample of countries of the world. Using the whole
numbers as stems and the decimals as leaves, construct a back-to-back
stem and leaf plot and compare the distributions of the two groups.
Females                     Males
8.0 3.7 8.6 5.0 7.0         8.8 1.9 5.6 4.6 1.5
3.3 8.6 3.2 8.8 6.8         2.2 5.6 3.1 5.9 6.6
9.2 5.9 7.2 4.6 5.6         9.8 8.7 6.0 5.2 5.6
5.3 7.7 8.0 8.7 0.5         4.4 9.6 6.6 6.0 0.3
6.5 3.4 3.0 9.4             4.6 3.1 4.1 7.7
Answer: (stem: 0 1 2 3 4 5 6 7 8 9) The distribution for unemployed males is more
variable than the distribution for unemployed females. There are more unemployed
females than males world-wide.
Measurement Scales
Measurement: A set of rules for assigning numbers to represent
objects, traits, attributes, or behaviors.
Scale of Measurement: A system or scheme for assigning
values or scores to the characteristic being measured.
• The four scales of measurement are nominal, ordinal, interval,
and ratio. Each scale of measurement has its own limitations.

1. Nominal Scale
• The simplest of the four scales. Used for identification; the categories have no
intrinsic order (lesser to greater). Examples are gender, ethnicity, marital status
and most string variables.
• Categories are typically mutually exclusive (do not overlap).
• Numbers may be assigned to each category for ease of interpretation, but
the numbers are arbitrary values and not ordered, and thus should not be
manipulated or ranked (e.g., Male = 1, Female = 2 in an SPSS spreadsheet).
• Examples are Gender (Male, Female), Eye colour, Religion, Specialization
(Major field), Nationality, Zip code, Political affiliation, Species classification.
2. Ordinal Scale
• Allows one to rank people or objects according to the amount or quantity of
a characteristic they possess.
• Provides more information than nominal scales.
• Traditionally, ranking is ordered from “most” to “least”.
• Intervals between the ranks may not be consistent (for example the 1st
rank student may have a score of 95, the 2nd rank may have 90, and the
3rd rank may have 89).
• Other examples include Grades (A, B, C, D, F), Position (1st, 2nd, 3rd etc.), Ranking
of cricket players, Rating (poor, good, excellent), Socio-economic status (upper,
middle, lower), Size (small, medium, large), Educational attainment (elementary, high school,
college, graduate), Order of child in the family (eldest, second eldest … youngest), Wind
intensity.
3. Interval Scale
• Provides more information than either nominal or ordinal scales.
• Allows you to rank people or objects like an ordinal scale, but on a scale
with equal units (so 81, 82, and 83 are equidistant from each other).
• Many educational and psychological tests are designed to produce
interval level scores.
• Interval level data can be manipulated using math (addition, subtraction,
multiplication, and division) and most statistical procedures – typical of
teacher-made tests.
• However, these scales do not have a true zero point (a score of zero on a
test does not mean an attribute, like understanding, is completely absent).
• Examples are Temperature, IQ score, SAT score, Wind speed.
4. Ratio Scale
• Has the properties of an interval scale along with a true zero point.
• Examples include miles per hour, length, and weight.
• Can be used to interpret ratios between scores (e.g., 24 is twice 12 and
four times 6).
• Other examples include Age, Time, Salary, Distance.
Some scales of measurement have a natural zero and some do not. For example, height, weight and
distance have a natural zero (no height, no weight, no distance). Consequently, it makes sense to say
that 2 miles is twice as long as 1 mile. Such variables are ratio scale. On the other hand, calendar
year and temperature (in °C, degrees centigrade) do not have a natural zero.
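A short Python sketch may make the natural-zero point concrete. The temperatures 20 °C and 10 °C are hypothetical values chosen for illustration, and the conversion to kelvin (a temperature scale with a true zero) is an assumption brought in here; it does not appear in the slides.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (assumed here as the true-zero scale)."""
    return celsius + 273.15

# Ratio scale: distance has a natural zero, so the ratio is meaningful.
distance_ratio = 2 / 1                      # 2 miles vs 1 mile

# Interval scale: dividing Celsius readings gives a number with no physical meaning.
naive_celsius_ratio = 20 / 10               # looks like "twice as hot"

# The same two temperatures compared on a scale that does have a true zero.
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(distance_ratio, naive_celsius_ratio, round(kelvin_ratio, 3))
```

The Celsius ratio suggests a doubling, while on the true-zero scale the two temperatures differ by only about 3.5 percent. This is why ratios are reserved for ratio-scale data.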
Types of Scale & Tests

Question # 16
Classify each as nominal-level, ordinal-level, interval-level or ratio-level measurement.
(i) Weights of suitcases on a selected commercial airline flight.
(ii) Number of exams given in a statistics course.
(iii) Rating of word-processing programs as user-friendly.
(iv) Classification of students according to major field.
(v) IQ score.
(vi) Ages of students.
(vii) Marital status
(viii) Temperature of the city.
(ix) Rating of movies.
(x) Salaries of the top five executives in bank.
(xi) Nationality
(xii) Level of conflict
(xiii) Group size
(xiv) Body length
(xv) Number of children
(xvi) Year of birth
(xvii) Favorite animal

Answer: (i) Ratio (ii) Ratio (iii) Ordinal (iv) Nominal (v) Interval (vi) Ratio (vii) Nominal (viii) Interval
(ix) Ordinal (x) Ratio (xi) Nominal (xii) Ordinal (xiii) Ratio (xiv) Ratio (xv) Ratio (xvi) Interval
(xvii) Nominal
Question # 17

• The scale is ordinal. There is an inherent ordering in that a Major is higher than a
Captain, which is higher than a Lieutenant.
• Since clothes are categorized and have no inherent order, the scale is nominal.
• The scale is interval because there are equal intervals between temperatures but
no true zero point.
• It is ordinal because higher scores are better than lower scores. However, there is no
guarantee that the difference between, say, a 2 and a 3 represents the same difference in
knowledge as the difference between a 4 and a 5.
• Most statisticians agree that it is valid to compute means of ordinal data, although
some vehemently disagree.
CRITICAL THINKING PROBLEM
( No # 1 )

WATER-UTILITY COMPANY

What is misleading about this graphic that was used by a water-utility
company in its 2002 annual report to show the growth in its customer base?

[Graphic: Number of Metered Water Customers (thousands); vertical axis from 230 to 255]

Year        2002   2003   2004   2005   2006
Customers    235    237    245    247    250

Answer: Because the origin is at 230, rather than 0, a visual impression of great
customer growth is created. The height of the water in the 2006 graph is four times
the height of the water in the 2002 graph.
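The "four times" figure in the answer can be reproduced with a small Python sketch. It uses the customer counts from the graphic; the helper name height_ratio is an illustrative choice.

```python
# Metered water customers (thousands), as read from the graphic.
customers = {2002: 235, 2003: 237, 2004: 245, 2005: 247, 2006: 250}

def height_ratio(values, first, last, baseline):
    """Ratio of drawn heights when the vertical axis starts at `baseline`."""
    return (values[last] - baseline) / (values[first] - baseline)

truncated = height_ratio(customers, 2002, 2006, baseline=230)  # axis starts at 230
honest = height_ratio(customers, 2002, 2006, baseline=0)       # axis starts at 0

print(truncated, round(honest, 3))  # 4.0 1.064
```

With the axis starting at 230 the 2006 level is drawn four times as high as the 2002 level, although the customer base grew by only about 6 percent.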

Answer # 7: Here range is 32 – 8 = 24 and number of classes = 5, so h = 24/5 = 4.8 ≈ 5.
Thus the limits are 8 to 12, 13 to 17, …, 28 to 32. The f against the 4th class is
15 – 9 = 6. Let the total frequency be X. Then 0.30X = 6 gives X = 20.
Relative frequency 0.05 gives f = 0.05 × 20 = 1.

Class Limits    f    Relative    Cumulative   Cumulative
                     Frequency   Frequency    Percentage
8 to –          5    0.25          5            25
– to –          1    0.05          6            30
– to –          3    0.15          9            45
– to –          6    0.30         15            75
– to 32         5    0.25         20           100
Total          20    1
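The arithmetic in Answer # 7 can be verified with the following Python sketch. It takes the completed cumulative-frequency column of the table as its input, which is an assumption about the order of working; the variable names are illustrative only.

```python
import math

low, high, num_classes = 8, 32, 5

# Class width: range divided by number of classes, rounded up (24 / 5 = 4.8 -> 5).
width = math.ceil((high - low) / num_classes)

# Class limits implied by the width, starting from the lowest value.
limits = [(low + i * width, low + (i + 1) * width - 1) for i in range(num_classes)]

# Cumulative frequencies as they appear in the completed table.
cumulative = [5, 6, 9, 15, 20]

# Each class frequency is the difference of consecutive cumulative frequencies.
freqs = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
total = cumulative[-1]

relative = [round(f / total, 2) for f in freqs]
cumulative_pct = [round(100 * c / total) for c in cumulative]

for row in zip(limits, freqs, relative, cumulative, cumulative_pct):
    print(row)
```

The frequencies come out as 5, 1, 3, 6, 5 with a total of 20, matching the table, and the first and last class limits agree with the 8 to 12 and 28 to 32 stated in the answer.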
CRITICAL THINKING PROBLEM
( No # 2 )

MISLEADING GRAPHS

Explain why the graph is misleading. Redraw the graph so that it is not misleading.

(i) [Pie chart: Company Sales; the 2nd quarter slice is labelled 15%]

Quarter   1st   2nd   3rd   4th
Sales     20%   15%   45%   20%

Answer: The pie chart should be displaying all four quarters, not just the first three.

(ii) [Chart: Company Sales; vertical axis from 90 to 120; horizontal axis "Quarters" in the order 3rd, 2nd, 1st, 4th]

Answer: When data is taken at regular intervals over a period of time, a time series
chart should be used.
Homework
EXERCISES. (Statistics for Business & Economics, Newbold, 6th Edition)
• Exercises on Page # 30.
• Problems #: 2.30 ~ 2.34.
• Exercises on page # 42 & 43.
• Problems #: 2.47, 2.48, 2.51, 2.52, 2.53, 2.55, 2.56

EXERCISES. (Elementary Statistics, Bluman, 5th Edition)
• Exercises on Page # 58 & 59.
• Problems #: 3, 4, 5, 19, 20.
• Exercises on page # 77.
• Problems #: 1, 3, 6, 8, 10, 11, 13, 14.
Definition of Quantitative Research
Quantitative research is defined as a systematic investigation of phenomena by
gathering quantifiable data and applying statistical, mathematical, or
computational techniques. Quantitative research collects information from existing
and potential customers using sampling methods and by sending out online surveys,
online polls, questionnaires, etc., the results of which can be expressed in
numerical form.
There are four main types of quantitative research.
1. Descriptive
2. Correlational
3. Quasi-experimental and
4. Experimental.
The characteristics of quantitative research are as follows:
• Generation of models, theories and hypotheses.
• Collecting empirical data.
• Modelling of data.
• Analysis of data.
• Experimental control.
• Variable manipulation.
• Development of instruments.
• Measurement methods.
What are the characteristics of quantitative research?
Its main characteristics are: The data is usually gathered using structured research
instruments. The results are based on larger sample sizes that are representative of the
population. The research study can usually be replicated or repeated, given its high
reliability.
Quantitative Advantages
Controlled, objective testing and experimentation ultimately support or
rejects your hypotheses. Each step is standardized to reduce bias when
collecting and analyzing data. A big advantage of this approach is that the
results are valid, reliable and generalizable to a larger population.
What is the difference between quantitative research and
qualitative research?
Quantitative research deals with numbers and statistics, while
qualitative research deals with words and meanings. Each of these
types of research has different objectives and methods, and both are
important for gaining different kinds of knowledge.
