Lecture-2 & 3
Probability Theory
Table 2.1 Ages of 50 students
21 19 24 25 29 34 26 27 37 33
18 20 19 22 19 19 25 22 25 23
25 19 31 19 23 18 23 19 23 26
22 28 21 20 22 22 21 20 19 21
25 23 18 37 27 23 21 25 21 24
Table 2.2 Status of 50 Students
J F SO SE J J SE J J J
F F J F F F SE SO SE J
J F SE SO SO F J F SE SE
SO SE J SO SO J J SO F SO
SE SE F SE J SO F J SO SO
ORGANIZING AND GRAPHING
QUALITATIVE DATA
• Frequency Distributions
• Relative Frequency and Percentage
Distributions
• Graphical Presentation of Qualitative Data
– Bar Graphs
– Pie Charts
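TABLE 2.3 Type of Employment Students Intend to Engage In
Type of Employment              Number of Students
Private companies/businesses    44
Federal government              16
State/local government          23
Own business                    17
                                Sum = 100
Type of employment is the variable, each row (for example, "Own business") is a category, and the second column is the frequency column (for example, 17 is the frequency of "Own business").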
Frequency Distributions
Definition
A frequency distribution for qualitative
data lists all categories and the number
of elements that belong to each of the
categories.
Example 2-1
A sample of 30 employees from large
companies was selected, and these
employees were asked how stressful
their jobs were. The responses of these
employees are recorded next where
very represents very stressful,
somewhat means somewhat stressful,
and none stands for not stressful at all.
Somewhat   None       Somewhat   Very       Very       None
Very       Somewhat   Somewhat   Very       Somewhat   Somewhat
Very       Somewhat   None       Very       None       Somewhat
Somewhat   Very       Somewhat   Somewhat   Very       None
Somewhat   Very       Very       Somewhat   None       Somewhat
Construct a frequency distribution table for
these data.
Solution 2-1
Table 2.4 Frequency Distribution of Stress on Job
Stress on Job    Frequency (f)
Very             10
Somewhat         14
None              6
                 Sum = 30
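The counting behind Table 2.4 can be sketched in R as follows. This is a minimal illustration rather than part of the original solution: the vector name `stress` is an assumption, and the 30 responses are typed in row by row from Example 2-1.

```r
# Responses from Example 2-1, entered row by row
stress <- c("Somewhat", "None", "Somewhat", "Very", "Very", "None",
            "Very", "Somewhat", "Somewhat", "Very", "Somewhat", "Somewhat",
            "Very", "Somewhat", "None", "Very", "None", "Somewhat",
            "Somewhat", "Very", "Somewhat", "Somewhat", "Very", "None",
            "Somewhat", "Very", "Very", "Somewhat", "None", "Somewhat")

# Fix the category order so the table reads Very, Somewhat, None
stress <- factor(stress, levels = c("Very", "Somewhat", "None"))

# table() counts how many elements fall in each category
freq <- table(stress)
print(freq)
print(sum(freq))   # total number of responses
```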
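Relative Frequency and Percentage Distributions
Calculating Relative Frequency of a Category
Relative frequency of a category = (Frequency of that category) / (Sum of all frequencies)
Calculating Percentage
Percentage = (Relative frequency) × 100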
Example 2-2
Determine the relative frequency and
percentage for the data in Table 2.4.
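Solution 2-2
Table 2.5 Relative Frequency and Percentage Distributions of Stress on Job
Stress on Job    Relative Frequency    Percentage
Very             10/30 = .333          33.3
Somewhat         14/30 = .467          46.7
None              6/30 = .200          20.0
                 Sum = 1.00            Sum = 100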
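As a rough sketch, the same relative frequencies and percentages can be computed in R. The frequencies 10, 14, and 6 are the counts of Very, Somewhat, and None among the 30 responses of Example 2-1; the variable names are illustrative assumptions.

```r
# Frequencies counted from the 30 responses of Example 2-1
freq <- c(Very = 10, Somewhat = 14, None = 6)

# Relative frequency = frequency of a category / sum of all frequencies
rel_freq <- freq / sum(freq)

# Percentage = relative frequency * 100
percentage <- 100 * rel_freq

dist <- data.frame(frequency = freq,
                   relative_frequency = round(rel_freq, 3),
                   percentage = round(percentage, 1))
print(dist)
```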
Example R Code for Categorical Data Frequency
Distribution
Let's assume you have survey data where respondents are asked about
their favorite programming language:
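A minimal sketch of such an example is given here. The ten responses are hypothetical, invented purely for illustration, and the names `languages`, `lang_freq`, `lang_rel`, and `lang_pct` are assumptions.

```r
# Hypothetical survey responses (invented for illustration only)
languages <- c("Python", "R", "Java", "Python", "C++",
               "R", "Python", "Java", "R", "Python")

# Frequency distribution: number of respondents per category
lang_freq <- table(languages)

# Relative frequency and percentage distributions
lang_rel <- prop.table(lang_freq)
lang_pct <- round(100 * lang_rel, 1)

print(lang_freq)
print(lang_rel)
print(lang_pct)
```

`table()` counts the elements in each category and `prop.table()` divides each count by the total, which mirrors the definitions given earlier.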
Definition
A graph made of bars whose heights
represent the frequencies of respective
categories is called a bar graph.
Figure 2.1 Bar graph for the frequency distribution of Table 2.4
[Bar graph: horizontal axis "Stress on Job" with bars for Very, Somewhat, and None; vertical axis "Frequency", scaled 0 to 16.]
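A bar graph of this kind can be drawn with base R's `barplot()`. The sketch below assumes the frequencies 10, 14, and 6 counted from Example 2-1; the title text and the variable names are illustrative.

```r
# Frequencies of the three categories (counted from Example 2-1)
freq <- c(Very = 10, Somewhat = 14, None = 6)

# Bar heights represent the frequencies of the respective categories;
# barplot() returns the x-positions of the bar midpoints
bar_mids <- barplot(freq,
                    xlab = "Stress on Job",
                    ylab = "Frequency",
                    ylim = c(0, 16),
                    main = "Bar graph of stress on job")
```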
Graphical Presentation of Qualitative Data cont.
Definition
A circle divided into portions that
represent the relative frequencies or
percentages of a population or a
sample belonging to different
categories is called a pie chart.
Table 2.6 Calculating Angle Sizes for the Pie Chart
Stress on Job    Relative Frequency    Angle Size
Very             .333                  360(.333) = 119.88
Somewhat         .467                  360(.467) = 168.12
None             .200                  360(.200) = 72.00
                 Sum = 1.00            Sum = 360
Figure 2.2 Pie chart for the percentage distribution of Table 2.5.
[Pie chart: Very 33.3%, Somewhat 46.7%, None 20.0%.]
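The angle sizes and the pie chart can be reproduced with a short R sketch. It uses the rounded relative frequencies .333, .467, and .200; the label format and variable names are assumptions made for illustration.

```r
# Rounded relative frequencies for the stress-on-job data
rel_freq <- c(Very = 0.333, Somewhat = 0.467, None = 0.200)

# Angle of each slice = 360 degrees * relative frequency
angles <- 360 * rel_freq
print(angles)

# Slice labels showing the percentage of each category
slice_labels <- sprintf("%s, %.1f%%", names(rel_freq), 100 * rel_freq)
pie(rel_freq, labels = slice_labels,
    main = "Percentage distribution of stress on job")
```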
Graphical Presentation of Data
TYPES OF DATA
• Qualitative
• Quantitative (graphs include the frequency curve)
Presentation of Qualitative Data
• Univariate: frequency table, percentages, pie chart, bar chart
• Bivariate: frequency table, component bar chart, multiple bar chart
Dividing the cell frequencies by the total frequency and
multiplying by 100 we obtain the following:
Medium of Institution    f       %
Urdu                     719     59.9 ≈ 60%
English                  481     40.1 ≈ 40%
Total                    1200
PIE CHART
Medium of Institution    f       Angle
Urdu                     719     215.7°
English                  481     144.3°
Total                    1200
[Pie chart: Urdu 215.7°, English 144.3°.]
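The percentages and angles for the medium-of-institution data can be checked with a few lines of R. This is only an arithmetic sketch; the variable names are assumptions.

```r
# Frequencies for the medium of institution
f <- c(Urdu = 719, English = 481)

pct   <- 100 * f / sum(f)   # percentage of each category
angle <- 360 * f / sum(f)   # pie-chart angle in degrees

print(round(pct, 1))
print(round(angle, 1))
```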
Bivariate Data:
Suppose that along with the enquiry about the Medium of Institution,
you are also recording the gender of the student.
MULTIPLE BAR CHART
Suppose we have information regarding the imports and exports of Pakistan
for the years 1970-71 to 1974-75 as shown in the table below:
Years      Imports (Crores of Rs.)    Exports (Crores of Rs.)
1970-71    370                        200
1971-72    350                        337
1972-73    840                        855
1973-74    1438                       1016
1974-75    2092                       1029
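One way to draw a multiple bar chart for this table in base R is sketched below. The matrix name `trade` and the legend placement are illustrative choices, not part of the original slides.

```r
years <- c("1970-71", "1971-72", "1972-73", "1973-74", "1974-75")

# One row per series, one column per year (values in crores of rupees)
trade <- rbind(Imports = c(370, 350, 840, 1438, 2092),
               Exports = c(200, 337, 855, 1016, 1029))
colnames(trade) <- years

# beside = TRUE places the Imports and Exports bars side by side for each year
barplot(trade, beside = TRUE,
        legend.text = TRUE, args.legend = list(x = "topleft"),
        xlab = "Years", ylab = "Crores of Rs.",
        ylim = c(0, 2500),
        main = "Imports and exports of Pakistan, 1970-71 to 1974-75")
```

Setting `beside = FALSE` instead would stack the two series in a single bar per year, which gives a component bar chart.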
Multiple Bar Chart Showing Imports & Exports of Pakistan 1970-71 to 1974-75
[Multiple bar chart: paired bars for Imports and Exports for each year from 1970-71 to 1974-75; vertical axis scaled 0 to 2500.]
ORGANIZING AND GRAPHING QUANTITATIVE DATA
• Frequency Distributions
• Constructing Frequency Distribution
Tables
• Relative and Percentage Distributions
• Graphing Grouped Data
– Histograms
– Polygons
Frequency Distributions
Table 2.7 Weekly Earnings of 100 Employees of a Company
Weekly Earnings (dollars)    Number of Employees (f)
401 to 600                   9
601 to 800                   22
801 to 1000                  39
1001 to 1200                 15
1201 to 1400                 9
1401 to 1600                 6
Weekly earnings is the variable and the second column is the frequency column. For example, 801 to 1000 is the third class, and 39 is the frequency of the third class.
Frequency Distributions cont.
Definition
A frequency distribution for
quantitative data lists all the classes and
the number of values that belong to
each class. Data presented in the form
of a frequency distribution are called
grouped data.
Example
Let’s say we have data representing the response times (in
milliseconds) of a server to user requests. The dataset contains the
following times:
102, 150, 178, 199, 102, 220, 145, 178, 204, 190, 175, 160, 120, 210,
198, 145, 102, 150
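For example, the relative frequency for the class 100–119 is 3/18 ≈ 0.167, because 3 of the 18 values (102, 102, 102) fall in that class.
# Sample dataset: server response times in milliseconds
data <- c(102, 150, 178, 199, 102, 220, 145, 178, 204, 190, 175, 160,
          120, 210, 198, 145, 102, 150)
# Class boundaries 100, 120, ..., 240 give the classes 100-119, 120-139, ..., 220-239
breaks <- seq(100, 240, by = 20)
# Assign each value to a class; right = FALSE makes each interval [lower, upper)
classes <- cut(data, breaks = breaks, right = FALSE)
freq <- table(classes)            # frequency of each class
rel_freq <- freq / length(data)   # relative frequency of each class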
Table 2.9 Home Runs Hit by Major League Baseball
Teams During the 2002 Season
Solution 2-3
The lower limit of the first class can be taken as 124
or any number less than 124. Suppose we take 124
as the lower limit of the first class. Then our classes
will be
124 – 145, 146 – 167, 168 – 189, 190 – 211,
and 212 - 233
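The class limits listed above can be generated programmatically. In this sketch the class width of 22 and the count of 5 classes are inferred from the listed classes (146 - 124 = 22); the variable names are assumptions.

```r
lower_first <- 124   # lower limit of the first class, as chosen in the text
width <- 22          # class width, inferred from the classes listed above
k <- 5               # number of classes

lower <- seq(lower_first, by = width, length.out = k)  # lower limits
upper <- lower + width - 1                             # upper limits

class_labels <- paste(lower, upper, sep = " - ")
print(class_labels)
```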
Table 2.10 Frequency Distribution for the Data of
Table 2.9
Example 2-4
Calculate the relative frequencies and
percentages for Table 2.10
Solution 2-4
Table 2.11 Relative Frequency and Percentage Distributions for Table 2.10
Total Home Runs    Class Boundaries            Relative Frequency    Percentage
124 – 145          123.5 to less than 145.5    .200                  20.0
146 – 167          145.5 to less than 167.5    .433                  43.3
168 – 189          167.5 to less than 189.5    .133                  13.3
190 – 211          189.5 to less than 211.5    .133                  13.3
212 – 233          211.5 to less than 233.5    .100                  10.0
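The relative frequencies and percentages for the home run classes can be verified with a short R sketch. It assumes the class frequencies 6, 13, 4, 4, and 3 given for the home run data in Example 2-7; the names are illustrative.

```r
# Class frequencies for total home runs (30 teams)
f <- c("124-145" = 6, "146-167" = 13, "168-189" = 4,
       "190-211" = 4, "212-233" = 3)

rel_freq <- round(f / sum(f), 3)        # relative frequency of each class
percentage <- round(100 * f / sum(f), 1) # percentage of each class

print(data.frame(f, rel_freq, percentage))
```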
Graphing Grouped Data
Definition
A histogram is a graph in which classes are marked
on the horizontal axis and the frequencies, relative
frequencies, or percentages are marked on the
vertical axis. The frequencies, relative frequencies, or
percentages are represented by the heights of the
bars. In a histogram, the bars are drawn adjacent to
each other.
Figure 2.3 Frequency histogram for Table 2.10.
[Histogram: horizontal axis "Total home runs" with classes 124–145, 146–167, 168–189, 190–211, 212–233; vertical axis "Frequency".]
Figure 2.4 Relative frequency histogram for Table 2.10.
[Histogram: horizontal axis "Total home runs" with classes 124–145, 146–167, 168–189, 190–211, 212–233; vertical axis "Relative frequency", scaled 0 to .40.]
Graphing Grouped Data cont.
Definition
A graph formed by joining the
midpoints of the tops of successive bars
in a histogram with straight lines is
called a polygon.
Example R Code for Histogram
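A minimal sketch of a histogram with an overlaid polygon follows. Because the raw home run values are not reproduced in these notes, the sketch assumes the server response-time data from the earlier example instead; the axis labels and title are illustrative.

```r
# Server response times in milliseconds (from the earlier example)
data <- c(102, 150, 178, 199, 102, 220, 145, 178, 204, 190, 175, 160,
          120, 210, 198, 145, 102, 150)
breaks <- seq(100, 240, by = 20)   # class boundaries

# right = FALSE makes each class include its lower boundary: [lower, upper)
h <- hist(data, breaks = breaks, right = FALSE,
          xlab = "Response time (ms)", ylab = "Frequency",
          main = "Histogram of server response times")
print(h$counts)   # frequency of each class

# Polygon: join the midpoints of the tops of successive bars
lines(h$mids, h$counts, type = "b")
```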
SHAPES OF HISTOGRAMS
1. Symmetric
2. Skewed
3. Uniform or rectangular
Figure 2.8 Symmetric histograms.
Figure 2.9 (a) A histogram skewed to the right. (b) A histogram skewed to the left.
Figure 2.10 A histogram with uniform distribution.
Figure 2.11 (a) and (b) Symmetric frequency curves. (c) Frequency curve skewed to the right. (d) Frequency curve skewed to the left.
CUMULATIVE FREQUENCY DISTRIBUTIONS
Definition
A cumulative frequency distribution
gives the total number of values that
fall below the upper boundary of each
class.
Example 2-7
Using the frequency distribution of
Table 2.10, reproduced in the next slide,
prepare a cumulative frequency
distribution for the home runs hit by
Major League Baseball teams during
the 2002 season.
Total Home Runs    f
124 – 145          6
146 – 167          13
168 – 189          4
190 – 211          4
212 – 233          3
Solution 2-7
Table 2.14 Cumulative Frequency Distribution of Home Runs by Baseball Teams
Class Limits    Class Boundaries            Cumulative Frequency
124 – 145       123.5 to less than 145.5    6
124 – 167       123.5 to less than 167.5    6 + 13 = 19
124 – 189       123.5 to less than 189.5    6 + 13 + 4 = 23
124 – 211       123.5 to less than 211.5    6 + 13 + 4 + 4 = 27
124 – 233       123.5 to less than 233.5    6 + 13 + 4 + 4 + 3 = 30
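The running sums in this solution correspond to R's `cumsum()`. A small sketch, with illustrative variable names:

```r
# Class frequencies for total home runs
f <- c(6, 13, 4, 4, 3)
upper_boundary <- c(145.5, 167.5, 189.5, 211.5, 233.5)

# Cumulative frequency: number of values below each upper class boundary
cum_f <- cumsum(f)
print(data.frame(upper_boundary, cum_f))
```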
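CUMULATIVE FREQUENCY DISTRIBUTIONS cont.
Cumulative relative frequency = (Cumulative frequency of a class) / (Total observations in the data set)
Cumulative percentage = (Cumulative relative frequency) × 100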
Table 2.15 Cumulative Relative Frequency and Cumulative Percentage
Distributions for Home Runs Hit by Baseball Teams
Class Limits    Cumulative Relative Frequency    Cumulative Percentage
124 – 145       6/30 = .200                      20.0
124 – 167       19/30 = .633                     63.3
124 – 189       23/30 = .767                     76.7
124 – 211       27/30 = .900                     90.0
124 – 233       30/30 = 1.00                     100.0
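As a hedged sketch, the cumulative relative frequencies and cumulative percentages can be obtained by dividing the cumulative frequencies by the total of 30. Variable names are assumptions.

```r
f <- c(6, 13, 4, 4, 3)            # class frequencies for total home runs
cum_f <- cumsum(f)                # cumulative frequencies

cum_rel <- cum_f / sum(f)         # cumulative relative frequencies
cum_pct <- 100 * cum_rel          # cumulative percentages

print(round(cum_rel, 3))
print(round(cum_pct, 1))
```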
CUMULATIVE FREQUENCY DISTRIBUTIONS cont.
Definition
An ogive is a curve drawn for the
cumulative frequency distribution by
joining with straight lines the dots
marked above the upper boundaries of
classes at heights equal to the
cumulative frequencies of respective
classes.
Figure 2.12 Ogive for the cumulative frequency distribution in Table 2.14
[Ogive: horizontal axis "Total home runs" marked at the class boundaries 123.5, 145.5, 167.5, 189.5, 211.5, 233.5; vertical axis "Cumulative frequency".]
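An ogive like this one can be sketched in base R. Starting the curve at zero above the lower boundary of the first class (123.5) is a common convention that is assumed here; the names and title are illustrative.

```r
# Class boundaries and cumulative frequencies for total home runs
boundaries <- c(123.5, 145.5, 167.5, 189.5, 211.5, 233.5)
cum_f <- c(0, cumsum(c(6, 13, 4, 4, 3)))   # 0 at the first lower boundary (assumed)

# Dots above the boundaries, joined with straight lines
plot(boundaries, cum_f, type = "b", xaxt = "n",
     xlab = "Total home runs", ylab = "Cumulative frequency",
     main = "Ogive for home runs")
axis(1, at = boundaries)   # label the x-axis at the class boundaries
```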
STEM-AND-LEAF DISPLAYS
Definition
In a stem-and-leaf display of
quantitative data, each value is divided
into two portions – a stem and a leaf.
The leaves for each stem are shown
separately in a display.
Example 2-8
The following are the scores of 30
college students on a statistics test:
75 52 80 96 65 79 71 87 93 95
69 72 81 61 76 86 79 68 50 92
83 84 77 64 71 87 72 92 57 98
Solution 2-8
To construct a stem-and-leaf display for
these scores, we split each score into
two parts. The first part contains the
first digit, which is called the stem. The
second part contains the second digit,
which is called the leaf.
We observe from the data that the
stems for all scores are 5, 6, 7, 8, and 9
because all the scores lie in the range
50 to 98.
Figure 2.13 Stem-and-leaf display.
Stems
5 | 2      (leaf for 52)
6 |
7 | 5      (leaf for 75)
8 |
9 |
After we have listed the stems, we read
the leaves for all scores and record
them next to the corresponding stems
on the right side of the vertical line.
Figure 2.14 Stem-and-leaf display of test scores.
5 | 2 0 7
6 | 5 9 1 8 4
7 | 5 9 1 2 6 9 7 1 2
8 | 0 7 1 6 3 4 7
9 | 6 3 5 2 2 8
Figure 2.15 Ranked stem-and-leaf display of test scores.
5 | 0 2 7
6 | 1 4 5 8 9
7 | 1 1 2 2 5 6 7 9 9
8 | 0 1 3 4 6 7 7
9 | 2 2 3 5 6 8
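The displays above can be approximated in R. Base R's `stem()` prints a stem-and-leaf display directly, although the layout it chooses may differ from the figures; the second part of the sketch rebuilds the ranked leaves by hand. Variable names are assumptions.

```r
# Scores of the 30 students from Example 2-8
scores <- c(75, 52, 80, 96, 65, 79, 71, 87, 93, 95,
            69, 72, 81, 61, 76, 86, 79, 68, 50, 92,
            83, 84, 77, 64, 71, 87, 72, 92, 57, 98)

# Built-in display (R decides how many lines each stem gets)
stem(scores)

# Manual version: stem = tens digit, leaf = units digit
leaves <- split(scores %% 10, scores %/% 10)
ranked <- lapply(leaves, sort)   # leaves in increasing order per stem
for (s in names(ranked)) {
  cat(s, "|", ranked[[s]], "\n")
}
```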
Example 2-9
The following data are monthly rents
paid by a sample of 30 households
selected from a small city.
 880  1081   721  1075  1023   775  1235   750   965   960
1210   985  1231   932   850   825  1000   915  1191  1035
1151   630  1175   952  1100  1140   750  1140  1370  1280
Construct a stem-and-leaf display for
these data.
Solution 2-9
Figure 2.16 Stem-and-leaf display of rents.
 6 | 30
 7 | 75 50 21 50
 8 | 80 25 50
 9 | 32 52 15 60 85 65
10 | 23 81 35 75 00
11 | 91 51 40 75 40 00
12 | 10 31 35 80
13 | 70
Example 2-10
The following stem-and-leaf display is
prepared for the number of hours that
25 students spent working on
computers during the last month.
0 6
1 1 7 9
2 2 6
3 2 4 7 8
4 1 5 6 9 9
5 3 6 8
6 2 4 4 5 7
7
8 5 6
0–2 | 6 * 1 7 9 * 2 6
3–5 | 2 4 7 8 * 1 5 6 9 9 * 3 6 8
6–8 | 2 4 4 5 7 * * 5 6
Assignment # 1
• Download any large dataset from the internet that includes
both quantitative and qualitative variables. Write down the
link to the website from which the dataset was downloaded (for
verification). Each student must use a different dataset.
• Describe the dataset with respect to each variable.
• Make frequency distributions separately for the quantitative and
qualitative variables in R, and write down the script and results
on a simple Word page (or add a screenshot).
• Draw all graphs that are possible for the quantitative
and qualitative variables in R, and write down the script and
results on a simple Word page (or add a screenshot).