
Lecture 1 Introduction Prob

The document provides information about a course on probability and statistics taught by Dr. Faisal Bukhari at Punjab University College of Information Technology. It lists textbooks and reference books for the course and gives details on readings, distribution of marks, basic concepts covered, and differences between descriptive and inferential statistics.

Uploaded by

Fahad Akeel

Probability and Statistics

Dr. Faisal Bukhari


Punjab University College of Information Technology
(PUCIT)
Textbooks
Probability & Statistics for Engineers & Scientists,
Ninth Edition, Ronald E. Walpole, Raymond H. Myers

Elementary Statistics: Picturing the World, 6th
Edition, Ron Larson and Betsy Farber

Elementary Statistics, 13th Edition, Mario F. Triola

Dr. Faisal Bukhari, PUCIT, PU, Lahore 2


Reference books
Probability Demystified, Allan G. Bluman
Schaum's Outline of Probability and Statistics

MATLAB Primer, Seventh Edition

MATLAB Demystified by McMahon, David



References
Readings for these lecture notes:
• Elementary Statistics: Picturing the World, 6th Edition,
Ron Larson and Betsy Farber
• Probability & Statistics for Engineers & Scientists, Ninth
Edition, Ronald E. Walpole, Raymond H. Myers
• Probability Demystified, Allan G. Bluman
• Practical Statistics for Data Scientists: 50 Essential
Concepts, Peter Bruce and Andrew Bruce
• https://www.mymarketresearchmethods.com/types-of-data-nominal-ordinal-interval-ratio/
• http://www.thefreedictionary.com/statistics

These notes contain material from the above resources.
Distribution of marks
Mid term = 35 points
Final term = 40 points
Sessional marks = 25 points
I. Assignments = 2 × 2 = 4 points
II. Quizzes = 4 × 4 = 16 points
III. A survey based project presentation = 5 points
Basic concepts [1]
Statistics is defined as
“The mathematics of the collection, organization, and
interpretation of numerical data, especially the analysis
of population characteristics by inference from sampling”
OR
Statistics is a science which deals with collection,
classification, distribution and interpretation of data.
OR
Statistics is a science of uncertainty.
OR
Statistics is the science of collecting, organizing, analyzing,
and interpreting data in order to make decisions.
Data sets
Data consist of information coming from
observations, counts, measurements, or responses.

Statistics is the science of collecting, organizing,
analyzing, and interpreting data in order to make
decisions.
There are two types of data sets you will use when
studying statistics. These data sets are called
populations and samples.

A population is the collection of all outcomes,
responses, measurements, or counts that are of
interest.

A sample is a subset, or part, of a population.


Identifying Data Sets
In a recent survey, 614 small business owners in the
United States were asked whether they thought their
company’s Facebook presence was valuable. Two
hundred fifty-eight of the 614 respondents said yes.
Identify the population and the sample. Describe the
sample data set.



Solution:
The population consists of the responses of all
small business owners in the United States, and the
sample consists of the responses of the 614 small
business owners in the survey.

Notice that the sample is a subset of the
responses of all small business owners in the
United States. The sample data set consists of 258
owners who said yes and 356 owners who said no.
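
The counts in this solution can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative and are not part of the original lecture.

```python
# Survey figures taken from the example above.
sample_size = 614   # small business owners surveyed
said_yes = 258      # owners who said their Facebook presence was valuable

# Everyone who did not say yes is counted as a "no", as in the solution.
said_no = sample_size - said_yes

# Sample proportion of "yes" responses; this describes the sample only.
proportion_yes = said_yes / sample_size

print(f"said no: {said_no}")
print(f"proportion yes: {proportion_yes:.3f}")
```

The proportion describes only the 614 respondents; saying anything about all small business owners in the United States would require inference.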



Descriptive Statistics vs. Inferential Statistics
The study of statistics has two major branches:
descriptive statistics and inferential statistics.

Descriptive statistics is the branch of statistics that
involves the organization, summarization, and
display of data.

Inferential statistics is the branch of statistics that
involves using a sample to draw conclusions about
a population. A basic tool in the study of inferential
statistics is probability.
Descriptive and Inferential Statistics
Example: Determine which part of the study represents
the descriptive branch of statistics. What conclusions
might be drawn from the study using inferential
statistics?
1. A large sample of men, aged 48, was studied for 18
years. For unmarried men, approximately 70% were
alive at age 65. For married men, 90% were alive at
age 65. (Source: The Journal of Family Issues)

2. In a sample of Wall Street analysts, the percentage
who incorrectly forecasted high-tech earnings in a recent
year was 44%. (Source: Bloomberg News)


Solution:
1. Descriptive statistics involves statements such as
“For unmarried men, approximately 70% were alive
at age 65” and “For married men, 90% were alive at
age 65.” Also, the figure represents the descriptive
branch of statistics. A possible inference drawn from
the study is that being married is associated with a
longer life for men.
2. The part of this study that represents the
descriptive branch of statistics involves the statement
“the percentage [of Wall Street analysts] who
incorrectly forecasted high-tech earnings in a recent
year was 44%.” A possible inference drawn from the
study is that the stock market is difficult to forecast,
even for professionals.
Parameter vs. Statistic
A parameter is a numerical description of a
population characteristic.

A statistic is a numerical description of a sample
characteristic.
Distinguishing Between a Parameter and a Statistic
Example: Determine whether the numerical value describes a
population parameter or a sample statistic. Explain your
reasoning.
1. A recent survey of approximately 400,000 employers
reported that the average starting salary for marketing majors
is $53,400. (Source: National Association of Colleges and
Employers)
2. The freshman class at a university has an average SAT math
score of 514.
3. In a random check of 400 retail stores, the Food and Drug
Administration found that 34% of the stores were not storing
fish at the proper temperature.
Solution
1. Because the average of $53,400 is based on a
subset of the population, it is a sample statistic.

2. Because the average SAT math score of 514 is
based on the entire freshman class, it is a population
parameter.

3. Because the percent, 34%, is based on a subset of
the population, it is a sample statistic.
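
The distinction can also be seen numerically. The sketch below uses a small, made-up population of test scores (hypothetical data, not taken from the lecture) and compares the population mean, a parameter, with the mean of a random sample, a statistic.

```python
import random
import statistics

# Hypothetical population: made-up test scores for a class of 10 students.
population = [480, 495, 500, 510, 514, 520, 530, 535, 540, 556]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample, which is a subset of the population.
random.seed(0)  # fixed seed so the sketch is repeatable
sample = random.sample(population, k=4)
sample_mean = statistics.mean(sample)

print(f"population mean (parameter): {population_mean}")
print(f"sample mean (statistic): {sample_mean}")
```

The parameter is fixed for a given population, while the statistic generally changes from one sample to the next.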



Types of Data
Data sets can consist of two types of data: qualitative
data and quantitative data.

Qualitative data consist of attributes, labels, or
nonnumerical entries.

Quantitative data consist of numerical
measurements or counts.


Classifying Data by Type
Example: The suggested retail prices of several Honda
vehicles are shown in the table. Which data are
qualitative data and which are quantitative data?
Explain your reasoning. (Source: American Honda
Motor Company, Inc.)



Solution
The information shown in the table can be
separated into two data sets. One data set contains
the names of vehicle models, and the other
contains the suggested retail prices of vehicle
models.

The names are nonnumerical entries, so these
are qualitative data.

The suggested retail prices are numerical entries,
so these are quantitative data.


Levels of Measurement
Another characteristic of data is its level of
measurement. The level of measurement
determines which statistical calculations are
meaningful.

The four levels of measurement, in order from
lowest to highest, are nominal, ordinal, interval,
and ratio.


Nominal vs Ordinal
Data at the nominal level of measurement are
qualitative only. Data at this level are categorized
using names, labels, or qualities. No mathematical
computations can be made at this level.

Data at the ordinal level of measurement are
qualitative or quantitative. Data at this level can be
arranged in order, or ranked, but differences
between data entries are not meaningful.


Example
Two data sets are shown. Which data set consists of
data at the nominal level? Which data set consists of
data at the ordinal level? Explain your reasoning.
(Source: The Numbers)



Solution
The first data set lists the ranks of five movies. The
data set consists of the ranks 1, 2, 3, 4, and 5.
Because the ranks can be listed in order, these data
are at the ordinal level. Note that the difference
between a rank of 1 and 5 has no mathematical
meaning.

The second data set consists of the names of
movie genres. No mathematical computations can
be made with the names and the names cannot be
ranked, so these data are at the nominal level.


Interval vs. Ratio
Data at the interval level of measurement can be
ordered, and meaningful differences between data
entries can be calculated. At the interval level, a zero
entry simply represents a position on a scale; the
entry is not an inherent zero.

Data at the ratio level of measurement are similar to
data at the interval level, with the added property
that a zero entry is an inherent zero. A ratio of two
data entries can be formed so that one data entry
can be meaningfully expressed as a multiple of
another.
An inherent zero is a zero that implies “none.” For instance,
the amount of money you have in a savings account could be
zero dollars. In this case, the zero represents no money; it is
an inherent zero. On the other hand, a temperature of 0°C
does not represent a condition in which no heat is present.
The 0°C temperature is simply a position on the Celsius
scale; it is not an inherent zero.

To distinguish between data at the interval level and at the
ratio level, determine whether the expression “twice as
much” has any meaning in the context of the data.

For instance, $2 is twice as much as $1, so these data are
at the ratio level. On the other hand, 2°C is not twice as
warm as 1°C, so these data are at the interval level.
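
The “twice as much” test can be illustrated numerically. The sketch below is an illustration rather than part of the original slides. It assumes the Kelvin scale as a temperature scale with an inherent zero, and converts the two Celsius readings to kelvins to show how close they really are.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading (interval scale) to kelvins (ratio scale)."""
    return celsius + 273.15

# Ratio level: zero dollars means "no money", so the ratio is meaningful.
dollar_ratio = 2 / 1

# Interval level: the same arithmetic on Celsius readings is misleading,
# because 0 degrees Celsius is only a position on the scale.
naive_celsius_ratio = 2 / 1

# On a scale with an inherent zero, the two temperatures are nearly equal.
kelvin_ratio = celsius_to_kelvin(2) / celsius_to_kelvin(1)

print(f"dollar ratio: {dollar_ratio}")
print(f"naive Celsius ratio: {naive_celsius_ratio}")
print(f"Kelvin ratio: {kelvin_ratio:.4f}")
```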
Classifying Data by Level
Example: Two data sets are shown below. Which data set
consists of data at the interval level? Which data set consists
of data at the ratio level? Explain your reasoning. (Source:
Major League Baseball)



Solution
Both of these data sets contain quantitative data. Consider
the dates of the Yankees’ World Series victories. It makes
sense to find differences between specific dates. For
instance, the time between the Yankees’ first and last World
Series victories is 2009 - 1923 = 86 years. But it does not
make sense to say that one year is a multiple of another.
So, these data are at the interval level.

However, using the home run totals, you can find
differences and write ratios. From the data, you can see that
Baltimore hit 39 more home runs than Tampa Bay hit and
that New York hit about 1.5 times as many home runs as
Detroit hit. So, these data are at the ratio level.


The tables below summarize which operations
are meaningful at each of the four levels of
measurement.
When identifying a data set’s level of measurement, use the
highest level that applies.
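
The rule that each level keeps the operations of the levels below it can be sketched in Python. The operation names (categorize, order, subtract, divide) are shorthand chosen for this sketch, based on the definitions given in these notes. They are not terms from the textbook tables.

```python
# Levels ordered from lowest to highest, as listed in the lecture.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The operation that first becomes meaningful at each level.
NEW_OPERATION = {
    "nominal": "categorize",
    "ordinal": "order",
    "interval": "subtract",
    "ratio": "divide",
}

def meaningful_operations(level: str) -> list[str]:
    """Return every operation that is meaningful at the given level.

    Each level keeps the operations of all the levels below it.
    """
    position = LEVELS.index(level)
    return [NEW_OPERATION[name] for name in LEVELS[: position + 1]]

for level in LEVELS:
    print(f"{level}: {', '.join(meaningful_operations(level))}")
```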



Summary of Four Levels of Measurement



Key Terms for Data Types
Continuous
• Data that can take on any value in an interval.
• Synonyms: interval, float, numeric

Discrete
• Data that can only take on integer values, such as
counts.
• Synonyms: integer, count



Key Terms for Data Types
Categorical
• Data that can only take on a specific set of
values.
• Example: Sex, type of chocolate, color
• Synonyms: enums, enumerated, factors, nominal,
polychotomous
Binary
• A special case of categorical with just two
categories (0/1, true/false).
• Synonyms: dichotomous, logical, indicator
Ordinal
• Categorical data that has an explicit ordering.
• Synonyms: ordered factor
Data Types
Binary data is an important special case of
categorical data that takes on only one of two
values, such as 0/1, yes/no or true/false.
Synonyms: dichotomous, logical, indicator

Ordinal
• Categorical data that has an explicit ordering.
• Synonyms: ordered factor
An example of this is a numerical rating (1, 2, 3, 4,
or 5)



Data Types
There are two basic types of structured data:
numeric and categorical.

Numeric data comes in two forms: continuous,
such as wind speed or time duration, and discrete,
such as the count of the occurrence of an event.

Categorical data takes only a fixed set of values,
such as a type of TV screen (plasma, LCD, LED, …) or
a state name (Alabama, Alaska, …).


Nominal scales
o Nominal scales are used for labeling variables, without any
quantitative value.
o “Nominal” scales could simply be called “labels.”
o The examples that follow are all mutually exclusive (no
overlap), and none of them has any numerical significance.
o A good way to remember all of this is that “nominal”
sounds a lot like “name,” and nominal scales are kind of like
“names” or labels.


Nominal scale example

o Type of chocolate
• Dark(1)
• Milk(2)
• White(3)
o Sex
• Male(0)
• Female(1)
o Color
• Red(1)
• Green(2)
• Blue(3)
• Yellow(4)
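
The numbers attached to nominal categories are labels only. The sketch below uses the chocolate codes from this slide together with a made-up list of survey responses (hypothetical data) to show that counting labels is meaningful while averaging their codes is not.

```python
from collections import Counter

# Numeric codes from the slide; the numbers are labels, not quantities.
CHOCOLATE_CODES = {1: "Dark", 2: "Milk", 3: "White"}

# Hypothetical survey responses stored as codes (made up for illustration).
responses = [1, 2, 2, 3, 2, 1]

# Counting how often each label occurs is meaningful for nominal data.
counts = Counter(CHOCOLATE_CODES[code] for code in responses)
most_common_label, frequency = counts.most_common(1)[0]

# Averaging the codes runs without error, but the result has no meaning:
# it does not correspond to any type of chocolate.
meaningless_mean = sum(responses) / len(responses)

print(counts)
print(f"mode: {most_common_label} ({frequency} responses)")
```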



Ordinal scale
o With ordinal scales, the order of the values is what is
important and significant, but the differences between the
values are not really known.
o Take a look at the example below. In each case, we know
that option 4 is better than option 3 or option 2, but we
don’t know, and cannot quantify, how much better it is.
o For example, is the difference between “OK” and
“Unhappy” the same as the difference between “Very
Happy” and “Happy”? We can’t say.
o Ordinal scales are typically measures of non-numeric
concepts like satisfaction, happiness, discomfort, etc.


Ordinal Scale Example
o “Ordinal” is easy to remember because it sounds
like “order,” and that is the key to remember with
“ordinal scales”: it is the order that matters, but
that’s all you really get from these.
o Advanced note: The best way to determine central
tendency on a set of ordinal data is to use the mode
or median; the mean cannot be defined from an
ordinal set.
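
As a rough illustration of the advanced note, the sketch below finds the mode and median of some ordinal responses with Python's standard `statistics` module. The category labels follow the happiness example in these notes. Their ordering from lowest to highest and the response list are assumptions made for this sketch.

```python
import statistics

# Ordered categories, lowest to highest (ordering assumed for this sketch).
SCALE = ["Unhappy", "OK", "Happy", "Very Happy"]
RANK = {label: position for position, label in enumerate(SCALE)}

# Hypothetical responses, made up for illustration.
responses = ["Happy", "OK", "Very Happy", "Happy", "Unhappy"]

# Mode: the most frequent category.
mode_label = statistics.mode(responses)

# Median: the middle response once the data are put in rank order.
# median_low always returns an actual data point, so ranks are never averaged.
median_rank = statistics.median_low(RANK[label] for label in responses)
median_label = SCALE[median_rank]

print(f"mode: {mode_label}")
print(f"median: {median_label}")
```

Using `median_low` rather than `median` is a deliberate choice here: with an even number of responses, `median` would average two ranks, which reintroduces the arithmetic on ordinal values that the note warns against.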



Key Ideas

Data are typically classified in software by their type

Data types include continuous, discrete,
categorical (which includes binary), and ordinal

Data-typing in software acts as a signal to the
software on how to process the data
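
One way to picture data typing as a signal is a function that chooses a summary based on a declared type. This is a hedged sketch only: the `summarize` function and its `kind` tags are hypothetical and do not come from any particular statistics package, and the columns are made-up data.

```python
import statistics
from collections import Counter

def summarize(values, kind):
    """Pick a summary based on the declared data type of a column.

    `kind` is a hypothetical tag: "continuous", "discrete", or "categorical".
    """
    if kind == "continuous":
        return {"mean": statistics.mean(values)}
    if kind == "discrete":
        return {"median": statistics.median(values), "total": sum(values)}
    if kind == "categorical":
        return {"counts": dict(Counter(values))}
    raise ValueError(f"unknown data type: {kind}")

# Made-up columns for illustration.
wind_speed = [3.5, 4.0, 5.5]          # continuous
event_counts = [0, 2, 1, 4]           # discrete
screen_type = ["LCD", "LED", "LCD"]   # categorical

print(summarize(wind_speed, "continuous"))
print(summarize(event_counts, "discrete"))
print(summarize(screen_type, "categorical"))
```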



Suggested Readings
Elementary Statistics: Picturing the World, 6th
Edition, Ron Larson and Betsy Farber
1.1 An Overview of Statistics
1.2 Data Classification
