AEM Lecture 1

The Advanced Engineering Mathematics course, taught by Dr. Madiha Liaqat, focuses on mathematical methods essential for applications in Information Technology, covering probability modeling and logic. The grading policy includes midterms, finals, assignments, quizzes, and a project presentation. The course also emphasizes the importance of statistics and probability in real-world applications, providing foundational knowledge for various fields.

Advanced Engineering Mathematics
Instructor: Dr. Madiha Liaqat
About Course

 The aim of this course is to introduce and develop mathematical methods
that are key to many modern applications in Information Technology. The
course proceeds on two fronts: (i) probability modeling techniques that allow
stochastic systems and algorithms to be described and better understood, and
(ii) logic and its types, which are used in knowledge representation.

 The style of the course is necessarily concise but will attempt to blend
theory with examples that glimpse ahead at applications developed in
other courses.
Books

 Probability and Statistics for Engineers by Richard Johnson

 OpenStax: Statistics for High School (available online)

 Discrete Mathematics and Its Applications, Author: Kenneth Rosen


Grading Policy

 20% Midterm exam.


 40% Final Exam.
 10% Assignments.
 10% Quizzes.
 20% Project/Presentation
 Students will be required to present a project that applies mathematical
methods to a Computer Science/Software Engineering problem. The objective of
the project is to produce applicable results.
MS Teams

 Team Title: AEM 2024

 Team Code: teuiph1

 Join the team for Lectures and Assignments


Course Description
Week 1-8:

 Probability of an Event, Additive Rules, Conditional Probability and Multiplicative Rules,
Concept of a Random Variable and Discrete Probability Distributions, Continuous Probability
Distributions, Mean and Variance of a Single Random Variable, Means of Linear Combinations,
Binomial Distribution and Poisson Distributions, Continuous Uniform Distribution, Normal
Distribution, Applications of the Normal Distribution, Chi-Squared distribution, ANOVA.

Week 9-16:

 Mathematical Modeling, Propositional Logic and its Syntax, Notions of satisfiability, validity,
inconsistency. First Order Logic, Syntax, Semantics and Applications. Fractals and their
applications. Interval Analysis and its applications in modeling. Some advanced topics.
Outline of Today’s Lecture

Introduction
 Definitions of Statistics, Probability, and Key Terms
 Data, Sampling, and Variation in Data and Sampling
 Frequency, Frequency Tables
 Experimental Design and Ethics
We encounter statistics in our daily lives more often than we probably realize and from
many different sources, like the news. (David Sim)
Introduction

 You are probably asking yourself the question, "When and where will I use
statistics?"

 If you read any newspaper, watch television, or use the Internet, you will see
statistical information. There are statistics about crime, sports, education,
politics, and real estate.

 Typically, when you read a newspaper article or watch a television news
program, you are given sample information. With this information, you may
make a decision about the correctness of a statement, claim, or fact.

 Statistical methods can help you make the best educated guess.
Introduction

 Since you will undoubtedly be given statistical information at some
point in your life, you need to know some techniques for analyzing the
information thoughtfully.

 Think about buying a house or managing a budget. Think about your
chosen profession. The fields of economics, business, psychology,
education, biology, law, computer science, police science, and early
childhood development require at least one course in statistics.
Included in this lecture are the basic ideas and words of probability and statistics. You
will soon understand that statistics and probability work together. You will also learn
how data are gathered and how good data can be distinguished from bad.
Statistics

 The science of statistics deals with the collection, analysis,
interpretation, and presentation of data. We see and use data in our
everyday lives.

Exercise:
 Write down the average time—in hours, to the nearest half-hour—you
sleep per night.
 Now create a simple graph, called a dot plot, of the data. A dot plot
consists of a number line and dots, or points, positioned above the
number line.
Statistics
 For example, consider the following data:
5, 5.5, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 7, 7, 8, 8, 9.
 The dot plot for this data would be as follows:

 Where do your data appear to cluster? How might you interpret the
clustering?
 The questions above ask you to analyze and interpret your data.
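Since the dot plot figure is not reproduced here, the following is a minimal sketch of how one might tally the example data and print a text version of the plot. It draws the plot sideways (one row per value, one mark per observation), and the helper name text_dot_plot is only an illustrative choice.

```python
from collections import Counter

# Sleep times (hours per night) from the example data above.
sleep_hours = [5, 5.5, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 7, 7, 8, 8, 9]

# Count how many times each value occurs.
counts = Counter(sleep_hours)

def text_dot_plot(counts):
    """Return one row per distinct value, with one 'o' per observation."""
    rows = []
    for value in sorted(counts):
        rows.append(f"{value:>4}: " + "o" * counts[value])
    return "\n".join(rows)

print(text_dot_plot(counts))

# The value with the most dots is where the data cluster.
most_common_value = max(counts, key=counts.get)
print("Most frequent value:", most_common_value)
```

In this data set the longest row is at 6.5 hours, which is one way to answer the clustering question.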
Statistics
 You will learn how to organize and summarize data. Organizing and
summarizing data is called descriptive statistics. Two ways to summarize data
are by graphing and by using numbers, for example, finding an average.
 After you have studied probability and probability distributions, you will use
formal methods for drawing conclusions from good data. The formal methods
are called inferential statistics. Statistical inference uses probability to
determine how confident we can be that our conclusions are correct.
 Effective interpretation of data, or inference, is based on good procedures for
producing data and thoughtful examination of the data. The goal of statistics is
not to perform numerous calculations using the formulas, but to gain an
understanding of your data. If you can thoroughly grasp the basics of statistics,
you can be more confident in the decisions you make in life.
Statistical Models

 Statistics, like all other branches of mathematics, uses mathematical
models to describe phenomena that occur in the real world.
 Some mathematical models are deterministic. These models can be
used when one value is precisely determined from another value.
 Examples of deterministic models are the quadratic equations that
describe the acceleration of a car from rest or the differential
equations that describe the transfer of heat from a stove to a pot.
These models are quite accurate and can be used to answer questions
and make predictions with a high degree of precision. Space agencies,
for example, use deterministic models to predict the exact amount of
thrust that a rocket needs to break away from Earth’s gravity and
achieve orbit.
Statistical Models

 However, life is not always precise. While scientists can predict to the
minute the time that the sun will rise, they cannot say precisely where a
hurricane will make landfall.
 Statistical models are very useful because they can describe the
probability or likelihood of an event occurring and provide alternative
outcomes if the event does not occur.
 For example, weather forecasts are examples of statistical models.
Meteorologists cannot predict tomorrow’s weather with certainty.
However, they often use statistical models to tell you how likely it is to
rain at any given time, and you can prepare yourself based on this
probability.
Probability

 Probability is a mathematical tool used to study randomness. It deals
with the chance of an event occurring.
 For example, if you toss a fair coin four times, the outcomes may not
be two heads and two tails. However, if you toss the same coin 4,000
times, the outcomes will be close to half heads and half tails.
 The expected theoretical probability of heads in any one toss is ½ or
0.5. Even though the outcomes of a few repetitions are uncertain,
there is a regular pattern of outcomes when there are many
repetitions. After reading about the English statistician Karl Pearson
who tossed a coin 24,000 times with a result of 12,012 heads, one of
the authors tossed a coin 2,000 times. The results were 996 heads.
The fraction 996/2,000 is equal to 0.498, which is very close to 0.5,
the expected probability.
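The long-run pattern described above can be explored with a small simulation. This is only a sketch: it uses Python's pseudo-random generator with a fixed seed (an arbitrary choice for repeatability) in place of a real coin, and the function name proportion_of_heads is illustrative.

```python
import random

def proportion_of_heads(n_tosses, seed=0):
    """Simulate n_tosses of a fair coin and return the fraction of heads."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Few tosses can land far from 0.5; many tosses tend to land close to it.
for n in (4, 40, 400, 4000):
    print(n, proportion_of_heads(n))

# The result quoted above: 996 heads in 2,000 tosses.
author_fraction = 996 / 2000
print(author_fraction)
```

Changing the seed changes the individual results, but the proportion for 4,000 tosses should stay near 0.5.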
Probability

 The theory of probability began with the study of games of chance
such as poker. Predictions take the form of probabilities.
 To predict the likelihood of an earthquake, of rain, or whether you
will get an A in this course, we use probabilities.
 Doctors use probability to determine the chance of a vaccination
causing the disease the vaccination is supposed to prevent.
 A stockbroker uses probability to determine the rate of return on a
client's investments.
Deciding which data approach is best relies on the
underlying target business goal. If the goal is to identify
actual buyers of a product for marketing or outreach
purposes, deterministic data is the best option. However, if
the goal is to convert new customers that may be
interested in the product, probabilistic data can be of help.
Key Terms

 In statistics, we generally want to study a population.


 You can think of a population as a collection of persons, things, or
objects under study. To study the population, we select a sample.
 The idea of sampling is to select a portion, or subset, of the larger
population and study that portion—the sample—to gain information
about the population.
 Data are the result of sampling from a population.
Key Terms

 Because it takes a lot of time and money to examine an entire
population, sampling is a very practical technique.
 If you wished to gauge people's views on upcoming elections in your
country, you could take an opinion poll of 1,000–2,000 people. The
opinion poll is supposed to represent the views of the people in the
entire country.
 Manufacturers of canned carbonated drinks take samples to determine
if a 16-ounce can contains 16 ounces of carbonated drink.
Key Terms

From the sample data, we can calculate a statistic.


 A statistic is a number that represents a property of the sample.
 For example, if we consider one math class as a sample of the population
of all math classes, then the average number of points earned by
students in that one math class at the end of the term is an example of a
statistic. Since we do not have the data for all math classes, that statistic
is our best estimate of the average for the entire population of math
classes. If we happen to have data for all math classes, we can find the
population parameter.
 A parameter is a numerical characteristic of the whole population that
can be estimated by a statistic. Since we considered all math classes to
be the population, then the average number of points earned per student
over all the math classes is an example of a parameter.
Key Terms

 One of the main concerns in the field of statistics is how accurately a
statistic estimates a parameter.
 For the estimate to be accurate, the sample must be representative: it
must share the characteristics of the population.
 We are interested in both the sample statistic and the population
parameter in inferential statistics.
 In a later chapter, we will use the sample statistic to test the validity of
the established population parameter.
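To make the statistic/parameter distinction concrete, here is a hedged sketch using a synthetic population. The population (10,000 simulated end-of-term point totals with mean 70 and spread 10) is entirely hypothetical and is not data from the lecture; in practice the parameter is usually unknown.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: points earned by 10,000 students (synthetic values).
population = [rng.gauss(70, 10) for _ in range(10_000)]

# Parameter: a numerical characteristic of the whole population.
parameter = statistics.mean(population)

# Statistic: the same quantity computed from a simple random sample of 200.
sample = rng.sample(population, 200)
statistic = statistics.mean(sample)

print(f"parameter (population mean) = {parameter:.2f}")
print(f"statistic (sample mean)     = {statistic:.2f}")
```

The two numbers are close but not identical; how close they tend to be is the question inferential statistics addresses.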
Key Terms
 A variable, usually notated by capital letters such as X and Y, is a
characteristic or measurement that can be determined for each member
of a population.
 Variables may describe values like weight in pounds or favorite subject in
school.
 Numerical variables take on values with equal units such as weight in
pounds and time in hours.
 Categorical variables place the person or thing into a category.
 If we let X equal the number of points earned by one math student at the
end of a term, then X is a numerical variable. If we let Y be a person's
party affiliation, then some examples of Y include Republican, Democrat,
and Independent. Y is a categorical variable. We could do some math
with values of X—calculate the average number of points earned, for
example—but it makes no sense to do math with values of Y—calculating
an average party affiliation makes no sense.
Key Terms
 Data are the actual values of the variable. They may be numbers or they
may be words. Datum is a single value.
 Two words that come up often in statistics are mean and proportion.
 If you were to take three exams in your math classes and obtain scores of
86, 75, and 92, you would calculate your mean score by adding the three
exam scores and dividing by three. Your mean score would be 84.3 to one
decimal place. If, in your math class, there are 40 students and 22 are
men and 18 are women, then the proportion of men students is 22/40 and
the proportion of women students is 18/40.
 Mean and proportion are discussed in more detail in later chapters.

The words mean and average are often used interchangeably.
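The mean and proportion calculations above can be checked in a few lines. This is a small sketch using only the numbers given in the example; the variable names are illustrative.

```python
import statistics

# Mean: add the three exam scores and divide by three.
exam_scores = [86, 75, 92]
mean_score = statistics.mean(exam_scores)
print(round(mean_score, 1))  # 84.3

# Proportion: the count in a category divided by the total count.
class_size = 40
men, women = 22, 18
prop_men = men / class_size
prop_women = women / class_size
print(prop_men, prop_women)  # 0.55 0.45
```

Note that the two proportions add up to 1 because every student falls into exactly one of the two categories counted here.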


Exercise
 Determine what the population, sample, parameter, statistic, variable,
and data refer to in the following study.

We want to know the mean number of extracurricular activities in which
high school students participate. We randomly surveyed 100 high school
students. Three of those students were in 2, 5, and 7 extracurricular
activities, respectively.
Solution

 The population is all high school students.


 The sample is the 100 high school students interviewed.
 The parameter is the mean number of extracurricular activities in which all
high school students participate.
 The statistic is the mean number of extracurricular activities in which the
sample of high school students participate.
 The variable could be the number of extracurricular activities in which one
high school student participates. Let X = the number of extracurricular
activities in which one high school student participates.
 The data are the number of extracurricular activities in which the high school
students participate. Examples of the data are 2, 5, 7.
Exercise
Determine what the key terms refer to in the following study.
 A study was conducted at a local high school to analyze the average cumulative
GPAs of students who graduated last year. Fill in the letter of the phrase that best
describes each of the items below.
1. Population ____ 2. Statistic ____ 3. Parameter ____ 4. Sample ____ 5. Variable ____
6. Data ____
a) all students who attended the high school last year
b) the cumulative GPA of one student who graduated from the high school last year
c) 3.65, 2.80, 1.50, 3.90
d) a group of students who graduated from the high school last year, randomly selected
e) the average cumulative GPA of students who graduated from the high school last
year
f) all students who graduated from the high school last year
g) the average cumulative GPA of students in the study who graduated from the high
school last year
Solution
Determine what the key terms refer to in the following study.
 A study was conducted at a local high school to analyze the average cumulative
GPAs of students who graduated last year. Fill in the letter of the phrase that best
describes each of the items below.
1. Population f 2. Statistic g 3. Parameter e 4. Sample d 5. Variable b 6. Data c
a) all students who attended the high school last year
b) the cumulative GPA of one student who graduated from the high school last year
c) 3.65, 2.80, 1.50, 3.90
d) a group of students who graduated from the high school last year, randomly selected
e) the average cumulative GPA of students who graduated from the high school last
year
f) all students who graduated from the high school last year
g) the average cumulative GPA of students in the study who graduated from the high
school last year
Exercise
Determine what the population, sample, parameter, statistic, variable, and
data refer to in the following study.
 As part of a study designed to test the safety of automobiles, the National
Transportation Safety Board collected and reviewed data about the effects
of an automobile crash on test dummies (The Data and Story Library, n.d.).
Here is the criterion they used.

 Cars with dummies in the front seats were crashed into a wall at a speed of
35 miles per hour. We want to know the proportion of dummies in the
driver’s seat that would have had head injuries, if they had been actual
drivers. We start with a simple random sample of 75 cars.
Solution

 The population is all cars containing dummies in the front seat.


 The sample is the 75 cars, selected by a simple random sample.
 The parameter is the proportion of driver dummies (if they had been
real people) who would have suffered head injuries in the population.
 The statistic is the proportion of driver dummies (if they had been real
people) who would have suffered head injuries in the sample.
 The variable X = whether driver dummies (if they had been real
people) would have suffered head injuries.
 The data are either: yes, had head injury, or no, did not.
Practice

Determine what the population, sample, parameter, statistic, variable,
and data refer to in the following study.
 An insurance company would like to determine the proportion of all
medical doctors who have been involved in one or more malpractice
lawsuits. The company selects 500 doctors at random from a
professional directory and determines the number in the sample who
have been involved in a malpractice lawsuit.
Data
 Data may come from a population or from a sample. Lowercase
letters like x or y generally are used to represent data values. Most
data can be put into one of two categories: qualitative or
quantitative.
 Qualitative data are the result of categorizing or describing attributes
of a population. Qualitative data are also often called categorical
data. Hair color, blood type, ethnic group, the car a person drives,
and the street a person lives on are examples of qualitative data.
Qualitative data are generally described by words or letters. For
instance, hair color might be black, dark brown, light brown, blonde,
gray, or red. Blood type might be AB+, O–, or B+. Researchers often
prefer to use quantitative data over qualitative data because it lends
itself more easily to mathematical analysis. For example, it does not
make sense to find an average hair color or blood type.
Data
 Quantitative data are always numbers. Quantitative data are the
result of counting or measuring attributes of a population. Amount of
money, pulse rate, weight, number of people living in your town, and
number of students who take statistics are examples of quantitative
data. Quantitative data may be either discrete or continuous.
 All data that are the result of counting are called quantitative
discrete data. These data take on only certain numerical values. If
you count the number of phone calls you receive for each day of the
week, you might get values such as zero, one, two, or three.
 Data that are not only made up of counting numbers, but that may
include fractions, decimals, or irrational numbers, are
called quantitative continuous data. Continuous data are often the
results of measurements like lengths, weights, or times. A list of the
lengths in minutes for all the phone calls that you make in a week,
with numbers like 2.4, 7.5, or 11.0, would be quantitative continuous
data.
Data Sample of Quantitative Discrete Data
Examples:
 The data are the number of books students carry in their backpacks.
You sample five students. Two students carry three books, one student
carries four books, one student carries two books, and one student
carries one book. The numbers of books, 3, 3, 4, 2, and 1, are the
quantitative discrete data.
 The data are the number of machines in a gym. You sample five gyms.
One gym has 12 machines, one gym has 15 machines, one gym has 10
machines, one gym has 22 machines, and the other gym has 20
machines. What type of data is this?
Data Sample of Quantitative Continuous Data

 The data are the weights of backpacks with books in them. You sample the
same five students. The weights, in pounds, of their backpacks are 6.2, 7,
6.8, 9.1, 4.3. Notice that backpacks carrying three books can have different
weights. Weights are quantitative continuous data.
 You go to the supermarket and purchase three cans of soup (19 ounces tomato
bisque, 14.1 ounces lentil, and 19 ounces Italian wedding), two packages of
nuts (walnuts and peanuts), four different kinds of vegetable (broccoli,
cauliflower, spinach, and carrots), and two desserts (16 ounces pistachio ice
cream and 32 ounces chocolate chip cookies).
Problem
 Name data sets that are quantitative discrete, quantitative continuous, and
qualitative.
Solution

A possible solution

 One example of a quantitative discrete data set would be three cans of soup,
two packages of nuts, four kinds of vegetables, and two desserts because you
count them.

 The weights of the soups (19 ounces, 14.1 ounces, 19 ounces) are quantitative
continuous data because you measure weights as precisely as possible.

 Types of soups, nuts, vegetables, and desserts are qualitative data because
they are categorical.
Exercise
Work collaboratively to determine the correct data type: quantitative or
qualitative. Indicate whether quantitative data are continuous or discrete.
Hint: Data that are discrete often start with the words the number of.
 a. the number of pairs of shoes you own
 b. the type of car you drive
 c. the distance from your home to the nearest grocery store
 d. the number of classes you take per school year
 e. the type of calculator you use
 f. weights of sumo wrestlers
 g. number of correct answers on a quiz
 h. IQ scores
Items a, d, and g are quantitative discrete; items c, f, and h are quantitative
continuous; items b and e are qualitative or categorical.
Exercise

 A statistics professor collects information about the classification of
her students as freshmen, sophomores, juniors, or seniors. The data
she collects are summarized in the pie chart. What type of data does
this graph show?

This pie chart shows the students in each year, which
is qualitative or categorical data.
Exercise

 A large school district keeps data on the scores students earn on an
end-of-year standardized exam. The data it collects are summarized in the
histogram. The class boundaries are 50 to less than 60, 60 to less than 70, 70
to less than 80, 80 to less than 90, and 90 to less than 100.

What type of data does this graph show?
Qualitative Data Discussion

 The data in the table compare the number of part-time and full-time
students at De Anza College and Foothill College enrolled for the spring
2010 quarter.

Tables are a good way of organizing and displaying data. But graphs can be
even more helpful in understanding the data.
Qualitative Data Discussion

 Two graphs that are used to display qualitative data are pie charts and
bar graphs.
 In a pie chart, categories of data are shown by wedges in a circle that
represent the percent of individuals/items in each category. We use
pie charts when we want to show parts of a whole.
 In a bar graph, the length of the bar for each category represents the
number or percent of individuals in each category. Bars may be
vertical or horizontal. We use bar graphs when we want to compare
categories or show changes over time.
 A Pareto chart consists of bars that are sorted into order by category
size (largest to smallest).
Determine which graph (pie or bar) you think displays the
comparisons better.
Omitting Categories/Missing Data
The table displays Ethnicity of Students but is missing the Other/Unknown category.

Bar graph Pareto chart


Pie charts: no missing data
Two-way table
A two-way table, also called a contingency table, showing the favorite sports for
50 adults: 20 women and 30 men.

Data of this type (two variable data) are referred to as bivariate data. Because the data
represent a count, or tally, of choices, it is a two-way frequency table. The entries in
the total row and the total column represent marginal frequencies or marginal
distributions.

Note—The term marginal distributions gets its name from the fact that the distributions
are found in the margins of frequency distribution tables. Marginal distributions may be
given as a fraction or decimal: For example, the total for men could be given as .6 or
3/5 since 30/50 = 0.6 = 3/5
Two-way table

 Marginal distributions require bivariate data and only focus on one of
the variables represented in the table. In other words, the reason 20
is a marginal frequency in this two-way table is because it represents
the margin or portion of the total population that is women (20/50).
The reason 25 is a marginal frequency is because it represents the
portion of those sampled who favor football (25/50). Note: The values
that make up the body of the table (e.g., 20, 8, 2) are called joint
frequencies.
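The marginal frequencies quoted above can be turned into marginal proportions with exact fractions. This sketch uses only the totals stated in the text (50 adults, 20 women, 30 men, 25 who favor football); the table itself is not reproduced here, so no other cells are assumed.

```python
from fractions import Fraction

total_adults = 50
women_total, men_total = 20, 30   # row totals given in the text
football_total = 25               # column total given in the text

# A marginal proportion is a row or column total divided by the grand total.
marginal_women = Fraction(women_total, total_adults)
marginal_men = Fraction(men_total, total_adults)
marginal_football = Fraction(football_total, total_adults)

print(marginal_women, marginal_men, marginal_football)
print(float(marginal_men))  # 0.6, matching 30/50 = 3/5 in the note above
```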
Conditional distributions in two-way tables
 The distinction between a marginal distribution and a conditional
distribution is that a conditional distribution focuses on only a
particular subset of the population (not the entire population).

For example:
Among the 25 adults who favor football, the proportion who are women is 5/25 = 0.2.
How would you find the proportion of women who favor football?
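One way to approach the question is to change which total goes in the denominator. The sketch below uses only figures stated in the text: 5 women favor football (from 5/25), 25 adults favor football, and 20 women were surveyed. The variable names are illustrative.

```python
from fractions import Fraction

women_football = 5    # joint frequency, taken from the 5/25 in the text
football_total = 25   # everyone who favors football
women_total = 20      # all women surveyed

# Condition on football: of those who favor football, what share are women?
p_woman_given_football = Fraction(women_football, football_total)

# Condition on women: of the women surveyed, what share favor football?
p_football_given_woman = Fraction(women_football, women_total)

print(p_woman_given_football, float(p_woman_given_football))
print(p_football_given_woman, float(p_football_given_woman))
```

The numerator is the same joint frequency in both cases; only the subset being conditioned on, and therefore the denominator, changes.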
Sampling

 Gathering information about an entire population often costs too
much or is virtually impossible. Instead, we use a sample of the
population. A sample should have the same characteristics as the
population it is representing. Most statisticians use various methods
of random sampling in an attempt to achieve this goal. The easiest
method to describe is called a simple random sample. In a simple
random sample, each member of the population, and each group of the
same size, has the same chance of being selected.
 Other well-known random sampling methods are the stratified
sample, the cluster sample, and the systematic sample.
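A simple random sample can be sketched with Python's standard library. The population of 100 numbered members below is hypothetical, and the seed is fixed only so the run is repeatable.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: members numbered 1 to 100.
population_ids = list(range(1, 101))

# random.sample draws without replacement, so no member is picked twice and
# every group of 10 members is equally likely to be chosen.
sample_ids = rng.sample(population_ids, 10)

print(sorted(sample_ids))
```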
Stratified sample
 To choose a stratified sample, divide the population into groups called
strata and then take a proportionate number of members from each
stratum until the desired sample size is reached.
 For example, you could stratify (group) your high school student
population by year (freshmen, sophomore, juniors, and seniors) and
then choose a proportionate simple random sample from each stratum
(each year) to get a stratified random sample. To choose a simple
random sample from each year, number each student of the first year,
number each student of the second year, and do the same for the
remaining years. Then use simple random sampling to choose
proportionate numbers of students from the first year and do the
same for each of the remaining years. Those numbers picked from the
first year, picked from the second year, and so on represent the
students who make up the stratified sample.
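The procedure above might be sketched as follows. The enrollment counts per year and the sample size of 50 are invented for illustration; only the method (number the students in each year, then draw a proportionate simple random sample from each) comes from the text.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical enrollment by year; these counts are assumptions.
enrollment = {"freshmen": 400, "sophomores": 300, "juniors": 200, "seniors": 100}

# Number the students within each year (stratum).
strata = {year: list(range(1, n + 1)) for year, n in enrollment.items()}

sample_size = 50
population_size = sum(enrollment.values())

stratified_sample = {}
for year, members in strata.items():
    # Proportionate allocation: the stratum's share of the sample matches
    # its share of the population.
    k = round(sample_size * len(members) / population_size)
    stratified_sample[year] = rng.sample(members, k)

for year, chosen in stratified_sample.items():
    print(year, len(chosen))
```

With these counts the allocation is exact (20, 15, 10, 5). With other counts, rounding can make the total differ slightly from the target sample size, which would need a separate adjustment.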
Cluster Sample

 In cluster sampling, the population is divided into clusters, which are
usually naturally occurring groups (e.g., neighborhoods, schools, or
cities). Each cluster should ideally be heterogeneous (i.e., diverse),
representing the population as a whole. Instead of sampling
individuals from each cluster, entire clusters are randomly selected,
and all individuals within the chosen clusters are surveyed.
 For example, if you're studying the quality of education in a large city,
you might treat each school as a cluster, randomly select a few
schools, and survey every student in those selected schools.
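A minimal sketch of the school example follows. The city of 8 schools with 30 students each is hypothetical; the point is that whole clusters are drawn at random and then everyone inside them is surveyed.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical city: 8 schools (clusters), each with 30 students.
schools = {
    f"school_{s}": [f"school_{s}_student_{i}" for i in range(30)]
    for s in range(8)
}

# Step 1: randomly select entire clusters (here, 2 of the 8 schools).
chosen_schools = rng.sample(sorted(schools), 2)

# Step 2: survey every student in the selected schools.
surveyed = [student for school in chosen_schools for student in schools[school]]

print(chosen_schools)
print(len(surveyed))
```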
Systematic sample

 To choose a systematic sample, establish and follow a rule. The most
common way to select a systematic sample is to list the members of
the population and choose every nth entry from a random starting
point.
 For example, suppose you have 100,000 individuals in your population
and you want to choose a sample of 1,000. Use a random number
generator to select your starting point. Now, 100,000/1,000 = 100, so
to ensure coverage throughout the list, choose every 100th entry in
the list. When you reach the end of the list, continue the count from
the beginning until you have selected the complete sample.
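The rule in the example (random start, every 100th entry, wrapping around at the end of the list) can be sketched like this. The list of 100,000 numbered individuals stands in for a real population list, and the function name systematic_sample is illustrative.

```python
import random

def systematic_sample(population, sample_size, rng):
    """Pick every k-th entry from a random start, wrapping around the list."""
    step = len(population) // sample_size      # 100,000 // 1,000 = 100
    start = rng.randrange(len(population))     # random starting point
    return [
        population[(start + i * step) % len(population)]
        for i in range(sample_size)
    ]

rng = random.Random(4)  # fixed seed so the sketch is repeatable
population = list(range(100_000))   # stand-in for the population list
sample = systematic_sample(population, 1_000, rng)

print(sample[:5])
print(len(sample))
```

The modulo operation implements "continue the count from the beginning" once the end of the list is reached.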
Sampling

 Sampling data should be done very carefully. Collecting data
carelessly can have devastating results.
 In statistics, a sampling bias is created when a sample is collected
from a population and some members of the population are not as
likely to be chosen as others. Remember, each member of the
population should have an equally likely chance of being chosen.
When a sampling bias happens, there can be incorrect conclusions
drawn about the population that is being studied. For instance, a
survey of all students that is conducted only during noon lunchtime hours is
biased, because the students who do not have a noon
lunchtime would not be included.
Critical Evaluation

We need to evaluate the statistical studies we read about critically and
analyze them before accepting the results of the studies. Common
problems to be aware of include the following:
 Problems with samples: A sample must be representative of the
population. A sample that is not representative of the population is
biased. Biased samples that are not representative of the population
give results that are inaccurate and not reliable. Reliability in
statistical measures must also be considered when analyzing data.
Reliability refers to the consistency of a measure. A measure is
reliable when the same results are produced given the same
circumstances.
 Self-selected samples: Responses only by people who choose to
respond, such as internet surveys, are often unreliable.
Critical Evaluation

 Sample size issues: Samples that are too small may be unreliable.
Larger samples are better, if possible. In some situations, small
samples are unavoidable and can still be used to draw conclusions.
Examples include crash testing cars or medical testing for rare
conditions.
 Undue influence: Collecting data or asking questions in a way that
influences the response.
 Non-response or refusal of subject to participate: The collected
responses may no longer be representative of the population. Often,
people with strong positive or negative opinions may answer surveys,
which can affect the results.
Exercise
 Suppose ABC College has 10,000 upperclassman (junior and senior level)
students (the population). We are interested in the average amount of money
an upperclassman spends on books in the fall term. Asking all 10,000
upperclassmen is an almost impossible task.
 Suppose we take two different samples.
 First, we use convenience sampling and survey ten upperclassman students
from a first term organic chemistry class. Many of these students are taking
first term calculus in addition to the organic chemistry class. The amount of
money they spend on books is as follows: $128, $87, $173, $116, $130, $204,
$147, $189, $93, $153.
 The second sample is taken using a list of seniors who take P.E. classes and
taking every fifth senior on the list, for a total of ten seniors. They spend the
following:
 $50, $40, $36, $15, $50, $100, $40, $53, $22, $22.
 It is unlikely that any student is in both samples.
Exercise--Problem

 Do you think that either of these samples is representative
of (or is characteristic of) the entire population of 10,000
upperclassmen?
 Since these samples are not representative of the entire
population, is it wise to use the results to describe the
entire population?
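Comparing the two sample means is one quick way to see how differently the samples behave. The sketch below uses the amounts listed in the exercise; the variable names are illustrative.

```python
import statistics

# Sample 1: ten students from a first term organic chemistry class.
chemistry_sample = [128, 87, 173, 116, 130, 204, 147, 189, 93, 153]

# Sample 2: every fifth senior from a list of seniors taking P.E. classes.
pe_sample = [50, 40, 36, 15, 50, 100, 40, 53, 22, 22]

mean_chem = statistics.mean(chemistry_sample)
mean_pe = statistics.mean(pe_sample)

print(f"chemistry sample mean: ${mean_chem:.2f}")
print(f"P.E. sample mean:      ${mean_pe:.2f}")
print(f"difference:            ${mean_chem - mean_pe:.2f}")
```

The means differ by roughly a hundred dollars, which suggests that at least one of the samples does not reflect the population as a whole.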
Exercise

 Now, suppose we take a third sample. We choose ten different
upperclassman students from the disciplines of chemistry, math, English,
psychology, sociology, history, nursing, physical education, art, and
early childhood development. We assume that these are the only
disciplines in which upperclassmen at ABC College are enrolled
and that an equal number of upperclassmen are enrolled in each
of the disciplines. Each student is chosen using simple random
sampling. Using a calculator, random numbers are generated and a
student from a particular discipline is selected if he or she has a
corresponding number. The students spend the following amounts:
 $180, $50, $150, $85, $260, $75, $180, $200, $200, $150.
Problem
Is the sample biased?
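The selection procedure for the third sample can be sketched as follows. This is only an illustration: the numbered roster of 1,000 students per discipline is a hypothetical stand-in that follows from the exercise's assumption of equal enrollment across ten disciplines, and the fixed seed is there only to make the run repeatable.

```python
import random
from statistics import mean

# The ten disciplines named in the exercise.
disciplines = [
    "chemistry", "math", "English", "psychology", "sociology",
    "history", "nursing", "physical education", "art",
    "early childhood development",
]

# Hypothetical roster: students in each discipline are numbered 1..1000.
STUDENTS_PER_DISCIPLINE = 1000

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

# Draw one random student number from each discipline.
selected = {d: rng.randint(1, STUDENTS_PER_DISCIPLINE) for d in disciplines}
for discipline, student_number in selected.items():
    print(f"{discipline}: student #{student_number}")

# Spending reported by the third sample in the exercise (in dollars).
third_sample = [180, 50, 150, 85, 260, 75, 180, 200, 200, 150]
third_mean = mean(third_sample)
print(f"Third sample mean: ${third_mean:.2f}")
```

Because every student within a discipline has the same chance of being drawn, no group is favored by the way the sample is collected.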
Variation in Data
 Variation is present in any set of data. For example, 16-ounce cans of
beverage may contain more or less than 16 ounces of liquid. In one
study, eight 16-ounce cans were measured and produced the following
amounts (in ounces) of beverage:
15.8, 16.1, 15.2, 14.8, 15.8, 15.9, 16.0, 15.5.
 Measurements of the amount of beverage in a 16-ounce can may vary
because different people make the measurements or because the
exact amount, 16 ounces of liquid, was not put into the cans.
Manufacturers regularly run tests to determine if the amount of
beverage in a 16-ounce can falls within the desired range.
 Be aware that as you take data, your data may vary somewhat from
the data someone else is taking for the same purpose. This is
completely natural. However, if two or more of you are taking the
same data and get very different results, it is time for you and the
others to reevaluate your data-taking methods and your accuracy.
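To put numbers on the variation in the eight can measurements above, the sketch below computes the mean, the range, and the sample standard deviation using Python's standard `statistics` module. These summary measures are a common choice, not something the slide prescribes.

```python
from statistics import mean, stdev

# Measured contents (in ounces) of the eight 16-ounce cans.
ounces = [15.8, 16.1, 15.2, 14.8, 15.8, 15.9, 16.0, 15.5]

sample_mean = mean(ounces)                # average fill
sample_range = max(ounces) - min(ounces)  # largest minus smallest
sample_sd = stdev(ounces)                 # sample standard deviation (n - 1)

print(f"Mean:  {sample_mean:.4f} oz")
print(f"Range: {sample_range:.1f} oz")
print(f"SD:    {sample_sd:.3f} oz")
```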
Variation in Samples

 It was mentioned previously that two or more samples from the
same population, taken randomly and having close to the same
characteristics as the population, will likely be different from each other.
 Suppose Doreen and Jung both decide to study the average amount of
time students at their high school sleep each night. Doreen and Jung
each take samples of 500 students. Doreen uses systematic sampling and
Jung uses cluster sampling. Doreen's sample will be different from Jung's
sample. Even if Doreen and Jung used the same sampling method, in all
likelihood their samples would be different. Neither would be wrong,
however.
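The Doreen and Jung scenario can be simulated with a short sketch. Everything numeric here is an assumption made for illustration: a hypothetical school of 2,000 students, sleep times drawn from a normal distribution, and clusters of 100 students standing in for groups such as homerooms.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Hypothetical population: nightly sleep (in hours) for 2,000 students.
population = [round(rng.gauss(7.0, 1.2), 1) for _ in range(2000)]

def systematic_sample(data, size, rng):
    """Take every k-th element after a random starting point."""
    step = len(data) // size
    start = rng.randrange(step)
    return data[start::step][:size]

def cluster_sample(data, cluster_size, n_clusters, rng):
    """Split the data into clusters, pick some at random, keep all members."""
    clusters = [data[i:i + cluster_size] for i in range(0, len(data), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [x for cluster in chosen for x in cluster]

doreen = systematic_sample(population, 500, rng)  # systematic sampling
jung = cluster_sample(population, 100, 5, rng)    # cluster sampling

print(f"Population mean: {mean(population):.2f} hours")
print(f"Doreen's sample: {mean(doreen):.2f} hours")
print(f"Jung's sample:   {mean(jung):.2f} hours")
```

Both samples contain 500 students drawn from the same population, yet they are made up of different students, so their averages need not match each other or the population average.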
Size of a sample

 The size of a sample (often called the number of observations) is
important. The examples you have seen in this book so far have been
small. Samples of only a few hundred observations, or even smaller,
are sufficient for many purposes. In polling, samples of
1,200 to 1,500 observations are considered large enough and good
enough if the survey is random and well done. You will learn why
when you study confidence intervals.
 Be aware that many large samples are biased. For example, internet
surveys are invariably biased, because people choose to respond or
not.
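As a preview of why poll sizes of 1,200 to 1,500 are considered adequate, the sketch below evaluates the usual 95% margin-of-error approximation for a proportion, 1.96 * sqrt(p(1 - p) / n), at the worst case p = 0.5. It assumes a simple random sample and the normal approximation. The function name `margin_of_error` is illustrative.

```python
from math import sqrt

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * sqrt(p * (1 - p) / n)

for n in (100, 400, 1200, 1500, 10000):
    moe_points = 100 * margin_of_error(n)
    print(f"n = {n:>6}: about +/- {moe_points:.1f} percentage points")
```

The margin shrinks with the square root of the sample size, so going far beyond roughly 1,500 observations buys little extra precision. The formula says nothing about bias: a large but self-selected sample can still be badly off.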
Frequency, Relative Frequency and Cumulative Frequency

Frequency Table of Soccer Player Height
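The three quantities in the slide title can be computed as in the sketch below. The heights are hypothetical values chosen for illustration and are not the data from the slide's table. Frequency is the count of each value, relative frequency is the count divided by the total, and cumulative relative frequency is the running sum of the relative frequencies.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical heights in whole inches (not the data from the slide's table).
heights = [66, 67, 65, 66, 68, 70, 66, 67, 69, 66,
           64, 67, 68, 66, 67, 70, 65, 66, 67, 68]

counts = Counter(heights)
values = sorted(counts)
frequencies = [counts[v] for v in values]       # how often each height occurs
total = sum(frequencies)
relative = [f / total for f in frequencies]     # share of all observations
cumulative = list(accumulate(relative))         # running total of the shares

print(f"{'Height':>6} {'Freq':>5} {'Rel.':>6} {'Cum.':>6}")
for v, f, r, c in zip(values, frequencies, relative, cumulative):
    print(f"{v:>6} {f:>5} {r:>6.2f} {c:>6.2f}")
```

The last cumulative relative frequency is always 1, which is a useful check when building such a table by hand.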


Exercise
Problem
Answer the following questions:
a. What is the frequency of deaths measured from 2006 through 2009?
b. What percentage of deaths occurred after 2009?
c. What is the relative frequency of deaths that occurred in 2003 or earlier?
d. What is the percentage of deaths that occurred in 2004?
e. What kind of data are the numbers of deaths?
f. The Richter scale is used to quantify the energy produced by an earthquake. Examples of Richter scale numbers are 2.3, 4.0, 6.1, and 7.0. What kind of data are these numbers?
Experimental Design

 The purpose of an experiment is to investigate the relationship between two
variables. In an experiment, there is the explanatory variable, which affects
the response variable. In a randomized experiment, the researcher manipulates
the explanatory variable and then observes the response variable. Each value of
the explanatory variable used in an experiment is called a treatment.
 You want to investigate the effectiveness of vitamin E in preventing disease. You
recruit a group of subjects and ask them if they regularly take vitamin E. You
notice that the subjects who take vitamin E exhibit better health on average
than those who do not. Does this prove that vitamin E is effective in disease
prevention? It does not. There are many differences between the two groups
compared in addition to vitamin E consumption. People who take vitamin E
regularly often take other steps to improve their health: exercise, diet, other
vitamin supplements. Any one of these factors could be influencing health. As
described, this study does not prove that vitamin E is the key to disease
prevention.
Exercise

 Researchers want to investigate whether taking aspirin regularly reduces the
risk of a heart attack. 400 men between the ages of 50 and 84 are recruited
as participants. The men are divided randomly into two groups: one group will
take aspirin, and the other group will take a placebo. Each man takes one pill
each day for three years, but he does not know whether he is taking aspirin or
the placebo. At the end of the study, researchers count the number of men in
each group who have had heart attacks.
 Identify the following values for this study: population, sample,
experimental units, explanatory variable, response variable, treatments.
Solution
 The population is men aged 50 to 84.
 The sample is the 400 men who participated.
 The experimental units are the individual men in the study.
 The explanatory variable is oral medication.
 The treatments are aspirin and a placebo.
 The response variable is whether a subject had a heart attack.
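The random division of the 400 men into two treatment groups can be sketched as follows. The participant IDs are hypothetical, and shuffling followed by an even split is one simple way to randomize, not necessarily the procedure the researchers used.

```python
import random

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Hypothetical participant IDs for the 400 recruited men.
participants = list(range(1, 401))
rng.shuffle(participants)

# Split the shuffled list in half: one treatment per group.
half = len(participants) // 2
aspirin_group = participants[:half]
placebo_group = participants[half:]

print(f"Aspirin group size: {len(aspirin_group)}")
print(f"Placebo group size: {len(placebo_group)}")
```

Because chance alone decides who receives aspirin, other factors such as diet or exercise should be balanced between the groups on average, which is what the vitamin E example earlier was missing.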
Ethics

 The widespread misuse and misrepresentation of statistical
information often gives the field a bad name. Some say that “numbers
don’t lie,” but the people who use numbers to support their claims
often do.
 A recent investigation of the famous social psychologist Diederik Stapel
has led to the retraction of his articles from some of the world’s top
journals, including Journal of Experimental Social Psychology, Social
Psychology, Basic and Applied Social Psychology, British Journal of
Social Psychology, and the magazine Science. Diederik Stapel is a
former professor at Tilburg University in the Netherlands. Over the
past two years, an extensive investigation involving three universities
where Stapel has worked concluded that the psychologist is guilty of
fraud on a colossal scale. Falsified data taints over 55 papers he
authored and 10 Ph.D. dissertations that he supervised.
Ethics
 Stapel did not deny that his deceit was driven by ambition. But it was more
complicated than that. He insisted that he loved social psychology but had
been frustrated by the messiness of experimental data, which rarely led to
clear conclusions. His lifelong obsession with elegance and order, he said, led
him to concoct results that journals found attractive. “It was a quest for
aesthetics, for beauty—instead of the truth,” he said. He described his
behavior as an addiction that drove him to carry out acts of increasingly
daring fraud.
 The committee investigating Stapel concluded that he is guilty of several
practices, including creating datasets that largely confirmed prior
expectations, altering data in existing datasets, changing measuring
instruments without reporting the change, and misrepresenting the number of
experimental subjects.
Ethics

 Clearly, it is never acceptable to falsify data the way this researcher
did. Sometimes, however, violations of ethics are not as easy to spot.
 Researchers have a responsibility to verify that proper methods are
being followed. The report describing the investigation of Stapel’s
fraud states that, “statistical flaws frequently revealed a lack of
familiarity with elementary statistics.” Many of Stapel’s co-authors
should have spotted irregularities in his data. Unfortunately, they did
not know very much about statistical analysis, and they simply trusted
that he was collecting and reporting data properly.
Ethics

 Many types of statistical fraud are difficult to spot. Some researchers
simply stop collecting data once they have just enough to prove what
they had hoped to prove. They don’t want to take the chance that a
more extensive study would complicate their lives by producing data
contradicting their hypothesis.
 Professional organizations, like the American Statistical Association,
clearly define expectations for researchers. There are even laws in
the federal code about the use of research data.
Ethics
 When a statistical study uses human participants, as in medical studies, both ethics
and the law dictate that researchers should be mindful of the safety of their research
subjects. The U.S. Department of Health and Human Services oversees federal
regulations of research studies with the aim of protecting participants. When a
university or other research institution engages in research, it must ensure the safety
of all human subjects. For this reason, research institutions establish oversight
committees known as Institutional Review Boards (IRB). All planned studies must be
approved in advance by the IRB. Key protections that are mandated by law include
the following:
 Risks to participants must be minimized and reasonable with respect to projected
benefits.
 Participants must give informed consent. This means that the risks of participation
must be clearly explained to the subjects of the study. Subjects must consent in
writing, and researchers are required to keep documentation of their consent.
 Data collected from individuals must be guarded carefully to protect their privacy.
Ethics
 These ideas may seem fundamental, but they can be very difficult to verify in
practice. Is removing a participant’s name from the data record sufficient to protect
privacy? Perhaps the person’s identity could be discovered from the data that remains.
What happens if the study does not proceed as planned and risks arise that were not
anticipated? When is informed consent really necessary? Suppose your doctor wants a
blood sample to check your cholesterol level. Once the sample has been tested, you
expect the lab to dispose of the remaining blood. At that point the blood becomes
biological waste. Does a researcher have the right to take it for use in a study?
 It is important that students of statistics take time to consider the ethical questions
that arise in statistical studies. How prevalent is fraud in statistical studies? You might
be surprised—and disappointed. There is a website dedicated to cataloging retractions
of study articles that have been proven fraudulent. A quick glance will show that the
misuse of statistics is a bigger problem than most people realize.
 Vigilance against fraud requires knowledge. Learning the basic theory of statistics will
empower you to analyze statistical studies critically.
Thank You!
