AEM Lecture 1
Mathematics
Instructor: Dr. Madiha Liaqat
About Course
course proceeds on two fronts: (i) probability modeling techniques that allow
(ii) Logic and its types which are used in knowledge representation.
The style of the course is necessarily concise but will attempt to blend a mix
other courses.
Books
Week: 9-16
Mathematical Modeling, Propositional Logic and its Syntax, Notions of satisfiability, validity,
and inconsistency. First Order Logic, Syntax, Semantics and Applications. Fractals and their
applications. Interval Analysis and its applications in modeling. Some advanced topics.
Outline of Today’s Lecture
Introduction
Definitions of Statistics, Probability, and Key Terms
Data, Sampling, and Variation in Data and Sampling
Frequency, Frequency Tables
Experimental Design and Ethics
We encounter statistics in our daily lives more often than we probably realize and from
many different sources, like the news. (David Sim)
Introduction
You are probably asking yourself the question, "When and where will I use
statistics?"
If you read any newspaper, watch television, or use the Internet, you will see
statistical information. There are statistics about crime, sports, education,
politics, and real estate.
Statistical methods can help you make the best educated guess.
Introduction
Exercise:
Write down the average time—in hours, to the nearest half-hour—you
sleep per night.
Now create a simple graph, called a dot plot, of the data. A dot plot
consists of a number line and dots, or points, positioned above the
number line.
Statistics
For example, consider the following data:
5, 5.5, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 7, 7, 8, 8, 9.
The dot plot for this data would be as follows:
Where do your data appear to cluster? How might you interpret the
clustering?
The questions above ask you to analyze and interpret your data.
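The dot plot for the sample data above can be sketched in text form. This is a minimal sketch, not the original figure: it is rotated so that each value gets its own row instead of a column above a number line, and the names `sleep_hours` and `dot_plot_rows` are chosen here for illustration only.

```python
from collections import Counter

# Sleep data from the example above (hours, to the nearest half-hour).
sleep_hours = [5, 5.5, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 7, 7, 8, 8, 9]


def dot_plot_rows(data):
    """Return one text row per distinct value: the value, then one 'o' per observation."""
    counts = Counter(data)
    return [f"{value:>4}: " + "o" * counts[value] for value in sorted(counts)]


for row in dot_plot_rows(sleep_hours):
    print(row)
```

The longest row marks the value where the data cluster most heavily.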
Statistics
You will learn how to organize and summarize data. Organizing and
summarizing data is called descriptive statistics. Two ways to summarize data
are by graphing and by using numbers, for example, finding an average.
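Summarizing data "by using numbers" can be illustrated with the sleep data from the earlier example. The sketch below uses Python's standard `statistics` module; the variable names are illustrative, and the choice of mean, median, and mode as the summaries is an assumption about which numbers are of interest.

```python
import statistics

# Sleep data from the earlier dot plot example (hours per night).
sleep_hours = [5, 5.5, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 7, 7, 8, 8, 9]

mean_hours = statistics.mean(sleep_hours)      # arithmetic average
median_hours = statistics.median(sleep_hours)  # middle of the sorted data
mode_hours = statistics.mode(sleep_hours)      # most frequent value

print(f"mean   = {mean_hours:.2f} hours")
print(f"median = {median_hours} hours")
print(f"mode   = {mode_hours} hours")
```

For this data set the median and the mode both fall at 6.5 hours, which matches where the dots cluster.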
After you have studied probability and probability distributions, you will use
formal methods for drawing conclusions from good data. The formal methods
are called inferential statistics. Statistical inference uses probability to
determine how confident we can be that our conclusions are correct.
Effective interpretation of data, or inference, is based on good procedures for
producing data and thoughtful examination of the data. The goal of statistics is
not to perform numerous calculations using the formulas, but to gain an
understanding of your data. If you can thoroughly grasp the basics of statistics,
you can be more confident in the decisions you make in life.
Statistical Models
However, life is not always precise. While scientists can predict to the
minute the time that the sun will rise, they cannot say precisely where a
hurricane will make landfall.
Statistical models are very useful because they can describe the
probability or likelihood of an event occurring and provide alternative
outcomes if the event does not occur.
Weather forecasts, for example, are statistical models.
Meteorologists cannot predict tomorrow’s weather with certainty.
However, they often use statistical models to tell you how likely it is to
rain at any given time, and you can prepare yourself based on this
probability.
Probability
Cars with dummies in the front seats were crashed into a wall at a speed of
35 miles per hour. We want to know the proportion of dummies in the
driver’s seat that would have had head injuries, if they had been actual
drivers. We start with a simple random sample of 75 cars.
Solution
The data are the weights of backpacks with books in them. You sample the
same five students. The weights, in pounds, of their backpacks are 6.2, 7,
6.8, 9.1, 4.3. Notice that backpacks carrying three books can have different
weights. Weights are quantitative continuous data.
You go to the supermarket and purchase three cans of soup (19 ounces tomato
bisque, 14.1 ounces lentil, and 19 ounces Italian wedding), two packages of
nuts (walnuts and peanuts), four different kinds of vegetable (broccoli,
cauliflower, spinach, and carrots), and two desserts (16 ounces pistachio ice
cream and 32 ounces chocolate chip cookies).
Problem
Name data sets that are quantitative discrete, quantitative continuous, and
qualitative.
Solution
A possible solution
One example of a quantitative discrete data set would be three cans of soup,
two packages of nuts, four kinds of vegetables, and two desserts because you
count them.
The weights of the soups (19 ounces, 14.1 ounces, 19 ounces) are quantitative
continuous data because you measure weights as precisely as possible.
Types of soups, nuts, vegetables, and desserts are qualitative data because
they are categorical.
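The three data types in this solution can be made concrete with a small sketch of the shopping trip. The variable names and the way the purchases are stored are assumptions made for illustration; only the item names and weights stated in the problem are used.

```python
# Purchases from the supermarket example. Weights are in ounces and are
# listed only where the problem states them.
soups = [("tomato bisque", 19.0), ("lentil", 14.1), ("Italian wedding", 19.0)]
nuts = ["walnuts", "peanuts"]
vegetables = ["broccoli", "cauliflower", "spinach", "carrots"]
desserts = [("pistachio ice cream", 16.0), ("chocolate chip cookies", 32.0)]

# Quantitative discrete: obtained by counting, so the values are whole numbers.
item_counts = {
    "soups": len(soups),
    "nuts": len(nuts),
    "vegetables": len(vegetables),
    "desserts": len(desserts),
}

# Quantitative continuous: obtained by measuring, so fractional values occur.
soup_weights_oz = [weight for _, weight in soups]

# Qualitative (categorical): names or labels, not numbers.
soup_types = [name for name, _ in soups]

print(item_counts)
print(soup_weights_oz)
print(soup_types)
```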
Exercise
Work collaboratively to determine the correct data type: quantitative or
qualitative. Indicate whether quantitative data are continuous or discrete.
Hint: Data that are discrete often start with the words "the number of."
a. the number of pairs of shoes you own
b. the type of car you drive
c. the distance from your home to the nearest grocery store
d. the number of classes you take per school year
e. the type of calculator you use
f. weights of sumo wrestlers
g. number of correct answers on a quiz
h. IQ scores
Items a, d, and g are quantitative discrete; items c, f, and h are quantitative
continuous; items b and e are qualitative or categorical.
Exercise
A large school district keeps data on the scores students earn on an end-of-year
standardized exam. The data it collects are summarized in the
histogram. The class boundaries are 50 to less than 60, 60 to less than 70, 70
to less than 80, 80 to less than 90, and 90 to less than 100.
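The class boundaries above can be turned into a frequency table with a short sketch. The scores in `hypothetical_scores` are made up for illustration, since the actual exam data are not given here, and the function name `score_class` is likewise an assumption.

```python
from collections import Counter


def score_class(score):
    """Map a score to its class, e.g. 73 -> '70 to less than 80'."""
    if not 50 <= score < 100:
        raise ValueError("score falls outside the classes 50 to less than 100")
    lower = int(score // 10) * 10
    return f"{lower} to less than {lower + 10}"


# Hypothetical scores, NOT the district's data.
hypothetical_scores = [52, 67, 71, 78, 85, 90, 99.5]

class_frequencies = Counter(score_class(s) for s in hypothetical_scores)
for label in sorted(class_frequencies):
    print(label, class_frequencies[label])
```

Note that a score sitting exactly on a boundary, such as 60, belongs to the class that starts there ("60 to less than 70"), not to the one that ends there.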
Tables are a good way of organizing and displaying data. But graphs can be
even more helpful in understanding the data.
Qualitative Data Discussion
Two graphs that are used to display qualitative data are pie charts and
bar graphs.
In a pie chart, categories of data are shown by wedges in a circle that
represent the percent of individuals/items in each category. We use
pie charts when we want to show parts of a whole.
In a bar graph, the length of the bar for each category represents the
number or percent of individuals in each category. Bars may be
vertical or horizontal. We use bar graphs when we want to compare
categories or show changes over time.
A Pareto chart consists of bars that are sorted into order by category
size (largest to smallest).
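The numbers behind these three charts can be sketched as follows. The categories and counts in `hypothetical_counts` are invented for illustration and do not come from the lecture data; the sketch only shows that a pie chart needs each category's percent of the whole, while a Pareto chart needs the categories ordered by size.

```python
# Hypothetical category counts, NOT data from the lecture.
hypothetical_counts = {"A": 12, "B": 30, "C": 18}

total = sum(hypothetical_counts.values())

# Pie chart: each wedge is the category's percent of the whole.
percents = {name: 100 * count / total for name, count in hypothetical_counts.items()}

# Pareto chart: bars sorted by category size, largest to smallest.
pareto_order = sorted(hypothetical_counts, key=hypothetical_counts.get, reverse=True)

print(percents)
print(pareto_order)
```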
Determine which graph (pie or bar) you think displays the
comparisons better.
Omitting Categories/Missing Data
The table displays Ethnicity of Students but is missing the Other/Unknown category.
Data of this type (two variable data) are referred to as bivariate data. Because the data
represent a count, or tally, of choices, the table is a two-way frequency table. The entries in
the total row and the total column represent marginal frequencies or marginal
distributions.
Note: The term marginal distributions gets its name from the fact that the distributions
are found in the margins of frequency distribution tables. Marginal distributions may be
given as a fraction or decimal. For example, the total for men could be given as 0.6 or
3/5, since 30/50 = 0.6 = 3/5.
Two-way table
For Example:
The proportion of football players who are women is 5/25, which is 0.2.
How do we find the proportion of women who play football?
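The two questions differ only in which total is used as the denominator. The sketch below assumes that the figures quoted in these slides (30 men out of 50 people, 5 women among 25 football players) come from the same table, and it collapses every non-football sport into a single "other" column; both are assumptions, since the original table is not reproduced here.

```python
# Two-way frequency table rebuilt from the quoted totals.
# "other" lumps all non-football sports together (an assumption).
table = {
    "men": {"football": 20, "other": 10},
    "women": {"football": 5, "other": 15},
}

row_totals = {group: sum(cells.values()) for group, cells in table.items()}
col_totals = {
    sport: sum(table[group][sport] for group in table)
    for sport in ("football", "other")
}
grand_total = sum(row_totals.values())

# Marginal distribution: a row total divided by the grand total.
men_marginal = row_totals["men"] / grand_total

# Football players who are women: divide by the football COLUMN total.
women_among_football = table["women"]["football"] / col_totals["football"]

# Women who play football: divide by the women ROW total.
football_among_women = table["women"]["football"] / row_totals["women"]

print(men_marginal, women_among_football, football_among_women)
```

Under these assumptions the proportion of women who play football is 5/20 = 0.25, compared with 5/25 = 0.2 for the proportion of football players who are women.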
Sampling
Sample size issues: Samples that are too small may be unreliable.
Larger samples are better, if possible. In some situations, small
samples are unavoidable and can still be used to draw conclusions.
Examples include crash testing cars or medical testing for rare
conditions.
Undue influence: Collecting data or asking questions in a way that
influences the response.
Non-response or refusal of subject to participate: The collected
responses may no longer be representative of the population. Often,
people with strong positive or negative opinions may answer surveys,
which can affect the results.
Exercise
Suppose ABC college has 10,000 upperclassmen (junior and senior level
students), who form the population. We are interested in the average amount of money
an upperclassman spends on books in the fall term. Asking all 10,000
upperclassmen is an almost impossible task.
Suppose we take two different samples.
First, we use convenience sampling and survey ten upperclassmen
from a first-term organic chemistry class. Many of these students are taking
first term calculus in addition to the organic chemistry class. The amount of
money they spend on books is as follows: $128, $87, $173, $116, $130, $204,
$147, $189, $93, $153.
The second sample is taken from a list of seniors who take P.E. classes by
selecting every fifth senior on the list, for a total of ten seniors. They spend the
following:
$50, $40, $36, $15, $50, $100, $40, $53, $22, $22.
It is unlikely that any student is in both samples.
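The two samples can be compared by computing their averages. The sketch below uses the dollar amounts given above; the helper `every_kth` and the 50-name list `hypothetical_seniors` are assumptions added only to illustrate how taking every fifth name yields ten seniors.

```python
import statistics

# Book spending (in dollars) from the two samples above.
convenience_sample = [128, 87, 173, 116, 130, 204, 147, 189, 93, 153]
systematic_sample = [50, 40, 36, 15, 50, 100, 40, 53, 22, 22]

convenience_mean = statistics.mean(convenience_sample)
systematic_mean = statistics.mean(systematic_sample)

print(f"convenience sample mean: ${convenience_mean:.2f}")
print(f"systematic sample mean:  ${systematic_mean:.2f}")


def every_kth(items, k):
    """Systematic sampling: take every k-th item, starting with the k-th."""
    return items[k - 1::k]


# Hypothetical list of 50 seniors; the real list is not given.
hypothetical_seniors = [f"senior_{i}" for i in range(1, 51)]
selected = every_kth(hypothetical_seniors, 5)
print(len(selected), selected[0], selected[-1])
```

The sample averages are $142.00 and $42.80, a large gap for two samples meant to describe the same population, which suggests that at least one of them does not represent the upperclassmen well.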
Exercise--Problem