Introduction
1. Shopping: When we go shopping, we often look at the prices of different products and compare them. The mean price
of the products can give us an idea of the average price range of the products we want to buy.
2. Grades: Students receive grades on their assignments, tests, and exams. The mean, median, or mode of these scores
summarizes typical performance. This can give the students an idea of how well they did
compared to others in their class.
3. Sports: In sports, we often use statistics to analyze the performance of individual players or teams. Mean, median, and
mode can be used to determine the average points per game, rebounds per game, or assists per game.
4. Health: In the field of medicine, statistics play a crucial role in analyzing health-related data. For example, the average
weight, height, and body mass index (BMI) of a population can be determined using mean, median, and mode.
5. Finance: Statistics are used extensively in finance, including analyzing the performance of stocks, bonds, and other
financial instruments. Mean, median, and mode can be used to determine the average returns or losses of a particular
investment.
6. Public Opinion: In surveys and polls, statistics are used to analyze public opinion on various issues. Mean, median, and
mode can be used to determine the average responses or the most common response to a particular question.
7. Quality Control: Order statistics are used to analyze and evaluate the quality of products in manufacturing
processes. For example, the minimum and maximum values of a set of measurements can be used to determine
whether a particular product meets certain quality standards.
8. Environmental Science: Order statistics are used to analyze extreme events, such as floods, droughts, and heat
waves, in environmental science. For example, the maximum and minimum values of temperature or rainfall in a
particular region can be used to evaluate the risk of such events occurring in the future.
Statistical methods are used to analyze data related to environmental issues, such as climate change, air pollution, and
water quality.
9. Epidemiology: Order statistics are used in epidemiology to study the distribution of diseases in a population. For
example, the k-th highest incidence rate of a disease can be used to estimate the risk of getting the disease.
Statistics are used in our everyday lives in various ways.
Here are some examples of where we use mean, median, and mode in everyday situations:
10. Health: Here we apply the statistical principle that the average of several measurements is usually a more accurate
reflection of a variable than any single measurement. One simple example is measuring blood pressure: we often take
3 readings at regular intervals and then calculate their average, to reduce the effect of an erratic reading, which often
happens with home electronic BP machines, when accurate measurement of blood pressure is crucial for a person.
11. Risk aversion: Would you rather win $30 guaranteed or take a 50% chance of winning $100?
In statistics, the expected value of the gamble is 50% * $100 + 50% * $0 = $50.
Someone who is more risk-averse would likely take the guaranteed $30, whereas someone who is more risk-seeking would
take the gamble.
This could extend to non-monetary situations such as human relationships.
The expected value of a choice = the sum, over every possible outcome of that choice, of the probability of the outcome *
the utility of the outcome.
12. Manufacturing : Suppose a car manufacturer wants to know what percentage of its vehicles experience a particular
mechanical problem within the first year of ownership.
The manufacturer might collect data on a random sample of cars sold in the past year, record whether or not each car
experienced the problem, and then use statistical methods to analyze the data and estimate the overall percentage of
cars with the problem.
Once the manufacturer has this estimate, it can use it to make decisions about how to address the problem. For example,
if the estimate is high, the manufacturer might issue a recall or redesign the affected component.
Alternatively, if the estimate is low, the manufacturer might choose to focus its resources on other areas of improvement.
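The calculations behind items 11 and 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `expected_value` and `proportion_estimate` are made up for this example, the sample of 36 problem cars out of 400 is hypothetical, and the interval uses the simple normal approximation, which is one of several possible methods.

```python
import math


def expected_value(outcomes):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    total_probability = sum(p for p, _ in outcomes)
    if not math.isclose(total_probability, 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(p * payoff for p, payoff in outcomes)


def proportion_estimate(problem_count, sample_size, z=1.96):
    """Sample proportion with a normal-approximation interval.

    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    p_hat = problem_count / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Item 11: the gamble versus the guaranteed payout.
gamble_ev = expected_value([(0.5, 100), (0.5, 0)])
sure_ev = expected_value([(1.0, 30)])
print(f"gamble: ${gamble_ev:.2f}, guaranteed: ${sure_ev:.2f}")

# Item 12: hypothetical sample, 36 of 400 cars showed the problem.
p_hat, low, high = proportion_estimate(36, 400)
print(f"estimated share: {p_hat:.1%} (about {low:.1%} to {high:.1%})")
```

The gamble has the higher expected value, so choosing the guaranteed $30 reflects risk aversion rather than a calculation error. For the manufacturer, the width of the interval indicates how much the estimate could change with a different sample.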
As for future applications of mathematical statistics, there are many possibilities.
With the increasing availability of data, there will be a growing need for statistical
methods to analyze and make sense of this data.
1. Predictive Analytics: Statistical methods will be used to predict future outcomes based on historical
data, such as predicting the likelihood of a disease outbreak or the success of a marketing campaign.
2. Machine Learning: Statistical methods will be used to develop algorithms and models for machine
learning applications, such as speech recognition, image recognition, and natural language
processing.
3. Personalized Medicine: Statistical methods will be used to analyze large datasets of patient
information to develop personalized treatment plans and predict treatment outcomes.
4. Smart Cities: Statistical methods will be used to analyze data from sensors and other sources in
smart cities to improve urban planning, traffic management, and energy efficiency.
5. Quantum Computing: Statistical methods will play an important role in developing algorithms for
quantum computing, which has the potential to revolutionize many fields, including cryptography,
drug discovery, and materials science.
There are several common statistical terms that are often used in everyday life:
• Average: This refers to the typical or central value in a set of data. It can be calculated using various
measures, such as the mean, median, or mode.
• Probability: This refers to the likelihood or chance of an event occurring. It is often expressed as a
percentage or a fraction.
• Standard deviation: This refers to the amount of variation or spread in a set of data. A smaller standard
deviation indicates that the data is clustered closely around the mean, while a larger standard deviation
indicates that the data is more spread out.
• Confidence interval: This refers to a range of values that is likely to contain the true value of a population
parameter, such as the mean or proportion. It is often used to estimate the precision of a sample statistic.
• Regression: This refers to a statistical method that is used to model the relationship between two or more
variables. It is often used to predict the value of one variable based on the value of another variable.
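To make the last term concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The helper name `fit_line` and the study-hours data are hypothetical and chosen only for illustration; in practice a statistics library would normally be used instead of hand-written formulas.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied and exam score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

slope, intercept = fit_line(hours, scores)
predicted_score = intercept + slope * 6  # prediction for 6 hours of study
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
print(f"predicted score for 6 hours: {predicted_score:.1f}")
```

The fitted slope is the average change in the predicted variable for a one-unit change in the other, which is what "predicting one variable from another" means in the definition above.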
TEACHING METHODS
This course enables learners to acquire knowledge of basic statistical ideas, methods and
terminology.
Study of the content of the course enables learners to represent and use statistical data in
graphical, diagrammatic and tabular forms, interpret statistical statements, calculations and
diagrams, perform statistical calculations accurately and acquire knowledge of elementary
ideas in probability.
Python is one of the most popular programming languages for data science. This course introduces
Python within the context of the closely related areas of statistics and data science.
This gives companies an idea of how many products they can expect
to sell during different time periods and allows them to know how
much they should keep in inventory.
Example 3: Health Insurance
Health insurance companies often use statistics and probability to
determine how likely it is that certain individuals will spend a certain
amount on healthcare each year.
For example, an actuary at a health insurance company might use factors like
age, existing medical conditions, current health status, etc. to determine
that there’s an 80% probability that a certain individual will spend $10,000 or
more on healthcare in a given year.
Example 4: Traffic
Traffic engineers regularly use statistics to monitor total
traffic in different areas of a city, which allows them to
decide whether or not they should add or remove
roads to optimize traffic flow.
Pareto chart
A Pareto chart shows the frequency of occurrences of quality-related problems to highlight those that need
the most attention.
Example:
Pareto charts are a common tool used by manufacturers to analyze quality and defect data, providing a
simple visual representation as to the frequency of certain issues and the cumulative percentage of their
occurrence.
Type of Defect    Frequency of Defect    % of Total      Cumulative %
Button Defect     23                     23/59 = 39.0    39.0
Pocket Defect     16                     27.1            39.0 + 27.1 = 66.1
Collar Defect     10                     16.9            83.1
Cuff Defect       7                      11.9            95.0
Sleeve Defect     3                      5.0             100
Total             59                     -               -
Analysis:
While the 80/20 rule does not apply perfectly to the example above, focusing on just 2 of the 5 types of defects
(Button and Pocket) has the potential to remove the majority of all defects (66%).
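The columns of a Pareto table can be computed directly from the defect counts. The sketch below uses the five counts from the example; the function name `pareto_table` is an assumption made for illustration, not part of any library.

```python
# Defect counts taken from the example table.
defects = {"Button": 23, "Pocket": 16, "Collar": 10, "Cuff": 7, "Sleeve": 3}


def pareto_table(counts):
    """Return (name, count, percent, cumulative percent) rows, largest count first."""
    total = sum(counts.values())
    rows = []
    running = 0
    for name, count in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        running += count
        # Cumulative share is computed from the running count, not from rounded percentages.
        rows.append((name, count, 100 * count / total, 100 * running / total))
    return rows


rows = pareto_table(defects)
for name, count, percent, cumulative in rows:
    print(f"{name:<8}{count:>4}{percent:>8.1f}{cumulative:>8.1f}")
```

Because the cumulative column here is built from running counts, the last digit can differ slightly from a table that adds up already rounded percentages.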
What is Statistics?
Why? Data Analysis -> Decision-Making
1. Collecting Data: e.g., Sample, Survey, Observe, Simulate
2. Characterizing Data: e.g., Organize/Classify, Count, Summarize
3. Presenting Data: e.g., Tables, Charts, Statements
4. Interpreting Results: e.g., Infer, Conclude, Specify Confidence
© 1984-1994 T/Maker Co.
Statistics is the science of data: collecting, classifying, summarizing, organizing, analyzing, and
interpreting numerical information.
Why Study Statistics?
1. Numerical information is everywhere.
2. Statistical techniques are used to make decisions that affect our daily lives.
3. The knowledge of statistical methods will help you understand how decisions are made and
give you a better understanding of how they affect you.
4. No matter what line of work you select, you will find yourself faced with decisions where an
understanding of data analysis is helpful.
Statistics is the science of conducting studies to collect, organize, summarize, analyze, present,
interpret and draw conclusions from data.
Steps of Statistical Investigation
2. Explore the Data: Analyze and summarize the data (also called exploratory data analysis).
3. Draw a Conclusion: Use the data, probability, and statistical inference to draw a conclusion
about the population.
Definition:
Science of collection, presentation, analysis, and reasonable interpretation of data.
Statistics presents a rigorous scientific method for gaining insight into data.
Example: suppose we measure the weight of 100 patients in a study. With so many
measurements, simply looking at the data fails to provide an informative account.
However, statistics can give an instant overall picture of the data through graphical presentation
or numerical summarization, irrespective of the number of data points.
Besides data summarization, another important task of statistics is to make inferences (steps in
reasoning, moving from premises to logical consequences; etymologically, the word infer means to carry
forward) and to predict relations between variables.
Definition:
Facts or figures, which are numerical or otherwise, collected with a definite purpose are called
data.
Every day we come across a lot of information in the form of facts, numerical figures, tables,
graphs, etc.
These may relate to profits of a company, temperatures of cities, expenditures in various sectors
of a five year plan, polling results, and so on.
[Table: Type of Study (Population / Cause and Effect), Type of Research Question, Examples]
ANOVA Statistics - used to evaluate the difference between the means of more than two groups.
Dataset: ... persons. ... column for each variable and one row for each ...

 1   26   23.2    0   61.0   0   1   1
 2   30   30.2    9   65.5   1   3   2
 3   32   28.9   17   59.6   1   3   4
 4   37   22.4    1   68.4   1   2   3
 5   33   25.5    7   64.5   0   3   5
 6   29   22.3    1   70.2   0   2   2
 7   32   23.0    0   67.3   0   1   1
 9   32   22.2    3   71.5   0   1   4
10   33   29.1    5   63.2   1   1   4
11   26   20.8    2   69.1   0   1   3
13   31   36.3    1   66.3   0   2   5
15   27   28.6    2   70.2   1   2   2
18   31   21.2   11   70.7   1   1   2
19   36   22.7    8   69.8   0   2   1
Data:
Set of values of one or more variables recorded on one or more observational units
Sources of data:
1. Routinely kept records
2. Surveys (census)
3. Experiments
4. External source
Categories of data:
1. Primary data: observation, questionnaire, record form, interviews, survey
2. Secondary data: web, census, medical record, registry
Primary Data Vs Secondary Data
Primary Data
Primary data is the data that is collected for the first time through personal experiences or evidence,
particularly for research.
The data is mostly collected through observations, physical testing, mailed questionnaires, surveys,
personal interviews, telephonic interviews, case studies, and focus groups, etc.
Secondary Data
Secondary data is second-hand data that has already been collected and recorded by other researchers for
their own purpose, and not for the current research problem.
It is accessible in the form of data collected from different sources such as government publications,
censuses, internal records of the organisation, books, journal articles, websites and reports, etc.
This method of gathering data is affordable, makes use of readily available material, and saves time.
However, one disadvantage is that the information was assembled for some other purpose and may
not meet the present research purpose or may not be accurate.
Discrete Vs continuous data
Discrete data (countable) is information that can only take certain values. These values don’t
have to be whole numbers, but they are fixed values, such as shoe size, number of teeth,
number of kids, etc.
Discrete data often arises from counting, in which case the values are non-negative
integers (0, 1, 2, and so on).
Continuous data (measurable) is data that can take any value within a range. Height, weight, temperature
and length are all examples of continuous data.
Continuous measurements can also change over time and take different values at different time intervals,
like the weight of a person.
Definitions for Variables
Qualitative: Nominal, Ordinal. Quantitative: Interval, Ratio.
• Nominal - Categorical variables with no inherent order or ranking sequence such as names or classes
(e.g., gender). Values may be written as numerals, but the numerals carry no numerical meaning (e.g., I, II, III).
The only operation that can be applied to Nominal variables is enumeration (counting how often each category occurs).
• Ordinal - Variables with an inherent rank or order, e.g. mild, moderate, severe. Can be compared for
equality, or greater or less, but not how much greater or less.
• Interval - Values of the variable are ordered as in Ordinal, and additionally, differences between values
are meaningful, however, the scale is not absolutely anchored. Calendar dates and temperatures on the
Fahrenheit scale are examples. Addition and subtraction, but not multiplication and division are meaningful
operations.
• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero point, e.g. age, weight,
temperature (Kelvin). Addition, subtraction, multiplication, and division are all meaningful operations.
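The difference between the Interval and Ratio scales can be checked numerically. The sketch below is only an illustration: the two temperatures are arbitrary example values and `fahrenheit_to_kelvin` is a helper defined here for the purpose.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit temperature to Kelvin (a ratio scale with a true zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15


cool_f, warm_f = 40.0, 80.0  # arbitrary example temperatures

# Differences are meaningful on an interval scale.
difference_f = warm_f - cool_f

# The ratio of Fahrenheit readings is 2.0, but it depends on the arbitrary zero point.
naive_ratio = warm_f / cool_f

# On the Kelvin scale, the ratio reflects the actual physical quantity.
kelvin_ratio = fahrenheit_to_kelvin(warm_f) / fahrenheit_to_kelvin(cool_f)

print(f"difference: {difference_f:.0f} degrees F")
print(f"ratio of Fahrenheit readings: {naive_ratio:.2f}")
print(f"ratio of Kelvin temperatures: {kelvin_ratio:.2f}")
```

80 °F is not "twice as hot" as 40 °F: once both readings are expressed in Kelvin the ratio is only about 1.08, which is why multiplication and division are reserved for Ratio variables.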
Qualitative analysis contrasts
with quantitative analysis, which focuses on
numbers found in reports such as balance
sheets.
Qualitative data
The objects being studied are grouped into categories based on some qualitative trait.
The resulting data are merely labels or categories: Categorical Data
Categorical data: Nominal data, Ordinal data
Examples:
Eye color: blue, brown, black, green, etc.
Smoking status: smoker, non-smoker
Attitudes towards the death penalty: strongly disagree, disagree, neutral, agree, strongly agree
Nominal data
Studies measuring nominal data must ensure that each category is mutually exclusive and the
system of measurement needs to be exhaustive.
Variables that have only two possible responses, e.g. yes or no, are known as dichotomies.
Examples of Nominal Data
Type of car: BMW, Mercedes, Lexus, Toyota, Renault, Ford, etc.
Ethnicity: White British, Afro-Caribbean, Asian, Arab, Chinese, other, etc.
Smoking status: smoker, non-smoker
Binary Data
Examples:
Smoking status: smoker, non-smoker
Attendance: present, absent
Result of an exam: pass, fail
Status of student: undergraduate, postgraduate
Ordinal data
• Ordinal data is data that consists of categories that can be rank ordered.
• As with nominal data, the distance between each category cannot be calculated, but the
categories can be ranked above or below each other.
Examples of Ordinal Data
Did you enjoy the teaching session? (please tick): Yes / No
Very satisfied
Somewhat satisfied
Neutral
Somewhat dissatisfied
Very dissatisfied
Quantitative Data
The objects being studied are ‘measured’ based on some quantitative trait.
The resulting data are a set of numbers.
Examples
Pulse Rate
Height
Age
Exam marks
Time to complete a statistics test
Number of cigarettes smoked
Quantitative data: Discrete, Continuous
Discrete Data
Only certain values are possible (there are gaps between the possible values). Implies counting.
Continuous Data
Quant vs. Qual
Statistical Description of Data
Summary Measures in Descriptive Statistics
Summary Measures:
Mean, Median, Mode, Midrange
Range, Variance, Standard Deviation, Coefficient of Variation
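As a rough illustration, all of the summary measures listed above can be computed with Python's standard `statistics` module. The function name `summarize` and the six data values are hypothetical, and the variance and standard deviation use the sample (n - 1) formulas, which is an assumption of this sketch.

```python
import statistics


def summarize(values):
    """Compute common summary measures for a list of numbers."""
    low, high = min(values), max(values)
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "midrange": (low + high) / 2,
        "range": high - low,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "standard deviation": stdev,
        # Coefficient of variation: spread relative to the mean, in percent.
        "coefficient of variation (%)": 100 * stdev / mean,
    }


# Hypothetical measurements, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
summary = summarize(data)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The coefficient of variation is only meaningful for data with a non-zero mean measured on a ratio scale, so it should be read with that restriction in mind.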