Stats and its Real world applications.

The document provides an overview of statistics, covering its definition, types, and real-world applications. It explains concepts such as descriptive and inferential statistics, types of data, sampling techniques, and measures of central tendency including mean, median, and mode. Additionally, it illustrates practical examples and graphical representations to aid in data analysis and decision-making.

Uploaded by

vmadhumitha523
Copyright: © All Rights Reserved

Understanding Statistics and Its Real-World Applications
From Basics to Practical Use in Your Career
What You Will Learn (Revise)
• Statistics and its importance.
• Types of statistics.
• Types of data.
• Population and sampling.
• Frequency distribution.
• Graphical representation.
• Measures of central tendency (mean, median, mode).
What is Statistics?
• The science of collecting, analyzing, interpreting, and presenting data
to make informed decisions.
• Example:
• Deciding on a new product launch based on customer survey data.

• Real-World Use:
• Marketing: Analyzing customer preferences.
• Healthcare: Tracking disease trends.
Real-World Example

• Collecting Data: A bakery owner wants to know which type of cake sells the most, so they record the number of each type of cake sold every day over 10 days.
• Example: Total sales: Chocolate cakes: 50, Vanilla cakes: 30, Red velvet cakes: 20.

• Analyzing Data: The owner calculates the average daily sales for each cake type (total sold divided by the number of days) and compares the numbers.
• Example: Average daily sales: Chocolate - 5, Vanilla - 3, Red velvet - 2.

• Interpreting Data: The owner observes that chocolate cakes are the most popular.
• Based on this, they decide to stock more chocolate cakes and promote them.

• Conclusion: "Chocolate cakes are the top-selling item; increasing production can boost sales."
Descriptive vs. Inferential Statistics
• Descriptive Statistics: Summarize and describe data.
• Example: Average marks of students in a class.
• Inferential Statistics: Make predictions or generalizations.
• Example: Predicting election outcomes from sample surveys.

Use Case:
• Descriptive: Reporting monthly sales figures.
• Inferential: Forecasting future sales trends.
Types of Data
Categorical vs. Numerical Data
• Categorical Data (Qualitative): Describes categories (e.g., colors,
brands).
• Example: Preferred car brands (Toyota, BMW, Tesla).
• Numerical Data: Quantitative values (e.g., height, weight).
• Example: Age of participants in a survey.
1. Categorical Data

Describes categories or qualities that do not have numerical meaning.


• Nominal Data:
• Definition: Categories with no logical order.
• Example: Favorite car brands (Toyota, BMW, Tesla).
• Real-World Example: Types of cuisines (Italian, Chinese, Indian).
• Ordinal Data:
• Definition: Categories with a meaningful order but no consistent difference
between them.
• Example: Movie ratings (Excellent, Good, Average, Poor).
• Real-World Example: Education level (High school, Bachelor's, Master's, PhD).
2. Numerical Data
• Quantitative values that can be measured or counted.
• Numerical data can be discrete (counted) or continuous (measured).
• Interval Data:
• Definition: Ordered data with meaningful intervals, but no true zero point.
• Example: Temperature in Celsius or Fahrenheit.
• Real-World Example: Calendar years (e.g., 1990, 2000, 2020).
• Ratio Data:
• Definition: Ordered data with meaningful intervals and a true zero point.
• Example: Weight of a person (in kilograms or pounds).
• Real-World Example: Distance traveled by a car (e.g., 0 km to 100 km).
Population and Sampling
• Population: The entire group you're studying.
• Example: All residents of a city.

• Sample: A smaller, representative group from the population.
• Example: 500 residents surveyed about city amenities.

• Real-World Use: Surveying customer feedback without contacting every customer.


Types of Sampling Techniques
Simple Random Sampling: A company randomly selects 100 employees from a list of 1,000 to participate in a survey on job satisfaction. Every employee has an equal chance of being chosen.

Stratified Sampling: A university selects students from different departments (e.g., science, arts, business) to ensure that each department is adequately represented in a survey on student satisfaction.

Systematic Sampling: A researcher chooses every 10th person on a city's voter registration list to participate in a study on election preferences.

Cluster Sampling: A school district selects a few schools at random, then surveys all the students within those selected schools to assess educational performance.
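The four sampling techniques can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the employee IDs, department split, school groupings, and function names below are hypothetical stand-ins for a real list, and a fixed seed is used so the draw is repeatable.

```python
import random

def simple_random_sample(population, k, seed=0):
    """Pick k members so that every member has an equal chance."""
    return random.Random(seed).sample(population, k)

def systematic_sample(population, step, start=0):
    """Pick every step-th member, beginning at index start."""
    return population[start::step]

def stratified_sample(strata, per_stratum, seed=0):
    """Pick the same number of members at random from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and keep everyone inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population of 1,000 IDs (not taken from the slides).
people = [f"id{i:04d}" for i in range(1000)]

srs = simple_random_sample(people, 100)          # 100 out of 1,000
every_10th = systematic_sample(people, 10)       # every 10th person
departments = {                                  # hypothetical strata
    "science": people[:400],
    "arts": people[400:700],
    "business": people[700:],
}
strat = stratified_sample(departments, 20)       # 20 per department
schools = {f"school{j}": people[j * 100:(j + 1) * 100] for j in range(10)}
clus = cluster_sample(schools, 3)                # 3 whole schools

print(len(srs), len(every_10th), {k: len(v) for k, v in strat.items()}, list(clus))
```

Stratified sampling here takes an equal number from each department; sampling in proportion to department size is another common choice.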
Frequency Distribution
• A table or chart showing how often data values occur.
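A frequency distribution can be built by counting how often each value occurs. The sketch below uses the small dataset [2, 4, 6, 6, 8, 10] that appears later in the mode examples; the variable names are illustrative.

```python
from collections import Counter

values = [2, 4, 6, 6, 8, 10]

# Counter maps each distinct value to the number of times it occurs.
freq = Counter(values)

print("value | frequency")
for value in sorted(freq):
    print(f"{value:>5} | {freq[value]}")
```

The row with the highest frequency (here, 6) is the mode of the data.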
Graphical Representation
• Why Use Graphs: Graphs simplify complex data, making it easier to understand and compare.

• How They're Used: Graphs display trends, relationships, and patterns visually, helping users analyze data effectively.

• Usefulness in Analysis: Graphs highlight key insights, making it easier to communicate findings and make data-driven decisions.

• Example: A bar chart is used to compare market share percentages of different brands, guiding marketing strategies.
Bar Chart
Definition: A chart with rectangular
bars representing data values.
• Real Example: Sales of different
products over a quarter.
• Why Use It: To compare different
categories or groups.
• When to Use: When you need to
show comparisons between different
groups.
Line Graph
Definition: A graph showing data
points connected by straight lines,
often used to show trends.
• Real Example: Sales trends over
a year.
• Why Use It: To show changes
over time.
• When to Use: When you need
to illustrate the trend or
changes of data points over
time.
Pie Chart
Definition: A circular chart divided into
segments to show proportionate data.
• Real Example: Market share of
different brands in a sector.
• Why Use It: To show parts of a whole.
• When to Use: When you want to
show percentage or proportional data.
Histogram
Definition: A bar graph representing the
frequency distribution of data.
• Real Example: Distribution of ages in a
survey.
• Why Use It: To show the distribution of
data and understand its spread.
• When to Use: When you want to display
the frequency of continuous data.
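A histogram first groups continuous values into equal-width classes and then draws one bar per class. The sketch below shows only that grouping step with a text-based plot; the ages, class start, and class width are hypothetical values chosen for illustration.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall in each class [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        idx = int((v - start) // width)
        if 0 <= idx < n_bins:          # ignore values outside the covered range
            counts[idx] += 1
    return counts

# Hypothetical survey ages (not from the slides).
ages = [18, 22, 25, 27, 31, 34, 35, 38, 41, 45, 52, 58]
counts = bin_counts(ages, start=10, width=10, n_bins=5)

for i, c in enumerate(counts):
    low = 10 + i * 10
    print(f"{low}-{low + 10}: {'#' * c}")
```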
Measures of central tendency
• What are Mean, Median, and Mode?
• Mean: The average value, sensitive to all data points.
• Median: The middle value, less sensitive to outliers.
• Mode: The most frequent value, useful for identifying trends.
• Importance: Helps analyze real-world data like salaries, exam scores, and
customer purchases.
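• Median example: 75, 72, 54, 92, 90 sorted is 54, 72, 75, 90, 92, so the median is 75.
• Mean example: 4, 3, 4, 4.6, 4.8, 4.2 has mean 24.6/6 = 4.1. Adding two low values (2 and 2) pulls the mean down to 28.6/8 ≈ 3.6.
• Mode example: South Indian 21, Chinese 15, Indian 34, Italian 12. The mode is Indian, the category with the highest frequency (34).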
Mean
• Mean is the average of all the numbers in a dataset.
• Dataset = 4, 8, 10, 12, 16
• Mean = sum of all the values in the dataset / number of values
• = (4 + 8 + 10 + 12 + 16)/5
• = 50/5 = 10 (mean)
• Dataset = 20, 20, 30, 40, 50
• = (20 + 20 + 30 + 40 + 50)/5
• = 160/5 = 32, so 32 is the centre of the dataset
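The same calculation as a minimal Python sketch (the function name `mean` is just a convenient label here):

```python
def mean(values):
    """Add up all the values and divide by how many there are."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

print(mean([4, 8, 10, 12, 16]))    # 10.0
print(mean([20, 20, 30, 40, 50]))  # 32.0
```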
Real estate
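• House prices = [72,000, 75,000, 56,000, 20,00,000]
• Mean = 22,03,000/4 = 5,50,750
• The 20,00,000 house is an outlier (a value that behaves abnormally), and it pulls the mean far above the typical price.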
Mean
• Average: a single typical value, e.g. an average rating of 4.2 across everyone who rated.
• Mean is the average of all the numbers in a dataset.
• Steps: add up all the numbers, then divide by the number of values.
• Ages = [4, 8, 10, 12, 16]
• Calculate the mean of the ages:
• 4 + 8 + 10 + 12 + 16 = 50
• 50/5 = 10
• For 10, 20, 30 the mean is (10 + 20 + 30)/3 = 60/3 = 20
Business and Sales analysis
• A company wants to know its mean revenue per month to assess overall performance.
• Example: a shop earns 10,000, 15,000, and 12,000 over 3 months.
• Calculate the mean.
• Mean = sum of all the values / number of values
• Values = [10,000, 12,000, 15,000]
• = (10,000 + 15,000 + 12,000)/3 = 37,000/3 ≈ 12,333
• On average, the company earns around 12,333 per month.
Healthcare
• Doctors calculate the average BP or heart rate of a patient over time to assess health.
• BP readings are:
• [120, 125, 130]
• = (120 + 125 + 130)/3 = 375/3 = 125
• A mean BP of 125 summarizes readings that are stable over time.
Education
• Marks = [92, 45, 67, 87, 92]
• Mean = (92 + 45 + 67 + 87 + 92)/5 = 383/5 = 76.6
Sports
• The mean score or performance shows a player's or team's overall consistency.
• A mean score of 60 runs suggests that a player typically scores around 60.

Transportation
• The mean travel time represents the typical time it takes to complete a route.
Customer behavior
• The mean spending or rating reflects the typical behavior.
• A mean rating of 4.5 stars suggests most customers are satisfied even if a few left low ratings.
Median
• The median gives the central (middle) point of the data.
• Ages = 8, 4, 10, 12
• Step 1 – arrange them in ascending order (lowest to highest)
• 4, 8, 10, 12
• Step 2 – count the number of values: 4 (even)
• Step 3 – if the count is even, calculate the average of the middle 2 elements
• (8 + 10)/2 = 18/2 = 9
Median
• Ages = 8, 4, 10, 12, 14
• Step 1 – arrange them in ascending order (lowest to highest)
• 4, 8, 10, 12, 14
• Step 2 – count the number of values: 5 (odd)
• Step 3 – pick the central element: median = 10
Median
• Step 1 – arrange the values in ascending order.
• Step 2 – check whether the number of values is even or odd.
• If it is even, average the middle 2 elements.
• If it is odd, pick the central element.
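These steps translate directly into a short function. The sketch below is one straightforward way to write it; the examples reuse the sorted ages 4, 8, 10, 12 (even count) and 4, 8, 10, 12, 14 (odd count) from the slides.

```python
def median(values):
    """Sort the values, then take the middle one (odd count)
    or the average of the two middle ones (even count)."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)                  # Step 2: count the values
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:                    # odd: pick the central element
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: average the middle two

print(median([8, 4, 10, 12]))      # 9.0
print(median([8, 4, 10, 12, 14]))  # 10
```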

Mode – most frequent value
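• Cuisines (counts): Indian 24, South Indian 23, Italian 19, Chinese 26. The mode is Chinese, the category with the highest count (26).
• Names: Rajani, Abhishek, Rajani, Rajani, Rajani, Danush. Rajani appears 4 times, so the mode is Rajani.

• [2, 4, 6, 6, 8, 10]: 2 appears 1 time, 4 appears 1 time, 6 appears 2 times, so the mode is 6.
• [34, 38, 39, 40, 40, 48]: 34, 38, and 39 each appear 1 time, 40 appears 2 times, so the mode is 40.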
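Finding the mode is a matter of counting each value and keeping the one(s) with the highest count. The sketch below is one possible implementation; returning a list (so ties are kept) and returning an empty list when nothing repeats are design choices made here, not something the slides specify.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []                      # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 4, 6, 6, 8, 10]))        # [6]
print(modes([34, 38, 39, 40, 40, 48]))   # [40]
print(modes(["Rajani", "Abhishek", "Rajani", "Rajani", "Rajani", "Danush"]))  # ['Rajani']
```

Because it only counts occurrences, the same function works for categorical data such as names or cuisines.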
Q1 What is the mean income of the employees in a dataset?
• Dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Mean = sum of all incomes / number of employees
• = 42100/10
• = 4210 (mean)
Q2 If an employee earning $4000 is removed from the dataset, what is the new mean income?
• Original dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Updated dataset = [2500, 3000, 3200, 3200, 4500, 5000, 5200, 5500, 6000]
• Mean = sum of all the values / number of values
• = 38100/9
• = 4233.33
Q3 If a new employee with an income of 7000 is added, what will be the new mean income?
• Original dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Updated dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000, 7000]
• Mean = sum of all the values / number of values
• = 49100/11 ≈ 4463.64
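Q4 Calculate the mean income of employees earning less than 5000
• Dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Incomes below 5000 = [2500, 3000, 3200, 3200, 4000, 4500]
• Mean = 20400/6
• = 3400
• Variation: if two very low incomes, 50 and 80, are also in the dataset, the subset becomes [50, 80, 2500, 3000, 3200, 3200, 4000, 4500] and the mean drops to 20530/8 = 2566.25.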
Q5 What is the mean income if the incomes are increased by 10%?
• Dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Updated dataset (each income × 1.10) = [2750, 3300, 3520, 3520, 4400, 4950, 5500, 5720, 6050, 6600]
• Mean = 46310/10
• = 4631
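The mean exercises can be checked with Python's standard `statistics` module. This is a quick verification sketch; the variable names (`q1` to `q5`) are only labels for the five questions.

```python
from statistics import mean

incomes = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]

q1 = mean(incomes)                                   # all ten incomes

without_4000 = incomes.copy()
without_4000.remove(4000)                            # drop the 4000 earner
q2 = mean(without_4000)

q3 = mean(incomes + [7000])                          # add a 7000 earner
q4 = mean([x for x in incomes if x < 5000])          # only incomes below 5000
q5 = mean([x * 11 / 10 for x in incomes])            # every income raised by 10%

for label, value in [("Q1", q1), ("Q2", q2), ("Q3", q3), ("Q4", q4), ("Q5", q5)]:
    print(label, round(value, 2))
```

A 10% raise for everyone scales the mean by the same 10% (4210 × 1.10 = 4631), so Q5 can also be answered without recomputing the sum.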
Median
Q6 What is the median income of the employees in a dataset?
• Dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Step 1 – arrange the values in ascending order (already sorted here)
• Step 2 – count the number of values: 10 (even)
• Step 3 – average the two middle values (the 5th and 6th)

• (4000 + 4500)/2
• = 8500/2 = 4250 (median value)
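Q7 If an employee earning 7000 is added, how does the median income change?
• Original dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Updated dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000, 7000]
• Number of values: 11 (odd)
• Median = the 6th value = 4500, so the median rises from 4250 to 4500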
Q8 Remove the highest income from the dataset. What is the new median income?
• Dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Updated dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500]
• Number of values: 9 (odd), so the median is the 5th value = 4000
Q9 Find the median income of employees earning between 3000 and 5500
• Dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Subset (from 3000 up to, but not including, 5500) = [3000, 3200, 3200, 4000, 4500, 5000, 5200]
• Number of values: 7 (odd)
• Median = the middle (4th) value = 4000
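The median exercises can be verified the same way with `statistics.median`. In this sketch the Q9 filter is written as a half-open range (3000 included, 5500 excluded), which is an interpretation chosen because it reproduces the 7-value subset used above; including 5500 gives a different answer, shown for comparison.

```python
from statistics import median

incomes = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]

q6 = median(incomes)                                   # 10 values: average of 4000 and 4500
q7 = median(incomes + [7000])                          # 11 values: the 6th value
q8 = median(sorted(incomes)[:-1])                      # highest income removed
q9 = median([x for x in incomes if 3000 <= x < 5500])  # 3000 included, 5500 excluded
q9_inclusive = median([x for x in incomes if 3000 <= x <= 5500])

print(q6, q7, q8, q9, q9_inclusive)
```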
Mode
Q11 What is the mode of the dataset?
• Dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Mode = 3200 (appears 2 times)
Q12 If the income 3200 is changed to 3500 for one of the employees, what is the new mode?
• Updated dataset = [2500, 3000, 3200, 3500, 4000, 4500, 5000, 5200, 5500, 6000]
• Mode: no mode (every value appears exactly once)
Q13 If all the incomes are multiplied by 2, does the mode change?

• Original dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Updated dataset = [5000, 6000, 6400, 6400, 8000, 9000, 10000, 10400, 11000, 12000]
• Mode = 6400 (2 times). The mode doubles along with the data, but its frequency stays the same.
Q14 Find the mode of the subset of incomes greater than $3000
• Dataset = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Updated dataset = [3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Mode = 3200
Q15 If an additional employee with an income of 3200 is added, what happens to the mode?
• Updated dataset = [2500, 3000, 3200, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
• Mode = 3200, now with frequency 3
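A small helper makes it easy to check the mode exercises. The function name `mode_with_count` and its convention of returning `None` when no value repeats are assumptions made for this sketch; it also reports only one value when several tie for the highest count, which is enough for these datasets.

```python
from collections import Counter

def mode_with_count(values):
    """Return (mode, frequency); the mode is None when no value repeats."""
    value, count = Counter(values).most_common(1)[0]
    return (value, count) if count > 1 else (None, count)

incomes = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]

q11 = mode_with_count(incomes)

changed = incomes.copy()
changed[changed.index(3200)] = 3500              # one 3200 becomes 3500
q12 = mode_with_count(changed)

q13 = mode_with_count([x * 2 for x in incomes])  # every income doubled
q14 = mode_with_count([x for x in incomes if x > 3000])
q15 = mode_with_count(incomes + [3200])          # one more 3200 earner

print(q11, q12, q13, q14, q15)
```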
Household prices
• Prices: 30,000, 35,000, 40,000, 5,00,000, 55,000
• Sorted: 30,000, 35,000, 40,000, 55,000, 5,00,000
• Outlier: a value that behaves abnormally compared with the rest; here it is 5,00,000.
• Mean = 6,60,000/5 = 1,32,000. Avoid the mean here because it is sensitive to outliers.
• Median = 40,000. Use the median when the data has outliers.
• Mode: none here, since every price occurs once. The mode is the value with the highest frequency and is mainly used for categorical data.
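The effect of the outlier is easy to see by computing both measures with and without it. A minimal sketch using the household prices above (written without digit grouping):

```python
from statistics import mean, median

prices = [30000, 35000, 40000, 500000, 55000]
without_outlier = [p for p in prices if p != 500000]

print("mean with outlier:     ", mean(prices))             # 132000
print("median with outlier:   ", median(prices))           # 40000
print("mean without outlier:  ", mean(without_outlier))    # 40000
print("median without outlier:", median(without_outlier))  # 37500.0
```

Removing the single outlier moves the mean from 132000 to 40000, while the median only moves from 40000 to 37500, which is why the median is preferred when outliers are present.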
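Mean for Grouped Data
Grouped distribution

f: frequency of each class
x: midpoint of each class = (lower limit + upper limit)/2

• Steps:
1. Compute the midpoint x of each class.
2. Multiply f by x for each class.
3. Add up the f⋅x values and divide by the total frequency: Mean = Σ(f⋅x) / Σf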
Real-World Example:

Problem:
A company collects data on employee salaries (in thousands).

• Mean = Σ(f⋅x) / Σf = 1590/38
• The mean of the grouped distribution is 41.8421 (about 41.8k).
Median for Grouped Data
Median = L + ((N/2 − CF) / fm) × h
Where:
• L: Lower boundary of the median class
• N: Total frequency
• CF: Cumulative frequency before the median class
• fm: Frequency of the median class
• h: Class width
Real-World Example:

• Problem: Using the same salary data, find the median of the grouped distribution.
• Result: Median = 42.6k
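Mode for Grouped Data
Mode = L + ((f1 − f0) / (2⋅f1 − f0 − f2)) × h
Where:
• L: Lower boundary of the modal class
• f1: Frequency of the modal class
• f0: Frequency of the class before the modal class
• f2: Frequency of the class after the modal class
• h: Class width
Mode for grouped distribution
• Using the same salary data: Mode = 44k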
Overall results:
•When to use each measure:
• Mean: For overall trends.
• Median: When data has outliers.
• Mode: To find the most common
category.

•Mean = 41.8k
•Median = 42.6k
•Mode = 44k
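The three grouped-data measures can be sketched in Python using the standard textbook formulas: mean = Σ(f⋅x)/Σf, median = L + ((N/2 − CF)/fm) × h, and mode = L + ((f1 − f0)/(2⋅f1 − f0 − f2)) × h. The salary frequency table itself is not reproduced in the text, so the classes below are hypothetical: they were chosen only because they agree with the reported totals (Σf = 38, Σf⋅x = 1590) and results, and the real table may differ.

```python
def grouped_mean(classes):
    """Mean = sum(f * midpoint) / sum(f) for (lower, upper, f) classes."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / total_f

def grouped_median(classes):
    """Median = L + ((N/2 - CF) / fm) * h, using the first class whose
    cumulative frequency reaches N/2 as the median class."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("no observations")
    cf = 0                                # cumulative frequency before the class
    for lo, hi, f in classes:
        if cf + f >= n / 2:
            return lo + (n / 2 - cf) / f * (hi - lo)
        cf += f

def grouped_mode(classes):
    """Mode = L + ((f1 - f0) / (2*f1 - f0 - f2)) * h for the modal class."""
    i = max(range(len(classes)), key=lambda k: classes[k][2])
    lo, hi, f1 = classes[i]
    f0 = classes[i - 1][2] if i > 0 else 0
    f2 = classes[i + 1][2] if i < len(classes) - 1 else 0
    denom = 2 * f1 - f0 - f2
    if denom == 0:
        raise ValueError("modal class is not a distinct peak")
    return lo + (f1 - f0) / denom * (hi - lo)

# Hypothetical salary classes (lower, upper, frequency), in thousands.
salaries = [(20, 30, 7), (30, 40, 7), (40, 50, 19), (50, 60, 1), (60, 70, 4)]

print(round(grouped_mean(salaries), 4))    # 41.8421
print(round(grouped_median(salaries), 1))  # 42.6
print(round(grouped_mode(salaries), 1))    # 44.0
```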
