Stats and its Real world applications.
• Real-World Use:
• Marketing: Analyzing customer preferences.
• Healthcare: Tracking disease trends.
Real world examples
Collecting Data: A bakery owner wants to know which type of cake sells the most.
• They record the number of each type of cake sold daily over a month.
• Example: Chocolate cakes: 50, Vanilla cakes: 30, Red velvet cakes: 20.
Analyzing Data: The owner calculates the average daily sales for each cake type and compares the numbers.
• Example: Average daily sales: Chocolate - 5, Vanilla - 3, Red velvet - 2.
Interpreting Data: The owner observes that chocolate cakes are the most popular.
• Based on this, they decide to stock more chocolate cakes and promote them.
• Conclusion: "Chocolate cakes are the top-selling item; increasing production can boost sales."
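The collect, analyze, interpret steps above can be sketched in a few lines of Python. The totals are the ones from the example; the 10-day recording period is an assumption, chosen only because it reproduces the quoted averages of 5, 3 and 2.

```python
# Collect: total cakes sold per type, taken from the example above.
total_sales = {"Chocolate": 50, "Vanilla": 30, "Red velvet": 20}

# Assumption (not stated in the example): the totals cover 10 recording days.
days_recorded = 10

# Analyze: average daily sales for each cake type.
average_daily = {cake: total / days_recorded for cake, total in total_sales.items()}

# Interpret: the cake type with the highest total is the top seller.
top_seller = max(total_sales, key=total_sales.get)

print(average_daily)
print("Top seller:", top_seller)
```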
Descriptive vs. Inferential Statistics
• Descriptive Statistics: Summarize and describe data.
• Example: Average marks of students in a class.
• Inferential Statistics: Make predictions or generalizations.
• Example: Predicting election outcomes from sample surveys.
Use Case:
• Descriptive: Reporting monthly sales figures.
• Inferential: Forecasting future sales trends.
Types of Data
Categorical vs. Numerical Data
• Categorical Data (Qualitative): Describes categories (e.g., colors,
brands).
• Example: Preferred car brands (Toyota, BMW, Tesla).
• Numerical Data: Quantitative values (e.g., height, weight).
• Example: Age of participants in a survey.
1. Categorical Data
Transportation
The mean travel time represents the typical time it takes to complete a
route
Customer behavior
• The mean spending or rating reflects the typical behavior
• A mean rating of 4.5 stars suggests most customers are satisfied, even
if a few left low ratings
Median
• It gives the central (middle) value of the data
• Ages = 8, 4, 10, 12
• Step 1: Arrange them in ascending order (lowest to highest)
• 4, 8, 10, 12
• Step 2: Count the number of values: 4 (even)
• Step 3: If the count is even, calculate the average of the middle
2 elements
• (8 + 10)/2 = 18/2 = 9
Median
• Ages = 8, 4, 10, 12, 14
• Step 1: Arrange them in ascending order (lowest to highest)
• 4, 8, 10, 12, 14
• Step 2: Count the number of values: 5 (odd)
• Step 3: Pick the central element: median = 10
Median
• Step 1: Arrange the values in ascending order
• Step 2: Check whether the number of values is even or odd
• If it is even, average the middle 2 elements
• If it is odd, pick the central element
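The median steps above can be sketched as a small Python function. The function name `median` and the error message are illustrative choices, not part of the original notes.

```python
def median(values):
    """Return the median by sorting and picking the middle value(s)."""
    if not values:
        raise ValueError("median of an empty list is undefined")
    ordered = sorted(values)   # Step 1: ascending order
    n = len(ordered)           # Step 2: count the values
    mid = n // 2
    if n % 2 == 1:             # odd count: pick the central element
        return ordered[mid]
    # even count: average the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([8, 4, 10, 12]))      # even number of values
print(median([8, 4, 10, 12, 14]))  # odd number of values
```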
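Mode – most frequent value
• Cuisines and marks: Indian 24, South Indian 23, Italian 19, Chinese 26
• Names: Rajani, Abhishek, Rajani, Rajani, Rajani, Danush
• Rajani appears 4 times, so the mode is Rajani
• [2, 4, 6, 6, 8, 10]: frequencies 2 → 1, 4 → 1, 6 → 2, so the mode is 6
• [34, 38, 39, 40, 40, 48]: frequencies 34 → 1, 38 → 1, 39 → 1, 40 → 2, so the mode is 40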
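The frequency-table approach to the mode can be sketched with `collections.Counter` from the Python standard library. The helper name `modes` is hypothetical; it returns a list because more than one value can share the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)          # frequency table: value -> count
    if not counts:
        return []
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

names = ["Rajani", "Abhishek", "Rajani", "Rajani", "Rajani", "Danush"]
print(Counter(names))                     # the frequency table itself
print(modes(names))
print(modes([2, 4, 6, 6, 8, 10]))
print(modes([34, 38, 39, 40, 40, 48]))
```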
Q1 What is the mean income of the employees in a dataset?
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Calculate the mean
• = sum of all incomes / number of employees
• = 42100/10
• = 4210 (mean)
Q2 If an employee earning $4000 is removed from the dataset, what is the new mean income?
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Updated dataset = [2500,3000,3200,3200,4500,5000,5200,5500,6000]
• = sum of all values / number of values
• = 38100/9
• ≈ 4233.33
Q3 If a new employee with an income of 7000 is added, what will be the new mean income?
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Updated dataset = [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000,7000]
• = sum of all values / number of values
• = 49100/11 ≈ 4463.64
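Q1 to Q3 can be checked with a short Python sketch. The helper `mean` and the variable names are illustrative; only the income list comes from the exercises.

```python
incomes = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]

def mean(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

# Q1: mean of the original dataset.
q1 = mean(incomes)

# Q2: remove the employee earning 4000 (list.remove drops the first match).
without_4000 = incomes.copy()
without_4000.remove(4000)
q2 = mean(without_4000)

# Q3: add a new employee earning 7000 to the original dataset.
q3 = mean(incomes + [7000])

print(q1, round(q2, 2), round(q3, 2))
```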
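Q4 Calculate the mean income of employees earning less than 5000
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• New dataset = [2500,3000,3200,3200,4000,4500]
• Mean = 20400/6
• = 3400
• Variation: if incomes of 50 and 80 are also in the dataset, the new dataset is [50,80,2500,3000,3200,3200,4000,4500]
• Mean = 20530/8 = 2566.25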
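Filtering before averaging, as in Q4, can be sketched with a list comprehension. The strict `<` comparison reflects "less than 5000"; the variable names are illustrative.

```python
incomes = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]

# Keep only incomes strictly below 5000, then average them.
below_5000 = [x for x in incomes if x < 5000]
mean_below = sum(below_5000) / len(below_5000)

# Variation: the same filter when incomes of 50 and 80 are also present.
with_low = [50, 80] + incomes
below_5000_low = [x for x in with_low if x < 5000]
mean_below_low = sum(below_5000_low) / len(below_5000_low)

print(mean_below, mean_below_low)
```

Two very small incomes pull the mean down from 3400 to 2566.25, which is the same outlier sensitivity seen with large values.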
Q5 What is the mean income if the incomes are increased by 10%?
• Dataset = [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Updated dataset = [2750,3300,3520,3520,4400,4950,5500,5720,6050,6600]
• = 46310/10
• = 4631
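A quick Python check of Q5 also shows a useful shortcut: scaling every value by the same factor scales the mean by that factor, so 4210 × 1.10 gives the same answer. The names below are illustrative, and `round` is used only to remove floating-point noise from the scaled values.

```python
incomes = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]
raise_factor = 1.10  # a 10% increase

# Apply the raise to every income, rounding away floating-point noise.
raised = [round(x * raise_factor) for x in incomes]

mean_before = sum(incomes) / len(incomes)
mean_after = sum(raised) / len(raised)

print(raised)
print(mean_before, mean_after)
```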
Median
Q6 What is the median income of the employees in a dataset?
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Step 1: Arrange them in ascending order
• Step 2: Count the number of items: 10 (even)
• Step 3: Average the two middle values: (4000 + 4500)/2
• 8500/2 = 4250 (median value)
Q7 If an employee earning 7000 is added, how does the median income change?
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Updated dataset = [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000,7000]
• Number of values: 11 (odd)
• Median = 4500 (the 6th value), up from 4250
Q8 Remove the highest income from the dataset. What is the new median income?
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Updated dataset = [2500,3000,3200,3200,4000,4500,5000,5200,5500]
• Number of values: 9 (odd)
• Median = 4000 (the 5th value)
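Q9 Find the median income of employees earning between 3000 and 5500
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Updated dataset (3000 included, 5500 excluded) = [3000,3200,3200,4000,4500,5000,5200]
• Number of values: 7 (odd)
• Middle element / median = 4000
• If 5500 is also included, there are 8 values and the median is (4000 + 4500)/2 = 4250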
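The median questions above can be verified with `statistics.median` from the Python standard library. The variable names are illustrative, and both readings of "between 3000 and 5500" are computed because the boundary handling changes the answer.

```python
from statistics import median

incomes = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]

q6 = median(incomes)                    # 10 values: average of the middle two
q7 = median(incomes + [7000])           # 11 values: the 6th value
q8 = median(sorted(incomes)[:-1])       # drop the highest income, 9 values left

# Q9: the result depends on whether the upper bound is included.
q9_upper_excluded = median([x for x in incomes if 3000 <= x < 5500])
q9_upper_included = median([x for x in incomes if 3000 <= x <= 5500])

print(q6, q7, q8, q9_upper_excluded, q9_upper_included)
```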
Mode
Q11 What is the mode of the dataset?
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Mode = 3200 (appears 2 times)
Q12 If the income 3200 is changed to 3500 for one of the employees, what is the new mode?
• Updated dataset = [2500,3000,3200,3500,4000,4500,5000,5200,5500,6000]
• Mode: no mode (every value appears once)
Q13 If all the incomes are multiplied by 2, does the mode change?
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Updated dataset = [5000,6000,6400,6400,8000,9000,10000,10400,11000,12000]
• Mode = 6400 (2 times): the mode doubles from 3200 to 6400, and its frequency stays the same
Q14 Find the mode of the subset of incomes greater than $3000
• [2500,3000,3200,3200,4000,4500,5000,5200,5500,6000]
• Updated dataset = [3200,3200,4000,4500,5000,5200,5500,6000]
• Mode = 3200
Q15 If an additional employee with an income of 3200 is added, what happens to the mode?
• Updated dataset = [2500,3000,3200,3200,3200,4000,4500,5000,5200,5500,6000]
• Mode = 3200 (frequency 3)
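The mode questions can be checked with `statistics.multimode` (available from Python 3.8). The wrapper `mode_or_none` is a hypothetical helper that follows the convention used in Q12: when every value appears exactly once, the dataset is treated as having no mode.

```python
from statistics import multimode

incomes = [2500, 3000, 3200, 3200, 4000, 4500, 5000, 5200, 5500, 6000]

def mode_or_none(values):
    """Return the list of modes, or None when no value repeats."""
    if len(values) == len(set(values)):   # every value appears once
        return None
    return multimode(values)

q11 = mode_or_none(incomes)

# Q12: change one of the 3200 incomes to 3500.
changed = incomes.copy()
changed[changed.index(3200)] = 3500
q12 = mode_or_none(changed)

q13 = mode_or_none([x * 2 for x in incomes])         # all incomes doubled
q14 = mode_or_none([x for x in incomes if x > 3000])  # incomes above 3000
q15 = mode_or_none(incomes + [3200])                  # one more 3200 added

print(q11, q12, q13, q14, q15)
```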
Household prices
• 30,000, 35,000, 40,000, 5,00,000, 55,000
• 5,00,000 is an outlier
• Mean = 6,60,000/5 = 1,32,000: avoid the mean here because it is sensitive to outliers
• Median = 40,000: use the median when the data has outliers
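The effect of the outlier can be seen directly with the standard-library `statistics` module. The comparison without the outlier is added here for illustration and is not part of the original example.

```python
from statistics import mean, median

# Prices from the example (5,00,000 written as 500_000).
prices = [30_000, 35_000, 40_000, 500_000, 55_000]

mean_price = mean(prices)      # pulled upward by the single large value
median_price = median(prices)  # middle of the sorted prices

# For comparison: the mean once the outlier is dropped.
mean_without_outlier = mean([p for p in prices if p != 500_000])

print(mean_price, median_price, mean_without_outlier)
```

With the outlier removed the mean falls to 40,000, which matches the median of the full dataset. That is why the median is the safer summary here.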
Mean for Grouped Data
• f: frequency of the class
• x: midpoint of the class, (lower + upper)/2
• Steps:
1. Compute midpoints x.
2. Multiply f by x for each class.
3. Add up f⋅x and divide by total f: Mean = Σ(f⋅x) / Σf
Real-World Example:
Problem:
A company collects data on employee salaries (in thousands).
• Mean = Σ(f⋅x) / Σf = 1590/38
• The mean of the grouped distribution is approximately 41.84 (thousand)
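The grouped-mean steps can be sketched in Python. The salary table from the example is not reproduced in these notes, so the class intervals and frequencies below are hypothetical and will not give 1590/38; the function name `grouped_mean` is also an illustrative choice.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(20, 30, 4), (30, 40, 8), (40, 50, 12), (50, 60, 10), (60, 70, 6)]

def grouped_mean(classes):
    """Mean of grouped data: sum of f*x divided by sum of f."""
    total_f = 0
    total_fx = 0
    for lower, upper, f in classes:
        x = (lower + upper) / 2   # Step 1: class midpoint
        total_fx += f * x         # Step 2: multiply f by x
        total_f += f
    if total_f == 0:
        raise ValueError("total frequency is zero")
    return total_fx / total_f     # Step 3: divide by total f

print(grouped_mean(classes))
```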
Median for Grouped Data
Median = L + ((N/2 - CF) / fm) × h
Where:
•L: Lower boundary of the median class
•N: Total frequency
•CF: Cumulative frequency before the median class
•fm: Frequency of the median class
•h: Class width
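Using the symbols defined above, the grouped median is commonly computed as L + ((N/2 − CF) / fm) × h. A minimal Python sketch follows; the class table is hypothetical (the example's own table is not available in these notes) and `grouped_median` is an illustrative name.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(20, 30, 4), (30, 40, 8), (40, 50, 12), (50, 60, 10), (60, 70, 6)]

def grouped_median(classes):
    """Median of grouped data: L + ((N/2 - CF) / fm) * h."""
    total = sum(f for _, _, f in classes)   # N: total frequency
    if total == 0:
        raise ValueError("total frequency is zero")
    half = total / 2
    cumulative = 0                          # CF: cumulative frequency so far
    for lower, upper, fm in classes:
        # The median class is the first one whose cumulative count reaches N/2.
        if fm > 0 and cumulative + fm >= half:
            h = upper - lower               # class width
            return lower + (half - cumulative) / fm * h
        cumulative += fm

print(round(grouped_median(classes), 2))
```

For this table N = 40, the median class is 40 to 50 with CF = 12 and fm = 12, so the median is 40 + (8/12) × 10.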
Real-World Example:
Mode for grouped distribution
Mode = L + ((f1 - f0) / (2⋅f1 - f0 - f2)) × h
Where:
• L: Lower boundary of the modal class
• f1: Frequency of the modal class
• f0: Frequency before the modal class
• f2: Frequency after the modal class
• h: Class width
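With L, f1, f0, f2 and h as defined above, the grouped mode is commonly computed as L + ((f1 − f0) / (2·f1 − f0 − f2)) × h. The sketch below uses a hypothetical class table, not the one behind the 44k result, and treats a missing neighbouring class as frequency 0, which is an assumption of this sketch.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(20, 30, 4), (30, 40, 8), (40, 50, 12), (50, 60, 10), (60, 70, 6)]

def grouped_mode(classes):
    """Mode of grouped data: L + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))            # modal class (first one if tied)
    lower, upper, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0                 # class before, else 0
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0    # class after, else 0
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("formula is undefined when 2*f1 - f0 - f2 is zero")
    h = upper - lower                      # class width
    return lower + (f1 - f0) / denominator * h

print(round(grouped_mode(classes), 2))
```

Here the modal class is 40 to 50 with f1 = 12, f0 = 8 and f2 = 10, so the mode is 40 + (4/6) × 10.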
Overall results:
• When to use each measure:
• Mean: For overall trends.
• Median: When data has outliers.
• Mode: To find the most common category.
• Mean = 41.8k
• Median = 42.6k
• Mode = 44k