DSBDAL_Assignment no 3

The document outlines an experiment focused on basic statistics, specifically measures of central tendencies and variance using an open-source dataset. It includes instructions for calculating summary statistics, such as mean, median, and standard deviation, and emphasizes the importance of understanding data variability. Additionally, it provides a theoretical background on statistical concepts and practical tasks involving the Iris dataset.

Uploaded by

darshanpatil200219

TE (Comp)

Experiment:3
Title :
Basic Statistics - Measures of Central Tendencies and Variance
Perform the following operations on any open source dataset (e.g. data.csv)
1. Provide summary statistics (mean, median, minimum, maximum, standard deviation)
for a dataset (age, income etc.) with numeric variables grouped by one of the qualitative
(categorical) variables. For example, if your categorical variable is age groups and
the quantitative variable is income, then provide summary statistics of income grouped by
the age groups. Create a list that contains a numeric value for each response to the
categorical variable.
2. Write a Python program to display some basic statistical details like percentile, mean,
standard deviation etc. of the species ‘Iris-setosa’, ‘Iris-versicolor’ and
‘Iris-virginica’ of the iris.csv dataset.
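
As a rough illustration of task 1, the sketch below builds one list of numeric values per category and then summarizes each list. It uses only the Python standard library. The function name grouped_summary, the field names age_group and income, and the records themselves are made up for illustration and are not part of the assignment.

```python
from collections import defaultdict
from statistics import mean, median, stdev


def grouped_summary(records, group_key, value_key):
    """Summarize a numeric field for each value of a categorical field."""
    # Build one list of numeric values per category.
    groups = defaultdict(list)
    for row in records:
        groups[row[group_key]].append(row[value_key])

    # Compute the summary statistics for each list.
    return {
        name: {
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
            # The sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else None,
        }
        for name, values in groups.items()
    }


# Hypothetical records, invented purely for this example.
records = [
    {"age_group": "18-30", "income": 30000},
    {"age_group": "18-30", "income": 40000},
    {"age_group": "18-30", "income": 50000},
    {"age_group": "31-50", "income": 60000},
    {"age_group": "31-50", "income": 80000},
]

summary = grouped_summary(records, "age_group", "income")
for group, stats in summary.items():
    print(group, stats)
```

With a real dataset, the records would come from the CSV file instead of the inline list.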

Prerequisites:
Fundamentals of the R programming language OR Python

Objectives :
To learn how to display summary statistics for each feature available in
the dataset

Theory:
How to Find the Mean, Median, Mode, Range, and Standard Deviation
Simplify comparisons of sets of numbers, especially large sets of numbers, by calculating the
center values using mean, mode and median. Use the ranges and standard deviations of the
sets to examine the variability of data.

Calculating Mean
The mean identifies the average value of the set of numbers. For example, consider the
data set containing the values 20, 24, 25, 36, 25, 22, 23.

Formula

To find the mean, use the formula: Mean equals the sum of the numbers in the data set
divided by the number of values in the data set. In mathematical terms: Mean=(sum of all
terms)÷(how many terms or values in the set).
Adding Data Set
Add the numbers in the example data set: 20+24+25+36+25+22+23=175.

Finding Divisor
Divide by the number of data points in the set. This set has seven values so divide by 7.

Finding Mean
Insert the values into the formula to calculate the mean. The mean equals the sum of the
values (175) divided by the number of data points (7). Since 175÷7=25, the mean of this
data set equals 25. Not all mean values will equal a whole number.
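
The mean calculation above can be checked with a few lines of Python. This is a minimal sketch using the standard library statistics module. The median and mode, which the theory heading mentions but does not work through, are included for comparison.

```python
from statistics import mean, median, mode

data = [20, 24, 25, 36, 25, 22, 23]

total = sum(data)            # sum of all values in the set
count = len(data)            # number of values in the set
manual_mean = total / count  # mean = sum / count

print("sum:", total)
print("count:", count)
print("mean (manual):", manual_mean)
print("mean (statistics):", mean(data))
print("median:", median(data))  # middle value of the sorted data
print("mode:", mode(data))      # most frequent value
```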

Calculating Range
Range shows the mathematical distance between the lowest and highest values in the data
set. Range measures the variability of the data set. A wide range indicates greater variability
in the data, or perhaps a single outlier far from the rest of the data. Outliers may skew, or
shift, the mean value enough to impact data analysis.

Identifying Low and High Values

In the sample group, the lowest value is 20 and the highest value is 36.

Calculating Range
To calculate range, subtract the lowest value from the highest value. Since 36-20=16, the
range equals 16.
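
The same range calculation, sketched in Python with the built-in min and max functions:

```python
data = [20, 24, 25, 36, 25, 22, 23]

lowest = min(data)             # lowest value in the set
highest = max(data)            # highest value in the set
data_range = highest - lowest  # range = highest - lowest

print("lowest:", lowest, "highest:", highest, "range:", data_range)
```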

Calculating Standard Deviation

Standard deviation measures the variability of the data set. Like range, a smaller standard
deviation indicates less variability.

Formula
Finding standard deviation requires squaring the difference between each data point
and the mean, adding all the squares [∑(x-µ)²], dividing that sum by one less than the number
of values (N-1), and finally calculating the square root of the quotient:
s = √( ∑(x-µ)² / (N-1) ).
Mathematically, start with calculating the mean.

Calculating the Mean

Calculate the mean by adding all the data point values, then dividing by the number of data
points. In the sample data set, 20+24+25+36+25+22+23=175. Divide the sum, 175, by the
number of data points, 7, or 175÷7=25. The mean equals 25.

Squaring the Difference

Next, subtract the mean from each data point, then square each difference. The formula looks
like this: ∑(x-µ)², where ∑ means sum, x represents each data set value and µ represents the
mean value. Continuing with the example set, the values become: 20-25=-5 and (-5)²=25;
24-25=-1 and (-1)²=1; 25-25=0 and 0²=0; 36-25=11 and 11²=121; 25-25=0 and 0²=0;
22-25=-3 and (-3)²=9; and 23-25=-2 and (-2)²=4.
Adding the Squared Differences
Adding the squared differences yields: 25+1+0+121+0+9+4=160.
Division by N-1

Divide the sum of the squared differences by one less than the number of data points. The
example data set has 7 values, so N-1 equals 7-1=6. The sum of the squared differences,
160, divided by 6 equals approximately 26.6667.
Standard Deviation

Calculate the standard deviation by finding the square root of the result of the division
by N-1. In the example, the square root of 26.6667 equals approximately 5.164. Therefore,
the standard deviation equals approximately 5.164.
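
The steps of the standard deviation calculation can be followed in Python as a minimal sketch. The variable names are illustrative. The last line compares the manual result with statistics.stdev, which also divides by N-1.

```python
import math
from statistics import stdev

data = [20, 24, 25, 36, 25, 22, 23]

# Step 1: calculate the mean.
mu = sum(data) / len(data)

# Step 2: square the difference between each value and the mean.
squared_diffs = [(x - mu) ** 2 for x in data]

# Step 3: add the squared differences.
sum_sq = sum(squared_diffs)

# Step 4: divide by one less than the number of values (N-1).
sample_variance = sum_sq / (len(data) - 1)

# Step 5: take the square root.
sd = math.sqrt(sample_variance)

print("sum of squared differences:", sum_sq)
print("sample variance:", round(sample_variance, 4))
print("standard deviation:", round(sd, 3))
print("matches statistics.stdev:", math.isclose(sd, stdev(data)))
```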
Evaluating Standard Deviation

Standard deviation helps evaluate data. Values that fall within one standard deviation
of the mean are typical of the data set. Values that fall more than two standard
deviations from the mean are often treated as extreme values or outliers. In the example
set, two standard deviations equal approximately 10.33, and the value 36 lies 11 above
the mean, so 36 is an outlier. Outliers may represent erroneous data or may suggest
unforeseen circumstances and should be carefully considered when interpreting data.
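
A minimal sketch of the two-standard-deviation rule described above. The function name find_outliers and its parameter k are illustrative choices, and the threshold of two standard deviations is a rule of thumb, not a fixed definition.

```python
from statistics import mean, stdev


def find_outliers(data, k=2.0):
    """Return the values lying more than k sample standard deviations from the mean."""
    mu = mean(data)
    sd = stdev(data)
    return [x for x in data if abs(x - mu) > k * sd]


data = [20, 24, 25, 36, 25, 22, 23]
flagged = find_outliers(data)
print("outliers:", flagged)
```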

Input :
Structured Dataset : Iris Dataset
File: iris.csv

Output :
1. Display Dataset Details.
2. Calculate the min, max, mean, variance and percentile values, and display
specific quantiles.
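
One possible way to produce this output with pandas is sketched below. The column names species and petal_length are assumptions about how iris.csv is laid out, and a small inline sample stands in for the file so that the sketch runs on its own. The helper name describe_by_species is illustrative.

```python
import pandas as pd


def describe_by_species(df, species_col="species"):
    """Return count, mean, std, min, percentiles and max per species."""
    return df.groupby(species_col).describe(percentiles=[0.25, 0.5, 0.75])


# With the real file this would be: df = pd.read_csv("iris.csv")
# The inline sample below is illustrative; column names are assumed.
sample = pd.DataFrame({
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "species": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ],
})

# 1. Dataset details.
print(sample.shape)
print(sample.dtypes)

# 2. Min, max, mean, std and percentiles for each species.
summary = describe_by_species(sample)
print(summary)

# Variance per species, and one specific quantile of a column.
petal_var = sample.groupby("species")["petal_length"].var()
q90 = sample["petal_length"].quantile(0.9)
print(petal_var)
print("90th percentile of petal_length:", q90)
```

The describe call reports the sample standard deviation (N-1 in the denominator), the same convention used in the theory section.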
Conclusion:
Hence, we have studied how to load a dataset into a dataframe, compare distributions and
identify outliers.

Questions:
1. What is Data visualization?
2. How do you calculate the min, max, range and standard deviation?
3. What is a dataset?
