dataanexp-2

The document outlines the principles and stages of descriptive statistics using Python, emphasizing its importance in analyzing historical data to identify business performance and inefficiencies. It details various types of descriptive analysis, including measurements of frequency, central tendency, and dispersion, along with steps for conducting such analyses. Additionally, it provides definitions and calculations for mean, median, mode, and measures of dispersion, highlighting their significance in data interpretation.

Uploaded by

aniketagrawal810
EXPERIMENT - 2

Aim: - To understand the basics of descriptive statistics using Python.


Theory: -
1. Descriptive Analysis
Descriptive analysis refers to the interpretation of historical data to better understand changes
that occur in a business. It describes the use of a range of historic data to draw comparisons
with other reporting periods for the same company (i.e. quarterly or annually) or with others
within the same industry. Most commonly reported financial metrics are a product of
descriptive analytics, such as year-over-year (YOY) pricing changes, month-over-month sales
growth, the number of users, or the total revenue per subscriber. These measures all describe
what has occurred in a business during a set period.
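As a small illustration of one such metric, the sketch below computes month-over-month sales growth. The sales figures and the helper name pct_change are hypothetical and chosen only for this example.

```python
# Hypothetical monthly sales figures (illustrative values, not from the source).
monthly_sales = {"Jan": 120_000, "Feb": 132_000, "Mar": 125_400}


def pct_change(previous, current):
    """Percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100


months = list(monthly_sales)  # dicts keep insertion order, so this is Jan, Feb, Mar

# Growth of each month relative to the month before it, rounded to 2 decimals.
mom_growth = {
    months[i]: round(pct_change(monthly_sales[months[i - 1]], monthly_sales[months[i]]), 2)
    for i in range(1, len(months))
}

for month, growth in mom_growth.items():
    print(f"{month}: {growth:+.2f}% vs previous month")
```

The same helper could be applied to yearly totals to obtain a year-over-year figure.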
Descriptive analysis is a very important tool that can be used in different parts of any business.
That's because it allows a company to understand how well it is performing and where there
may be inefficiencies. As such, corporate management can identify areas for improvement and
use the findings to motivate different teams to implement changes for continued success.
Descriptive analysis is the technique of identifying patterns and links by utilizing recent and
historical data. Because it identifies patterns and associations without going any further, it is
frequently referred to as the most basic form of data analysis.

Stages in Descriptive Analysis


There are a few stages that companies follow in order to successfully implement descriptive
analysis into their business strategy. The following list highlights these stages along with a
description of each.

1. Identifying which metrics to analyse - Before beginning, it's important to decide which
metrics companies want to produce and the time frame for each, such as quarterly
revenue or annual operating profit.
2. Identifying and locating the data - This step requires locating all of the data required to
produce the result. This means going through all internal and external sources,
including databases.
3. Compiling the data - Once all the data is identified and located, the next step is to
prepare and compile it together. Part of the process here is to ensure that it's accurate
and to format everything into a single format.
4. Data analysis - Analysing datasets and figures means using different tools.

Once all these steps are completed, it's important to present the data to the appropriate
stakeholders. Using appropriate visual aids, such as charts, graphics, videos, and other tools,
can be a great way to provide analysts, investors, management, and others with the insight they
need about the direction of the company.

2. Types of Descriptive Analysis
I. Measurements of Frequency:
Understanding how often a specific event or reaction occurs is essential for descriptive
analysis, providing quantitative insights through counts or percentages to reveal
patterns within the dataset.

II. Measures of Central Tendency:


In descriptive analysis, determining the central tendency is crucial, employing mean,
median, and mode to quantify the typical value and gain insights into the overall trend
or behaviour of observed variables.

III. Measures of Dispersion:


In certain scenarios, understanding how data is spread across a range is vital; measures
like range or standard deviation in descriptive analysis offer valuable information about
distribution patterns and variability within the dataset.

IV. Measures of Position:


An integral part of descriptive analysis involves determining a value's position relative
to others, employing metrics like quartiles and percentiles to offer nuanced insights into
the dataset's structure and identify trends or outliers.
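Two of the measurement types above, frequency and position, can be sketched with Python's standard library. The survey responses and scores are hypothetical sample data. Quartile conventions differ between tools; the "inclusive" method used here is only one of them.

```python
from collections import Counter
import statistics

# Measurements of frequency: counts and percentages per category.
responses = ["yes", "no", "yes", "yes", "maybe", "no"]  # hypothetical survey answers
counts = Counter(responses)
total = len(responses)
percentages = {answer: round(100 * n / total, 1) for answer, n in counts.items()}
print(counts.most_common())
print(percentages)

# Measures of position: quartiles split the sorted data into four parts.
scores = [12, 15, 17, 20, 22, 25, 28, 30, 35]  # hypothetical numeric data
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")
print(f"Q1={q1}, Q2 (median)={q2}, Q3={q3}")
```

Measures of central tendency and dispersion follow the same pattern using other functions from the statistics module.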

3. Steps to conduct Descriptive Analysis
Descriptive analysis is an important phase in data exploration that involves summarizing and
describing the primary properties of a dataset. It provides vital insights into the data’s frequency
distribution, central tendency, dispersion, and position. It assists researchers and
analysts in better understanding their data.
Conducting a descriptive analysis entails several critical phases, which include:
a) Data Collection
Before conducting any analysis, you must first collect relevant data. This process involves
identifying data sources, selecting appropriate data-collecting methods, and verifying that the
data acquired accurately represents the population or topic of interest. We can collect data
through surveys, experiments, observations, existing databases, or other methods.
b) Data Preparation
Data preparation is crucial for ensuring the dataset is clean, consistent, and ready for analysis.
This step covers the following tasks:
a) Data Cleaning: Handle missing values, outliers, and errors in the dataset. Impute
missing values or apply appropriate statistical techniques for dealing with them.
b) Data Transformation: Convert data into an appropriate format. Examples of this are
changing data types, encoding categorical variables, or scaling numerical variables.
c) Data Reduction: For large datasets, try reducing their size by sampling or aggregation
to make the analysis more manageable.
c) Apply Methods
In this step, you will analyse and describe the data using a variety of methodologies and
procedures. The following are some common descriptive analysis methods:
i. Frequency Distribution Analysis: Create frequency tables or bar charts to show the
number or proportion of occurrences for each category for categorical variables.
ii. Measures of Central Tendency: Calculate numerical variables’ mean, median, and
mode to determine the centre or usual value.
iii. Measures of Dispersion: Calculate the range, variance, and standard deviation to
examine the dispersion or variability of the data.
iv. Measures of Position: Identify the position of a single value relative to the others.
d) Summary Statistics and Visualization
Descriptive statistics refers to a set of methods for summarizing and describing the main
characteristics of a dataset. Summarize the data through statistics and visualization. This step
involves the following tasks:
i. Summary Statistics: Summarize your findings clearly and concisely.
ii. Data Visualization: Use various charts and plots to visualize the data. Create
histograms, box plots, scatter plots, or line charts for numerical data. Use bar charts, pie
charts, or stacked bar charts for categorical data.
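The data preparation tasks in step b) can be sketched as follows. This is a minimal illustration under stated assumptions: the records are hypothetical, missing values are marked with None, median imputation is just one possible cleaning strategy, and min-max scaling is just one possible transformation.

```python
import statistics

# Hypothetical raw records; None marks a missing value.
raw = [
    {"region": "north", "revenue": 200.0},
    {"region": "south", "revenue": None},
    {"region": "north", "revenue": 260.0},
    {"region": "east", "revenue": 180.0},
]

# Data cleaning: impute missing revenue with the median of the observed values.
observed = [r["revenue"] for r in raw if r["revenue"] is not None]
fill_value = statistics.median(observed)
cleaned = [
    {**r, "revenue": fill_value if r["revenue"] is None else r["revenue"]}
    for r in raw
]

# Data transformation: encode the categorical column as integer codes ...
codes = {name: i for i, name in enumerate(sorted({r["region"] for r in cleaned}))}

# ... and min-max scale the numerical column into the range [0, 1].
revenues = [r["revenue"] for r in cleaned]
lo, hi = min(revenues), max(revenues)
prepared = [
    {"region_code": codes[r["region"]], "revenue_scaled": (r["revenue"] - lo) / (hi - lo)}
    for r in cleaned
]

for row in prepared:
    print(row)
```

The scaling step assumes the column is not constant; if every value were equal, hi - lo would be zero and the division would fail.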

4. Central Tendency
Central tendencies in statistics are the numerical values used to represent the middle
or central value of a large collection of numerical data. These numerical values are called
central or average values in statistics.
Measures of Central Tendency: -
Mean:
In general usage, "mean" refers to the arithmetic mean of the data. The geometric mean and
the harmonic mean are other types of mean, each calculated using a different formula.
i. The most common measure of central tendency is the mean.
ii. The mean is also known as the simple average.
iii. It is denoted by the Greek letter μ for a population and by x̄ for a sample.
iv. To find the mean, add all the elements in the dataset and then divide by the number
of elements in the dataset.
v. Despite being so widely used, it has a drawback.
vi. The mean is affected by the presence of outliers.
vii. So, the mean alone is not enough for making business decisions.

Types of Mean:
The mean can be classified into three types:
• Arithmetic Mean
• Geometric Mean
• Harmonic Mean
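The three types of mean can be compared with Python's statistics module. The sample values below are hypothetical. For positive data the harmonic mean never exceeds the geometric mean, which never exceeds the arithmetic mean.

```python
import statistics

values = [2, 4, 8]  # hypothetical positive sample data

# Arithmetic mean: sum of the values divided by their count.
arithmetic = statistics.mean(values)

# Geometric mean: n-th root of the product of the n values.
geometric = statistics.geometric_mean(values)

# Harmonic mean: n divided by the sum of the reciprocals of the values.
harmonic = statistics.harmonic_mean(values)

print(f"arithmetic={arithmetic:.4f}, geometric={geometric:.4f}, harmonic={harmonic:.4f}")
```

The geometric and harmonic means are typically used for growth rates and for rates or ratios respectively, while the arithmetic mean is the default "average".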

Median
The median of any distribution is the value that divides the distribution into two equal parts,
such that the number of observations above it is equal to the number of observations below it.
Thus, the median is called the central value of any given data, either grouped or ungrouped.
• The median is the number that divides the dataset into two equal halves.
• To calculate the median, arrange the dataset of n numbers in ascending order. If n is odd,
the median is the middle value; if n is even, it is the average of the two middle values.
• The median is robust to outliers.
• So, for skewed distributions or when there is concern about outliers, the median may be
preferred.
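The robustness of the median can be seen by adding a single extreme value to a small dataset. The salary figures below are hypothetical and serve only to show how the mean and the median react.

```python
import statistics

salaries = [30, 32, 35, 38, 40]   # hypothetical salaries (in thousands)
with_outlier = salaries + [400]   # the same data plus one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)      # odd n: the middle value

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)   # even n: average of the two middle values

print(f"without outlier: mean={mean_before}, median={median_before}")
print(f"with outlier:    mean={mean_after:.2f}, median={median_after}")
```

The outlier pulls the mean far away from the bulk of the data, while the median moves only slightly.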

Mode
The mode is the value of the observation that has the maximum frequency. In other words,
it is the observation that occurs the maximum number of times in a dataset.
i. The mode of a dataset is the value that occurs most often in the dataset.
ii. The mode is the value that has the highest frequency of occurrence in the dataset.
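A short sketch of finding the mode with the statistics module, using hypothetical data. A dataset can have more than one mode, in which case multimode returns all of them.

```python
import statistics

# The mode also works for categorical (non-numeric) data.
sizes = ["M", "L", "M", "S", "M", "L"]  # hypothetical shirt sizes sold
most_common_size = statistics.mode(sizes)
print(most_common_size)

# When several values share the highest frequency, multimode lists each of them
# in the order they first appear in the data.
ratings = [1, 2, 2, 3, 3]  # hypothetical ratings with two modes
modes = statistics.multimode(ratings)
print(modes)
```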

5. Dispersion in Statistics
Dispersion is the state of getting dispersed or spread. Statistical dispersion means the extent to
which numerical data is likely to vary about an average value. In other words, dispersion helps
to understand the distribution of the data.

Measures of Dispersion
In statistics, the measures of dispersion help to interpret the variability of data, i.e. to know how
homogeneous or heterogeneous the data is. In simple terms, they show how squeezed or
scattered the variable is.

Types of Measures of Dispersion


There are two main types of dispersion methods in statistics, which are:

• Absolute Measure of Dispersion
• Relative Measure of Dispersion

Absolute Measure of Dispersion


An absolute measure of dispersion has the same unit as the original data set. The absolute
dispersion method expresses the variation in terms of the average of the deviations of observations,
such as the standard deviation or the mean deviation. It includes range, standard deviation,
quartile deviation, etc.

The types of absolute measures of dispersion are:

i. Range: It is simply the difference between the maximum value and the minimum value
given in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6
ii. Variance: Subtract the mean from each value in the set, square each difference, add the
squares, and finally divide the sum by the total number of values in the data set to get the variance.
Variance (σ²) = ∑(X − μ)² ⁄ N
iii. Standard Deviation: The square root of the variance is known as the standard
deviation, i.e. S.D. (σ) = √(σ²).
iv. Quartiles and Quartile Deviation: The quartiles are values that divide a list of
numbers into quarters. The quartile deviation is half of the distance between the third
and the first quartile.
v. Mean and Mean Deviation: The average of numbers is known as the mean and the
arithmetic mean of the absolute deviations of the observations from a measure of central
tendency is known as the mean deviation (also called mean absolute deviation).
vi. Co-efficient of Dispersion: The coefficients of dispersion are calculated (along with
the measure of dispersion) when two series are compared, that differ widely in their
averages. The dispersion coefficient is also used when two series with different
measurement units are compared. It is denoted as C.D.
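The absolute measures listed above can be computed for the series 1, 3, 5, 6, 7 from the range example. This is a minimal sketch using population formulas (division by N, as in the variance formula). The quartiles use the "inclusive" method of statistics.quantiles, and other quartile conventions can give slightly different values.

```python
import statistics

data = [1, 3, 5, 6, 7]  # the series from the range example

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Population variance and standard deviation (divide by N).
mu = statistics.mean(data)
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Quartile deviation: half the distance between the third and first quartile.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mu) for x in data) / len(data)

print(f"range={data_range}, variance={variance:.2f}, std_dev={std_dev:.4f}")
print(f"quartile_deviation={quartile_deviation}, mean_deviation={mean_deviation:.2f}")
```

For a sample rather than a whole population, statistics.variance and statistics.stdev divide by N − 1 instead.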

Relative Measure of Dispersion
The relative measures of dispersion are used to compare the distribution of two or more data
sets. This measure compares values without units. Common relative dispersion methods
include:

i. Co-efficient of Range
ii. Co-efficient of Variation
iii. Co-efficient of Standard Deviation
iv. Co-efficient of Quartile Deviation
v. Co-efficient of Mean Deviation
The common coefficients of dispersion are:

C.D. in terms of             Coefficient of dispersion
Range                        C.D. = (Xmax – Xmin) ⁄ (Xmax + Xmin)
Quartile Deviation           C.D. = (Q3 – Q1) ⁄ (Q3 + Q1)
Standard Deviation (S.D.)    C.D. = S.D. ⁄ Mean
Mean Deviation               C.D. = Mean deviation ⁄ Average
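The coefficients in the table can be sketched as a single helper. The function name dispersion_coefficients and the two sample series are hypothetical. The sketch assumes a population standard deviation, "inclusive" quartiles, and a positive mean.

```python
import statistics


def dispersion_coefficients(data):
    """Return the four unit-free coefficients of dispersion for `data`."""
    mean = statistics.mean(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean_dev = sum(abs(x - mean) for x in data) / len(data)
    return {
        "range": (max(data) - min(data)) / (max(data) + min(data)),
        "quartile_deviation": (q3 - q1) / (q3 + q1),
        "standard_deviation": statistics.pstdev(data) / mean,
        "mean_deviation": mean_dev / mean,
    }


# Two hypothetical series in different units can be compared directly,
# because every coefficient is a ratio without units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

for name, series in [("heights", heights_cm), ("weights", weights_kg)]:
    coeffs = dispersion_coefficients(series)
    print(name, {k: round(v, 4) for k, v in coeffs.items()})
```

Although both series have the same absolute range of 40, the weights show a larger relative spread because their values are smaller.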

6. Code Snippet:

Output:
