dataanexp-2
1. Identifying which metrics to analyse - Before beginning, it's important to decide which
metrics companies want to produce and the time frame for each, such as quarterly
revenue or annual operating profit.
2. Identifying and locating the data - This step requires locating all of the data required to
produce the result. This means going through all internal and external sources,
including databases.
3. Compiling the data - Once all the data is identified and located, the next step is to
prepare and compile it together. Part of the process here is to ensure that it's accurate
and to format everything into a single format.
4. Data analysis - Analysing datasets and figures means using different tools
Once all these steps are completed, it's important to present all the data to the appropriate
stakeholders. Using appropriate visual aids, such as charts, graphics, videos, and other tools,
can be a great way to provide analysts, investors, management, and others with the insight they
need about the direction of the company.
2. Types of Descriptive Analysis
I. Measurements of Frequency:
Understanding how often a specific event or reaction occurs is essential for descriptive
analysis, providing quantitative insights through counts or percentages to reveal
patterns within the dataset.
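As a minimal sketch of a frequency measurement, the counts and percentages for a categorical variable can be computed with Python's standard library. The `responses` list is hypothetical data made up for illustration; it does not come from the source.

```python
from collections import Counter

# Hypothetical survey responses (illustrative data, not from the source)
responses = ["yes", "no", "yes", "maybe", "yes", "no"]

counts = Counter(responses)       # absolute frequency of each category
total = sum(counts.values())      # total number of observations
percentages = {k: 100 * v / total for k, v in counts.items()}

# Print categories from most to least frequent
for category, count in counts.most_common():
    print(f"{category}: {count} ({percentages[category]:.1f}%)")
```

The counts show how often each response occurs, while the percentages make datasets of different sizes comparable.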
3. Steps to conduct Descriptive Analysis
Descriptive analysis is an important phase in data exploration that involves summarizing and
describing the primary properties of a dataset. It provides vital insights into the data’s frequency
distribution, central tendency, dispersion, and position. It assists researchers and
analysts in better understanding their data.
Conducting a descriptive analysis entails several critical phases, which include:
a) Data Collection
Before conducting any analysis, you must first collect relevant data. This process involves
identifying data sources, selecting appropriate data-collecting methods, and verifying that the
data acquired accurately represents the population or topic of interest. We can collect data
through surveys, experiments, observations, existing databases, or other methods.
b) Data Preparation
Data preparation is crucial for ensuring the dataset is clean, consistent, and ready for analysis.
This step covers the following tasks:
a) Data Cleaning: Handle missing values, outliers, and errors in the dataset. Impute
missing values or apply appropriate statistical techniques for dealing with them.
b) Data Transformation: Convert data into an appropriate format. Examples of this are
changing data types, encoding categorical variables, or scaling numerical variables.
c) Data Reduction: For large datasets, try reducing their size by sampling or aggregation
to make the analysis more manageable.
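The three preparation tasks can be sketched on a tiny in-memory dataset. The records, the field names `region` and `sales`, and the choice of median imputation are all assumptions made for illustration. The type conversion is done first here so that the median can be computed on numbers.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value
raw = [
    {"region": "north", "sales": "120"},
    {"region": "south", "sales": None},
    {"region": "north", "sales": "95"},
    {"region": "east", "sales": "110"},
]

# Data transformation: convert the sales strings to numbers
for row in raw:
    row["sales"] = float(row["sales"]) if row["sales"] is not None else None

# Data cleaning: impute missing sales with the median of the observed values
observed = [row["sales"] for row in raw if row["sales"] is not None]
fill = median(observed)
for row in raw:
    if row["sales"] is None:
        row["sales"] = fill

# Data reduction: aggregate to one total per region
totals = {}
for row in raw:
    totals[row["region"]] = totals.get(row["region"], 0.0) + row["sales"]

print(totals)
```

Median imputation is only one possible technique; the right choice depends on why the values are missing.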
c) Apply Methods
In this step, you will analyse and describe the data using a variety of methodologies and
procedures. The following are some common descriptive analysis methods:
i. Frequency Distribution Analysis: Create frequency tables or bar charts to show the
number or proportion of occurrences for each category for categorical variables.
ii. Measures of Central Tendency: Calculate numerical variables’ mean, median, and
mode to determine the centre or usual value.
iii. Measures of Dispersion: Calculate the range, variance, and standard deviation to
examine the dispersion or variability of the data.
iv. Measures of Position: Identify the position of a single value in relation to the others, for example through percentiles or quartiles.
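Measures of position are the least familiar of the four methods, so here is a small sketch. The `scores` list and the helper `percentile_rank` are hypothetical, and the "inclusive" quartile method is one of several conventions in use.

```python
from statistics import quantiles

# Hypothetical test scores (illustrative data)
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95]

def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

# Quartiles split the sorted data into four parts
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

print("Q1, Q2, Q3:", q1, q2, q3)
print("Percentile rank of 80:", round(percentile_rank(scores, 80), 1))
```

A percentile rank states where one value stands relative to the rest, while quartiles mark fixed positions in the distribution.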
d) Summary Statistics and Visualization
Descriptive statistics refers to a set of methods for summarizing and describing the main
characteristics of a dataset. Summarize the data through statistics and visualization. This step
involves the following tasks:
i. Summary Statistics: Summarize your findings clearly and concisely.
ii. Data Visualization: Use various charts and plots to visualize the data. Create
histograms, box plots, scatter plots, or line charts for numerical data. Use bar charts, pie
charts, or stacked bar charts for categorical data.
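As a rough illustration of visualizing categorical data without any plotting library, a text-based bar chart can be built from frequency counts. The `grades` list is hypothetical; in practice a plotting library would normally be used instead.

```python
from collections import Counter

# Hypothetical categorical data (illustrative only)
grades = ["A", "B", "B", "C", "A", "B"]

counts = Counter(grades)

# One line per category: label, a bar of '#' characters, and the count
lines = [f"{g} | {'#' * counts[g]} ({counts[g]})" for g in sorted(counts)]
print("\n".join(lines))
```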
4. Central Tendency
Measures of central tendency are numerical values used to represent the middle or central value
of a large collection of numerical data. These values are called central or average values in
statistics.
Measures of Central Tendency:
Mean:
In general usage, the mean refers to the arithmetic mean of the data, but there are also the
geometric mean and the harmonic mean, which are calculated using different formulas.
i. The mean is the most common measure of central tendency and is also known as the
simple average.
ii. It is denoted by the Greek letter μ for a population and by x̄ for a sample.
iii. To find the mean, add all the elements in a dataset and then divide the sum by the
number of elements in the dataset.
iv. Its main drawback is that it is affected by the presence of outliers.
v. So, the mean alone is not enough for making business decisions.
Types of Mean:
The mean can be classified into three types:
Arithmetic Mean
Geometric Mean
Harmonic Mean
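The three means can be compared on a small sample using Python's `statistics` module. The numbers below are made up for illustration, and the last lines show the outlier sensitivity of the arithmetic mean.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Illustrative values (not from the source)
values = [2, 4, 8]

arithmetic = mean(values)            # sum of values / number of values
geometric = geometric_mean(values)   # n-th root of the product of the values
harmonic = harmonic_mean(values)     # n / sum of reciprocals of the values

print(f"Arithmetic: {arithmetic:.4f}")
print(f"Geometric:  {geometric:.4f}")
print(f"Harmonic:   {harmonic:.4f}")

# A single outlier pulls the arithmetic mean far away from the typical value
with_outlier = values + [1000]
print("Mean with outlier:", mean(with_outlier))
```

For positive data the harmonic mean is never larger than the geometric mean, which is never larger than the arithmetic mean.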
Median
The median of a distribution is the value that divides the distribution into two equal parts
such that the number of observations above it is equal to the number of observations below it.
Thus, the median is called the central value of any given data, either grouped or ungrouped.
To calculate the median, arrange the dataset of n numbers in ascending order. If n is odd,
the median is the middle value; if n is even, it is the average of the two middle values.
The median is robust to outliers, so for a skewed distribution or when there is concern
about outliers, the median may be preferred.
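A short sketch shows why the median is considered robust. The salary figures are hypothetical, chosen so that one extreme value is added to an otherwise tight sample.

```python
from statistics import mean, median

# Hypothetical salaries in thousands (illustrative data)
salaries = [30, 32, 35, 38, 40]
with_outlier = salaries + [400]   # add one extreme value

# Odd number of values: the median is the middle value
print("Median:", median(salaries), "Mean:", mean(salaries))

# Even number of values: the median is the average of the two middle values
print("Median with outlier:", median(with_outlier))
print("Mean with outlier:", round(mean(with_outlier), 2))
```

The outlier shifts the median only slightly, while the mean moves far away from the typical salary.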
Mode
The mode is the value of the observation that has the maximum frequency. In other words,
it is the observation that occurs the maximum number of times in a dataset.
i. Mode of a dataset is the value that occurs most often in the dataset.
ii. Mode is the value that has the highest frequency of occurrence in the dataset.
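The mode can be found for numeric or categorical data. In this sketch the data are made up, and `multimode` is used for the case where several values share the highest frequency.

```python
from statistics import mode, multimode

# Hypothetical categorical data: shirt sizes sold
sizes = ["M", "L", "M", "S", "L", "M"]
most_common_size = mode(sizes)     # the single most frequent value

# A dataset can have more than one mode
ratings = [1, 1, 2, 2, 3]
all_modes = multimode(ratings)     # every value tied for the highest frequency

print("Mode of sizes:", most_common_size)
print("Modes of ratings:", all_modes)
```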
5. Dispersion in Statistics
Dispersion is the state of getting dispersed or spread. Statistical dispersion means the extent to
which numerical data is likely to vary about an average value. In other words, dispersion helps
to understand the distribution of the data.
Measures of Dispersion
In statistics, the measures of dispersion help to interpret the variability of data, i.e. to know how
homogeneous or heterogeneous the data is. In simple terms, they show how squeezed or
scattered the variable is.
i. Range: It is simply the difference between the maximum value and the minimum value
given in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 − 1 = 6
ii. Variance: Subtract the mean from each value in the set, square each difference, add the
squares, and finally divide the sum by the total number of values in the data set to get
the (population) variance.
Variance (σ²) = ∑(X − μ)² / N
iii. Standard Deviation: The square root of the variance is known as the standard
deviation, i.e. S.D. (σ) = √(σ²).
iv. Quartiles and Quartile Deviation: The quartiles are values that divide a list of
numbers into quarters. The quartile deviation is half of the distance between the third
and the first quartile.
v. Mean and Mean Deviation: The average of numbers is known as the mean and the
arithmetic mean of the absolute deviations of the observations from a measure of central
tendency is known as the mean deviation (also called mean absolute deviation).
vi. Co-efficient of Dispersion: The coefficients of dispersion are calculated (along with
the measure of dispersion) when two series are compared, that differ widely in their
averages. The dispersion coefficient is also used when two series with different
measurement units are compared. It is denoted as C.D.
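The measures above can be checked on the data set from the range example (1, 3, 5, 6, 7). This is a minimal sketch using population formulas; the "inclusive" quartile method and taking the mean deviation about the mean are assumptions, since other conventions exist.

```python
from statistics import mean, pstdev, pvariance, quantiles

data = [1, 3, 5, 6, 7]   # data set from the range example

data_range = max(data) - min(data)     # maximum minus minimum
variance = pvariance(data)             # sum((X - mu)^2) / N
std_dev = pstdev(data)                 # square root of the variance

# Quartile deviation: half the distance between Q3 and Q1
q1, _, q3 = quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

# Mean deviation: average absolute deviation from the mean
mu = mean(data)
mean_deviation = sum(abs(x - mu) for x in data) / len(data)

print("Range:", data_range)
print("Variance:", round(variance, 2))
print("Standard deviation:", round(std_dev, 3))
print("Quartile deviation:", quartile_deviation)
print("Mean deviation:", round(mean_deviation, 2))
```

For a sample rather than a whole population, `statistics.variance` and `statistics.stdev` divide by N − 1 instead of N.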
Relative Measure of Dispersion
The relative measures of dispersion are used to compare the distribution of two or more data
sets. This measure compares values without units. Common relative dispersion methods
include:
i. Co-efficient of Range
ii. Co-efficient of Variation
iii. Co-efficient of Standard Deviation
iv. Co-efficient of Quartile Deviation
v. Co-efficient of Mean Deviation
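The coefficients listed above can be sketched in code. The formulas used here are the common textbook definitions and are an assumption, since the source does not spell them out; the data set 1, 3, 5, 6, 7 is reused from the range example, and the mean deviation is taken about the mean.

```python
from statistics import mean, pstdev, quantiles

data = [1, 3, 5, 6, 7]

mu = mean(data)
sigma = pstdev(data)                                  # population standard deviation
q1, _, q3 = quantiles(data, n=4, method="inclusive")  # quartile convention is an assumption
mean_dev = sum(abs(x - mu) for x in data) / len(data)

# Assumed textbook definitions; every result is unit-free
coeff_range = (max(data) - min(data)) / (max(data) + min(data))
coeff_std_dev = sigma / mu
coeff_variation = 100 * sigma / mu                    # standard deviation as a % of the mean
coeff_quartile_dev = (q3 - q1) / (q3 + q1)
coeff_mean_dev = mean_dev / mu

print("Coefficient of range:", coeff_range)
print("Coefficient of standard deviation:", round(coeff_std_dev, 4))
print("Coefficient of variation (%):", round(coeff_variation, 2))
print("Coefficient of quartile deviation:", round(coeff_quartile_dev, 4))
print("Coefficient of mean deviation:", round(coeff_mean_dev, 4))
```

Because each coefficient is a ratio, series measured in different units or with very different averages can be compared directly.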
The common coefficients of dispersion are:
6. Code Snippet:
Output: