EXPERIMENT-2
Stages in Descriptive Analysis
There are a few stages that companies follow in order to successfully implement descriptive
analysis into their business strategy. The following list highlights these stages along with a
description of each.
1. Identifying which metrics to analyse - Before beginning, it's important to decide which
metrics companies want to produce and the time frame for each, such as quarterly
revenue or annual operating profit.
2. Identifying and locating the data - This step requires locating all of the data required to produce
the result. This means going through all internal and external sources, including
databases.
3. Compiling the data - Once all the data is identified and located, the next step is to
prepare and compile it together. Part of the process here is to ensure that it's accurate
and to format everything into a single format.
4. Data analysis - Analysing data sets and figures means using different tools.
Once all these steps are completed, it's important to present all the data to the appropriate
stakeholders. Using appropriate visual aids, such as charts, graphics, videos, and other tools
can be a great way to provide analysts, investors, management, and others with the insight they need
about the direction of the company.
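The compiling and analysis stages above can be sketched in a few lines of Python. This is only an illustration: the two record lists, their date and amount formats, and the helper names `normalise` and `quarterly_revenue` are hypothetical and not taken from any real system.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records from two sources that use different formats.
internal_sales = [
    {"date": "2023-01-15", "amount": 1200.0},
    {"date": "2023-02-20", "amount": 800.0},
    {"date": "2023-04-05", "amount": 1500.0},
]
external_sales = [
    {"date": "15/03/2023", "amount": "1,000.50"},
    {"date": "30/06/2023", "amount": "2,250.00"},
]


def normalise(record, date_format):
    """Convert one raw record into the shared (date, float) format."""
    day = datetime.strptime(record["date"], date_format).date()
    amount = float(str(record["amount"]).replace(",", ""))
    return day, amount


def quarterly_revenue(records):
    """Sum the amounts for each (year, quarter) pair."""
    totals = defaultdict(float)
    for day, amount in records:
        quarter = (day.month - 1) // 3 + 1
        totals[(day.year, quarter)] += amount
    return dict(totals)


# Compile both sources into a single format, then compute the metric.
compiled = [normalise(r, "%Y-%m-%d") for r in internal_sales]
compiled += [normalise(r, "%d/%m/%Y") for r in external_sales]
revenue = quarterly_revenue(compiled)

for (year, quarter), total in sorted(revenue.items()):
    print(f"{year} Q{quarter}: {total:.2f}")
```

Keeping the format conversion in one small function means a new source only needs its own date format string, while the metric calculation stays unchanged.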
2. Types of Descriptive Analysis
I. Measurements of Frequency:
Understanding how often a specific event or reaction occurs is essential for descriptive
analysis, providing quantitative insights through counts or percentages to reveal
patterns within the dataset.
II. Measures of Central Tendency:
In descriptive analysis, determining the central tendency is crucial, employing mean,
median, and mode to quantify the typical value and gain insights into the overall trend or
behaviour of observed variables.
III. Measures of Dispersion:
In certain scenarios, understanding how data is spread across a range is vital; measures
like range or standard deviation in descriptive analysis offer valuable information about
distribution patterns and variability within the dataset.
IV. Measures of Position:
An integral part of descriptive analysis involves determining a value's position relative
to others, employing metrics like quartiles and percentiles to offer nuanced insights into the
dataset's structure and identify trends or outliers.
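As a small illustration of the first type, measurements of frequency, the sketch below counts how often each answer occurs and converts the counts to percentages. The `responses` list is made-up sample data.

```python
from collections import Counter

# Made-up survey answers used only for illustration.
responses = ["yes", "no", "yes", "maybe", "yes", "no", "yes", "no"]

counts = Counter(responses)                      # count per distinct value
total = sum(counts.values())
percentages = {value: 100 * n / total for value, n in counts.items()}

# Print a simple frequency table, most frequent value first.
for value, n in counts.most_common():
    print(f"{value:<6} {n:>2} {percentages[value]:5.1f}%")
```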
3. Steps to conduct Descriptive Analysis
Descriptive analysis is an important phase in data exploration that involves summarizing and
describing the primary properties of a dataset. It provides vital insights into the data’s frequency
distribution, central tendency, dispersion, and position. It assists researchers and
analysts in better understanding their data.
Conducting a descriptive analysis entails several critical phases, which include:
a) Data Collection
Before conducting any analysis, you must first collect relevant data. This process involves
identifying data sources, selecting appropriate data-collecting methods, and verifying that the
data acquired accurately represents the population or topic of interest. We can collect data
through surveys, experiments, observations, existing databases, or other methods.
b) Data Preparation
Data preparation is crucial for ensuring the dataset is clean, consistent, and ready for analysis. This
step covers the following tasks:
a) Data Cleaning: Handle missing values, outliers, and errors in the dataset. Impute missing
values or apply appropriate statistical techniques for dealing with them.
b) Data Transformation: Convert data into an appropriate format. Examples of this are
changing data types, encoding categorical variables, or scaling numerical variables.
c) Data Reduction: For large datasets, try reducing their size by sampling or aggregation
to make the analysis more manageable.
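The three preparation tasks can be sketched with the standard library alone. The rows, the choice of median imputation, the integer encoding, and min-max scaling are illustrative assumptions; other techniques may suit a real dataset better.

```python
import random
import statistics

# Hypothetical raw rows; None marks a missing age.
rows = [
    {"age": 25, "city": "Pune"},
    {"age": None, "city": "Delhi"},
    {"age": 40, "city": "Pune"},
    {"age": 31, "city": "Mumbai"},
    {"age": None, "city": "Delhi"},
    {"age": 55, "city": "Mumbai"},
]

# Cleaning: impute missing ages with the median of the observed ages.
observed = [r["age"] for r in rows if r["age"] is not None]
median_age = statistics.median(observed)
for r in rows:
    if r["age"] is None:
        r["age"] = median_age

# Transformation: encode the categorical column as integer codes.
cities = sorted({r["city"] for r in rows})
city_codes = {city: code for code, city in enumerate(cities)}
for r in rows:
    r["city_code"] = city_codes[r["city"]]

# Transformation: scale age to the range [0, 1] (min-max scaling).
lo = min(r["age"] for r in rows)
hi = max(r["age"] for r in rows)
for r in rows:
    r["age_scaled"] = (r["age"] - lo) / (hi - lo)

# Reduction: keep a random sample of the rows (seeded for repeatability).
rng = random.Random(0)
sample = rng.sample(rows, k=3)
```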
c) Apply Methods
In this step, you will analyse and describe the data using a variety of methodologies and
procedures. The following are some common descriptive analysis methods:
i. Frequency Distribution Analysis: Create frequency tables or bar charts to show the number
or proportion of occurrences for each category for categorical variables.
ii. Measures of Central Tendency: Calculate numerical variables’ mean, median, and mode to
determine the centre or typical value.
iii. Measures of Dispersion: Calculate the range, variance, and standard deviation to examine
the dispersion or variability of the data.
iv. Measures of Position: Identify the position of a single value relative to others, for example with quartiles or percentiles.
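Measures of position are the least familiar of the four methods, so a minimal sketch may help. It uses `statistics.quantiles` from the Python standard library; the `scores` list and the helper `percentile_rank` are hypothetical, and the "inclusive" quartile method is only one of several conventions.

```python
import statistics

# Made-up exam scores used only for illustration.
scores = [52, 61, 67, 70, 74, 78, 83, 88, 95]

# Quartiles: the three cut points that split the sorted data into four parts.
# method="inclusive" treats the data as the whole population of interest.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")


def percentile_rank(values, x):
    """Percentage of values that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)


print("Quartiles:", q1, q2, q3)
print("Percentile rank of 83:", round(percentile_rank(scores, 83), 1))
```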
d) Summary Statistics and Visualization
Descriptive statistics refers to a set of methods for summarizing and describing the main
characteristics of a dataset. Summarize the data through statistics and visualization. This step
involves the following tasks:
i. Summary Statistics: Summarize your findings clearly and concisely.
ii. Data Visualization: Use various charts and plots to visualize the data. Create
histograms, box plots, scatter plots, or line charts for numerical data. Use bar charts,
pie charts, or stacked bar charts for categorical data.
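A rough sketch of both tasks is given below. To stay self-contained it prints a text histogram rather than using a plotting library; in practice a charting package would normally be used. The `values` list and the helper `histogram_counts` are illustrative assumptions.

```python
import statistics

# Made-up numerical data used only for illustration.
values = [12, 15, 15, 18, 21, 22, 22, 22, 27, 35]

# Summary statistics collected in one place.
summary = {
    "count": len(values),
    "min": min(values),
    "max": max(values),
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "stdev": statistics.pstdev(values),   # population standard deviation
}


def histogram_counts(data, bin_width):
    """Count values per bin and return {bin_start: count}, sorted by bin."""
    bins = {}
    for x in data:
        start = (x // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1
    return dict(sorted(bins.items()))


for name, value in summary.items():
    print(f"{name:<7} {value:.2f}")

bin_width = 10
for start, count in histogram_counts(values, bin_width).items():
    end = start + bin_width - 1
    print(f"{start:>2}-{end:<2} {'#' * count}")
```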
4. Central Tendency
Central tendencies in statistics are the numerical values used to represent the mid-value or
central value of a large collection of numerical data. These numerical values are called
central or average values in statistics.
Measures of Central Tendency:-
Mean:
Mean in general terms is used for the arithmetic mean of the data, but other than the arithmetic mean there
are the geometric mean and harmonic mean as well, which are calculated using different formulas.
i. The most common measure of central tendency is the mean.
ii. Mean is also known as the simple average.
iii. It is denoted by the Greek letter μ for a population and by x̄ for a sample.
iv. We can find the mean of a number of elements by adding all the elements in a dataset and then
dividing by the number of elements in the dataset.
v. It is the most common measure of central tendency but it has a drawback.
vi. The mean is affected by the presence of outliers.
vii. So, the mean alone is not enough for making business decisions.
Types of Mean:
Mean can be classified into three different types, which are:
Arithmetic Mean
Geometric Mean
Harmonic Mean
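The three types of mean can be compared on the same small dataset. This sketch relies on the `statistics` module from the Python standard library; the `data` list is an arbitrary example.

```python
import statistics

# Arbitrary positive values used only for illustration.
data = [2, 4, 8]

arithmetic = statistics.mean(data)            # (2 + 4 + 8) / 3
geometric = statistics.geometric_mean(data)   # cube root of 2 * 4 * 8
harmonic = statistics.harmonic_mean(data)     # 3 / (1/2 + 1/4 + 1/8)

print(f"Arithmetic mean: {arithmetic:.4f}")
print(f"Geometric mean:  {geometric:.4f}")
print(f"Harmonic mean:   {harmonic:.4f}")
```

For positive data the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean.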
Median
The Median of any distribution is that value that divides the distribution into two equal parts
such that the number of observations above it is equal to the number of observations below it.
Thus, the median is called the central value of any given data either grouped or ungrouped.
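Median is the number which divides the dataset into two equal halves.
To calculate the median, we have to arrange our dataset of n numbers in ascending order. For odd n the median is the middle value; for even n it is the average of the two middle values.
Median is robust to outliers.
So, for a skewed distribution or when there is concern about outliers, the median may be
preferred.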
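A minimal sketch of the median calculation, together with a comparison against the mean when an outlier is present. The sample lists are made up, and the hand-written `median` function is shown only to make the steps visible; `statistics.median` gives the same result.

```python
import statistics


def median(values):
    """Median by sorting and picking the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of two


odd = [7, 1, 5, 3, 9]        # n = 5
even = [7, 1, 5, 3, 9, 11]   # n = 6
print(median(odd), median(even))

# One extreme value pulls the mean far more than the median.
with_outlier = [30, 32, 35, 38, 40, 500]
print("Mean:  ", statistics.mean(with_outlier))
print("Median:", median(with_outlier))
```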
Mode
The Mode is the value of the observation that has the maximum frequency corresponding to
it. In other words, it is the observation that occurs the maximum number of times in a dataset.
i. Mode of a dataset is the value that occurs most often in the dataset.
ii. Mode is the value that has the highest frequency of occurrence in the dataset.
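The mode can be found with the Python standard library, as sketched below. The sample data is made up; `statistics.multimode` is used for the case where several values share the highest frequency.

```python
import statistics
from collections import Counter

# Made-up categorical data used only for illustration.
sizes = ["M", "L", "M", "S", "M", "L"]

most_common = statistics.mode(sizes)           # single most frequent value
frequency = Counter(sizes)[most_common]        # how often it occurs

# When several values tie for the highest frequency, list all of them.
tied = statistics.multimode([1, 1, 2, 2, 3])

print(most_common, frequency)
print(tied)
```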
5. Dispersion in Statistics
Dispersion is the state of getting dispersed or spread. Statistical dispersion means the extent to
which numerical data is likely to vary about an average value. In other words, dispersion helps to
understand the distribution of the data.
Measures of Dispersion
In statistics, the measures of dispersion help to interpret the variability of data, i.e. to know how
homogeneous or heterogeneous the data is. In simple terms, it shows how squeezed or
scattered the variable is.
Types of Measures of Dispersion
There are two main types of dispersion methods in statistics, which are:
Absolute Measure of Dispersion
Relative Measure of Dispersion
Absolute Measure of Dispersion
An absolute measure of dispersion contains the same unit as the original data set. The absolute
dispersion method expresses the variations in terms of the average of deviations of observations,
like standard or mean deviations. It includes range, standard deviation, quartile deviation, etc.
i. Range: It is simply the difference between the maximum value and the minimum value given
in a data set. Example: 1, 3, 5, 6, 7 => Range = 7 - 1 = 6
ii. Variance: Deduct the mean from each data value in the set, square each difference, add the
squares, and finally divide the sum by the total number of values in the data set to get the variance.
Variance (σ²) = ∑(X − μ)² / N
iii. Standard Deviation: The square root of the variance is known as the standard
deviation i.e. S.D. (σ) = √(σ²).
iv. Quartiles and Quartile Deviation: The quartiles are values that divide a list of
numbers into quarters. The quartile deviation is half of the distance between the third
and the first quartile.
v. Mean and Mean Deviation: The average of numbers is known as the mean and the
arithmetic mean of the absolute deviations of the observations from a measure of central
tendency is known as the mean deviation (also called mean absolute deviation).
vi. Co-efficient of Dispersion: The coefficients of dispersion are calculated (along with
the measure of dispersion) when two series are compared, that differ widely in their
averages. The dispersion coefficient is also used when two series with different
measurement units are compared. It is denoted as C.D.
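The absolute measures can be checked on the data set from the range example (1, 3, 5, 6, 7). The sketch below follows the population formulas given above; the "inclusive" quartile method and taking the mean deviation about the mean are assumptions, since conventions differ.

```python
import statistics

data = [1, 3, 5, 6, 7]   # data set from the range example

# Range: maximum minus minimum.
data_range = max(data) - min(data)

# Variance: mean of the squared deviations from the mean (population form).
mu = statistics.mean(data)
variance = sum((x - mu) ** 2 for x in data) / len(data)

# Standard deviation: square root of the variance.
std_dev = variance ** 0.5

# Mean deviation: mean of the absolute deviations, taken about the mean here.
mean_deviation = sum(abs(x - mu) for x in data) / len(data)

# Quartile deviation: half the distance between the third and first quartile.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
quartile_deviation = (q3 - q1) / 2

print(data_range, round(variance, 2), round(std_dev, 3))
print(round(mean_deviation, 2), quartile_deviation)
```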
Relative Measure of Dispersion
The relative measures of dispersion are used to compare the distribution of two or more data sets.
This measure compares values without units. Common relative dispersion methods include:
i. Co-efficient of Range
ii. Co-efficient of Variation
iii. Co-efficient of Standard Deviation
iv. Co-efficient of Quartile Deviation
v. Co-efficient of Mean Deviation
The common coefficients of dispersion are:
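As a sketch, the usual textbook forms of these coefficients can be computed as follows. The formulas in the comments are assumed here, since the definitions are not spelled out above, and the function name `relative_dispersion` and the height data are hypothetical.

```python
import statistics


def relative_dispersion(data):
    """Unit-free coefficients of dispersion (assumed textbook definitions)."""
    lowest, highest = min(data), max(data)
    mean = statistics.mean(data)
    sigma = statistics.pstdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    mean_dev = sum(abs(x - mean) for x in data) / len(data)
    return {
        # (max - min) / (max + min)
        "range": (highest - lowest) / (highest + lowest),
        # (standard deviation / mean) * 100
        "variation": 100 * sigma / mean,
        # standard deviation / mean
        "standard_deviation": sigma / mean,
        # (Q3 - Q1) / (Q3 + Q1)
        "quartile_deviation": (q3 - q1) / (q3 + q1),
        # mean deviation / mean
        "mean_deviation": mean_dev / mean,
    }


# The same heights in two units give the same coefficients,
# which is why relative measures can compare differently scaled series.
heights_cm = [150, 160, 165, 170, 180]
heights_m = [h / 100 for h in heights_cm]

for name, value in relative_dispersion(heights_cm).items():
    print(f"{name:<20} {value:.4f}")
```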
6. Code Snippet:
Output: