
Material DA 7

This document describes useful statistical measures for descriptive analytics using the built-in "iris" and "mtcars" data sets in R. It discusses count, mean, median, mode, range, quartiles, standard deviation, skewness, and kurtosis. For the iris data set, examples are given calculating the count, mean, median, range, quantiles, standard deviation, skewness and kurtosis of variables like Sepal Length. Bar plots and boxplots are also used to visualize variables like Species in the iris data.

Uploaded by

Aparna Singh
Copyright: © All Rights Reserved

Descriptive Analytics on “iris”, “mtcars” data sets

Useful Statistical Measures for Descriptive Analytics:


1. Count: used to count the number of observations, either overall or within each group. It is
mostly used for categorical columns. E.g., Species is a variable in the “iris” data set built
into R. It is a grouping variable because the column contains three different species.
The following methods can be used to describe the Species variable:
• Frequency table of Species
• Comparison of the different species
• Bar or pie chart of the variable

# Usage of count for descriptive analytics

library(dplyr)

sp <- iris %>% count(Species)  # count() comes from the dplyr package
sp                             # print the frequency table


  Species        n
  <fct>      <int>
1 setosa        50
2 versicolor    50
3 virginica     50
“sp” is a data frame; it can be saved to the working directory as a CSV file with the following syntax:

write.csv(sp,"sp.csv")

Bar plot for the variable Species in the iris data set:

counts <- table(iris$Species)  # frequency of each species
barplot(counts, main = "Species Distribution",
        xlab = "Species", ylab = "Number of observations")
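
To compare the species beyond raw counts, the frequency table can also be expressed as proportions. The following is a minimal sketch using base R only; the names `counts`, `props` and `freq_table` are illustrative choices, not part of the original material.

```r
# Frequency table of Species with counts and percentages (base R only)
counts <- table(iris$Species)
props  <- prop.table(counts)   # share of each species in the data set

# Combine counts and percentages into one data frame for easy comparison
freq_table <- data.frame(
  Species = names(counts),
  n       = as.vector(counts),
  percent = round(100 * as.vector(props), 1)
)
print(freq_table)
```

Since every species has the same number of rows, the three percentages are equal; with an unbalanced grouping variable the same table would show which groups dominate.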

2. Mean: used to summarize a continuous variable and to compare two or more variables
measured on the same scale. E.g., the means of "Sepal.Length", "Sepal.Width",
"Petal.Length" and "Petal.Width" can be compared for further analysis.
3. Median: used to summarize ordinal data (and skewed continuous data), and to compare two
or more variables measured on the same scale.
4. Mode: the most frequently occurring value; used mainly for categorical, specifically
nominal, variables.
5. Range: the simplest measure of the spread of a distribution, based on the minimum and
maximum values of the variable.
6. Quartiles and other positional measures: these are positional values of the ordered
observations. In general, for n observations the k-th quartile lies at position
Qk = k(n+1)/4, the k-th decile at Dk = k(n+1)/10 and the k-th percentile at Pk = k(n+1)/100.
7. Standard deviation: one of the most widely used statistical measures of how far the values
of a variable deviate from their mean, and thereby a good measure of the variation among
all values. It applies to numerical variables that support arithmetic operations.
Formula for standard deviation: $\mathrm{Std} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
8. Skewness: a measure of symmetry or, more precisely, of the lack of symmetry.
A distribution, or data set, is symmetric if it looks the same to the left and right of the
centre point.
Examples of skewness: a distribution with a longer right tail is positively skewed, and one with a longer left tail is negatively skewed.

9. Kurtosis: a parameter that describes the shape of a random variable’s probability
distribution, in particular its peak and tails. For a normal distribution, kurtosis equals 3,
so excess kurtosis (kurtosis minus 3) is zero. A positive excess value indicates a higher
peak with heavier tails, and a negative value indicates a flatter shape.
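
Base R has no built-in function for the statistical mode described in item 4 (the existing `mode()` function reports an object's storage type instead), so a small helper is needed. The sketch below is one possible approach; `stat_mode` is a hypothetical name, and it returns every tied value when several share the highest frequency.

```r
# Hypothetical helper: most frequent value(s) of a vector, returned as text
stat_mode <- function(x) {
  freq <- table(x)                    # frequency of each distinct value
  names(freq)[freq == max(freq)]      # all values that reach the top count
}

stat_mode(mtcars$cyl)     # cylinder count that occurs most often in mtcars
stat_mode(iris$Species)   # the three species tie, so all three are returned
```

Returning all tied values avoids silently picking one category when the variable is balanced, as Species is.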

# Mean of each numeric variable
mean(iris$Sepal.Length)
mean(iris$Sepal.Width)
mean(iris$Petal.Length)
mean(iris$Petal.Width)
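
Because the four measurements share the same unit (centimetres), their means can be compared directly, both overall and per species. The following is a minimal sketch using base R; `overall_means` and `means_by_species` are illustrative names.

```r
# Overall mean of each numeric column (columns 1 to 4)
overall_means <- colMeans(iris[, 1:4])
print(overall_means)

# Mean of each numeric column within each species
means_by_species <- aggregate(. ~ Species, data = iris, FUN = mean)
print(means_by_species)
```

The per-species table makes the comparison from item 2 concrete: the same variable can be read down a column to see how it differs between groups.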

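# Minimum, quartiles, median, mean and maximum of individual variables
summary(iris$Sepal.Length)
summary(iris$Sepal.Width)
summary(iris$Petal.Length)

# Summary of every column in the data frame
summary(iris)

# Drop the Species column (column 5) to keep only the numeric variables
irisn <- iris[, -5]

# Store the summary table and convert it to a data frame
irist <- summary(irisn)
irisst <- as.data.frame(irist)
irisst

# Save the summary table to the working directory
write.csv(irist, "irist.csv")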
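
The summary() output reports the minimum, median and maximum together; each can also be requested on its own. A short sketch with base R functions, where `x` and `spread` are illustrative names:

```r
x <- iris$Sepal.Length

median(x)                  # middle value of the sorted observations
range(x)                   # minimum and maximum as a two-element vector
spread <- diff(range(x))   # width of the range: maximum minus minimum
spread
```
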
# Standard deviation and variance of Sepal.Length
sd(irisn$Sepal.Length)
var(irisn$Sepal.Length)
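
The formula in item 7 divides by n (the population form), whereas R's sd() and var() divide by n - 1 (the sample form). The sketch below computes both so the difference is visible; `pop_sd` is a hypothetical helper name, not a base R function.

```r
# Hypothetical helper: population standard deviation (divides by n)
pop_sd <- function(x) {
  sqrt(mean((x - mean(x))^2))
}

x <- iris$Sepal.Length
n <- length(x)

pop_sd(x)                   # divides by n, as in the formula above
sd(x)                       # built-in sample version, divides by n - 1
sd(x) * sqrt((n - 1) / n)   # rescaling sd() reproduces the population value
```

With 150 observations the two versions differ only slightly; the gap grows as the sample gets smaller.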

# Boxplots of all four numeric variables
boxplot(irisn)

# Boxplot of Sepal.Length by species using ggplot2
library(ggplot2)

ggplot(iris, aes(y = Sepal.Length, x = Species, color = Species)) +
  geom_boxplot() +
  theme_classic()

# Quartiles (0%, 25%, 50%, 75%, 100%) of Sepal.Length
quantile(iris$Sepal.Length)

# 30th and 45th percentiles
quantile(irisn$Sepal.Length, c(0.30, 0.45))
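
Under the (n + 1)-based positional rule, the k-th quartile of n sorted observations lies at position k(n + 1)/4, with linear interpolation when the position is not a whole number. The sketch below applies this by hand and compares it with quantile() using type = 6, which follows the same rule (the default, type = 7, uses a slightly different one). `quartile_by_position` is a hypothetical helper written for illustration.

```r
# Hypothetical helper: k-th quartile from the position k * (n + 1) / 4
# Assumes k is 1, 2 or 3 and that x has at least 3 observations.
quartile_by_position <- function(x, k) {
  xs   <- sort(x)
  n    <- length(xs)
  pos  <- k * (n + 1) / 4        # e.g. 37.75 for k = 1 and n = 150
  lo   <- floor(pos)
  hi   <- min(lo + 1, n)         # guard against running past the last value
  frac <- pos - lo
  xs[lo] + frac * (xs[hi] - xs[lo])   # linear interpolation between neighbours
}

x <- iris$Sepal.Length
quartile_by_position(x, 1)       # first quartile computed by hand
quantile(x, 0.25, type = 6)      # same positional rule via quantile()
```

Deciles and percentiles follow the same pattern with 10 or 100 in place of 4.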

# Skewness and kurtosis from the e1071 package
library(e1071)

skewness(iris$Sepal.Length)
kurtosis(iris$Sepal.Length)  # e1071 reports excess kurtosis (normal distribution = 0)
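
For intuition, both measures can also be computed from central moments in base R. This sketch uses the classical moment definitions (skewness = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3); e1071's default applies a different small-sample adjustment, so its results may differ slightly. `central_moment`, `moment_skewness` and `moment_kurtosis` are hypothetical helper names.

```r
# Hypothetical helpers based on central moments (no extra packages needed)
central_moment <- function(x, r) mean((x - mean(x))^r)

moment_skewness <- function(x) {
  central_moment(x, 3) / central_moment(x, 2)^1.5
}

moment_kurtosis <- function(x) {
  central_moment(x, 4) / central_moment(x, 2)^2 - 3   # excess kurtosis
}

moment_skewness(iris$Sepal.Length)   # sign shows the direction of the longer tail
moment_kurtosis(iris$Sepal.Length)   # sign shows peaked (+) versus flat (-)
```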
