Project 1 - Descriptive Statistics
Task: We have been given three variables to analyze: x1 (unemployment level), x2 (inflation
level) and x3 (net savings level). Our job is to provide a basic descriptive statistics analysis
of the three variables.
Content
Definitions of the terms
Analysis of the given Data
Conclusion
Measures of variability:
Range – the difference between the largest value and the smallest value. It is
determined only by the two extreme data values.
Interquartile Range – a measure of variability that overcomes the dependency on
extreme values. It is the difference between the third quartile, Q3, and the first
quartile, Q1. In other words, the interquartile range is the range of the middle 50%
of the data.
Variance – a measure of dispersion around the mean, equal to the sum of squared
deviations from the mean divided by one less than the number of cases. The variance is
measured in units that are the square of those of the variable itself.
Standard Deviation – the positive square root of the variance. It is measured in the
same units as the variable itself.
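The four measures defined above can be illustrated with a short Python sketch. The sample below is hypothetical and is not part of the project data; it only shows how each definition is applied. Quartiles can be computed by several conventions, so the values returned by `statistics.quantiles` may differ slightly from those of other statistical packages.

```python
import statistics

# Hypothetical sample, used only to illustrate the definitions
sample = [4, 7, 8, 10, 12, 15, 21]

# Range: largest value minus smallest value
data_range = max(sample) - min(sample)

# Quartiles (default method="exclusive"); IQR is Q3 minus Q1
q1, q2, q3 = statistics.quantiles(sample, n=4)
iqr = q3 - q1

# Sample variance: squared deviations from the mean, divided by n - 1
variance = statistics.variance(sample)

# Standard deviation: positive square root of the variance
std_dev = statistics.stdev(sample)

print("range:", data_range)
print("Q1, median, Q3:", q1, q2, q3)
print("IQR:", iqr)
print("variance:", variance)
print("standard deviation:", round(std_dev, 4))
```

Note that the single value 21 fixes the upper end of the range, while the interquartile range depends only on Q1 and Q3.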
Box Plots – a type of chart often used in exploratory data analysis. Box plots visually
show the distribution and skewness of numerical data by displaying the data quartiles
(or percentiles) and the median. A box plot shows the five-number summary of a set
of data: the minimum, first (lower) quartile, median, third (upper) quartile, and
maximum.
Outliers – an outlier is an observation that lies an abnormal distance from other values
in a random sample from a population. In a sense, this definition leaves it up to the
analyst (or a consensus process) to decide what will be considered abnormal. Before
abnormal observations can be singled out, it is necessary to characterize normal
observations.
z-Scores – a z-score is the number of standard deviations by which a value lies above or
below the mean. It shows whether a value is typical or atypical for a given data set.
Because z-scores are expressed in standard deviations, they also make it possible to
compare values that come from different data sets.
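As an illustration of the z-score definition, the sketch below standardizes the minimum and maximum of each variable, using the mean, standard deviation, minimum and maximum reported in the Statistics table of the Analytics section. The helper name `z_score` is our own, and the idea that values with |z| above roughly 3 deserve a closer look is only a common rule of thumb, not a formal test.

```python
# (mean, std. deviation, minimum, maximum) copied from the Statistics table
summary = {
    "x1": (11.3441, 4.13173, 4.24, 29.11),
    "x2": (9.1453, 4.70016, 1.35, 21.52),
    "x3": (12.9002, 6.65945, 3.28, 35.18),
}


def z_score(value, mean, std_dev):
    """Number of standard deviations between value and the mean."""
    return (value - mean) / std_dev


z_extremes = {}
for name, (mean, sd, minimum, maximum) in summary.items():
    z_min = z_score(minimum, mean, sd)
    z_max = z_score(maximum, mean, sd)
    z_extremes[name] = (z_min, z_max)
    print(f"{name}: z(min) = {z_min:.2f}, z(max) = {z_max:.2f}")
```

The maxima of x1 and x3 lie more than three standard deviations above their means (about 4.3 and 3.3), while the maximum of x2 is about 2.6 standard deviations above its mean. All three minima are less than two standard deviations below their means.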
Analytics:
Statistics
                               x1         x2         x3
N             Valid            59         59         59
              Missing           0          0          0
Mean                      11.3441     9.1453    12.9002
Std. Error of Mean         .53791     .61191     .86699
Median                    10.2800     8.8800    11.2300
Mode                         9.70      2.70a     11.23a
Std. Deviation            4.13173    4.70016    6.65945
Variance                   17.071     22.091     44.348
Skewness                    2.151       .392      1.065
Std. Error of Skewness       .311       .311       .311
Range                       24.87      20.17      31.90
Minimum                      4.24       1.35       3.28
Maximum                     29.11      21.52      35.18
Sum                        669.30     539.57     761.11
Percentiles   25           9.0900     5.1300     8.1300
              50          10.2800     8.8800    11.2300
              75          12.7200    12.7200    17.0000
a. Multiple modes exist. The smallest value is shown.
We have 59 data points for each variable (x1, x2, x3), and none of them are missing.
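Several rows of the Statistics table are linked by simple formulas, so the table can be checked for internal consistency. The sketch below is such a check, assuming the usual definitions: mean = sum / n, standard error of the mean = standard deviation / sqrt(n), variance = standard deviation squared, and range = maximum minus minimum. The standard error of skewness is computed with the common formula sqrt(6n(n-1) / ((n-2)(n+1)(n+3))), which we assume is the one behind the table. The tolerances only allow for rounding in the printed values.

```python
import math

n = 59  # valid cases per variable, from the Statistics table

# Values copied from the Statistics table
table = {
    "x1": {"mean": 11.3441, "se_mean": 0.53791, "sd": 4.13173, "var": 17.071,
           "range": 24.87, "min": 4.24, "max": 29.11, "sum": 669.30},
    "x2": {"mean": 9.1453, "se_mean": 0.61191, "sd": 4.70016, "var": 22.091,
           "range": 20.17, "min": 1.35, "max": 21.52, "sum": 539.57},
    "x3": {"mean": 12.9002, "se_mean": 0.86699, "sd": 6.65945, "var": 44.348,
           "range": 31.90, "min": 3.28, "max": 35.18, "sum": 761.11},
}

# Standard error of skewness depends only on the sample size
se_skewness = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

checks = {}
for name, row in table.items():
    checks[name] = {
        "mean = sum / n":
            math.isclose(row["sum"] / n, row["mean"], abs_tol=5e-4),
        "se_mean = sd / sqrt(n)":
            math.isclose(row["sd"] / math.sqrt(n), row["se_mean"], abs_tol=1e-4),
        "variance = sd^2":
            math.isclose(row["sd"] ** 2, row["var"], abs_tol=1e-3),
        "range = max - min":
            math.isclose(row["max"] - row["min"], row["range"], abs_tol=5e-3),
    }
    print(name, checks[name])

print("SE of skewness:", round(se_skewness, 3))
```

Every check passes for all three variables, and the computed standard error of skewness rounds to .311, the value shown in the table.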
x1:
The mean (11.34) and the median (10.28) are not far from each other. Because the mean is
larger than the median, we can expect the distribution to be skewed to the right.
The mode is 9.70; this value is observed 2 times in the data.
Skewness is 2.151, so the distribution is highly skewed to the right. This means that some
values lie far to the right of (well above) the majority of the data, as we will see in
the histogram below. It also suggests that the data set may contain outliers.
Percentiles: Q1, Q2 (the median) and Q3 are close to each other. The middle 50% of the data
lies between 9.09 (Q1) and 12.72 (Q3).
The range (maximum minus minimum) is 24.87. Compared with the interquartile range Q3 – Q1
(3.63), this is high, which means that some values are scattered far from the center. The
maximum (29.11) is much farther from Q3 than the minimum (4.24) is from Q1, so most of this
spread is in the upper tail.
Histogram of x1: a visual representation of what we discussed above.
x2
The mean (9.15) and the median (8.88) are very close. The mean is slightly larger than the
median, which suggests a slight skew to the right.
There are multiple modes (the table shows only the smallest, 2.70), so the mode is not a
helpful measure for our analysis.
Skewness is 0.392, which means that the distribution is almost symmetric, with a slight skew
to the right.
Percentiles: Q1, Q2 (the median) and Q3 are moderately close to each other. The middle 50%
of the data lies between 5.13 (Q1) and 12.72 (Q3).
The range (maximum minus minimum) is 20.17, compared with an interquartile range Q3 – Q1 of
7.59. Part of the spread therefore comes from values outside the middle 50%, at both ends
of the distribution.
x3
The mean (12.90) is noticeably larger than the median (11.23), so we can expect the
distribution to be skewed to the right.
There are multiple modes (the table shows only the smallest, 11.23), so the mode is not a
helpful measure for our analysis.
Skewness is 1.065, which means that the distribution is moderately skewed to the right.
Percentiles: Q1, Q2 (the median) and Q3 are moderately close to each other. The middle 50%
of the data lies between 8.13 (Q1) and 17.00 (Q3).
The range (maximum minus minimum) is 31.90. Compared with the interquartile range Q3 – Q1
(8.87), this is high, which means that some values are scattered far from the center,
mainly in the upper tail.
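The comparisons made above for x1, x2 and x3 can be put side by side with a short sketch. It uses only the quartiles, ranges and skewness values from the Statistics table and computes, for each variable, the interquartile range, the ratio of the range to the interquartile range, and the ratio of the skewness to its standard error. Treating a skewness of more than about twice its standard error as a sign of real asymmetry is a rule of thumb that we assume here; it is not a formal test.

```python
# (Q1, Q3), range and skewness copied from the Statistics table
quartiles = {"x1": (9.09, 12.72), "x2": (5.13, 12.72), "x3": (8.13, 17.00)}
ranges = {"x1": 24.87, "x2": 20.17, "x3": 31.90}
skewness = {"x1": 2.151, "x2": 0.392, "x3": 1.065}
se_skewness = 0.311  # same for all three variables (n = 59)

spread = {}
for name, (q1, q3) in quartiles.items():
    iqr = q3 - q1
    spread[name] = {
        "iqr": iqr,
        "range_to_iqr": ranges[name] / iqr,
        "skew_to_se": skewness[name] / se_skewness,
    }
    print(
        f"{name}: IQR = {iqr:.2f}, "
        f"range/IQR = {spread[name]['range_to_iqr']:.2f}, "
        f"skewness/SE = {spread[name]['skew_to_se']:.2f}"
    )
```

The range of x1 is almost seven times its interquartile range, against roughly 2.7 times for x2 and 3.6 times for x3. The skewness of x1 and x3 is well over twice its standard error, while that of x2 is not, which matches the ordering described in the text: x1 highly skewed, x3 moderately skewed, x2 almost symmetric.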
Box plot
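x1 – we see that there are 2 outliers (at about 26 and 29). The mean is larger than the median,
which indicates positive skewness. The box is comparatively short, which shows that the middle
50% of the data has low variability. The distances from the median to Q1 and to Q3 look similar
in the plot, although the percentiles in the table (Q1 = 9.09, median = 10.28, Q3 = 12.72) place
the median somewhat closer to Q1. Without the outliers, x1 would be close to normally
distributed, but excluding the outliers requires further investigation.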
Q-Q plot x1 – like the box plot, the Q-Q plot shows 2 outliers. Apart from these outliers, the
points follow the straight line, so the rest of the data is approximately normally distributed
and a normal model could be used to describe it.
x2 – we see that there are no outliers. The mean is a little larger than the median. The box
is comparatively large, which shows higher variability. The distance from Q1 to the median is
slightly smaller than the distance from the median to Q3.
Q-Q plot x2 – like the box plot, the Q-Q plot shows no outliers. The points follow the straight
line, so the data is approximately normally distributed and a normal model could be used to
describe it.
x3 – from the x3 box plot we see that there is 1 outlier (at about 35; the maximum in the table
is 35.18). The mean is larger than the median, so we have positive skewness. The box is
comparatively large, which shows higher variability. The distance from Q1 to the median is
smaller than the distance from the median to Q3.
Q-Q plot x3 – like the box plot, the Q-Q plot shows one outlier. The other points follow the
straight line, so the rest of the data is approximately normally distributed.
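The outliers read from the box plots can be cross-checked against the Statistics table. The sketch below assumes the common box plot convention that a point is an outlier when it lies more than 1.5 interquartile ranges below Q1 or above Q3 (Tukey's fences); the software that produced the plots may compute its quartiles slightly differently. The function name `tukey_fences` is our own. Only the minimum, Q1, Q3 and maximum from the table are used, so the check shows whether outliers exist on each side, not how many there are.

```python
# (minimum, Q1, Q3, maximum) copied from the Statistics table
five_number = {
    "x1": (4.24, 9.09, 12.72, 29.11),
    "x2": (1.35, 5.13, 12.72, 21.52),
    "x3": (3.28, 8.13, 17.00, 35.18),
}


def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


flags = {}
for name, (minimum, q1, q3, maximum) in five_number.items():
    lower, upper = tukey_fences(q1, q3)
    flags[name] = {
        "lower_fence": lower,
        "upper_fence": upper,
        "has_low_outliers": minimum < lower,
        "has_high_outliers": maximum > upper,
    }
    print(
        f"{name}: fences = ({lower:.3f}, {upper:.3f}), "
        f"low outliers: {minimum < lower}, high outliers: {maximum > upper}"
    )
```

Under this assumption the upper fences are about 18.2 for x1, 24.1 for x2 and 30.3 for x3. The maxima of x1 (29.11) and x3 (35.18) lie above their fences and the maximum of x2 (21.52) does not, which agrees with the box plots: outliers on the high side for x1 and x3, none for x2, and no low outliers for any variable.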
Conclusion
We analyzed three variables: x1 (unemployment level), x2 (inflation level) and x3 (net savings level).
x1 – The distribution is highly skewed to the right (skewness 2.151). There are 2 outliers, and
further analysis is needed to decide whether or not to exclude them.
x2 – There are no outliers. The distribution is slightly skewed to the right, with a skewness
(0.392) close to 0, so the summary statistics describe this variable well.
x3 – The distribution is moderately skewed to the right (skewness 1.065). There is one outlier,
so this variable might need further analysis.