
Project 1 - Descriptive Statistics

The document provides definitions and analysis of descriptive statistics for 3 variables: unemployment (X1), inflation (X2), and net savings (X3). Key findings include: X1 has a right skewed distribution with mean > median, high standard deviation, and some outliers. X2 is approximately symmetric with multiple modes. X3 also has a right skewed distribution with mean > median and a high standard deviation. The document analyzes measures of central tendency, dispersion, shape, and outliers to characterize the distributions of each variable.

Uploaded by

Ana Chikovani
Copyright © All Rights Reserved

Topic: Descriptive Statistics

Task: We have been given three variables to analyze: X1 – unemployment level, X2 – inflation level, and X3 – net savings level. Our job is to provide a basic descriptive-statistics analysis of these three variables.

Content
• Definitions of the terms
• Analysis of the given data
• Conclusion

Definitions of the terms:


Measures of location:
• Mean – The mean is the arithmetic average of a variable: the sum of the observations divided by their number. It provides a measure of central location for the data.
• Std. Error of Mean – The standard error of the mean, or simply standard error, indicates how different the population mean is likely to be from a sample mean. It tells you how much the sample mean would vary if you were to repeat a study using new samples from within a single population. It is estimated as the sample standard deviation divided by the square root of the sample size, s / √n.
• Mode – The mode is the value that occurs with the greatest frequency.
• Median – The median is the value in the middle when the data are arranged in ascending order (smallest value to largest value). With an odd number of observations, the median is the middle value. An even number of observations has no single middle value, so the median is the average of the two middle values.
• Percentiles – A percentile provides information about how the data are spread over the interval from the smallest value to the largest value. For a data set containing n observations, the pth percentile divides the data into two parts: approximately p% of the observations are less than the pth percentile, and approximately (100 − p)% of the observations are greater than the pth percentile.
• Quartiles – It is often desirable to divide a data set into four parts, each containing approximately one-fourth, or 25%, of the observations. These division points are referred to as the quartiles and are defined as follows:
Q1 = first quartile, or 25th percentile
Q2 = second quartile, or 50th percentile (also the median)
Q3 = third quartile, or 75th percentile
Because quartiles are specific percentiles, the procedure for computing percentiles can be used to compute the quartiles.
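The measures of location above can be sketched with Python's standard-library `statistics` module. This is a minimal illustration on a small hypothetical sample, not the project data. Using `method="exclusive"` for the quartiles is an assumption: it applies the (n + 1)p position rule that SPSS's default percentile definition is generally understood to follow, and other software may return slightly different quartiles.

```python
import math
import statistics

def describe_location(values):
    """Measures of location for a sample, following the definitions above."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    # Quartiles via the (n + 1)p position rule (assumed to match SPSS's default)
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "n": n,
        "mean": statistics.mean(values),
        "se_mean": sd / math.sqrt(n),           # standard error of the mean, s / sqrt(n)
        "median": statistics.median(values),
        "modes": statistics.multimode(values),  # every value tied for highest frequency
        "quartiles": (q1, q2, q3),
    }

# Hypothetical sample for illustration only (not the project data)
sample = [4.0, 7.0, 7.0, 9.0, 10.0, 12.0, 15.0, 30.0]
for name, value in describe_location(sample).items():
    print(name, value)
```

With eight values there is no single middle observation, so the median is (9 + 10) / 2 = 9.5, which equals Q2. `statistics.multimode` lists every value tied for the highest frequency; for a multimodal variable it would return all modes, whereas SPSS shows only the smallest one.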

Measures of variability:
• Range – The range is the difference between the largest value and the smallest value. It is determined by only the two extreme data values.
• Interquartile Range – The interquartile range (IQR) is a measure of variability that overcomes the dependence on extreme values. It is the difference between the third quartile, Q3, and the first quartile, Q1. In other words, the interquartile range is the range of the middle 50% of the data.
• Variance – The variance is a measure of dispersion around the mean, equal to the sum of squared deviations from the mean divided by one less than the number of cases. It is measured in units that are the square of those of the variable itself.
• Standard Deviation – The standard deviation is the positive square root of the variance. Unlike the variance, it is expressed in the same units as the variable.
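The measures of variability can be sketched the same way. This is a minimal illustration on the same hypothetical sample (not the project data); the quartile method is again the assumed (n + 1)p rule.

```python
import statistics

def describe_spread(values):
    """Measures of variability: range, IQR, sample variance and standard deviation."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return {
        "range": max(values) - min(values),       # depends only on the two extremes
        "iqr": q3 - q1,                           # spread of the middle 50% of the data
        "variance": statistics.variance(values),  # sum of squared deviations / (n - 1)
        "std_dev": statistics.stdev(values),      # positive square root of the variance
    }

# Hypothetical sample for illustration only (not the project data)
sample = [4.0, 7.0, 7.0, 9.0, 10.0, 12.0, 15.0, 30.0]
print(describe_spread(sample))
```

The single extreme value (30) stretches the range to 26, while the IQR stays at 7.25. The analysis below draws the same contrast for x1, where the range is 24.87 but the IQR is only 3.63.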

Measures of Distribution Shape, Relative Location, and Outliers:


• Skewness – Skewness is a measure of the asymmetry of a distribution. A distribution is asymmetrical when its left and right sides are not mirror images. A distribution can have right (positive), left (negative), or zero skewness.
• Std. Error of Skewness – This is the standard error of the skewness statistic. It depends only on the sample size and shows how much the sample skewness would vary from sample to sample. A skewness between −1 and +1 is commonly regarded as the normal range. The ratio of skewness to its standard error can be used as a test of normality: you can reject normality if the ratio is less than −2 or greater than +2. A large positive value for skewness indicates a long right tail; an extreme negative value indicates a long left tail.
• Histogram – A histogram is a graphical representation of data points organized into user-specified ranges. Similar in appearance to a bar graph, the histogram condenses a data series into an easily interpreted visual by grouping many data points into logical ranges, or bins.
• Minimum – The smallest value of a numeric variable.
• Maximum – The largest value of a numeric variable.
• Five-number summary – A five-number summary is especially useful in descriptive analyses or during the preliminary investigation of a large data set. It consists of five values: the most extreme values in the data set (the minimum and maximum), the lower and upper quartiles, and the median. These values are presented together and ordered from lowest to highest: minimum value, lower quartile (Q1), median (Q2), upper quartile (Q3), maximum value.

• Box Plots – A box plot is a type of chart often used in exploratory data analysis. Box plots show the distribution of numerical data and its skewness by displaying the data quartiles (or percentiles). They show the five-number summary of a set of data: the minimum, first (lower) quartile, median, third (upper) quartile, and maximum. In most software, including SPSS, the whiskers extend only to the most extreme values lying within 1.5 × IQR of the box, and values beyond that are plotted individually as outliers.
• Outliers – An outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In a sense, this definition leaves it up to the analyst (or a consensus process) to decide what will be considered abnormal. Before abnormal observations can be singled out, it is necessary to characterize normal observations.
• z-Scores – A z-score measures how many standard deviations an observation lies from the mean, z = (x − x̄) / s. It shows whether a value is typical for a given data set or atypical. Because z-scores are unit-free, they also make it possible to compare values from different data sets more accurately.
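The skewness statistic, its standard error, and z-scores can be computed directly from their formulas. The sketch below assumes the adjusted Fisher–Pearson form of skewness, n / ((n − 1)(n − 2)) · Σ((x − x̄)/s)³, which is the version SPSS and Excel's SKEW report. The sample is hypothetical, and the cutoff of 2 standard deviations for z-scores is only a common screening choice (3 is also used).

```python
import math
import statistics

def skewness(values):
    """Adjusted Fisher-Pearson skewness (the form SPSS reports)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

def se_skewness(n):
    """Standard error of skewness; depends only on the sample size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def z_scores(values):
    """Number of sample standard deviations each value lies from the mean."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]

# Hypothetical sample for illustration only (not the project data)
sample = [4.0, 7.0, 7.0, 9.0, 10.0, 12.0, 15.0, 30.0]
g1 = skewness(sample)
ratio = g1 / se_skewness(len(sample))  # normality is rejected if |ratio| > 2
print(f"skewness = {g1:.3f}, SE = {se_skewness(len(sample)):.3f}, ratio = {ratio:.2f}")
print("|z| > 2:", [x for x, z in zip(sample, z_scores(sample)) if abs(z) > 2])
```

For this sample the single large value 30 produces a positive skewness and is the only observation with |z| > 2. For n = 59, `se_skewness` gives about 0.311, the value SPSS reports for all three variables.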

Analytics:
Statistics
                                x1        x2        x3
N Valid                         59        59        59
N Missing                        0         0         0
Mean                       11.3441    9.1453   12.9002
Std. Error of Mean          .53791    .61191    .86699
Median                     10.2800    8.8800   11.2300
Mode                          9.70     2.70a    11.23a
Std. Deviation             4.13173   4.70016   6.65945
Variance                    17.071    22.091    44.348
Skewness                     2.151      .392     1.065
Std. Error of Skewness        .311      .311      .311
Range                        24.87     20.17     31.90
Minimum                       4.24      1.35      3.28
Maximum                      29.11     21.52     35.18
Sum                         669.30    539.57    761.11
Percentile 25               9.0900    5.1300    8.1300
Percentile 50              10.2800    8.8800   11.2300
Percentile 75              12.7200   12.7200   17.0000
a. Multiple modes exist. The smallest value is shown.
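Several entries in the table are linked by the definitions given earlier, so they can be recomputed from one another as a consistency check. The following is a minimal sketch using values copied from the table. The dictionary layout is only for illustration, and the standard-error-of-skewness formula is the one assumed in the skewness sketch above.

```python
import math

N = 59
# Values copied from the SPSS "Statistics" table
table = {
    "x1": {"sum": 669.30, "sd": 4.13173, "min": 4.24, "max": 29.11,
           "q1": 9.09, "q3": 12.72, "skew": 2.151},
    "x2": {"sum": 539.57, "sd": 4.70016, "min": 1.35, "max": 21.52,
           "q1": 5.13, "q3": 12.72, "skew": 0.392},
    "x3": {"sum": 761.11, "sd": 6.65945, "min": 3.28, "max": 35.18,
           "q1": 8.13, "q3": 17.00, "skew": 1.065},
}

# Standard error of skewness depends only on N
se_skew = math.sqrt(6 * N * (N - 1) / ((N - 2) * (N + 1) * (N + 3)))

derived = {}
for var, t in table.items():
    derived[var] = {
        "mean": t["sum"] / N,               # Mean = Sum / N
        "se_mean": t["sd"] / math.sqrt(N),  # SE of mean = SD / sqrt(N)
        "variance": t["sd"] ** 2,           # Variance = SD squared
        "range": t["max"] - t["min"],       # Range = Maximum - Minimum
        "iqr": t["q3"] - t["q1"],           # IQR = Q3 - Q1
        "skew_ratio": t["skew"] / se_skew,  # compare with -2 and +2
    }
    print(var, {k: round(v, 4) for k, v in derived[var].items()})
print("SE of skewness:", round(se_skew, 3))
```

The recomputed means, standard errors, variances, and ranges agree with the SPSS entries to the displayed precision; the last digit can differ slightly because SPSS works with unrounded values. The skewness ratios come out at about 6.9 for x1, 1.3 for x2, and 3.4 for x3. By the ±2 rule in the definitions, only x2 is consistent with a symmetric (normal) distribution.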

There are 59 data points for each variable (x1, x2, x3), and none are missing.
x1:

• The mean (11.34) and median (10.28) are not far apart. Because the mean is larger than the median, the distribution is skewed to the right.
• The mode is 9.70; it occurs twice in the data.
• The skewness is 2.151, so the distribution is highly skewed to the right. This means that some values lie far to the right of (above) the majority of the data, as the histogram below shows. It also suggests that the data set may contain outliers.
• Percentiles – Q1, Q2 (median) and Q3 are close to each other: the middle 50% of the data lies between 9.09 (Q1) and 12.72 (Q3).
• The range (maximum − minimum) is 24.87, which is large compared with the interquartile range Q3 − Q1 (3.63). This means that a few values lie far out in the tails, well away from the middle of the data.
Histogram – x1: a visual representation of the points discussed above.

x2:

• The mean (9.15) and median (8.88) are very close. The mean is slightly larger than the median, so the distribution is slightly skewed to the right.
• There are multiple modes (SPSS reports only the smallest, 2.70), so the mode is not a helpful measure for this analysis.
• The skewness is 0.392, so the distribution is almost symmetric, only slightly skewed to the right.
• Percentiles – Q1, Q2 (median) and Q3 are moderately close to each other: the middle 50% of the data lies between 5.13 (Q1) and 12.72 (Q3).
• The range (maximum − minimum) is 20.17, compared with an interquartile range Q3 − Q1 of 7.59, so the lowest and highest quarters of the data together span a wider interval than the middle 50%.

Histogram – x2: a visual representation of the points discussed above.

x3:

• The mean (12.90) is noticeably larger than the median (11.23), so the distribution is skewed to the right.
• There are multiple modes (SPSS reports only the smallest, 11.23), so the mode is not a helpful measure for this analysis.
• The skewness is 1.065, so the distribution is moderately skewed to the right.
• Percentiles – Q1, Q2 (median) and Q3 are moderately close to each other: the middle 50% of the data lies between 8.13 (Q1) and 17.00 (Q3).
• The range (maximum − minimum) is 31.90, which is large compared with the interquartile range Q3 − Q1 (8.87). This means that a few values lie far out in the tails, well away from the middle of the data.

Histogram – x3: a visual representation of the points discussed above.

Box plot
x1 – The box plot shows 2 outliers (labelled 26 and 29 in the plot). The mean is larger than the median, which indicates positive skewness. The box is comparatively short (IQR = 3.63), showing low variability in the middle 50% of the data, although the upper half of the box (Q3 − median = 2.44) is about twice as long as the lower half (median − Q1 = 1.19). If the outliers were excluded, x1 would be close to normally distributed, but excluding outliers requires further investigation.

Q-Q plot x1 – As the box plot showed, there are 2 outliers. Apart from them, the points follow the straight reference line closely, which suggests that the remaining data are approximately normally distributed.

x2 – The box plot shows no outliers. The mean is slightly larger than the median. The box is comparatively large, showing higher variability. The distance from Q1 to the median (3.75) is slightly smaller than the distance from the median to Q3 (3.84).

Q-Q plot x2 – Consistent with the box plot, there are no outliers, and the points follow the straight reference line, which suggests that x2 is approximately normally distributed.

x3 – The box plot shows 1 outlier (labelled 36 in the plot). The mean is larger than the median, indicating positive (moderate) skewness. The box is comparatively large, showing higher variability. The distance from Q1 to the median (3.10) is smaller than the distance from the median to Q3 (5.77).

Q-Q plot x3 – As the box plot showed, there is one outlier. Apart from it, the points lie close to the straight reference line, which suggests that the bulk of the data is approximately normal.
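The outliers read off the box plots can be cross-checked against the 1.5 × IQR rule using the quartiles from the Statistics table. This is only an approximation. SPSS draws its boxes from Tukey's hinges, which can differ slightly from the percentiles in the table. Comparing the fences with the minimum and maximum also shows only whether any outlier exists, not how many. The helper name `tukey_fences` is hypothetical.

```python
# Quartiles (25th and 75th percentiles) and extremes copied from the Statistics table
summary = {
    "x1": {"q1": 9.09, "q3": 12.72, "min": 4.24, "max": 29.11},
    "x2": {"q1": 5.13, "q3": 12.72, "min": 1.35, "max": 21.52},
    "x3": {"q1": 8.13, "q3": 17.00, "min": 3.28, "max": 35.18},
}

def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences; values outside them are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

flags = {}
for var, s in summary.items():
    lower, upper = tukey_fences(s["q1"], s["q3"])
    flags[var] = {"low": s["min"] < lower, "high": s["max"] > upper}
    print(f"{var}: fences [{lower:.2f}, {upper:.2f}] -> "
          f"outlier below: {flags[var]['low']}, above: {flags[var]['high']}")
```

With these quartiles, the maximum exceeds the upper fence for x1 (29.11 against a fence of about 18.2) and x3 (35.18 against about 30.3), but not for x2 (21.52 against about 24.1). No minimum falls below its lower fence. This agrees with the outliers reported from the box plots.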

Conclusion
We analyzed three variables: x1 – unemployment level, x2 – inflation level, and x3 – net savings level.

• x1 – The distribution is highly skewed to the right (skewness 2.15) and has 2 outliers. Further analysis is still needed to decide whether to exclude these outliers.
• x2 – There are no outliers. The distribution is only slightly skewed to the right, with a skewness of 0.392, close to 0, so it can be treated as approximately symmetric.
• x3 – The distribution is moderately skewed to the right (skewness 1.065) and has one outlier, so this variable may need further analysis.
