Business Statistics Assignment

This document discusses wine quality data analysis. It provides the five number summary and box plot for fixed acidity data, showing it is skewed right. Correlation coefficients are calculated between fixed acidity and volatile acidity (-0.33, moderately negative correlation) and residual sugar and chlorides (-0.08, virtually no correlation). Based on the correlation scores, a change in fixed acidity would not majorly affect volatile acidity or vice versa. Skewness and kurtosis measures are described to analyze data symmetry. Mean is the average, median is the middle value, and mode is the most frequent value in a data set.

Uploaded by

kshitij

ANSWER 1:

(I) Five Number Summary & Box Plot


As per the given data of Fixed Acidity of wine, the five number summary is as follows:
a. Minimum = 5.6
b. 1st Quartile = 7.4
c. Median = 7.8
d. 3rd Quartile = 8.05
e. Maximum = 11.2

In the following image, all values of Fixed Acidity have been plotted in increasing order:

The Box and Whisker Chart below follows from the five number summary.
Interquartile range = Q3 - Q1 = 8.05 - 7.4
∴ IQR = 0.65

The Hinges of the box are located at the lower and upper quartiles, i.e. at 7.4 and 8.05.
Box-and-whisker plots are used to determine whether a distribution is skewed. The
location of the median in the box conveys information about the skewness of the middle
50% of the data. For the given Fixed Acidity data, the median is at 7.8 with the two
hinges at 7.4 and 8.05, and the upper whisker (8.05 to 11.2) is much longer than the
lower whisker (5.6 to 7.4).

Hence, the data for Fixed Acidity is skewed towards the right, i.e. beyond the 3rd Quartile.
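The five number summary and IQR above can be reproduced with a short script. This is a minimal sketch, assuming the quartiles are computed by linear interpolation between sorted observations (the "inclusive" method, which matches the quoted Q3 of 8.05); the function name `five_number_summary` is illustrative and not part of the assignment.

```python
import statistics

# Fixed Acidity values as listed in the assignment's data table (26 samples).
fixed_acidity = [
    7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5, 6.7, 7.5, 5.6,
    7.8, 8.9, 8.9, 8.5, 8.1, 7.4, 7.9, 8.9, 7.6, 7.9, 8.5, 6.9, 6.3,
]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # Assumption: "inclusive" quartiles, i.e. linear interpolation between
    # the sorted observations with the minimum and maximum as end points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


minimum, q1, median, q3, maximum = five_number_summary(fixed_acidity)
iqr = q3 - q1

print("Minimum:", minimum)
print("1st Quartile:", round(q1, 2))
print("Median:", round(median, 2))
print("3rd Quartile:", round(q3, 2))
print("Maximum:", maximum)
print("IQR:", round(iqr, 2))
# Distances from the ends of the box to the extremes (the two whiskers).
print("Minimum to Q1:", round(q1 - minimum, 2))
print("Q3 to maximum:", round(maximum - q3, 2))
```

A different quartile convention (for example the "exclusive" method) can give slightly different hinges for the same data, so the method should be stated alongside the summary.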
(II) Correlation Coefficient

Correlation is a measure of the degree of relatedness of variables. Several measures of
correlation are available, the selection of which depends mostly on the level of data being
analyzed. Ideally, researchers would like to solve for ρ, the population coefficient of
correlation; with sample data it is estimated by the sample coefficient of correlation, r.

The statistic r is the Pearson Product-Moment correlation coefficient, named after Karl
Pearson (1857-1936), an English statistician who developed several coefficients of
correlation along with other significant statistical concepts.

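The Pearson Product-Moment Correlation Coefficient is calculated using:

$$ r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}} $$

where n is the number of (x, y) pairs.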

(A) Correlation Coefficient for Fixed Acidity & Volatile Acidity

Sr. No.  Fixed Acidity (x)  Volatile Acidity (y)  x²       y²        xy
1        7.4                0.7                   54.76    0.49      5.18
2        7.8                0.88                  60.84    0.7744    6.864
3        7.8                0.76                  60.84    0.5776    5.928
4        11.2               0.28                  125.44   0.0784    3.136
5        7.4                0.7                   54.76    0.49      5.18
6        7.4                0.66                  54.76    0.4356    4.884
7        7.9                0.6                   62.41    0.36      4.74
8        7.3                0.65                  53.29    0.4225    4.745
9        7.8                0.58                  60.84    0.3364    4.524
10       7.5                0.5                   56.25    0.25      3.75
11       6.7                0.58                  44.89    0.3364    3.886
12       7.5                0.5                   56.25    0.25      3.75
13       5.6                0.615                 31.36    0.378225  3.444
14       7.8                0.61                  60.84    0.3721    4.758
15       8.9                0.62                  79.21    0.3844    5.518
16       8.9                0.62                  79.21    0.3844    5.518
17       8.5                0.28                  72.25    0.0784    2.38
18       8.1                0.56                  65.61    0.3136    4.536
19       7.4                0.59                  54.76    0.3481    4.366
20       7.9                0.32                  62.41    0.1024    2.528
21       8.9                0.22                  79.21    0.0484    1.958
22       7.6                0.39                  57.76    0.1521    2.964
23       7.9                0.43                  62.41    0.1849    3.397
24       8.5                0.49                  72.25    0.2401    4.165
25       6.9                0.4                   47.61    0.16      2.76
26       6.3                0.39                  39.69    0.1521    2.457
Sum (Σ)  202.9              13.925                1609.91  8.100525  107.316

Using the Pearson Product-Moment formula, the Correlation Coefficient for Fixed Acidity &
Volatile Acidity is calculated as follows:

$$ r = \frac{107.316 - \frac{202.9 \times 13.925}{26}}{\sqrt{\left[1609.91 - \frac{(202.9)^2}{26}\right]\left[8.100525 - \frac{(13.925)^2}{26}\right]}} $$

∴ r = -0.3277013278

Since the correlation coefficient of Fixed Acidity and Volatile Acidity lies between -1 and 0,
at about -0.33, they have a Moderately Negative Correlation.
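The calculation above can be checked from the raw columns. The following is a minimal sketch of the computational (column-sum) form of the Pearson formula, using the 26 pairs from the table; `pearson_r` is an illustrative helper name, not something defined in the assignment.

```python
import math

# Fixed Acidity (x) and Volatile Acidity (y), in the row order of the table.
fixed_acidity = [
    7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5, 6.7, 7.5, 5.6,
    7.8, 8.9, 8.9, 8.5, 8.1, 7.4, 7.9, 8.9, 7.6, 7.9, 8.5, 6.9, 6.3,
]
volatile_acidity = [
    0.7, 0.88, 0.76, 0.28, 0.7, 0.66, 0.6, 0.65, 0.58, 0.5, 0.58, 0.5, 0.615,
    0.61, 0.62, 0.62, 0.28, 0.56, 0.59, 0.32, 0.22, 0.39, 0.43, 0.49, 0.4, 0.39,
]


def pearson_r(xs, ys):
    """Pearson r from the column sums: Σx, Σy, Σx², Σy² and Σxy."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = sum_xy - (sum_x * sum_y) / n
    denominator = math.sqrt(
        (sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n)
    )
    return numerator / denominator


r = pearson_r(fixed_acidity, volatile_acidity)
print("Σx  =", round(sum(fixed_acidity), 3))     # 202.9
print("Σy  =", round(sum(volatile_acidity), 3))  # 13.925
print("r   =", round(r, 4))                      # -0.3277
```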

(B) Correlation Coefficient for Residual Sugar & Chlorides


Sr. No.  Residual Sugar (x)  Chlorides (y)  x²      y²        xy
1        1.9                 0.076          3.61    0.005776  0.1444
2        2.6                 0.098          6.76    0.009604  0.2548
3        2.3                 0.092          5.29    0.008464  0.2116
4        1.9                 0.075          3.61    0.005625  0.1425
5        1.9                 0.076          3.61    0.005776  0.1444
6        1.8                 0.075          3.24    0.005625  0.135
7        1.6                 0.069          2.56    0.004761  0.1104
8        1.2                 0.065          1.44    0.004225  0.078
9        2                   0.073          4       0.005329  0.146
10       6.1                 0.071          37.21   0.005041  0.4331
11       1.8                 0.097          3.24    0.009409  0.1746
12       6.1                 0.071          37.21   0.005041  0.4331
13       1.6                 0.089          2.56    0.007921  0.1424
14       1.6                 0.114          2.56    0.012996  0.1824
15       3.8                 0.176          14.44   0.030976  0.6688
16       3.9                 0.17           15.21   0.0289    0.663
17       1.8                 0.092          3.24    0.008464  0.1656
18       1.7                 0.368          2.89    0.135424  0.6256
19       4.4                 0.086          19.36   0.007396  0.3784
20       1.8                 0.341          3.24    0.116281  0.6138
21       1.8                 0.077          3.24    0.005929  0.1386
22       2.3                 0.082          5.29    0.006724  0.1886
23       1.6                 0.106          2.56    0.011236  0.1696
24       2.3                 0.084          5.29    0.007056  0.1932
25       2.4                 0.085          5.76    0.007225  0.204
26       1.4                 0.08           1.96    0.0064    0.112
Sum (Σ)  63.6                2.888          199.38  0.467604  6.8539

Using the Pearson Product-Moment formula, the Correlation Coefficient for Residual Sugar &
Chlorides is calculated as follows:

$$ r = \frac{6.8539 - \frac{63.6 \times 2.888}{26}}{\sqrt{\left[199.38 - \frac{(63.6)^2}{26}\right]\left[0.467604 - \frac{(2.888)^2}{26}\right]}} $$

∴ r = -0.08304224084

Since the correlation coefficient of Residual Sugar and Chlorides is very close to 0
(about -0.08), they have Virtually No Correlation.
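As a cross-check on this second coefficient, the same quantity can be computed from deviations about the means, which is algebraically equivalent to the column-sum formula. This is a sketch only; the helper name `pearson_r_from_deviations` is an assumption introduced for illustration.

```python
import math

# Residual Sugar (x) and Chlorides (y), in the row order of the table.
residual_sugar = [
    1.9, 2.6, 2.3, 1.9, 1.9, 1.8, 1.6, 1.2, 2, 6.1, 1.8, 6.1, 1.6,
    1.6, 3.8, 3.9, 1.8, 1.7, 4.4, 1.8, 1.8, 2.3, 1.6, 2.3, 2.4, 1.4,
]
chlorides = [
    0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0.069, 0.065, 0.073, 0.071,
    0.097, 0.071, 0.089, 0.114, 0.176, 0.17, 0.092, 0.368, 0.086, 0.341,
    0.077, 0.082, 0.106, 0.084, 0.085, 0.08,
]


def pearson_r_from_deviations(xs, ys):
    """Pearson r as Σ(x - x̄)(y - ȳ) / sqrt(Σ(x - x̄)² · Σ(y - ȳ)²)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r_from_deviations(residual_sugar, chlorides)
print("r =", round(r, 4))  # -0.083
```

Working from deviations avoids subtracting two large, nearly equal sums, which can matter numerically for larger datasets.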

(III) Cause-Effect relationship between Fixed Acidity and Volatile Acidity, based on the
Correlation Coefficient Score

The correlation coefficient score between Fixed Acidity and Volatile Acidity is -0.33.

This indicates that there is a Moderately Negative Correlation between Fixed Acidity and
Volatile Acidity: as one increases, the other tends to decrease, but only loosely.
Hence, a change in the fixed acidity of a wine sample does not have a major effect on the
volatile acidity of the wine, and vice versa.
ANSWER 2:
(I) Skewness & Kurtosis

Skewness is a measure of the symmetry of a distribution. A symmetrical dataset has a
skewness equal to zero, so a normal distribution has a skewness of zero. Skewness
essentially measures the relative size of the two tails.

The rule of thumb seems to be:

If the skewness is between -0.5 and 0.5, the data are fairly symmetrical
If the skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed
If the skewness is less than -1 or greater than 1, the data are highly skewed
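The rule of thumb above can be expressed as a small classifier. This is a minimal sketch, assuming the moment-based skewness m3 / m2^1.5 with population moments; spreadsheet functions usually apply a small-sample adjustment, so their values can differ slightly. The function names are illustrative.

```python
def skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def describe_skewness(skew):
    """Apply the rule of thumb to a skewness value."""
    size = abs(skew)
    if size <= 0.5:
        return "fairly symmetrical"
    if size <= 1:
        return "moderately skewed"
    return "highly skewed"


symmetric = [1, 2, 3, 4, 5]   # mirror image about its mean
right_tailed = [1, 1, 1, 10]  # one large value stretches the right tail

for data in (symmetric, right_tailed):
    s = skewness(data)
    print(data, round(s, 3), describe_skewness(s))
```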

Kurtosis is a measure of the combined sizes of the two tails. It measures the amount of
probability in the tails. The value is often compared to the kurtosis of the normal
distribution, which is equal to 3. If the kurtosis is greater than 3, the dataset has heavier
tails than a normal distribution (more in the tails). If the kurtosis is less than 3, the
dataset has lighter tails than a normal distribution (less in the tails).
Kurtosis is sometimes reported as "excess kurtosis". Excess kurtosis is determined by
subtracting 3 from the kurtosis. This makes the excess kurtosis of the normal distribution equal to 0.

The kurtosis parameter is a measure of the combined weight of the tails relative to the
rest of the distribution. Kurtosis is all about the tails of the distribution, not the
peakedness or flatness. It measures the tail-heaviness of the distribution.
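The relation between kurtosis and excess kurtosis can be illustrated in a few lines. This sketch assumes the moment-based definition m4 / m2^2 with population moments (no small-sample correction); the function names are illustrative.

```python
def kurtosis(values):
    """Moment-based kurtosis: fourth central moment / (second central moment)^2."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Kurtosis minus 3, so that a normal distribution scores 0."""
    return kurtosis(values) - 3


evenly_spread = [1, 2, 3, 4, 5]       # light tails
one_far_value = [0, 0, 0, 0, 0, 10]   # a single value far from the rest

for data in (evenly_spread, one_far_value):
    print(data, round(kurtosis(data), 3), round(excess_kurtosis(data), 3))
```

For the evenly spread list the kurtosis is 1.7 (excess -1.3), i.e. lighter tails than a normal distribution; the list with one far value gives a kurtosis above 3.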

High kurtosis together with high skewness denotes that the data are heavy-tailed and asymmetrically distributed.

(II) Mean, Median & Mode

Mean - The mean (or average) is the most popular and well-known measure of central
tendency. It can be used with both discrete and continuous data, although it is
most often used with continuous data.

Median - The median is the middle score for a set of data that has been arranged in order of
magnitude. The median is less affected by outliers and skewed data than the mean.

Mode - The mode is the most frequent score in our data set. It corresponds to
the highest bar in a bar chart or histogram. One can, therefore, sometimes consider the mode
as being the most popular option.

From the given data, we can interpret that the quantity of free sulphur dioxide is
substantially less than the total sulphur dioxide in each individual sample. Due to this, the
mean, median and mode of free sulphur dioxide differ from those of total sulphur
dioxide.
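The three measures defined above can be computed with Python's standard `statistics` module. The sulphur dioxide columns are not reproduced in this document, so this sketch uses the Fixed Acidity column from Answer 1 purely as an illustration.

```python
import statistics

# Fixed Acidity values from the table in Answer 1 (illustrative stand-in;
# the sulphur dioxide columns are not listed in this document).
fixed_acidity = [
    7.4, 7.8, 7.8, 11.2, 7.4, 7.4, 7.9, 7.3, 7.8, 7.5, 6.7, 7.5, 5.6,
    7.8, 8.9, 8.9, 8.5, 8.1, 7.4, 7.9, 8.9, 7.6, 7.9, 8.5, 6.9, 6.3,
]

mean = statistics.mean(fixed_acidity)      # sum of values / number of values
median = statistics.median(fixed_acidity)  # middle of the sorted values
modes = statistics.multimode(fixed_acidity)  # every most-frequent value

print("Mean:  ", round(mean, 4))
print("Median:", median)
print("Modes: ", modes)
```

`multimode` is used instead of `mode` because this column has two values tied for most frequent (7.4 and 7.8, four times each), which a single mode would hide.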

(III) Distribution

In order to study the distribution pattern, we need to take a careful look at the standard
deviations of the given data.

The standard deviation is a statistic that measures the dispersion of a dataset relative to
its mean. It is calculated as the square root of the variance, which is obtained by
determining the deviation of every data point from the mean.

To identify the distribution pattern, we need to observe the Z-score. A Z-score is a
numerical measure used in statistics of a value's relationship to the mean (average) of a group
of values, measured in terms of standard deviations from the mean.

If a Z-score is zero, it indicates that the data point's score is identical to the mean score. A
Z-score of 1.0 would indicate a value that is one standard deviation from the mean.

Z-scores may be positive or negative, with a positive value indicating the score is higher than
the mean and a negative value indicating it is below the mean.
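The standard deviation and Z-score definitions above can be sketched as follows, using the 26 Chlorides values from Answer 1. The choice of the sample standard deviation (n - 1 in the denominator) is an assumption; the assignment does not say which form was used, and `z_score` is an illustrative helper name.

```python
import statistics

# Chlorides values from the Residual Sugar & Chlorides table in Answer 1.
chlorides = [
    0.076, 0.098, 0.092, 0.075, 0.076, 0.075, 0.069, 0.065, 0.073, 0.071,
    0.097, 0.071, 0.089, 0.114, 0.176, 0.17, 0.092, 0.368, 0.086, 0.341,
    0.077, 0.082, 0.106, 0.084, 0.085, 0.08,
]


def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


mean = statistics.mean(chlorides)
# Assumption: sample standard deviation, i.e. sqrt(Σ(x - mean)² / (n - 1)).
std_dev = statistics.stdev(chlorides)

z_scores = [z_score(v, mean, std_dev) for v in chlorides]
for value, z in zip(chlorides, z_scores):
    print(f"{value:.3f} -> z = {z:+.2f}")
```

Values above the mean print with a plus sign and values below it with a minus sign, matching the interpretation given in the text.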

From the given data, we can see that the difference between the mean and the Standard Deviation
is the least for Chlorides (0.03) and Volatile Acidity (0.18). This suggests that these two are the
most likely ones to have a normal data distribution.

Therefore we can conclude that the two variables that come closest to being normally
distributed are Chlorides and Volatile Acidity.
ANSWER 3:

(A) The ruling party will win 0 rounds, 1 round, 2 rounds, 3 rounds or all 4 rounds
of voting

Probability is a branch of mathematics that deals with calculating the likelihood of a given
event's occurrence, which is expressed as a number between 0 and 1. An event with a
probability of 1 can be considered a certainty.
If p is the chance that the ruling party has of winning a round, then q = 1 - p is the chance
that the ruling party has of losing it.

∴ If p = 0.6, q = 1 - 0.6 = 0.4

$P_n(k) = C_n(k) \times p^k \times q^{n-k}$

The probability that the ruling party will win 0 rounds can be calculated by:
$P_4(0) = q^4 = 0.4^4 = 0.0256$

Similarly, the probability that the ruling party will win 1 round:
$P_4(1) = \frac{4!}{1! \times 3!} \times (0.6^1 \times 0.4^3) = 0.1536$

The probability that the ruling party will win 2 rounds is:
$P_4(2) = \frac{4!}{2! \times 2!} \times (0.6^2 \times 0.4^2) = 0.3456$

The probability that the ruling party will win 3 rounds is:
$P_4(3) = \frac{4!}{3! \times 1!} \times (0.6^3 \times 0.4^1) = 0.3456$

The probability that the ruling party will win all 4 rounds is:
$P_4(4) = 0.6^4 = 0.1296$

Hence, the probabilities are as follows:

Ruling Party Wins 0 Times    0.0256
Ruling Party Wins 1 Time     0.1536
Ruling Party Wins 2 Times    0.3456
Ruling Party Wins 3 Times    0.3456
Ruling Party Wins 4 Times    0.1296
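The five probabilities above follow directly from the binomial formula and can be reproduced as below. This is a minimal sketch; `binomial_pmf` is an illustrative name, and the rounds are assumed to be independent with the same win probability p = 0.6 in each.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k wins in n independent rounds), each won with probability p."""
    q = 1 - p
    return comb(n, k) * p ** k * q ** (n - k)


n_rounds = 4
p_win = 0.6

distribution = {k: binomial_pmf(k, n_rounds, p_win) for k in range(n_rounds + 1)}

for k, prob in distribution.items():
    print(f"Ruling party wins {k} round(s): {prob:.4f}")

# The five outcomes are exhaustive, so their probabilities add up to 1.
print("Total:", round(sum(distribution.values()), 4))
```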


(B) The probability that the ruling party will win at least 1 round

The probability that the ruling party will win at least 1 round is the complement of
winning no round at all, and can be calculated using the following formula:

$P(1, 2, 3, 4) = 1 - q^4$
$P(1, 2, 3, 4) = 1 - 0.0256$
$P(1, 2, 3, 4) = 0.9744$

Hence, the probability of the ruling party winning at least once is 0.9744
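The complement rule used here can be checked against a direct sum over 1, 2, 3 and 4 wins. This is a short sketch with illustrative function names, again assuming independent rounds with p = 0.6.

```python
from math import comb


def prob_at_least_one(n, p):
    """P(at least one win in n independent rounds) via the complement rule 1 - q^n."""
    return 1 - (1 - p) ** n


def prob_at_least_one_by_sum(n, p):
    """Same probability, obtained by adding P(exactly k wins) for k = 1..n."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(1, n + 1))


print(round(prob_at_least_one(4, 0.6), 4))         # 0.9744
print(round(prob_at_least_one_by_sum(4, 0.6), 4))  # 0.9744
```

The complement form needs a single power, whereas the direct sum needs one term per possible number of wins, which is why the complement is the usual route for "at least one" questions.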
