Business Statistics Assignment
In the following image, all values of Fixed Acidity have been plotted in increasing order:
The Box and Whiskers Chart below is drawn from the five-number summary.
Interquartile range (IQR) = Q3 - Q1 = 8.05 - 7.4
∴ IQR = 0.65
The Hinges of the box are located at the lower and upper quartile i.e. at 7.4 and 8.05.
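Box-and-whisker plots are used to determine whether a distribution is skewed. The
location of the median in the box conveys information about the skewness of the middle
50% of the data. For the given data for Fixed Acidity, the median is at 7.8 with the two
hinges being 7.4 and 8.05, so the median is 0.40 above Q1 but only 0.25 below Q3.
Hence, the median sits towards the right of the box, i.e. nearer the 3rd Quartile. The longer
part of the box lies below the median, which indicates that the middle 50% of the Fixed
Acidity data is negatively (left) skewed.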
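The position of the median inside the box can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the three quartile values quoted above; the variable names are illustrative and are not part of the original working.

```python
# Quartile values quoted in the text for Fixed Acidity.
q1, median, q3 = 7.4, 7.8, 8.05

iqr = q3 - q1            # width of the box
below = median - q1      # distance from the lower hinge to the median
above = q3 - median      # distance from the median to the upper hinge

# Decide which hinge the median is nearer to.
if abs(below - above) < 1e-9:
    nearer = "neither hinge"
elif above < below:
    nearer = "Q3"
else:
    nearer = "Q1"

print(f"IQR = {iqr:.2f}")
print(f"median - Q1 = {below:.2f}, Q3 - median = {above:.2f}")
print(f"The median lies nearer to {nearer}")
```

A median that is not centred between the hinges is the signal that the middle 50% of the data is not symmetric.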
(II) Correlation Coefficient
The statistic ‘r’ is the Pearson Product-Moment correlation coefficient, named after Karl
Pearson (1857-1936), an English Statistician who developed several coefficients of
correlation along with other significant statistical concepts.
The Pearson Product-Moment Correlation Coefficient for Fixed Acidity & Volatile
Acidity is calculated as follows:
∴ R = - 0.3277013278
Since the correlation coefficient of Fixed Acidity and Volatile Acidity is about -0.33, which
lies between -1 and 0, they have a Moderately Negative Correlation.
The Pearson Product-Moment Correlation Coefficient for Residual Sugar & Chlorides
is calculated as follows:
R = [6.8539 - (63.6 x 2.888)/26] / √{[199.38 - (63.6)^2/26] x [0.467604 - (2.888)^2/26]}
∴ R = -0.08304224084
Since the correlation coefficient of Residual Sugar and Chlorides is very close to 0 (about
-0.08), they have Virtually No Correlation.
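The Residual Sugar and Chlorides calculation can be reproduced from the sums that appear in the formula. The sketch below assumes that 26 is the number of samples, 63.6 and 2.888 are the sums of the two variables, 6.8539 is the sum of their products, and 199.38 and 0.467604 are the sums of squares. The function name is hypothetical.

```python
import math

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r computed from raw sums (computational formula)."""
    numerator = sum_xy - (sum_x * sum_y) / n
    spread_x = sum_x2 - sum_x ** 2 / n
    spread_y = sum_y2 - sum_y ** 2 / n
    return numerator / math.sqrt(spread_x * spread_y)

# Figures taken from the Residual Sugar & Chlorides working above.
r = pearson_from_sums(
    n=26,
    sum_x=63.6,        # sum of Residual Sugar (assumed meaning)
    sum_y=2.888,       # sum of Chlorides (assumed meaning)
    sum_xy=6.8539,     # sum of products
    sum_x2=199.38,     # sum of squared Residual Sugar values
    sum_y2=0.467604,   # sum of squared Chlorides values
)
print(f"r = {r:.4f}")
```

The result agrees with the value of about -0.083 reported in the text.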
(III) Cause-Effect relationship between Fixed Acidity and Volatile Acidity, based on the
Correlation Coefficient Score
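The correlation coefficient score between Fixed Acidity and Volatile Acidity is -0.33.
This indicates that there is a Moderately Negative Correlation between Fixed Acidity and
Volatile Acidity: samples with higher fixed acidity tend to have somewhat lower volatile
acidity.
Hence, a change in the fixed acidity of a wine sample is associated with only a small change
in the volatile acidity of the wine, and vice versa. The coefficient measures association only
and does not by itself establish a cause-effect relationship.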
ANSWER 2:
(I) Skewness & Kurtosis
Kurtosis is a measure of the combined size of the two tails. It measures the amount of
probability in the tails. The value is usually compared to the kurtosis of the normal
distribution, which is equal to 3. If the kurtosis is greater than 3, then the dataset has heavier
tails than a normal distribution (more in the tails). If the kurtosis is less than 3, then the
dataset has lighter tails than a normal distribution (less in the tails).
Kurtosis is sometimes reported as “excess kurtosis.” Excess kurtosis is determined by
subtracting 3 from the kurtosis. This makes the excess kurtosis of the normal distribution
equal to 0.
The kurtosis parameter measures the combined weight of the tails relative to the rest of the
distribution. Kurtosis is all about the tails of the distribution, not its peakedness or
flatness.
High Kurtosis combined with high Skewness denotes that the data is heavy-tailed and
asymmetrically distributed.
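To make the definitions concrete, skewness and kurtosis can be computed from the central moments of a dataset. The sketch below uses the simple population (uncorrected) moment formulas, and the two small datasets are hypothetical illustrations, not the wine data. Spreadsheet and statistics packages often apply a sample-size correction, so their figures may differ slightly.

```python
def shape_statistics(values):
    """Return (skewness, kurtosis, excess kurtosis) from population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # variance
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return skewness, kurtosis, kurtosis - 3          # normal distribution gives excess 0

# Hypothetical data: a symmetric set, and the same set with one large value.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
with_outlier = [1, 2, 2, 3, 3, 3, 4, 4, 20]

for name, data in (("symmetric", symmetric), ("with outlier", with_outlier)):
    skew, kurt, excess = shape_statistics(data)
    print(f"{name}: skewness={skew:.2f}, kurtosis={kurt:.2f}, excess={excess:.2f}")
```

The symmetric set has zero skewness and kurtosis below 3, while the single large value produces positive skewness and kurtosis above 3, i.e. a heavier right tail.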
Mean - The mean (or average) is the most popular and standard measure of central
tendency. It can be used with both discrete and continuous data, although it is most often
used with continuous data.
Median - The median is the middle score for a group of data that has been arranged in order
of magnitude. The median is less affected by outliers and skewed data than the mean.
Mode - The mode is the most frequent score in our data set. It is represented by the highest
bar in a bar graph or histogram. One can, therefore, sometimes consider the mode
as being the most popular option.
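The three measures can be compared side by side using Python's standard `statistics` module. The values below are hypothetical and chosen only to show how a single large observation pulls the mean away from the median and the mode.

```python
import statistics

# Hypothetical readings with one unusually large value (95).
readings = [12, 15, 15, 18, 20, 22, 95]

mean_value = statistics.mean(readings)      # sum of values divided by their count
median_value = statistics.median(readings)  # middle value once sorted
mode_value = statistics.mode(readings)      # most frequent value

print(f"mean   = {mean_value:.2f}")
print(f"median = {median_value}")
print(f"mode   = {mode_value}")
```

Here the mean is well above the median because of the outlier, which is the behaviour described for skewed data.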
From the given data, we can interpret that free sulphur dioxide makes up only a small share
of the total sulphur dioxide in each individual sample. Due to this, there is a
difference between the mean, median and mode of free sulphur dioxide and those of total
sulphur dioxide.
(III) Distribution
In order to study the distribution pattern, we need to take a careful look at the standard
deviations of the given data.
The standard deviation is a statistic that measures the dispersion of a dataset relative to
its mean. It is calculated as the square root of the variance, which is obtained by
determining the squared deviation of every data point from the mean and averaging these
squared deviations.
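The calculation described above can be written out step by step. This is a sketch of the population standard deviation (dividing by n); a sample standard deviation would divide by n - 1 instead. The data values and the function name are hypothetical.

```python
import math

def population_std_dev(values):
    """Square root of the average squared deviation from the mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    variance = sum(squared_deviations) / len(values)
    return math.sqrt(variance)

# Hypothetical data with mean 5 and variance 4.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std_dev(sample))
```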
To identify the distribution pattern, we need to observe the Z-score. A Z-score is a
numerical measure used in statistics of a value's relationship to the mean (average) of a group
of values, measured in terms of standard deviations from the mean.
If a Z-score is zero, it indicates that the data point's score is identical to the mean score. A
Z-score of 1.0 would indicate a value that is one standard deviation from the mean.
Z-scores may be positive or negative, with a positive value indicating the score is higher than
the mean and a negative score indicating it is below the mean.
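A Z-score is obtained by subtracting the mean from a value and dividing by the standard deviation. The sketch below assumes the population standard deviation (`statistics.pstdev`) and uses hypothetical data; the helper name `z_scores` is illustrative.

```python
import statistics

def z_scores(values):
    """Z-score of each value: (value - mean) / population standard deviation."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)
    return [(v - mean) / std_dev for v in values]

# Hypothetical data with mean 5 and standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
scores = z_scores(data)

for value, z in zip(data, scores):
    print(f"value={value}, z={z:+.2f}")
```

Values equal to the mean get a Z-score of 0, values above the mean get positive scores and values below it get negative scores.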
From the given data, we can see that the difference between the mean and the Standard
Deviation is the least for Chlorides (0.03) and Volatile Acidity (0.18). This suggests that
these two are the most likely ones to have a normal data distribution.
Therefore we can conclude that the two variables that are closest to being normally
distributed are Chlorides and Volatile Acidity.
ANSWER 3:
(A) The ruling party will win 0 rounds, 1 round, 2 rounds, 3 rounds or all 4 rounds
of voting
Probability is a branch of mathematics that deals with calculating the likelihood of a given
event's occurrence, which is expressed as a number between 0 and 1. An event with a
probability of 1 can be considered a certainty.
If p is the chance that the ruling party has of winning a round, then q = 1 - p is the chance
that the ruling party has of losing it. Here p = 0.6 and q = 0.4.
The probability that the ruling party will win 0 rounds can be calculated by:
P4(0) = q^4 = 0.4^4 = 0.0256
Similarly, the probability that the ruling party will win 1 round:
P4(1) = 4!/(1! x 3!) x (0.6^1 x 0.4^3) = 0.1536
The probability that the ruling party will win 2 rounds is:
P4(2) = 4!/(2! x 2!) x (0.6^2 x 0.4^2) = 0.3456
The probability that the ruling party will win 3 rounds is:
P4(3) = 4!/(3! x 1!) x (0.6^3 x 0.4^1) = 0.3456
The probability that the ruling party will win all 4 rounds is:
P4(4) = 0.6^4 = 0.1296
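The five probabilities above follow the binomial formula and can be recomputed as a check. This is a short sketch that assumes the win probability per round is 0.6 and that the 4 rounds are independent; the function name `binomial_probability` is illustrative.

```python
from math import comb

def binomial_probability(k, n, p):
    """Probability of exactly k wins in n independent rounds, each won with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_rounds = 4
p_win = 0.6   # assumed per-round win probability, as used in the working above

probabilities = [binomial_probability(k, n_rounds, p_win) for k in range(n_rounds + 1)]

for k, prob in enumerate(probabilities):
    print(f"P4({k}) = {prob:.4f}")

print(f"total = {sum(probabilities):.4f}")
```

The five values sum to 1, which confirms that every possible outcome has been counted.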
The probability that the ruling party will win at least 1 round can be calculated using the
following formula:
P(1, 2, 3, 4) = 1 - q^4
P(1, 2, 3, 4) = 1 - 0.0256
P(1, 2, 3, 4) = 0.9744
Hence, the probability of the ruling party winning at least once is 0.9744
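The complement rule used here can be cross-checked by adding the probabilities of winning 1, 2, 3 and 4 rounds directly. The sketch below again assumes p = 0.6, q = 0.4 and 4 independent rounds; the variable names are illustrative.

```python
from math import comb

n_rounds = 4
p_win = 0.6
q_lose = 1 - p_win

# Complement rule: 1 minus the probability of winning no round at all.
via_complement = 1 - q_lose ** n_rounds

# Direct route: add the probabilities of exactly 1, 2, 3 and 4 wins.
via_sum = sum(
    comb(n_rounds, k) * p_win ** k * q_lose ** (n_rounds - k)
    for k in range(1, n_rounds + 1)
)

print(f"1 - q^4       = {via_complement:.4f}")
print(f"sum of P4(1..4) = {via_sum:.4f}")
```

Both routes give the same figure, and the complement rule needs only one term instead of four.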