Individual Assignment 1
EXECUTIVE SUMMARY
The main aim of this report is to use the database a company holds to examine its data in depth and draw on
the resulting insights to make informed decisions. The analysis uses various descriptive statistics tools to
measure the average invoice amount and the most frequently occurring customer. It also identifies the countries
with the most customers and the distribution of tracks by duration in milliseconds.
The analysis showed that the company's largest customer base is in the USA, followed by Canada. It also found
that invoices averaged $5.65. The findings suggest that the company can focus on gaining more customers in
other countries where there is potential to cater to customers' needs. It was also found that the company could
start stocking tracks with longer durations, as some customers might want tracks
INTRODUCTION
An online music retailer sustains its place in the market by selling a diverse collection of globally
renowned music albums, songs and music-related products. The company's database contains information on
customers' personal details, sales, invoices, employees, available tracks, etc. The company wants to analyse and
address specific business challenges, and it is doing so using SQL. The main aim of this report is to leverage
the extensive database that the company holds to analyse data and make informed
Descriptive analytics summarises a dataset through measures of central tendency (Mean, Median, Mode),
dispersion (Variance, Standard Deviation, Standard Error, Range, Minimum, Maximum), shape (Kurtosis,
Skewness) and totals (Sum, Count).
Customer id
Statistic             Value
Mode                  2
Range                 58
Minimum               1
Maximum               59
Count                 412

Total
Statistic             Value
Mean                  5.651942
Standard Error        0.233785
Median                3.96
Mode                  1.98
Standard Deviation    4.74532
Sample Variance       22.51806
Kurtosis              1.059629
Skewness              1.213908
Range                 24.87
Minimum               0.99
Maximum               25.86
Sum                   2328.6
Count                 412
Mean
The mean is the average of the dataset, found by summing all the values and dividing by the number of entries.
The mean is affected by very high and very low values in the data. The mean invoice total is 5.651942, which
indicates that customers spend about $5.65 per invoice on average.
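As a quick arithmetic check, the reported mean can be reproduced from the sum and count given in the statistics table. This is only a minimal sketch: the variable names are illustrative, and the two figures are the ones reported in this report.

```python
# Figures reported in the descriptive statistics for invoice totals.
invoice_sum = 2328.6    # sum of all invoice totals, in dollars
invoice_count = 412     # number of invoices

# Mean = sum of the values / number of values
mean_total = invoice_sum / invoice_count

print(round(mean_total, 6))  # 5.651942
```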
Standard Error
The standard error is an estimate of how much the sample mean is likely to vary from the true population
mean. The standard error for the data is 0.233785, which is small relative to the mean of 5.651942 and
indicates that there is a high level of precision in the
Median
The median is the middle value of the sorted dataset. It is not affected by the highest and lowest values in
the dataset; it simply divides the dataset into two equal parts. For our data the median is 3.96, meaning that
half of the invoice totals in the dataset are less than 3.96 and half are greater than 3.96.
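The difference in how the mean and the median react to extreme values can be sketched with Python's standard `statistics` module. The invoice totals below are hypothetical and chosen only for illustration; they are not taken from the company's database.

```python
from statistics import mean, median

# Hypothetical invoice totals (illustration only, not the company's data).
totals = [0.99, 1.98, 1.98, 3.96, 5.94, 8.91, 13.86]

# Add a single unusually large invoice to see how each measure reacts.
with_outlier = totals + [250.00]

print("median:", median(totals), "->", median(with_outlier))
print("mean:  ", round(mean(totals), 2), "->", round(mean(with_outlier), 2))
```

In this sketch the median moves only slightly when the large invoice is added, while the mean rises several times over, which is why the median is often reported alongside the mean for skewed data.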
Mode
The mode is the most frequently repeated value in the data. When the data is plotted on a graph, this value
is the peak of the graph. It shows the most common category in the dataset. For our data the mode is 1.98,
meaning that more invoices are for this amount than for any other. As for the customer id, the most repeated
Standard Deviation
The standard deviation is the square root of the variance, and it gives an idea of how far the data typically
lies from the mean. The standard deviation for our data is 4.74532, so we can say that the values typically
deviate from the mean by about 4.75 dollars. That is, with an average invoice total of 5.651942, most of the
invoice totals fall
Variance
Variance measures how far the values in the dataset are spread from the mean; it is the average of the squared
deviations. A higher variance indicates that the data points are more spread out around the mean, while a lower
variance indicates they are closer. The sample variance for our data is 22.51806. Because it is expressed in
squared dollars, it is not directly comparable to the mean, but it shows that the invoice totals are widely
spread around the
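The spread figures reported for the invoice totals can be cross-checked against one another, since the variance is the square of the standard deviation and the standard error is the standard deviation divided by the square root of the count. A minimal sketch, using the reported values and illustrative variable names:

```python
import math

std_dev = 4.74532   # reported standard deviation of invoice totals
n = 412             # reported count of invoices

variance = std_dev ** 2               # variance is the squared standard deviation
std_error = std_dev / math.sqrt(n)    # standard error of the mean

print(round(variance, 3))    # close to the reported 22.51806
print(round(std_error, 6))   # close to the reported 0.233785
```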
Kurtosis
Kurtosis is a measure of the tailedness of a distribution. A kurtosis value of 1.059629 for the totals, if read
as excess kurtosis relative to a normal distribution, indicates a somewhat sharper peak and heavier tails than
a normal distribution would have. It mostly implies that the totals of the
Skewness
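Skewness indicates whether the data points tend to be more spread out on one side of the mean than the other.
The skewness of 1.213908 is positive, which indicates a longer right tail; this is consistent with the mean
(5.651942) exceeding the median (3.96). The higher invoice totals are pulling up the mean of the invoice and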
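For reference, sample skewness and excess kurtosis can be computed with the bias-adjusted formulas commonly used by spreadsheet tools. The sketch below assumes the report's figures came from such a tool, which the report does not state; the function names and the invoice totals are hypothetical.

```python
from statistics import mean, stdev

def sample_skewness(values):
    """Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    cubed = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

def sample_excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (about 0 for normal data)."""
    n = len(values)
    m, s = mean(values), stdev(values)
    fourth = sum(((x - m) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

# Hypothetical right-skewed invoice totals (not the company's data).
skewed_totals = [0.99, 1.98, 1.98, 3.96, 5.94, 8.91, 13.86, 25.86]

print(sample_skewness(skewed_totals))         # positive: long right tail
print(sample_excess_kurtosis(skewed_totals))
```

A symmetric sample such as 1, 2, 3, 4, 5 gives a skewness of zero with these formulas, so a clearly positive value points to a few large invoices stretching the right tail.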
Range
The range is the difference between the maximum value and the minimum value. The range for customer id is
58, which means the customer ids run from 1 to 59. The range for invoice total is 24.87, which means the largest
and smallest invoice amounts differ by this amount. It can be useful for budgeting or for understanding billing
trends.
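Both ranges follow directly from the reported minimum and maximum values. A minimal sketch with illustrative variable names; note that a range of 58 on whole-number ids starting at 1 spans 59 possible ids:

```python
# Reported minimum and maximum values.
min_total, max_total = 0.99, 25.86   # invoice totals, in dollars
min_id, max_id = 1, 59               # customer ids

# Range = maximum - minimum (rounded to cents to avoid float noise).
total_range = round(max_total - min_total, 2)
id_range = max_id - min_id

# Number of whole-number ids between the minimum and maximum, inclusive.
ids_in_span = max_id - min_id + 1

print(total_range, id_range, ids_in_span)  # 24.87 58 59
```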
The minimum is the smallest value and the maximum is the largest value. For customer id the minimum
number assigned is 1 and the maximum number is 59, which means around 59 customers have purchased from the
shop. For the invoice totals the minimum is $0.99 and the maximum is $25.86, which means the highest
Sum
The sum is the total of all the amounts added together. The sum for the invoice totals is $2,328.60, which means that in total there
Count
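Count refers to the number of entries in the data. The count of 412 for both customer id and total is the
number of invoices, since each purchase generates one invoice carrying one customer id and one total. The number
of distinct customers is smaller, as the customer ids range only from 1 to 59.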
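The number of invoices and the number of distinct customers can be obtained separately in SQL. The sketch below runs such a query through Python's built-in `sqlite3` module; the table name, column names and rows are hypothetical and are not the company's actual schema or data.

```python
import sqlite3

# In-memory database with a small hypothetical Invoice table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice ("
    "InvoiceId INTEGER PRIMARY KEY, CustomerId INTEGER, Total REAL)"
)
conn.executemany(
    "INSERT INTO Invoice (CustomerId, Total) VALUES (?, ?)",
    [(1, 1.98), (2, 3.96), (1, 5.94), (3, 0.99), (2, 1.98)],
)

# COUNT(*) counts invoice rows; COUNT(DISTINCT ...) counts unique customers.
invoice_count, customer_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT CustomerId) FROM Invoice"
).fetchone()

print(invoice_count, customer_count)  # 5 3
```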
a. Customers
Using a bar chart, I have combined the customer data by country to show the relationship between the
customer count and the country. The bar chart below shows how many customers come from each country. As seen
in the chart, the highest number of customers, 13, come from the USA, followed by Canada and Brazil. Many other
countries have only a single customer each.
[Figure: bar chart of customer count by country. X-axis: COUNTRIES.]
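The customer count per country shown in the chart could be produced with a grouped SQL query. A minimal sketch using Python's built-in `sqlite3` module; the `Customer` table, its columns and the rows are assumptions made for illustration, and only the `Country` and `CustomerCount` column labels echo the appendix.

```python
import sqlite3

# In-memory database with a small hypothetical Customer table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customer (Country) VALUES (?)",
    [("USA",), ("USA",), ("USA",), ("Canada",), ("Canada",), ("Brazil",)],
)

# One row per country, largest customer base first.
country_counts = conn.execute(
    """
    SELECT Country, COUNT(*) AS CustomerCount
    FROM Customer
    GROUP BY Country
    ORDER BY CustomerCount DESC, Country
    """
).fetchall()

for country, customer_count in country_counts:
    print(country, customer_count)
```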
b. Invoice
Using a multiple bar chart, the invoice amounts for each country have been analysed by year. As it was
evident from the above chart that the USA leads the customer count, it follows that the invoice amount is
also highest for the USA, followed by Canada. Further analysis found that Spain had the lowest number of
invoices until 2010 but saw a sudden spike in 2012. This shows that, apart from the USA and Canada, the
company can also focus on targeting customers in countries where purchases have been higher in recent
years.
[Figure: Annual Purchase Amounts by Country. Multiple bar chart; x-axis: YEARS (2009 to 2013); y-axis: PURCHASE AMOUNTS; one series per country.]
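Annual purchase amounts per country can be sketched as a SQL aggregation over invoices grouped by country and year. The example uses Python's built-in `sqlite3` module; the column names `BillingCountry` and `InvoiceDate`, and all rows, are hypothetical and only illustrate the shape of the query.

```python
import sqlite3

# In-memory database with a small hypothetical Invoice table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Invoice ("
    "InvoiceId INTEGER PRIMARY KEY, BillingCountry TEXT, "
    "InvoiceDate TEXT, Total REAL)"
)
conn.executemany(
    "INSERT INTO Invoice (BillingCountry, InvoiceDate, Total) VALUES (?, ?, ?)",
    [
        ("USA", "2010-01-15", 5.94),
        ("USA", "2010-06-02", 1.98),
        ("USA", "2012-03-10", 13.86),
        ("Spain", "2010-02-01", 0.99),
        ("Spain", "2012-07-21", 8.91),
        ("Spain", "2012-09-30", 3.96),
    ],
)

# Sum invoice totals per country and calendar year.
annual_rows = conn.execute(
    """
    SELECT BillingCountry,
           strftime('%Y', InvoiceDate) AS Year,
           ROUND(SUM(Total), 2) AS Amount
    FROM Invoice
    GROUP BY BillingCountry, strftime('%Y', InvoiceDate)
    ORDER BY BillingCountry, Year
    """
).fetchall()

annual_amounts = {(country, year): amount for country, year, amount in annual_rows}
print(annual_amounts)
```

Comparing one country's amounts across years in this result is what reveals a pattern such as a low starting year followed by a later spike.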
c. Track
Using a dot chart, I have classified the tracks by their duration in milliseconds. As is evident from the
chart, the largest group, 1,680 tracks, falls between 200,000 and 300,000 milliseconds. It is also noted
that most tracks fall between 1 and 1,000,000 milliseconds, followed by 153 tracks
[Figure: dot chart of the number of tracks in each duration band, in milliseconds.]
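Classifying tracks into duration bands can be sketched in a few lines of Python. The band boundaries, the function name and the track durations below are hypothetical; the actual bands and counts are those shown in the chart.

```python
from bisect import bisect_left
from collections import Counter

# Inclusive upper bounds of each duration band, in milliseconds (assumed).
UPPER_BOUNDS = [100_000, 200_000, 300_000, 400_000, 500_000, 1_000_000]

def band_label(ms):
    """Return the label of the duration band that contains `ms`."""
    i = bisect_left(UPPER_BOUNDS, ms)
    if i == len(UPPER_BOUNDS):
        # Longer than the last band's upper bound.
        return f"over {UPPER_BOUNDS[-1]:,}"
    lower = UPPER_BOUNDS[i - 1] + 1 if i > 0 else 1
    return f"{lower:,} to {UPPER_BOUNDS[i]:,}"

# Hypothetical track durations in milliseconds (not the company's data).
durations = [95_000, 210_000, 245_000, 300_000, 300_001, 620_000, 2_900_000]

band_counts = Counter(band_label(ms) for ms in durations)
for label, count in band_counts.most_common():
    print(label, count)
```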
Customerid
2 57 30 7 58
4 59 32 43 35
37 33 8 45 14
38 34 9 22 52
40 36 11 24 31
42 12 46 1 10
16 13 47 3 48
17 15 49 39 27
19 50 25 41 6
21 51 26 18 44
54 53 28 20 23
55 29 5 56
(Appendix 1)
Country CustomerCount
USA 13
Canada 8
Brazil 5
France 5
Germany 4
(Appendix 3)
6. Customer Purchase History
Unique albums
347
(Appendix 5)
8. Customer Contact Information
CONCLUSION
The online music retailer data project, implemented in SQL, produced its results efficiently. The analysis
used various descriptive statistics tools such as mean, median, mode, standard deviation, variance, standard
error, kurtosis and skewness. It also found the range, minimum and maximum values of the dataset. It helped
achieve better data management, enhanced operational efficiency and provided valuable insight. The analysis
showed that customers spent $5.65 per purchase on average. The invoices totalled $2,328.60, which came from 59
different customers. The USA leads in number of customers, followed by Canada. The top 3 selling genres are
Rock, Latin and Metal, with Rock the most popular. Segmentation on the basis of payments helps identify
specific customer groups and shape marketing strategies accordingly. The artist Iron Maiden has the highest
number of albums, with 21. The album ‘Lost Season 3’ has the longest duration, at 70,665,582 milliseconds. The
data presented helps provide a background on customer demographics, purchase patterns, the tracks and artists
related to them, etc. It provided
RECOMMENDATIONS
Based on the analysis, there are some recommendations which we can provide to the company. Firstly, the
company can identify the high-value customers who make the highest purchases and offer them personalised
promotions, which in turn will lead to higher revenue. The USA leads the list, but the company can also focus on
targeting customers from other countries such as Canada, Germany, Brazil and France through different marketing
campaigns. The company can provide offers and bundles on the top-selling genres, which can attract more
customers and generate sales. The company has 347 unique albums, but it can prioritise keeping in stock the
albums that are in high demand. Leveraging the data to find customers' purchasing patterns can help in creating
effective marketing strategies. Regularly updating the database can be an effective way to improve the quality
of the data. The company can also introduce schemes such as monthly subscriptions and exclusive content for
premium customers at a discounted price. By implementing these recommendations, the company can enhance its
growth to a greater extent.