Individual Assignment 1

This report analyzes data from an online music retailer to derive insights for business decisions, focusing on customer demographics, sales performance, and track durations. Key findings indicate that the USA has the highest customer base and average invoice amounts, while descriptive statistics reveal trends in customer spending. Recommendations include targeting potential markets and expanding track offerings based on duration preferences.

Case Study: Online Music Retailer

TABLE OF CONTENTS

Sr No.  Particulars

1       Executive Summary
2       Introduction
3       Questions 1 to 10
4       Conclusion
5       Recommendations
6       Appendices

EXECUTIVE SUMMARY

The main aim of this report is to use the company's database to draw insights from the data and support informed decisions. The analysis uses descriptive statistics to measure the average invoice amount and to identify the most frequent customer. It also identifies the countries with the most customers and the distribution of tracks by duration in milliseconds. The analysis showed that the company's largest customer base is in the USA, followed by Canada, and that invoices average about $5.65. The company could therefore focus on gaining customers in other countries where there is potential demand. It could also stock more tracks with longer durations, as some customers may want them.



INTRODUCTION

An online music retailer sustains its position in the market by selling a diverse collection of globally renowned music albums, songs and music-related products. The company's database contains information on customers' personal details, sales, invoices, employees, available tracks, etc. The company wants to analyse and address specific business challenges, and it is doing so using SQL. The main aim of this report is to leverage the company's extensive database to analyse data, make informed decisions, enhance efficiency and address business challenges.

1. Descriptive Analytics on Payments:



Descriptive analytics summarises a dataset. It includes measures of central tendency (Mean, Median, Mode), measures of spread (Variance, Standard Deviation, Standard Error, Range, Minimum, Maximum), measures of shape (Kurtosis, Skewness) and totals (Sum, Count).

Customer id
Statistic            Value
Mode                 2
Range                58
Minimum              1
Maximum              59
Count                412

Total
Statistic            Value
Mean                 5.651942
Standard Error       0.233785
Median               3.96
Mode                 1.98
Standard Deviation   4.74532
Sample Variance      22.51806
Kurtosis             1.059629
Skewness             1.213908
Range                24.87
Minimum              0.99
Maximum              25.86
Sum                  2328.6
Count                412

Mean

It is the average of the dataset, found by summing all the values and dividing by the number of entries. The mean is affected by very high and very low values in the data. The mean of the invoice totals is 5.651942, which indicates that customers spend about $5.65 per invoice on average.
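
As a quick consistency check, the mean can be recomputed from the Sum and Count reported in the table. The following is a minimal Python sketch; the variable names are illustrative and do not come from the original analysis.

```python
# Recompute the mean invoice total from the reported Sum and Count.
invoice_sum = 2328.6   # Sum of invoice totals, as reported in the table
invoice_count = 412    # Count of invoices, as reported in the table

mean_total = invoice_sum / invoice_count
print(round(mean_total, 6))  # 5.651942
```

The result agrees with the reported mean, which suggests the Sum, Count and Mean figures are consistent with one another.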

Standard Error

Standard Error is an estimate of how much the sample mean is likely to vary from the true population mean. The standard error for the data is 0.233785, which is small relative to the mean of 5.651942 and indicates a high level of precision in the sample mean. The sample mean is therefore a reliable approximation of the population mean.
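
The standard error of the mean is the sample standard deviation divided by the square root of the sample size. The short Python sketch below applies that formula to the figures reported in the table; the variable names are illustrative.

```python
import math

sample_sd = 4.74532   # Standard Deviation, as reported in the table
n = 412               # Count of invoices, as reported in the table

# Standard error of the mean = s / sqrt(n)
standard_error = sample_sd / math.sqrt(n)
print(round(standard_error, 6))  # 0.233785
```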

Median

Median is the middle value of the sorted dataset. It is not affected by the highest and lowest values, and it divides the dataset into two equal parts. For our data the median is 3.96, which means that half of the invoice totals are less than $3.96 and half are greater.

Mode

Mode is the most frequently repeated value in the data. When the data is plotted, this value is the peak of the graph. It shows the most common value in the dataset. For our data the mode is 1.98, meaning that $1.98 is the most common invoice total. For customer id, the mode is 2, meaning that the customer with id 2 appears most often in the invoice data.

Standard Deviation

It is the square root of the variance, and it indicates the typical distance of the data from the mean. The standard deviation for our data is 4.74532, so invoice totals typically deviate from the mean by about $4.75. With a mean of $5.651942, one standard deviation either side gives a range of about $0.91 to $10.40, within which most invoice totals fall. Because the data is skewed, this band is only a rough guide.
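
The band of one standard deviation around the mean can be worked out directly from the two reported figures. This is a small illustrative Python sketch, not part of the original analysis.

```python
mean_total = 5.651942   # Mean, as reported in the table
sample_sd = 4.74532     # Standard Deviation, as reported in the table

# One standard deviation below and above the mean
lower = mean_total - sample_sd
upper = mean_total + sample_sd
print(f"${lower:.2f} to ${upper:.2f}")  # $0.91 to $10.40
```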

Variance

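Variance measures how far the values in the dataset spread from the mean; it is the average squared deviation. A higher variance indicates that the data points are more spread out around the mean, while a lower variance indicates they are closer to it. The sample variance for our data is 22.51806, which is the square of the standard deviation (4.74532). Because variance is expressed in squared units, it cannot be compared directly with the mean; relative to a mean of about 5.65, it shows that invoice totals vary considerably around the average amount spent.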

Kurtosis

Kurtosis is a measure of the tailedness of a distribution. Assuming the reported figure is excess kurtosis (the convention in most spreadsheet tools, where a normal distribution scores 0), the value of 1.059629 for invoice totals indicates a sharper peak and heavier tails than a normal distribution. In other words, many totals cluster together while a few unusually large invoices sit far out in the tail.

Skewness

Skewness indicates whether the data points are more spread out on one side of the mean than the other. The skewness of 1.213908 indicates that the invoice totals are positively skewed: a small number of high invoice totals pulls the mean ($5.65) above the median ($3.96).
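
The report does not state which formulas produced its skewness and kurtosis figures. The Python sketch below assumes the bias-corrected sample formulas that spreadsheet tools commonly use, and it runs on a small hypothetical list of invoice totals rather than the real data.

```python
import math

def _mean_and_sd(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return mean, sd

def sample_skewness(values):
    """Bias-corrected sample skewness; needs at least 3 values."""
    n = len(values)
    mean, sd = _mean_and_sd(values)
    z3 = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * z3

def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis; needs at least 4 values."""
    n = len(values)
    mean, sd = _mean_and_sd(values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * z4 - correction

# Hypothetical invoice totals: mostly small, with one large invoice.
totals = [0.99, 0.99, 1.98, 1.98, 1.98, 3.96, 3.96, 5.94, 8.91, 13.86, 25.86]
print(sample_skewness(totals) > 0)  # True: the long tail is on the right
print(sample_excess_kurtosis(totals))
```

A positive skewness, as in the report, means the long tail lies on the side of the larger invoice totals.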

Range

Range shows the difference between the maximum value and the minimum value. The range for customer id is 58 (59 minus 1), which means the customer ids run from 1 to 59. The range for invoice total is 24.87 (25.86 minus 0.99), which means the largest and smallest invoices differ by $24.87. It can be useful for budgeting or for understanding billing trends.

Minimum and Maximum

Minimum is the smallest value and maximum is the largest value. For customer id, the minimum is 1 and the maximum is 59, which suggests that 59 customers have purchased from the shop, assuming ids are assigned consecutively. For invoice total, the minimum is $0.99 and the maximum is $25.86, so the smallest invoice is $0.99 and the largest is $25.86.

Sum

Sum is the total of all the amounts added together. The sum of the invoice totals is $2,328.60, which means that purchases from the shop total $2,328.60.

Count

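Count refers to the number of records in the dataset. Both columns have a count of 412 because each invoice row carries a customer id and a total, so 412 is the number of invoices, not the number of customers. The customer ids range from 1 to 59, so each customer has about 7 invoices on average (412 divided by 59).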
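
The difference between the number of invoices and the number of customers can be checked in SQL with COUNT(*) and COUNT(DISTINCT ...). The Python sketch below runs such a query with the standard sqlite3 module against a tiny in-memory table; the rows are hypothetical, and only the table and column names follow the queries used later in this report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (invoiceid INTEGER PRIMARY KEY, customerid INTEGER, total REAL)"
)
# Hypothetical rows: five invoices placed by three customers.
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [(1, 2, 1.98), (2, 4, 3.96), (3, 2, 5.94), (4, 8, 0.99), (5, 4, 13.86)],
)

invoice_count, customer_count = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT customerid) FROM invoice"
).fetchone()
print(invoice_count, customer_count)  # 5 3
```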

2. Data Visualization with Charts:



a. Customers

Using a bar chart, I have combined the customer data by country to show the relationship between customer count and country. The bar chart below shows how many customers come from each country. The USA has the highest number of customers (13), followed by Canada (8), then Brazil and France (5 each). Many of the remaining countries have only one customer each.

[Figure: Number of Customers by Country (bar chart). X-axis: Countries; Y-axis: Number of Customers.]

b. Invoice

Using a multiple bar chart, the annual invoice amounts for each country have been analysed. Since the USA leads the customer count, it is unsurprising that its invoice amounts are also the highest, followed by Canada. Further analysis found that Spain had the lowest purchase amounts until 2010 and then saw a sudden spike in 2012. This suggests that, apart from the USA and Canada, the company can also target countries whose purchases have grown in recent years.

[Figure: Annual Purchase Amounts by Country (multiple bar chart). X-axis: Years (2009 to 2013); Y-axis: Purchase Amounts; one series per country.]

c. Track

Using a dot chart, I have classified the number of tracks by duration in milliseconds. As the chart shows, the largest group, 1,680 tracks, lasts between 200,000 and 300,000 milliseconds (roughly 3.3 to 5 minutes). Most tracks are shorter than 1,000,000 milliseconds; among the longer tracks, the largest group is 153 tracks lasting between 2,500,001 and 3,000,000 milliseconds.

[Figure: Track Duration Distribution (dot chart). X-axis: Track Duration (in milliseconds); Y-axis: Number of Tracks.]
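
Grouping tracks into duration ranges can be sketched in Python as follows. The durations are hypothetical and the helper name duration_bin is illustrative. For simplicity the sketch uses a uniform bin width of 100,000 milliseconds, whereas the chart uses wider ranges for longer tracks.

```python
from collections import Counter

BIN_WIDTH = 100_000  # milliseconds per bin (an assumption for this sketch)

def duration_bin(milliseconds, width=BIN_WIDTH):
    """Return the (low, high) bounds of the bin containing a duration >= 1 ms."""
    index = (milliseconds - 1) // width
    return index * width + 1, (index + 1) * width

# Hypothetical track durations in milliseconds.
durations = [95_000, 215_000, 240_000, 299_999, 300_000, 300_001, 2_600_000]

counts = Counter(duration_bin(ms) for ms in durations)
for (low, high), count in sorted(counts.items()):
    print(f"{low:,} to {high:,}: {count}")
```

Bins are labelled in the style "1 to 100,000" and "100,001 to 200,000", so a track of exactly 300,000 milliseconds falls in the 200,001 to 300,000 bin.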

3. Customer Segmentation Based on Payments

-- Customers with at least one invoice below the overall average invoice total
SELECT DISTINCT customerid
FROM invoice
WHERE total < (SELECT AVG(total) FROM invoice);

I have identified the distinct customer ids that have at least one invoice total lower than the average invoice total.

Customerid
 2  57  30   7  58
 4  59  32  43  35
37  33   8  45  14
38  34   9  22  52
40  36  11  24  31
42  12  46   1  10
16  13  47   3  48
17  15  49  39  27
19  50  25  41   6
21  51  26  18  44
54  53  28  20  23
55  29   5  56
(Appendix 1)
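
The query above returns every customer who has at least one below-average invoice, which can include customers who also place large orders. The Python sketch below runs that query on hypothetical rows with the standard sqlite3 module and compares it with one possible alternative, which is not the method used in this report: comparing each customer's own average invoice with the overall average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice (invoiceid INTEGER PRIMARY KEY, customerid INTEGER, total REAL)"
)
# Hypothetical invoices; the overall average total is 6.435.
conn.executemany("INSERT INTO invoice VALUES (?, ?, ?)", [
    (1, 1, 0.99), (2, 1, 13.86),   # customer 1: one small and one large invoice
    (3, 2, 1.98), (4, 2, 3.96),    # customer 2: consistently small invoices
    (5, 3, 8.91), (6, 3, 8.91),    # customer 3: consistently large invoices
])

# The report's approach: customers with at least one below-average invoice.
any_below = [row[0] for row in conn.execute(
    "SELECT DISTINCT customerid FROM invoice "
    "WHERE total < (SELECT AVG(total) FROM invoice) ORDER BY customerid"
)]

# Alternative: customers whose own average invoice is below the overall average.
avg_below = [row[0] for row in conn.execute(
    "SELECT customerid FROM invoice GROUP BY customerid "
    "HAVING AVG(total) < (SELECT AVG(total) FROM invoice) ORDER BY customerid"
)]

print(any_below)  # [1, 2]
print(avg_below)  # [2]
```

On this sample the first query also returns customer 1, whose average invoice is above the overall average, so the second form may separate low spenders more sharply.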

4. Top 3 Selling Genres

-- Counts the tracks listed under each genre and returns the three largest genres
SELECT g.genreid, g.name, COUNT(t.trackid) AS TOP_SELLING
FROM track t
INNER JOIN genre g ON g.genreid = t.genreid
GROUP BY g.genreid, g.name
ORDER BY TOP_SELLING DESC
LIMIT 3;

Genreid  Name   Top_Selling
1        Rock   1297
7        Latin  579
3        Metal  374
(Appendix 2)
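
The query above counts the tracks listed under each genre rather than the units sold. If the database also holds an invoice-line table, genres could be ranked by actual sales instead. The Python sketch below assumes a hypothetical table named invoiceline with trackid and quantity columns; that table name, its columns and all the rows are assumptions for illustration, not taken from this report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (genreid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE track (trackid INTEGER PRIMARY KEY, genreid INTEGER);
-- Hypothetical invoice-line table: one row per track sold on an invoice.
CREATE TABLE invoiceline (
    invoicelineid INTEGER PRIMARY KEY, trackid INTEGER, quantity INTEGER
);
INSERT INTO genre VALUES (1, 'Rock'), (3, 'Metal'), (7, 'Latin');
INSERT INTO track VALUES (10, 1), (11, 1), (12, 1), (20, 7), (30, 3);
INSERT INTO invoiceline VALUES
    (1, 20, 1), (2, 20, 1), (3, 20, 1), (4, 10, 1), (5, 30, 1), (6, 30, 1);
""")

# Rank genres by units sold rather than by number of catalogue tracks.
top_sold = conn.execute("""
    SELECT g.name, SUM(il.quantity) AS units_sold
    FROM invoiceline il
    INNER JOIN track t ON t.trackid = il.trackid
    INNER JOIN genre g ON g.genreid = t.genreid
    GROUP BY g.genreid, g.name
    ORDER BY units_sold DESC
    LIMIT 3
""").fetchall()
print(top_sold)  # [('Latin', 3), ('Metal', 2), ('Rock', 1)]
```

In this sample Rock has the most tracks but the fewest sales, which shows how the two rankings can differ.
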
5. Top Countries by Customer Count

SELECT Country, COUNT(CustomerID) AS CustomerCount
FROM Customer
GROUP BY Country
ORDER BY CustomerCount DESC
LIMIT 5;

Country  CustomerCount
USA      13
Canada   8
Brazil   5
France   5
Germany  4
(Appendix 3)
6. Customer Purchase History

-- Total purchases per customer, showing the seven highest spenders
SELECT c.customerid, c.firstname, c.lastname, SUM(i.total) AS Purchases
FROM customer c
INNER JOIN invoice i ON c.customerid = i.customerid
GROUP BY c.customerid, c.firstname, c.lastname
ORDER BY Purchases DESC
LIMIT 7;

Customerid  Firstname  Lastname    Purchases
6           Helena     Holy        49.62
26          Richard    Cunningham  47.62
57          Luis       Rojas       46.62
46          Hugh       O’Reilly    45.62
45          Ladislav   Kovacs      45.62
28          Julia      Barnett     43.62
24          Frank      Ralston     43.62
(Appendix 4)
7. Album Count

SELECT COUNT(DISTINCT albumid) AS Unique_albums
FROM album;

Unique_albums
347
(Appendix 5)
8. Customer Contact Information

SELECT Firstname, Lastname, Phone, Country
FROM customer
WHERE country = 'USA';

Firstname  Lastname    Phone              Country
Frank      Harris      +1 (650) 253-0000  USA
Jack       Smith       +1 (425) 882-8080  USA
Michelle   Brooks      +1 (212) 221-3546  USA
Tim        Goyer       +1 (408) 996-1010  USA
Dan        Miller      +1 (650) 644-3358  USA
Kathy      Chase       +1 (775) 223-7665  USA
Heather    Leacock     +1 (407) 999-7788  USA
John       Gordon      +1 (617) 522-1333  USA
Frank      Ralston     +1 (312) 332-3232  USA
Victor     Stevens     +1 (608) 257-0597  USA
Richard    Cunningham  +1 (817) 924-7272  USA
Patrick    Gray        +1 (520) 622-4200  USA
Julia      Barnett     +1 (801) 531-7272  USA
(Appendix 6)
9. Album Popularity by Artist

-- Artist with the highest number of albums
SELECT a.artistid, a.name, COUNT(al.albumid) AS Highest_Number_of_Albums
FROM album al
INNER JOIN artist a ON a.artistid = al.artistid
GROUP BY a.artistid, a.name
ORDER BY Highest_Number_of_Albums DESC
LIMIT 1;

Artistid  Name         Highest_Number_of_Albums
90        Iron Maiden  21
(Appendix 7)
10. Longest Album Duration:

-- Album with the longest total running time, summed over its tracks
SELECT a.albumid, a.title, SUM(t.milliseconds) AS TOTAL_DURATION_MILLISECONDS
FROM track t
INNER JOIN album a ON a.albumid = t.albumid
GROUP BY a.albumid, a.title
ORDER BY TOTAL_DURATION_MILLISECONDS DESC
LIMIT 1;

Albumid  Title           Total_Duration_Milliseconds
229      Lost, Season 3  70665582
(Appendix 8)
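
A duration in milliseconds is hard to read at this scale, so it helps to convert it to hours, minutes and seconds. The Python sketch below does this with integer division; the function name format_duration is illustrative.

```python
def format_duration(milliseconds):
    """Convert a duration in milliseconds to an H:MM:SS string (whole seconds)."""
    total_seconds = milliseconds // 1000
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"

# Total duration reported for 'Lost, Season 3'
print(format_duration(70_665_582))  # 19:37:45
```

The album therefore runs for a little over 19.5 hours in total.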

CONCLUSION

The online music retailer data project, implemented using SQL, produced useful results. The analysis used descriptive statistics such as mean, median, mode, standard deviation, variance, standard error, kurtosis and skewness, along with the range, minimum and maximum values of the dataset. It supported data management, improved operational efficiency and provided valuable insight. The analysis showed that the average invoice was $5.65. Invoices totalled $2,328.60, which came from 59 different customers. The USA has the most customers, followed by Canada. The top three genres, ranked by the number of tracks in each, are Rock, Latin and Metal, with Rock the largest. Segmentation based on payments helps identify specific customer groups so that marketing strategies can be tailored to them. The artist Iron Maiden has the highest number of albums, with 21. The album ‘Lost, Season 3’ has the longest duration, at 70,665,582 milliseconds (about 19.6 hours). The data provides a picture of customer demographics, purchase patterns, and the tracks and artists behind them, and it offers useful insight for business growth.



RECOMMENDATIONS

Based on the analysis, there are some recommendations we can provide to the company. Firstly, the company can identify high-value customers with the highest purchases and offer them personalised promotions, which in turn can lead to higher revenue. The USA leads the list, but the company can also target customers from other countries such as Canada, Germany, Brazil and France through different marketing campaigns. The company can offer deals and bundles on the top genres, which can attract more customers and generate sales. The company has 347 unique albums, and it can prioritise stocking those that are high in demand. Leveraging the data to find customers' purchasing patterns can help in creating effective marketing strategies. Regularly updating the database can be an effective way to keep the data useful. The company can also introduce schemes such as monthly subscriptions and exclusive content for premium customers at a discounted price. By implementing these recommendations, the company can enhance its growth.