SPOTIFY DATA ANALYSIS SYSTEM (IP CLASS XII)
The “Spotify EDA using Python” project has been developed to analyze the
dataset and support better business decisions. The code helps eliminate, or
at least reduce, the uncertainty faced by the existing system. Moreover, the
code is designed around the particular needs of the company so that
operations can be carried out smoothly and effectively.
The purpose of “Spotify EDA using Python” is to make the existing system
more efficient and meaningful with the help of a programming language and
its useful functions and methods, meeting the business requirements so that
valuable data and information can be analyzed with ease. Data manipulation
and data visualization are performed separately. The required programming
language is freely available and easy to work with, and it helps draw
conclusions from the dataset. In a very short time, the insights become
obvious, simple and sensible. The project helps in understanding the trends
and patterns of users clearly and vividly, supports comparative analysis
between artists, genres and years, and can also feed into a recommendation
feature.
• Be easy to operate.
• Be expandable.
• All fields such as Artist Name, Track Name and Track ID are
validated and do not accept invalid values.
• The Artist Name, Track Name and Track ID fields cannot accept
null values.
• Avoiding errors and redundancy in data.
• Integration of all the modules/libraries in the system.
• Loading of the dataset with all the validation checks.
• Modifications done for the errors found during execution.
• Functionality of the entire module/libraries.
• Validations for source code.
• Checking of the Coding standards to be maintained during coding.
• Executing the module with all the possible test data.
• Testing of the functionality involving all types of calculations etc.
• Commenting standard in the source files.
• Checking null values before analyzing dataset.
• Describing the dataset in tabular form.
• Checking & viewing the number of rows and columns in the dataset.
Feasibility Study:
After developing the project Spotify EDA using Python and studying and
analyzing all the existing and required functionalities of the program, the
next task is the feasibility study for the project. All projects are
feasible - given unlimited resources and infinite time.
• The dataset must be provided by a legitimate source: there can also be a
few insights that could help management in decision-making and cost
control, but since such insights do not usually get the required attention,
they were also identified here and given the attention they deserve.
• The required dataset is clean and contains no null values for each analysis.
The use case model for any system consists of “use cases”. Use cases
represent the different ways in which the system can be used by the user. A
simple way to find all the use cases of a system is to ask the question
“What can the user do using the system?” The use cases partition the
system behavior into transactions such that each transaction performs
some useful action from the user’s point of view.
• Analysis of data.
• Ensure data cleaning & manipulation.
• Proper control of the analyst on the dataset.
• Minimize manual numerical computations.
• Minimum time needed for the various processing tasks.
• Greater efficiency and better service.
• User friendly and interactive.
SNIPPETS OF SOURCE CODE:
We’ve imported four libraries for analyzing the dataset: NumPy, Pandas,
Matplotlib and Seaborn. NumPy is used for numerical computations, Pandas
for data manipulation, and Seaborn and Matplotlib for visualization.
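The four imports described above can be sketched as follows (the short aliases np, pd, plt and sns are the conventional ones):

```python
import numpy as np               # numerical computations
import pandas as pd              # data loading and manipulation
import matplotlib.pyplot as plt  # base plotting library
import seaborn as sns            # statistical visualization built on Matplotlib
```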
We’ve used the read_csv() function from the Pandas library to load the
dataset into the program. Loading the data is an important first step,
since the dataset is the source of this exploratory data analysis.
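A minimal sketch of this step is shown below. Since the real CSV file is not included here, a tiny in-memory stand-in is used via StringIO; in the project itself the call would be pd.read_csv() with the actual dataset path, and the column names here are illustrative assumptions.

```python
from io import StringIO

import pandas as pd

# Tiny stand-in for the real Spotify CSV file.
csv_text = """Artist Name,Track Name,Popularity,Duration_ms
Artist A,Song 1,81,210000
Artist B,Song 2,65,185000
"""

# In the project: df = pd.read_csv("spotify_dataset.csv")
df = pd.read_csv(StringIO(csv_text))
print(df.shape)  # → (2, 4)
```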
OUTPUT:
We’ve used the info() function to see the datatypes, number of columns and
memory usage of the dataset in the program.
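A sketch of the call, using a small illustrative frame in place of the loaded dataset:

```python
import pandas as pd

# Illustrative frame standing in for the loaded Spotify dataset.
df = pd.DataFrame({
    "Artist Name": ["Artist A", "Artist B"],
    "Popularity": [81, 65],
})

# Prints column dtypes, non-null counts and memory usage to stdout.
df.info()
```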
OUTPUT:
4.) SORTING ACCORDING TO THE METRICS –
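Sorting by a metric is typically done with the Pandas sort_values() function. The frame and the choice of Popularity as the metric below are illustrative assumptions:

```python
import pandas as pd

# Hypothetical rows; the project sorts the real dataset by a chosen metric.
df = pd.DataFrame({
    "Track Name": ["Song 1", "Song 2", "Song 3"],
    "Popularity": [65, 92, 81],
})

# Sort by the metric, most popular first.
top = df.sort_values(by="Popularity", ascending=False)
print(top["Track Name"].tolist())  # → ['Song 2', 'Song 3', 'Song 1']
```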
OUTPUT:
5.) APPLYING DESCRIPTIVE STATISTICS –
We’ve used the describe() function to get descriptive statistics of the
data present in the dataset. It returns values such as the mean, standard
deviation, minimum and maximum.
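A minimal sketch, again with a small stand-in frame:

```python
import pandas as pd

df = pd.DataFrame({"Popularity": [65, 92, 81, 70]})

# count, mean, std, min, quartiles and max for each numeric column.
stats = df.describe()
print(stats.loc["mean", "Popularity"])  # → 77.0
```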
OUTPUT:
6.) VISUALIZATION: CORRELATION MAP –
We’ve used the heatmap() function from the Seaborn library to visualize the
correlation map between all the numeric variables.
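The pattern is to compute the correlation matrix with Pandas and pass it to seaborn.heatmap(). The two columns below are illustrative; the Agg backend is selected so the sketch also runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; saves to file instead of a window
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small numeric frame standing in for the dataset's numeric columns.
df = pd.DataFrame({
    "Popularity": [65, 92, 81, 70],
    "Duration_ms": [185000, 240000, 210000, 200000],
})

corr = df.corr()                    # pairwise correlation matrix
sns.heatmap(corr, annot=True)       # annotated correlation map
plt.savefig("correlation_map.png")
plt.close()
```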
OUTPUT:
9.) BAR PLOT (A) –
We’ve used the barplot() function from the Seaborn library to visualize the
relationship between the Genre and Duration columns of the dataset.
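A sketch of the call; the column names "Genre" and "Duration_ms" and the rows are assumptions standing in for the real dataset:

```python
import matplotlib

matplotlib.use("Agg")  # run headless; write the plot to a file
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative rows standing in for the real dataset.
df = pd.DataFrame({
    "Genre": ["Pop", "Rock", "Pop", "Jazz"],
    "Duration_ms": [200000, 240000, 210000, 260000],
})

# barplot shows the mean Duration_ms per Genre with an error bar.
sns.barplot(x="Genre", y="Duration_ms", data=df)
plt.savefig("genre_duration.png")
plt.close()
```

The same call with y set to the popularity column produces the Genre-versus-Popularity plot described next.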
OUTPUT:
We’ve used the barplot() function from the Seaborn library to visualize the
relationship between the Genre and Popularity columns of the dataset.
OUTPUT:
CONCLUSION OF THE PROJECT: