Analysis of Superstore Database
Organization: DGT
College name: Silicon Institute of Technology
State: Odisha
Domain: Data Analytics
S/E date: 12.06.2023 - 24.07.2023
Project Title: Analysis of Superstore Dataset
Introduction: The goal of this project is to analyze the Superstore dataset to gain
insights into sales trends, customer behavior, and operational efficiency. The dataset
contains information about various aspects of the store's operations, including sales,
customer demographics, product categories, and geographical regions. By conducting a
comprehensive analysis, we aim to identify opportunities for improvement and make
data-driven recommendations to optimize store performance.
Data Collection and Preprocessing: Load the Superstore dataset and prepare it for analysis.
Sales Analysis: Analyze sales metrics, trends, and factors influencing sales fluctuations.
Customer Behavior Analysis: Study customer demographics, preferences, and
segmentation for personalized strategies.
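The report does not list its individual preprocessing steps. Below is a minimal sketch of a typical cleaning pass. It assumes the "Order Date" column is stored as month/day/year text, as in common copies of the Superstore CSV. The pass drops exact duplicate rows, parses the dates, and reports missing values. The helper name preprocess and the toy rows are hypothetical.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw Superstore frame (illustrative steps only)."""
    out = df.drop_duplicates().copy()  # remove exact duplicate rows
    # Assumption: dates are month/day/year strings; unparseable values become NaT
    out["Order Date"] = pd.to_datetime(out["Order Date"], format="%m/%d/%Y",
                                       errors="coerce")
    # Report how many values are missing in each column that has any
    missing = out.isna().sum()
    print(missing[missing > 0])
    return out

# Hypothetical toy rows, not taken from the real dataset
raw = pd.DataFrame({
    "Order Date": ["11/8/2016", "11/8/2016", "6/12/2016", "not a date"],
    "Sales": [261.96, 261.96, 14.62, 957.58],
})
clean = preprocess(raw)
```

Once the dates are parsed, the monthly and yearly trend questions later in the report become simple group-by operations.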
Objectives:
•Identify sales trends, such as seasonal patterns and fluctuations, to optimize inventory management and sales
forecasting.
•Understand customer behavior by analyzing demographics, preferences, and purchase patterns to develop targeted
marketing strategies and enhance customer satisfaction.
•Improve operational efficiency by identifying bottlenecks, streamlining processes, and optimizing resource allocation
for enhanced profitability.
•Provide data-driven recommendations to optimize store performance, improve customer experience, and increase
overall profitability based on the analysis findings.
WHO ARE THE END USERS:
•Store Managers: They require insights on sales performance, customer behavior, and
operational efficiency to make informed decisions and optimize store operations.
•Marketing Managers: They need information on customer demographics, preferences,
and buying patterns to develop targeted marketing campaigns and improve customer
engagement.
The solution for the "Analysis of Superstore dataset" project was a comprehensive analysis of the
Superstore dataset to gain insights into sales trends, customer behavior, and operational
efficiency. The analysis was carried out with statistical and data mining techniques together with
data visualization tools:
•Exploratory Data Analysis (EDA): EDA techniques were employed to gain initial insights
into the dataset. This included data visualization through charts, graphs, and plots to
understand the distribution of variables, identify outliers, and detect patterns or
relationships between different variables.
•Statistical Analysis: Statistical techniques were used to uncover correlations, trends, and
patterns within the Superstore dataset. These techniques helped in understanding the impact of
various factors on sales, customer behavior, and operational efficiency.
•Customer Segmentation: Customers were categorized based on their attributes and
buying behavior. This allowed for the identification of distinct customer groups with
specific needs and preferences, enabling targeted marketing strategies.
•Data Visualization: Python libraries such as Matplotlib and Seaborn were used to create
charts, graphs, and dashboards. These visualizations communicated the analysis results
and presented the key findings clearly.
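The report does not name the outlier rule or correlation measure it used. The sketch below makes two assumed choices to illustrate the EDA and statistical-analysis steps. Outliers in Sales are flagged with Tukey's fences (values outside Q1 - 1.5 * IQR and Q3 + 1.5 * IQR). Sales, Profit, and Discount are then compared with Pearson correlation. The helper name iqr_outliers and the toy values are hypothetical.

```python
import pandas as pd

def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - k * iqr) | (s > q3 + k * iqr)

# Hypothetical toy values, not taken from the real dataset
toy = pd.DataFrame({
    "Sales":    [10.0, 12.0, 11.0, 13.0, 12.5, 400.0],
    "Profit":   [2.0, 2.5, 2.1, 3.0, 2.6, -50.0],
    "Discount": [0.0, 0.0, 0.1, 0.0, 0.0, 0.8],
})

# Rows whose Sales value falls outside the fences
outliers = toy[iqr_outliers(toy["Sales"])]
print(outliers)

# Pearson correlation between the numeric measures
corr = toy[["Sales", "Profit", "Discount"]].corr()
print(corr.round(2))
```

In this toy frame, a single extreme order dominates the correlation. This is one reason to look at outliers before interpreting correlations.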
Together, these techniques formed the foundation of the "Analysis of Superstore dataset"
project for Data Analytics, ensuring a systematic, data-driven approach to extracting insights
from the dataset.
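The report does not state which segmentation method it applied. One common choice is RFM (recency, frequency, monetary value), sketched below purely as an assumption. The sketch assumes the usual Superstore columns "Customer ID", "Order ID", "Order Date" (already parsed to datetimes), and "Sales". The median thresholds, the "High value"/"Standard" labels, the helper name, and the toy orders are all hypothetical.

```python
import pandas as pd

def rfm_segments(df: pd.DataFrame, snapshot: pd.Timestamp) -> pd.DataFrame:
    """Per-customer recency/frequency/monetary table with a simple rule-based label."""
    rfm = df.groupby("Customer ID").agg(
        last_order=("Order Date", "max"),
        frequency=("Order ID", "nunique"),  # distinct orders, not order lines
        monetary=("Sales", "sum"),
    )
    rfm["recency_days"] = (snapshot - rfm["last_order"]).dt.days
    # Hypothetical rule: above the median on both frequency and spend -> high value
    high = (rfm["frequency"] > rfm["frequency"].median()) & (
        rfm["monetary"] > rfm["monetary"].median()
    )
    rfm["segment"] = high.map({True: "High value", False: "Standard"})
    return rfm.drop(columns="last_order")

# Hypothetical toy orders, not taken from the real dataset
orders = pd.DataFrame({
    "Customer ID": ["A", "A", "A", "B", "C", "C"],
    "Order ID":    ["o1", "o2", "o3", "o4", "o5", "o5"],
    "Order Date":  pd.to_datetime(["2017-01-05", "2017-06-01", "2017-12-20",
                                   "2016-03-15", "2017-11-02", "2017-11-02"]),
    "Sales":       [500.0, 300.0, 700.0, 40.0, 60.0, 25.0],
})
segments = rfm_segments(orders, pd.Timestamp("2018-01-01"))
print(segments)
```

Clustering methods such as k-means on the same three features are a common alternative. The rule-based version is shown only because it is easy to inspect.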
Results:
LINKS:
GitHub Link:
https://github.com/Subham966/Analysis_of_SuperStore_Dataset-IBM_Internship_Project_for_DataAnalysis
DATA SET:
Size: 563 KB
Number of columns: 21
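import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV with Windows-1252 (cp1252) encoding instead of the UTF-8 default
df = pd.read_csv("/content/drive/MyDrive/IBM_Project/Superstoredataset.csv",
                 encoding='cp1252')
df             # display the DataFrame
df.describe()  # summary statistics of the numeric columns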
# Step-2: Exploratory Data Analysis – EDA:
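# Group the data by Product Name and sum up the sales by product
product_group = df.groupby("Product Name")["Sales"].sum()
product_group.head()
# Select the five products with the highest total sales and plot them
top_5_selling_products = product_group.sort_values(ascending=False).head(5)
top_5_selling_products.plot(kind="bar")
plt.show()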
# Filter the data to only include the Canon imageCLASS 2200 Advanced Copier
product = df[df["Product Name"] == "Canon imageCLASS 2200 Advanced Copier"]
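# Total sales per region (assumed: region_group aggregates Sales by Region)
region_group = df.groupby("Region")["Sales"].sum()
# Plotting
region_group.plot(kind="bar")
plt.show()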
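A single-product filter like the Canon imageCLASS one above can be combined with the regional grouping to show where that product sells. This is a minimal sketch that assumes the dataset's "Product Name", "Region", and "Sales" columns. The helper name product_sales_by_region and the toy rows are hypothetical.

```python
import pandas as pd

def product_sales_by_region(df: pd.DataFrame, product_name: str) -> pd.Series:
    """Total Sales of a single product per Region, highest first."""
    rows = df[df["Product Name"] == product_name]
    return rows.groupby("Region")["Sales"].sum().sort_values(ascending=False)

# Hypothetical toy rows, not taken from the real dataset
toy = pd.DataFrame({
    "Product Name": ["Canon imageCLASS 2200 Advanced Copier"] * 3 + ["Other"],
    "Region": ["East", "West", "East", "West"],
    "Sales": [1000.0, 500.0, 700.0, 9999.0],
})
result = product_sales_by_region(toy, "Canon imageCLASS 2200 Advanced Copier")
print(result)
```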
What is the sales trend over time (monthly, yearly)?
Profit over time:
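Below is a minimal sketch of how monthly and yearly totals could be computed. It assumes "Order Date" has already been parsed to datetimes. Each date is converted to a calendar period ("M" for month, "Y" for year), and Sales and Profit are summed per period. The helper name sales_profit_trend and the toy rows are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

def sales_profit_trend(df: pd.DataFrame, freq: str = "M") -> pd.DataFrame:
    """Total Sales and Profit per period of Order Date ('M' = month, 'Y' = year)."""
    periods = df["Order Date"].dt.to_period(freq)
    return df.groupby(periods)[["Sales", "Profit"]].sum()

# Hypothetical toy rows, not taken from the real dataset
toy = pd.DataFrame({
    "Order Date": pd.to_datetime(["2016-01-10", "2016-01-25",
                                  "2016-02-03", "2017-01-15"]),
    "Sales": [100.0, 50.0, 80.0, 120.0],
    "Profit": [10.0, -5.0, 8.0, 30.0],
})
monthly = sales_profit_trend(toy, "M")
yearly = sales_profit_trend(toy, "Y")

# Line chart of the monthly totals
monthly.plot(marker="o", title="Monthly Sales and Profit")
plt.show()
```

Grouping on periods rather than raw timestamps puts every order from the same month (or year) into one bucket. This makes seasonal peaks easy to compare across years.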
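Sales Generated by State:
# Total sales per state (assumed definition of state_sales used below)
state_sales = df.groupby('State', as_index=False)['Sales'].sum()
plt.figure(figsize=(22,10))
plt.bar(state_sales['State'], state_sales['Sales'], align='center')
plt.xlabel("State")
plt.ylabel("Sales")
plt.title("Sales Generated by State")
plt.xticks(rotation=90)
plt.show()
state_sales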
Select the top 5 cities by sales, sorting the data by Sales in descending order:
# Aggregate Sales and Profit per city
city_sales = df.groupby('City', as_index=False)[['Sales', 'Profit']].sum()
# Sort the data by Sales in descending order
city_sales.sort_values(by='Sales', ascending=False, inplace=True)
# Select the top 5 cities
top_5_cities_sales = city_sales.head(5)
plt.bar(top_5_cities_sales['City'], top_5_cities_sales['Sales'], align='center')
plt.xlabel("City")
plt.ylabel("Sales")
plt.title("Top 5 Cities by Sales")
plt.xticks(rotation=90)
plt.show()
top_5_cities_sales
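Select the top 5 cities by profit, sorting the data by Profit in descending order:
city_profit = df.groupby('City', as_index=False)[['Sales', 'Profit']].sum()
# Sort the data by Profit in descending order and keep the top 5 cities
city_profit.sort_values(by='Profit', ascending=False, inplace=True)
top_5_cities_profit = city_profit.head(5)
plt.bar(top_5_cities_profit['City'], top_5_cities_profit['Profit'], align='center')
plt.ylabel("Profit")
plt.title("Top 5 Cities by Profit")
plt.xticks(rotation=90)
plt.show()
top_5_cities_profit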
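Both city rankings follow the same group, sort, and take-the-first-five pattern. The sketch below expresses that pattern once with pandas' DataFrame.nlargest, which avoids the in-place sort. The helper name top_n_cities and the toy rows are hypothetical.

```python
import pandas as pd

def top_n_cities(df: pd.DataFrame, metric: str, n: int = 5) -> pd.DataFrame:
    """Cities with the largest total `metric` (e.g. 'Sales' or 'Profit')."""
    totals = df.groupby("City", as_index=False)[metric].sum()
    return totals.nlargest(n, metric)

# Hypothetical toy rows, not taken from the real dataset
toy = pd.DataFrame({
    "City": ["New York City", "Los Angeles", "New York City", "Seattle", "Houston"],
    "Sales": [300.0, 250.0, 200.0, 150.0, 400.0],
    "Profit": [50.0, 40.0, 30.0, 20.0, -60.0],
})
print(top_n_cities(toy, "Sales", 2))
print(top_n_cities(toy, "Profit", 2))
```

In the toy data, Houston ranks high by sales but last by profit. This is the kind of divergence that comparing the two rankings is meant to reveal.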
Profit and profit margin by category:
# Group the data by product category and calculate the total profit for each category
total_profit_by_category = df.groupby('Category')['Profit'].sum()
print(total_profit_by_category)
# Profit margin of each order line = Profit / Sales
df['Profit Margin'] = df['Profit'] / df['Sales']
# Group the data by product category and calculate the average profit margin for each category
avg_profit_margin_by_category = df.groupby('Category')['Profit Margin'].mean()
# Plot the average profit margin for each category as a bar chart
avg_profit_margin_by_category.plot(kind='bar')
plt.show()
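The average profit margin above is an unweighted mean of per-line ratios. A small order with a large percentage loss therefore moves it as much as a large order does. An alternative is the aggregate (sales-weighted) margin, total Profit divided by total Sales per category. The report does not compute this alternative; it is offered as an assumption. The sketch below computes both so they can be compared. The helper name category_margins and the toy rows are hypothetical.

```python
import pandas as pd

def category_margins(df: pd.DataFrame) -> pd.DataFrame:
    """Compare the unweighted mean of row-level margins with the aggregate margin."""
    row_margin = df["Profit"] / df["Sales"]
    grouped = df.assign(row_margin=row_margin).groupby("Category")
    return pd.DataFrame({
        # Average of Profit/Sales over order lines (the measure plotted above)
        "mean_row_margin": grouped["row_margin"].mean(),
        # Total Profit divided by total Sales (sales-weighted margin)
        "aggregate_margin": grouped["Profit"].sum() / grouped["Sales"].sum(),
    })

# Hypothetical toy rows, not taken from the real dataset
toy = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Technology", "Technology"],
    "Sales": [1000.0, 10.0, 200.0, 200.0],
    "Profit": [100.0, -5.0, 40.0, 20.0],
})
margins = category_margins(toy)
print(margins)
```

In the toy Furniture rows, one 10-unit sale at a 50% loss pulls the mean row margin negative. The aggregate margin stays positive, so the choice of measure can change which categories look unprofitable.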
CONCLUSION:
The analysis of the Superstore dataset has provided valuable insights into sales trends,
customer behavior, and operational efficiency. Through exploratory data analysis, statistical
analysis, and data visualization, we have identified several significant findings:
•Sales Trends: The analysis revealed seasonal patterns, with peak sales occurring during specific
months. Additionally, certain product categories exhibited higher demand and profitability than
others, indicating opportunities for strategic focus and optimization.
•Customer Segmentation:
•Predictive Insights: These insights enable proactive decision-making and assist in effective
resource planning and inventory management.
•Enhanced Profitability:
•Improved Decision Making:
•Customer Satisfaction and Retention:
Moving forward, it is recommended that the Superstore continues to monitor sales performance,
customer behavior, and operational metrics. This will allow for ongoing adjustments and
improvements based on changing market dynamics and evolving customer preferences.
Overall, the "Analysis of Superstore dataset" project demonstrates the power of data analytics in
uncovering insights that drive strategic decision-making, operational efficiency, and ultimately, the
success of the Superstore in a competitive retail market.