Time Series Forecasting for Predicting Store Sales Using Prophet
Last Updated: 15 Jul, 2024
Time series forecasting is a crucial aspect of business analytics, enabling companies to predict future trends based on historical data. Accurate forecasting can significantly improve decision-making, inventory management, and overall business strategy. One powerful tool for time series forecasting is Prophet, an open-source library developed by Facebook's Core Data Science team. This article walks through using Prophet to predict store sales, from data preparation and model training to forecasting and visualizing the results.
The Prophet Model
Prophet fits a decomposable time series model built from the following components:
- Trend Component: This component models the overall trend of the data as piecewise linear or logistic growth. Prophet handles changes in trend automatically: it places candidate changepoints (25 by default, within the first 80% of the history) and puts a sparse prior on the rate changes so that only a few of them have a noticeable effect.
- Seasonal Component: This component models seasonal patterns in the data using Fourier series. It can capture yearly, weekly, and daily seasonality.
- Holiday Component: This component incorporates the impact of holidays and special events on the data. Users can provide a table of important holidays to be included in the model.
- Uncertainty Intervals: Although not a component of the model itself, Prophet reports a range (yhat_lower to yhat_upper, an 80% interval by default) within which the true values are likely to fall, giving an estimate of the prediction's reliability.
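For reference, with Prophet's default additive seasonality mode the components combine as shown below, where g(t) is the trend, s(t) the seasonality, h(t) the holiday effects and ε_t the error term. Each seasonality is a truncated Fourier series with period P (in days, e.g. 7 for weekly and 365.25 for yearly) and N terms; Prophet's default N is 10 for yearly and 3 for weekly seasonality.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

s(t) = \sum_{n=1}^{N} \left( a_n \cos\left(\frac{2\pi n t}{P}\right) + b_n \sin\left(\frac{2\pi n t}{P}\right) \right)
```

The coefficients a_n and b_n are estimated when the model is fitted; a larger N lets the seasonal curve change more quickly, at the risk of overfitting.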
Steps for Implementing Prophet for Store Sales Forecasting
Installing Prophet
pip install prophet
One can install Prophet using the above command.
Let's discuss the steps to implement a store sales predictor using Prophet. We then work through two examples:
- Predict total store sales based on the time series.
- Predict store sales for each product category based on the time series.
Step 1: Import Necessary Libraries
Let's import the necessary libraries.
Python
import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt
Let's look at each import:
- pandas: Used for data manipulation and analysis.
- Prophet: A forecasting tool provided by Facebook, used for forecasting time series data.
- matplotlib: Used for plotting and visualization.
Step 2: Load the Dataset
The dataset is loaded into a pandas DataFrame:
- file_path: Path to the CSV file containing the sales data.
- pd.read_csv(file_path): Reads the CSV file into a pandas DataFrame.
Dataset Link: Store_sales
Python
# dataset path
file_path = "https://media.geeksforgeeks.org/wp-content/\
uploads/20240704211146/retail_sales_dataset.csv"
# read data to dataframe
data = pd.read_csv(file_path)
Step 3: Data Analysis
Before implementing Prophet, it's crucial to analyze and understand the data. It involves:
- Checking for missing values.
- Identifying the granularity of the data (daily, weekly, etc.).
- Understanding the overall trends and seasonal patterns.
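The checks listed above can be bundled into a small helper. The sketch below is illustrative only: summarize_sales_data is a hypothetical name, it assumes this dataset's 'Date' column, and the three-row frame is synthetic rather than taken from the real data.

```python
import pandas as pd


def summarize_sales_data(df: pd.DataFrame, date_col: str = "Date") -> dict:
    """Pre-modelling checks: missing values, date range and daily granularity."""
    # drop any time-of-day so that rows are compared by calendar day
    days = pd.to_datetime(df[date_col]).dt.normalize()
    return {
        # number of missing cells in each column
        "missing_per_column": df.isna().sum().to_dict(),
        "first_date": days.min(),
        "last_date": days.max(),
        # calendar days between first and last date, inclusive
        "span_days": (days.max() - days.min()).days + 1,
        # days that actually have data; fewer than span_days means gaps
        "days_with_data": days.nunique(),
        # more than 1 means several transactions per day, so aggregate before modelling
        "max_rows_per_day": int(days.value_counts().max()),
    }


# Synthetic example with hypothetical values (not the real dataset)
sample = pd.DataFrame({
    "Date": ["2023-01-01", "2023-01-01", "2023-01-03"],
    "Total Amount": [100, 50, None],
})
print(summarize_sales_data(sample))
```

On the real data, summarize_sales_data(data) would show whether several transactions fall on the same day (so the data must be aggregated to daily totals before modelling) and whether some calendar days have no transactions at all.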
Let's check for any missing values.
Python
# print column names, non-null counts and dtypes
data.info()
Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Transaction ID 1000 non-null int64
1 Date 1000 non-null object
2 Customer ID 1000 non-null object
3 Gender 1000 non-null object
4 Age 1000 non-null int64
5 Product Category 1000 non-null object
6 Quantity 1000 non-null int64
7 Price per Unit 1000 non-null int64
8 Total Amount 1000 non-null int64
dtypes: int64(5), object(4)
memory usage: 70.4+ KB
There are no missing values in the dataset: every column has 1000 non-null entries. Next, let's plot the total amount against the date.
Python
# convert object to date format
data['Date'] = pd.to_datetime(data['Date'])
# sort by date so the line plot runs chronologically, then plot
data.sort_values('Date').plot(x='Date', y='Total Amount')
plt.show()
Output:
Plot Diagram
Step 4: Preparing the Data for Prophet
Prophet requires the data to have two columns: ds (date) and y (value to forecast). Additionally, if you want to forecast sales for each product category, you'll need to prepare separate datasets for each category.
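daily_sales = data.groupby('Date')['Total Amount'].sum().reset_index()
daily_sales = daily_sales.rename(columns={'Date': 'ds', 'Total Amount': 'y'})
Here we aggregate the transactions into daily totals and then rename the 'Date' column to 'ds' and the 'Total Amount' column to 'y'. Because rename() returns a new DataFrame instead of modifying the original in place, the result must be assigned back to daily_sales.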
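These preparation steps can be wrapped in a reusable function. A minimal sketch, assuming this dataset's column names; to_prophet_frame and its fill_missing_days option are hypothetical names. Treating a day without transactions as zero sales is a modelling assumption that the article itself does not make (Prophet can also fit a series with missing dates).

```python
import pandas as pd


def to_prophet_frame(df, date_col="Date", value_col="Total Amount", fill_missing_days=False):
    """Aggregate transactions to daily totals in Prophet's ds/y format."""
    out = df[[date_col, value_col]].copy()
    # normalize() drops any time-of-day so transactions are grouped by calendar day
    out[date_col] = pd.to_datetime(out[date_col]).dt.normalize()
    daily = out.groupby(date_col)[value_col].sum()
    if fill_missing_days:
        # Assumption: a day with no transactions means zero sales, not missing data
        full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
        daily = daily.reindex(full_range, fill_value=0)
    # Prophet expects exactly two columns: ds (dates) and y (values)
    return daily.rename_axis("ds").reset_index(name="y")
```

With the default arguments, to_prophet_frame(data) produces the same ds/y frame as the two lines above.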
Step 5: Training the Prophet Model
To train a Prophet model, initialize it with Prophet() and fit it on the prepared data.
# Initialize the model
model = Prophet()
# Fit the model
model.fit(daily_sales)
Here we pass the prepared data to train the prophet model.
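The holiday component described earlier is supplied through the holidays argument of Prophet(), which takes a DataFrame with a holiday name column and a ds date column, plus optional lower_window and upper_window columns that extend each effect to surrounding days. The sketch below builds such a table; make_holidays_frame and the event dates are hypothetical examples, not part of the dataset.

```python
import pandas as pd


def make_holidays_frame(events: dict, lower_window: int = 0, upper_window: int = 0) -> pd.DataFrame:
    """Build a table in the layout Prophet's `holidays` argument expects."""
    rows = [
        {
            "holiday": name,          # name of the event
            "ds": pd.Timestamp(day),  # date of the event
            # lower_window (<= 0) and upper_window (>= 0) extend the effect
            # to that many days before and after the event
            "lower_window": lower_window,
            "upper_window": upper_window,
        }
        for name, days in events.items()
        for day in days
    ]
    return pd.DataFrame(rows, columns=["holiday", "ds", "lower_window", "upper_window"])


# Hypothetical promotional events; replace with dates that matter for the store
holidays = make_holidays_frame(
    {"black_friday": ["2023-11-24"], "christmas": ["2023-12-25"]},
    lower_window=-1,
    upper_window=1,
)
print(holidays)
```

The table would then be passed when constructing the model, for example model = Prophet(holidays=holidays), before calling fit(). For public holidays, Prophet also offers model.add_country_holidays(country_name='US'), which likewise has to be called before fit().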
Step 6: Forecasting with Prophet
The next step is to make predictions with the trained Prophet model.
# Create a dataframe to hold the dates for which we want to make predictions
future = model.make_future_dataframe(periods=365)
# Predict future sales
forecast = model.predict(future)
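Here we create a future dataframe holding the dates we want predictions for and pass it to the predict() method. By default, make_future_dataframe() uses a daily frequency and also includes the historical dates, so the forecast covers the training period plus the next 365 days.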
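To make the role of make_future_dataframe() concrete, the sketch below approximates what it returns with its default arguments (daily frequency, history included): the unique historical dates followed by `periods` new consecutive days. future_dates is a hypothetical re-implementation for illustration, not Prophet's own code.

```python
import pandas as pd


def future_dates(history_ds, periods: int, include_history: bool = True) -> pd.DataFrame:
    """Approximate make_future_dataframe(periods, freq='D', include_history=...)."""
    # unique historical dates in chronological order
    history = pd.to_datetime(pd.Series(history_ds)).drop_duplicates().sort_values()
    last = history.iloc[-1]
    # `periods` consecutive days strictly after the last observed date
    new = pd.date_range(start=last + pd.Timedelta(days=1), periods=periods, freq="D")
    dates = list(history) + list(new) if include_history else list(new)
    return pd.DataFrame({"ds": dates})
```

Because the history is included, predict() returns fitted values for the training period as well as for the future days, which is why the forecast table in Example 1 starts at the first observed date.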
Step 7: Plot the Forecast and the Component
The forecast and its components can be plotted with the code below.
# plot forecast
model.plot(forecast)
#plot forecast component
model.plot_components(forecast)
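model.plot() draws the observed points, the predicted yhat line and a shaded band between yhat_lower and yhat_upper. If a custom chart is needed, the same picture can be drawn from the forecast DataFrame with matplotlib. A minimal sketch: plot_with_interval is a hypothetical helper, and it assumes a history frame with ds/y columns and a forecast frame with ds, yhat, yhat_lower and yhat_upper columns.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_with_interval(history: pd.DataFrame, forecast: pd.DataFrame, title: str = "Sales forecast"):
    """Draw observed y, predicted yhat and the yhat_lower..yhat_upper band."""
    fig, ax = plt.subplots(figsize=(10, 4))
    # observed values as small black dots
    ax.plot(history["ds"], history["y"], "k.", markersize=3, label="observed")
    # point forecast for every date in the forecast frame
    ax.plot(forecast["ds"], forecast["yhat"], color="tab:blue", label="yhat")
    # shaded uncertainty interval (80% by default, set by Prophet's interval_width)
    ax.fill_between(forecast["ds"], forecast["yhat_lower"], forecast["yhat_upper"],
                    color="tab:blue", alpha=0.2, label="uncertainty interval")
    ax.set(title=title, xlabel="Date", ylabel="Sales")
    ax.legend()
    return fig
```

For example, fig = plot_with_interval(daily_sales, forecast, 'Total Sales Forecast') followed by plt.show().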
Example 1: Predicting Total Store Sales Based on Time Series
Let's predict total store sales from the time series.
Import Libraries and Load the Dataset
We can import the necessary libraries and load the dataset.
Python
import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt
# dataset path
file_path = "https://media.geeksforgeeks.org/wp-content/\
uploads/20240704211146/retail_sales_dataset.csv"
# read data to dataframe
data = pd.read_csv(file_path)
data.shape
Output:
(1000, 9)
Preparing the Data for Prophet
Prophet requires the data to have two columns, ds (date) and y (the value to forecast). The code is as follows:
Python
# Convert the 'Date' column to datetime format
data['Date'] = pd.to_datetime(data['Date'])
# Aggregate total sales by date
daily_sales = data.groupby('Date')['Total Amount'].sum().reset_index()
# Rename columns to fit Prophet's requirements
daily_sales = daily_sales.rename(columns={'Date': 'ds', 'Total Amount': 'y'})
# Display the first few rows of the prepared data
daily_sales.head()
Output:
ds y
0 2023-01-01 3600
1 2023-01-02 1765
2 2023-01-03 600
3 2023-01-04 1240
4 2023-01-05 1100
The above code performs the following steps:
- Convert Date to datetime format: Ensures that the 'Date' column is in a format that Prophet can recognize and work with.
- Aggregate total sales by date: Groups the data by 'Date' and sums up the 'Total Amount' for each date to get daily sales.
- Rename columns for Prophet: Renames the columns to 'ds' (date) and 'y' (value), which are the expected column names in Prophet.
- Display the first few rows of the prepared data: Shows the first few rows of the transformed DataFrame to verify the changes.
Training the Prophet Model
To train a Prophet model, initialize it with Prophet() and fit it on the prepared data. The code is as follows:
Python
# Initialize the model
model = Prophet()
# Fit the model
model.fit(daily_sales)
The code does the following:
- Initialize the model: Creates an instance of the Prophet model.
- Fit the model: Trains the Prophet model on the prepared data (daily_sales), which includes the historical sales data.
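Prophet's seasonality settings default to 'auto', meaning each seasonality is enabled only when the history is long and frequent enough. The function below is a simplified restatement of that rule for illustration (default_seasonalities is a hypothetical helper, not part of Prophet's API); the thresholds reflect our understanding of the library's behaviour: yearly needs at least two years of history, weekly at least two weeks of history observed more often than weekly, and daily at least two days of sub-daily data.

```python
import pandas as pd


def default_seasonalities(ds) -> dict:
    """Simplified restatement of how Prophet's 'auto' setting picks seasonalities."""
    ds = pd.to_datetime(pd.Series(ds)).sort_values()
    span = ds.max() - ds.min()
    steps = ds.diff().dropna()
    # smallest non-zero spacing between consecutive observations
    min_step = steps[steps > pd.Timedelta(0)].min()
    return {
        # yearly needs at least two years (730 days) of history
        "yearly": bool(span >= pd.Timedelta(days=730)),
        # weekly needs two weeks of history and observations more often than weekly
        "weekly": bool(span >= pd.Timedelta(weeks=2) and min_step < pd.Timedelta(weeks=1)),
        # daily needs two days of history and sub-daily observations
        "daily": bool(span >= pd.Timedelta(days=2) and min_step < pd.Timedelta(days=1)),
    }
```

If default_seasonalities(daily_sales['ds']) reports that yearly seasonality is off, it can still be forced with Prophet(yearly_seasonality=True), although a single year of data gives little basis for estimating a yearly pattern.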
Forecasting with Prophet
The code for forecasting is as follows:
Python
# Create a dataframe to hold the dates for which we want to make predictions
future = model.make_future_dataframe(periods=365)
# Predict future sales
forecast = model.predict(future)
# display the first 5 rows and the first 6 columns
forecast.head(5).iloc[:,0:6]
Output:
index | ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper |
0 | 2023-01-01 00:00:00 | 1320.24825 | -494.5066309379769 | 2624.543928089163 | 1320.24825 | 1320.24825 |
1 | 2023-01-02 00:00:00 | 1320.2615444335206 | -62.14814861103264 | 2971.5126360266177 | 1320.2615444335206 | 1320.2615444335206 |
2 | 2023-01-03 00:00:00 | 1320.2748388670414 | -141.95277252227478 | 2925.4439832197586 | 1320.2748388670414 | 1320.2748388670414 |
3 | 2023-01-04 00:00:00 | 1320.2881333005616 | -301.57031472493316 | 2821.8639758272507 | 1320.2881333005616 | 1320.2881333005616 |
4 | 2023-01-05 00:00:00 | 1320.3014277340822 | -380.8603476610306 | 2646.1291107103552 | 1320.3014277340822 | 1320.3014277340822 |
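The forecast DataFrame has one row for every date passed to predict(), covering both the historical and the new dates, and many columns beyond the six shown above; yhat is the point forecast and yhat_lower/yhat_upper are the interval bounds. The sketch below, with hypothetical helper names, keeps only the rows after the last observed date and totals the point forecast by calendar month, a common format for planning.

```python
import pandas as pd


def future_rows(forecast: pd.DataFrame, last_observed) -> pd.DataFrame:
    """Rows of a Prophet forecast after the training data, key columns only."""
    cols = ["ds", "yhat", "yhat_lower", "yhat_upper"]
    return forecast.loc[forecast["ds"] > pd.Timestamp(last_observed), cols].reset_index(drop=True)


def monthly_point_forecast(future: pd.DataFrame) -> pd.Series:
    """Total the daily point forecast (yhat) by calendar month."""
    # Only yhat is summed: adding up the daily yhat_lower/yhat_upper values
    # would not give a valid interval for the monthly total.
    return future.set_index("ds")["yhat"].resample("MS").sum()
```

For example, monthly_point_forecast(future_rows(forecast, daily_sales['ds'].max())) gives one predicted sales total per future month.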
Plotting the Forecast
Let's plot the forecast.
Python
# Plot the forecast
fig1 = model.plot(forecast)
plt.title('Total Sales Forecast')
plt.xlabel('Date')
plt.ylabel('Sales')
plt.show()
Output:
Prophet Sales Forecast
The code explanation is as follows:
- Create a dataframe for future dates: Generates a dataframe with future dates for which predictions will be made (365 days into the future).
- Predict future sales: Uses the trained Prophet model to predict future sales based on the generated future dates.
- Plot the forecast: Visualizes the predicted sales, including the historical data and forecasted values, with appropriate titles and labels for clarity.
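The plots show the forecast but do not measure its accuracy. One common way to do so is a chronological holdout: fit on all but the last few weeks and compare yhat with the actual sales on the held-out dates. A minimal sketch with hypothetical helper names; MAE and RMSE are standard error metrics chosen for illustration, not metrics taken from the article.

```python
import numpy as np
import pandas as pd


def holdout_split(df: pd.DataFrame, test_days: int):
    """Chronological split of a ds/y frame: the last `test_days` days are held out."""
    cutoff = df["ds"].max() - pd.Timedelta(days=test_days)
    # split by date, never randomly, so the model is scored only on unseen later dates
    return df[df["ds"] <= cutoff], df[df["ds"] > cutoff]


def forecast_errors(actual: pd.DataFrame, forecast: pd.DataFrame) -> dict:
    """MAE and RMSE of the point forecast yhat against observed y, matched on ds."""
    merged = actual[["ds", "y"]].merge(forecast[["ds", "yhat"]], on="ds", how="inner")
    err = merged["y"] - merged["yhat"]
    return {
        "mae": float(err.abs().mean()),
        "rmse": float(np.sqrt((err ** 2).mean())),
        "n": len(merged),  # number of dates that could be compared
    }
```

On the real data this could be used as train, test = holdout_split(daily_sales, 60), then model = Prophet().fit(train), forecast = model.predict(test[['ds']]) and forecast_errors(test, forecast); the 60-day holdout is an arbitrary example. Prophet also ships rolling-origin evaluation helpers, cross_validation and performance_metrics, in the prophet.diagnostics module.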
Plotting the Forecast Components
The code to plot the forecast components is as follows:
Python
# Plot the forecast components
fig2 = model.plot_components(forecast)
plt.show()
Output:
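Forecast Components
The above code does the following:
- Plot the forecast components: Visualizes the individual components of the forecast, such as the trend and weekly seasonality. A yearly seasonality panel appears only if yearly seasonality was fitted; with Prophet's default settings it is enabled only when the history spans at least two years.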
- Show the plot: Displays the component plots to understand the contributions of each component to the overall forecast.
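The weekly panel of the component plot can also be read numerically. When weekly seasonality is fitted, the forecast DataFrame includes a weekly column holding its contribution for each date; averaging it by weekday gives the typical above- or below-trend effect of each day. A minimal sketch: weekday_effect is a hypothetical name, and the weekly column is assumed to be present, which is only the case when weekly seasonality was enabled.

```python
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def weekday_effect(forecast: pd.DataFrame) -> pd.Series:
    """Average contribution of the weekly seasonality for each weekday."""
    weekly = forecast[["ds", "weekly"]].copy()
    weekly["weekday"] = pd.to_datetime(weekly["ds"]).dt.day_name()
    # In the default additive mode this value is added to the trend, so a positive
    # number means sales above trend on that weekday, in the same units as y.
    return weekly.groupby("weekday")["weekly"].mean().reindex(WEEKDAYS)
```

Calling weekday_effect(forecast) after the prediction step returns one value per weekday, ordered Monday to Sunday.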
Example 2: Forecasting Sales for Each Product Category
In the above example, we forecast total sales based on the date. In this example, we forecast total sales for each product category based on the date. The code is the same as in the previous example except for the data preparation step, which works as follows:
- Identify unique product categories: Retrieves the unique product categories from the dataset.
- Loop through each category:
- Filter data for the current category: Selects the sales data corresponding to the current product category.
- Aggregate sales by date: Groups the data by 'Date' and sums the 'Total Amount' for each date to get daily sales for the category.
- Rename columns for Prophet: Renames the columns to 'ds' (date) and 'y' (value) as required by Prophet.
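The preparation steps above can also be collected into a dictionary of per-category frames, which keeps the prepared data available after the loop. A minimal sketch assuming this dataset's column names ('Date', 'Product Category', 'Total Amount'); category_frames is a hypothetical name and the test data is synthetic.

```python
import pandas as pd


def category_frames(df: pd.DataFrame) -> dict:
    """Split transactions into one Prophet-ready ds/y frame per product category."""
    df = df.assign(Date=pd.to_datetime(df["Date"]))
    frames = {}
    for category, group in df.groupby("Product Category"):
        # daily total sales for this category only
        daily = group.groupby("Date")["Total Amount"].sum().reset_index()
        frames[category] = daily.rename(columns={"Date": "ds", "Total Amount": "y"})
    return frames
```

Each frame can then be used to train its own model, for example by looping over category_frames(data).items() and calling Prophet().fit(frame) for each one.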
The complete code is as follows:
Python
import pandas as pd
from prophet import Prophet
import matplotlib.pyplot as plt
# dataset path
file_path = "https://media.geeksforgeeks.org/wp-content/\
uploads/20240704211146/retail_sales_dataset.csv"
# read data to dataframe
data = pd.read_csv(file_path)
# convert the 'Date' column to datetime format
data['Date'] = pd.to_datetime(data['Date'])
# Forecast sales for each product category
categories = data['Product Category'].unique()
for category in categories:
    # daily total sales for the current category
    category_data = data[
        data['Product Category'] == category].groupby('Date')[
        'Total Amount'].sum().reset_index()
    # rename columns to the 'ds' and 'y' names Prophet expects
    category_data = category_data.rename(
        columns={'Date': 'ds', 'Total Amount': 'y'})
    # train a separate model for this category
    model = Prophet()
    model.fit(category_data)
    # forecast 365 days beyond the last observed date
    future = model.make_future_dataframe(periods=365)
    forecast = model.predict(future)
    # plot the forecast
    fig = model.plot(forecast)
    plt.title(f'Sales Forecast for {category}')
    plt.xlabel('Date')
    plt.ylabel('Sales')
    plt.show()
    # plot the forecast components
    model.plot_components(forecast)
    plt.show()
Output:
Forecasting Sales for Beauty Category (forecast and components)
Forecasting Sales for Clothing Category (forecast and components)
Forecasting Sales for Electronic Category (forecast and components)
The code explanation is as follows:
- Identify unique product categories: Retrieves the unique product categories from the dataset.
- Loop through each category:
- Filter data for the current category: Selects the sales data corresponding to the current product category.
- Aggregate sales by date: Groups the data by 'Date' and sums the 'Total Amount' for each date to get daily sales for the category.
- Rename columns for Prophet: Renames the columns to 'ds' (date) and 'y' (value) as required by Prophet.
- Initialize and train the model: Creates a new Prophet model instance and trains it on the category-specific sales data.
- Create future dataframe and predict: Generates future dates and predicts future sales for the category.
- Plot the forecast: Visualizes the forecasted sales for the category with appropriate titles and labels.
- Plot the forecast components: Visualizes different components of the forecast, such as trend, yearly seasonality, and weekly seasonality, for the category.
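The loop above plots each category's forecast but keeps only the last one in forecast. To compare categories side by side, the forecasts could be stored in dictionaries inside the loop (for example forecasts[category] = forecast and last_observed[category] = category_data['ds'].max()) and stacked afterwards. A minimal sketch; combine_forecasts and both dictionaries are hypothetical names.

```python
import pandas as pd


def combine_forecasts(forecasts: dict, last_observed: dict) -> pd.DataFrame:
    """Stack per-category Prophet forecasts into one long table of future predictions."""
    parts = []
    for category, fc in forecasts.items():
        # keep only dates after that category's own training data
        future = fc.loc[fc["ds"] > last_observed[category], ["ds", "yhat", "yhat_lower", "yhat_upper"]]
        parts.append(future.assign(category=category))
    return pd.concat(parts, ignore_index=True)
```

With combined = combine_forecasts(forecasts, last_observed), the call combined.pivot(index='ds', columns='category', values='yhat') gives one column of predicted sales per category.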
Conclusion
Time Series Forecasting for predicting store sales using Prophet is an effective approach for understanding and forecasting sales trends and seasonality. Prophet, developed by Facebook, is particularly adept at handling time series data with strong seasonal effects and several seasons of historical data.
In our analysis, we applied Prophet to historical store sales data to predict future sales. The model captured the overall trend in sales and the weekly seasonal pattern, and the forecast plots provide clear visualizations of the predicted sales and their uncertainty intervals, which can support strategic decisions such as inventory planning.