Geakmindz Test.ipynb - Colab

The document outlines a project involving the analysis of product delivery timelines using two datasets related to order delivery details and delivery times. It includes tasks such as writing SQL queries to join datasets, performing exploratory data analysis to identify factors impacting delivery time, and building a machine learning model to predict delivery times. The document also emphasizes the importance of presenting findings through a PowerPoint or Google Slides presentation.

Uploaded by RexLex
Copyright © All Rights Reserved

10/28/24, 7:25 PM Geakmindz Test.ipynb - Colab

This dataset describes product delivery timelines for orders (similar to e-commerce delivery). It comprises two CSV files: one with the delivery details of each order and the other with the time it took to deliver the order. The problem statement is: when a customer places an order, how many days does it take to be delivered?

1. Write an SQL query to join both datasets (assuming each CSV file is an SQL table) and obtain order delivery details and timelines in a single table.
2. Perform exploratory analysis on the data and identify at least three aspects that affect delivery time, and by how much.
3. Analyse the dataset and build a machine learning model to predict the "Delivery Time" for orders.
4. Present the results of the EDA and modelling in a PowerPoint presentation or Google Slides deck (the slides can be prepared offline).

Data Dictionary

order-delivery.csv
➢ Order Number – A unique identifier for each order
➢ Product Name – Name of the product
➢ Order Type – New or Additional, indicating whether it is the customer's first order or a repeat order
➢ Product Cost – Cost of the product in $
➢ Cash on Delivery – Whether the order is paid in cash on delivery (i.e., not prepaid)
➢ Product Unavailable Flag – Whether the product is out of stock and must be procured from the manufacturer before it can be delivered
➢ Multi-Mode Transport Flag – Whether delivery requires more than one mode of transport (flight/rail/road/ship)
➢ Speed Delivery Flag – Whether the customer requested faster delivery
➢ Remote Location – Whether the customer is located outside major cities
➢ Multi Hop Delivery – Whether the delivery passes through multiple transit hubs
➢ Product Size – Size of the product

order-delivery-time.csv
➢ Order Number – A unique identifier for each order
➢ Delivery Time – Number of days taken to deliver the product to the customer
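Task 1 asks for SQL, while the notebook itself joins the tables with pandas. A minimal sketch of the SQL version, using Python's built-in sqlite3 module with an in-memory database: the table names order_delivery and order_delivery_time are assumptions mirroring the file names, only a few columns are created, and the sample rows are copied from the head() outputs shown in this notebook.

```python
import sqlite3

# In-memory SQLite database standing in for the two CSV files.
# Table names are assumptions; only a few columns are created for brevity.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE order_delivery ("Order Number" INTEGER PRIMARY KEY, '
    '"Product Name" TEXT, "Order Type" TEXT, "Product Cost" REAL)'
)
conn.execute(
    'CREATE TABLE order_delivery_time ("Order Number" INTEGER PRIMARY KEY, '
    '"Delivery Time" INTEGER)'
)

# Sample rows copied from the head() outputs in this notebook.
conn.executemany(
    "INSERT INTO order_delivery VALUES (?, ?, ?, ?)",
    [
        (2503942, "Product B", "Additional", 375.00),
        (2061728, "Product B", "Additional", 555.51),
        (3564189, "Product A", "New", 637.60),
    ],
)
conn.executemany(
    "INSERT INTO order_delivery_time VALUES (?, ?)",
    [(2503942, 23), (2061728, 114), (3564189, 21)],
)

# The join: every order-detail column plus its delivery time, matched on Order Number.
JOIN_SQL = """
SELECT d.*, t."Delivery Time"
FROM order_delivery AS d
INNER JOIN order_delivery_time AS t
    ON d."Order Number" = t."Order Number"
ORDER BY d."Order Number"
"""
rows = conn.execute(JOIN_SQL).fetchall()
for row in rows:
    print(row)
```

Swapping INNER JOIN for LEFT JOIN would keep orders that have no delivery-time row, with Delivery Time returned as NULL, which is a quick way to spot gaps between the two files.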

!pip install pandas matplotlib seaborn scikit-learn

Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (2.2.2)
Requirement already satisfied: matplotlib in /usr/local/lib/python3.10/dist-packages (3.7.1)
Requirement already satisfied: seaborn in /usr/local/lib/python3.10/dist-packages (0.13.2)
Requirement already satisfied: scikit-learn in /usr/local/lib/python3.10/dist-packages (1.5.2)
Requirement already satisfied: numpy>=1.22.4 in /usr/local/lib/python3.10/dist-packages (from pandas) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas) (2.8.2
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas) (2024.2)
Requirement already satisfied: tzdata>=2022.7 in /usr/local/lib/python3.10/dist-packages (from pandas) (2024.2)
Requirement already satisfied: contourpy>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.3.0)
Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (4.54.1
Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (1.4.7
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (24.1)
Requirement already satisfied: pillow>=6.2.0 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (10.4.0)
Requirement already satisfied: pyparsing>=2.3.1 in /usr/local/lib/python3.10/dist-packages (from matplotlib) (3.2.0)
Requirement already satisfied: scipy>=1.6.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.13.1)
Requirement already satisfied: joblib>=1.2.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (1.4.2)
Requirement already satisfied: threadpoolctl>=3.1.0 in /usr/local/lib/python3.10/dist-packages (from scikit-learn) (3
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.2->pand

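import pandas as pd

order_delivery = pd.read_csv('order-delivery.csv')
# The delivery-time table is read from an Excel file here; if you have the
# CSV named in the data dictionary, use pd.read_csv('order-delivery-time.csv').
order_delivery_time = pd.read_excel('order-delivery-time.xlsx')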

order_delivery.head()

   Order Number  Product Name  Order Type  Product Cost  Cash on Delivery  Product Unavailable Flag  Multi Mode TraNosport Flag  Speed Delivery Flag  Remote Location  Multi Hop Delivery  Product Size
0  2503942       Product B     Additional  375.00        Y                 N                         Y                           N                    N                N                   Large
1  2061728       Product B     Additional  555.51        N                 Y                         Y                           N                    N                Y                   Large
2  2545860       Product B     Additional  2166.31       N                 N                         Y                           N                    N                N                   Small
3  3564189       Product A     New         637.60        Y                 N                         Y                           N                    N                N                   Medium

order_delivery_time.head()

https://colab.research.google.com/drive/1K4g8IbDl0rnRW87jT9c_Z7b6LSGExwdX?authuser=0#scrollTo=qo6VRbtlKPh8&uniqifier=1&printMode=true 1/8

   Order Number  Delivery Time
0       2503942             23
1       2061728            114
2       2545860             46
3       3564189             21
4       2335870             92


# Inner join (the pd.merge default) on Order Number, the pandas equivalent of an SQL INNER JOIN
df = pd.merge(order_delivery, order_delivery_time, on='Order Number')

df.head()

   Order Number  Product Name  Order Type  Product Cost  Cash on Delivery  Product Unavailable Flag  Multi Mode TraNosport Flag  Speed Delivery Flag  Remote Location  Multi Hop Delivery  Product Size  Delivery Time
0  2503942       Product B     Additional  375.00        Y                 N                         Y                           N                    N                N                   Large         23
1  2061728       Product B     Additional  555.51        N                 Y                         Y                           N                    N                Y                   Large         114
2  2545860       Product B     Additional  2166.31       N                 N                         Y                           N                    N                N                   Small         46
3  3564189       Product A     New         637.60        Y                 N                         Y                           N                    N                N                   Medium        21
4  2335870       Product A     New         1497.11       Y                 Y                         Y                           N                    N                Y                   Large         92
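pd.merge performs an inner join by default, so any order missing from either file would be dropped silently. One way to confirm that nothing is lost is to repeat the merge as an outer join with validate="one_to_one" (which raises a MergeError if Order Number is duplicated on either side) and indicator=True (which labels each row "both", "left_only" or "right_only"). The helper name check_join and the toy frames below are hypothetical.

```python
import pandas as pd


def check_join(left: pd.DataFrame, right: pd.DataFrame, key: str = "Order Number") -> pd.Series:
    """Outer-join two tables on `key` and count how the rows matched.

    validate="one_to_one" raises pandas.errors.MergeError if `key` is duplicated
    on either side; indicator=True adds a "_merge" column whose values are
    "both", "left_only" or "right_only".
    """
    merged = pd.merge(left, right, on=key, how="outer", validate="one_to_one", indicator=True)
    return merged["_merge"].value_counts()


# Toy frames (made-up values): order 1 has no delivery time, order 4 has no details.
toy_details = pd.DataFrame({"Order Number": [1, 2, 3], "Product Cost": [10.0, 20.0, 30.0]})
toy_times = pd.DataFrame({"Order Number": [2, 3, 4], "Delivery Time": [5, 6, 7]})
counts = check_join(toy_details, toy_times)
print(counts)
```

On the notebook data this would be check_join(order_delivery, order_delivery_time); zero "left_only" and "right_only" rows would mean the inner join kept every order.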

STEP 2: EXPLORATORY DATA ANALYSIS

import matplotlib.pyplot as plt
import seaborn as sns

df.describe()  # Summary statistics for the numeric columns

       Order Number  Product Cost  Delivery Time
count  7.807000e+03   7807.000000    7807.000000
mean   2.828361e+06   1198.549240      89.548610
std    7.311594e+05   1248.550426     103.910767
min    1.013417e+06     28.800000      13.000000
25%    2.316644e+06    521.445000      32.000000
50%    2.823562e+06    868.160000      56.000000
75%    3.408601e+06   1408.410000     104.000000
max    4.303141e+06  24464.580000     952.000000
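These statistics show a mean delivery time of about 89.5 days against a median of 56 days and a maximum of 952, which points to a strongly right-skewed target. A minimal sketch for checking this, assuming a log1p transform is worth considering (the notebook itself does not transform the target); the function name skew_report and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def skew_report(s: pd.Series) -> dict:
    """Compare centre and skewness of a positive series before and after log1p."""
    logged = np.log1p(s)  # log(1 + x), safe for zero values
    return {
        "mean": s.mean(),
        "median": s.median(),
        "skew_raw": s.skew(),
        "skew_log1p": logged.skew(),
    }


# Toy series (made-up values) with a long right tail; on the notebook data
# you would call skew_report(df["Delivery Time"]).
toy = pd.Series([13, 20, 30, 40, 56, 60, 80, 104, 300, 952])
report = skew_report(toy)
print(report)
```

A mean well above the median, together with a skewness that drops sharply after log1p, suggests that modelling the log of delivery time (and converting predictions back with np.expm1) may be worth trying.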

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7807 entries, 0 to 7806
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order Number 7807 non-null int64
1 Product Name 7807 non-null object
2 Order Type 7807 non-null object
3 Product Cost 7807 non-null float64

4 Cash on Delivery 7807 non-null object
5 Product Unavailable Flag 7807 non-null object
6 Multi Mode TraNosport Flag 7807 non-null object
7 Speed Delivery Flag 7807 non-null object
8 Remote Location 7807 non-null object
9 Multi Hop Delivery 7807 non-null object
10 Product Size 7807 non-null object
11 Delivery Time 7807 non-null int64
dtypes: float64(1), int64(2), object(9)
memory usage: 732.0+ KB

print(df.isnull().sum()) #Check for missing values

Order Number 0
Product Name 0
Order Type 0
Product Cost 0
Cash on Delivery 0
Product Unavailable Flag 0
Multi Mode TraNosport Flag 0
Speed Delivery Flag 0
Remote Location 0
Multi Hop Delivery 0
Product Size 0
Delivery Time 0
dtype: int64

# Box plot to see the impact of Remote Location on Delivery Time
plt.figure(figsize=(10, 6))
sns.boxplot(x='Remote Location', y='Delivery Time', data=df)
plt.title('Impact of Remote Location on Delivery Time')
plt.xlabel('Remote Location')
plt.ylabel('Delivery Time (Days)')
plt.show()

# Mean delivery time for remote and non-remote locations
print('Mean Delivery Time (Remote location)')
print(df.groupby('Remote Location')['Delivery Time'].mean())

Mean Delivery Time (Remote location)
Remote Location
N     80.334048
Y    168.710074
Name: Delivery Time, dtype: float64


The mean delivery time is about 168.7 days for remote locations versus about 80.3 days elsewhere, roughly twice as long, so a remote location is associated with substantially longer delivery times.
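To put a number on "how much", the two group means can be compared as a difference and a ratio. With the means printed above, 168.710074 - 80.334048 ≈ 88.4 days and 168.710074 / 80.334048 ≈ 2.1. A small helper that computes the same figures for any Y/N flag might look like this; flag_effect and the toy frame are hypothetical.

```python
import pandas as pd


def flag_effect(data: pd.DataFrame, flag: str, target: str = "Delivery Time") -> dict:
    """Mean target for flag == "N" vs flag == "Y", as a difference and a ratio."""
    means = data.groupby(flag)[target].mean()
    return {
        "mean_N": means["N"],
        "mean_Y": means["Y"],
        "diff_days": means["Y"] - means["N"],  # extra days when the flag is set
        "ratio": means["Y"] / means["N"],      # multiplicative effect
    }


# Toy frame (made-up values), not the notebook data.
toy = pd.DataFrame({
    "Remote Location": ["N", "N", "Y", "Y"],
    "Delivery Time": [10, 30, 40, 60],
})
effect = flag_effect(toy, "Remote Location")
print(effect)
```

Calling flag_effect(df, 'Remote Location'), flag_effect(df, 'Speed Delivery Flag') and flag_effect(df, 'Multi Hop Delivery') would put the three comparisons in this section side by side.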

# Box plot to see the impact of 'Speed Delivery Flag' on 'Delivery Time'
plt.figure(figsize=(10, 6))
sns.boxplot(x='Speed Delivery Flag', y='Delivery Time', data=df)
plt.title('Impact of Speed Delivery Flag on Delivery Time')
plt.xlabel('Speed Delivery Flag')
plt.ylabel('Delivery Time (Days)')
plt.show()

# Mean delivery time for 'Speed Delivery Flag'
print('Mean Delivery Time (Speed Delivery Flag)')
print(df.groupby('Speed Delivery Flag')['Delivery Time'].mean())

Mean Delivery Time (Speed Delivery Flag)
Speed Delivery Flag
N    91.390414
Y    61.681818
Name: Delivery Time, dtype: float64

Orders with a speed delivery request average about 61.7 days versus about 91.4 days without one, roughly 30 days (about a third) shorter, which suggests that speed delivery reduces the time taken for delivery.
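A lower mean alone does not show that the gap is more than noise. Because delivery times are heavily skewed, a rank-based test such as the Mann-Whitney U test (scipy.stats.mannwhitneyu) is one reasonable check; this is an added suggestion rather than part of the original analysis, and flag_test and the toy data below are hypothetical.

```python
import pandas as pd
from scipy.stats import mannwhitneyu


def flag_test(data: pd.DataFrame, flag: str, target: str = "Delivery Time"):
    """Two-sided Mann-Whitney U test of target for flag == "Y" vs flag == "N"."""
    yes = data.loc[data[flag] == "Y", target]
    no = data.loc[data[flag] == "N", target]
    result = mannwhitneyu(yes, no, alternative="two-sided")
    return result.statistic, result.pvalue


# Toy data (made-up values): every "Y" order is faster than every "N" order.
toy = pd.DataFrame({
    "Speed Delivery Flag": ["Y"] * 6 + ["N"] * 6,
    "Delivery Time": [20, 22, 25, 27, 30, 33, 60, 65, 70, 80, 90, 100],
})
stat, p = flag_test(toy, "Speed Delivery Flag")
print(stat, p)
```

On the real data, flag_test(df, 'Speed Delivery Flag') returns the U statistic and p-value; with several thousand orders even small differences can come out significant, so the p-value is best read alongside the size of the gap.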

# Box plot to see the impact of 'Multihop' delivery on 'Delivery Time'
plt.figure(figsize=(10, 6))
sns.boxplot(x='Multi Hop Delivery', y='Delivery Time', data=df)
plt.title('Impact of Multihop Delivery on Delivery Time')
plt.xlabel('Multihop Delivery')
plt.ylabel('Delivery Time (Days)')
plt.show()

# Mean delivery time for single-hop vs multihop deliveries
print('Mean Delivery Time (Multihop Delivery)')
print(df.groupby('Multi Hop Delivery')['Delivery Time'].mean())


Mean Delivery Time (Multihop Delivery)
Multi Hop Delivery
N     71.881941
Y    105.546497
Name: Delivery Time, dtype: float64

Multihop deliveries average about 105.5 days versus about 71.9 days for single-hop deliveries, roughly 34 days (about 47%) longer, which suggests that routing through multiple transit hubs leads to longer delivery times.
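Remote location, speed delivery and multihop delivery may overlap (a remote order may also be routed through several hubs), so looking at one flag at a time can attribute one flag's effect to another. A sketch of a two-way breakdown that also reports the median, which is less affected by the long tail of delivery times; two_flag_summary and the toy rows are hypothetical.

```python
import pandas as pd


def two_flag_summary(data: pd.DataFrame, flag_a: str, flag_b: str,
                     target: str = "Delivery Time") -> pd.DataFrame:
    """Count, mean and median of the target for every combination of two flags."""
    return data.groupby([flag_a, flag_b])[target].agg(["count", "mean", "median"])


# Toy rows (made-up values), not the notebook data.
toy = pd.DataFrame({
    "Remote Location":    ["N", "N", "Y", "Y", "N", "Y"],
    "Multi Hop Delivery": ["N", "Y", "N", "Y", "N", "Y"],
    "Delivery Time":      [20, 40, 60, 120, 30, 100],
})
summary = two_flag_summary(toy, "Remote Location", "Multi Hop Delivery")
print(summary)
```

If the multihop gap persists within both the remote and non-remote rows, that supports treating it as a separate factor rather than a side effect of location.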

Correlation Analysis

# Correlation matrix and heatmap (numeric columns only)
numerical_df = df.select_dtypes(include=['number'])
correlation_matrix = numerical_df.corr()
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.title('Correlation Matrix')
plt.show()
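select_dtypes(include=['number']) keeps only Order Number, Product Cost and Delivery Time, so the heatmap includes an identifier column and none of the Y/N flags. A minimal sketch that encodes each Y/N column as 1/0 and correlates it with Delivery Time (Pearson correlation on a 0/1 variable is the point-biserial correlation); flag_correlations and the toy frame are hypothetical, and a column is treated as a flag only if every value is "Y" or "N".

```python
import pandas as pd


def flag_correlations(data: pd.DataFrame, target: str = "Delivery Time") -> pd.Series:
    """Pearson correlation of each Y/N column (encoded Y=1, N=0) with the target."""
    flag_cols = [
        c for c in data.columns
        if data[c].dtype == object and set(data[c].unique()) <= {"Y", "N"}
    ]
    encoded = data[flag_cols].apply(lambda col: col.map({"Y": 1, "N": 0}))
    return encoded.corrwith(data[target]).sort_values(ascending=False)


# Toy frame (made-up values), not the notebook data.
toy = pd.DataFrame({
    "Remote Location": ["N", "N", "Y", "Y"],
    "Speed Delivery Flag": ["Y", "Y", "N", "N"],
    "Delivery Time": [10, 20, 80, 90],
})
corr = flag_correlations(toy)
print(corr)
```

Columns such as Order Type and Product Size are skipped by the Y/N check; they could be one-hot encoded separately if needed.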


Perform exploratory analysis on the data and identify at least three aspects that affect delivery time, and by how much.

# Analysing the impact of Product Cost on Delivery Time
plt.figure(figsize=(10, 6))
sns.scatterplot(x='Product Cost', y='Delivery Time', data=df)
plt.title('Product Cost vs Delivery Time')
plt.xlabel('Product Cost ($)')
plt.ylabel('Delivery Time (Days)')
plt.show()
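Product Cost reaches 24,464.58 while its 75th percentile is 1,408.41, so a handful of expensive orders can dominate a Pearson correlation. Spearman's rank correlation is less sensitive to such outliers and captures any monotonic relationship; the sketch below computes both for comparison. The helper name and the toy values are hypothetical.

```python
import pandas as pd


def cost_delivery_correlation(data: pd.DataFrame, x: str = "Product Cost",
                              y: str = "Delivery Time") -> dict:
    """Pearson (linear) and Spearman (rank-based) correlation between two columns."""
    return {
        "pearson": data[x].corr(data[y], method="pearson"),
        "spearman": data[x].corr(data[y], method="spearman"),
    }


# Toy data (made-up values): monotonic relationship with one extreme cost.
toy = pd.DataFrame({
    "Product Cost": [100, 200, 300, 400, 20000],
    "Delivery Time": [10, 20, 30, 40, 50],
})
result = cost_delivery_correlation(toy)
print(result)
```

A Spearman value clearly larger than the Pearson value on the real data would suggest a monotonic but non-linear link, which a log-scaled x-axis on the scatter plot would also help reveal.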


Analyse the dataset and build a Machine Learning model to predict the “Delivery Time” for orders.

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

X = df.drop(['Order Number', 'Delivery Time', 'Product Name'], axis=1)
y = df['Delivery Time']

# All remaining text columns (Order Type, the Y/N flags and Product Size) are categorical
categorical_features = X.select_dtypes(include=['object']).columns
numerical_features = X.select_dtypes(include=['int64', 'float64']).columns

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(drop='first'), categorical_features),
        ('num', StandardScaler(), numerical_features)
    ],
    remainder='passthrough'
)

# Make an 80-20 train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

We are using a Random Forest regressor as our machine learning model.
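The notebook imports both LinearRegression and RandomForestRegressor. One way to back the choice of a random forest is to compare the two with k-fold cross-validation on the same preprocessing; the sketch below reports the mean absolute error for each. The function compare_models and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def compare_models(preprocessor, X, y, cv=5):
    """Mean cross-validated MAE (in target units) for a linear and a forest model."""
    candidates = {
        "linear_regression": LinearRegression(),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
    }
    scores = {}
    for name, regressor in candidates.items():
        pipe = Pipeline(steps=[("preprocessor", preprocessor), ("regressor", regressor)])
        # cross_val_score clones the pipeline for each fold; scikit-learn
        # maximises scores, so MAE comes back negated.
        neg_mae = cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_absolute_error")
        scores[name] = -neg_mae.mean()
    return scores


# Toy data (made up): delivery time jumps by 80 days for remote orders.
rng = np.random.default_rng(0)
n = 200
toy_X = pd.DataFrame({
    "Remote Location": rng.choice(["Y", "N"], n),
    "Product Cost": rng.uniform(50, 2000, n),
})
toy_y = 50 + 80 * (toy_X["Remote Location"] == "Y") + rng.normal(0, 5, n)
toy_pre = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(drop="first"), ["Remote Location"])],
    remainder="passthrough",
)
scores = compare_models(toy_pre, toy_X, toy_y)
print(scores)
```

In the notebook this would be called as compare_models(preprocessor, X_train, y_train), so that the test split stays untouched until the final evaluation.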

# Pipeline: preprocessing followed by a Random Forest regressor
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('regressor', RandomForestRegressor(n_estimators=100, random_state=42))
])

pipeline.fit(X_train, y_train)
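The metrics mean_absolute_error, mean_squared_error and r2_score are imported above, but the printout ends at pipeline.fit. A minimal sketch for scoring a fitted model on the held-out 20%, plus sklearn.inspection.permutation_importance to estimate how much each original input column contributes, which also speaks to the "how much" in task 2. The function evaluate and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(model, X_test, y_test, n_repeats=10, random_state=42):
    """Score a fitted regressor on held-out data and rank its input columns."""
    pred = model.predict(X_test)
    metrics = {
        "MAE": mean_absolute_error(y_test, pred),
        # Square root taken explicitly; the `squared` argument of
        # mean_squared_error was deprecated in scikit-learn 1.4.
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": r2_score(y_test, pred),
    }
    # Shuffle one input column at a time and record the mean drop in the
    # model's default score (R^2 for regressors).
    perm = permutation_importance(
        model, X_test, y_test, n_repeats=n_repeats, random_state=random_state
    )
    importance = pd.Series(perm.importances_mean, index=X_test.columns)
    return metrics, importance.sort_values(ascending=False)


# Toy check (made-up data): only "signal" drives the target.
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({"signal": rng.uniform(0, 10, 100), "noise": rng.uniform(0, 10, 100)})
toy_y = 3 * toy_X["signal"] + 5
toy_model = LinearRegression().fit(toy_X, toy_y)
metrics, importance = evaluate(toy_model, toy_X, toy_y)
print(metrics)
print(importance)
```

With the notebook's objects this would be metrics, importance = evaluate(pipeline, X_test, y_test); because the pipeline receives the raw columns, the importances are reported per original column rather than per one-hot encoded column.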