Geakmindz Test.ipynb - Colab
This dataset is about product delivery timelines of orders (similar to e-commerce delivery). It comprises two CSV files: one with the details of each order delivery and one with the time it took to deliver the order. The problem statement is: when a customer places an order, how many days does it take to be delivered?
1. Write an SQL query to join both datasets (assuming each CSV file is an SQL table) and obtain order delivery details and timelines in a single table.
2. Perform exploratory analysis on the data and identify at least 3 aspects that impact delivery time, and quantify how much each one impacts it.
3. Analyse the dataset and build a machine learning model to predict the “Delivery Time” for orders.
4. Present the results of EDA and modelling in a PowerPoint presentation/Google Slides. (Slide preparation can be done offline.)
Data Dictionary
order-delivery.csv
➢ Order Number – A unique identifier for each order.
➢ Product Name – Name of the product.
➢ Order Type – New or Additional, indicating whether it is a first-time order or not.
➢ Product Cost – Cost of the product in $.
➢ Cash on Delivery – Whether the order is paid on delivery or prepaid.
➢ Product Unavailable Flag – Whether the product is available in stock or must be procured from the manufacturer before delivery to the customer.
➢ Multi-Mode Transport Flag – Whether delivery requires more than one mode of transport (flight/rail/road/ship).
➢ Speed Delivery Flag – Whether the customer has requested faster delivery.
➢ Remote Location – Whether the customer location is outside major cities.
➢ Multi Hop Delivery – Whether the delivery involves multiple transit hubs.
➢ Product Size – Size of the product.
order-delivery-time.csv
➢ Order Number – A unique identifier for each order.
➢ Delivery Time – Number of days taken to deliver the product to the customer.
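For task 1, the join can be sketched as follows. This is a minimal illustration, not the query submitted with the notebook: the table names `order_delivery` and `order_delivery_time` are assumptions (hyphens in the file names are not valid in unquoted SQL identifiers), only a few of the columns are included, and the three rows are taken from the `head()` outputs in this notebook. SQLite is used in memory only so that the query can be run end to end.

```python
import sqlite3

# In-memory database standing in for the two CSV files (table names are assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE order_delivery ('
    '"Order Number" INTEGER PRIMARY KEY, "Order Type" TEXT, "Product Size" TEXT)'
)
conn.execute(
    'CREATE TABLE order_delivery_time ('
    '"Order Number" INTEGER PRIMARY KEY, "Delivery Time" INTEGER)'
)
conn.executemany(
    "INSERT INTO order_delivery VALUES (?, ?, ?)",
    [
        (2503942, "Additional", "Large"),
        (2061728, "Additional", "Large"),
        (2545860, "Additional", "Small"),
    ],
)
conn.executemany(
    "INSERT INTO order_delivery_time VALUES (?, ?)",
    [(2503942, 23), (2061728, 114), (2545860, 46)],
)

# Column names contain spaces, so they are double-quoted in the query.
JOIN_QUERY = """
SELECT od.*, odt."Delivery Time"
FROM order_delivery AS od
INNER JOIN order_delivery_time AS odt
    ON od."Order Number" = odt."Order Number"
ORDER BY od."Order Number"
"""

rows = conn.execute(JOIN_QUERY).fetchall()
conn.close()

for row in rows:
    print(row)
```

An inner join keeps only orders present in both tables; a `LEFT JOIN` from `order_delivery` would instead keep orders that have no recorded delivery time.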
import pandas as pd
order_delivery = pd.read_csv('order-delivery.csv')
order_delivery_time = pd.read_excel('order-delivery-time.xlsx')
order_delivery.head()
   Order Number Product Name  Order Type  Product Cost Cash on Delivery Product Unavailable Flag Multi Mode TraNosport Flag Speed Delivery Flag Remote Location Multi Hop Delivery Product Size
0       2503942    Product B  Additional        375.00                Y                        N                          Y                   N               N                  N        Large
1       2061728    Product B  Additional        555.51                N                        Y                          Y                   N               N                  Y        Large
2       2545860    Product B  Additional       2166.31                N                        N                          Y                   N               N                  N        Small
3       3564189    Product A         New        637.60                Y                        N                          Y                   N               N                  N       Medium
order_delivery_time.head()
https://colab.research.google.com/drive/1K4g8IbDl0rnRW87jT9c_Z7b6LSGExwdX?authuser=0#scrollTo=qo6VRbtlKPh8&uniqifier=1&printMode=true 1/8
10/28/24, 7:25 PM Geakmindz Test.ipynb - Colab
   Order Number  Delivery Time
0       2503942             23
1       2061728            114
2       2545860             46
3       3564189             21
4       2335870             92
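The cells that follow work on a single frame `df` with 12 columns, but the cell that builds it is not shown. One way to obtain such a frame is a pandas merge on `Order Number`; the sketch below assumes that key and an inner join, and uses three rows copied from the outputs above with only a subset of the columns.

```python
import pandas as pd

# Small stand-ins for the two loaded frames (subset of columns, rows from the outputs above).
order_delivery = pd.DataFrame({
    "Order Number": [2503942, 2061728, 2545860],
    "Order Type": ["Additional", "Additional", "Additional"],
    "Product Size": ["Large", "Large", "Small"],
})
order_delivery_time = pd.DataFrame({
    "Order Number": [2545860, 2503942, 2061728],
    "Delivery Time": [46, 23, 114],
})

# Assumed join: one row per order in each frame, matched on "Order Number".
# validate="one_to_one" raises if either side has duplicate order numbers.
df = pd.merge(
    order_delivery,
    order_delivery_time,
    on="Order Number",
    how="inner",
    validate="one_to_one",
)
print(df)
```

If the row count of `df` is lower than that of `order_delivery`, some orders have no matching delivery time and were dropped by the inner join.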
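   Order Number Product Name  Order Type  Product Cost Cash on Delivery Product Unavailable Flag Multi Mode TraNosport Flag Speed Delivery Flag Remote Location Multi Hop Delivery Product Size  Delivery Time
0       2503942    Product B  Additional        375.00                Y                        N                          Y                   N               N                  N        Large             23
1       2061728    Product B  Additional        555.51                N                        Y                          Y                   N               N                  Y        Large            114
2       2545860    Product B  Additional       2166.31                N                        N                          Y                   N               N                  N        Small             46
3       3564189    Product A         New        637.60                Y                        N                          Y                   N               N                  N       Medium             21
4       2335870    Product A         New       1497.11                Y                        Y                          Y                   N               N                  Y        Large             92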
df.describe() #Statistics
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7807 entries, 0 to 7806
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Order Number 7807 non-null int64
1 Product Name 7807 non-null object
2 Order Type 7807 non-null object
3 Product Cost 7807 non-null float64
4 Cash on Delivery 7807 non-null object
5 Product Unavailable Flag 7807 non-null object
6 Multi Mode TraNosport Flag 7807 non-null object
7 Speed Delivery Flag 7807 non-null object
8 Remote Location 7807 non-null object
9 Multi Hop Delivery 7807 non-null object
10 Product Size 7807 non-null object
11 Delivery Time 7807 non-null int64
dtypes: float64(1), int64(2), object(9)
memory usage: 732.0+ KB
Order Number 0
Product Name 0
Order Type 0
Product Cost 0
Cash on Delivery 0
Product Unavailable Flag 0
Multi Mode TraNosport Flag 0
Speed Delivery Flag 0
Remote Location 0
Multi Hop Delivery 0
Product Size 0
Delivery Time 0
dtype: int64
If the mean delivery time is noticeably higher for remote locations, it indicates that a remote location increases delivery time.
If orders with speed delivery requests have a lower mean delivery time, it suggests that speed delivery reduces the time taken for delivery.
If the mean delivery time is significantly higher for multi-hop deliveries, it indicates that multiple transit hubs lead to longer delivery times.
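The three comparisons above all reduce to the same computation: group by a Y/N flag and compare the mean delivery time of the two groups. A minimal sketch is given below. The helper `mean_gap` is a hypothetical name introduced here, and the six rows are made-up toy values used only to make the example runnable; they are not results from the dataset.

```python
import pandas as pd

# Toy rows standing in for the merged frame `df` (illustrative values only).
df = pd.DataFrame({
    "Remote Location": ["Y", "N", "Y", "N", "N", "Y"],
    "Speed Delivery Flag": ["N", "Y", "N", "N", "Y", "N"],
    "Multi Hop Delivery": ["Y", "N", "N", "N", "Y", "Y"],
    "Delivery Time": [90, 20, 60, 30, 40, 110],
})


def mean_gap(frame, flag, target="Delivery Time"):
    """Return mean(target | flag == 'Y') minus mean(target | flag == 'N')."""
    means = frame.groupby(flag)[target].mean()
    return means["Y"] - means["N"]


flags = ["Remote Location", "Speed Delivery Flag", "Multi Hop Delivery"]
gaps = {flag: mean_gap(df, flag) for flag in flags}

for flag, gap in gaps.items():
    # A positive gap means orders with the flag set take longer on average.
    print(f"{flag}: {gap:+.1f} days")
```

The gap in days gives a direct answer to "how much does it impact", although a difference in group means alone does not separate the effect of one flag from the others.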
Correlation Analysis
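One possible way to run a correlation analysis on this kind of data is to encode each Y/N flag as 1/0 and compute the Pearson correlation of every column with `Delivery Time`. The sketch below assumes that encoding; the toy frame is made up for illustration and is not the real dataset.

```python
import pandas as pd

# Toy rows standing in for the merged frame `df` (illustrative values only).
df = pd.DataFrame({
    "Product Cost": [375.00, 555.51, 2166.31, 637.60, 1497.11, 800.00],
    "Remote Location": ["Y", "N", "Y", "N", "N", "Y"],
    "Speed Delivery Flag": ["N", "Y", "N", "N", "Y", "N"],
    "Multi Hop Delivery": ["Y", "N", "N", "N", "Y", "Y"],
    "Delivery Time": [90, 20, 60, 30, 40, 110],
})

flag_columns = ["Remote Location", "Speed Delivery Flag", "Multi Hop Delivery"]

# Encode Y/N flags as 1/0 so they can enter a numeric correlation matrix.
encoded = df.copy()
for col in flag_columns:
    encoded[col] = encoded[col].map({"Y": 1, "N": 0})

# Pearson correlation of each feature with the target, strongest positive first.
correlations = (
    encoded.corr()["Delivery Time"]
    .drop("Delivery Time")
    .sort_values(ascending=False)
)
print(correlations)
```

Columns with more than two categories, such as `Product Size`, would need a different encoding (for example one-hot) before they can be included.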
Perform exploratory analysis on the data and identify at least 3 aspects that impact delivery time, and quantify how much each one impacts it.
Analyse the dataset and build a Machine Learning model to predict the “Delivery Time” for orders.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
categorical_features = ['Order Type','Cash on Delivery','Product Unavailable Flag','Multi Mode TraNosport Flag','Speed Del
numerical_features = X.select_dtypes(include=['int64', 'float64']).columns
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(drop='first'), categorical_features),
        ('num', StandardScaler(), numerical_features)
    ],
    remainder='passthrough'
)
We use a random forest regressor as the machine learning model.
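The cells that define `X`, `y`, `pipeline`, `X_train` and `y_train` are not shown, so the sketch below illustrates one way they could be set up. Everything beyond the `ColumnTransformer` layout is an assumption: the synthetic frame and its invented relationship between flags and delivery time, the explicit feature lists, the step name `regressor`, the 80/20 split, and the `RandomForestRegressor` settings.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the merged frame `df`; not the real dataset.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "Order Type": rng.choice(["New", "Additional"], n),
    "Product Cost": rng.uniform(100, 2500, n).round(2),
    "Remote Location": rng.choice(["Y", "N"], n),
    "Multi Hop Delivery": rng.choice(["Y", "N"], n),
    "Product Size": rng.choice(["Small", "Medium", "Large"], n),
})
# Invented relationship so the model has a signal to learn.
df["Delivery Time"] = (
    20
    + 30 * (df["Remote Location"] == "Y")
    + 40 * (df["Multi Hop Delivery"] == "Y")
    + rng.integers(0, 10, n)
)

X = df.drop(columns=["Delivery Time"])
y = df["Delivery Time"]

# Feature lists written out explicitly for this sketch (assumed, not the original lists).
categorical_features = ["Order Type", "Remote Location", "Multi Hop Delivery", "Product Size"]
numerical_features = ["Product Cost"]

preprocessor = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(drop="first"), categorical_features),
        ("num", StandardScaler(), numerical_features),
    ],
    remainder="passthrough",
)

# Preprocessing and model chained so both are fitted on the training split only.
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("regressor", RandomForestRegressor(n_estimators=100, random_state=42)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

Tree ensembles do not need scaled inputs, so the `StandardScaler` step is harmless here rather than required; it matters only if the regressor is later swapped for a scale-sensitive model.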
pipeline.fit(X_train, y_train)
Pipeline
  preprocessor: ColumnTransformer
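Once the pipeline is fitted, its predictions on held-out orders can be scored with standard regression metrics. The sketch below is an assumption about how that might be done, since no evaluation cell is shown. The helper `regression_report` is a hypothetical name; `y_true` reuses the five delivery times from the output above, while `y_pred` is made up purely to exercise the function and is not model output.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MAE, RMSE (both in days) and R^2 for a set of predictions."""
    mae = mean_absolute_error(y_true, y_pred)
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    r2 = r2_score(y_true, y_pred)
    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# y_true: delivery times shown earlier; y_pred: hypothetical predictions.
y_true = np.array([23, 114, 46, 21, 92])
y_pred = np.array([25, 110, 50, 20, 90])

report = regression_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

In practice the function would be called as `regression_report(y_test, pipeline.predict(X_test))`, assuming a held-out split named `X_test`, `y_test` exists.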