SMDM project
Table of Contents
1. Data Overview
1.1 Context
1.2 Objective
1.3 Data Dictionary
1.4 Data Structure and Statistical Summary
1.5 Observations and Insights
2. Univariate Analysis
2.1 Explore all the variables and provide observations on their distributions
2.1.1 Cuisine type
2.1.2 Cost of the order
2.1.3 Day of the week
2.1.4 Rating
2.1.5 Food Preparation Time
2.1.6 Delivery time
2.1.7 Top 5 restaurants in terms of the number of orders received
2.1.9 Percentage of the orders cost more than 20 dollars
2.1.10 Mean order delivery time
2.1.11 The company has decided to give 20% discount vouchers to the top 3 most frequent
customers. Find the IDs of these customers and the number of orders they placed
2.2 Observations and Insights:
3. Multivariate Analysis
3.1 Perform a multivariate analysis to explore relationships between the important variables in the
dataset. (It is a good idea to explore relations between numerical variables as well as relations
between numerical and categorical variables)
3.1.1 Cuisine Vs Cost of the order
3.1.2 Cuisine Vs Food preparation time
3.1.3 Day of the week and Delivery time
3.1.4 Observations on the revenue generated by restaurants
3.1.5 Rating Vs Delivery time
3.1.6 Rating Vs Food preparation time
3.1.7 Rating Vs Cost of the order
3.1.8 Correlation among variables
3.2 Observations and Insights:
4. The company wants to provide a promotional offer in the advertisement of the restaurants. The
condition to get the offer is that the restaurants must have a rating count of more than 50 and the
average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the
promotional offer
5. The company charges the restaurant 25% on the orders having cost greater than 20 dollars and
15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company
across all orders.
6. The company wants to analyse the total time required to deliver the food. What percentage of
orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to
be prepared and then delivered.)
7. The company wants to analyse the delivery time of the orders on weekdays and weekends. How
does the mean delivery time vary during weekdays and weekends?
8. What are your conclusions from the analysis? What recommendations would you like to share to
help improve the business? (You can use cuisine type and feedback ratings to drive your business
recommendations.)
Conclusions:
Recommendations:
1. Data Overview
1.1 Context
The number of restaurants in New York is increasing day by day. Lots of students and busy
professionals rely on those restaurants due to their hectic lifestyles. Online food delivery
service is a great option for them. It provides them with good food from their favourite
restaurants. Food Hub, a food aggregator company, offers access to multiple restaurants
through a single smartphone app.
The app allows the restaurants to receive a direct online order from a customer. The app
assigns a delivery person from the company to pick up the order after it is confirmed by the
restaurant. The delivery person then uses the map to reach the restaurant and waits for the
food package. Once the food package is handed over to the delivery person, he/she confirms
the pick-up in the app and travels to the customer's location to deliver the food. The delivery
person confirms the drop-off in the app after delivering the food package to the customer.
The customer can rate the order in the app. The food aggregator earns money by collecting a
fixed margin of the delivery order from the restaurants.
1.2 Objective
The food aggregator company has stored the data of the different orders made by the
registered customers in their online portal. They want to analyse the data to get a fair idea
about the demand of different restaurants which will help them in enhancing their customer
experience. Suppose you are a Data Scientist at Food Hub and the Data Science team has
shared some of the key questions that need to be answered. Perform the data analysis to find
answers to these questions that will help the company to improve the business.
1.4 Data Structure and Statistical Summary
Data information and data types of the respective columns.
Duplicate rows.
There are 736 orders for which no rating was given.
2. Univariate Analysis
2.1 Explore all the variables and provide observations on their distributions
The features order_id, customer_id and restaurant_name have 1898, 1200 and 178 unique
values, respectively.
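The unique-value and missing-rating counts can be reproduced with a few lines of plain
Python. This is only a minimal sketch on made-up toy records, not the project's actual code
or data; the `rating` column name and the helper names are assumptions for illustration.

```python
# Toy records standing in for the real order table (all values are made up).
orders = [
    {"order_id": 1, "customer_id": 10, "restaurant_name": "A", "rating": "5"},
    {"order_id": 2, "customer_id": 10, "restaurant_name": "B", "rating": "Not given"},
    {"order_id": 3, "customer_id": 11, "restaurant_name": "A", "rating": "4"},
]


def unique_count(rows, column):
    """Number of distinct values found in one column."""
    return len({row[column] for row in rows})


def unrated_count(rows):
    """Number of orders whose rating is the placeholder 'Not given'."""
    return sum(1 for row in rows if row["rating"] == "Not given")


for column in ("order_id", "customer_id", "restaurant_name"):
    print(column, unique_count(orders, column))
print("unrated", unrated_count(orders))
```

On the real table the same two helpers would be expected to return the figures quoted in
this section.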
2.1.1 Cuisine type
There are 14 unique cuisine types available to customers ordering through Food Hub.
Based on the table and the graph, the top 6 cuisine types account for most of the orders
placed by customers. This suggests that adding new dishes within these cuisine types
could help increase Food Hub's overall revenue.
The top 6 cuisine types are:
1. American cuisine type with 584 orders placed.
2. Japanese cuisine type with 474 orders placed.
3. Italian cuisine type with 298 orders placed.
4. Chinese cuisine type with 215 orders placed.
5. Mexican cuisine type with 77 orders placed.
6. Indian cuisine type with 73 orders placed.
This data suggests that the marketing team can offer more discounts, vouchers or
freebies on the least ordered cuisine types in the above list to increase sales of
those cuisines and total revenue.
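A ranking like the one above comes from counting orders per cuisine type. The sketch
below shows the idea with `collections.Counter` on a made-up list of cuisine labels; the
helper name `top_cuisines` is hypothetical and the counts are not the report's figures.

```python
from collections import Counter

# Made-up cuisine labels, one per order (not the real data).
cuisines = ["American", "Japanese", "American", "Italian", "Japanese", "American"]


def top_cuisines(values, n):
    """Return the n most frequent values as (value, count) pairs."""
    return Counter(values).most_common(n)


for cuisine, count in top_cuisines(cuisines, 2):
    print(f"{cuisine}: {count} orders")
```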
According to the graph plotted above, most orders cost between 12 and 14 dollars, and
the distribution of order cost is right-skewed.
The cuisine types rank in the same order by cost of the order as they do in the
previous graph.
Around 26.34% of the orders cost between 12 and 15 dollars, which suggests that pricing
items within this range could increase order volume.
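A share such as "orders costing between 12 and 15 dollars" is a simple proportion. The
following sketch computes it on a made-up list of order costs; whether the report's figure
includes or excludes the range endpoints is not stated, so the inclusive bounds here are an
assumption.

```python
# Made-up order costs in dollars (not the real data).
costs = [8.0, 12.5, 14.0, 22.0]


def share_in_range(values, low, high):
    """Percentage of values with low <= value <= high (inclusive bounds assumed)."""
    hits = sum(1 for value in values if low <= value <= high)
    return 100 * hits / len(values)


print(f"{share_in_range(costs, 12, 15):.2f}% of orders cost 12 to 15 dollars")
```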
This feature column has two values: weekday and weekend.
Around 71.18% of orders were placed on weekends and the remaining 28.82% on weekdays.
It is advisable to run promotions on weekdays to improve sales. Weekends already have
good sales and need less attention; instead, more items can be added to the most
ordered cuisines.
2.1.4 Rating
Based on the above graph, there are 4 unique values in this feature column. The largest
share belongs to ‘Not given’, which means that no rating was left for 736 orders.
588 orders were rated 5, the second most frequent value, followed by 386 orders rated 4
and 188 orders rated 3, the least frequent.
Because ratings influence customers' decisions to order, and therefore sales, we need
to reduce the number of customers who skip the rating, for example by prompting them at
delivery or by sending a push notification with a voucher for rated orders.
It is suggested to address the low rating of 3 by reviewing the comments left with
those ratings and acting on either the delivery partner or the food quality and
quantity, depending on the reviews.
2.1.5 Food Preparation Time
The average time taken to prepare the food once ordered is 27.31 minutes. As shown in
the above graph, this feature shows no skewness, which means that preparation times are
spread fairly evenly around the mean.
From the graph, we recommend reducing the longest preparation times: nearly 220 orders
took more than 34 minutes to prepare.
The minimum food preparation time is 20 minutes and the maximum is 35 minutes.
From the above graph, most orders were delivered in approximately 24 minutes or in
28 to 29 minutes, with the rest of the orders spread across other delivery times.
This feature is left-skewed: the tail extends towards the shorter delivery times while
most orders sit at the longer ones. The mean delivery time is 24.16 minutes.
We can reduce the maximum delivery time by working on both food preparation and
delivery, which should help improve ratings.
2.1.7 Top 5 restaurants in terms of the number of orders received
The above figure shows the top 5 restaurants ranked by the number of orders received.
Based on the above graph and the order counts, American is the most popular cuisine
type on weekends, with 415 orders, the highest of any cuisine type.
It is followed by four other cuisine types: Japanese with 335 orders, Italian with 207,
Chinese with 163 and Mexican with 53.
The cuisine types rank in the same order on weekdays, as the graph above shows.
2.1.11 The company has decided to give 20% discount vouchers to the top 3 most
frequent customers. Find the IDs of these customers and the number of orders they
placed
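One way to answer this question is to count orders per customer and keep the three
largest counts. The sketch below does this with `collections.Counter` on made-up customer
IDs; the helper name `top_customers` is hypothetical and the IDs are not from the dataset.

```python
from collections import Counter

# Made-up customer IDs, one entry per order placed (not the real data).
customer_ids = [101, 102, 101, 103, 101, 102, 104, 103, 102, 101]


def top_customers(ids, n=3):
    """Return the n customers with the most orders as (customer_id, orders) pairs."""
    return Counter(ids).most_common(n)


for customer_id, n_orders in top_customers(customer_ids):
    print(f"customer {customer_id}: {n_orders} orders")
```

If several customers tie for third place, this approach keeps whichever appears first in
the data, so ties would need a separate business rule.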
The top 4 cuisine types are major contributors to business growth.
People tend to place more orders in the price range of 12 to 15 dollars.
Most orders were placed on weekends, which play a vital role in the revenue generation
of the business.
Many customers did not rate their orders; ‘Not given’ is the largest rating group.
The average food preparation time for the orders placed is 27.31 minutes.
Most of the orders were delivered within 25 minutes.
The top 3 restaurants received the highest number of orders.
American, Japanese and Italian are the most popular cuisine types, which people order
most frequently.
29.24% of the orders have a bill value above 20 dollars.
The average time taken by the delivery partner to deliver an order is 24.16 minutes.
The top 3 customers placed 13, 10 and 9 orders.
To improve the business at a larger scale, we can introduce a few new items in the top
cuisines and run a promotion, with a minimum discount value, for customers ordering
these dishes for the first time.
The average delivery time needs to be maintained across orders to increase sales, since
faster delivery avoids many complications and escalations and also helps improve order
ratings.
We suggest standardizing food prices in the range of 12 to 15 dollars and applying the
same pricing to upcoming introductory dishes.
3. Multivariate Analysis
Based on the data analysis, Southern cuisine has the widest range of order cost, from
7.5 to 32 dollars. A few other cuisines (Spanish, Middle Eastern and Indian) have almost
the same median as Southern, approximately 16 dollars.
Three cuisine types have outliers: Korean, Mediterranean and Vietnamese.
Korean cuisine has outliers beyond both the upper and lower fences, so its cost of the
order values need to be checked and treated accordingly.
For Korean cuisine the median and the minimum value are almost the same, which means
most of its food items are cheaper compared to dishes of other cuisine types.
The median is the same for the Italian, American, Chinese and Japanese cuisine types:
a typical order of these cuisines costs approximately 14 dollars.
Most of the cuisine types are right-skewed, with a longer whisker on the upper end,
which means that most orders sit at the lower end of the cost range, with a tail of
higher-priced orders.
Based on the data analysis, Thai cuisine has the widest range of food preparation time,
from 21.5 to 35 minutes, and it is slightly left-skewed.
Here too, only Korean cuisine has outliers, at the upper fence of the boxplot, and they
need to be treated.
American, Chinese, Indian, Mediterranean and Middle Eastern cuisines share the same
median of 27 minutes, which means a typical order of these cuisines is prepared in about
27 minutes.
Italian and Thai cuisines have the highest median of all cuisine types, 28 minutes.
3.1.3 Day of the week and Delivery time
Based on the above graph, orders were delivered faster on weekends than on weekdays:
the average delivery time is 22.47 minutes on weekends, compared with 28.34 minutes on
weekdays.
Based on the number of orders placed, most business happens on weekends: 67.9% of the
orders were placed on weekends and the remaining 32.07% on weekdays.
From the graph, the minimum delivery time is 15 minutes on weekends and 24 minutes on
weekdays, which implies that weekends see both more orders and faster deliveries.
The total delivery time is also higher on weekends because of the greater number of
orders: 30357 minutes, nearly twice the 15502 minutes on weekdays.
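The weekday versus weekend comparison is a grouped mean. A minimal sketch of that
calculation follows, using made-up rows; the column names `day_of_the_week` and
`delivery_time` are assumptions about the dataset, and `mean_by_group` is a hypothetical
helper.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows (not the real data); delivery_time is in minutes.
deliveries = [
    {"day_of_the_week": "Weekend", "delivery_time": 20},
    {"day_of_the_week": "Weekend", "delivery_time": 24},
    {"day_of_the_week": "Weekday", "delivery_time": 28},
    {"day_of_the_week": "Weekday", "delivery_time": 30},
]


def mean_by_group(rows, group_key, value_key):
    """Mean of value_key for each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


print(mean_by_group(deliveries, "day_of_the_week", "delivery_time"))
```

Replacing `mean` with `min` or `sum` in the last line of the helper would give the
minimum and total delivery times discussed above.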
Figure 3.1.4 Revenue generation by restaurants
The above list shows the top revenue-generating restaurants, based on the cost of the
order.
The top 5 restaurants in the list, by total revenue from the orders placed, are Shake
Shack in 1st position with 3579.53 dollars, followed by The Meatball Shop with 2145.21
dollars, Blue Ribbon Sushi with 1903.95 dollars, Blue Ribbon Fried Chicken with 1662.29
dollars and Parm with 1112.76 dollars.
American cuisine is the most popular of all the varieties and makes a major
contribution to the revenue of the top restaurant, Shake Shack, with 219 orders.
American, Italian and Japanese are the 3 most frequent cuisine types among the top 5
restaurants, which together hold 33.22% of the total cost of the orders in the data set.
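The 33.22% share can be checked directly from the five restaurant totals quoted above and
the overall order value of 31314.82 dollars reported later in this document. The snippet is
just that arithmetic; the variable names are illustrative.

```python
# Revenue per restaurant in dollars, as quoted in the text above.
top5_revenue = {
    "Shake Shack": 3579.53,
    "The Meatball Shop": 2145.21,
    "Blue Ribbon Sushi": 1903.95,
    "Blue Ribbon Fried Chicken": 1662.29,
    "Parm": 1112.76,
}
total_order_value = 31314.82  # total cost of all orders, as reported in section 5

top5_total = sum(top5_revenue.values())
top5_share = 100 * top5_total / total_order_value

print(f"top 5 total: {top5_total:.2f} dollars")
print(f"top 5 share: {top5_share:.2f}%")
```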
Based on the above graph, rating 3 covers the highest range of delivery time, from
23.90 minutes to 25.25 minutes.
Ratings 5 and ‘Not given’ behave similarly in the graph, with delivery times in the same
range, from 23.80 minutes to 24.58 minutes.
Orders rated 4 were delivered fastest, with a mean time of 23.90 minutes.
The action to take is to deliver orders quickly in order to obtain good ratings of 4
and increase overall revenue.
3.1.6 Rating Vs Food preparation time
Based on the above graph, rating 3 covers the highest range of food preparation time,
from 26.8 minutes to 28 minutes, which is where orders rated 3 mostly fall.
Ratings 5 and ‘Not given’ have the same food preparation time, running in parallel from
27 minutes to 27.78 minutes with the same mean of 27.39 minutes.
Rating 5 is the second most frequent value, and its mean is nearly the same as that of
rating 3, approximately 27.38 minutes.
We suggest reducing food preparation time as far as possible, which should help improve
customer ratings.
3.1.7 Rating Vs Cost of the order
Based on the above graph, rating 3 covers the highest range of order cost, from 15.10
dollars to 17.45 dollars, which is where orders rated 3 mostly fall.
Unlike in the other numerical bivariate plots, ratings 5 and ‘Not given’ differ from
each other here.
The cost of the order does not appear to influence the customer's rating.
As per the data, 736 orders were left unrated, which can affect sales and revenue.
588 orders were rated 5, with a total order cost of 9975.83 dollars; 386 orders were
rated 4, with a total of 6450.19 dollars; and 188 orders were rated 3, with a total of
3049.99 dollars.
3.1.8 Correlation among variables
Most of the cuisine types are right-skewed in terms of cost, which means that order
values are concentrated at the lower cost level.
For most cuisine types, the mean food preparation time is between 25 and 28 minutes.
The mean delivery time differs by about 6 minutes between weekends and weekdays (22.47
versus 28.34 minutes); on weekends it takes a delivery partner only about 22.5 minutes
on average to deliver the food.
The top 5 restaurants hold a major share, 33%, of the total order revenue.
Rating 3 covers the highest range in delivery time, cost and food preparation time.
There is only a very slight correlation between food preparation time and delivery time.
We need to reduce delays in food preparation and delivery, which will help us meet
customer expectations of timely, quality food delivery.
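The strength of a linear relationship such as the one between food preparation time and
delivery time is usually measured with the Pearson correlation coefficient. The sketch
below implements that formula directly on made-up numbers; it is an illustration of the
measure, not the report's computation, and `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Made-up times in minutes (not the real data).
prep_time = [20, 25, 30, 35]
delivery_time = [24, 22, 25, 23]
print(round(pearson(prep_time, delivery_time), 3))
```

A value near 0 indicates little linear relationship; values near 1 or -1 indicate a
strong positive or negative one.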
4. The company wants to provide a promotional offer in the
advertisement of the restaurants. The condition to get the offer is that
the restaurants must have a rating count of more than 50 and the
average rating should be greater than 4. Find the restaurants fulfilling
the criteria to get the promotional offer
The 4 restaurants above have a rating count of more than 50, and the same restaurants
also have an average rating greater than 4.
Only these 4 restaurants meet both criteria and are eligible for the promotional offer.
Their rating counts are: Shake Shack 133, The Meatball Shop 84, Blue Ribbon Sushi 73
and Blue Ribbon Fried Chicken 64.
Their average ratings are: Shake Shack 4.27, The Meatball Shop 4.51, Blue Ribbon Sushi
4.21 and Blue Ribbon Fried Chicken 4.32.
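The eligibility check combines two per-restaurant statistics: the number of ratings and
their mean, computed over rated orders only. A minimal sketch follows, on made-up rows
and with a lowered count threshold so the toy data can pass; the `rating` column name and
the helper `eligible_restaurants` are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows (not the real data); unrated orders carry 'Not given'.
rated_orders = [
    {"restaurant_name": "A", "rating": "5"},
    {"restaurant_name": "A", "rating": "5"},
    {"restaurant_name": "A", "rating": "4"},
    {"restaurant_name": "A", "rating": "Not given"},
    {"restaurant_name": "B", "rating": "5"},
    {"restaurant_name": "B", "rating": "5"},
    {"restaurant_name": "C", "rating": "3"},
    {"restaurant_name": "C", "rating": "4"},
    {"restaurant_name": "C", "rating": "4"},
]


def eligible_restaurants(rows, min_count=50, min_mean=4.0):
    """Restaurants with more than min_count ratings and a mean rating above min_mean."""
    ratings = defaultdict(list)
    for row in rows:
        if row["rating"] != "Not given":  # unrated orders do not count
            ratings[row["restaurant_name"]].append(int(row["rating"]))
    return sorted(
        name
        for name, values in ratings.items()
        if len(values) > min_count and mean(values) > min_mean
    )


# The threshold is lowered to 2 only because the toy data is tiny.
print(eligible_restaurants(rated_orders, min_count=2))
```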
5. The company charges the restaurant 25% on the orders having cost
greater than 20 dollars and 15% on the orders having cost greater
than 5 dollars. Find the net revenue generated by the company across
all orders.
A new revenue column is added to the data table, holding the company's charge on each
order as calculated from the cost of the order.
Charging 25% on orders above 20 dollars and 15% on orders above 5 dollars, the
company's total net revenue is 6166.3 dollars.
The total value of all orders is 31314.82 dollars, so 19.69% of the amount generated
from the orders placed goes to the company.
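The charge described in this question is a tiered rule applied per order. The sketch
below applies it to made-up order costs and then re-checks the 19.69% figure from the two
totals reported above; the function name `commission` is hypothetical, and treating
orders of 5 dollars or less as free of charge is an assumption, since the question does
not mention them.

```python
def commission(cost):
    """Company charge for one order: 25% above 20 dollars, 15% above 5 dollars."""
    if cost > 20:
        return 0.25 * cost
    if cost > 5:
        return 0.15 * cost
    return 0.0  # assumption: no charge on orders of 5 dollars or less


# Made-up order costs in dollars (not the real data).
sample_costs = [4.0, 10.0, 30.0]
sample_revenue = sum(commission(cost) for cost in sample_costs)
print(sample_revenue)

# Share of total order value kept by the company, from the reported totals.
company_share = 100 * 6166.3 / 31314.82
print(f"{company_share:.2f}%")
```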
6. The company wants to analyse the total time required to
deliver the food. What percentage of orders take more than 60
minutes to get delivered from the time the order is placed? (The food
has to be prepared and then delivered.)
Figure 6 Percentage of orders that takes more than 60 minutes to deliver the order
The number of orders that take more than 60 minutes to be delivered from the time the
order is placed is 200.
Out of a total of 1898 orders, this is 10.54%.
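The total time for an order is its food preparation time plus its delivery time, and the
percentage follows from counting the orders whose total exceeds 60 minutes. The sketch
below shows this on made-up rows and re-checks the reported percentage; the column names
`food_preparation_time` and `delivery_time` and the helper `pct_over` are assumptions.

```python
# Made-up rows (not the real data); both times are in minutes.
timed_orders = [
    {"food_preparation_time": 25, "delivery_time": 20},  # total 45
    {"food_preparation_time": 35, "delivery_time": 30},  # total 65
    {"food_preparation_time": 30, "delivery_time": 30},  # total 60, not over
    {"food_preparation_time": 34, "delivery_time": 28},  # total 62
]


def pct_over(rows, limit=60):
    """Percentage of orders whose preparation plus delivery time exceeds limit."""
    over = sum(
        1
        for row in rows
        if row["food_preparation_time"] + row["delivery_time"] > limit
    )
    return 100 * over / len(rows)


print(pct_over(timed_orders))

# Re-check of the reported figure: 200 of 1898 orders.
reported_pct = 100 * 200 / 1898
print(f"{reported_pct:.2f}%")
```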
Fig 6 Mean delivery time Vs Day of the week
Based on the above graph, for most cuisine types the largest share of delivered orders
has not been rated.
Rating 5 is the second most frequent value, consistently across the top-rated cuisine
types.
For Southern cuisine, the count of 5 ratings is greater than the count of ‘Not given’,
and the most frequent rating is 4.
Vietnamese has the fewest ratings in every rating category among all 14 cuisine types.
Mediterranean cuisine has its highest count for rating 5, followed by ‘Not given’, 3
and 4.
Conclusions:
This analysis provides valuable insight into Food Hub's customer base and their ordering
behaviour. Understanding both is vital to enhancing the customer experience, which
directly affects order ratings and, in turn, the company's revenue.
The company can offer several kinds of promotional and marketing offers to shift
customers' ordering behaviour towards weekdays as well.
Given the impact of ratings on orders, the company can consider giving special offers
to customers who do not rate their experience.
The top restaurants in the list contribute a large share of the revenue.
The most popular cuisine types are the main source of revenue in terms of the number of
orders placed.
Total delivery time depends on both food preparation time and delivery time, so both
need to be kept within target.
Recommendations:
Based on the above graphs and calculations, revenue can be increased by improving the
rating conversion ratio.
Low-rated reviews should be examined through the comments shared with them, and
escalations should be resolved so that future orders avoid similar low ratings.
It is recommended to work on the reviews with ratings of 3 and 4: the quality team
should analyse the root cause of each such rating and resolve it as either a delivery
partner issue or a food quality issue.
It is suggested to divide the root causes of low ratings into two domains:
1. Food delivery partner: if the rating has a comment about the order delivery time,
delivery partner behaviour or doorstep delivery rejections, identify the exact reason
and fix the issue, which helps in earning good ratings on upcoming orders.
2. Food delivered issue: if the comment relates to the food itself, such as taste,
condition of the food or packaging/spill issues, these need to be addressed and solved
at the kitchen and the delivery partner level.
Run a special promotional offer, delivered by push notification, for customers who have
not rated their orders, offering a discount on the next order or a freebie in return
for a rating. This will help increase the number of ratings at the cuisine level and
will directly influence sales.
We can reduce the total delivery time for placed orders, which will avoid escalations
about late delivery and food arriving cold. This can be done by keeping food preparation
items ready at the kitchen hub and by assigning delivery partners to nearby delivery
locations.
Run attractive offers such as buy 1 get 1, 50% off, free product promotions and minimum
order value discounts during weekdays to improve overall business on those days as well.
Finally, we can add a few new introductory dishes to the top cuisine types at the
top-rated restaurants, priced at the lower end of the range, to make the business run
more efficiently in the long run, and consider no longer offering the least ordered
cuisine types.