Capstone Project Final Report

This report analyzes potential locations for a specialty coffee shop in Jakarta, Indonesia. The client wants a location with less competition, close to their central Jakarta supplier, and in an area with adequate population. The report clusters Jakarta districts using K-Means based on venue data. It finds 4 clusters and maps them, then analyzes each cluster's common venues. Recommended locations fulfill the client's criteria of proximity, population, and cluster type with less competition.


CAPSTONE PROJECT

THE BATTLE OF NEIGHBORHOOD

PROJECT REPORT
Hajid Naufal Atthousi, 2020
1. Introduction

1.1. Background
Indonesia's coffee industry has considerable potential. As a tropical country, Indonesia is
well suited to coffee cultivation, so growing and managing Indonesian specialty coffee is a
strategic step that should continue to be developed. Coffee-drinking culture has spread to
many countries, not only Indonesia, and the resulting high demand makes the coffee business
an attractive opportunity.

Jakarta is one of the Indonesian cities with a large number of coffee shops. From 2013 until
the end of 2018, coffee shops spread to every corner of the capital, including areas near
offices, schools, and campuses. This report is aimed at a client interested in opening a
coffee shop in Jakarta, Indonesia.

1.2. Business Problem

The client is interested in opening a specialty coffee shop in Jakarta and is optimistic about
his homegrown specialty coffee, but he is having trouble deciding on a location. His first
concern is finding a place with less competition, so that the business can grow at a stable
pace without fighting over customers with other coffee shops, cafes, or restaurants. His
second concern is that the place should not be far from his supplier in Central Jakarta, to
minimize the time needed to collect supplies. Finally, he wants the place to have an adequate
population. So, which location should I recommend for his coffee shop?

1.3. Target interest

A private client who wants insight into the best location for a coffee shop in Jakarta, given
the concerns above.

2. Data Acquisition and pre-processing

2.1. Data choice

To solve the problem, I need precise data on the population of each district. The data should
also list the neighborhoods within each district, because it is used in the last section to
look at the population of each neighborhood and the distance from Central Jakarta (the
supplier's place). I will use the following data:
• A dataset from Jakarta Open Data. I chose this dataset because it is the most up to date
on the site. It consists of:
- The names of districts and neighborhoods
- The population breakdown by gender (male and female)
- The population breakdown by age (from 0 to above 75 years old, in age bands such as
0-4)
- The cities, districts and neighborhoods that the population figures refer to
• Latitude and longitude from the geopy.geocoders package, attached to each record
• A venue list retrieved in real time from the Foursquare API

2.2. Data acquisition, cleaning and pre-processing

In the Jakarta Open Data dataset, I first sort the data and group it by district. Before
summing the population into a new column, I drop the 0-4 age group, since it is largely
outside the target market (in case growth hacking is needed). I keep the over-75 group
because, in my personal experience, some people of that age in Indonesia still drink coffee.
At this point, I have sorted data about Jakarta grouped by district. I also rename the columns
so the client can understand what the dataframe means. After that, I group the dataframe by
district, applying a join function to the neighborhood names and summing the population for
each district. This is how the first 5 rows of the dataframe look after these steps:
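The grouping steps above can be sketched roughly as follows. This is a minimal illustration,
not the notebook's actual code: the column names (district, neighborhood, age_group, male,
female) are assumptions about the Jakarta Open Data file, and the counts are made up.

```python
import pandas as pd

# Hypothetical sample in the shape described above. Column names are assumed
# and the counts are invented for illustration only.
raw = pd.DataFrame({
    "district": ["MATRAMAN", "MATRAMAN", "MATRAMAN", "CILINCING", "CILINCING"],
    "neighborhood": ["UTAN KAYU SELATAN", "UTAN KAYU SELATAN", "KEBON MANGGIS",
                     "SEMPER BARAT", "SEMPER BARAT"],
    "age_group": ["0-4", "5-9", "5-9", "0-4", "10-14"],
    "male": [100, 120, 80, 90, 110],
    "female": [95, 115, 85, 88, 105],
})

# Drop the 0-4 age group, which is treated as outside the target market.
kept = raw[raw["age_group"] != "0-4"].copy()
kept["population"] = kept["male"] + kept["female"]

# One row per district: join the unique neighborhood names, sum the population.
districts = (
    kept.groupby("district")
        .agg(neighborhoods=("neighborhood", lambda s: ", ".join(sorted(s.unique()))),
             population=("population", "sum"))
        .reset_index()
)
print(districts)
```

Keeping the neighborhood names as a joined string makes the district table easy to read, while
the per-neighborhood figures stay available in the original dataframe for the final comparison.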

Next, I use the geocoder package, calling the geocoder.arcgis function in a single for loop to
retrieve the latitude and longitude of every district. The coordinates are appended to a list,
which is then added to the dataframe as new latitude and longitude columns. The result is
shown in the dataframe below.

To make later analysis easier, I compute the approximate distance from the supplier's location
to each district using the haversine formula. The supplier's latitude and longitude are
(-6.171009, 106.852772). Here's the dataframe after applying the haversine formula to each
district. This is the final pre-processed dataframe used in the next steps.
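For reference, the haversine distance can be sketched as below. The report does not show its
own implementation, so the function name haversine_km and the mean Earth radius of 6371 km
are assumptions; only the supplier coordinates come from the text.

```python
import math

SUPPLIER = (-6.171009, 106.852772)  # supplier latitude/longitude from the report


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees.

    radius_km is an assumed mean Earth radius.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * radius_km * math.asin(math.sqrt(min(1.0, a)))


# Usage with a made-up district coordinate (not taken from the report's data).
example_district = (-6.20, 106.87)
print(round(haversine_km(*SUPPLIER, *example_district), 2), "km")
```

The result is a straight-line approximation; actual travel distance along Jakarta's roads will
be longer.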

To make sure the dataframe can be plotted on a map, I use the folium package to build a map
from it. The coordinates of Jakarta can be found using Nominatim. Here's what the map looks
like with the current dataframe. Each label on the map shows the full dataframe description
for that district.

After confirming that the map looks fine, the next step is to get nearby venues using the
Foursquare API. By passing the required query parameters in the call, I can get the nearby
venues for each district. The table below shows the total number of venues returned for some
districts.

This dataset will be used to compute the average frequency of each venue category within
each district in the exploratory data analysis section.

3. Methodology

3.1. Exploratory data analysis

3.1.1. One hot encoding for venues dataset

One-hot encoding of the Jakarta venues dataframe is necessary: it lets me compute the
average frequency of each venue category with the mean function. The pandas
get_dummies function is used to produce the encoding. This is how the dataset looks
after one-hot encoding.
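As a rough illustration of this step, the snippet below one-hot encodes a tiny venues table
with pandas get_dummies. The column names (district, venue_category) and the rows are
assumptions made up for the example, not the report's data.

```python
import pandas as pd

# Made-up venue rows; one row per venue returned for a district.
venues = pd.DataFrame({
    "district": ["MATRAMAN", "MATRAMAN", "CILINCING", "CILINCING", "CILINCING"],
    "venue_category": ["Pizza Place", "Coffee Shop", "Seafood Restaurant",
                       "Seafood Restaurant", "Coffee Shop"],
})

# One indicator column per venue category, with the district kept as the first column.
onehot = pd.get_dummies(venues["venue_category"], dtype=int)
onehot.insert(0, "district", venues["district"])
print(onehot)
```

Each row still represents a single venue, so every row has exactly one indicator set to 1.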

3.1.2. Getting average frequency for each venue category

The one-hot encoded dataset is averaged by district to get the frequency of each venue
category within each district. This average-frequency dataset is used later in the
modeling phase. Here's how the dataset looks.

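The averaging can be sketched as a groupby followed by mean. The one-hot table below is a
made-up stand-in with placeholder district names A and B.

```python
import pandas as pd

# Made-up one-hot table: one row per venue, one indicator column per category.
onehot = pd.DataFrame({
    "district": ["A", "A", "B", "B", "B", "B"],
    "Coffee Shop": [1, 0, 0, 0, 1, 0],
    "Pizza Place": [0, 1, 0, 0, 0, 0],
    "Seafood Restaurant": [0, 0, 1, 1, 0, 1],
})

# The mean of 0/1 indicators is the share of a district's venues in each category.
freq = onehot.groupby("district").mean().reset_index()
print(freq)
```

Because each venue contributes exactly one indicator, the frequencies in every district row
sum to 1.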
3.1.3. Checking top 5 venues for general overview

To get a general overview of the frequencies, we can use the for loop below. This
step supports further analysis once K-Means has produced its clusters: it helps me
understand why each cluster tends to be labeled the way it is.
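A loop of this kind might look like the sketch below. The function name
top_venues_by_district and the frequency values are hypothetical; the example asks for the
top 2 only because the toy table has three categories.

```python
import pandas as pd

# Made-up average-frequency table with placeholder district names.
freq = pd.DataFrame({
    "district": ["D1", "D2"],
    "Coffee Shop": [0.10, 0.20],
    "Pizza Place": [0.60, 0.05],
    "Seafood Restaurant": [0.30, 0.75],
})


def top_venues_by_district(freq, n=5):
    """Print and return the n most frequent venue categories for each district."""
    top = {}
    for _, row in freq.iterrows():
        ranked = row.drop("district").astype(float).sort_values(ascending=False).head(n)
        top[row["district"]] = ranked
        print(f"---- {row['district']} ----")
        print(ranked.round(2).to_string())
    return top


top = top_venues_by_district(freq, n=2)
```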

3.2. Modeling
With the analysis done and the data prepared for modeling, the next step is to apply an
unsupervised machine learning technique, namely clustering. The algorithm I chose is
K-Means. Based on the frequencies, I found K-Means preferable to DBSCAN for this problem.
From what I have researched, DBSCAN does not work well on datasets with large differences
in density. If you look at my notebook, specifically at the results of the code above, you
will notice that some districts have very low densities compared with others.

3.2.1. Finding best K by using silhouette method

Before running the K-Means model, I search for the best K. This can be done with
either the elbow method or the silhouette method; for this problem, I chose the
silhouette method. The graphs below show the outcome of running it: the best K for
this problem is 4.
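The silhouette search can be sketched as follows, using scikit-learn's KMeans and
silhouette_score. The helper name best_k_by_silhouette and the synthetic four-blob data are
assumptions for illustration; the report ran the search on its district frequency table.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_k_by_silhouette(X, k_values, random_state=0):
    """Return the K with the highest mean silhouette score, plus all scores."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


# Synthetic stand-in data: four well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

best_k, scores = best_k_by_silhouette(X, range(2, 8))
print(best_k, {k: round(float(s), 3) for k, s in scores.items()})
```

The silhouette score is only defined for at least 2 clusters, which is why the search starts
at K = 2.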

3.2.2. K-Means algorithm for clustering

Once I have the best K, I pass it to the K-Means implementation in scikit-learn's cluster
module. The dataset fitted by the algorithm is the average-frequency dataset for the
districts of Jakarta.
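A minimal sketch of the fitting step is shown below. The frequency table is made up, the
district names D1 to D6 are placeholders, and the example uses 2 clusters only because the toy
table is so small; the report uses K = 4.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Made-up average-frequency table with placeholder district names.
freq = pd.DataFrame({
    "district": ["D1", "D2", "D3", "D4", "D5", "D6"],
    "Coffee Shop":        [0.60, 0.55, 0.65, 0.05, 0.10, 0.00],
    "Pizza Place":        [0.30, 0.35, 0.25, 0.15, 0.10, 0.20],
    "Seafood Restaurant": [0.10, 0.10, 0.10, 0.80, 0.80, 0.80],
})

# K-Means works on the numeric columns only, so the district name is dropped first.
features = freq.drop(columns="district")
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)

# labels_ follows the row order of the table that was fitted.
cluster_labels = kmeans.labels_
print(dict(zip(freq["district"], cluster_labels)))
```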

4. Results
To make the result easier for the client to read, I plot a map showing each cluster and its
description. A few steps are needed to build that map.

4.1. Constructing dataframe that shows top common venues for each district
This step gives the main idea of the most common venues within each district. In this case
I return the 7 most common venues. Later, in the result analysis section, this dataframe is
used alongside the earlier for loop that returns the average frequency for each district.
Here's how the dataframe looks.
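One way to build such a dataframe is sketched below. The helper names ordinal and
most_common_venues, the column titles, and the frequency values are all assumptions; the
example requests the top 2 because the toy table has only three categories.

```python
import pandas as pd


def ordinal(i):
    """1 -> '1st', 2 -> '2nd', 3 -> '3rd', 4 -> '4th', 11 -> '11th', ..."""
    if 10 <= i % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(i % 10, "th")
    return f"{i}{suffix}"


def most_common_venues(freq, n=7):
    """One row per district with its n most frequent venue categories, in order."""
    categories = [c for c in freq.columns if c != "district"]
    n = min(n, len(categories))  # cannot return more categories than exist
    columns = ["district"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, n + 1)]
    rows = []
    for _, row in freq.iterrows():
        ranked = row[categories].astype(float).sort_values(ascending=False)
        rows.append([row["district"]] + list(ranked.index[:n]))
    return pd.DataFrame(rows, columns=columns)


# Made-up average-frequency table with placeholder district names.
freq = pd.DataFrame({
    "district": ["D1", "D2"],
    "Coffee Shop": [0.60, 0.05],
    "Pizza Place": [0.30, 0.15],
    "Seafood Restaurant": [0.10, 0.80],
})
top_table = most_common_venues(freq, n=2)
print(top_table)
```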

4.2. Appending cluster labels to the previous dataframe

The cluster labels generated by K-Means are appended to the previous dataframe. This
column is used to build the map and again in the cluster analysis later on. This is what the
first rows of the dataframe look like.

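Appending the labels can be sketched as below. All names and values are placeholders (the
labels are invented, not the report's results). The point of the example is that the labels
follow the row order of the fitted table, so joining on the district name is safer than
relying on row position.

```python
import pandas as pd

# Placeholder data: the labels are invented and stand in for kmeans.labels_.
fitted_order = ["D1", "D2", "D3"]   # row order of the table K-Means was fitted on
cluster_labels = [1, 0, 1]

top_venues = pd.DataFrame({
    "district": fitted_order,
    "1st Most Common Venue": ["Coffee Shop", "Seafood Restaurant", "Pizza Place"],
})
top_venues.insert(0, "cluster_label", cluster_labels)

# District table in a different row order, to show that merging on the district
# name keeps each label attached to the right district.
district_info = pd.DataFrame({
    "district": ["D3", "D1", "D2"],
    "distance_km": [4.2, 1.5, 9.8],
})
merged = district_info.merge(top_venues, on="district", how="left")
print(merged)
```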
4.3. Generate cluster map for visualization
This map visualizes the cluster of each district in the dataframe, together with its
description. It helps the client easily understand how the clusters are spread across
Jakarta.

5. Analysis and Discussion
In this section, I analyze the districts within each cluster. After that, I pick some suitable
places and compare their distance and population in order to address the client's second and
third concerns.

5.1. Cluster analysis

I can look up the districts within each cluster in the last dataframe. Following the
silhouette method, there are 4 clusters (labels 0, 1, 2, 3), none of them empty.
Here are the districts in cluster label 0 (note that I only show the district column and its
top venues).

These are the first 5 rows of the districts in cluster label 1. There is a total of 38
districts in this cluster.

This is the list of districts in cluster label 2.

These are the districts in cluster label 3.

Readers may ask why so many districts fall into cluster label 1. The answer lies in the
frequencies, and I will give it alongside the clusters I pick for further analysis. Here I
pick cluster labels 0, 2 and 3, since those clusters appear to have less competition if my
client wants to open a specialty coffee shop.

5.2. Frequency check

I mentioned earlier that the frequencies explain why the clusters are labeled the way they
are. The code below checks the frequency within each district; this time I pick clusters 0,
2 and 3 so the reader gets the general idea. I also provide screenshots of the results.

You will notice that the districts in cluster 2 (CIRACAS, MATRAMAN, PASAR MINGGU, PASAR
REBO) are grouped around one of their most recurring venues, the pizza place. Some districts
in cluster 1 also have pizza places, but their frequencies may differ in the second most
recurring venue (or perhaps the first). The CILINCING and TANJUNG PRIOK districts, on the
other hand, were placed in separate clusters: their first and second venue frequencies are
quite distinct from those of other clusters. That is why, in my opinion, they sit in clusters
of their own. If I were to give a description for each label, it would be:
• Label 0 : Districts with moderate competition, with Asian restaurants and donut shops as
the main competitors
• Label 1 : Districts with moderate to high competition, with various unique venues as
the main competitors
• Label 2 : Districts with low to moderate competition, with pizza places as the main
competitors
• Label 3 : Districts with low competition, with seafood restaurants as the main
competitors

5.3. Picking suitable place

Based on the client's first concern, I pick the places with less competition, meaning a lower
frequency of cafes, restaurants and other types of venues. From the results above, the
districts MATRAMAN (cluster 2) and CILINCING (cluster 3) have less competition. The other
districts in cluster 2 are dropped, since the map shows that MATRAMAN is closer to the
supplier's place than the other districts in the same cluster.

The next step is to compare the distances of MATRAMAN and CILINCING from the supplier,
which is the client's second concern. Here's the comparison of distance for the two
districts.

From the results above, MATRAMAN district is the clear winner. I then move on to the
client's third concern, an adequate population. By going back to the very first dataframe,
I can obtain the population of each neighborhood in MATRAMAN district, this time grouped by
neighborhood with the population summed. Here's how the data looks.

5.4. Plot it into graph

To help the client understand the result more easily, I use a bar graph so that the
comparison can be read visually. The graph is shown below. Hopefully, the client now has
insight into the most suitable location for a specialty coffee shop.

6. Conclusion
In this project, I analyzed the frequency of venues within each district of Jakarta and used
the K-Means algorithm to cluster those districts. The clustering, plotted as a cluster map,
helps the client better understand the market competition within each district.
From the results, I recommend that the client open a specialty coffee shop in the UTAN KAYU
SELATAN neighborhood of MATRAMAN district. The reasons are:

• MATRAMAN district has less competition than other districts.

• MATRAMAN district is closer to the supplier's place than the other district that also
has less competition.
• UTAN KAYU SELATAN is the recommended neighborhood because its population is the highest
among the neighborhoods of MATRAMAN district.

7. Future direction
Several improvements could be made to this project to obtain a better model and analysis:
• The analysis would likely give different results with the Google Maps API. Personally, I
think Google Maps has a more comprehensive dataset for Indonesia than Foursquare, but it
is too expensive for a one-time project like this.
• If you use the Google Maps API, find that the densities differ little, and want more
detailed results (for example, to see whether there are clusters within clusters),
DBSCAN might be preferred.
• The elbow method can also be used to find the optimal K for K-Means. It may produce a
slightly different result, but it is worth trying. You can also set the number of times
KMeans is run with different centroid seeds (the n_init parameter, whose default was 10
when this report was written; the maximum number of iterations per run, max_iter,
defaults to 300). If you want to experiment with the code, you can tweak this parameter
along with the random state.
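As a starting point for the elbow method mentioned in the last bullet, the sketch below
records the K-Means inertia for a range of K values. The helper name inertia_by_k and the
synthetic four-blob data are assumptions; n_init and random_state are exposed so they can be
varied as suggested above.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(X, k_values, n_init=10, random_state=0):
    """Within-cluster sum of squares (inertia) for each candidate K."""
    return {
        k: KMeans(n_clusters=k, n_init=n_init, random_state=random_state).fit(X).inertia_
        for k in k_values
    }


# Synthetic stand-in data: four well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

inertias = inertia_by_k(X, range(1, 8))
for k, value in inertias.items():
    print(k, round(float(value), 1))
```

The "elbow" is the K after which the inertia stops dropping sharply. On real venue
frequencies the bend is usually less clear-cut than on this synthetic data, which is one
reason the result can differ from the silhouette method.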
