Data Science For Executives
Inspektorat Jenderal
Kementerian Keuangan
September 2019
Denny, Ph.D.
Short Bio
Education
Professional Experiences
◼ Value:
◼ provide the best experience to customers
◼ Objective:
◼ help customers meet their GOJEK driver without a single call
Gojek: Pickup Points
Gojek: Clustering Pickup Points
Naming Frequent Places of Interest?
Automatic naming from booking-text data
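A minimal sketch of how automatic naming could work, assuming the pickup points have already been clustered: collect the free-text booking notes attached to one cluster, normalize them, and take the most frequent form as the cluster's name. This is an illustrative assumption, not the method Gojek uses; `normalize`, `name_cluster`, and the sample texts are hypothetical.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def name_cluster(booking_texts: list[str]) -> str:
    """Return the most frequent normalized booking text within one pickup cluster."""
    # Skip notes that become empty after normalization (blank or punctuation only).
    counts = Counter(n for n in (normalize(t) for t in booking_texts) if n)
    if not counts:
        return ""  # no usable text for this cluster
    name, _count = counts.most_common(1)[0]
    return name


# Hypothetical booking notes attached to pickups inside a single cluster.
texts = ["Gate 2, Mall X", "gate 2 mall x", "Mall X lobby", "GATE 2 - Mall X"]
print(name_cluster(texts))  # gate 2 mall x
```

Counting exact normalized strings is the simplest choice; noisier notes would likely need fuzzy matching before counting.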
Challenge?
DATA SCIENCE / ANALYTICS
Data Science, Data Mining, Big Data
◼ Data science shares the same concept as data mining and big data:
"use the most powerful hardware, the most powerful programming systems, and the most efficient algorithms to solve problems"
Data Mining
Data Mining vs Database
Business Value of Analytics (ATO)
◼ Fraud detection
◼ Identify high-risk refunds
• Previous practice: simple business rules based on experience, such as
  • Total claimed investment deductions > $N
  • Ratio of self-education deductions to total income > N
  • Total international transfers > N times taxable income
  • Luxury vehicle purchase ($M) > N times taxable income
• Use modelling
  • regression, decision trees, random forests
• Increase tax revenue
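The earlier rule-based practice amounts to a few hand-set threshold checks. The sketch below encodes the four rules from the slide; the field names and the threshold constants are hypothetical placeholders for the unspecified $N and N. Roughly speaking, the modelling approach replaces such fixed cut-offs with ones learned from historical cases by regression, decision trees, or random forests.

```python
from dataclasses import dataclass


@dataclass
class RefundClaim:
    investment_deductions: float
    self_education_deductions: float
    total_income: float
    international_transfers: float
    luxury_vehicle_purchase: float
    taxable_income: float


# Hypothetical thresholds standing in for the unspecified $N / N values.
MAX_INVESTMENT_DEDUCTIONS = 50_000.0
MAX_SELF_EDUCATION_RATIO = 0.2
MAX_TRANSFER_MULTIPLE = 3.0
MAX_VEHICLE_MULTIPLE = 2.0


def rule_flags(c: RefundClaim) -> list[str]:
    """Return the names of the experience-based rules that a claim triggers."""
    flags = []
    if c.investment_deductions > MAX_INVESTMENT_DEDUCTIONS:
        flags.append("investment_deductions")
    # Guard against division by zero when no income is reported.
    if c.total_income > 0 and c.self_education_deductions / c.total_income > MAX_SELF_EDUCATION_RATIO:
        flags.append("self_education_ratio")
    if c.international_transfers > MAX_TRANSFER_MULTIPLE * c.taxable_income:
        flags.append("international_transfers")
    if c.luxury_vehicle_purchase > MAX_VEHICLE_MULTIPLE * c.taxable_income:
        flags.append("luxury_vehicle")
    return flags


claim = RefundClaim(60_000, 5_000, 100_000, 400_000, 0, 100_000)
print(rule_flags(claim))  # ['investment_deductions', 'international_transfers']
```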
DATA MINING CAPABILITY
Market Basket Analysis, Association Rule Mining
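Association rules are commonly scored with three measures: support (share of baskets containing both items), confidence (support of the pair divided by support of the left-hand item), and lift (confidence divided by support of the right-hand item). The sketch below computes them for one-item rules over a toy set of baskets; the items and the `min_support` / `min_confidence` defaults are illustrative assumptions. Tools built for real data use algorithms such as Apriori or FP-Growth to search larger itemsets efficiently.

```python
from itertools import combinations


def support(itemset: frozenset, baskets: list[frozenset]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)


def pair_rules(baskets: list[frozenset], min_support: float = 0.3, min_confidence: float = 0.6):
    """Return (lhs, rhs, support, confidence, lift) for one-item rules lhs -> rhs."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            both = support(frozenset({lhs, rhs}), baskets)
            if both == 0 or both < min_support:
                continue  # pair too rare to be interesting
            confidence = both / support(frozenset({lhs}), baskets)
            if confidence >= min_confidence:
                lift = confidence / support(frozenset({rhs}), baskets)
                rules.append((lhs, rhs, both, confidence, lift))
    return rules


# Toy baskets for illustration only.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
for rule in pair_rules(baskets):
    print(rule)
```

A lift above 1 means the right-hand item appears with the left-hand item more often than its overall frequency would suggest.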
Classification
◼ predict the target class (categorical) for each case in the data
◼ Examples:
◼ Customer credit rating (high risk vs low risk)
◼ Identifying tourists who will violate their terms
◼ Tax case selection (SPT)
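To make classification concrete, the sketch below fits a decision stump, a decision tree with a single split and the simplest member of the tree family named on the ATO slide. The feature (ratio of deductions to income) and the labels are hypothetical, not real case data.

```python
def fit_stump(xs: list[float], ys: list[int]) -> tuple[float, int, int]:
    """Fit a one-split decision tree: predict `left` if x <= threshold, else `right`.

    Tries every observed value as the threshold and keeps the split with the
    fewest training misclassifications.
    """
    def majority(labels: list[int]) -> int:
        return max(set(labels), key=labels.count) if labels else 0

    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        l_label, r_label = majority(left), majority(right)
        errors = sum(y != l_label for y in left) + sum(y != r_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_label, r_label)
    _, threshold, l_label, r_label = best
    return threshold, l_label, r_label


def predict(stump: tuple[float, int, int], x: float) -> int:
    threshold, l_label, r_label = stump
    return l_label if x <= threshold else r_label


# Hypothetical feature: ratio of claimed deductions to income; label 1 = "high risk".
xs = [0.05, 0.10, 0.15, 0.40, 0.55, 0.60]
ys = [0, 0, 0, 1, 1, 1]
stump = fit_stump(xs, ys)
print(stump)                # (0.15, 0, 1)
print(predict(stump, 0.5))  # 1
```

A full decision tree repeats this split search recursively on each side; a random forest combines many trees trained on resampled data.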
Classification
Tax Case Selection
Regression
◼ predict a continuous value for each case in the data
◼ Example:
◼ estimate the value of adjustments
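For a single predictor, ordinary least squares has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. The sketch below implements it; the data are hypothetical and chosen to lie exactly on y = 2x + 1 so the fitted values are easy to check.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: x = a claim characteristic, y = adjustment value found on audit.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```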
Cluster Analysis
Cluster Analysis and Visualizations
[Figure: component planes for "Employee Market" and "lodge through e-tax"]
Clustering: Identifying Hot Spots / Outliers
[Figure, panels A, B, C: distance matrix visualizations and outlier analysis, with component planes for "Count of Debt Cases", "Count of Debt Cases Paid", and "SEIFA"]
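The component planes and distance-matrix views above appear to come from a self-organizing map. As a simpler stand-in, not the technique shown on the slide, the sketch below groups 2-D points with k-means and flags as potential outliers any points lying farther than a chosen threshold from every centroid. The points, `k`, and the threshold are hypothetical, and the initial centroids are simply the first `k` points to keep the example deterministic; practical code would use k-means++ initialization or several restarts.

```python
import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iters: int = 20) -> list[Point]:
    """Plain k-means; uses the first k points as initial centroids for determinism."""
    centroids = list(points[:k])
    for _ in range(iters):
        clusters: list[list[Point]] = [[] for _ in range(k)]
        # Assignment step: each point joins its nearest centroid.
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids


def outliers(points: list[Point], centroids: list[Point], threshold: float) -> list[Point]:
    """Points farther than `threshold` from every centroid."""
    return [p for p in points if min(math.dist(p, c) for c in centroids) > threshold]


points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10), (25, 30)]
centroids = kmeans(points, k=2)
print(outliers(points, centroids, threshold=10.0))  # [(25, 30)]
```

The far point still pulls its cluster's centroid toward itself, so distance-based flags work best when outliers are rare.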
Time Series
Data Mining Should Not Be Used Blindly
◼ Data mining finds regularities in history, but history is not the same as the future.
◼ Concept drift
◼ Population drift
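Population drift can be monitored by comparing the distribution of a model input at build time with its distribution today. One common measure is the Population Stability Index, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ) over bins, where eᵢ and aᵢ are the baseline and current shares of bin i. The sketch below computes it; the bin edge and sample values are hypothetical. A frequently quoted rule of thumb treats PSI above about 0.25 as a significant shift. Concept drift, a change in how inputs relate to the outcome, does not show up in input distributions alone and needs tracking of model accuracy on fresh labelled cases.

```python
import bisect
import math


def bin_shares(values: list[float], edges: list[float]) -> list[float]:
    """Share of values in each bin; sorted `edges` define len(edges) + 1 bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index: sum of (actual - expected) * ln(actual / expected)."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


edges = [2.5]  # hypothetical single cut point on, e.g., reported income
baseline = bin_shares([1, 2, 3, 4], edges)    # population the model was built on
current = bin_shares([1, 3, 4, 4, 5], edges)  # population it is scored on today
print(baseline, current, round(psi(baseline, current), 4))
```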
Data Scientist
Challenges
DATA MINING PROCESS
Data Mining Process: CRISP-DM
Business Understanding
◼ understand what you want to accomplish from a business perspective
◼ it is unwise to commit to data science without assessing its value:
◼ compare the expected value ("lift") from enhanced insight and decision making against the total cost of operations
◼ Value:
◼ increase revenue
◼ decrease cost
◼ improve the customer experience
◼ reduce risks and increase compliance
◼ increase productivity
Data Understanding
◼ acquire the data
listed in the project
resources
◼ describe data,
explore data, verify
data quality
◼ Data Management
◼ Data Dictionary
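The "verify data quality" step often starts with a simple profile of the extract. The sketch below counts rows, missing values per column (treating `None` and empty strings as missing), and exact duplicate rows; the function name, column names, and records are hypothetical.

```python
from collections import Counter


def profile(rows: list[dict], columns: list[str]) -> dict:
    """Basic data-quality summary: row count, missing values per column, duplicate rows."""
    missing = {c: sum(1 for r in rows if r.get(c) in (None, "")) for c in columns}
    # Count identical records on the listed columns; every copy beyond the first is a duplicate.
    keys = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicate_rows = sum(n - 1 for n in keys.values() if n > 1)
    return {"rows": len(rows), "missing": missing, "duplicate_rows": duplicate_rows}


# Hypothetical extract; column names are illustrative, not from a real schema.
rows = [
    {"taxpayer_id": "01", "income": 100},
    {"taxpayer_id": "01", "income": 100},
    {"taxpayer_id": "02", "income": None},
    {"taxpayer_id": "", "income": 50},
]
print(profile(rows, ["taxpayer_id", "income"]))
```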
Data Understanding
CRISP-DM: Data Preparation
◼ Data cleaning and preprocessing
◼ may take 60% of the effort!
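Much of that effort goes into small, repetitive fixes like the one sketched below: parsing amounts stored as text, treating blanks and unparseable entries as missing, and filling them with the median. The function name is hypothetical, and median imputation is only one of several reasonable choices. The sketch assumes "," is a thousands separator, so figures written in the Indonesian style ("1.000,50") would need different parsing.

```python
import statistics


def clean_amounts(raw: list[str]) -> list[float]:
    """Parse amount strings and fill blanks/unparseable entries with the median.

    Assumes "," is a thousands separator (e.g. "1,000"); figures written as
    "1.000,50" would need different parsing.
    """
    parsed = []
    for s in raw:
        try:
            parsed.append(float(s.strip().replace(",", "")))
        except ValueError:
            parsed.append(None)  # treat as missing
    known = [v for v in parsed if v is not None]
    fill = statistics.median(known) if known else 0.0
    return [fill if v is None else v for v in parsed]


print(clean_amounts([" 1,000 ", "n/a", "3000", ""]))  # [1000.0, 2000.0, 3000.0, 2000.0]
```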
Evaluation
Deployment
◼ Deploy into operations
THANK YOU