Tu Pham - CTO @ Eway
Google Cloud Next
-- Surabaya, Indonesia - 06/2019 --
End to End Business Intelligence on Google Cloud
About Me
- CTO at Eway JSC
- Google Developer Expert on Cloud Platform
- 8 years of experience in Big Data and Cloud Computing
- Open source contributor, blogger, father
In Affiliate Marketing, We Are Partners With
- Indonesia
  - Go-Jek
  - Bukalapak
  - Traveloka
- The World
  - Lazada
  - Shopee
  - Aliexpress
  - Adcombo
  - Leadbit
> 5M transactions in 2018
> 100M dollars gross merchandise volume in 2018
When You Have (Big) Data
How Do We Use This Data?
Use Case
- Reporting
- Business Analytics
- Operational Analytics
- Product Features
- System Monitoring
Reporting
- Reporting to partners, advertisers, publishers, ...
Business Analytics
- Analyzing growth, user behavior, sign-up funnels, sign-up referrals
Operational Analytics
- Root cause analysis, latency analysis, error analysis
- Better threshold alerts, security alerts, capacity planning
Product Features
- Top products, publisher challenge, A/B testing
Sample: End-To-End Flow For Mining User Behavior
How do we collect this data?
Step 1: GC Compute Engine Instances Collect Raw Data
- Technology: Cloud Load Balancing, Compute Engine
- Why Cloud Load Balancing:
  - TCP/UDP Load Balancing
  - Seamless Autoscaling
  - Scalable
- Why Compute Engine:
  - High-Performance
  - Scalable
  - Low Cost
  - Fast Networking
  - Custom Machine Types
How do we process this data?
Step 2: GC Compute Engine Instances Convert Raw Data To Apache Parquet Files
- Technology: Compute Engine, Parquet file format
- Why Parquet:
  - Self-describing, columnar storage format
  - Language-independent
  - High query performance
  - Spark SQL is much faster with Parquet
  - High compression (up to 70%), less disk I/O
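Why a columnar layout compresses well can be sketched in a few lines of plain Scala. This is only an illustration, not how Parquet is implemented: the `Event` type, the `ColumnarSketch` object and the sample rows are made up for this example. It pivots rows into columns and run-length encodes one column; run-length encoding is one of the encodings Parquet can apply to repetitive column data.

```scala
// Hypothetical record type; field names follow the user-engagement sample in this deck.
final case class Event(userId: String, categoryId: String, action: String)

object ColumnarSketch {
  // Pivot row-oriented records into one sequence of values per column.
  def toColumns(rows: Seq[Event]): Map[String, Seq[String]] =
    Map(
      "userId"     -> rows.map(_.userId),
      "categoryId" -> rows.map(_.categoryId),
      "action"     -> rows.map(_.action)
    )

  // Run-length encode one column: consecutive equal values become (value, runLength).
  def runLengthEncode(column: Seq[String]): List[(String, Int)] =
    column.foldLeft(List.empty[(String, Int)]) {
      case ((value, n) :: rest, next) if value == next => (value, n + 1) :: rest
      case (acc, next)                                 => (next, 1) :: acc
    }.reverse

  def main(args: Array[String]): Unit = {
    // Made-up sample rows, not data from the talk.
    val rows = Seq(
      Event("u1", "c1", "view"),
      Event("u2", "c1", "view"),
      Event("u3", "c1", "like")
    )
    val columns = toColumns(rows)
    // The categoryId column collapses to a single run; the userId column does not shrink.
    println(runLengthEncode(columns("categoryId")))
    println(runLengthEncode(columns("userId")))
  }
}
```

Stored row by row, the repeated category value is interleaved with other fields; stored column by column, it forms one long run, which is where the smaller files and lower disk I/O come from.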
Step 3: GC Compute Engine Upload Parquet File To GC Cloud Storage
- Technology: Compute Engine, Parquet file format, Cloud Storage
- Why Cloud Storage:
  - Four storage classes
  - Easy to integrate
  - Object Lifecycle Management
  - Fast Networking
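The only path shown in this deck is "/log/2017/07/user_engagement/1.snappy.parquet" (Step 5). Assuming uploaded files follow that year/month/dataset layout, which the slides do not state explicitly, a path builder could look like the sketch below. `ParquetObjectPath` and `objectPath` are hypothetical names, and no bucket name is implied.

```scala
import java.time.YearMonth

object ParquetObjectPath {
  // Build "/log/<yyyy>/<MM>/<dataset>/<part>.snappy.parquet".
  // The layout is inferred from the single sample path in the deck (an assumption).
  def objectPath(month: YearMonth, dataset: String, part: Int): String =
    f"/log/${month.getYear}%04d/${month.getMonthValue}%02d/${dataset}/${part}.snappy.parquet"

  def main(args: Array[String]): Unit =
    println(objectPath(YearMonth.of(2017, 7), "user_engagement", 1))
}
```

Keeping the date in the path prefix lets a reader select one month of data by prefix, and it fits naturally with lifecycle rules that age out old objects.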
How do we visualize this data?
Step 4: Explore Dataset Using BI Tools
- Technology: Dataprep, BigQuery, Grafana, Power BI
Step 5: Aggregate Data
> val df = spark.read.parquet("/log/2017/07/user_engagement/1.snappy.parquet")
> df.show()
+---------------------+-------------------+----------------------+-------------------+--------+-------+------------+
| id                  | categoryId        | topicId              | userId            | action | value | created    |
+---------------------+-------------------+----------------------+-------------------+--------+-------+------------+
| "100011125479181_.. | "253397751448382" | "253397751448382_... | "100011125479181" | "view" | ""    | 1490621079 |
| "100004354358107_.. | "253397751448382" | "253397751448382_... | "100004354358107" | "view" | ""    | 1490491531 |
| "100014752680147_.. | "253397751448382" | "253397751448382_... | "100014752680147" | "like" | ""    | 1490457109 |
> val df_group_count = df.groupBy("userId", "categoryId", "action").count()
> df_group_count.show()
+-------------------+-------------------+--------+-------+
| userId            | categoryId        | action | count |
+-------------------+-------------------+--------+-------+
| "100011896037126" | "253397751448382" | "like" | 2     |
| "100010391178709" | "253397751448382" | "like" | 1     |
| "100011186707422" | "253397751448382" | "like" | 1     |
| "100012202096674" | "253397751448382" | "like" | 1     |
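What the grouped count computes can be shown without a cluster. The sketch below mirrors the groupBy on userId, categoryId and action using plain Scala collections. `Engagement`, `GroupCountSketch` and the sample rows are made up for illustration and are not part of the talk's code.

```scala
// Hypothetical in-memory stand-in for one row of the user-engagement data.
final case class Engagement(userId: String, categoryId: String, topicId: String, action: String)

object GroupCountSketch {
  // Count rows per (userId, categoryId, action), like groupBy(...).count() on a DataFrame.
  def countByUserCategoryAction(events: Seq[Engagement]): Map[(String, String, String), Int] =
    events
      .groupBy(e => (e.userId, e.categoryId, e.action))
      .map { case (key, group) => key -> group.size }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows.
    val events = Seq(
      Engagement("u1", "c1", "t1", "like"),
      Engagement("u1", "c1", "t2", "like"),
      Engagement("u2", "c1", "t1", "like"),
      Engagement("u2", "c1", "t1", "view")
    )
    countByUserCategoryAction(events).foreach(println)
  }
}
```

Here u1 liked two topics in category c1, so the key (u1, c1, like) gets a count of 2, the same shape as the first row of the output above.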
Step 5: Aggregate Big Data
- Number of unique users per topic
- Average user engagement per topic
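The slides do not define these two metrics, so the sketch below rests on an assumption: "unique users" is the number of distinct userId values per topic, and "average engagement" is the number of events in a topic divided by its unique users. `TopicEvent` and `TopicMetrics` are hypothetical names and the data is made up.

```scala
// Hypothetical in-memory stand-in for one engagement event.
final case class TopicEvent(userId: String, topicId: String, action: String)

object TopicMetrics {
  // Distinct users seen in each topic.
  def uniqueUsersPerTopic(events: Seq[TopicEvent]): Map[String, Int] =
    events.groupBy(_.topicId).map { case (topic, es) =>
      topic -> es.map(_.userId).distinct.size
    }

  // Events per unique user in each topic (assumed definition of "engagement").
  def avgEngagementPerTopic(events: Seq[TopicEvent]): Map[String, Double] =
    events.groupBy(_.topicId).map { case (topic, es) =>
      topic -> es.size.toDouble / es.map(_.userId).distinct.size
    }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows.
    val events = Seq(
      TopicEvent("u1", "t1", "view"),
      TopicEvent("u1", "t1", "like"),
      TopicEvent("u2", "t1", "view"),
      TopicEvent("u3", "t2", "view")
    )
    println(uniqueUsersPerTopic(events))
    println(avgEngagementPerTopic(events))
  }
}
```

On a Spark DataFrame the same numbers would typically come from a groupBy on topicId with `countDistinct("userId")` and `count`, computed in one aggregation.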
Become A Geek
Where Is The AI / ML?
Create Your Principles
Principles:
- KISS (Keep it simple, stupid)
- DRY (Don’t Repeat Yourself)
- Single Responsibility
- Low Cost
- Scalable
Be 1% Better Every Day: Tips
- Create your system principles
- Design the system architecture, data flow, data model, and data structures first
- Separate realtime and batch flows
- Separate data storage strategies by data type
- Reduce network, instance, and storage costs with metric monitoring and an alert system
Thank You - Q&A
● Eway: https://eway.vn
● My Contact: tupp@eway.vn