Effective use of cloud resources for Data Engineering
A guide to Cloud Computing & Big Data
12/2017
Johnson Darkwah
Obsessed with creating value from data
• Expert in Big Data architecture
• Certification and recognition from Cloudera
• Specialization in telco data monetization
Where are you running your Big Data?
So far, the vast majority has been running it on premises, on private infrastructure.
"33% of enterprises will take their data lakes off life support."
(Forrester, Predictions 2018: The Honeymoon For AI Is Over)
Initial cost of a production Big Data management platform running on premises: 200,000 EUR (5,092,000 CZK)
Challenges with the cloud
• Security remains a concern for early adopters
• Experienced companies focus on optimizing cost
• Data integration poses a challenge
• Lack of resources
Why use the cloud today for analytics?
• Cost: it has never been cheaper to process and store data in the cloud.
• Flexibility: it has never been easier to manage cloud services.
• Master: 1x m4.xlarge with 1x 1 TB EBS volume
• Workers: 5x m4.2xlarge, each with 4x 1 TB EBS volumes
Data processing power of 32 CPU cores and 120 GB RAM, with the ability to store up to 6.5 TB of data.
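The 6.5 TB figure is roughly what the worker disks yield once HDFS replication is taken into account. A minimal sketch, assuming that all five workers contribute their four 1 TB volumes to HDFS, that the master stores no HDFS data, and that the cluster keeps HDFS's default replication factor of 3:

```python
# Raw disk capacity vs. usable HDFS capacity after block replication.
# Assumptions: 5 workers x 4 x 1 TB EBS volumes hold HDFS data, the master
# holds none, and HDFS uses its default replication factor of 3.

def usable_hdfs_tb(workers: int, disks_per_worker: int, disk_tb: float,
                   replication: int = 3) -> float:
    """Return the logical data volume (TB) the cluster can hold."""
    raw_tb = workers * disks_per_worker * disk_tb
    return raw_tb / replication  # every block is stored `replication` times

capacity = usable_hdfs_tb(workers=5, disks_per_worker=4, disk_tb=1.0)
print(f"{capacity:.2f} TB")  # 20 TB raw / 3 replicas -> 6.67 TB
```

The slide's slightly lower 6.5 TB plausibly leaves some headroom for temporary and non-HDFS data.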
When you treat the cloud the same way as on-premises
Monthly costs on AWS EC2 (Europe Central)

Reasonable minimum spec:
- Compute: 1,932.48 USD
- EBS volumes: 1,134.00 USD
- Total: 3,066.49 USD (65,371 CZK)

Comparable to production on-prem:
- Compute: 8,725.44 USD
- EBS volumes: 4,860.00 USD
- Total: 13,585.44 USD (289,605 CZK)

Source: AWS Simple Monthly Calculator, https://calculator.s3.amazonaws.com/index.html
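The minimum-spec figures can be reproduced with a simple on-demand cost model. The rates below are not quoted AWS prices; they are assumptions inferred from the slide totals: 0.24 USD/h for m4.xlarge, 0.48 USD/h for m4.2xlarge, 732 billable hours per month, and 0.054 USD per GB-month of EBS for the 21 TB provisioned in total.

```python
# Monthly on-demand cost of an always-on cluster.
# Assumed rates (inferred from the slide totals, not official AWS prices).
HOURS_PER_MONTH = 732
HOURLY_RATE_USD = {"m4.xlarge": 0.24, "m4.2xlarge": 0.48}
EBS_USD_PER_GB_MONTH = 0.054

def monthly_compute_usd(instances: dict[str, int]) -> float:
    """Sum hourly rate x instance count x hours over all instance types."""
    return sum(HOURLY_RATE_USD[itype] * count * HOURS_PER_MONTH
               for itype, count in instances.items())

def monthly_ebs_usd(total_tb: int) -> float:
    """EBS is billed per provisioned GB-month (1 TB = 1,000 GB here)."""
    return total_tb * 1000 * EBS_USD_PER_GB_MONTH

compute = monthly_compute_usd({"m4.xlarge": 1, "m4.2xlarge": 5})
storage = monthly_ebs_usd(1 * 1 + 5 * 4)  # master: 1 TB, workers: 4 TB each
print(f"compute {compute:.2f}, EBS {storage:.2f}, total {compute + storage:.2f}")
```

This prints a total of 3066.48 USD, one cent below the slide, presumably a rounding difference in the calculator. The point of the model is that an always-on cluster pays for every hour and every provisioned GB, whether or not it is used.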
Cloud Data Engineering Checklist
✓ Utilize object stores
✓ Pay only for what we really use
✓ Get discounts
✓ Optimize the data architecture
✓ Automate everything
Storing a GB in an object store (S3 or ADLS) costs about 2/3 of what it costs to store it on HDFS.
Cloud Data Engineering Checklist: pay only for what we really use
Most companies do not need a full-scale cluster running 24/7.
[Chart: CPU usage %]
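One way to check this for your own workload is to count how many hours the cluster is actually busy and price only those. A minimal sketch; the 20% CPU threshold, the hourly rate and the utilization profile are all hypothetical:

```python
# Compare an always-on cluster with one that runs only during busy hours.
# The threshold, hourly rate and usage profile are illustrative assumptions.

def busy_hours(cpu_usage_by_hour: list[float], threshold_pct: float = 20.0) -> int:
    """Count hours whose average CPU usage (%) reaches the threshold."""
    return sum(1 for usage in cpu_usage_by_hour if usage >= threshold_pct)

def daily_cost(cpu_usage_by_hour: list[float], hourly_rate_usd: float,
               threshold_pct: float = 20.0) -> tuple[float, float]:
    """Return (always-on cost, busy-hours-only cost) for one day."""
    always_on = hourly_rate_usd * len(cpu_usage_by_hour)
    on_demand = hourly_rate_usd * busy_hours(cpu_usage_by_hour, threshold_pct)
    return always_on, on_demand

# Hypothetical day: idle overnight, busy for 8 working hours, idle again.
usage = [5.0] * 8 + [70.0] * 8 + [5.0] * 8
always_on, on_demand = daily_cost(usage, hourly_rate_usd=2.64)
print(f"always-on {always_on:.2f} USD, busy hours only {on_demand:.2f} USD")
```

With this profile the busy-hours cluster costs one third of the always-on one. Real workloads are rarely this tidy, so start-up time and scattered busy hours also need to be considered.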
Cloud Data Engineering Checklist: get discounts
• On-Demand (full price): pay for compute capacity by the hour with no long-term commitments.
• Reserved (over 50% off): make a 1- to 3-year commitment and receive a significant discount on instance and storage pricing.
• Spot (over 80% off): bid for unused capacity, charged at a Spot price that fluctuates based on supply and demand.
Similar models are now also available on Azure and GCP.
[Chart: Spot average prices]
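The bid-based Spot model described above can be illustrated with a small simulation: the instance runs only in hours where the market price does not exceed the bid, and it is charged the market price rather than the bid. This is a simplified sketch with a made-up price series, bid and on-demand rate; real Spot billing granularity and interruption rules differ by provider and have changed over time.

```python
# Simplified Spot model: run while spot price <= bid, pay the spot price.
# The price series, bid and on-demand rate are hypothetical.

def spot_run(prices_usd: list[float], bid_usd: float) -> tuple[int, float]:
    """Return (hours the instance ran, total charged at the spot price)."""
    hours_run = 0
    charged = 0.0
    for price in prices_usd:
        if price <= bid_usd:   # bid covers the market price: instance runs
            hours_run += 1
            charged += price   # billed at the spot price, not at the bid
        # otherwise the instance is interrupted for this hour
    return hours_run, charged

ON_DEMAND_USD = 0.48           # hypothetical on-demand hourly rate
prices = [0.10, 0.12, 0.15, 0.30, 0.11, 0.09]
hours, spot_cost = spot_run(prices, bid_usd=0.20)
on_demand_cost = hours * ON_DEMAND_USD
print(hours, round(spot_cost, 2), round(on_demand_cost, 2))
```

The saving comes with a risk: in the fourth hour the price exceeds the bid and the instance stops, which is why Spot capacity suits restartable batch work better than an always-on realtime system.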
Cloud Data Engineering Checklist: optimize the data architecture
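[Figure: target architecture spanning on-premise and cloud. Labeled components: existing enterprise sources (logs, CRM, RDBMS); Gauss Data Tool; incoming clicks, IoT and transaction data; a 24x7 realtime system; S3/ADLS object storage; a cluster of one master (M) and six workers (W); a SQL engine; Terraform / Cloudera Altus; a temporary cluster running client jobs (ML, SQL, enrichment, …); spot instances. Legend: RI, OS, OD.]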
Show me the money…
AWS EC2 (Europe Central)
Realtime system (RI, 24x7)
- Compute (3x): 352.52 USD
- EBS volumes (3x 1 TB): 54.00 USD
Temp cluster
- Compute (OD, 8x7):
  - 1x m4.xlarge: 58.56 USD
  - 3x m4.2xlarge: 351.36 USD
- EBS volumes (9x 1 TB): 162.00 USD
Spot support (3x m4.2xlarge at $0.20/h)
- Compute: 34.34 USD
- EBS volumes (9x 1 TB): 162.00 USD
Total: 1,165.78 USD (24,852.10 CZK)
Source: AWS Simple Monthly Calculator, https://calculator.s3.amazonaws.com/index.html
8 hours of compute daily
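The temp-cluster line items follow from running on demand for 8 hours a day. A minimal sketch under assumptions inferred from the figures rather than quoted prices: 30.5 days per month, 0.24 USD/h for m4.xlarge, 0.48 USD/h for m4.2xlarge, 0.054 USD per GB-month of EBS, and EBS volumes that exist only while the cluster runs (created and deleted with it each day).

```python
# Monthly cost of a temporary cluster that runs 8 hours a day on demand.
# Rates are inferred assumptions, not official AWS prices.
DAYS_PER_MONTH = 30.5
HOURS_PER_DAY = 8
HOURLY_RATE_USD = {"m4.xlarge": 0.24, "m4.2xlarge": 0.48}
EBS_USD_PER_GB_MONTH = 0.054

def temp_compute_usd(itype: str, count: int) -> float:
    """On-demand compute, billed only for the hours the cluster is up."""
    return HOURLY_RATE_USD[itype] * count * HOURS_PER_DAY * DAYS_PER_MONTH

def temp_ebs_usd(total_tb: int) -> float:
    """EBS billed pro rata: volumes are assumed to be deleted with the cluster."""
    uptime_fraction = HOURS_PER_DAY / 24
    return total_tb * 1000 * EBS_USD_PER_GB_MONTH * uptime_fraction

master = temp_compute_usd("m4.xlarge", 1)    # slide: 58.56 USD
workers = temp_compute_usd("m4.2xlarge", 3)  # slide: 351.36 USD
ebs = temp_ebs_usd(9)                        # slide: 162.00 USD
print(f"{master:.2f} {workers:.2f} {ebs:.2f}")
```

Running the cluster for a third of the day cuts both compute and storage to a third of their always-on cost, which only holds if the volumes really are torn down with the cluster.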
Cloud Data Engineering Checklist: automate everything
Terraform
• Enables simple management and automation of cloud environments
• Supports AWS, Azure, GCP, OpenStack, …
• Many provisioning options
# Credentials are passed in as input variables
variable "aws_access_key" {}
variable "aws_secret_key" {}

# Configure the AWS provider
provider "aws" {
  access_key = var.aws_access_key
  secret_key = var.aws_secret_key
  region     = "us-east-1"
}

# Create a datanode
resource "aws_instance" "datanode" {
  # ...
}
Cloudera Altus
• Enterprise-ready, fully managed clusters
• Supports Spark, MRv2 and Hive
• Supports spot instances
• Uses S3 or ADLS
• Clusters can run only while jobs are executing
• Supports custom pre-built images (AMIs)
• Per-hour pricing with support included
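Running clusters only while jobs execute is a pattern you can also reproduce in your own automation. The sketch below uses a hypothetical in-memory `FakeProvisioner` in place of a real cloud API (it is not the Altus or Terraform interface) and shows the property that matters for cost: the cluster is torn down even when the job fails.

```python
from contextlib import contextmanager
from typing import Callable, Iterator

class FakeProvisioner:
    """Hypothetical stand-in for a cloud API; tracks clusters in memory."""
    def __init__(self) -> None:
        self.running: set[str] = set()

    def create(self, name: str) -> str:
        self.running.add(name)
        return name

    def terminate(self, name: str) -> None:
        self.running.discard(name)

@contextmanager
def transient_cluster(provisioner: FakeProvisioner, name: str) -> Iterator[str]:
    """Create a cluster for the duration of a block, then always terminate it."""
    cluster = provisioner.create(name)
    try:
        yield cluster
    finally:
        provisioner.terminate(cluster)  # runs on success and on failure

def run_job(provisioner: FakeProvisioner, name: str,
            job: Callable[[str], str]) -> str:
    """Run one job on its own short-lived cluster."""
    with transient_cluster(provisioner, name) as cluster:
        return job(cluster)

p = FakeProvisioner()
print(run_job(p, "nightly-etl", lambda c: f"job finished on {c}"))
print(p.running)  # nothing is left running after the job
```

Tying the cluster's lifetime to a `with` block means a crashed job cannot leave paid-for capacity behind.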
Do more with data & save money.
Questions?
Twitter: @darkwahj
LinkedIn: Johnson Darkwah
www.gaussalgo.com