SlideShare a Scribd company logo
The Evolving Landscape of
Data Engineering
Andrei Savu - Event Co-organizer
Staff Engineer @ Twitter
Follow me @andreisavu
Andrei Savu
Staff Engineer @ Twitter:
* MoPub Backend & Data Pipelines
* Mobile App Monetization
Co-organizer of the Data Engineering
Club.
Previously Tech Lead at Cloudera via
the Axemblr acquisition. Started the
Cloud engineering team.
The Past:
● OSS communities
● AWS history
● Google Cloud history
The Present: Patterns
The Future: Wish List
Topics
Weeks of Provisioning
Static Infrastructure
Commodity Hardware
Commodity Networking
Data Locality Important
Running in the Public
Cloud was unusual
CAPEX
The Past - OSS
Visionary Business
Fast iterations
Data Management as a
key platform use case
Incredible Scale
Transition to “serverless”
OPEX & Elastic
The Past - AWS
Visionary Products
Fast iterations
Machine Learning as a key
use case
State of the Art data
platform
Last 3 years on fast
forward
Intelligent Billing
OPEX & Elastic
The Past - Google Cloud
The Present: Patterns
Weeks to Minutes to Seconds
Hadoop/Spark ecosystem is mature.
We have a broad set of options.
Big Data is much Bigger (e.g. x1e.32xlarge: 3TB
mem, 128 vCPUs, 14Gbps network)
Scale continues to be hard.
Cloud economics can be very disruptive
(especially for data workloads)
High-performance networks are common.
Storage can be decoupled from compute.
Cluster locality is important.
Service Endpoints (not clusters, aka serverless,
aka managed etc.).
Sophisticated Auto-scaling (batch & streaming,
spot vs. on-demand, multi-az).
Multi-DC and Multi-Region from Day 1.
Various flavors of containers.
The Future: Wish List
A Data Catalog product as the center of the
universe.
Data Monitoring Systems:
* statistical properties, anomaly detection,
schema changes, consumption patterns etc.
More intelligence at the data infrastructure level:
* data format migrations, intelligent caching
based on access patterns.
Declarative data transformation vs. explicit ETL.
Intelligent data sampling products. Scalability
has a cost.
Thanks!
Join the community on Meetup.com!
www.meetup.com/Data-Engineering-Club
www.dataeng.club
Do you want to present? Get in touch.
Feedback #dataengclub
Ad

More Related Content

What's hot (20)

Big Data and ML on Google Cloud
Big Data and ML on Google CloudBig Data and ML on Google Cloud
Big Data and ML on Google Cloud
Wlodek Bielski
 
Hw09 Counting And Clustering And Other Data Tricks
Hw09   Counting And Clustering And Other Data TricksHw09   Counting And Clustering And Other Data Tricks
Hw09 Counting And Clustering And Other Data Tricks
Cloudera, Inc.
 
Hadoop World - Oct 2009
Hadoop World - Oct 2009Hadoop World - Oct 2009
Hadoop World - Oct 2009
Derek Gottfrid
 
Big data groningen
Big data groningenBig data groningen
Big data groningen
Willem Hendriks
 
Momentum v2.0
Momentum v2.0Momentum v2.0
Momentum v2.0
Shamshad Ansari
 
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and SparkReal-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
SingleStore
 
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
PROIDEA
 
End To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google CloudEnd To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google Cloud
Tu Pham
 
Google cloud
Google cloudGoogle cloud
Google cloud
vipin ojha
 
re:Invent re:Peat
re:Invent re:Peatre:Invent re:Peat
re:Invent re:Peat
Steve Houël
 
PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics
SingleStore
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming data
Claudiu Barbura
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
Kai Wähner
 
Satwik resume
Satwik resumeSatwik resume
Satwik resume
Satwik Mishra
 
Cloud computer
Cloud computerCloud computer
Cloud computer
Rajesh Kumar
 
Microservices Live
Microservices LiveMicroservices Live
Microservices Live
Data Driven Innovation
 
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Coburn Watson
 
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
TeK Charnsilp Chinprasert
 
Near-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Near-real time reporting in Azure | 2017 Cloudbrew Radu VunvuleaNear-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Near-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Radu Vunvulea
 
PhD Projects in Green Cloud Computing Research Guidance
PhD Projects in Green Cloud Computing Research GuidancePhD Projects in Green Cloud Computing Research Guidance
PhD Projects in Green Cloud Computing Research Guidance
PhD Services
 
Big Data and ML on Google Cloud
Big Data and ML on Google CloudBig Data and ML on Google Cloud
Big Data and ML on Google Cloud
Wlodek Bielski
 
Hw09 Counting And Clustering And Other Data Tricks
Hw09   Counting And Clustering And Other Data TricksHw09   Counting And Clustering And Other Data Tricks
Hw09 Counting And Clustering And Other Data Tricks
Cloudera, Inc.
 
Hadoop World - Oct 2009
Hadoop World - Oct 2009Hadoop World - Oct 2009
Hadoop World - Oct 2009
Derek Gottfrid
 
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and SparkReal-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
Real-Time Supply Chain Analytics with Machine Learning, Kafka, and Spark
SingleStore
 
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
Atmosphere Conference 2015: Oktawave Horizon Project: the future of real-time...
PROIDEA
 
End To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google CloudEnd To End Business Intelligence On Google Cloud
End To End Business Intelligence On Google Cloud
Tu Pham
 
PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics PowerStream: Propelling Energy Innovation with Predictive Analytics
PowerStream: Propelling Energy Innovation with Predictive Analytics
SingleStore
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming data
Claudiu Barbura
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
Kai Wähner
 
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Cloud Capacity Planning Tooling - South Bay SRE Meetup Aug-09-2016
Coburn Watson
 
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
Automate scalable micro services with Kubernetes on Google Cloud by Billme , ...
TeK Charnsilp Chinprasert
 
Near-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Near-real time reporting in Azure | 2017 Cloudbrew Radu VunvuleaNear-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Near-real time reporting in Azure | 2017 Cloudbrew Radu Vunvulea
Radu Vunvulea
 
PhD Projects in Green Cloud Computing Research Guidance
PhD Projects in Green Cloud Computing Research GuidancePhD Projects in Green Cloud Computing Research Guidance
PhD Projects in Green Cloud Computing Research Guidance
PhD Services
 

Similar to The Evolving Landscape of Data Engineering (20)

The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
Andrei Savu
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
lantianlcdx
 
ESA and the Cloud
ESA and the CloudESA and the Cloud
ESA and the Cloud
Netcetera
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
ElsonPaul2
 
AnilKumarT_Resume_latest
AnilKumarT_Resume_latestAnilKumarT_Resume_latest
AnilKumarT_Resume_latest
anil_thyagarajan
 
An Introduction to Data Intensive Computing
An Introduction to Data Intensive ComputingAn Introduction to Data Intensive Computing
An Introduction to Data Intensive Computing
Collin Bennett
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
Stargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data APIStargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data API
Data Con LA
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs Google
Patrick Pierson
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloud
Tu Pham
 
Deep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataDeep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big Data
Tu Le Dinh
 
Cloud Computing Fundamentals
Cloud Computing FundamentalsCloud Computing Fundamentals
Cloud Computing Fundamentals
Aravind Udayashankara
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
Lew Tucker
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
Jongwook Woo
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
webscale
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
Compare Infobase Limited
 
AWS Cloud Technology And Future of Faster Modern Architecture
AWS Cloud Technology And Future of Faster Modern ArchitectureAWS Cloud Technology And Future of Faster Modern Architecture
AWS Cloud Technology And Future of Faster Modern Architecture
TMME - TECH MEETUP FOR MYANMAR ENGINEERS IN JP
 
Big Data on OpenStack
Big Data on OpenStackBig Data on OpenStack
Big Data on OpenStack
Nati Shalom
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
IdontKnow66967
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
Animesh Chaturvedi
 
The Evolving Landscape of Data Engineering
The Evolving Landscape of Data EngineeringThe Evolving Landscape of Data Engineering
The Evolving Landscape of Data Engineering
Andrei Savu
 
云计算及其应用
云计算及其应用云计算及其应用
云计算及其应用
lantianlcdx
 
ESA and the Cloud
ESA and the CloudESA and the Cloud
ESA and the Cloud
Netcetera
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
ElsonPaul2
 
An Introduction to Data Intensive Computing
An Introduction to Data Intensive ComputingAn Introduction to Data Intensive Computing
An Introduction to Data Intensive Computing
Collin Bennett
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
Stargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data APIStargate, the gateway for some multi-models data API
Stargate, the gateway for some multi-models data API
Data Con LA
 
Cloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs GoogleCloud comparison - AWS vs Azure vs Google
Cloud comparison - AWS vs Azure vs Google
Patrick Pierson
 
Big data on google cloud
Big data on google cloudBig data on google cloud
Big data on google cloud
Tu Pham
 
Deep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big DataDeep dive into Google Cloud for Big Data
Deep dive into Google Cloud for Big Data
Tu Le Dinh
 
Cloud Computing ...changes everything
Cloud Computing ...changes everythingCloud Computing ...changes everything
Cloud Computing ...changes everything
Lew Tucker
 
President Election of Korea in 2017
President Election of Korea in 2017President Election of Korea in 2017
President Election of Korea in 2017
Jongwook Woo
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
webscale
 
Big Data on OpenStack
Big Data on OpenStackBig Data on OpenStack
Big Data on OpenStack
Nati Shalom
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
Animesh Chaturvedi
 
Ad

More from Andrei Savu (20)

Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015
Andrei Savu
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
Andrei Savu
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
Andrei Savu
 
APIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAPIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSF
Andrei Savu
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Andrei Savu
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
Andrei Savu
 
Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10
Andrei Savu
 
Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013
Andrei Savu
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
Andrei Savu
 
Axemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAxemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x Overview
Andrei Savu
 
2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG
Andrei Savu
 
Counters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverCounters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at Hackover
Andrei Savu
 
Simple REST with Dropwizard
Simple REST with DropwizardSimple REST with Dropwizard
Simple REST with Dropwizard
Andrei Savu
 
Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2
Andrei Savu
 
Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1
Andrei Savu
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the Cloud
Andrei Savu
 
Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011
Andrei Savu
 
Apache Whirr
Apache WhirrApache Whirr
Apache Whirr
Andrei Savu
 
Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36
Andrei Savu
 
Apache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesdayApache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesday
Andrei Savu
 
Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015Recap on AWS Lambda after re:Invent 2015
Recap on AWS Lambda after re:Invent 2015
Andrei Savu
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
Andrei Savu
 
Introducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data BashIntroducing Cloudera Director at Big Data Bash
Introducing Cloudera Director at Big Data Bash
Andrei Savu
 
APIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSFAPIs & Underlying Protocols #APICraftSF
APIs & Underlying Protocols #APICraftSF
Andrei Savu
 
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS MeetupChallenges for running Hadoop on AWS - AdvancedAWS Meetup
Challenges for running Hadoop on AWS - AdvancedAWS Meetup
Andrei Savu
 
Cloud as a Data Platform
Cloud as a Data PlatformCloud as a Data Platform
Cloud as a Data Platform
Andrei Savu
 
Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10Apache Provisionr (incubating) - Bucharest JUG 10
Apache Provisionr (incubating) - Bucharest JUG 10
Andrei Savu
 
Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013Creating pools of Virtual Machines - ApacheCon NA 2013
Creating pools of Virtual Machines - ApacheCon NA 2013
Andrei Savu
 
Data Scientist Toolbox
Data Scientist ToolboxData Scientist Toolbox
Data Scientist Toolbox
Andrei Savu
 
Axemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x OverviewAxemblr Provisionr 0.3.x Overview
Axemblr Provisionr 0.3.x Overview
Andrei Savu
 
2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG2012 in Review - Bucharest JUG
2012 in Review - Bucharest JUG
Andrei Savu
 
Counters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at HackoverCounters with Riak on Amazon EC2 at Hackover
Counters with Riak on Amazon EC2 at Hackover
Andrei Savu
 
Simple REST with Dropwizard
Simple REST with DropwizardSimple REST with Dropwizard
Simple REST with Dropwizard
Andrei Savu
 
Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2 Guava Overview Part 2 Bucharest JUG #2
Guava Overview Part 2 Bucharest JUG #2
Andrei Savu
 
Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1 Guava Overview. Part 1 @ Bucharest JUG #1
Guava Overview. Part 1 @ Bucharest JUG #1
Andrei Savu
 
Polyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the CloudPolyglot Persistence & Big Data in the Cloud
Polyglot Persistence & Big Data in the Cloud
Andrei Savu
 
Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011Building a Great Team in Open Source - Open Agile 2011
Building a Great Team in Open Source - Open Agile 2011
Andrei Savu
 
Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36Automated Testing for Web Applications - Wurbe #36
Automated Testing for Web Applications - Wurbe #36
Andrei Savu
 
Apache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesdayApache ZooKeeper TechTuesday
Apache ZooKeeper TechTuesday
Andrei Savu
 
Ad

Recently uploaded (20)

Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Massive Power Outage Hits Spain, Portugal, and France: Causes, Impact, and On...
Aqusag Technologies
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 

The Evolving Landscape of Data Engineering

  • 1. The Evolving Landscape of Data Engineering Andrei Savu - Event Co-organizer Staff Engineer @ Twitter Follow me @andreisavu
  • 2. Andrei Savu Staff Engineer @ Twitter: * MoPub Backend & Data Pipelines * Mobile App Monetization Co-organizer of the Data Engineering Club. Previously Tech Lead at Cloudera via the Axemblr acquisition. Started the Cloud engineering team.
  • 3. The Past: ● OSS communities ● AWS history ● Google Cloud history The Present: Patterns The Future: Wish List Topics
  • 4. Weeks of Provisioning Static Infrastructure Commodity Hardware Commodity Networking Data Locality Important Running in the Public Cloud was unusual CAPEX The Past - OSS
  • 5. Visionary Business Fast iterations Data Management as a key platform use case Incredible Scale Transition to “serverless” OPEX & Elastic The Past - AWS
  • 6. Visionary Products Fast iterations Machine Learning as a key use case State of the Art data platform Last 3 years on fast forward Intelligent Billing OPEX & Elastic The Past - Google Cloud
  • 7. The Present: Patterns Weeks to Minutes to Seconds Hadoop/Spark ecosystem is mature. We have a broad set of options. Big Data is much Bigger (e.g. x1e.32xlarge: 3TB mem, 128 vCPUs, 14Gbps network) Scale continues to be hard. Cloud economics can be very disruptive (especially for data workloads) High-performance networks are common. Storage can be decoupled from compute. Cluster locality is important. Service Endpoints (not clusters, aka serverless, aka managed etc.). Sophisticated Auto-scaling (batch & streaming, spot vs. on-demand, multi-az). Multi-DC and Multi-Region from Day 1. Various flavors of containers.
  • 8. The Future: Wish List A Data Catalog product as the center of the universe. Data Monitoring Systems: * statistical properties, anomaly detection, schema changes, consumption patterns etc. More intelligence at the data infrastructure level: * data format migrations, intelligent caching based on access patterns. Declarative data transformation vs. explicit ETL. Intelligent data sampling products. Scalability has a cost.
  • 9. Thanks! Join the community on Meetup.com! www.meetup.com/Data-Engineering-Club www.dataeng.club Do you want to present? Get in touch. Feedback #dataengclub