Query or Not to Query
Ask Unravel
Prajakta Kalmegh, Principal Engineer
Yusaku Sako, Head of Data Science
Data Science@
Prajakta Kalmegh
▪ pkalmegh@unraveldata.com
▪ https://www.linkedin.com/in/pkalmegh/
Yusaku Sako
▪ ysako@unraveldata.com
▪ https://www.linkedin.com/in/yusaku-sako/
Experienced Team
Strong Market Validation A Microsoft M12 Company
Broad Technology and Cloud Platform Coverage
Our Pedigree
Radically Simplify DataOps
Select right tech for the app,
infrastructure, environment
and cluster
Debug code and pipelines,
predict issues and assist in
optimizing apps
Check for app correctness
and resource efficiency
Container sizing, Scheduling
and Cluster selection
Tuning apps and
eliminating rogue apps
Proactive and automated
actions to maintain SLAs
CONTINUOUS INTEGRATION
CONTINUOUS OPTIMIZATION
AGENDA
Unravel
API
Speedup
Development
Operational
Insights
Optimize
Deployment
Speedup Development
Meet John
8:00 am 11:00 am 11:30 am 4:30 pm
Still not
working?
Why is this hard?
What if I tweak
my query?
Did my resource
requirements
change?
Is my data skewed
on xxx?
Is the cluster
bottlenecked?
Unravel brings data-driven insights
as you code
The notebook Demo
2Q | | !2Q
What does Unravel exploit?
▪ Users often issue similar queries
▪ Same challenges faced
▪ Same mistakes repeated
▪ By the same user, by other users
Holistic view of what worked and what did not
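One way to picture "users often issue similar queries" is a lookup of past runs by normalized query text. The sketch below is illustrative only: the `history` records, the `similar_runs` helper, and the 0.8 threshold are assumptions made for this example, not Unravel's actual method or API.

```python
import re
from difflib import SequenceMatcher


def normalize(sql: str) -> str:
    """Lowercase, mask literals, and collapse whitespace so near-identical queries compare equal."""
    sql = sql.lower()
    sql = re.sub(r"'[^']*'", "?", sql)            # mask string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)    # mask numeric literals
    return re.sub(r"\s+", " ", sql).strip()


def similar_runs(query, history, threshold=0.8):
    """Return (similarity, record) pairs at or above the threshold, most similar first."""
    target = normalize(query)
    scored = []
    for record in history:
        score = SequenceMatcher(None, target, normalize(record["sql"])).ratio()
        if score >= threshold:
            scored.append((score, record))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical history of past executions, by the same user and by other users.
history = [
    {"sql": "SELECT * FROM sales WHERE region = 'EU'",
     "user": "alice", "outcome": "failed: out of memory"},
    {"sql": "SELECT region, SUM(amount) FROM sales GROUP BY region",
     "user": "bob", "outcome": "succeeded"},
]

matches = similar_runs("select * from sales where region = 'US'", history)
for score, record in matches:
    print(f"{score:.2f} {record['user']}: {record['outcome']}")
```

Masking literals before comparing means a query that differs only in its filter value is treated as the same query, so an earlier failure by another user can be surfaced before the new query is run.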
Optimize Deployment
When is a good time to schedule?
These are my resource
requirements, should I
schedule now or later?
This report needs to be
ready before Monday
morning, my start time is
flexible
I have a new workload to
schedule, what is a good
time to start it?
Finding the missing piece
Is the cluster slow(er)?
Have the query's needs
changed over time?
Is it both?
Unravel uses predictive analytics to
time it right
The timeit Demo
while (!best) {
# find better
}
What does Unravel exploit?
▪ Cluster utilization timeseries
▪ Query execution variability data
▪ Similar resource profiles and consumption history
Holistic view of when it worked and when it didn’t
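The scheduling question above can be sketched as a search over a utilization forecast. This is a minimal illustration under stated assumptions: the `best_start_slot` function, the slot-based forecast, and the "lowest average load" criterion are invented for this example and are not taken from the talk.

```python
from statistics import mean


def best_start_slot(utilization, duration, demand, capacity=1.0, deadline=None):
    """Pick the start slot whose window has the lowest average forecast utilization.

    utilization: forecast cluster utilization per time slot (fraction of capacity)
    duration:    expected query run time, in slots
    demand:      fraction of the cluster the query needs
    deadline:    slot index by which the query must have finished (exclusive)
    Returns the chosen start index, or None if no window fits.
    """
    end_limit = len(utilization) if deadline is None else min(deadline, len(utilization))
    best, best_load = None, None
    for start in range(0, end_limit - duration + 1):
        window = utilization[start:start + duration]
        if max(window) + demand > capacity:
            continue  # the query would not fit at the busiest point of this window
        load = mean(window)
        if best_load is None or load < best_load:
            best, best_load = start, load
    return best


# Hypothetical hourly forecast; the report must be finished before slot 6.
forecast = [0.9, 0.8, 0.4, 0.3, 0.35, 0.7, 0.9, 0.5]
slot = best_start_slot(forecast, duration=2, demand=0.5, deadline=6)
print(slot)
```

A flexible start time with a fixed deadline maps onto the `deadline` argument; a workload with no deadline simply searches the whole forecast.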
Operational Actionable Insights
DELAYED
Delayed: Increase in Input Data Size Detected
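An insight like the one above could come from comparing the current run's input size with the history of earlier runs. The following is a rough sketch, assuming a median baseline and a 1.5x threshold; the function name, the threshold, and the sample sizes are all hypothetical.

```python
from statistics import median


def input_growth_alert(past_input_gb, current_input_gb, ratio_threshold=1.5):
    """Return an alert message when the current input is much larger than the historical median."""
    if not past_input_gb:
        return None  # no history to compare against
    baseline = median(past_input_gb)
    if baseline <= 0:
        return None
    ratio = current_input_gb / baseline
    if ratio >= ratio_threshold:
        return (f"Delayed: Increase in Input Data Size Detected "
                f"({ratio:.1f}x the median of past runs)")
    return None


past_runs_gb = [100, 110, 90, 105]   # hypothetical input sizes of earlier runs
alert = input_growth_alert(past_runs_gb, 250)
print(alert)
```

The median is used rather than the mean so that a single unusually large earlier run does not hide a genuine increase.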
Architecture
UNRAVEL DAEMONS
Unravel API
Query Historical
Executions
Cluster State
Indicators
Query
Quality
Predictor
Best Slot
Finder
AskUnravel
Already Scheduled
Scheduled Query
User
Adhoc Query
New Schedule
Predict query issues
Predict cluster issues
Predict delays
Submit Query
✓ SLA-bound
✓ Responsive
wants to submit/track
askunravel
Submit Query
Better Slots are xxx
On Track
Delayed by xxx
Use-Cases
Hold and submit later
askunravel
Better Slots are xxx
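The use cases above can be read as a small decision flow. The sketch below is one interpretation of the diagram labels ("Submit Query", "Hold and submit later", "On Track", "Delayed by xxx"); the `ask_unravel` function, its parameters, and the `Advice` type are hypothetical and do not represent the real Unravel API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Advice:
    action: str
    detail: str = ""


def ask_unravel(already_scheduled: bool,
                predicted_delay_min: int = 0,
                better_slot: Optional[str] = None) -> Advice:
    """Map predictor outputs to the responses shown on the slide (assumed mapping)."""
    if already_scheduled:
        # Tracking a scheduled query: report whether it is expected to meet its SLA.
        if predicted_delay_min > 0:
            return Advice("Delayed", f"Delayed by {predicted_delay_min} min")
        return Advice("On Track")
    # Ad hoc query or new schedule: submit now unless a better slot was found.
    if better_slot is not None:
        return Advice("Hold and submit later", f"Better slot: {better_slot}")
    return Advice("Submit Query")


print(ask_unravel(already_scheduled=True, predicted_delay_min=45))
print(ask_unravel(already_scheduled=False, better_slot="02:00"))
print(ask_unravel(already_scheduled=False))
```

In this reading, `predicted_delay_min` would be supplied by the delay predictions and `better_slot` by the Best Slot Finder; both are passed in as plain values here to keep the example self-contained.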
Operational
Insights
Optimize
Deployment
Speedup
Development
Recap: Detect Issues as early as possible
8:00 am 8:30 am
Enjoy your day!
Thank you for watching!
Sign up for a free trial today
https://bit.ly/3mo2ira
Feedback
Your feedback is important to us.
Don’t forget to rate
and review the sessions.

Query or Not to Query? Using Apache Spark Metrics to Highlight Potentially Problematic Queries