2019 HPCC Systems® Community Day
Challenge Yourself – Challenge the Status Quo
Shamser Ahmed
shamser.ahmed@lexisnexisrisk.com
Workunit Analysis Tool
Tech Review
Overview
• Why analyze workunits?
• Analyzing workunits manually
• Introducing the Workunit Analysis Tool
• Demonstration
• Challenges
• Concluding remarks
• Questions & Suggestions
Why analyze workunits?
Examine graph to
• Determine if the job is as efficient as possible
• Graph may not be optimal
• Issues: redundant/duplicate activities, inefficient sorting, inefficient joins, too many sub-graphs, skew-related issues, etc.
• Human guidance may be necessary
• Reveal errors in ECL
• Is the platform doing what you expect?
• Platform related issues
• Why is my job running slower than before?
Examine graph metrics to identify issues with
• Skews
• Spills
• External services
• Less-than-optimal operations (join, sort, distribute, etc.)
• Does actual time taken match expected time?
To make sure the platform is doing what you expected it to do, to have the information necessary to optimize the ECL code, and to identify issues.
An ECL-related project should not be considered complete until a thorough graph analysis has been completed.
Analyzing workunits manually
Analyzing a workunit: a walk-through (slides 8 to 18 are screenshots, not reproduced here)
So, do we routinely analyze workunits?
• Always?
• Sometimes?
• Rarely?
• Probably not enough
• Probably not in sufficient depth
• Why?
• Difficult to fully understand large graphs
• Difficult to digest the large number of metrics
• Difficult to interpret the metrics
• Lack of time
Introducing the Workunit Analysis Tool
• Analyzes the workunit to provide information useful for
• Improving performance
• Diagnosing issues
How does it work?
• The graph is split into activities
• The Workunit Analysis Tool matches each activity against its rules:
• Distribute skew rule
• IO Disk read skew rule
• IO Disk write skew rule
• Spill skew rule
• Spilling in few nodes rule
• Keyed join rule
• Lookup join rule
• Sequential slow rule
• Slow external call
• Each matching rule raises an issue
• A cost is calculated for each issue
• The highest cost issues are reported
Example issues:
Activity    Issue                                       Cost
a3          Distribute skew worse than input dataset    3000
a5          Heavily skewed IO                           2000
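The flow on this slide (split the graph into activities, match each activity against the rules, cost the resulting issues, report the most expensive ones) could be sketched roughly as below. This is an illustrative Python sketch only, not the tool's actual implementation: the `Activity` and `Issue` types, the two rule functions, the 50% threshold, and the metric and cost values are all hypothetical, chosen to reproduce the two example issues in the table.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Activity:
    ident: str          # activity id, e.g. "a3"
    kind: str           # activity kind, e.g. "distribute" (hypothetical labels)
    input_skew: float   # hypothetical metric: skew of the input dataset (%)
    output_skew: float  # hypothetical metric: skew of the output dataset (%)
    io_skew: float      # hypothetical metric: disk IO skew (%)
    cost: int           # hypothetical precomputed cost for this activity


@dataclass
class Issue:
    activity: str
    message: str
    cost: int


# A rule inspects one activity and returns an Issue if it matches, else None.
Rule = Callable[[Activity], Optional[Issue]]


def distribute_skew_rule(a: Activity) -> Optional[Issue]:
    if a.kind == "distribute" and a.output_skew > a.input_skew:
        return Issue(a.ident, "Distribute skew worse than input dataset", a.cost)
    return None


def io_skew_rule(a: Activity) -> Optional[Issue]:
    if a.io_skew > 50.0:  # hypothetical threshold for "heavily skewed"
        return Issue(a.ident, "Heavily skewed IO", a.cost)
    return None


RULES: List[Rule] = [distribute_skew_rule, io_skew_rule]


def analyze(activities: List[Activity], rules: List[Rule] = RULES, top: int = 10) -> List[Issue]:
    """Match every activity against every rule and return the costliest issues."""
    issues = []
    for activity in activities:
        for rule in rules:
            issue = rule(activity)
            if issue is not None:
                issues.append(issue)
    issues.sort(key=lambda i: i.cost, reverse=True)
    return issues[:top]


# Made-up activities standing in for a graph that has been split up.
activities = [
    Activity("a1", "sort", 0.0, 0.0, 5.0, 100),
    Activity("a3", "distribute", 10.0, 80.0, 0.0, 3000),
    Activity("a5", "diskread", 0.0, 0.0, 90.0, 2000),
]
report = analyze(activities)
for issue in report:
    print(issue.activity, issue.message, issue.cost)
```

Keeping each rule as a small independent function means new rules can be added to the list without touching the matching or reporting steps.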
How is cost calculated?
• Cost = actual time taken - theoretical ideal time
Example: 400-way Thor
An activity’s metrics show:
Elapsed time
Slowest node    Average node    Activity
45 minutes      10 minutes      45 minutes
Theoretical ideal ≈ average node’s elapsed time, i.e. 10 minutes
Cost = max - ideal, i.e. 45 - 10 = 35 minutes
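The arithmetic in this example can be checked with a short Python sketch. It assumes, as the slide does, that the activity takes as long as its slowest node and that the theoretical ideal is approximated by the average node's elapsed time. The function name `activity_cost` and the per-node timings are invented for illustration; the timings are simply one distribution over 400 nodes whose slowest node takes 45 minutes and whose average is 10 minutes.

```python
def activity_cost(node_elapsed):
    """Cost = actual elapsed time - theoretical ideal time.

    Actual time is taken as the slowest node's elapsed time; the ideal is
    approximated by the average across all nodes (an assumption from the slide).
    """
    slowest = max(node_elapsed)
    ideal = sum(node_elapsed) / len(node_elapsed)
    return slowest - ideal


# Made-up per-node elapsed times in seconds for a 400-way Thor:
# one node takes 45 minutes (2700 s); the rest average just under 10 minutes.
node_elapsed = [2700] + [595] * 294 + [594] * 105

cost_minutes = activity_cost(node_elapsed) / 60
print(f"nodes={len(node_elapsed)} cost={cost_minutes} minutes")
```

With a perfectly even distribution the slowest node equals the average, so the cost falls to zero.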
Demonstration
Workunit Analysis Tool demo
Workunit Analysis Tool (command line) demo
Challenges
Concluding remarks
How it should be used
It is a tool for the developer
It does not decide if something is wrong or right: developers should interpret the information and decide what changes (if any) are needed.
It will not catch every problem
There will always be cases that have not been considered or implemented. Workunits of concern should be analyzed manually.
Features planned
• Improve cost calculation
• More rules
• Skews: global sort, spilling skews (some nodes spilling, others not), all on one node, unbalanced join, and other excessive skews
• Issues caused by sequential operation
• Slow joins
• Ratio of disk IO time to size read out of line
• Index read/keyed join with a large number of rejected rows
• Large amount of time in functions and SOAP calls
• Long time waiting for queues
• Proportion of time spent spilling to other work
• Live analysis: analyze the workunit whilst it’s executing
• ROXIE support
Concluding remarks
• Automatically analyzes the workunit after a job completes
• Analyzes the entire workunit in seconds
• Thoroughly analyzes the workunit:
• Every graph & subgraph
• Every metric
• Every time
• Now, every workunit may be analyzed every time it executes
• Caveat:
• Work in progress
• Doesn’t eliminate manual analysis
Questions?
Shamser Ahmed
Senior Consulting SW Engineer
shamser.ahmed@lexisnexisrisk.com
View this presentation on YouTube:
https://www.youtube.com/watch?v=5F9WW89yDZw&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=3&t=0s
(5:33:00)
Work Unit Analysis Tool

  • 1. 2019 HPCC Systems® Community Day: Challenge Yourself – Challenge the Status Quo. Shamser Ahmed, shamser.ahmed@lexisnexisrisk.com. Workunit Analysis Tool Tech Review
  • 2. Overview • Why analyze workunits? • Analyzing workunits manually • Introducing the Workunit Analysis Tool • Demonstration • Challenges • Concluding remarks • Questions & Suggestions
  • 4. Why analyze workunits? Examine the graph to • Determine if the job is as efficient as possible • Graph may not be optimal • Issues: redundant/duplicate activities, inefficient sorting, inefficient joins, too many sub-graphs, skew-related issues, etc. • Human guidance may be necessary • Reveal errors in ECL • Is the platform doing what you expect? • Platform-related issues • Why is my job running slower than before?
  • 5. Why analyze workunits? Examine graph metrics to identify issues with • Skews • Spills • External services • Less than optimal operations (join, sort, distribute, etc.) • Does actual time taken match expected time?
  • 6. Why analyze workunits? To make sure the platform is doing what you expected it to do, to have the information necessary to optimize the ECL code, and to identify issues. An ECL-related project should not be considered complete until a thorough graph analysis has been completed.
  • 8. Analyzing workunit - a walk through
  • 9. Analyzing workunit - a walk through
  • 10. Analyzing workunit - a walk through
  • 11. Analyzing workunit - a walk through
  • 12. Analyzing workunit - a walk through
  • 13. Analyzing workunit - a walk through
  • 14. Analyzing workunit - a walk through
  • 15. Analyzing workunit - a walk through
  • 16. Analyzing workunit - a walk through
  • 17. Analyzing workunit - a walk through
  • 18. Analyzing workunit - a walk through
  • 19. So, do we routinely analyze work units? • Always? • Sometimes? • Rarely?
  • 20. So, do we routinely analyze work units? • Probably not enough • Probably not in sufficient depth • Why? • Difficult to fully understand large graphs • Difficult to digest the large number of metrics • Difficult to interpret the metrics • Not having the time
  • 22. Introducing the Workunit Analysis Tool • Analyzes the workunit to provide information useful for • Improving performance • Diagnosing issues
  • 23. How it works? • Rules: Distribute skew rule; IO Disk read skew rule; IO Disk write skew rule; Spill skew rule; Spilling in few nodes rule; Keyed join rule; Lookup join rule; Sequential slow rule; Slow external call • Process: the graph is split into activities; the Workunit Analysis Tool matches each activity against the rules; each match raises an issue; a cost is calculated for each issue; the highest cost issues are reported • Example issues (activity: issue, cost): a3: Distribute skew worse than input dataset, 3000; A5: Heavily skewed IO, 2000
  • 24. How cost is calculated? • Cost is actual time taken minus theoretical ideal time • Example: 400-way Thor. An activity's metrics show elapsed time: activity 45 minutes; slowest node 45 minutes; average node 10 minutes • Theoretical ideal ~ average node's elapsed time, i.e. 10 minutes • Cost = max - ideal, i.e. 45 - 10 => 35 minutes
  • 26. Workunit Analysis Tool demo
  • 27. Workunit Analysis Tool demo
  • 28. Workunit Analysis Tool (command line) demo
  • 29. Workunit Analysis Tool (command line) demo
  • 35. How it should be used • It is a tool for the developer. It does not decide if something is wrong or right: developers should interpret the information and decide what changes (if any) are needed. • It will not catch every problem. There will always be cases that have not been considered or implemented. Workunits of concern should be analyzed manually.
  • 36. Features Planned • Improve cost calculation • More rules • Skews: global sort, spilling skews (some nodes spilling, others not), all on one node, unbalanced join and other excessive skews • Issues caused by sequential operation • Slow joins • Ratio of disk IO time to size read out of line • Index read/keyed join & large number of rejected rows • Large amount of time in functions & SOAP calls • Long time waiting for queues • Proportion of time spent spilling to other work • Live analysis: analyze workunit whilst it's executing • ROXIE support
  • 37. Concluding remarks • Automatically analyzes workunit after a job completes • Analyzes the entire work unit in seconds • Thoroughly analyses workunit: • Every graph & subgraph • Every metric • Every time • Now, every workunit may be analyzed every time it executes • Caveat: • Work in progress • Doesn't eliminate manual analysis
  • 38. Questions? Shamser Ahmed, Senior Consulting SW Engineer, shamser.ahmed@lexisnexisrisk.com
  • 39. View this presentation on YouTube: https://www.youtube.com/watch?v=5F9WW89yDZw&list=PL-8MJMUpp8IKH5-d56az56t52YccleX5h&index=3&t=0s (5:33:00)
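
The rule-matching flow on slide 23 and the cost formula on slide 24 can be sketched together in a few lines of Python. This is only an illustration under stated assumptions, not the tool's actual implementation: the `Activity` and `Issue` classes, the `make_skew_rule` helper, the one-minute threshold and the activity names are all hypothetical. Only the cost formula (slowest node minus average node) and the "report highest cost first" ordering come from the slides.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional


@dataclass
class Activity:
    """Hypothetical stand-in for one graph activity and its per-node timings."""
    name: str
    kind: str
    node_minutes: List[float]  # elapsed time on each node, in minutes


@dataclass
class Issue:
    activity: str
    message: str
    cost_minutes: float


Rule = Callable[[Activity], Optional[Issue]]


def cost_minutes(activity: Activity) -> float:
    """Slide 24: cost = slowest node's time minus the ideal (average node's time)."""
    return max(activity.node_minutes) - mean(activity.node_minutes)


def make_skew_rule(kind: str, message: str, min_cost: float = 1.0) -> Rule:
    """Build a rule flagging activities of one kind whose cost reaches min_cost.

    The threshold is an assumption made for this sketch.
    """
    def rule(activity: Activity) -> Optional[Issue]:
        if activity.kind != kind:
            return None
        cost = cost_minutes(activity)
        return Issue(activity.name, message, cost) if cost >= min_cost else None
    return rule


# Two illustrative rules; the messages echo slide 23, the logic is assumed.
RULES: List[Rule] = [
    make_skew_rule("distribute", "Distribute skew"),
    make_skew_rule("disk_write", "Heavily skewed IO"),
]


def analyze(activities: List[Activity], rules: List[Rule]) -> List[Issue]:
    """Match every activity against every rule; return issues, highest cost first."""
    issues = []
    for activity in activities:
        for rule in rules:
            issue = rule(activity)
            if issue is not None:
                issues.append(issue)
    return sorted(issues, key=lambda i: i.cost_minutes, reverse=True)


# Invented 400-node timings. The last activity reproduces slide 24:
# slowest node 45 minutes, average node 10 minutes.
activities = [
    Activity("distribute_A", "distribute", [10.0] * 400),
    Activity("write_B", "disk_write", [4.0] * 300 + [8.0] * 100),
    Activity("distribute_C", "distribute", [10.0] * 364 + [9.0] * 35 + [45.0]),
]
issues = analyze(activities, RULES)
for issue in issues:
    print(f"{issue.activity}: {issue.message} (cost {issue.cost_minutes:.0f} min)")
```

The perfectly balanced activity raises no issue, and the slide 24 case comes out at a cost of 35 minutes, the 2,100 seconds quoted in the notes for that slide.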

Editor's Notes

  • #3: In the presentation, I will be covering the following areas:
  • #5: So, why would you want to analyze a WU? I'd suggest that you'd examine the graph to... Graph not optimal (compile-time information): the code generator does not "know" about the data until execution completes. Hints are needed to guide the code generator. A different action may be better suited. Highlight inefficiencies in ECL code: too many small sub-graphs affecting performance; inappropriate joins (is a keyed join better, or a lookup join?); or spills at unexpected times. The platform is not infallible: the code generator could do a better job, and the engines can always be optimised further … the team is constantly improving. Analyzing a WU may highlight issues in the design, data or architecture. Hey, my job is running slower? A regression in the platform? Or a bug introduced in the ECL? Or has the data changed?
  • #6: In addition to examining the graph, the graph metrics should be examined. The metrics will highlight: skews causing the cluster to be used inefficiently (some nodes idle whilst others are very busy); spills, which affect performance (usually necessary, but it may be possible to eliminate them); external services (SOAP calls) becoming a bottleneck; less than optimal operations (lookup join or keyed join better? assistance in achieving a better distribution?). How long do we expect that "work" to take? Does it match the actual time taken?
  • #8: Important to understand how analysis is done now, to appreciate what the analysis tool does and how it works.
  • #9: Start by having a look at a real-world workunit and conducting some analysis. This WU executed on a 400-way Thor. As you can see, it took over 1 hour 17 minutes. That is quite a significant amount of resources. Definitely worth seeing if it's possible to reduce the total cluster time.
  • #10: With a large Workunit on a busy system, it takes some time to gather and display the graph.
  • #11: Eventually, the entire graph is shown. Many graphs, subgraphs and activities here. Too many to examine every one, so we'll focus on the activities having the biggest impact.
  • #12: Clicked sub-graphs icon to get the timings related to the subgraphs and then clicked "TimeElapsed" to sort by timings
  • #13: The list is sorted in reverse elapsed time order: the subgraph with the highest elapsed time is shown first. Clicking on that one to drill down.
  • #14: So, here we have the subgraph with the slowest execution time... We're going to examine the activities to see where the time is going.
  • #15: I've clicked Spill Read and see: 1) the maximum execution time is around 24 seconds; 2) the other metrics are not particularly interesting.
  • #16: Now examine Project Disk Read: max local execute is 9 seconds, but skew is 400%. Would be significant, but the data is subsequently hash distributed.
  • #17: Quick process again... reducing skew. Finally, examining Local Join.
  • #18: 14 minutes... doesn't sound significant, but 14 minutes × a 400-way cluster... worth reducing if possible. Massive skew in local execute time: 3500%! Seems one has a large number of spills... needs examining. Consider the previous hash distributes to see if the skew may be reduced... So, we can carry on looking at elapsed time in other parts of the activities... More to do: examine a different metric.
  • #19: It's not over. There are many more metrics to examine. This is to give a taste of analysing a WU manually. I'll end the demo, but in the real world the analysis would continue for far more subgraphs and metrics.
  • #20: So, that brings us to the question: "in the real world", do we ...? I think the answer for most would be "less than we'd like".
  • #21: Large, complex workunits that take significant cluster time. Some graphs are VERY LARGE and browsers struggle to render them quickly enough. You could be forgiven for not fully understanding all the metrics: expected value? Best case for the hardware, network bandwidth? Need a general feel for what the values should be. Time-consuming: examine key metrics, for key graphs. But a small graph may be important... So it would be great to have more assistance in analyzing WUs. That brings us to the Workunit Analysis Tool...
  • #23: … The Workunit Analysis Tool is designed to assist the user in analysing work units. <read slide> Now: automatic and routine; more thorough.
  • #25: Suppose, heavily skewed data means … So, cost in this case would be 2,100 seconds. Cost calculation is not perfect: e.g. skews in upstream activities, complex relationships.
  • #27: I would like to show a demo of it working on a small test ECL. Here's a short piece of ECL (that does nothing useful) designed to test the Analysis tool. It outputs the first 100 users and first 100 urls, for no reason whatsoever. The Workunit Analysis Tool is built into the workflow...
  • #28: These are screenshots of a job I executed earlier... Within a fraction of a second after the WU completes, the potential issues are shown in the messages section...
  • #29: When you may use the command line: we are going to have a quick look at the command line version of the Analysis tool. We'd not normally need to use the command line, but I'm examining the real-world workunit that we were looking at earlier, to see what issues it detects.
  • #30: Bang. A fraction of a second later, the analysis is complete... You can see it has detected a couple of dozen issues. We found a couple of potential issues in our 5-10 minutes of manual analysis... 2 dozen issues detected in less than a second. The list is sorted in reverse cost order, with the highest cost shown first... In the real world, I'd now examine the reported activities in more detail and see if we can do something about the areas of concern.
  • #36: It is a developer tool. It is a tool to assist the developers.
  • #37: ...Your feedback and suggestions are invaluable.
  • #38: <read slide> Analysis is stored with the workunit. Analysis is routine, rather than only when a problem arises or when the developer has time. (3rd point): potential for a more thorough analysis.
  • #39: Thank you for listening (and participating). We have time for any questions. Which work units do you analyze? Every work unit? Only ones that take a long time? When jobs take longer than normal? How do you analyze workunits? Do you focus on particular parts of the graph? Particular metrics? Skews? Elapsed time? Data sizes? That concludes the presentation. Feel free to contact me with questions, feedback and suggestions. Thank you very much for your attention.
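
Notes #9 and #18 weigh elapsed time against the width of the cluster. A rough way to put a number on that is to multiply the two, as sketched below. The `node_minutes` helper is hypothetical, and it assumes every node is held for the full elapsed time, which ignores any overlap with other activities. The 77-minute figure is a lower bound, since the notes say the job took "over 1 hour 17 minutes".

```python
def node_minutes(elapsed_minutes: float, nodes: int) -> float:
    """Node time held by a step, assuming every node is occupied throughout."""
    return elapsed_minutes * nodes


NODES = 400  # the 400-way Thor from the walk-through

local_join = node_minutes(14, NODES)  # the 14-minute Local Join in note #18
whole_job = node_minutes(77, NODES)   # the whole workunit: at least 1 h 17 min

print(f"Local Join: {local_join:.0f} node-minutes ({local_join / 60:.1f} node-hours)")
print(f"Whole job:  {whole_job:.0f} node-minutes ({whole_job / 60:.1f} node-hours)")
```

Seen this way, a 14-minute activity holds roughly 93 node-hours, which is why it is worth reducing if possible.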