Java Production Debugging 101
A Reversim Summit Lab, February, 2013
PRODUCTION DEBUGGING = FORENSICS
Business Requirements

Requirements        Prod. Debugging     Forensics
Timeframe           Severely limited    Hours, days, weeks…
Chain of Custody    Meaningless         Sacred
Documentation       Useful              Sacred
Endgame

Production Debugging          Forensics
1. Gather evidence            1. Identify crime in progress
2. Restore functionality      2. Gather evidence
                3. Figure out what happened
Our Forensic Process

1. Gather Evidence
2. Restore Production
3. Analyze Findings
4. Implement Solution
5. Post-Mortem
Evidence toolchain
WHAT SHALL WE COLLECT?
Our focus points for today

•   Thread dump
•   Heap dump
•   VM (especially GC) metrics
•   System metrics
•   Logs
jstack

• Minimalistic tool
• Against a running process:
 jstack <pid>
• Outputs to stdout
• Identifies deadlocks
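jstack works from outside the process. For comparison, here is a minimal sketch of collecting similar information from inside the JVM through the standard java.lang.management API (ThreadMXBean.dumpAllThreads and findDeadlockedThreads). This is not part of the lab. The class and method names are hypothetical, and the sketch assumes you can run code in the affected JVM, for example from an admin endpoint.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class InProcessThreadDump {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Builds a jstack-like text dump of all live threads (hypothetical helper). */
    public static String threadDump() {
        StringBuilder sb = new StringBuilder();
        ThreadInfo[] infos = THREADS.dumpAllThreads(
                THREADS.isObjectMonitorUsageSupported(),   // include locked monitors if supported
                THREADS.isSynchronizerUsageSupported());   // include java.util.concurrent locks if supported
        for (ThreadInfo info : infos) {
            sb.append('"').append(info.getThreadName()).append("\" ").append(info.getThreadState());
            if (info.getLockName() != null) {
                sb.append(" waiting on ").append(info.getLockName());
            }
            if (info.getLockOwnerName() != null) {
                sb.append(" held by \"").append(info.getLockOwnerName()).append('"');
            }
            sb.append('\n');
            // ThreadInfo.toString() prints only the first few frames, so print every frame explicitly.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Names of threads deadlocked on monitors or ownable synchronizers; empty if none. */
    public static List<String> deadlockedThreadNames() {
        List<String> names = new ArrayList<>();
        long[] ids = THREADS.findDeadlockedThreads(); // null means no deadlock was found
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : THREADS.getThreadInfo(ids)) {
            if (info != null) { // a thread may have terminated in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.print(threadDump());
        System.out.println("Deadlocked: " + deadlockedThreadNames());
    }
}
```

jstack needs no changes to the application. The in-process variant is useful mainly when you cannot get a shell on the box but can reach the application itself.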
jmap

• Heap dump from a running process
  – Lengthy process
  – Freezes the VM while dumping
• Some extras (e.g. jmap -histo <pid> for a class histogram)
• Command:
  jmap -dump:format=b,file=<output> <pid>
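On HotSpot JVMs the same kind of dump can also be triggered programmatically through com.sun.management.HotSpotDiagnosticMXBean. Here is a minimal sketch; the class name is hypothetical. The caveats above still apply: writing the dump pauses the VM and can take a long time on a large heap. The liveOnly flag corresponds roughly to jmap's "live" option (jmap -dump:live,format=b,file=<output> <pid>).

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

public class InProcessHeapDump {
    /**
     * Writes an HPROF heap dump of the current JVM to target (hypothetical helper).
     * liveOnly = true runs a full GC first and dumps only reachable objects;
     * false also includes unreachable garbage.
     */
    public static void dumpHeap(Path target, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // The JVM refuses to overwrite an existing file, and the file name
        // should end in ".hprof" (the JDK may reject other suffixes).
        diag.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of(args.length > 0 ? args[0] : "heap.hprof");
        dumpHeap(out, true);
        System.out.println("Heap dump written to " + out.toAbsolutePath());
    }
}
```

The resulting .hprof file can be opened in the same heap analysis tools as a jmap dump.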
jstat

•   JVM metrics: classloader, JIT, GC
•   Tracking over time
•   Console-based
•   jstat -gcutil <pid> 5s
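jstat -gcutil prints per-space utilization along with cumulative GC counts and times (YGC, YGCT, FGC, FGCT, GCT). Below is a rough in-process approximation built on GarbageCollectorMXBean and MemoryMXBean. It is a sketch, not a replacement for jstat: it reports overall heap usage and cumulative counts per collector, without jstat's per-generation breakdown. The class name is hypothetical, and collector names depend on the GC in use (G1, for example, reports "G1 Young Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class GcSampler {
    private static final long MB = 1024 * 1024;

    /** Heap usage plus cumulative collection count/time for each collector in this JVM. */
    public static String sample() {
        StringBuilder sb = new StringBuilder();
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the maximum is undefined
        sb.append(String.format("heap used=%dMB committed=%dMB max=%s%n",
                heap.getUsed() / MB, heap.getCommitted() / MB,
                max < 0 ? "undefined" : (max / MB) + "MB"));
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values are cumulative since JVM start (-1 if the collector does not report them).
            sb.append(String.format("%-22s count=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a sample every 5 seconds, in the spirit of "jstat -gcutil <pid> 5s".
        while (true) {
            System.out.print(sample());
            System.out.println();
            Thread.sleep(5_000);
        }
    }
}
```

As with jstat, the deltas between samples matter most. A collection count that climbs quickly while heap usage stays high is the kind of signal to look for.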
The JVM GC
jvisualvm

• Combines most of the above, with GUI
• Remote via X11 forwarding (dreadful!)
So…

SHALL WE DANCE?
Scenario 1

• Phone call in the middle of the night
  – “The application is stuck!”


• What do you do?
Scenario 2


• Looks familiar?
   – “The application is
     crawling to a halt!”
   – “So restart it.”
   – “OK, it’s good now.”


• This is a lie.
   – You will get another
     call.
Scenario 3

• 1st tier support engineer (maybe
  you?) calls:
  – “I get OutOfMemoryErrors on this
    service.”
  – “Restart it.”
  – “Already have. Happened again.”
  – “Well, shit.”
BREAK TIME!
Without further ado…

FORENSIC
TOOLCHAIN
GNU toolchain is your friend


• bash, ps, grep, less, awk
  – ’nuff said


• … or, on Windows:
  – http://gnuwin32.sourceforge.net/
MAT (Eclipse Memory Analyzer)

• Eclipse plugin/standalone
• Reads heap dumps
• Easy drill-down
And most important…
RESOLUTION TIME!
Back to: Scenario 1

• What did we gather?
  –   CPU – 100% single-core utilization
  –   GC metrics – no useful data
  –   Heap dump – no useful data
  –   Thread dump
       • java.util.regex frames × gazillion
• Where the problem is implies…
   what the problem is
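A thread dump full of regex frames often points to catastrophic backtracking. In a backtracking engine like java.util.regex, a pattern such as (a+)+b can take time exponential in the length of an input that almost matches. The real fix is to rewrite the pattern. A common defensive trick, which is not from the lab, is to bound evaluation time by wrapping the input in a CharSequence that checks a deadline in charAt(). The matcher calls charAt() as it scans. Here is a minimal sketch; the class and exception names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexWithDeadline {
    /** Thrown when regex evaluation runs past its deadline (hypothetical exception type). */
    public static final class RegexTimeoutException extends RuntimeException {
        public RegexTimeoutException() {
            super("regex evaluation exceeded its deadline");
        }
    }

    /** CharSequence wrapper that aborts matching once System.nanoTime() passes the deadline. */
    public static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;

        public DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // The matcher reads characters through charAt(), including while backtracking,
            // so a runaway match keeps hitting this check.
            if (System.nanoTime() - deadlineNanos > 0) {
                throw new RegexTimeoutException();
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Like pattern.matcher(input).matches(), but throws RegexTimeoutException after timeoutMillis. */
    public static boolean matchesWithin(Pattern pattern, CharSequence input, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        Matcher matcher = pattern.matcher(new DeadlineCharSequence(input, deadline));
        return matcher.matches();
    }
}
```

Checking the clock on every character read adds overhead. Reserve this approach for patterns or inputs you do not control.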
Back to: Scenario 2

• What did we gather?
  –   CPU – 100% single-core utilization
  –   Heap dump – no useful data
  –   Thread dump
  –   GC metrics
       • Frequent, long GCs (GC, FGC, FGCT)
• Rapid HashMap insertions: recipe for
  disaster
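The deck does not spell out the fix. Two common mitigations are offered here as assumptions, not as the lab's solution. First, if many threads insert concurrently, use ConcurrentHashMap, because concurrent writes to a plain HashMap are unsafe. Second, if the final size is roughly known, presize the map so it never rehashes. Each resize allocates a new, larger table and leaves the old one as garbage. Here is a minimal presizing sketch with hypothetical names; JDK 19+ offers HashMap.newHashMap(int) for the same purpose.

```java
import java.util.HashMap;
import java.util.Map;

public class HashMapSizing {
    private static final float DEFAULT_LOAD_FACTOR = 0.75f; // HashMap's documented default

    /** Initial capacity large enough that expectedSize entries never trigger a resize. */
    public static int capacityFor(int expectedSize) {
        if (expectedSize < 0) {
            throw new IllegalArgumentException("expectedSize < 0");
        }
        // HashMap resizes once its size exceeds capacity * loadFactor, so divide by the load factor.
        return (int) Math.ceil(expectedSize / (double) DEFAULT_LOAD_FACTOR);
    }

    /** A HashMap that can hold expectedSize entries without rehashing. */
    public static <K, V> Map<K, V> newPresizedMap(int expectedSize) {
        return new HashMap<>(capacityFor(expectedSize));
    }
}
```

Presizing only removes rehashing churn. If the map grows without bound, the fix is to bound it or evict entries.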
Back to: Scenario 3

• What did we gather?
  –   CPU – low utilization
  –   Thread dump – no useful data
  –   GC metrics – high heap utilization, low GC
  –   Heap dump
       • Predictably high number of strings
       • Strings are abnormally large
       • Strings contain entire HTML subset!
• Substring/regex can be dangerous!
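One likely culprit, given the JVMs of the time: before Java 7u6, String.substring (and therefore Matcher.group) returned a String that shared the original's char[]. A short extract could therefore keep an entire source page alive. Below is a minimal sketch of the defensive idiom, which copies the extract with new String(...). The title-extraction pattern and the class name are hypothetical examples.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DetachedExtract {
    // Hypothetical example pattern: pull the <title> out of an HTML page.
    private static final Pattern TITLE =
            Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the page title as a standalone String, or null if there is none. */
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return null;
        }
        // On JVMs before 7u6, group()/substring() shared the whole page's char[];
        // new String(...) forced a compact copy so the page could be collected.
        // On later JVMs substring already copies, and this is merely redundant.
        return new String(m.group(1));
    }
}
```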
Headache? Take two of these!

AFTERWORD
Adieu
• Thank you for attending!

• Presentation and demos:
            http://git.io/7LK4fw

• Tomer Gabel
  – tomer@tomergabel.com
  – http://www.tomergabel.com/
  – @tomerg
Thank you to our sponsors

Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Buckeye Dreamin' 2023: De-fogging Debug Logs
Buckeye Dreamin' 2023: De-fogging Debug LogsBuckeye Dreamin' 2023: De-fogging Debug Logs
Buckeye Dreamin' 2023: De-fogging Debug Logs
Lynda Kane
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Image processinglab image processing image processing
Image processinglab image processing  image processingImage processinglab image processing  image processing
Image processinglab image processing image processing
RaghadHany
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 

Lab: JVM Production Debugging 101

  • 1. Java Production Debugging 101 – A Reversim Summit Lab, February 2013
  • 3. Business Requirements (Prod. Debugging vs. Forensics) • Timeframe: Severely limited vs. Hours, days, weeks… • Chain of Custody: Meaningless vs. Sacred • Documentation: Useful vs. Sacred
  • 4. Endgame • Production Debugging: 1. Gather evidence 2. Restore functionality • Forensics: 1. Identify crime in progress 2. Gather evidence • Both: 3. Figure out what happened
  • 5. Our Forensic Process Gather Evidence Restore Production Analyze Findings Implement Solution Post-Mortem
  • 7. WHAT SHALL WE COLLECT?
  • 8. Our focus points for today • Thread dump • Heap dump • VM (especially GC) metrics • System metrics • Logs
  • 9. jstack • Minimalistic tool • Against a running process: jstack <pid> • Outputs to stdout • Identifies deadlocks
  • 10. jmap • Heap dump from a running process – Lengthy process – Freezes VM • Some extras • Command: jmap -dump:format=b,file=<output> <pid>
  • 11. jstat • JVM metrics: classloader, JIT, GC • Tracking over time • Console-based • jstat -gcutil <pid> 5s
  • 13. jvisualvm • Combines most of the above, with GUI • Remote via X11 forwarding (dreadful!)
  • 15. Scenario 1 • Phone call in the middle of the night – “The application is stuck!” • What do you do?
  • 16. Scenario 2 • Looks familiar? – “The application is crawling to a halt!” – “So restart it.” – “OK, it’s good now.” • This is a lie. – You will get another call.
  • 17. Scenario 3 • 1st-tier support engineer (maybe you?) calls: – “I get OutOfMemoryErrors on this service.” – “Restart it.” – “Already have. Happened again.” – “Well, shit.”
  • 20. GNU toolchain is your friend • bash, ps, grep, less, awk – ’nuff said • … or: – http://gnuwin32.sourceforge.net/
  • 21. MAT • Eclipse plugin/standalone • Reads heap dumps • Easy drill-down
  • 24. Back to: Scenario 1 • What did we gather? – CPU – 100% single-core utilization – GC metrics – no useful data – Heap dump – no useful data – Thread dump • java.util.regex.* × gazillion • Where the problem is implies… what the problem is
  • 25. Back to: Scenario 2 • What did we gather? – CPU – 100% single-core utilization – Heap dump – no useful data – Thread dump – GC metrics • Frequent, long GCs (GC, FGC, FGCT) • Rapid HashMap insertions: recipe for disaster
  • 26. Back to: Scenario 3 • What did we gather? – CPU – low utilization – Thread dump – no useful data – GC metrics – high heap utilization, low GC – Heap dump • Predictably high number of strings • Strings are abnormally large • Strings contain entire HTML subset! • Substring/regex can be dangerous!
  • 27. Headache? Take two of these! AFTERWORD
  • 28. Adieu • Thank you for attending! • Presentation and demos: http://git.io/7LK4fw • Tomer Gabel – http://www.tomergabel.com/ – @tomerg
  • 29. Thank you our sponsors
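Slide 9 runs jstack from a shell. If that isn't possible, you can collect roughly the same evidence from inside the JVM through the standard java.lang.management API. The sketch below is a minimal illustration and is not part of the original lab. The class ThreadEvidence and its methods are hypothetical names, and the sketch assumes you can add code to the target process. ThreadMXBean.findDeadlockedThreads() covers deadlocks on both object monitors and java.util.concurrent locks, much like the deadlock report jstack prints. main deliberately deadlocks two daemon threads so there is something to find.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper: an in-process approximation of `jstack <pid>`.
public class ThreadEvidence {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Full stack of every live thread, with its state. */
    public static String dumpThreads() {
        StringBuilder sb = new StringBuilder();
        // (true, true): also collect locked monitors and ownable synchronizers
        for (ThreadInfo info : THREADS.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    /** Names of deadlocked threads; empty if there are none. */
    public static List<String> deadlockedThreadNames() {
        List<String> names = new ArrayList<>();
        long[] ids = THREADS.findDeadlockedThreads(); // null when nothing is deadlocked
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : THREADS.getThreadInfo(ids)) {
            if (info != null) { // the thread may have died in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    // Daemon thread that locks `first`, waits for its peer, then wants `second`.
    private static void startLocker(String name, Object first, Object second, CountDownLatch latch) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                latch.countDown();
                try {
                    latch.await(); // both threads now hold their first lock
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                synchronized (second) {
                    // never reached: the peer holds `second` and waits for `first`
                }
            }
        }, name);
        t.setDaemon(true); // daemon, so the JVM can still exit
        t.start();
    }

    public static void main(String[] args) {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch latch = new CountDownLatch(2);
        startLocker("locker-1", lockA, lockB, latch);
        startLocker("locker-2", lockB, lockA, latch);

        // Poll for up to ~5 s: detection needs both threads to be BLOCKED.
        List<String> stuck = deadlockedThreadNames();
        for (int i = 0; i < 50 && stuck.size() < 2; i++) {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            stuck = deadlockedThreadNames();
        }
        System.out.println("Deadlocked threads: " + stuck);
    }
}
```

ThreadInfo.toString() truncates each stack to eight frames, so dumpThreads() walks getStackTrace() itself to keep the whole stack.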

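Slide 11's `jstat -gcutil <pid> 5s` samples pool utilization and GC counts and times at a fixed interval. You can build a rough in-process analogue on the platform MemoryPoolMXBean and GarbageCollectorMXBean beans. This is a sketch under stated assumptions. GcSampler is a hypothetical name. Pool and collector names, such as the G1 ones, vary with the garbage collector in use. Like jstat, the sketch reports each heap pool's usage as a percentage of its current (committed) capacity.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.Locale;

// Hypothetical helper: a rough in-process analogue of `jstat -gcutil <pid> 5s`.
public class GcSampler {

    /** One line: heap pool utilization (% of committed) plus per-collector counts and times. */
    public static String sample() {
        StringBuilder sb = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() != MemoryType.HEAP || usage == null) {
                continue;
            }
            double pct = usage.getCommitted() > 0
                    ? 100.0 * usage.getUsed() / usage.getCommitted()
                    : 0.0;
            sb.append(String.format(Locale.ROOT, "%s=%.1f%% ", pool.getName(), pct));
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collectors are listed separately (typically a young and an old one),
            // loosely comparable to jstat's YGC/YGCT and FGC/FGCT columns
            sb.append(String.format(Locale.ROOT, "[%s count=%d time=%dms] ",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString().trim();
    }

    /** Accumulated GC time across collectors: roughly jstat's GCT, but in milliseconds. */
    public static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector doesn't report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        for (int i = 0; i < samples; i++) {
            System.out.println(sample());
            Thread.sleep(5_000); // same 5 s interval as the slide's jstat example
        }
    }
}
```

Watch for a steadily climbing old-generation percentage alongside an accelerating collection count. That combination is the GC-storm pattern described for Scenario 2.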
Editor's Notes

  • #3: Picture source: CSI Las Vegas (http://flowtv.org/wp-content/uploads/2007/11/csi3.jpg)
  • #7: Image source: http://www.about-larnaca.info/2012/06/thief-is-caught-red-handed-in-kiti.html
  • #8: Invite discussion. Ask the audience to point out different data that is (a) useful and (b) readily accessible. Limit to 3 minutes. Image source: http://lets-rap.com/wp-content/uploads/2011/05/house-md-d-house-md-1048019_1152_864.jpg (copyright Fox)
  • #9: Expound a bit on anything that hasn’t been raised in the earlier discussion. Limit to 2 minutes, less if possible.
  • #14: “All this and more” sales pitch. Mention the profiler.
  • #16: Actual scenario details: pathological regular expression in a service (http://swtch.com/~rsc/regexp/regexp1.html). Exhibited behavior: very high single-core CPU utilization, little or no GC activity, and a possible StackOverflowError if left running long enough. Analysis: the stack trace will exhibit a very deep, repetitive call stack into java.util.* classes; package and class names will indicate a regex issue. Bonus points to whoever recognizes the pathological regex scenario without looking at the code, beyond the “problematic Regex” axiom. Workaround: depends on the details; in some cases the offending input can be deleted or routed to a dead letter queue. In this case there is none: this has to be resolved at the algorithmic level. Image source: http://blog.rogersbroadcasting.com/billhart/2010/01/19/tuesday-january-19th-2010-dialing-miscue-avatar-can-kill-you-look-where-kellie-flew/
  • #17: Actual scenario details: GC storm as a result of an exponentially growing HashSet. Rapidly adding small elements to a java.util.HashSet (or HashMap) without specifying an appropriate initial capacity is a recipe for disaster. Exhibited behavior: ~100% single-core CPU utilization. Very high GC activity: the eden space fills up rapidly and overflows into the old generation, with full GCs of increasing frequency. Eventual OutOfMemoryError. Analysis: jstat -gcutil or the graphs in VisualVM clearly show high GC pressure and an inability to free enough memory on each GC cycle. The stack trace is hit-or-miss: it may show a thread clearly working in HashMap.resize, but it is just as likely to point at the code generating the strings added to the HashSet. The heap dump will likely show a HashSet with a very high load factor and a correspondingly high item count. Bonus points to whoever recognizes the exponential-expansion scenario without looking at the code. Workaround: restart the service, and possibly set up a cron task to restart it periodically. VM flags that watch for OOM situations are useless here, because the OOME fires far too late, if at all. Image source: http://pathogenomics.bham.ac.uk/blog/wp-content/uploads/You-Cant-Handle-the-Truth.jpg (A Few Good Men)
  • #18: Actual scenario details: memory leak as a result of saving substrings from web calls. A substring actually references the original string's character array. In practice, the full HTML returned by each website is kept in live memory until a large-enough allocation fires an OOME. GC isn't immediately overwhelmed because the heap mostly holds large, long-lived objects. Once memory is compacted (a single full GC cycle), there isn't much more the GC can do about the memory pressure. Exhibited behavior: the service process is dead, with the OOME clearly marked in the logs. Analysis: set the logs aside, then restart the application to collect more information. As a matter of course, save jstat -gcutil output and a pre-failure thread dump, although neither is useful for analyzing this scenario. To resolve this problem you must either save pre-failure heap dumps or turn on the -XX:+HeapDumpOnOutOfMemoryError JVM option. Bonus points for legitimate scenario suggestions at this point, but cut them off very quickly as a good lesson in “not jumping ahead of our data.” Workaround: restart the service, and possibly set up a cron task to restart it periodically. VM flags that watch for OOM situations can help here.
  • #19: 5-10 minutes. Image source: http://www.penny-arcade.com/comic/2002/02/22
  • #20: Image source: http://www.forensicinnovations.com/blog/wp-includes/images/mobilekit.jpg
  • #23: Image source: http://etc.usf.edu/clipart/28300/28384/brain_28384.htm (H. Newell Martin, The Human Body, New York: Henry Holt and Company, 1917, p. 145)
  • #24: Image source: http://www.ign.com/articles/2008/03/28/guitar-hero-aerosmith-first-look?page=2 (Guitar Hero: Aerosmith @ IGN)
  • #25: Follow through on the evidence in the form of a discussion (Who used…? What did you find?), and don't discount useful alternative theories! Forensic analysis: read through the stack trace and identify the likely culprit. Bonus points to whoever yells “pathological regex” first.
  • #26: The heap dump actually has useful data (HashMap capacity, size and load factor), but it's not likely anyone will notice that. Forensics: jstat -gcutil clearly shows a GC storm. Use the stack trace to figure out the where, and common sense to figure out the why.
  • #27: Forensics: analyzing the heap dump will clearly reveal an abundance of suspiciously large strings. Drilling into a couple of these shows that the strings kept in the heap are in fact full-blown HTML responses. Bonus points to whoever recognizes the substring scenario without looking at the code (nBA employees don't count). Evidently, this behavior was changed in Java 7u6: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4513622
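Note #18 depends on having a heap dump taken before, or at the moment of, the failure. On HotSpot JVMs you can also drive both options from code through com.sun.management.HotSpotDiagnosticMXBean. The following is a minimal sketch that assumes a HotSpot-based JDK. HeapEvidence and its methods are hypothetical names. HeapDumpOnOutOfMemoryError is a manageable flag, so you can switch it on in a running VM. Calling dumpHeap(file, true) behaves much like `jmap -dump:live,format=b,file=<output> <pid>`. The target file must not already exist, and recent JDKs reject file names that don't end in .hprof.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: in-process heap-dump controls (HotSpot JVMs only).
public class HeapEvidence {
    private static final HotSpotDiagnosticMXBean HOTSPOT =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    /** Sets the same flag as starting the JVM with -XX:+HeapDumpOnOutOfMemoryError. */
    public static void armHeapDumpOnOom() {
        HOTSPOT.setVMOption("HeapDumpOnOutOfMemoryError", "true");
    }

    public static boolean isHeapDumpOnOomArmed() {
        return Boolean.parseBoolean(HOTSPOT.getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }

    /**
     * On-demand snapshot, much like `jmap -dump:live,format=b,file=<output> <pid>`.
     * live=true dumps only reachable objects (in practice this forces a full GC first).
     */
    public static Path dumpHeap(Path file, boolean live) {
        try {
            HOTSPOT.dumpHeap(file.toAbsolutePath().toString(), live);
        } catch (IOException e) {
            throw new UncheckedIOException("heap dump failed: " + file, e);
        }
        return file;
    }

    public static void main(String[] args) {
        armHeapDumpOnOom();
        System.out.println("HeapDumpOnOutOfMemoryError armed: " + isHeapDumpOnOomArmed());
        // The file must not exist yet and should end in .hprof
        Path out = Paths.get(System.getProperty("java.io.tmpdir"),
                "evidence-" + System.currentTimeMillis() + ".hprof");
        dumpHeap(out, true);
        System.out.println("Heap dump written to " + out + " (" + out.toFile().length() + " bytes)");
    }
}
```

In Scenario 3, loading such a dump into MAT (slide 21) exposes the oversized HTML strings.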