#CCCEU14
TROUBLESHOOTING APACHE CLOUDSTACK
JORIS VAN LIESHOUT 
Working @ Schuberg Philis since 2010 
Mission Critical 
+3 year IaaS team 
Part of the initial CS vs OS 
Started with CloudStack 2.2.14
AGENDA
Reading the logs
Understanding System VMs
Use the source, Luke!
DB API, hands off?
Employee Cloud
Questions
Side notes
• Have you worked with ACS?
• CloudStack 4.4.1
• XenServer 6.2.0 SP1
• Default log settings
READING THE LOGS
Less is more <=>
• less /var/log/cloudstack/management/management-server.log
Keep management server ids at hand
• select * from cloud.mshost where removed is null;
Stack traces
• Look at the first instead of the last
Search for API key
Log4j 1.2 EnhancedPatternLayout
• /etc/cloudstack/management/log4j-cloud.xml
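The advice to look at the first stack trace rather than the last, and to search for an API key, can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the frame pattern assumes ordinary Java stack frames (`at package.Class.method(File.java:NN)`), not anything specific to CloudStack.

```python
import re
from typing import Iterable, List, Optional, Tuple

# A Java stack frame line, e.g. "\tat com.cloud.Foo.bar(Foo.java:42)".
FRAME = re.compile(r"^\s+at\s+[\w.$<>]+\(")


def first_stack_trace(lines: Iterable[str]) -> Optional[Tuple[int, str]]:
    """Return (line number, text) of the line just before the first stack frame."""
    previous = None  # last line seen that was not a stack frame
    for number, line in enumerate(lines, start=1):
        if FRAME.match(line):
            return previous
        previous = (number, line.rstrip("\n"))
    return None  # no stack frame found


def lines_mentioning(lines: Iterable[str], needle: str) -> List[Tuple[int, str]]:
    """Return every (line number, text) containing needle, e.g. an API key."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]
```

Both functions accept any iterable of lines, so an open handle on management-server.log can be passed directly; they return line numbers so the hit can then be opened in `less`.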
READING THE LOGS 
(Date, Time), Log priority 
Class 
• Will match a .java file in the source
READING THE LOGS 
Thread Name 
Thread context 
Optional Job id
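The fields named on the last two slides (date, time, log priority, class, thread name, thread context, optional job id) can be pulled out of a line with a regular expression. The sketch below assumes a layout of the form `DATE TIME PRIORITY [Class] (thread:context) message`; the actual layout depends on the pattern configured in log4j-cloud.xml, and the sample lines in the usage note are made up for illustration.

```python
import re
from typing import Optional

# Assumed layout: DATE TIME PRIORITY [Class] (thread:context) message
LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:,]+) "
    r"(?P<priority>[A-Z]+)\s+\[(?P<cls>[^\]]+)\] "
    r"\((?P<thread>[^:)]+)(?::(?P<context>[^)]*))?\) "
    r"(?P<message>.*)$"
)
# Optional job id inside the thread context, e.g. "job-42".
JOB = re.compile(r"\bjob-(\d+)\b")


def parse_line(line: str) -> Optional[dict]:
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None  # continuation line (e.g. a stack frame) or another layout
    fields = match.groupdict()
    job = JOB.search(fields["context"] or "")
    fields["job_id"] = int(job.group(1)) if job else None
    return fields
```

For a made-up line such as `2014-11-20 10:15:02,456 DEBUG [c.c.a.ApiServer] (API-Job-Executor-3:ctx-aaaa1111 job-42 ctx-bbbb2222) done`, the result carries the thread name `API-Job-Executor-3`, the full context string, and `job_id` 42. Lines that do not match, such as stack frames, yield `None` instead of raising.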
READING THE LOGS 
When forwarding commands 
• Host_id-Sequence_nr 
• MgmtId 
• Can be the other server 
• Via 
• Command
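As a rough sketch of how the forwarding fields could be extracted and tied back to a management server, the Python below parses a message of the assumed shape `Seq <host_id>-<sequence_nr>: Sending { Cmd , MgmtId: ..., via: ..., ... }`. The exact message text is an assumption, and `mgmt_names` is a hypothetical dict that one might fill from the `cloud.mshost` query shown earlier (assuming it exposes the management server id and a name).

```python
import re
from typing import Dict, Optional

# Assumed message shape: "Seq 1-123: Sending  { Cmd , MgmtId: 345, via: 1(name), ..."
SEQ = re.compile(
    r"Seq (?P<host_id>\d+)-(?P<sequence>\d+): (?P<action>\w+)\s+"
    r"\{ (?P<kind>\w+) , MgmtId: (?P<mgmt_id>-?\d+), via: (?P<via>[^,]+),"
)


def describe(message: str, mgmt_names: Dict[int, str]) -> Optional[dict]:
    """Extract host id, sequence number, MgmtId and via from a forwarding message."""
    match = SEQ.search(message)
    if match is None:
        return None
    mgmt_id = int(match["mgmt_id"])
    return {
        "host_id": int(match["host_id"]),
        "sequence": int(match["sequence"]),
        "action": match["action"],
        "kind": match["kind"],
        "mgmt_id": mgmt_id,
        # mgmt_names is hypothetical: management server id -> readable name
        "managed_by": mgmt_names.get(mgmt_id, "unknown"),
        "via": match["via"],
    }
```

Keeping the `host_id` and `sequence` pair together makes it easy to search for the same `Seq` value on the other management server when the MgmtId points there.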
READING THE LOGS 
Find the first call 
• API call 
• API key 
• Name 
Thread name and 
first context 
Async creates a job
READING THE LOGS 
Find the first call 
Thread name and 
first context 
Might create a job 
Picked up by…
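The tracing steps on these slides (find the first call, follow its thread name and first context, then follow any job it creates) might be automated along these lines. This is a sketch under assumptions: the thread field is taken to look like `(thread-name:ctx-xxxx ...)`, job ids like `job-42`, and the function name `trace` is hypothetical.

```python
import re
from typing import List

# "(thread-name:ctx-xxxx" at the start of the parenthesised thread field.
CONVERSATION = re.compile(r"\((?P<thread>[^:)\s]+):(?P<ctx>ctx-\w+)")
JOB = re.compile(r"\bjob-(\d+)\b")


def trace(lines: List[str], needle: str) -> List[str]:
    """Follow a call: first hit, then its conversation, then lines of jobs it mentions."""
    first = next((line for line in lines if needle in line), None)
    if first is None:
        return []
    found = CONVERSATION.search(first)
    if found is None:
        return [first]
    # Thread name plus first context identify the conversation.
    key = f"({found['thread']}:{found['ctx']}"
    jobs = set()
    for line in lines:
        if key in line:
            jobs.update(JOB.findall(line))
    wanted = [re.compile(rf"\bjob-{job}\b") for job in jobs]
    return [
        line
        for line in lines
        if key in line or any(pattern.search(line) for pattern in wanted)
    ]
```

The needle can be an API key, an API call name, or an instance name. Because a job can be picked up by another management server, the same search may need to be repeated on that server's log with the job id.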
READING THE LOGS 
Thread name and 
first context 
Sending… 
• Track sequence id 
• Keep an eye on MgmtID 
Executing…
UNDERSTANDING SYSTEM VMS 
ssh to SVM 
• From hypervisor 
• Port 3922 
• /root/.ssh/id_rsa.cloud 
xensource.log, SMlog and cloud/vmops.log 
XAPI call to vmops plugin
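The ssh details on this slide can be assembled into a command line as follows. This is a minimal sketch that only builds the argument list and does not run anything; the `root` login and the link-local address check are assumptions (the speaker notes mention a control IP in the 169.254.0.0 range), and the function name is hypothetical.

```python
import ipaddress
from typing import List


def system_vm_ssh_command(control_ip: str) -> List[str]:
    """Build the ssh argv for reaching a system VM from its hypervisor."""
    address = ipaddress.ip_address(control_ip)
    if not address.is_link_local:
        # Assumption: the control IP is expected to be link-local (169.254.x.x).
        raise ValueError(f"{control_ip} is not a link-local address")
    return [
        "ssh",
        "-i", "/root/.ssh/id_rsa.cloud",  # key path from the slide
        "-p", "3922",                     # port from the slide
        f"root@{address}",                # 'root' user is an assumption
    ]
```

Running the resulting command on the hypervisor, for example via `subprocess.run(system_vm_ssh_command(ip))`, would open the session; returning the list keeps the sketch free of side effects.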
USE THE SOURCE, LUKE! 
GitHub 
• https://github.com/apache/cloudstack 
Eclipse for Java EE 
• https://cwiki.apache.org/confluence/display/CLOUDSTACK/Using+Eclipse+With+CloudStack 
DevCloud4 
• https://github.com/imduffy15/devcloud4
DB API, HANDS OFF!
Read only!
Unless…
• After code review
• Bug solution also changes db
Why?
• Change state => data mismatch
• Incorrect value => ACS start fails
• DB data is not always leading
• Effects of DB change can stick around
Marvin and CloudMonkey
• And find a way out without DB change
EMPLOYEE CLOUD 
Realistic workload 
Use as UAT environment 
To reproduce bugs and workarounds 
Technology test ground
QUESTIONS?
jvanlieshout@schubergphilis.com
@JorizvL

Troubleshooting Apache CloudStack at #ccceu14 by @jorizvl


Editor's Notes

  • #3: Everyone has seen <screenshot>. Troubleshooting can be daunting. <CLICK> Is it an infra issue, a bug, or something else? Quick poll: Dev, Ops, Other? Today's talk is about: fair share of outages (boot launch failed). From an operational perspective: some bugs only appear in an environment with PRD workload. With this: a better DevOps relation.
  • #4: As MCE (Mission Critical Engineer). CloudStack vs OpenStack: easily won by CS.
  • #5: Bit much text, but understanding will help a lot. Rohit: API call life. How to see that commands are forwarded. Check out the presentation by Sten. Use the source code and Dev tools. Common discussion, my take on it. Best tip I can give you. Q: If time. <CLICK> Notes: Ask: Who has? Based on DevCloud4 (4.4.1 and XS62ESP1). We never had the need to adjust the log4j config.
  • #6: Instead of grep or tail. Or something like LogStash or Splunk. For when a command gets forwarded to the other management server. And host ids (incl. CVM and SSVM). The last stack trace frequently is the result of an earlier fault. Search for instance name, network name, API call or API key. API key: trace what a user did, for instance Citrix Studio. *Browser plugin to see calls: Firefox, FireBug.
  • #7: Breakdown of a log line. Date and time removed in all examples. Debug is the default. Class (abbreviated as of 4.4). Matches a source file on git.
  • #8: Name and number. Can be many. For instance DirectAgent thread ID: unique per cycle. Usually search for this combination. For async calls, a job id to track the work across threads. We'll look at this in a bit.
  • #9: Forwarding commands to other hosts: hosts, CVMs, SSVMs. host_id can be found in the DB: cloud.host. seq is for tracking. MgmtId might be the other server when the host is managed by that server. Via is the same as host_id; as of 4.4 this is expanded to the full name.
  • #10: Finding the call: API call or key, or name of instance, network, template, etc. Between ===START=== and ===END===. Thread name and context can be seen as a conversation. An async call will create a job for async execution.
  • #11: Search for the job id. Again search for thread name and context. Repeat for the next job. Can jump management server. Different thread and context.
  • #12: In a conversation, a task for a host is sent to a thread. Picked up by a thread and executed. Can jump management server. In case of XenServer, results in XAPI calls.
  • #13: Very brief on SVMs: check out the next talk by Sten. This applies to XenServer. Get the control IP from ACS (169.254.0.0). Depending on the call. Calls for SVMs use vmops.
  • #14: Have a look at GitHub: clone essential. Eclipse recommended. DevCloud4 good to have.
  • #15: As a read-only source really valuable, although most info is available using the API. Be really careful. Unless: know how the code responds to a change in the DB; stay in line with the fix in code. State change => example: NIC not removed, ref tables. Incorrect state => example: instance state Expunged, network state Destroyed. Host ping => DB is backup. Instance removed, NIC still there: network will never return to allocated. Both very powerful. Example => host in alert with pingtimeout, cluster unmanage.
  • #16: Load bugs, race conditions. Also hardware load. Not only ACS upgrades, also other components: hypervisors, storage, networking. Tested solution for snapshot Dom0 load bug. Adoption of Cloud tech. For use: Chef, Graphite, new OS, many PoCs. Employee rack now is Employee cloud => lower power consumption.