Case Study Nagios @ Nu Skin
  Jeff Sly, Principal IT Architect
  jsly@nuskin.com
Who is in the Audience?
  How many of you are:
  • Suppliers of Nagios or some value add-on for Nagios?
  • Customers using Nagios?
  • Just implementing Nagios or expanding implementation?
  • Using NagiosXI?
Who is Nu Skin?
Our Technology Footprint
  • Ecommerce – Home grown Applications – Java, EJB, ABAP, .Net
  • Databases – Oracle, MySQL, MSSQL
  • OS – HP-UX, Red Hat, Windows, VMware
  • ERP – SAP Supply Chain, CRM, FI
  • Datacenters – 6 locations in 6 countries
  • Offices – 50 countries
Monitoring Goals
  • Monitoring presents operations with a completely integrated global view.
  • Good monitoring is proactive; it helps teams prevent problems from becoming outages.
  • Good monitoring helps minimize outage downtime, quickly identify the root cause, and contact the correct people.
Centralized Monitoring System
Our Monitoring History
  We tried for 10 years…
Do it all in ‘One Tool Projects’
  One Monitoring Tool to rule them all:
  • Mercury SiteScope
  • Remedy Help Desk
  • HP OpenView
  • Quest Foglight
  • Home grown (several)
  One monitoring person
  • He decided to quit!
Could never get everything
  All failed – we always gave up! Why?
  • Servers and agents that were proprietary
  • Huge footprint, inefficient performance
  • Steep learning curve
  • Very expensive
  • Updates costly and very time consuming
  • System administrators like their own scripts, where they can see what they are doing
Resulting Monitoring Issues
  • Tried to make Operations the clearing house for all warnings and alerts from 10+ tools
  • Operations was overwhelmed
  • Took 4 process steps and lots of software to notify of critical failures
  • Most administrators set up their own private monitoring to receive warnings
  • Many false notifications
  • Late notifications
As Is (start of project)
  Our business customers were unhappy
Old Monitoring Work Flow
  Four steps to notify the system administrator
Step 1: Everything Emails Operations
  [Diagram labels: Email, HelpDesk, Error, Network, HP NNM, System, Scripts, Nagios, Database, Foglight, SiteScope 8, BAC, SiteScope 6]
Step 2: Operations Opens Email
  [Same diagram as Step 1]
Step 3: Operations Checks Source
  [Same diagram as Step 1]
Step 4: Operations Calls Admin
  [Same diagram as Step 1]
Inventory of Existing Checks
  • Regular Expression found on Web Page Monitoring
  • HTTP Check - Up or Down
  • Ping Host Up or Down
  • PORT monitoring
  • FTP checking
  • SMTP checking
  • SNMP monitoring - no trap catching yet
  • Radius
  • DNS monitoring
  • Disk Space monitoring
  • CPU and Load Average monitoring
  • Memory Monitoring
Inventory of Existing Checks
  • Service monitoring
  • Transaction monitoring - page load times – performance graph
  • Website click through (Webinject not working)
  • Log File monitor – parse for Errors
  • Java HEAP, Thread, Threadlock monitoring
  • Apache thread and worker count monitors
  • Ecommerce shop monitors
  • Email can send and receive
  • SQL query ODBC (catalog ODBC had bugs)
To Be
  Happy Customers
Key Ideas
  • MoM
  • Tool Requirements
  • Shared Ownership
  • Lowest Level
  • Nagios Monitor Method
Idea 1: MoM
  Our first “breakthrough” was the idea that even though we needed a centralized view for all monitoring, that did not mean all monitoring had to be done by one monitoring tool. We had to pick a “Manager of the Monitors” (MoM) to bring together the best-of-breed monitoring.
MoM - according to Gartner
Idea 2: Tool Requirements
  • Open – not proprietary and closed
  • Mainstream – wanted good native support and strong community
  • Interface – to 3rd Party Monitoring
  • Flexible – adapt to many types of monitoring
  • Efficient – minimal footprint on production servers, not chatty on network
  • Notification – granular control
  • Reliable – good clean architecture
  • Usability – GUI interface, reporting
Idea 3: Shared Ownership
  • Core team
      • Operation of Monitoring Environment: backups, upgrades, & custom plug-ins
      • Monitoring Experts
      • Training
  • Monitoring leads in Development & Admin teams:
      • Set up own monitors
      • Keep own monitors current
      • Adjust monitors
      • If something is not monitored, it is not the core team's fault
Operations Owned Monitoring
  [Diagram labels: Email, HelpDesk, Error, Network, HP NNM, System, Scripts, Nagios, Database, Foglight, SiteScope 8, BAC, SiteScope 6]
Team Leads Own Monitoring
  [Diagram labels: Operations, Network, Asia, System, Scripts, Europe, SAP, Database, Web]
How to Guides
Nagios Conference 2011 - Jeff Sly - Case Study Nagios @ Nu Skin
How to Setup NRPE - HPUX
Idea 4: Lowest Level
  • Handle alerts at the lowest possible level in the organization
  • Only forward alerts if not handled at lower levels before they become critical
Handle events at lowest level
  [Diagram labels: Operations, Network, Asia, System, Scripts, Europe, SAP, Database, Web]
Only forward unhandled alerts
  [Diagram labels: Network, Asia, System, Scripts, Europe, SAP, Database, Web]
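The "lowest level" rule can be pictured as a small routing decision: the owning team is always notified first, and Operations only hears about an alert that nobody has picked up. The sketch below is a minimal Python illustration of that idea, not the Nu Skin implementation. The 15-minute threshold, the `Alert` fields and the `notify_targets` helper are all hypothetical. In Nagios itself this behaviour would normally be configured through notification escalations rather than written as code.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: how long the owning team gets before Operations is told.
FORWARD_AFTER_MINUTES = 15


@dataclass
class Alert:
    owner_team: str     # team that owns the monitor, e.g. "Database"
    minutes_open: int   # minutes since the alert was raised
    acknowledged: bool  # True once the owning team has picked it up


def notify_targets(alert: Alert) -> List[str]:
    """Return who should be notified for this alert right now."""
    targets = [alert.owner_team]  # the lowest level always hears first
    # Forward upward only if the alert is still unhandled after the grace period.
    if not alert.acknowledged and alert.minutes_open >= FORWARD_AFTER_MINUTES:
        targets.append("Operations")
    return targets
```

With these assumptions, a five-minute-old database alert goes only to the Database team, while the same alert left unacknowledged for twenty minutes also reaches Operations.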
Idea 5: Nagios Monitor Method
  Choose the Nagios Monitoring Method
  • Active Check from Nagios Server (normal)
  • Active Check performed by remote client
      • NRPE, NSClient
  • Passive Check – Listen to 3rd party monitors
      • NSCA
Active Local Check
  [Diagram labels: Web, HTTP or Ping, Nagios, DB, Monitor, Unix, DB, Win]
Active Remote Check - UX
  [Diagram labels: Web, Nagios, CPU, RAM (NRPE), DB, Monitor, Unix, DB, Win]
Active Remote Check - Win
  [Diagram labels: Web, Nagios, DB, Monitor, CPU, RAM (NSClient), Unix, DB, Win]
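Whether an active check runs on the Nagios server or on a remote host through NRPE, the plugin contract is the same: print one line of status text (optionally followed by `|` and performance data) and return an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The sketch below shows that contract for a load-average check. It is an illustrative example, not one of the plugins from the talk; the function names and the 4.0/8.0 thresholds are assumptions.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_load(load1: float, warn: float, crit: float):
    """Map a 1-minute load average to (exit_code, one-line plugin output)."""
    if load1 >= crit:
        code = CRITICAL
    elif load1 >= warn:
        code = WARNING
    else:
        code = OK
    # Text before '|' is for people; text after it is performance data
    # in the form label=value;warn;crit.
    output = f"LOAD {LABELS[code]} - load1={load1:.2f} | load1={load1:.2f};{warn};{crit}"
    return code, output


def main() -> int:
    """Read the local load average and report it in plugin format."""
    try:
        load1 = os.getloadavg()[0]  # available on Unix-like systems only
    except (OSError, AttributeError):
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, output = evaluate_load(load1, warn=4.0, crit=8.0)  # assumed thresholds
    print(output)
    return code
```

A real plugin script would finish by passing the return value of `main()` to `sys.exit`, so that Nagios or the NRPE daemon sees the status as the process exit code.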
Passive 3rd Party Alert
  [Diagram labels: Web, Nagios, DB, Monitor, 3rd Party Alert NSCA, Unix, DB, Win, 3rd Party Check DB]
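For a passive check, the third-party monitor pushes its result to Nagios instead of being polled. The `send_nsca` client reads one record per line from standard input, with the host name, service description, return code and output text separated by tabs. The sketch below only builds such a record. The host and service names in the usage note are hypothetical, and how a given third-party tool would call this is left open.

```python
def nsca_line(host: str, service: str, return_code: int, output: str) -> str:
    """Build one tab-separated passive service check record for send_nsca."""
    if return_code not in (0, 1, 2, 3):
        raise ValueError("return code must be 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN)")
    # Tabs or newlines inside the message would break the record layout.
    clean = output.replace("\t", " ").replace("\n", " ")
    return f"{host}\t{service}\t{return_code}\t{clean}\n"
```

For example, `nsca_line("db01", "Foglight Alert", 2, "Tablespace USERS 95% full")` produces a CRITICAL record that could be piped to `send_nsca`. The host name and service description must match a service defined in Nagios with passive checks enabled.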
Bonus Idea - Tune
  • Tune the database
  • Add RAM drive
Tune the Database
  Modify contents of the /etc/my.cnf [mysqld] section.
    tmp_table_size=524288000
    max_heap_table_size=524288000
    table_cache=768
    set-variable=max_connections=100
    wait_timeout=7800
    query_cache_size = 12582912
    query_cache_limit=80000
    thread_cache_size = 4
    join_buffer_size = 128K
  http://web3us.com
  Info on: MySQL Tuning, Nagios Tuning
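Most of the my.cnf values above are plain byte counts, which are hard to read at a glance. A quick conversion (a throwaway sketch, with the setting names and values copied from the slide) shows what was actually configured:

```python
def to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# Values copied from the [mysqld] settings on the slide.
settings = {
    "tmp_table_size": 524288000,
    "max_heap_table_size": 524288000,
    "query_cache_size": 12582912,
}

for name, value in settings.items():
    print(f"{name} = {value} bytes = {to_mib(value):g} MiB")
```

So the two temporary-table limits are 500 MiB each and the query cache is 12 MiB. MySQL limits in-memory temporary tables to the smaller of `tmp_table_size` and `max_heap_table_size`, which is presumably why the two are set to the same value.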
RAM Drive
  Create a RAM disk for Nagios temporary files
  I created a ramdisk by adding the following entry to the /etc/fstab file:
    none    /mnt/ram    tmpfs    size=500M    0 0
  Mount the disk using the following commands
    # mkdir -p /mnt/ram; mount /mnt/ram
  Verify the disk was mounted and created
    # df -k
  Modify the /usr/local/nagios/etc/nagios.cfg file with the following tuned parameters
    temp_file=/mnt/ram/nagios.tmp
    temp_path=/mnt/ram
    status_file=/mnt/ram/status.dat
    precached_object_file=/mnt/ram/objects.precache
    object_cache_file=/mnt/ram/objects.cache
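After editing nagios.cfg it is easy to miss one of the five paths. The sketch below is a small, hypothetical checker (not part of Nagios or of the original talk) that parses the `key=value` lines of a nagios.cfg text and reports which of the settings named on the slide do not point at the RAM disk. The `/mnt/ram` default comes from the slide; the function names are made up for illustration.

```python
from pathlib import PurePosixPath

# The nagios.cfg settings the slide moves onto the RAM disk.
RAM_KEYS = (
    "temp_file",
    "temp_path",
    "status_file",
    "precached_object_file",
    "object_cache_file",
)


def parse_cfg(text: str) -> dict:
    """Parse key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def not_on_ramdisk(settings: dict, mount: str = "/mnt/ram") -> list:
    """Return the RAM_KEYS that are missing or point outside the mount."""
    mount_path = PurePosixPath(mount)
    problems = []
    for key in RAM_KEYS:
        value = settings.get(key)
        if value is None:
            problems.append(key)
            continue
        path = PurePosixPath(value)
        if path != mount_path and mount_path not in path.parents:
            problems.append(key)
    return problems
```

In use, one would read `/usr/local/nagios/etc/nagios.cfg`, pass its text to `parse_cfg`, and expect `not_on_ramdisk` to return an empty list once all five settings have been moved.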
Implementation Methodology
  • Site Survey
  • Inventory existing monitors
  • Proof of concept
  • Build new environment
  • Migrate monitors from each platform to Nagios, one at a time
  • Integrate OEM, and to send monitors to Nagios
Three Project Phases
  • Deliver something useful in each phase
  • Build a level at a time
Phase I
  • Set up a pilot of Nagios XI using Trial License.
  • Set up Foglight monitoring of JVM (Java Virtual Machine).
  • Purchase NagiosXI and Consulting Support
  • Bring in a consultant for two weeks to help set up the architecture and help us work with the system.
  • Documentation Web Site for Nagios learnings and “How to guides”
  • Define a set of standards and guidelines to follow to help aid an effective monitoring process.
  • Backups running on Production Nagios Server
  • Set up services which aren't being caught right now and move a few of the important services over to the new Nagios XI monitoring system.
  • Test Nagios plugins and server performance
Phase II
  • Migrate off of SiteScope 6 and shut down
  • Migrate off of SiteScope 8 and shut down
  • Decommission Foglight
  • Clean up the old monitoring server
  • Migrate the network team from old Nagios to core NagiosXI system
  • Set up standby NagiosXI system, cron to replicate weekly
  • Research missing alerts and add them to the new NagiosXI system
Phase III
  Implement Global Monitoring
  • Add monitors for existing international systems
  • Add monitors using JMX to monitor Java servers
  • Nagios Remote Plugin Executor (NRPE) to monitor remotely
  • Remote Monitoring for Windows Servers (NSClient++)
  • Implement notification and escalation of alerts
  • Add monitors for critical business functions
Phase III continued…
  Corporate Enhancements
  • Request recurring downtime enhancement from Ethan Galstad
  • Automate refresh of NagiosXI standby system
  • Build Network Map
  • Retire Windows SiteScope
  • Add monitors for phone systems
  • Add monitors to data center (UPS, Temperature, Humidity)
  • Integrate to SAP Tidal monitoring tool
Phase III continued…
  Business
  • Business review and approve SLA (using business terms)
  • Monitor both the Business Functions and the individual point devices that provide the Business Function
  • Follow the Sun with Eyes on Glass.
  • Training
      • How to set up alerts
      • How to receive alerts
      • How to report on performance graphs
  • Create a new Dashboard for HelpDesk and International IT Staff
Inventory of Monitor Checks
Inventory continued…
Nagios XI Interface
Data Centers in 7 Countries
IT Operations
IT Team Managers
Summary
  • MoM ~ Manager of Managers
  • Allow specialized tools
  • Tool Requirements, enough but not all
  • Ownership for implementation, shared
  • Handle alerts, lowest level in organization
  • Choose Nagios monitoring method
Tips, Tricks & Demos
  Nagios XI Large Implementation, Day 3, 2:00, Track 3 (Nate Broderick)
  • 3 Demos
  • Performance challenges and solutions
  • Integrating monitoring solutions Oracle
  • Migrating from BAC & Foglight
  • Customization
  • Graphing, and more.



Editor's Notes

  • #2: Isn't it great to be here at the 1st Nagios World Conference? Ethan has done a great job with Nagios!
  • #3: I would like to introduce some of our Nu Skin folks here who work on Nagios: Nate Broderick, Nagios Systems Engineer, and Scott McWhorter, Production Support.
  • #4: Direct Selling - Lotions and Potions (or supplements), also Nourish the children
  • #8: We tried for 10 years…
  • #14: Everything = Critical Errors, Warnings, Information, False alerts
  • #15: Remember there are lots of emails, critical emails buried in with the rest.
  • #16: Decide if they think this is a critical problem; often it is a junior person trying to decide.
  • #19: Later I show which of these we did in Nagios and how.