Service Catalog Runbook
Table of Contents
1. Overview of Product
2. Business Impact
3. Teams affected
4. Severity Definitions
5. Escalations
6. Related/Dependent Applications & documents
7. Scope of Support
8. Outage Notification Contact Plan
9. Architecture of System
10. High Level Technical Design
11. Host lists
12. System Monitoring
13. Troubleshooting
13.1 Troubleshooting Flowchart
13.2 Important Processes and Services
14. Tier-1 Procedure
15. Signoff
The Service Catalog portal provides a holistic view of all applications and licenses held
by different teams. Users can update application attributes and create requests for new
applications. Data is fetched and updated through the Clear-sky APIs. The data is
refreshed from the Clear-sky APIs every 24 hours and stored in the application's
Elasticsearch instance. Requests raised by users to update application attributes or to
assign users to applications are stored in a MySQL instance.
This visibility is expected to save costs by identifying duplicate subscriptions/apps
to retire or to avoid buying.
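Because the catalog data is refreshed only once every 24 hours, it can help to confirm that the last refresh actually ran before investigating a report of missing or outdated data. The following is a minimal Python sketch of such a freshness check. The function name, the one-hour grace period, and the idea that a last-sync timestamp is available are assumptions for illustration; this runbook does not describe how or where the sync time is recorded.

```python
from datetime import datetime, timedelta, timezone

# Refresh interval stated in the overview: data is pulled every 24 hours.
REFRESH_INTERVAL = timedelta(hours=24)


def is_catalog_data_stale(last_sync, now=None, grace=timedelta(hours=1)):
    """Return True if the last sync is older than the refresh interval plus grace.

    `last_sync` and `now` must be timezone-aware datetimes. The one-hour
    grace period is an assumption, not a value taken from this runbook.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_sync > REFRESH_INTERVAL + grace
```

A sync that finished 26 hours ago would be reported as stale, while one that finished exactly 24 hours ago would still be within the grace period.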
Business Impact
If the application is unavailable, users will be unable to view Service details in the
portal, but the details can still be queried at the OpsGps portal. The application mainly
provides an overview of all applications within VMware and their related request workflows,
and it is classified as non-critical to VMware's core business. Impact Category: P3, P4.
2. Teams affected
There will be a small team of service admins that will be unable to raise new
requests or process existing requests related to certain applications.
3. Severity Definitions
Please Note: vC3 IM is the sole owner of the severity decision during an outage. Please refer
to the attached document outlining the Incident Prioritization Matrix.
Incident Prioritization Matrix
This is the Priority from the app's perspective. The final Priority will be decided by the
Incident Manager, as there is no direct business impact (not a mission-critical app).
Severity Definitions
Severity 2 | Issues affecting a certain aspect of the application (e.g., sometimes an event reservation is not made or the email for an activity is not sent out) - Intermittent
Severity 3 | Issues affecting performance or issues completely halting a single person's productivity - Slowness. App is taking very long to process a request.
4. Escalations
6. Scope of Support
* Enter the mail lists to be used to notify end users during an outage or scheduled
maintenance (if any special DL is needed)
[email protected] , [email protected] , [email protected]
Please Note: It is very important to input the information immediately below to ensure that the
monitoring is adequately set up for the system.
12. Monitoring Information Prerequisites
Application Name | ElasticSearch
Host Name | elk-prod-vip.vmware.com
IP Address | 10.113.166.246, 10.113.166.249
Component Details: OS Version (Windows/Linux) | Linux
Component Details: Data Center |
Threshold Values: Alert for CPU Usage for Windows hosts (0-100%) |
Threshold Values: Alert for Load on Linux hosts (0-100%) | 80%
Threshold Values: Alert for Swap usage (0-100%) | 80%
Threshold Values: Alert for Memory usage (0-100%) | 80%
Community String: SNMP |
Community String: WMI |
Monitoring FileSystem |
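The 80% thresholds listed for load, swap, and memory on the Linux hosts can be expressed as a simple check. The Python sketch below is illustrative only: the metric names, the function name, and the choice to alert when a value meets or exceeds the threshold are assumptions, and it does not reflect how the actual monitoring system is configured. The Windows CPU threshold is omitted because no value is given for it.

```python
# Alert thresholds (percent) taken from the monitoring information in this section.
# The CPU threshold for Windows hosts is omitted because no value is listed.
THRESHOLDS = {
    "load": 80.0,
    "swap": 80.0,
    "memory": 80.0,
}


def breached_thresholds(metrics, thresholds=THRESHOLDS):
    """Return a sorted list of metric names whose value meets or exceeds its threshold.

    `metrics` maps a metric name to a percentage (0-100). Metrics without a
    configured threshold are ignored. Treating "equal to the threshold" as a
    breach is an assumption.
    """
    return sorted(
        name
        for name, value in metrics.items()
        if name in thresholds and value >= thresholds[name]
    )
```

For example, `breached_thresholds({"load": 85, "swap": 10, "memory": 80})` reports `load` and `memory`, while a reading for an unlisted metric such as `cpu` is ignored.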
13. Troubleshooting
kubectl --kubeconfig=./{kubeconfig_file} scale deployment service-catalog-prod --replicas=0
kubectl --kubeconfig=./{kubeconfig_file}
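The scale command above can be built programmatically so that the kubeconfig path and replica count are filled in consistently. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, no namespace flag is added because this runbook does not give one, and the replica count to restore after scaling to zero is not stated here, so it is left as a parameter. The sketch only builds the command for review and does not run it.

```python
import shlex


def build_scale_command(kubeconfig_file, replicas, deployment="service-catalog-prod"):
    """Build the kubectl argument list that scales the deployment.

    The deployment name defaults to the one used in this runbook.
    """
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return [
        "kubectl",
        f"--kubeconfig={kubeconfig_file}",
        "scale",
        "deployment",
        deployment,
        f"--replicas={replicas}",
    ]


def command_as_text(args):
    """Render the argument list as a shell-safe string for review or logging."""
    return shlex.join(args)
```

Passing the returned list to `subprocess.run(args, check=True)` would execute it without going through a shell, which avoids quoting problems in the kubeconfig path.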
Dependencies
Components | Nodes | Database (Custom DB) | Mount Point | Disk Space | Solr | Repo | Cloud Foundry | Horizon
Backup schedule
Dates | Times | Systems Impacted
Application Escalation
Escalation Path | Name | Cell Phone | Home Phone | Alternate Number
Primary | Arambh Gaur | +919986496120 | |
Secondary | SK Krithivasan | +1 (650) 4272456 | |
Tertiary | | | |
Vendor | | | |
Infrastructure Escalation:
Escalation Path | Name | Cell Phone | Home Phone | Alternate Number
Primary | it-vc3-portaladmin <it-vc3- | | |
DB Escalation:
Escalation Path | Name | Cell Phone | Home Phone | Alternate Number
Primary | it-elk-[email protected] | | |
Secondary | | | |
Network Escalation:
Escalation Path | Name | Cell Phone | Home Phone | Alternate Number
Primary | | | |
Secondary | | | |
Tertiary | | | |
Primary
Secondary
Tertiary
15. Signoff
Signoff will be an agreement on the support between the Service/Application
Owner and the vC3 IM Manager.