Service Catalog Runbook

This document provides an overview of a service catalog portal and outlines procedures for incident response and escalation. The portal allows users to view all applications and licenses across teams and to create or update requests. Key points:
- The portal integrates with ClearSky APIs, Workspace One, and People Search to fetch and sync application data.
- During an outage, users cannot view application details in the portal but can still query OpsGps. The application is classified as non-critical.
- Contact lists and escalation procedures are defined for each severity level, with primary and secondary on-calls identified.
- Monitoring details for the Kubernetes-hosted service catalog deployment are listed to ensure proper alerting is set up.

Uploaded by

wizardmohan

Runbook

Service Catalog Runbook

Dated: Jan 22nd 2023

vC3 Support Runbook Template Page


Table of Contents

1. Overview of Product
2. Business Impact
3. Teams affected
4. Severity Definitions
5. Escalations
6. Related/Dependent Applications & documents
7. Scope of Support
8. Outage Notification Contact Plan
9. Architecture of System
10. High Level Technical Design
11. Host lists
12. System Monitoring
13. Troubleshooting
    13.1 Troubleshooting Flowchart
    13.2 Important Processes and Services
14. Tier-1 Procedure
15. Signoff

1. Overview of Product

In this section, provide an overview of the entire application/feature and its purpose:

• The Service Catalog portal provides a holistic view of all applications and licenses held by different teams. Users can update application attributes and create requests for new applications. Data is fetched and updated through the ClearSky APIs every 24 hours and stored in the application's Elasticsearch instance. Requests raised by users to update an application's attributes or to assign users to an application are stored in a MySQL instance.
• This is expected to save costs by identifying duplicate subscriptions and applications that can be retired or need not be bought.
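Because the portal's data is only as fresh as the last 24-hour ClearSky sync, a staleness check is a useful first step when users report outdated application details. The shell sketch below flags the Elasticsearch copy as stale once more than 24 hours have passed since the last successful sync. It is a minimal sketch, assuming the last-sync time is available as epoch seconds (this runbook does not say where that timestamp is recorded); the function name sync_status is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: decide whether the Elasticsearch copy of ClearSky data is stale.
# Assumption: the caller supplies the epoch time of the last successful sync;
# where that timestamp is stored is not described in the runbook.

SYNC_INTERVAL_SECONDS=$(( 24 * 60 * 60 ))  # data is refreshed every 24 hours

# sync_status LAST_SYNC_EPOCH [NOW_EPOCH]
# Prints STALE when more than one sync interval has passed, FRESH otherwise.
sync_status() {
  local last_sync=$1
  local now=${2:-$(date +%s)}
  if (( now - last_sync > SYNC_INTERVAL_SECONDS )); then
    echo "STALE"
  else
    echo "FRESH"
  fi
}
```

A STALE result points toward the Severity 2 scenario of a ClearSky API outage causing data sync issues.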

2. Business Impact

If the portal is unavailable, users will be unable to view service details, but the same information can still be queried on the OpsGps portal. The application mainly provides an overview of all applications within VMware and their related request workflows, and it is classified as non-critical to VMware's core business. Impact category: P3, P4.

3. Teams affected

A small team of service admins will be unable to raise new requests or process existing requests related to certain applications.

4. Severity Definitions

Please note: the vC3 IM is the sole owner of severity decisions during an outage. Refer to the attached document outlining the Incident Prioritization Matrix.
Incident Prioritization Matrix

The priorities below are from the application's perspective. The final priority will be decided by the Incident Manager, as there is no direct business impact (this is not a mission-critical app).

Severity Definitions

Severity 1 - Down
Critical issues affecting large numbers of people (e.g. the application is not launching and users are not able to log in).
    Outage to the Elasticsearch instance
    Outage to auth-server
    Outage to the Kubernetes cluster

Severity 2 - Intermittent
Issues affecting a certain aspect of the application (e.g. sometimes an event reservation is not made or the email for an activity is not sent out).
    Outage to ClearSky APIs, which could lead to data sync issues
    Outage to the People Search service

Severity 3 - Slowness
Issues affecting performance, or issues completely halting a single person's productivity.
    App is taking very long to process a request
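The severity table above can be expressed as a simple lookup for use in alert scripts. This is a sketch only: the component keywords (elasticsearch, auth-server, kubernetes, clearsky, people-search, slow-requests) and the function name severity_for are hypothetical, and the vC3 Incident Manager still makes the final priority decision.

```shell
#!/usr/bin/env bash
# Sketch: map a failing dependency to the application-side severity.
# Component keywords are hypothetical labels chosen for this example;
# the vC3 Incident Manager decides the final priority.

# severity_for COMPONENT -> prints 1, 2, 3 or "unknown"
severity_for() {
  case "$1" in
    elasticsearch|auth-server|kubernetes) echo 1 ;;          # Down
    clearsky|people-search)               echo 2 ;;          # Intermittent
    slow-requests)                        echo 3 ;;          # Slowness
    *)                                    echo "unknown" ;;  # leave to vC3 IM
  esac
}
```

Anything not in the table, such as the MySQL request store, returns unknown and is left for the Incident Manager to classify.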

5. Escalations

Please provide the names or groups responsible for escalated issues:

Severity     Primary (on-call)   Secondary        Management
Severity 1   Arambh Gaur         SK Krithivasan   Sreekanth Indireddy
Severity 2   Arambh Gaur         SK Krithivasan   Sreekanth Indireddy
Severity 3   Arambh Gaur         SK Krithivasan   Sreekanth Indireddy

The E., C. and N. contact entries for each severity name the same three people.

Please provide escalation rules here:


• Under what circumstances should vC3 escalate an issue? P1
• During what hours should vC3 escalate issues? IST working hours
• If there is no response from the on-call personnel, how long should vC3 wait before escalating to the next level? 2 hours
• Since the application is not business-critical, an SLA of 30 minutes to 2 hours is acceptable, so primary app owners need not be put on pager duty. Primary contacts can be reached by email or phone once the vC3 team is engaged in response to an alert.
• In case of an app outage, the alternate means of continuing business is to view application details and raise requests on the OpsGps portal, which is the existing procedure.
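The 2-hour wait can be sketched as an escalation ladder that moves from the primary on-call to the secondary and then to management. This assumes the 2-hour wait applies at every level, which the rules above do not state explicitly, and it does not model the IST working-hours window; escalation_contact is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of the escalation ladder: Primary -> Secondary -> Management.
# Assumption: vC3 moves up one level for every 2 hours without a response.

ESCALATION_WAIT_SECONDS=$(( 2 * 60 * 60 ))  # 2 hours without response

# escalation_contact ALERT_EPOCH NOW_EPOCH -> who vC3 should be contacting now
escalation_contact() {
  local elapsed=$(( $2 - $1 ))
  if (( elapsed < 0 )); then
    elapsed=0  # clock skew: treat the alert as just raised
  fi
  case $(( elapsed / ESCALATION_WAIT_SECONDS )) in
    0) echo "Primary: Arambh Gaur" ;;
    1) echo "Secondary: SK Krithivasan" ;;
    *) echo "Management: Sreekanth Indireddy" ;;
  esac
}
```

For an alert raised 3 hours ago, the helper returns the secondary contact, since the primary has had the full 2-hour window.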

6. Related/Dependent Applications & documents


Our system is integrated with Workspace One, ClearSky APIs and People Search.

7. Scope of Support

In this section, provide the expected level of support:

• Is this considered to be a mission-critical system? No
• Should 24x7 monitoring and support be provided by vC3 and/or the vendor? Monitoring should be enabled on the provided endpoints.
• May vC3 become aware of issues via end-user contact and send email and phone notifications to/from the vendor? Yes

8. Outage Notification Contact Plan

* Enter mail lists to be used to notify end users during an outage or scheduled maintenance (if any special DL is needed):
[email protected], [email protected], [email protected]

9. Architecture of System

10. High Level Technical Design

11. Host lists

List of production hosts pertaining to the application class

Hostname: https://service-catalog-prod.tkg.vmware.com/
VM / ESX or Native Storage Component: (not provided)
Data Center Location: (not provided)
IP: (not provided)
Support Contact: Arambh Gaur ([email protected]), SK Krithivasan ([email protected])

12. System Monitoring

Please Note: It is very important to input the information immediately below to ensure that the
monitoring is adequately set up for the system.

Monitoring Information Prerequisites.


Application Name: service-catalog-prod
Host Name: https://service-catalog-dev.tkg.vmware.com
IP Address: (not provided)
Component Details:
    OS Version (Windows/Linux): TKG
    Data Center: SCDC2 (Active); cluster: "emerging-tech-prd-context"; namespace: "emerging-tech-prd"
Threshold Values:
    Alert for CPU Usage for Windows hosts (0-100%): (not provided)
    Alert for Load on Linux hosts (0-100%): 80%
    Alert for Swap usage (0-100%): 80%
    Alert for Memory usage (0-100%): 80%
Community String:
    SNMP: (not provided)
    WMI: (not provided)
Monitoring Details:
    FileSystem: Alert for Filesystem Thresholds (0-100%): (not provided)
Ports: service-catalog-prod: 8080
Processes Name (Linux hosts): (not provided)
Services Name (Windows hosts): (not provided)
URLs: (not provided)
Advance Monitoring: No
Alert On Restart: No
Communication: Email Notification: emerging-technologies-[email protected]
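A minimal sketch of how the 80% load, swap and memory thresholds above could be evaluated on a Linux host. The helper names are hypothetical, and alerting at or above 80% (rather than strictly above) is an assumption; check_memory reads MemTotal and MemAvailable from /proc/meminfo, which provides MemAvailable on Linux kernels 3.14 and later.

```shell
#!/usr/bin/env bash
# Sketch: evaluate Linux host metrics against the 80% thresholds.
# Assumptions: helper names are hypothetical; an alert fires at or above 80%.

THRESHOLD_PCT=80

# usage_pct USED TOTAL -> integer percentage, rounded down (0 if TOTAL is 0)
usage_pct() {
  local used=$1 total=$2
  if (( total <= 0 )); then
    echo 0
    return
  fi
  echo $(( used * 100 / total ))
}

# check_metric NAME PERCENT -> "ALERT NAME PERCENT%" at/above threshold, else "OK ..."
check_metric() {
  local name=$1 pct=$2
  if (( pct >= THRESHOLD_PCT )); then
    echo "ALERT $name ${pct}%"
  else
    echo "OK $name ${pct}%"
  fi
}

# check_memory -> applies check_metric to memory usage from /proc/meminfo (kB values)
check_memory() {
  local total avail
  total=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  avail=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
  check_metric memory "$(usage_pct $(( total - avail )) "$total")"
}
```

Inside a Kubernetes pod, /proc/meminfo normally reports the node's memory rather than the container's limit, so for the TKG deployment the same 80% rule would more reasonably be applied to pod metrics collected by the cluster's monitoring.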

Monitoring Information Prerequisites.

Application Name: ElasticSearch
Host Name: elk-prod-vip.vmware.com
IP Address: 10.113.166.246, 10.113.166.249
Component Details:
    OS Version (Windows/Linux): Linux
    Data Center: (not provided)
Threshold Values:
    Alert for CPU Usage for Windows hosts (0-100%): (not provided)
    Alert for Load on Linux hosts (0-100%): 80%
    Alert for Swap usage (0-100%): 80%
    Alert for Memory usage (0-100%): 80%
Community String:
    SNMP: (not provided)
    WMI: (not provided)
Monitoring Details:
    FileSystem: Alert for Filesystem Thresholds (0-100%): (not provided)
Ports: (not provided)
Processes Name (Linux hosts): The Elasticsearch instance needs to be monitored.
Services Name (Windows hosts): (not provided)
URLs: (not provided)
Advance Monitoring: No
Alert On Restart: Yes
Communication: Email Notification: emerging-technologies-[email protected]
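Because an Elasticsearch outage is a Severity 1 scenario, a health probe is worth sketching. Elasticsearch exposes cluster health at GET /_cluster/health, whose JSON body includes a status of green, yellow or red. The port (9200), the use of plain HTTP, and the mapping of yellow to WARN are assumptions; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: probe Elasticsearch cluster health and map it to an alert level.
# Assumptions: plain HTTP on port 9200 (override with ES_URL); names are hypothetical.

ES_URL="${ES_URL:-http://elk-prod-vip.vmware.com:9200}"

# es_status: read a /_cluster/health JSON body on stdin, print its "status" value
es_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# es_alert_level STATUS -> OK (green), WARN (yellow), ALERT (red or anything else)
es_alert_level() {
  case "$1" in
    green)  echo "OK" ;;
    yellow) echo "WARN" ;;
    *)      echo "ALERT" ;;  # red, unreachable, or unparsable: treat as an outage
  esac
}

# check_es: query the cluster and print the alert level
check_es() {
  local body
  if ! body=$(curl -s --max-time 10 "$ES_URL/_cluster/health"); then
    echo "ALERT"  # connection failure or timeout
    return
  fi
  es_alert_level "$(printf '%s' "$body" | es_status)"
}
```

If the cluster requires authentication, the curl call would need credentials added; without them the response carries no status field and the probe reports ALERT.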

13. Troubleshooting

Troubleshooting will be handled entirely by the application owners, so no L1 tasks are required from the vC3 team. The following troubleshooting steps are for reference.

13.2 Important Processes and Services


List names of important processes and services and outline steps to restart the process.

Process / Service Name: service-catalog-prod

Brief overview of process: A Spring Boot microservice that is the face of the application. It has the UI module, SSO module and core API module integrated into it, along with some basic user state management functions.

Steps to restart the process: Run the commands below on any Linux system that has the kubeconfig file for the emerging-tech-prd-context cluster. The first command stops all pods; the second brings the deployment back to two replicas.

    kubectl --kubeconfig=./{kubeconfig_file} scale deployment service-catalog-prod --replicas=0

    kubectl --kubeconfig=./{kubeconfig_file} scale deployment service-catalog-prod --replicas=2
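The same restart can be wrapped in a small script that also pins the context and namespace listed in the monitoring table and waits for the new pods to become ready. This is a sketch: passing --context and --namespace assumes the kubeconfig defines a context named emerging-tech-prd-context and that the deployment lives in the emerging-tech-prd namespace; the function name restart_service_catalog is hypothetical, and setting KUBECTL=echo prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: scale-down/scale-up restart of service-catalog-prod, then wait for rollout.
# Assumptions: context and namespace names come from the monitoring table;
# KUBECTL may be overridden (e.g. KUBECTL=echo for a dry run).

CONTEXT="emerging-tech-prd-context"
NAMESPACE="emerging-tech-prd"
DEPLOYMENT="service-catalog-prod"

# restart_service_catalog KUBECONFIG [REPLICAS]
restart_service_catalog() {
  local kubeconfig=$1 replicas=${2:-2}
  local k="${KUBECTL:-kubectl}"
  local common=(--kubeconfig="$kubeconfig" --context="$CONTEXT" --namespace="$NAMESPACE")

  # Stop all pods, then bring the deployment back to its normal replica count.
  "$k" "${common[@]}" scale deployment "$DEPLOYMENT" --replicas=0 || return 1
  "$k" "${common[@]}" scale deployment "$DEPLOYMENT" --replicas="$replicas" || return 1
  # Block until the new pods are available, failing after 5 minutes.
  "$k" "${common[@]}" rollout status deployment/"$DEPLOYMENT" --timeout=5m
}
```

Scaling to zero takes the portal down until the new pods are ready. Assuming the deployment uses the default RollingUpdate strategy, kubectl rollout restart deployment/service-catalog-prod replaces the pods gradually instead and avoids that gap.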

14. Tier-1 Procedure

Please denote Critical scenarios in red.


Monitoring Alerts

Dependencies
Components   Nodes   Database (Custom DB)   Mount Point   Disk Space   Solr   Repo   Cloud Foundry   Horizon

Backup schedule
Dates Times Systems Impacted

Application Escalation
Escalation Path   Name             Cell Phone         Home Phone   Alternate Number
Primary           Arambh Gaur      +919986496120
Secondary         SK Krithivasan   +1 (650) 4272456
Tertiary
Vendor

Infrastructure Escalation:
Escalation Path   Name                                           Cell Phone   Home Phone   Alternate Number
Primary           it-vc3-portaladmin <it-vc3-[email protected]>
Secondary         it-portaladmin <it-[email protected]>
Tertiary

DB Escalation:
Escalation Path   Name                              Cell Phone   Home Phone   Alternate Number
Primary           it-elk-[email protected]
Secondary

Network Escalation:
Escalation Path Name Cell Phone Home Phone Alternate Number

Primary
Secondary
Tertiary

Sr. Management Escalations:


Escalation Path Name Cell Phone Home Phone Alternate Number

Primary
Secondary
Tertiary

15. Signoff

Sign-off will be an agreement on support between the Service/Application Owner and the vC3 IM Manager.
