A Seminar Report On Devops: Bachelor of Technology in Electronics & Communication Engineering

This document is a seminar report on DevOps submitted by N.Veera shivaji to partially fulfill the requirements for a Bachelor of Technology degree in Electronics and Communication Engineering. It includes an acknowledgment, abstract, table of contents, and the beginning of the first section titled "What is DevOps?". The report provides an overview of DevOps practices and how they are implemented at Atlassian.

Uploaded by

Shivaji Nayineni


A Seminar Report On

DevOps

Submitted in the partial fulfillment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
ELECTRONICS & COMMUNICATION ENGINEERING

SUBMITTED BY
N.Veera shivaji - 15K81A0497
Under the esteemed guidance
Of
Mr.B.RAVI CHANDER
ASSISTANT PROFESSOR
DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING
St.MARTIN’S ENGINEERING COLLEGE
Non-Minority College,Affiliated to JNTUH, Approved By AICTE
NBA Accredited,ISO 9001:2008

2015-2019
ACKNOWLEDGEMENT

I express my thanks and gratitude to RAJIV SRIVASTAVA (Ph.D), HEAD OF THE
DEPARTMENT OF ELECTRONICS & COMMUNICATION ENGINEERING,
St.MARTIN’S ENGINEERING COLLEGE, for his encouragement and guidance in carrying out
the seminar. I would like to express my gratitude and indebtedness to Mr.B.RAVI
CHANDER, my coordinator, for his valuable advice and guidance.

I owe my sincere gratitude to our principal, Dr.P.SANTOSH KUMAR PATRA, and also
to our College Management Committee Members for the encouragement that helped me.

I hereby thank one and all who extended a helping hand in the accomplishment of the
project.

N.VEERA SHIVAJI
15K81A0497
ABSTRACT
DevOps is a culture that promotes collaboration between Development and Operations
teams to deploy code to production faster in an automated and repeatable way. The word
'DevOps' is a combination of two words, 'development' and 'operations.'

DevOps helps to increase an organization's speed in delivering applications and services. It
allows organizations to serve their customers better and compete more strongly in the market.

In simple words, DevOps can be defined as an alignment of development and IT operations
with better communication and collaboration.
Contents
What is DevOps?
DevOps and Atlassian
Building Products, DevOps Style
Continuous Delivery for Infrastructure
Handling Incidents at Atlassian
Being Proactive and Staying Ahead of the Game
Are You Ready for DevOps?

01
What is DevOps?
Five years ago, Marc Andreessen proclaimed that
software is eating the world. After all, what company
isn’t a software company? Case in point:

MODERN CARS CONTAIN HUNDREDS OF MILLIONS OF LINES OF CODE

Far more than all of Facebook, from Zuckerberg’s dorm years to today.

PIZZA DELIVERY HAS GONE HIGH TECH

With advanced mobile applications for placing orders
and tracking deliveries, Domino’s Pizza has increased
its IT workforce by 240%.

NIKE IS TURNING FOOTWEAR INTO A FULLY CONNECTED PLATFORM

Nike is turning footwear into a fully connected
platform by integrating shoes with lifestyle and
fitness applications.

· 8,000+ new developers hired in the past 2 years
· Nike is turning the shoe into a connected platform
· UPS plans routes with real-time traffic data
· 240% increase in IT workforce for Domino’s Pizza
· Player statistics and analytics platform added on NHL.com

Old-school development models just don’t hold up to such high-demand, high-growth
environments. Traditionally, Development and Operations teams work separately in silos,
hindering the ability to move fast. The response to this contentious relationship was
a movement called DevOps. It’s a fancy phrase for a simple idea: your dev and ops teams
work better together. It advocates for better communication and collaboration so that
developing, testing, releasing, and running software can happen more rapidly and reliably.

Instead of delivering big, infrequent releases (once every 3 to 9 months) like traditional
development teams at major enterprises, DevOps takes a “continuous delivery” approach.
This means releasing small, incremental improvements regularly, often even several times
per day. The results are enormous, and go far beyond the operational.

Companies that practice DevOps enjoy:
· 200x more frequent deployments
· 2555x faster lead times
· 24x faster recovery
· 3x lower change failure rates
Puppet Labs 2016 State of DevOps Report

These results aren’t limited to major enterprises with billion-dollar dev teams, either. You can
achieve them yourself, no matter how small your team is. The #1 success factor is teamwork.
At Atlassian, the key to faster, higher quality releases is a strong relationship between our dev
and ops teams, and the right tools and processes in place to support them. So what does that
look like at Atlassian, and how did we get started?

“High-performing organizations are decisively outperforming their lower-performing peers
in terms of throughput and improving quality is everyone’s job.”
Puppet Labs 2016 State of DevOps Report

02
DevOps and Atlassian
At some major big box retailers, really heavy items
have “team lift” stickers on them to indicate when
several employees need to help move the items from
shelf to shopping cart. “Team lift” is actually a perfect
analogy for the entire DevOps methodology, since
DevOps isn’t any single person’s job—it’s everyone’s job.

At Atlassian, we use our own products to understand
our various use cases, and provide additional testing
before we release them to our customers. In short,
we dogfood our own products.

In this ebook, we’ll cover each step in detail, and exactly
how we use each Atlassian solution. For now, let’s start
with our process, which looks a bit like a hot tasty pretzel.

First, we plan the features we will deliver to our customers. We use Confluence and JIRA
Software to organize customer feedback and list requirements. We create issues in JIRA
Software to start tracking the stories and epics we define for each software project.

Then, we build the software—writing code and running tests until we get it right. Bitbucket
lets us create branches for each new feature we need to create, and it also allows us to code
more collaboratively, since we can use pull requests to facilitate faster reviews, and comment
inline and hold conversations between our developers right within the code.

We continuously integrate new features back into a master branch for deployment. Bamboo
makes this easier, helping us automate builds, tests, and releases along the way. It really
speeds up deploying to AWS, too—we use Docker and Bamboo together for even faster,
more efficient deployment.

JIRA Software’s release hub also gives us full visibility across all our branches, builds, pull
requests, and deployment warnings, so we can release with confidence.

Once we’ve deployed a new feature into production, it’s time to run and operate it. At Atlassian,
our developers are fully responsible for the features they build, so using JIRA Service Desk helps
them track and resolve incidents faster. We use Confluence to manage run books, knowledge
base articles, and related documentation at every step.

We deliver continuous feedback (via reports, tickets, etc.) to our development teams, so they
can plan new releases, fix bugs, and deliver faster, more reliable software to our customers.
With JIRA Service Desk, we can even request customer feedback from both internal and
external users.

Throughout the entire lifecycle, HipChat is the secret salty coating to our pretzel. It adds
an additional layer of collaboration on top of our already collaborative processes and technology
by letting our teams swarm on incidents, wherever they are, via desktop, mobile apps,
and even wearables.

That’s just the basics, though, and you came here for details. So let’s dive in.

03
Building Products,
DevOps Style
Let’s say your engineering team has gone Agile. They
work in sprints, collaborate, and are building a lot of
great features. But there’s just one catch: you still have
to wait for the release train to leave the station, and
customers aren’t getting value fast enough.

We’ll show you our best practices for building products,
DevOps style. Let’s start with feedback, because no
matter the product, your success is solely based on
your users.

HOW TO GATHER FEEDBACK—AND USE IT TO SHAPE AND BUILD FEATURES

We’ve learned over the years that the easiest way
to make our product better is to listen to the people
who use it. Thousands of companies use HipChat,
and thousands of Atlassians use it internally, too.
You can collect feedback from just about every
source imaginable.

· Ask for in-product feedback
· Collect user feedback from JIRA Service Desk
· Monitor social media channels like Twitter and Facebook
· Use Apdex scores to monitor whether our users are satisfied with HipChat’s response times
· Gather monitoring data from third party solutions like Datadog and New Relic
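
The Apdex score mentioned in the list above is a simple ratio, so it is easy to compute from raw response times. The following is a minimal sketch, assuming the standard Apdex definition (responses at or under a target threshold T count as satisfied, and responses between T and 4T count as tolerating at half weight). The 0.5-second target and the sample values are hypothetical and are not taken from this report.

```python
def apdex(response_times_s, target_s=0.5):
    """Return the Apdex score (0.0 to 1.0) for response times given in seconds.

    target_s is an assumed threshold T; choose one that fits your service.
    """
    if not response_times_s:
        raise ValueError("need at least one sample")
    # Satisfied: at or under T. Tolerating: above T but at or under 4T.
    satisfied = sum(1 for t in response_times_s if t <= target_s)
    tolerating = sum(1 for t in response_times_s if target_s < t <= 4 * target_s)
    return (satisfied + tolerating / 2) / len(response_times_s)


samples = [0.2, 0.4, 0.9, 1.8, 3.0]  # hypothetical measurements
print(round(apdex(samples), 2))
```

With these samples, two responses are satisfied, two are tolerating, and one is frustrated, which gives a score of 0.6.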

What do we do with all that feedback? Keep in mind: our approach may or may
not work for your team, but it is nonetheless a useful starting framework that you can tweak.

We send all feedback to HipChat as notifications. For example, we get a ton of tweets:

“hey @hipchat, any news about deeper JIRA integrations? issue links!” (Eric Wood, @ejwood79)

We route them, along with all our other social media mentions, bug reports, etc. into
dedicated HipChat rooms where the whole team can discuss each notification and
help shape our backlog. Important feedback, like bugs, is then converted into
a JIRA Software ticket, which we then prioritize into the backlog. If there’s a new
feature, we’ll typically create a Confluence page to spec out goals and requirements.

In either case, we make sure to always listen to our customer feedback, wherever they are,
and take action when possible.

PLAN TOGETHER IN SPRINTS

So, how exactly do we plan what we’re going to build? Our small development teams regroup
and meet for an hour every week. We use the hour to:

Demo everything that was built in the previous week to keep the team informed and connected.

Review the objectives and sprint goals we established the previous week and agree on whether we
achieved them.

Define our objectives for our next sprints. At Atlassian, a sprint objective isn’t the same thing as
a ticket. A sprint objective is a unit of work that you have to be able to demo to the team, or ship
to production at the end of the sprint.

After the meeting, we break out. With our new objectives in hand, our developers can go through
all the issues in our backlog and pick out the ones that will help us achieve the sprint objectives we
took on during the meeting.

The end result is complete buy-in from the team. Everyone is fully involved in defining our goals,
how we are going to achieve them, and how we are dividing the work.

SPIKE EARLY AND OFTEN

You’re probably familiar with the term “spike” in agile development. A spike is a short effort to gather
information, validate ideas, identify early obstacles, and guesstimate the size of initiatives. Instead
of building a shippable product, we focus on end-to-end prototyping, to arm us with the knowledge
we need to get the job done right.

At the end of each spike, we have a better idea of the size and technical obstacles we will encounter
for each initiative, and we categorize them: Extra Small, Small, Medium, Large, Extra Large, or Godzilla.

We regularly rotate between normal sprints and spikes, and hold regular “innovation weeks” that result
in really amazing prototypes and insights around project scope and approach. Most teams at Atlassian
hold innovation weeks, too, and they love to write about them.

KEEP EVEN THE BIGGEST CHANGES SMALL

Instead of shipping big things infrequently, ship small changes very often. It makes it very easy to roll
back a particular change if we need to, or even better: fix and roll forward, and it helps us iterate fast.

For really big changes—like highly anticipated new features, for example—we still take a “start small”
approach, setting “step by step” goals and running frequent A/B tests and experiments to see what
our users like best.

To test, we divide our users into cohorts. For example, cohort A might see one version of a product
feature, and cohort B might see a slightly different version. We look at the usage data to see which
version of the feature is performing best against the goals we defined during planning—and we
keep iterating and testing until we get to the best version of that feature.

A tool we use during these testing phases is LaunchDarkly, which lets us release new features
to small segments of users, gather feedback, and then gradually increase the audience size until
we’ve fully deployed. We often start with just 5% of users running the new feature, and then
slowly increase the audience in 10 or 15 percent increments after each feedback and revision cycle.
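
A gradual rollout like the one described above can be illustrated with a small sketch. This is not the API of any feature-flag product. It is a generic technique, assumed here for illustration, in which each user is hashed into a stable bucket from 0 to 99 and the feature is enabled for buckets below the current percentage. The feature name and user IDs are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket in [0, 100) for one feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id: str, feature: str, percent: int) -> bool:
    """True if the user falls inside the current rollout percentage."""
    return rollout_bucket(user_id, feature) < percent


users = [f"user-{i}" for i in range(1000)]  # hypothetical user IDs
for percent in (5, 15, 30):
    enabled = sum(is_enabled(u, "new-feature", percent) for u in users)
    print(f"{percent}% target -> {enabled} of {len(users)} users enabled")
```

Because the bucket depends only on the user and the feature name, a user who sees the feature at 5% keeps seeing it as the percentage grows, which keeps cohorts stable between feedback cycles.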

[Figure: A/B test results for control (A) and variation (B); the values shown are 37% and 23%]

GIT + BITBUCKET + BAMBOO = AWESOME AUTOMATION

We’re heavy users of Git and Bitbucket, using feature branches to make continuous integration far
more effective. Any feature, however small, translates into a feature branch, which is automatically
tested via our Bamboo builds.

After we test a feature branch, we create a pull request to merge it back to the master branch,
and we select a minimum of two reviewers from our team to review and verify the code. Once you
get a green build and 2 approvals, you’re good to go.

Since our master branch is what gets shipped to production, we require that the master be
“green”—no known bugs, issues, or errors—at all times. If a build goes “red,” that means all hands
on deck, and the entire team has to drop everything to fix the build.
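
The merge rule described above (a green build plus two approvals) can be written down as a small check. This is only a sketch of the policy, not Bitbucket's or Bamboo's API. The `PullRequest` type is hypothetical, and the choice to ignore the author's own approval is an assumption of this example.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    author: str
    build_green: bool
    approvals: set = field(default_factory=set)  # names of approving reviewers


def can_merge(pr: PullRequest, required_approvals: int = 2) -> bool:
    """A pull request may merge only with a green build and enough approvals.

    The author's own approval is not counted (an assumption of this sketch).
    """
    reviewers = pr.approvals - {pr.author}
    return pr.build_green and len(reviewers) >= required_approvals


pr = PullRequest(author="dana", build_green=True, approvals={"lee", "sam"})
print(can_merge(pr))
```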

[Figure: feature branches (Feature 1, Feature 2, Feature 3) branching from and merging back into master]

ENCOURAGE ACCOUNTABILITY
A big difference between our team and many other DevOps teams is our ownership model. We’re big
on “you build it, you ship it, you run it”, meaning the team that is responsible for writing a feature also
becomes the team responsible for deploying it and providing ongoing maintenance once it’s live.

But isn’t that going to introduce a lot of issues in production? In fact it’s quite the contrary: It encourages
every developer to build the very best version of something, and gives each of us a vested interest in
its ongoing success.

What this leads to is 100+ developers being able to ship to production at any point in time. This is made
possible with the right process and especially the right tools. We use Chef and Puppet for automation,
and developed a number of Chat Apps (HipChat add-ons) to help us coordinate this process.

Finally, accountability for us also means keeping our users informed of what’s going on. Occasionally,
bad stuff happens, and glitches have the potential to impact all of our users. We love StatusPage.io
for keeping everyone up to date on the status of all of our services.

04
Continuous Delivery
for Infrastructure
It’s not just development teams that can use DevOps
practices. You can apply the same practices to your
hardware and configuration work too. At Atlassian,
we’ve built a team of a dozen employees (called Build
Engineers) that are dedicated to helping our developers
code faster, by giving them the best hardware and
infrastructure services possible. We oversee our
continuous integration service (Bamboo), our artifact
storage and retrieval service (Sonatype Nexus), and all
the hardware, server configurations, applications,
and services that glue them together and provide
a smooth experience to our dev teams.

[Figure: the team’s three areas of responsibility: stable and fast CI; artifact storage and retrieval; associated tooling]

Let’s take a deeper dive into the technology and
processes we depend on, and a few top tips for running
a Build Engineering team more efficiently and effectively.

GATHER FEEDBACK FROM DEVELOPERS

Our customers are Atlassian’s developers. We use JIRA Service Desk to create our own
engineering service desk, and that’s how the developers submit requests and feedback.

“WALK THE BOARD” DURING STANDUPS

Each morning, we have standups just like most software dev teams, where we go through
all the issues in flight using our Kanban board in JIRA Software. Each issue is categorized as:

· TO DO
· READY
· IN PROGRESS
· REVIEW
· MERGE
· ROLLOUT

We set a maximum threshold for the number of issues that can be in each status column.
Below, you’ll see a few columns that have “gone red” because we’ve exceeded our defined
thresholds. This helps us determine in our standup that we need to finish the work in that
column before we pick up anything new.
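
The per-column threshold described above is a work-in-progress (WIP) limit, and checking it takes only a few lines. The following is a minimal sketch. The limits and the sample board are hypothetical, since the report does not give the actual numbers.

```python
from collections import Counter

# Hypothetical per-column limits; the source does not state the real ones.
WIP_LIMITS = {"In Progress": 3, "Review": 2, "Merge": 2}


def columns_over_limit(issue_statuses, limits=WIP_LIMITS):
    """Return {column: (count, limit)} for every column that has "gone red"."""
    counts = Counter(issue_statuses)
    return {
        column: (counts[column], limit)
        for column, limit in limits.items()
        if counts[column] > limit
    }


# One status per issue currently on the board.
board = ["To Do", "In Progress", "In Progress", "Review", "Review", "Review", "Merge"]
print(columns_over_limit(board))
```

Here only the Review column is over its limit (three issues against a limit of two), so that is the work to finish before picking up anything new.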

PULL REQUESTS: SWARMS, APPROVALS AND KEEPING THINGS GREEN

We create branches for any hardware or configuration change, no matter how small, exactly
the same way that our developers do. Every single pull request is linked to a JIRA issue, and we
manage the pull requests in Bitbucket, requiring two approvals from our colleagues (plus a green
feature branch build) to move forward.

Our team also has a HipChat room where we wrote a bot to keep track of all our pull requests.
It shows all open pull requests, and how close they are to being merged. We leave it up to the
team to swarm over the pull requests and jump in and provide feedback for the ones they feel
most qualified to review. Everyone pitches in and works really well to move us through the pipeline
faster and knock out our in-process work. So JIRA Software, JIRA Service Desk, Bitbucket, and
HipChat are a big part of our day-to-day operations.

FAVORITE PIPELINE TOOLS

You might be wondering what tools to use for handling software, configuration, and hardware
deployments. Here are a few of our favorites:

SOFTWARE PIPELINE
Just like our software development team, we use Bamboo on the infrastructure side, to
manage and run our build plans and deployments. We use Bamboo to manage Puppet, where
we write new modules to install and configure components on our servers, like a module to
install the SSH keys from everyone on our team.

Vagrant lets us spin up test servers easily, which we apply Puppet configurations to for testing
purposes. Puppet and Vagrant integrate really well, and the combination makes it really easy
to test new AWS server configurations automatically.

Cucumber is great for testing, too. We use it to confirm that our agents are installed properly,
and that the changes we have made haven’t broken anything.

Once we’re finished testing a configuration or change, we deploy our new Puppet tree out
to production, and HipChat will automatically post a notification to the issue assignee to verify
that the change is working in production, and to also close the issue in JIRA.

As always, Bamboo shows the status of the build, and the details of each release, like which
environments it’s been deployed to, and which JIRA issues are addressed in each build
and release.

HARDWARE PIPELINE
Bamboo manages everything in our hardware pipeline as well, from start to finish. Since we
make quite extensive use of Amazon Web Services (AWS), we use Terraform to manage our
hardware infrastructure. We love it because it allows us to use software best practices and
workflows to make changes to our hardware.

For example: Changes we request to our hardware infrastructure through Terraform have to
be verified through pull requests, and deployed through a continuous delivery pipeline—the
same process our software developers have to follow for their work. This keeps us consistent
about how we manage quality across the board.

Here’s a quick example of what Terraform code looks like, just in case you’re curious:

[Figure: Terraform code listing, not reproduced in this copy]

The listing sets up a new NAT server on AWS. We use code to set all the
parameters, like subnet, etc. We can feed an entire hardware configuration into Terraform,
and it will figure out all the API calls it needs to make to AWS to change our server topology
from its current state to what is specified by the code. Then, we can ask Terraform to execute
the plan and make those changes. It’s magical.
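
The idea of comparing the current state with the desired state, producing a plan, and then executing it can be illustrated in a few lines of Python. This is a conceptual sketch only. It is not how Terraform is implemented, it makes no AWS calls, and the resource names and settings are hypothetical.

```python
def plan(current: dict, desired: dict) -> list:
    """Compare current and desired resources and return the actions needed.

    Both arguments map a resource name to a dict of its settings.
    """
    actions = []
    for name, settings in desired.items():
        if name not in current:
            actions.append(("create", name, settings))
        elif current[name] != settings:
            actions.append(("update", name, settings))
    for name in current:
        if name not in desired:
            actions.append(("delete", name, current[name]))
    return actions


def apply(current: dict, actions: list) -> dict:
    """Execute a plan against a copy of the current state and return the result."""
    state = dict(current)
    for verb, name, settings in actions:
        if verb == "delete":
            state.pop(name)
        else:
            state[name] = settings
    return state


# Hypothetical resources, for illustration only.
current = {"web-1": {"size": "small"}, "old-nat": {"subnet": "subnet-a"}}
desired = {"web-1": {"size": "large"}, "nat-1": {"subnet": "subnet-b"}}

steps = plan(current, desired)
for step in steps:
    print(step)
assert apply(current, steps) == desired
```

Separating the plan from its execution is what makes review possible: the list of actions can be inspected in a pull request before anything is changed.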

We track all of these releases with Bamboo, just like we do our software. Bamboo deploys
each Terraform release into our staging environment first, and then our production environment
once we’re ready. Bamboo is also used to see which releases have been deployed across what
environments.

THREE CORE CONCEPTS TO REMEMBER

Nothing changed the game more for us than the idea of “infrastructure as code.” It’s allowed us
to adopt software development’s best practices, but apply them to hardware and configuration
management, and it’s greatly improved the stability of our platform. Doubling the number of
servers dedicated to running Bamboo at Atlassian was pretty much the same amount of work
as just adding one would have been in a less efficient model.

Our team follows three basic principles that pretty much any engineering team can adopt:

Automate everything
It’s critical that our builds work. If we don’t test them thoroughly, we can’t be confident they will
work. Automated testing helps prevent regressions, gives us confidence in our changes, and makes
continuous delivery possible for us.

We automate notifications, too, and just about anything we can to reduce human error and
make sure we don’t miss important tasks.

Finally, with more automation, we can keep our team smaller. That means less communications
overhead, and more speed—which is exactly our team’s charter.

Stay focused on continuous delivery

Stable hardware and reliable configurations are critical to making sure our developers can get
their work done. So we follow continuous delivery best practices, just like they do:

OUR CODE IS ALWAYS RELEASABLE

Our master is always “green” and stable, so it can be released at any time.

WE RELEASE FREQUENTLY
This reduces risk, since there are only small changes from release to release, and we can revert
easily as needed.

WE FOCUS ON FAST VALUE DELIVERY

Since our users are Atlassian developers, we want them happy. Continuous delivery ensures
we get improvements and fixes out to them as quickly as possible.

Embrace infrastructure as code

Simply put, this just means that we execute code to automatically configure servers, apps,
and more instead of manually configuring them via other less efficient methods like in-tool
configuration screens and wizards. We can literally use code to hammer out commands like
“give me N servers configured with apps X, Y, and Z”, and then use review and approval
workflows to reduce human error significantly.

As a result, we’re able to perform 10x more builds, without adding a single person to our
engineering team. We can deploy with far higher confidence, and more independence.

05
Handling Incidents
at Atlassian
But what about when things aren’t working as planned—
like when a feature rolls out that isn’t performing
optimally? That’s where our Service Operations team
comes in. Our job is to make it easier to spot and fix
incidents, and prevent them from happening again in
the future.

We use ITIL as the basic framework for our service
management practice. It gives us a standard set of
terminology and processes that make it easier to
communicate and work together. More specifically,
ITIL provides a strong foundation for how to classify
incidents, define severity, and perform and track
investigations into root cause and more.

Let’s take a look at how Atlassian handles incidents
when the poop (or anything else, really) does eventually
hit the fan.

Someone (or something) reports the incident

We learn about system outages and other potential performance glitches in two ways:

Our users raise incidents using JIRA Service Desk

Our monitoring systems (like Cacti, Datadog, Zabbix, and Nagios) send us a notification

[Figure: incident flow: support tickets and/or alerts, notification, incident ticket, HOT room; runbook]

We aggregate the alerts into HipChat

We aggregate all of our incident alerts into a single stream in a HipChat room, so our teams
get directly informed that there is a problem. This can sometimes generate noise, so we turn
to tools like BigPanda to help out. BigPanda correlates massive amounts of IT alerts and events,
and helps group them together, saving us a ton of time.

BigPanda 3:45 PM

New Incident - Id-srv-25 | Id-srv-27 | Id-srv-18

Memory Usage | CPU Utilization | Disk Space
Status: Critical Active alerts: 3
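
Grouping related alerts into one incident, as in the notification above, can be approximated very simply. The sketch below is not BigPanda's algorithm. It assumes one naive rule, chosen for illustration: an alert joins the current incident if it arrives within a fixed time window of the previous alert. The first three host names are copied from the sample notification, and the fourth host and all timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    host: str
    check: str
    timestamp: float  # seconds since an arbitrary start time


def group_alerts(alerts, window_s=300):
    """Group alerts into incidents by arrival time.

    An alert joins the latest incident if it arrives within window_s seconds
    of that incident's most recent alert; otherwise it starts a new incident.
    """
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if incidents and alert.timestamp - incidents[-1][-1].timestamp <= window_s:
            incidents[-1].append(alert)
        else:
            incidents.append([alert])
    return incidents


alerts = [
    Alert("Id-srv-25", "Memory Usage", 0),
    Alert("Id-srv-27", "CPU Utilization", 60),
    Alert("Id-srv-18", "Disk Space", 120),
    Alert("srv-other", "Disk Space", 4000),  # hypothetical unrelated alert
]
for incident in group_alerts(alerts):
    print(len(incident), "alert(s):", [a.host for a in incident])
```

Real correlation tools use richer signals than arrival time, such as topology and alert content, but even this rule turns four notifications into two incidents.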

We create an incident ticket

Occasionally, a team may know the outage was caused by a change they just made, and they can
quickly disable that change. But more often than not, we need to pull a team together to troubleshoot
and resolve something. The first step is to raise an incident ticket in JIRA Service Desk.

To create a ticket, we enter a few details, like a short name and description of the event, and then
categorize each incident by the impact it could have on a service, the number of users impacted,
and how urgently it should be handled.
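
One common way to turn impact and urgency into a single priority is an impact/urgency matrix of the kind often used alongside ITIL. The sketch below assumes three levels for each input and a five-step priority scale. These levels and the mapping are assumptions for illustration, not the scheme used in this report.

```python
# Assumed three-level scales; 1 is the most severe.
LEVELS = {"high": 1, "medium": 2, "low": 3}


def priority(impact: str, urgency: str) -> int:
    """Return a priority from 1 (most critical) to 5 (lowest).

    Equivalent to a 3x3 impact/urgency matrix in which each step down in
    either impact or urgency lowers the priority by one.
    """
    return LEVELS[impact] + LEVELS[urgency] - 1


print(priority("high", "high"))   # widespread outage that needs action now
print(priority("high", "low"))    # large impact, but can wait
print(priority("low", "low"))     # minor issue, no time pressure
```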

We notify our users

We use StatusPage.io to communicate with internal and external stakeholders, and push updates
with incident status at regular intervals.

We create a dedicated chat room and swarm to resolve the incident
Within the incident ticket in JIRA Service Desk, we use the “create a room” feature to move the
conversation to a dedicated HipChat room and pull in the right team to solve the problem at hand.
The team discusses what went wrong, and agrees on an approach for troubleshooting and fixing it.

We resolve and categorize the root cause

ITIL recommends that we categorize each issue (bug, license expiry, infrastructure or configuration
issue, etc.) once we’ve identified the root cause and taken corrective action. We also document
the corrective actions we took, and can use all of this information to run detailed reports
highlighting our most common incident types and more. This helps us to take a more preventative
approach to incident and problem management.

Finally, we conduct a post-mortem and document what went wrong
Possibly the most critical step to resolving an incident is learning from it. At Atlassian, we have a couple
of different options for tracking the post-incident review activities: JIRA or Confluence. Confluence
lets us configure templates for a standard incident report layout, and it’s easy to get started quickly.
JIRA, on the other hand, lets us build structured workflows that guide teams through the post-incident
review process, and allow us to track each post-mortem review through to completion.

We’ve used both successfully. More important than the technology you use in the post-mortem process
is making sure that you are able to develop a good understanding of the root cause of your outage.
Use that to take the right set of actions to prevent the same outage from occurring again.

Our top recommendations

CAPTURE THE DATA WHILE IT’S FRESH IN YOUR MIND
We use a JIRA workflow we developed to walk our team members through the entire incident report
process, complete with target timeframes for each step.

MAKE SURE YOU DOCUMENT EVERYTHING IN YOUR KNOWLEDGE BASE

We write all our incident reports in Confluence (and link to them from JIRA), so we can refer back
to them for future similar incidents and ensure we keep getting smarter (and sharing the knowledge)
along the way.

AUDIT YOUR RESULTS REGULARLY

We run reports in JIRA to make sure our team is doing a good job of resolving incidents and of
documenting the results. By introducing better workflow and diagnosis tools and following a
standardized approach to incident and problem management, we’ve reduced our mean-time-to-diagnosis
from 113 minutes to just 23 minutes—and we’re committed to cutting it even more.

06
Being Proactive and
Staying Ahead of the Game
As the saying goes: The best defense is a good offense.
While the core responsibility of an SRE team is to ensure
reliability and availability, even with the best planning
and processes in place, things can always go south.
For this reason, our SRE teams at Atlassian believe in
being proactive. It’s vital that with each and every
incident that occurs, we capture key takeaways that
will improve our processes and motivate us to take risks
and try things differently to drive positive change.

When the dust has settled from an incident, it’s time
to complete a thorough retrospective and ensure we’ve
identified areas of improvement for next time. We plan,
track and assess this work in JIRA Software. It’s particularly
helpful in ensuring our teams across multiple
geographies are aligned and always on the same page.
Distributed teams are invaluable in providing around-the-clock
coverage, but working across different
timezones creates collaboration challenges. For this
reason our team shares one complete backlog of
project work that is understood by all team members
across regions. We also adopt agile best practices for
our proactive work and forecast future work based on
capacity and historical velocity.

Here are two agile rituals we follow as a team:

SPRINT REVIEW
Team members showcase the value delivered during the two-week sprint to the
entire team. This helps members of our team learn from one another and try new and different
things to achieve better results.

SPRINT PLANNING
Our team prepares for a sprint ahead of time by considering the priority of work in the backlog,
items that are incomplete from the previous sprint, and new stories, tasks or bugs that have
been created since last sprint. With a good understanding of the team’s overall capacity, team
members add stories into the next empty sprint. This helps us have a pre-populated sprint
ready and “on deck” at any point in time.
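
Forecasting from historical velocity and pre-populating the next sprint can be sketched as follows. The averaging window, the story points, and the backlog items are all hypothetical. The report does not describe the exact method, so this is one simple approach under those assumptions.

```python
def forecast_capacity(past_velocities, last_n=3):
    """Estimate the next sprint's capacity as the mean of the last few sprints."""
    recent = past_velocities[-last_n:]
    return sum(recent) / len(recent)


def fill_sprint(backlog, capacity):
    """Add stories in priority order while they still fit within capacity.

    backlog is a list of (title, points) pairs, highest priority first.
    Returns the chosen titles and the total points committed.
    """
    chosen, used = [], 0
    for title, points in backlog:
        if used + points <= capacity:
            chosen.append(title)
            used += points
    return chosen, used


velocities = [18, 22, 20, 24]  # hypothetical points completed per sprint
backlog = [
    ("Fix paging alert", 8),
    ("Automate server restore", 13),
    ("Dashboard cleanup", 5),
    ("Runbook update", 3),
]
capacity = forecast_capacity(velocities)
print(capacity, fill_sprint(backlog, capacity))
```

With these numbers the forecast is 22 points, and the first two stories (21 points) fill the sprint; the remaining stories stay in the backlog for the next one.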

Another key goal for our team has been to evaluate time spent on manual, time consuming
processes and find ways to reduce them. Two recent wins that helped us become more
productive are: improving the foundation of our monitoring systems and progressive
automation with JIRA workflows.

Monitoring Improvements
Our team was recently tasked with building our current monitoring platform. The key goals
were to reduce mean time to resolve (MTTR) and lower the severity of incidents. Early
detection via monitoring allows us to detect potential threats before users do and react
proactively. We identified four ‘golden’ signals that help us detect these threats early on.

THE FOUR GOLDEN SIGNALS

We drew inspiration from the four signals that most SRE teams are probably familiar with:

· Latency
· Traffic
· Errors
· Saturation

These golden signals are the bare essentials. They are the key aspects that should be
monitored by a team tasked with delivering a reliable, user-facing service. Knowing our own
team preferences, we expanded the list to include the following:

· Availability
· Reliability
· Saturation
· Latency
· Application/User

We decided to use the above signals based on historical analysis of incidents and a strong
understanding of the service level objectives we were hoping to meet for our service. Identifying
the right signals was pivotal to our monitoring effort since our goal was to provide monitoring
visualization to all dependent teams and give them the right data to make the best decision at
any given time. For example, we discovered that anomaly detection could be as simple as
detecting a rapid rate of change as opposed to finding a minor deviation. Another important
aspect that helped improve our monitoring system was to have clear actions around alerts.
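
The rate-of-change idea mentioned above can be illustrated with a minimal sketch. It assumes evenly spaced samples of a single metric; the function name, the threshold, and the latency readings are hypothetical and are not taken from the team's monitoring platform.

```python
from typing import List, Sequence


def rapid_changes(samples: Sequence[float], max_step: float) -> List[int]:
    """Return indices where the value moved more than max_step since the previous sample."""
    flagged = []
    for i in range(1, len(samples)):
        if abs(samples[i] - samples[i - 1]) > max_step:
            flagged.append(i)
    return flagged


# Hypothetical latency readings in milliseconds, one per minute.
latency_ms = [120, 122, 119, 121, 310, 305, 118]
print(rapid_changes(latency_ms, max_step=50))  # prints [4, 6]
```

Small drifts such as 120 to 122 are ignored, while the jump to 310 and the drop back to 118 are both flagged. A real detector would probably also account for sampling gaps and noise.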

Monitoring vs. Alerts

While our monitoring boards were expansive under the five signal areas, we chose not to receive alerts
on everything. This was by choice to ensure that we:

Wouldn’t fall into the ‘alert trap’, i.e. wouldn’t be inundated with alerts at all times
Had a clear call to action for the recipient of each alert, and
Identified causes and symptoms of incidents, preferring to alert on a symptom

As a result of these actions, we were able to detect and take appropriate actions on 70% of the incidents
ahead of time.
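
One way to picture these alerting rules is the sketch below. It is an illustration only: the `AlertRule` fields, the rule names, and the strict "symptoms only" filter are assumptions (the text states a preference for symptoms, not an absolute rule).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str
    kind: str            # "symptom" (user-visible effect) or "cause" (underlying condition)
    call_to_action: str  # what the recipient should do; empty means none is defined


def should_page(rule: AlertRule) -> bool:
    """Page only for symptom rules that carry a clear call to action."""
    return rule.kind == "symptom" and bool(rule.call_to_action.strip())


# Hypothetical rules; everything stays on the monitoring boards, few of them page.
rules = [
    AlertRule("checkout_error_rate_high", "symptom", "Follow the checkout rollback steps"),
    AlertRule("disk_80_percent_full", "cause", "Expand the volume"),
    AlertRule("latency_p99_high", "symptom", ""),
]
print([r.name for r in rules if should_page(r)])  # prints ['checkout_error_rate_high']
```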

PROGRESSIVE AUTOMATION WITH JIRA WORKFLOWS

JIRA Software workflows support teams beyond dev teams, and the workflows are capable of supporting
more than just the standard stages of:

PENDING, IN PROGRESS, DONE

For example, automation is often a gradual process for our team with several ad-hoc scripts designed
to progressively simplify established runbooks. These scripts are then often tied together into an end-to-
end service to remove manual intervention and automate the process. JIRA workflows help simplify that
process, reduce noise in the service request queues, and represent the various ticket stages in a runbook.

Here’s one example of how we automated the restoration process of servers post RMA.

RESTORING SERVERS TO SERVICE AFTER AN RMA

A returned materials authorization (or RMA) is an alphanumeric identifier used by hardware
manufacturers to indicate that a user has been authorized by the company to return or repair
a defective or broken product.

Post RMA (i.e. after evacuation, power off, and repair or replacement), servers in our fleet are returned
to the cluster. That process was historically complicated, involving several different teams and
inconsistent tracking. It was difficult to track the stage of each ticket (of which there were often a dozen
at any given point), which ultimately led to tickets being deprioritized and not tracked to completion.

The laborious, manual processing of a ticket looked like this:

1. Opening the ticket queue and selecting a ticket.
2. Checking the status of the server in the ticket. Note: Checkboxes wouldn’t reflect the actual status of the
server in the process, meaning we had to either read the comments in the ticket or consult with other team members.
3. Locating the relevant runbook.
4. Locating the relevant step in said runbook. Note: If the step involved waiting for an async process (like
a memtest) to complete, we had to ensure that it was complete. If not, this meant starting over from Step 1.
5. Executing the steps in the runbook as required.

This process was not only tedious and cumbersome, but also unproductive since multiple SREs
had to familiarize themselves with the status of each ticket in the queue.

[Figure: JIRA workflow statuses for restoring a server to service after an RMA, running from PENDING REPLACEMENT through the replacement, MDRAID rebuild, memtest, HPT, SSD cache, initial sync, SEL clear, monitoring, sync, unblacklist, and primary and secondary rebalance stages to DONE.]

So, it was time to try something different.

Enforcing a Structure
The first step of automating the above process was to establish a structure for automation. JIRA workflows
not only provide structure, but also let teams see the benefits of this structure even before the automation
is complete. By simply creating a specific ticket type and a workflow that has a status for each step of the
runbook, we could instantly do the following:

1. See from the queue where in the process a ticket was.
2. Filter queues based on status.
3. Establish a seamless process to transition a ticket to the next status after a step in the runbook
was complete.
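
The three benefits above can be sketched as a small status table. The status names and allowed transitions here are illustrative assumptions, not the team's actual workflow, and plain dictionaries stand in for tickets.

```python
from typing import Dict, List, Set

# Assumed subset of a workflow: each status maps to the statuses it may move to.
TRANSITIONS: Dict[str, Set[str]] = {
    "READY FOR MEMTEST": {"MEMTEST RUNNING"},
    "MEMTEST RUNNING": {"MEMTEST SUCCESSFUL", "MEMTEST FAILED"},
    "MEMTEST FAILED": {"READY FOR MEMTEST"},
    "MEMTEST SUCCESSFUL": {"DONE"},
    "DONE": set(),
}


def transition(ticket: Dict[str, str], new_status: str) -> None:
    """Move a ticket to new_status, rejecting moves the workflow does not allow."""
    allowed = TRANSITIONS[ticket["status"]]
    if new_status not in allowed:
        raise ValueError(f"{ticket['status']} -> {new_status} is not allowed")
    ticket["status"] = new_status


def filter_by_status(queue: List[Dict[str, str]], status: str) -> List[Dict[str, str]]:
    """Return the tickets in the queue that currently have the given status."""
    return [t for t in queue if t["status"] == status]


# Hypothetical ticket keys.
queue = [
    {"key": "RMA-1", "status": "READY FOR MEMTEST"},
    {"key": "RMA-2", "status": "MEMTEST RUNNING"},
]
transition(queue[0], "MEMTEST RUNNING")
print([t["key"] for t in filter_by_status(queue, "MEMTEST RUNNING")])  # prints ['RMA-1', 'RMA-2']
```

Because the status itself records where a ticket is in the runbook, nobody has to read the comments to find out.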

The next step was to automate this process.

Automating that Structure

With a runbook-like structure to follow, and state stored in the form of ticket status, it was easier to
automate ticket management using the JIRA API (instead of having to manage state in a database). This
allowed a stateless microservice to handle any steps that don’t require human interaction (like waiting for
long-running tasks to complete) and to remove any items that don’t currently require human interaction
from human-visible queues.

The result was the following interconnected components:

1. A stateless microservice that polled the JIRA API for tickets with a particular kind of status,
and processed them based on that status.
2. Cleaner service request queues that only included items requiring action by a human.
3. Tickets that naturally reflected their current status.
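
A polling pass of this kind might look roughly like the sketch below. It makes no real JIRA API calls: `TicketStore` is a hypothetical interface that a real client would implement, `InMemoryStore` is a stand-in so the example runs on its own, and the status names and handler are assumptions.

```python
from typing import Callable, Dict, List, Optional, Protocol


class TicketStore(Protocol):
    """Hypothetical interface; a real one would wrap calls to the JIRA API."""

    def search(self, status: str) -> List[str]: ...
    def set_status(self, key: str, status: str) -> None: ...


class InMemoryStore:
    """Stand-in for the ticket system so the sketch runs without a server."""

    def __init__(self, tickets: Dict[str, str]) -> None:
        self.tickets = tickets

    def search(self, status: str) -> List[str]:
        return [k for k, s in self.tickets.items() if s == status]

    def set_status(self, key: str, status: str) -> None:
        self.tickets[key] = status


# A handler inspects one ticket and returns its next status, or None to leave it alone.
Handler = Callable[[str], Optional[str]]


def poll_once(store: TicketStore, handlers: Dict[str, Handler]) -> int:
    """Run one polling pass and return how many tickets changed status."""
    moved = 0
    for status, handler in handlers.items():
        for key in store.search(status):
            next_status = handler(key)
            if next_status is not None:
                store.set_status(key, next_status)
                moved += 1
    return moved


finished = {"RMA-1"}  # hypothetical: servers whose memtest has completed


def check_memtest(key: str) -> Optional[str]:
    return "MEMTEST SUCCESSFUL" if key in finished else None


store = InMemoryStore({"RMA-1": "MEMTEST RUNNING", "RMA-2": "MEMTEST RUNNING"})
print(poll_once(store, {"MEMTEST RUNNING": check_memtest}))  # prints 1
print(store.tickets["RMA-2"])  # prints MEMTEST RUNNING
```

The service keeps no state of its own between passes. Everything it needs is read back from the ticket status, which is what makes it safe to restart at any time.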

But we didn’t just stop at automation. Remember staying ahead of the game? That’s where our work
on improving the un-automatable components comes in.

Improving the Un-automatable

Naturally there are likely to be some steps of any process that are not easy to automate, or are expected
to take a significant amount of effort and/or time. You can often still do away with the runbooks in these
situations by providing instructions to the persons who will process any manual steps. For example, our
restore to service automation does not have access to the IPMI interfaces, so it was easier to have the
automation provide instructions as a comment on the ticket, as seen in this diagram. The instructions
are clear, easy to follow, and prevent the need to context switch.

These tickets appear in a JIRA Service Desk queue called “Actionable RMA Restores”
and we strictly follow this template for clear actions and expected behaviors. This is similar
to what would be put in a runbook, but reduces the chance of a transcription error and the
friction around performing the actions in this workflow.
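
A comment template of this sort could be generated along the lines of the sketch below. The function, the field layout, the host name, and the example steps are all hypothetical; the team's actual template is not reproduced in this report.

```python
from typing import List


def render_instructions(server: str, action: str, steps: List[str], expected: str) -> str:
    """Build a ticket comment with numbered steps and the expected outcome."""
    lines = [f"Action required for {server}: {action}", ""]
    lines += [f"{n}. {step}" for n, step in enumerate(steps, start=1)]
    lines += ["", f"Expected result: {expected}"]
    return "\n".join(lines)


comment = render_instructions(
    server="db-host-17",  # hypothetical host name
    action="clear the system event log",
    steps=["Log in to the server's IPMI interface.", "Clear the system event log."],
    expected="The event log shows no entries.",
)
print(comment)
```

Generating the text from the ticket's own data, rather than asking someone to copy it from a runbook, is what reduces the chance of a transcription error.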

Like all agile teams, we are in a constant state of evaluation. Our next goal is to begin to
automate more of these manual steps, but for the time being we are focused on continuing
to further improve the monitoring systems and progressive automation workflows that have
made such a positive impact thus far.

Are You Ready for DevOps?
DevOps is a culture, a philosophy, a methodology.
Software Development and Operations teams that
practice DevOps are more agile, more innovative,
and more profitable. Through increased collaboration
and greater visibility across teams, Development and
Operations teams can work more productively and
efficiently than ever before. Atlassian’s mission is to
unleash the potential in every team. With the Atlassian
suite and our ecosystem of partner integrations,
Development, Operations and all associated teams
have the tools and processes to: foster a culture of
collaboration and trust, release faster, accelerate time
to resolution of critical issues, and better manage
unplanned work.

Ride the DevOps wave with us.

atlassian.com/devops/start-your-journey
