SPN CI/CD journey on AWS
SPN Infra., CoreTech
Scott Miao
11/22/2017
Who am I
• Scott Miao
• RD, SPN Infra., TrendMicro
• OOAD system dev. 10+ years
• Hadoop ecosystem 6 years
• AWS for BigData 4 years
• @linkedIn
• @slideshare
Agenda
• Original services delivery process in SPN
• Dev/Ops
– DevOps goals V.S. our original way
• CI/CD on AWS
• An example service CI/CD on AWS
• DevOps goals V.S. our original way V.S. CI/CD on AWS
• Lessons learned
Original services delivery process in SPN
[Diagram] Roles: Developers (Service team), Operators (Operation team), Infra. admin (DCS team); Release portal; Stg. and PROD environments
1. Dev, utests, …
2. Source Repo
3. Back and forth
4. Trigger CI
5. Devices spec. for both Stg/PROD
6.1 Monitoring scripts
6.2 Puppet scripts
6.3 Operation guides
7. Trigger Release build
8. Release artifacts
9. Stg resources ready
10. Deploy service & scripts
11. Deploy and monitor
12.1 Itests
12.2 Stress tests
12.3 UAT
13. Release artifacts
14. PROD resources ready
15. and 16. (labels not captured in the transcript)
17. PROD release
Dev/Ops
DevOps is not a new technology or a
product. It’s an approach or culture of
software development that seeks stability
and performance at the same time that it
speeds software deliveries to the business.
── Andi Mann, CA Technology ──
Cited from: Derek Chen, RD, TrendMicro
https://www.slideshare.net/derekhound/devops-in-practice-78905911, p#15
Software Delivery
[Diagram] Stages: Plan, Code, Build, Test, Release, Deploy, Operate, Monitor
[Diagram] Practices mapped onto these stages: Agile Development, Continuous Integration, Continuous Delivery, Continuous Deployment, DevOps
Cited from: Derek Chen, RD, TrendMicro
https://www.slideshare.net/derekhound/devops-in-practice-78905911, p#23
DevOps goals V.S. our original way
• Faster time to market
– Too complicated; easy to miss steps
– Service team needs to follow up by themselves
– Some steps need lead time (machine resources, etc.)
• Lower failure rate of new releases
– Manual steps lead to errors
• Shorten lead time between fixes
– Rolling upgrade
– Invasive
• Faster mean time to recovery
– Hard to deal with machine errors and peak loads
“Very often, automation supports this objective”
Quoted from Wikipedia for DevOps goals
https://en.wikipedia.org/wiki/DevOps#Goals
CI/CD on AWS
BOTH ACHIEVE THE SAME DEVOPS GOALS
DEVOPS FOCUSES ON ORGANIZATIONAL CHANGES
CI/CD FOCUSES ON TECHNICAL IMPLEMENTATIONS
Review for CI and CD
• Continuous Integration
– is the practice of merging all developer working
copies to a shared mainline (trunk) several times
a day
• Continuous Delivery
– produce software in short cycles, ensuring that
the software can be reliably released at any time
• Continuous Deployment
– means that every change is automatically
deployed to production
https://en.wikipedia.org/wiki/Continuous_integration
https://en.wikipedia.org/wiki/Continuous_delivery
Characteristics of Cloud Computing
• On-demand self-service
– A consumer can unilaterally provision computing capabilities
• Broad network access
– Capabilities are available over the network and accessed
through standard mechanisms
• Resource pooling
– The provider's computing resources are pooled to serve
multiple consumers using a multi-tenant model
• Rapid elasticity
– Capabilities can be elastically provisioned and released
• Measured service
– Cloud systems automatically control and optimize resource use
http://www.inforisktoday.com/5-essential-characteristics-cloud-computing-a-4189
https://en.wikipedia.org/wiki/Infrastructure_as_Code
[Diagram] Layers: DevOps, CI/CD, Automation, Cloud Computing (AWS)
AWS managed services SPN used
• AWS CloudFormation
– Gives developers and systems administrators an easy
way to create and manage a collection of related
AWS resources
– We use it to provision our service components
• Such as Load balancer (ALB), machines (EC2)
• AWS OpsWorks
– A configuration management service that uses Chef,
an automation platform that treats server
configurations as code
– We use it to deploy, configure and start up our
service components
https://aws.amazon.com/cloudformation/
https://aws.amazon.com/opsworks/
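To make "provisioning service components with CloudFormation" more concrete, here is a minimal sketch that builds a template as a Python dict and serializes it to JSON. The logical names (`ApiAlb`, `ApiLaunchConfig`, `ApiAsg`), the parameters, the instance type and the sizes are assumptions for illustration; they are not the actual SPN templates.

```python
import json


def build_template():
    """Build a small CloudFormation template: an internal ALB plus an Auto Scaling group."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative sketch only; not the actual SPN template",
        "Parameters": {
            # Hypothetical inputs; a fixed VPC and subnets are assumed to exist already.
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "ApiAlb": {
                "Type": "AWS::ElasticLoadBalancingV2::LoadBalancer",
                "Properties": {
                    "Scheme": "internal",  # private load balancer
                    "Subnets": {"Ref": "SubnetIds"},
                },
            },
            "ApiLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "ImageId"},
                    "InstanceType": "t2.medium",  # assumed size
                },
            },
            "ApiAsg": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "ApiLaunchConfig"},
                    "MinSize": "2",
                    "MaxSize": "4",
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                },
            },
        },
    }


# Serialized form that could be uploaded to S3 as a template file.
template_json = json.dumps(build_template(), indent=2)
```

Keeping the template as data makes it easy to generate one file per component and to review changes in a pull request.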
AWS CloudFormation + OpsWorks
[Diagram] Components: user; AWS S3 (CF templates: main, IAM, ELB, OpsWorks; artifacts; Chef recipes); AWS CloudFormation (stacks: main, IAM, ALB, OpsWorks); AWS OpsWorks; AWS VPC
[Diagram] Actors in the legend: User, CF, Ops
1. Put CF templates
2. Put artifacts
3. Put Chef recipes
4. Create CF W/ params, VPC ID, etc
5. Templates input
6. Create CF stacks
7. Provision AWS resources
8. Create OpsWorks
9. Artifacts/recipes input
10. Deploy/Config/start up service
Ready to serve
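Steps 4 to 6 above amount to one CloudFormation "create stack" request that points at a template in S3 and passes parameters such as the VPC ID. The sketch below only shapes that request; the function name, the template URL and the parameter values are assumptions for illustration.

```python
def create_stack_request(stack_name, template_url, params):
    """Shape the arguments of a CloudFormation CreateStack call."""
    return {
        "StackName": stack_name,
        # URL of the main template that was uploaded to S3 in step 1.
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(params.items())
        ],
        # Required by CloudFormation when the template creates IAM resources.
        "Capabilities": ["CAPABILITY_IAM"],
    }


request = create_stack_request(
    "ae-dev-myAE",
    "https://s3.amazonaws.com/dev-us-east-1/main.template",  # hypothetical key
    {"VpcId": "vpc-12345678"},  # hypothetical VPC ID
)
```

With boto3 installed, a dict of this shape could be passed as keyword arguments to the CloudFormation client's `create_stack` method, with the region set explicitly on the client.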
CoreTech DCS managed services
• Enterprise GitHub
– Just like the public GitHub on the Internet
• CloudCI – Enterprise Circle CI
– A Docker container based CI solution
– Seamlessly integrated with GitHub
• JFrog Artifactory
– A CoreTech-wide shared artifact repo.
An example service CI/CD on AWS
ANALYTIC ENGINE
Analytic Engine is an API service for…
Common Big Data computation
service on Cloud (AWS)
https://www.slideshare.net/takeshi_miao/analytic-engine-a-common-big-data-computation-service-on-the-aws
AE High Level Architecture Design
[Diagram] Labels recoverable from the slide:
• Oregon (us-west-2), Multi-AZs: AZa, AZb, AZc
• AE API servers, workers, Eureka; Auto Scaling groups; Private ALB
• RDS; EMR; cross-account S3 buckets; Amazon SNS
• IDC via VPN (HTTPS/HTTP Basic); Splunk
• Peered services over HTTPS; Cloud Storage (isValidUser, CS output)
This is what we are really taking care of
[Diagram] Same AE architecture diagram as on the previous slide.
Which components are in CI/CD scope
• In scope
– API, Worker, Eureka, Genie W/ auto-scaling group
• EC2; deploy, configure and start up component services
– AWS Application Load Balancer (ALB)
– AWS Simple Notification Service
• NOT in scope
– VPC/subnets/VPC peerings
• We use fixed VPC and subnets for both VPN connections and VPC peerings
– RDS MySQL DB
• Already pre-created
– EMR clusters
• Created by user API calls via AWS Java SDK
CI/CD Usecases
1. Developer edits/pushes code to github
2. Developer deploys AE to Dev env. for tests
3. Developer terminates AE in Dev env. after tests
4. Developer deploys AE to Stg env. for integration tests/UAT
5. Developer deploys AE to PROD env.
6. Developer patches hotfixes and deploys to PROD
7. Monitor your service components
1. Developer edits/pushes code to github
[Diagram] Repo: spn/ae-saas (branches: master, AE-100; version label: 1.19.0); CI project: spn/ae-saas; S3: dev-us-east-1 (CF templates, ae-1.19.AE_100.jars, Chef recipes)
1. Push AE-100 branch
2. Trigger CI
3. build
4. utests
5. package
6-7. cp artifacts to S3
8-9. Publish artifacts to mvn repo.
Feature branch workflow
https://www.atlassian.com/git/tutorials/comparing-workflows
Every commit will trigger this build
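The artifact names on this slide suggest a version scheme: a feature branch such as AE-100 yields `1.19.AE_100`, while master yields `1.19.<buildNum>`. The sketch below encodes that reading; the function name and the hyphen-to-underscore rule are inferred from the examples, not taken from the actual build scripts.

```python
def artifact_version(base_version, branch, build_num):
    """Derive the artifact version from the branch being built."""
    if branch == "master":
        # Master builds carry the CI build number, e.g. 1.19.563.
        return f"{base_version}.{build_num}"
    # Feature builds carry the branch name; "-" becomes "_" so that the
    # version can later be embedded in a "-"-separated Git tag.
    return f"{base_version}.{branch.replace('-', '_')}"


feature_jar = f"ae-{artifact_version('1.19', 'AE-100', 7)}.jar"
master_jar = f"ae-{artifact_version('1.19', 'master', 563)}.jar"
```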
2. Developer deploys AE to Dev env. for tests
[Diagram] Feature branch workflow. Repo: spn/ae-saas (branches: master, AE-100); CI project: spn/ae-saas (Env. variables in CI); S3: dev-us-east-1 (CF templates, ae-1.19.AE_100.jars, Chef recipes); AWS CF; Dev VPC
1. Git tag: c-1.19.AE_100-dev-us-east-1-myAE
2. Push tag
3. Trigger CI
4. Create CF
5. CF creating for stack: ae-dev-myAE
5.1 Templates input
6. Provision resources
7. Deploy/config/start up service
Ready for tests
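The Git tags in these use cases appear to follow the pattern `<action>-<version>-<env>-<region>-<name>`, and the stack name appears to be `ae-<env>-<name>`. A minimal parsing sketch under that assumption follows; the mapping of `c`, `d` and `u` to create, delete and update is inferred from the use cases, and all identifiers are hypothetical.

```python
import re
from typing import NamedTuple

# Inferred from the slides: c = create, d = delete, u = update.
ACTIONS = {"c": "create", "d": "delete", "u": "update"}

TAG_RE = re.compile(
    r"^(?P<action>[cdu])"
    r"-(?P<version>[^-]+)"
    r"-(?P<env>dev|stg|prod)"
    r"-(?P<region>[a-z]{2}-[a-z]+-\d)"
    r"-(?P<name>.+)$"
)


class DeployTag(NamedTuple):
    action: str
    version: str
    env: str
    region: str
    name: str

    @property
    def stack_name(self):
        # Matches the stack names on the slides, e.g. ae-dev-myAE.
        return f"ae-{self.env}-{self.name}"


def parse_tag(tag):
    """Split a deployment tag into its parts, or raise ValueError."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a deployment tag: {tag!r}")
    return DeployTag(
        ACTIONS[match["action"]],
        match["version"],
        match["env"],
        match["region"],
        match["name"],
    )


tag = parse_tag("c-1.19.AE_100-dev-us-east-1-myAE")
```

A CI job could branch on `tag.action` to decide whether to create, delete or update the stack named `tag.stack_name` in `tag.region`.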
3. Developer terminates AE in Dev env. after tests
[Diagram] Feature branch workflow. Repo: spn/ae-saas (branch: master); CI project: spn/ae-saas; AWS CF; Dev VPC
1. Git tag: d-1.19.AE_100-dev-us-east-1-myAE
2. Push tag
3. Trigger CI
4. Delete CF
5. CF deleting for stack: ae-dev-myAE
6. Terminating resources
8.1 Deploy/config/start up service
4. Developer deploys AE to Stg env. for integration tests/UAT (Much like UC#2)
[Diagram] Feature branch workflow. Repo: spn/ae-saas (branches: master, AE-100; version label: 1.19.563); CI project: spn/ae-saas (Env. variables in CI); S3: dev-us-east-1 and S3: stg-us-east-1 (CF templates, ae-1.19.563.jars, Chef recipes); AWS CF; Dev VPC
1. Merge feature branch: 1.19.<buildNum>
2. Git tag: c-1.19.563-stg-us-east-1-myAE
3. Push tag
4. Trigger CI
5. cp artifacts to stg S3
6. cp artifacts from dev to stg
6.1 copying
7. Create CF
8. Provision resources for stack: ae-stg-myAE
9. Run itests on service
Ready for tests
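The promotion step copies already-built artifacts from the previous environment's bucket instead of rebuilding them. A small sketch of that copy plan follows, using the bucket names shown on the slides; the function name and the object key are assumptions for illustration.

```python
# Bucket per environment, as named on the slides.
BUCKETS = {
    "dev": "dev-us-east-1",
    "stg": "stg-us-east-1",
    "prod": "prod-us-west-2",
}

# Each environment is fed from the one before it.
PROMOTION_SOURCE = {"stg": "dev", "prod": "stg"}


def copy_plan(target_env, keys):
    """Return (source, destination) S3 URIs for promoting the given object keys."""
    source_bucket = BUCKETS[PROMOTION_SOURCE[target_env]]
    target_bucket = BUCKETS[target_env]
    return [
        (f"s3://{source_bucket}/{key}", f"s3://{target_bucket}/{key}")
        for key in keys
    ]


plan = copy_plan("stg", ["ae-1.19.563.jar"])  # hypothetical key
```

Promoting the same binaries means the artifacts tested in Stg are the ones that reach PROD.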
5. Developer deploys AE to PROD env. (Much like UC#4)
Git tag: c-1.19.563-prod-us-west-2-myAE
6. Developer patches hotfixes and deploys to PROD (1/2)
[Diagram] Feature branch workflow. Repo: spn/ae-saas (branches: master, AE-105; version label: 1.19.570); CI project: spn/ae-saas (Env. variables in CI); S3: stg-us-east-1 and S3: prod-us-west-2 (CF templates, ae-1.19.563.jars, Chef recipes); AWS CF; Dev VPC
1. Git tag: u-1.19.570-prod-us-west-2-myAE
2. Push tag
3. Trigger CI
4. cp artifacts to prod S3
5. cp artifacts from stg to prod
5.1 copying
6. Update CF
7. Update CF stack: ae-prod-myAE
8.1 Re-deploy/config/start up service
Ready to serve
6. Developer patches hotfixes and deploys to PROD (2/2)
• Updating W/O SLA impact
– ALB W/ AutoScalingReplacingUpdate configured for the UpdatePolicy attribute
• Better and more flexible auto-scaling
– EC2 Auto-scaling group + OpsWorks
• Cross-region deployment as early as possible
– Minor configuration diffs
• A successful deployment to us-east-1 does not guarantee success in other regions…
– AWS SDK default value is us-east-1
• You may forget to set it in your code…
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html
https://aws.amazon.com/tw/blogs/devops/auto-scaling-aws-opsworks-instances/
(Auto-healing really sucks)
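Two of the points above can be sketched in a few lines: an Auto Scaling group resource whose `UpdatePolicy` asks CloudFormation to replace the whole group on update, and a guard that refuses to fall back to a default region. The logical names, sizes and timeout are assumptions for illustration.

```python
def asg_with_replacing_update(min_size, max_size):
    """Auto Scaling group resource that is replaced as a whole on stack update."""
    return {
        "Type": "AWS::AutoScaling::AutoScalingGroup",
        "Properties": {
            "LaunchConfigurationName": {"Ref": "ApiLaunchConfig"},  # hypothetical
            "MinSize": str(min_size),
            "MaxSize": str(max_size),
            "VPCZoneIdentifier": {"Ref": "SubnetIds"},  # hypothetical parameter
        },
        # The new group must signal success before the update proceeds.
        "CreationPolicy": {
            "ResourceSignal": {"Count": min_size, "Timeout": "PT15M"}
        },
        # A new group is created first; the old one is removed only after
        # the new group has been created successfully.
        "UpdatePolicy": {"AutoScalingReplacingUpdate": {"WillReplace": True}},
    }


def require_region(region):
    """Fail fast instead of silently relying on an SDK default region."""
    if not region:
        raise ValueError("region must be set explicitly")
    return region


asg = asg_with_replacing_update(2, 4)
region = require_region("us-west-2")
```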
7. Monitor your service components (1/2)
These are the practices we learned from other teams in Trend
• Visibility
– Operators can get timely system status anytime, anywhere
– Practice:
• CW metrics -> CW dashboard
• CloudWatchLog -> AWS Lambda -> Log management system
• Monitoring
– Operators can set a threshold on any metric as a monitor
– The monitor can then trigger corresponding actions to notify operators
– Practice:
• [App logs -> CW agent -> | custom] CW metrics -> CW Alarm
• Auto-Recovery
– The system can automatically recover any component that fails
– Practice:
• EC2 auto-scaling group + OpsWorks
• CW metrics -> CW Alarm -> AWS Lambda -> AWS SDK -> AWS OpsWorks|AWS EC2
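The last practice (alarm -> Lambda -> SDK) can be sketched as a Lambda handler that reads a CloudWatch alarm notification and works out which instances need recovery. The sketch assumes the alarm is delivered through SNS and that the instance is carried in an `InstanceId` dimension; the actual SDK call is left out, and the function names are hypothetical.

```python
import json


def instances_to_recover(event):
    """Collect instance IDs from CloudWatch alarm messages delivered via SNS."""
    instance_ids = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") != "ALARM":
            continue  # ignore OK and INSUFFICIENT_DATA transitions
        for dim in message.get("Trigger", {}).get("Dimensions", []):
            if dim.get("name") == "InstanceId":
                instance_ids.append(dim["value"])
    return instance_ids


def handler(event, context):
    """Lambda entry point; a real handler would call the AWS SDK here."""
    return {"recover": instances_to_recover(event)}


sample_event = {
    "Records": [
        {
            "Sns": {
                "Message": json.dumps(
                    {
                        "AlarmName": "api-health",  # hypothetical alarm
                        "NewStateValue": "ALARM",
                        "Trigger": {
                            "MetricName": "StatusCheckFailed",
                            "Dimensions": [
                                {"name": "InstanceId", "value": "i-0abc"}
                            ],
                        },
                    }
                )
            }
        }
    ]
}
result = handler(sample_event, None)
```

Separating the parsing from the recovery action keeps the decision logic testable without AWS credentials.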
7. Monitor your service components (2/2)
A high level architecture design
[Diagram] Stages: Input, Store, Process, Output
• Input: App components, Managed Services
• Store: AWS CloudWatch (default metrics; custom metrics: CPU, mem, disk), AWS CloudWatchLog (App logs to CWLog)
• Process: CW metrics, Metric filters, AWS Lambda
• Output: CW Dashboard (Visibility); CW Alarms -> AWS SNS -> Pager (Monitoring); AWS Lambda -> Log management (Visibility); AWS Lambda (Auto-recovery)
DevOps goals V.S. our original way V.S. CI/CD on AWS
Goals | Original way | CI/CD
Faster time to market | Too complicated, easy to miss steps; service team needs to follow up by themselves; some steps need lead time (machine resources, etc.) | One-click delivery; only one role, “developer”; minutes of lead time for resources
Lower failure rate of new releases | Manual steps lead to errors | Full automation
Shorten lead time between fixes | Rolling upgrade; invasive | Replacing/rolling upgrade deployment; non-invasive
Faster mean time to recovery | Hard to deal with machine errors and peak loads | Elasticity provided by the cloud computing platform
https://en.wikipedia.org/wiki/DevOps#Goals
Lessons learned
• Automate everything you can
– CloudFormation + EC2 Auto-scaling group + OpsWorks
– AWS::CloudFormation::CustomResource is also a tool to the rescue
• Consider splitting your service CF template
– Service infra. (RDS, SNS, KMS key, etc)
• You do not update your infra. often
– Service instances (EC2, etc)
• We update our service instances very often
• Do not consider only first-time creation
– How to update your services W/O impacting the SLA
• Monitor ! Monitor !! Monitor !!!
• TEST ! TEST !! TEST !!!
http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cfn-customresource.html
Backups
Different types of Auto-scaling group
Service | Auto Scaling Group | Features | Deploy
OpsWorks | 24/7 | Manual creation/deletion; configure one instance for one AZ | Chef recipe
OpsWorks | time-based | Can specify time slot(s) in hour units, every day or on any day of the week; configure one instance for one AZ | Chef recipe
OpsWorks | load-based | Can specify CPU/MEM/workload avg. thresholds on an OpsWorks layer (Up: when to increase instances; Down: when to decrease instances); no max./min. # of instances setting; configure one instance for one AZ | Chef recipe
EC2 | (EC2 Auto Scaling group) | Can set max./min. # of instances; Multi-AZs support | user-data
Auto Recovery based on Monit
• OpsWorks already uses Monit for Auto Recovery
– Leverage Monit on EC2
– We have experience with it on-premises
11/22/2017 | Confidential | Copyright 2014 TrendMicro Inc.
[Diagram] Auto Scaling group across AZ1 and AZ2, with one API server in each
https://mmonit.com/monit/
• Instance check by CloudWatch
• Process check by Monit
• No process – restart process
• Process health check failed – terminate EC2
• Terminate EC2 -> Auto Scaling group launches new EC2
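The process-level rules above reduce to a small decision function. This is only a sketch of the decision logic; in practice the checks are configured in Monit, and the names used here are hypothetical.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_PROCESS = "restart process"
    TERMINATE_INSTANCE = "terminate EC2"


def recovery_action(process_running, health_check_ok):
    """Pick the recovery step for one process check result."""
    if not process_running:
        # No process: restarting it locally is the cheapest fix.
        return Action.RESTART_PROCESS
    if not health_check_ok:
        # Process is up but unhealthy: terminate the instance and let the
        # Auto Scaling group launch a replacement.
        return Action.TERMINATE_INSTANCE
    return Action.NONE
```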
Little variances among AWS regions
• Impact
– The same automation scripts may not run successfully across regions, and sometimes not even within the same region
• Issues
Service | Regions | Root cause
OpsWorks | Same region, us-west-2 | The accepted S3 URL format for the “Repository URL” property changed from “https://s3.amazonaws.com” to “https://s3-us-west-2.amazonaws.com”
OpsWorks | us-west-2 V.S. us-east-1 | Still the “Repository URL” issue: “https://s3-us-west-2.amazonaws.com” V.S. “https://s3.amazonaws.com”
EC2 | us-west-2 V.S. us-east-1 | The EC2 FQDN format differs: “ip-10-104-33-152.us-west-2.compute.internal” V.S. “ip-10-103-73-248.ec2.internal”
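One way to absorb these differences is to derive region-dependent strings in a single place instead of hard-coding them in scripts. The sketch below reproduces only the formats quoted in the table; the function names are hypothetical, and other regions or newer endpoint styles may differ.

```python
def s3_endpoint(region):
    """S3 endpoint in the form quoted in the table above."""
    if region == "us-east-1":
        return "https://s3.amazonaws.com"
    return f"https://s3-{region}.amazonaws.com"


def internal_fqdn(private_ip, region):
    """Default private DNS name of an EC2 instance."""
    host = "ip-" + private_ip.replace(".", "-")
    if region == "us-east-1":
        # us-east-1 uses a shorter suffix without the region name.
        return f"{host}.ec2.internal"
    return f"{host}.{region}.compute.internal"
```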
OpsWorks V.S. image-based deployment
• OpsWorks deployment
– What we currently use
– It takes too long to launch a service component
• E.g. it takes about 10 mins to launch a Genie node
• Image-based deployment
– Theoretically, it should take very little time to launch a service component
– More responsive to peak workloads
– AMI (Amazon Machine Images) V.S. Docker images ?
How about API Gateway and ECS ?
• API Gateway
– Not a good fit because it is only Internet accessible
– Cold start
– RDB connection overflow
– CORS integration for web UI
• ECS
– Still need to run standby EC2 instances for peak…
– Only takes care of RESTful API services
– Kubernetes is more suitable for our use cases
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Ad

20171122 aws usergrp_coretech-spn-cicd-aws-v01

  • 1. SPN CI/CD journey on AWS SPN Infra., CoreTech Scott Miao 11/22/2017
  • 2. Who am I • Scott Miao • RD, SPN Infra., TrendMicro • OOAD system dev. 10+ years • Hadoop ecosystem 6 years • AWS for BigData 4 years • @linkedIn • @slideshare
  • 3. Agenda • Original services delivery process in SPN • Dev/Ops – DevOps goals V.S. our original way • CI/CD on AWS • An example service CI/CD on AWS • DevOps goals V.S. our original way V.S. CI/CD on AWS • Lessons learned
  • 4. Original services delivery process in SPN
  • 5. Original delivery flow (diagram). Roles: Developers, Operators, Infra. admin. Teams: Service team, Operation team, DCS team. Environments: Stg., PROD. Release portal. Steps: 1. Dev, utests,… 2. Source Repo 3. Back and forth 4. Trigger CI 5. Devices spec. for both Stg/PROD 6.1 Monitoring scripts 6.2 Puppet scripts 6.3 Operation guides 7. Trigger Release build 8. Release artifacts 9. Stg resources ready 10. Deploy service & scripts 11. Deploy and monitor 12.1 Itests 12.2 Stress tests 12.3 UAT 13. Release artifacts 14. PROD resources ready 15. 16. 17. PROD release
  • 8. DevOps is not a new technology or a product. It’s an approach or culture of software development that seeks stability and performance at the same time that it speeds software deliveries to the business. ── Andi Mann, CA Technology ── Cited from: Derek Chen, RD, TrendMicro https://www.slideshare.net/derekhound/devops-in-practice-78905911, p#15
  • 9. Software Delivery stages: Plan, Code, Build, Test, Release, Deploy, Operate, Monitor. Practices shown spanning these stages: Agile Development, Continuous Integration, Continuous Delivery, Continuous Deployment, DevOps. Cited from: Derek Chen, RD, TrendMicro https://www.slideshare.net/derekhound/devops-in-practice-78905911, p#23
  • 10. DevOps goals V.S. our original way • Faster time to market – So complicated that steps get missed – Service team needs to follow up themselves – Some steps need lead time (machine resources, etc.) • Lower failure rate of new releases – Manual steps lead to errors • Shorten lead time between fixes – Rolling upgrade – Invasive • Faster mean time to recovery – Hard to deal with machine errors and peak loads https://en.wikipedia.org/wiki/DevOps#Goals
  • 11. “Very often, automation supports this objective” https://en.wikipedia.org/wiki/DevOps#Goals Quoted from Wikipedia for DevOps goals
  • 12. CI/CD on AWS TWO ACHIEVE SAME DEVOPS GOALS DEVOPS FOCUSES ON ORGANIZATIONAL CHANGES CI/CD FOCUSES ON TECHNICAL IMPLEMENTATIONS
  • 13. Review for CI and CD • Continuous Integration – is the practice of merging all developer working copies to a shared mainline (trunk) several times a day • Continuous Delivery – produce software in short cycles, ensuring that the software can be reliably released at any time • Continuous Deployment – means that every change is automatically deployed to production https://en.wikipedia.org/wiki/Continuous_integration https://en.wikipedia.org/wiki/Continuous_delivery
  • 14. Characteristics of Cloud Computing • On-demand self-service – A consumer can unilaterally provision computing capabilities • Broad network access – Capabilities are available over the network and accessed through standard mechanisms • Resource pooling – The provider's computing resources are pooled to serve multiple consumers using a multi-tenant model • Rapid elasticity – Capabilities can be elastically provisioned and released • Measured service – Cloud systems automatically control and optimize resource use http://www.inforisktoday.com/5-essential-characteristics-cloud-computing-a-4189 https://en.wikipedia.org/wiki/Infrastructure_as_Code
  • 16. AWS managed services SPN used • AWS CloudFormation – Gives developers and systems administrators an easy way to create and manage a collection of related AWS resources – We use it to provision our service components • Such as Load balancer (ALB), machines (EC2) • AWS OpsWorks – A configuration management service that uses Chef, an automation platform that treats server configurations as code – We use it to deploy, configure and start up our service components https://aws.amazon.com/cloudformation/ https://aws.amazon.com/opsworks/
  • 17. AWS CloudFormation + OpsWorks (diagram). Labels: user; main, IAM, ELB, OpsWorks; AWS CloudFormation; main, IAM, ALB, OpsWorks; AWS OpsWorks; artifacts; AWS S3; AWS VPC; Chef recipes. Steps: 1. Put CF templates 2. Put artifacts 3. Put Chef recipes 4. Create CF W/ params, VPC ID, etc 5. Templates input 6. Create CF stacks 7. Provision AWS resources 8. Create OpsWorks 9. Artifacts/recipes input 10. Deploy/Config/start up service. Lanes: User, CF, Ops. Ready to serve
  • 18. CoreTech DCS managed services • Enterprise github – Just like the github we use on the Internet • CloudCI – Enterprise Circle CI – A Docker container based CI solution – Seamlessly integrated with github • JFrog Artifactory – A CoreTech-wide shared artifacts repo.
  • 19. An example service CI/CD on AWS ANALYTIC ENGINE
  • 20. Analytic Engine is an API service for… Common Big Data computation service on Cloud (AWS) https://www.slideshare.net/takeshi_miao/analytic-engine-a-common-big-data-computation-service-on-the-aws
  • 21. AE High Level Architecture Design (diagram). Labels: IDC; Oregon (us-west-2); AZa, AZb, AZc (Multi-AZs); AE API servers; workers; Eureka; Auto Scaling groups; Private ALB; RDS; EMR; Cross-account S3 buckets; Cloud Storage (isValidUser, CS output); Amazon SNS; Splunk; services; VPN; peering; HTTPS; HTTPS/HTTP Basic
  • 22. This is what we really care about (same architecture diagram as slide 21). Labels: IDC; Oregon (us-west-2); AZa, AZb, AZc (Multi-AZs); AE API servers; workers; Eureka; Auto Scaling groups; Private ALB; RDS; EMR; Cross-account S3 buckets; Cloud Storage (isValidUser, CS output); Amazon SNS; Splunk; services; VPN; peering; HTTPS; HTTPS/HTTP Basic
  • 23. Which components are in CI/CD scope • In scope – API, Worker, Eureka, Genie W/ auto-scaling group • EC2, deploy, configure and start up component services – AWS Elastic Application Load Balancer – AWS Simple Notification Service • NOT in scope – VPC/subnets/VPC peerings • We use fixed VPC and subnets for both VPN connections and VPC peerings – RDS MySQL DB • Already pre-created – EMR clusters • Created by user API calls via AWS Java SDK
  • 24. CI/CD Usecases 1. Developer edits/pushes code to github 2. Developer deploys AE to Dev env. for tests 3. Developer terminates AE in Dev env. after tests 4. Developer deploys AE to Stg env. for integrated tests/UAT 5. Developer deploys AE to PROD env. 6. Developer patches hotfixes and deploys to PROD 7. Monitor your service components
  • 25. 1. Developer edits/pushes code to github (diagram; feature branch workflow, https://www.atlassian.com/git/tutorials/comparing-workflows). Repo: spn/ae-saas (branches: master, AE-100; version 1.19.0). CI project: spn/ae-saas. S3: dev-us-east-1 (CF templates, ae-1.19.AE_100.jars, Chef recipes). Steps: 1. Push AE-100 branch 2. Trigger CI 3. build 4. utests 5. package 6. cp artifacts to S3 7. cp to S3 8. publish artifacts to mvn repo. 9. Publish artifacts to mvn repo. Every commit will trigger this build
  • 26. 2. Developer deploys AE to Dev env. for tests (diagram; feature branch workflow). Repo: spn/ae-saas (branches: master, AE-100). CI project: spn/ae-saas (env. variables in CI). S3: dev-us-east-1 (CF templates, ae-1.19.AE_100.jars, Chef recipes). Dev VPC, AWS CF. Steps: 1. Git tag: c-1.19.AE_100-dev-us-east-1-myAE 2. Push tag 3. Trigger CI 4. Create CF 5. CF creating for stack: ae-dev-myAE 5.1 Templates input 6. Provision resources 7. Deploy/config/startup service. Ready for tests
  • 27. 3. Developer terminates AE in Dev env. after tests (diagram; feature branch workflow). Repo: spn/ae-saas (branch: master). CI project: spn/ae-saas. Dev VPC, AWS CF. Steps: 1. Git tag: d-1.19.AE_100-dev-us-east-1-myAE 2. Push tag 3. Trigger CI 4. delete CF 5. CF deleting for stack: ae-dev-myAE 6. Terminating resources
  • 28. 4. Developer deploys AE to Stg env. for integrated tests/UAT (Much like UC#2) (diagram; feature branch workflow). Repo: spn/ae-saas (branches: master, AE-100; version 1.19.563). CI project: spn/ae-saas (env. variables in CI). S3: dev-us-east-1 (CF templates, ae-1.19.563.jars, Chef recipes); S3: stg-us-east-1. Dev VPC, AWS CF. Steps: 1. Merge feature branch: 1.19.<buildNum> 2. Git tag: c-1.19.563-stg-us-east-1-myAE 3. Push tag 4. Trigger CI 5. cp artifacts to stg S3 6. cp artifacts from dev to stg 6.1 copying 7. Create CF 8. Provision resources for stack: ae-stg-myAE 8.1 Deploy/config/startup service 9. Run itests (run itests on service). Ready for tests
  • 29. 5. Developer deploys AE to PROD env. (Much like UC#4) Git tag: c-1.19.563-prod-us-west-2-myAE
  • 30. 6. Developer patches hotfixes and deploys to PROD (1/2) (diagram; feature branch workflow). Repo: spn/ae-saas (branches: master, AE-105; version 1.19.570). CI project: spn/ae-saas (env. variables in CI). S3: stg-us-east-1 (CF templates, ae-1.19.563.jars, Chef recipes); S3: prod-us-west-2. Dev VPC, AWS CF. Steps: 1. Git tag: u-1.19.570-prod-us-west-2-myAE 2. Push tag 3. Trigger CI 4. cp artifacts to prod S3 5. cp artifacts from stg to prod 5.1 copying 6. Update CF 7. Update CF stack: ae-prod-myAE 8.1 Re-Deploy/config/startup service. Ready to serve
  • 31. 6. Developer patches hotfixes and deploys to PROD (2/2) • Updating W/O SLA impact – ALB W/ AutoScalingReplacingUpdate configured in the UpdatePolicy attribute • Better and more flexible auto-scaling – EC2 Auto-scaling group + OpsWorks • Cross-region deployment as early as possible – Minor configuration diffs • A successful deployment to us-east-1 does not guarantee success in other regions… – AWS SDK default region value is us-east-1 • You may forget to set it in your code… http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-attribute-updatepolicy.html https://aws.amazon.com/tw/blogs/devops/auto-scaling-aws-opsworks-instances/ (Auto-healing really sucks)
  • 32. 7. Monitor your service components (1/2) These are the practices we learned from other teams in Trend • Visibility – Operator can get timely system status anytime, anywhere – Practice: • CW metrics -> CW dashboard • CloudWatchLog -> AWS Lambda -> Log management system • Monitoring – Operator can set a threshold on any metric as a monitor – The monitor can then trigger corresponding actions to notify the operator – Practice: • [App logs -> CW agent -> | custom] CW metrics -> CW Alarm • Auto-Recovery – System can automatically recover any component that fails – Practice: • EC2 auto-scaling group + OpsWorks • CW metrics -> CW Alarm -> AWS Lambda -> AWS SDK -> AWS OpsWorks|AWS EC2
  • 33. 7. Monitor your service components (2/2) A high level architecture design (diagram). Stages: Input, Store, Process, Output. Labels: App components; Managed Services; AWS CloudWatch; Default metrics; Custom metrics (CPU, mem, disk); CW metrics; CW Dashboard; CW Alarms; AWS SNS; Pager; AWS CloudWatchLog; App logs to CWLog; Metric filters; AWS Lambda; Log management. Concerns marked: Visibility, Monitoring, Auto-recovery
  • 34. DevOps goals V.S. our original way V.S. CI/CD on AWS (table: Goals | Original way | CI/CD) https://en.wikipedia.org/wiki/DevOps#Goals
      – Faster time to market | Too complicated to miss steps; Service team needs to follow up themselves; Lead time needed steps (Machine resources, etc) | One click delivery; Only one role “developer”; Minutes of lead time for resources
      – Lower failure rate of new releases | Manual steps lead to errors | Fully automation
      – Shorten lead time between fixes | Rolling upgrade; Invasive | Replacing/Rolling upgrade deployment; Non-invasive
      – Faster mean time to recovery | Hard to deal with machine errors and peak | Elasticities brought from Cloud Computing platform
  • 35. Lessons learned • Automate everything you can – CloudFormation + EC2 Auto-scaling group + OpsWorks – AWS::CloudFormation::CustomResource can also come to the rescue • Consider splitting your service CF template – Service infra. (RDS, SNS, KMS key, etc) • You do not update your infra. often – Service instances (EC2, etc) • We update our service instances very often • Consider more than first-time creation – How to update your services W/O impacting SLA • Monitor ! Monitor !! Monitor !!! • TEST ! TEST !! TEST !!! http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-cfn-customresource.html
  • 39. Different types of Auto-scaling group (table: Service | Auto Scaling Group | Features | Deploy)
      – OpsWorks | 24/7 | manual creation/deletion; configure one instance for one AZ | chef recipe
      – OpsWorks | time-based | can specify time slot(s) based on hour unit, on everyday or any day in week; configure one instance for one AZ | chef recipe
      – OpsWorks | load-based | can specify CPU/MEM/workload avg. based on an OPS layer; UP: when to increase instances; Down: when to decrease instances; No max./min. # of instances setting; configure one instance for one AZ | chef recipe
      – EC2 | | can set max./min. for # of instance; Multi-AZs support | user-data
  • 40. Auto Recovery based on Monit • OpsWorks already uses Monit for Auto Recovery – Leverage Monit on EC2 – We have practice with it on-premise https://mmonit.com/monit/ Diagram: API servers in AZ1 and AZ2 inside an Auto Scaling group • Instance check by CloudWatch • Process check by Monit • No process – restart process • Process health check failed – terminate EC2 • Terminate EC2 -> Auto Scaling group launches new EC2
  • 41. Little variances among AWS regions • Impact – The same automation scripts cannot run successfully across regions, and sometimes not even within the same region • Issues (table: Service | Regions | Root cause)
      – OpsWorks | Same region, us-west-2 | The accepted S3 URL spec. for the “Repository URL” property changed from “https://s3.amazonaws.com” to “https://s3-us-west-2.amazonaws.com”
      – OpsWorks | us-west-2 V.S. us-east-1 | Still the “Repository URL” issue: “https://s3-us-west-2.amazonaws.com” V.S. “https://s3.amazonaws.com”
      – EC2 | us-west-2 V.S. us-east-1 | EC2 FQDN spec. is different: “ip-10-104-33-152.us-west-2.compute.internal” V.S. “ip-10-103-73-248.ec2.internal”
  • 42. OpsWorks V.S. image-based deployment • OpsWorks deployment – We are currently using it – It takes too long to launch a service component • E.g. it takes about 10 mins to launch a Genie node • Image-based deployment – Theoretically, it should take a very short time to launch a service component – More responsive for peak workloads – AMI (Amazon Machine Images) V.S. Docker images ?
  • 43. How about API Gateway and ECS ? • API Gateway – Not a good fit because it is only accessible from the Internet – Cold start – RDB connection overflow – CORS integration for web UI • ECS – Still need to run standby EC2 instances for peak… – Only takes care of RESTful API services – Kubernetes is more suitable for our usecases
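
The use cases on slides 26 to 30 all drive CI through a pushed Git tag such as `c-1.19.AE_100-dev-us-east-1-myAE`, and the resulting CloudFormation stack is named like `ae-dev-myAE`. A minimal sketch of how a CI job might decode such a tag is shown below. The slides do not show the actual implementation, so everything here is an assumption: the reading of the prefixes `c`, `d` and `u` as create, delete and update (inferred from the "Create CF", "delete CF" and "Update CF" steps), the regular expression, and the names `DeployRequest` and `parse_deploy_tag`, which are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed meaning of the one-letter tag prefix, inferred from the slide steps.
ACTIONS = {"c": "create", "d": "delete", "u": "update"}

# Assumed layout: <action>-<version>-<env>-<region>-<name>
TAG_PATTERN = re.compile(
    r"^(?P<action>[cdu])-"
    r"(?P<version>[^-]+)-"
    r"(?P<env>dev|stg|prod)-"
    r"(?P<region>[a-z]{2}-[a-z]+-\d+)-"
    r"(?P<name>.+)$"
)


class DeployRequest(NamedTuple):
    """Hypothetical container for the fields encoded in a deploy tag."""
    action: str
    version: str
    env: str
    region: str
    name: str

    @property
    def stack_name(self) -> str:
        # Stack names on the slides look like "ae-dev-myAE".
        return f"ae-{self.env}-{self.name}"


def parse_deploy_tag(tag: str) -> DeployRequest:
    """Split a deploy tag into its parts, or raise ValueError if it does not match."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a deploy tag: {tag!r}")
    fields = match.groupdict()
    fields["action"] = ACTIONS[fields["action"]]
    return DeployRequest(**fields)


if __name__ == "__main__":
    request = parse_deploy_tag("u-1.19.570-prod-us-west-2-myAE")
    print(request.action, request.region, request.stack_name)
```

Encoding the action, version, environment and region in the tag keeps the CI job stateless: the tag alone is enough to decide which artifacts to copy and which stack to touch.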

Editor's Notes

  • #5: What’s our goal
  • #7: What’s our goal
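  • #9: DevOps is not really a new technology or a new product. It is a software development culture that seeks a stable, high-quality way to deliver software quickly into the customer's hands. I think this sentence is very precise and on point; it clearly defines the goal DevOps pursues.
  • #10: Now let's talk about the role DevOps plays in software delivery. We just talked about Agile Development, whose core ideas revolve around the three stages Plan, Code and Build. If we can automate Test, that is, when RD finishes development and checks in the source code, the Integration Server runs Unit Tests and then automatically deploys to the Test Environment for more Integration Tests, that is the Continuous Integration concept we often hear about. If we can automate Release, automatically deploying the code that passed the previous step to the Stage Environment for automated Acceptance Tests or Performance Tests, that is the Continuous Delivery concept we often hear about. If we can automate Deploy, pushing the code that passed the previous step directly to our Production Environment, that is the Continuous Deployment concept we often hear about. You will notice something interesting: these terms that keep popping up are all extensions of the previous concept. When you talk about DevOps, it means stretching automation even further, so that Operate and Monitor are automated as much as possible too.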
  • #13: What’s our goal
  • #20: What’s our goal
  • #39: What’s our goal
  • #44: Cross-Origin Resource Sharing
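
Slide 41 lists region differences that broke otherwise identical automation: the S3 URL accepted by OpsWorks and the EC2 internal FQDN format. One way to keep such differences out of the scripts is to compute them from the region in a single place. The sketch below is only an illustration. The two S3 URLs are copied from the slide, the FQDN rule is generalized from the two examples on the slide, and the function names are hypothetical.

```python
# S3 endpoints exactly as quoted on slide 41; other regions are left out on purpose.
S3_ENDPOINTS = {
    "us-east-1": "https://s3.amazonaws.com",
    "us-west-2": "https://s3-us-west-2.amazonaws.com",
}


def s3_endpoint(region: str) -> str:
    """Return the S3 endpoint for a region listed on the slide."""
    try:
        return S3_ENDPOINTS[region]
    except KeyError:
        raise ValueError(f"no endpoint recorded for region {region!r}") from None


def internal_fqdn(private_ip: str, region: str) -> str:
    """Build an EC2 internal FQDN following the two patterns shown on the slide.

    Assumption: us-east-1 uses "<host>.ec2.internal" and every other region
    uses "<host>.<region>.compute.internal".
    """
    host = "ip-" + private_ip.replace(".", "-")
    if region == "us-east-1":
        return f"{host}.ec2.internal"
    return f"{host}.{region}.compute.internal"


if __name__ == "__main__":
    print(s3_endpoint("us-west-2"))
    print(internal_fqdn("10.104.33.152", "us-west-2"))
    print(internal_fqdn("10.103.73.248", "us-east-1"))
```

Raising an error for an unknown region, instead of silently falling back to us-east-1, matches the warning on slide 31 about forgetting to set the region.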