Netflix and Open Source

            March 2013
          Adrian Cockcroft
    @adrianco #netflixcloud @NetflixOSS
 http://www.linkedin.com/in/adriancockcroft
Cloud Native

NetflixOSS – Cloud Native On-Ramp

Netflix Open Source Cloud Prize
Netflix Member Web Site Home Page
    Personalization Driven – How Does It Work?
How Netflix Streaming Works
Consumer Electronics | AWS Cloud Services | CDN Edge Locations

Customer Device (PC, PS3, TV…)
  • Web Site or Discovery API – User Data, Personalization
  • Streaming API – DRM, QoS Logging
  • OpenConnect CDN Boxes – CDN Management and Steering, Content Encoding
Content Delivery Service
Open Source Hardware Design + FreeBSD, bird, nginx
November 2012 Traffic
Real Web Server Dependencies Flow
         (Netflix Home page business transaction as seen by AppDynamics)

Each icon is three to a few hundred instances across three AWS zones
Icon types: Cassandra, memcached, Web service, S3 bucket
Start Here
Three Personalization movie group choosers (for US, Canada and Latam)
Cloud Native Architecture

     Clients                      Things




 Autoscaled Micro      JVM         JVM         JVM
     Services


 Autoscaled Micro      JVM         JVM       Memcached
     Services


Distributed Quorum   Cassandra   Cassandra   Cassandra
 NoSQL Datastores

                     Zone A       Zone B       Zone C
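
The "Distributed Quorum" row above shows one Cassandra replica in each of three zones. The following is a minimal sketch of the majority arithmetic behind that layout, assuming a replication factor of 3 with one replica per zone (as the diagram suggests). The function names are illustrative and not part of any Netflix library.

```python
def quorum_size(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def survives_zone_loss(replicas_per_zone: dict) -> bool:
    """True if a quorum of replicas remains after losing any single zone."""
    total = sum(replicas_per_zone.values())
    needed = quorum_size(total)
    return all(total - lost >= needed for lost in replicas_per_zone.values())


# Assumed layout: one replica in each zone, as drawn on the slide.
zones = {"Zone A": 1, "Zone B": 1, "Zone C": 1}
print(quorum_size(3))             # 2
print(survives_zone_loss(zones))  # True
```

With two zones holding the same three replicas (2 + 1), losing the larger zone would leave a single replica, which is below quorum. That is one reason to spread replicas evenly over three zones.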
Non-Native Cloud Architecture

Agile Mobile Mammals      iOS/Android
Cloudy Buffer             App Servers
Datacenter Dinosaurs      MySQL, Legacy Apps
New Anti-Fragile Patterns

          Micro-services
          Chaos engines
Highly available systems composed
   from ephemeral components
Stateless Micro-Service Architecture

Linux Base AMI (CentOS or Ubuntu)
  • Optional Apache frontend, memcached, non-java apps
  • Monitoring – AppDynamics machineagent, Epic/Atlas
  • Log rotation to S3
  • Java (JDK 6 or 7)
    – AppDynamics appagent monitoring
    – GC and thread dump logging
    – Tomcat
        Application war file, base servlet, platform, client interface jars, Astyanax
        Healthcheck, status servlets, JMX interface, Servo autoscale
Cassandra Instance Architecture

Linux Base AMI (CentOS or Ubuntu)
  • Tomcat and Priam on JDK – Healthcheck, Status
  • Monitoring – AppDynamics machineagent, Epic/Atlas
  • Java (JDK 7)
    – AppDynamics appagent monitoring
    – GC and thread dump logging
    – Cassandra Server
  • Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk holding Commit log and SSTables
Configuration State Management

      Datacenter CMDB’s woeful
      Cloud native is the solution
         Dependably complete
Edda – Configuration History
 http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html


  • AWS – Instances, ASGs, etc.
  • Eureka – Services metadata
  • AppDynamics – Request flow
  • Edda
  • Monkeys
Edda Query Examples
Find any instances that have ever had a specific public IP address
$ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
 ["i-0123456789","i-012345678a","i-012345678b"]

Show the most recent change to a security group
$ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
--- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
+++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
@@ -1,33 +1,33 @@
 {
…
      "ipRanges" : [
        "10.10.1.1/32",
        "10.10.1.2/32",
+        "10.10.1.3/32",
-       "10.10.1.4/32"
…
 }
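
Both queries above use the same URL shape: a collection path, an optional resource id, then `;`-separated arguments that are either `key=value` pairs or bare flags. The following is a small sketch of a helper that assembles such URLs. The helper name `edda_url` is hypothetical, the host `edda` is the slide's placeholder, and only the arguments shown on the slide are used; no other Edda parameters are assumed.

```python
def edda_url(base, collection, resource_id=None, **matrix):
    """Build an Edda-style URL with ';'-separated matrix arguments.

    A value of True emits a bare flag (';_diff'); anything else
    emits ';key=value'. Argument order follows the call order.
    """
    url = f"{base}/api/v2/{collection}"
    if resource_id is not None:
        url += f"/{resource_id}"
    for key, value in matrix.items():
        url += f";{key}" if value is True else f";{key}={value}"
    return url


# The two queries from the slide, rebuilt with the helper.
by_ip = edda_url("http://edda", "view/instances",
                 publicIpAddress="1.2.3.4", _since=0)
sg_diff = edda_url("http://edda", "aws/securityGroups", "sg-0123456789",
                   _diff=True, _all=True, _limit=2)
print(by_ip)
print(sg_diff)
```

The helper only builds strings; passing the result to `curl` or an HTTP client is left to the caller.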
Cloud Native

Master copies of data are cloud resident
 Everything is dynamically provisioned
      All services are ephemeral
Scalability Demands
Asgard
http://techblog.netflix.com/2012/06/asgard-web-based-cloud-management-and.html
Cloud Deployment Scalability
       New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s
Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB)

                        Min. 1st Qu. Median Mean 3rd Qu. Max.
                         41.0 104.2 149.0 171.8 215.8 562.0
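
The scale-up window quoted above can be checked with a few lines of arithmetic. This sketch uses only the two timestamps and the instance count from the slide. The per-minute figure is an average over the window and says nothing about how launches were actually distributed in time.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two same-day HH:MM:SS timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


seconds = elapsed_seconds("21:38:52", "21:46:32")
minutes, remainder = divmod(seconds, 60)
rate_per_minute = 500 / (seconds / 60)

print(f"{minutes}m{remainder}s")                       # 7m40s
print(f"{rate_per_minute:.1f} instances per minute")   # 65.2 instances per minute
```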
Ephemeral Instances
  • Largest services are autoscaled
  • Average lifetime of an instance is 36 hours
  Chart labels: Push, Autoscale Up, Autoscale Down
Leveraging Public Scale


Public                   Grey Area                    Private
          1,000 Instances          100,000 Instances
Startups                  Netflix                      Google
How big is Public?
    AWS Maximum Possible Instance Count 3.7 Million
      Growth >10x in Three Years, >2x Per Annum




AWS upper bound estimate based on the number of public IP Addresses
       Every provisioned instance gets a public IP by default
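
The two growth figures above are consistent with each other: tenfold growth over three years implies a compound annual factor slightly above 2. A short check follows; the function name is illustrative.

```python
def annual_growth_factor(total_factor: float, years: float) -> float:
    """Compound per-year factor that yields total_factor after `years`."""
    return total_factor ** (1.0 / years)


factor = annual_growth_factor(10, 3)
print(f"{factor:.2f}x per year")  # 2.15x per year
```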
Availability

        Is it running yet?
How many places is it running in?
 How far apart are those places?
Antifragile API Patterns
Functional Reactive with Circuit Breakers and Bulkheads
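
To make the circuit breaker pattern concrete, here is a minimal sketch. It is not Hystrix and does not mirror its API; the class name, parameters, and thresholds are assumptions chosen for illustration, and bulkheads (isolating resources per dependency) are not covered. After a run of failures the breaker "opens" and rejects calls immediately, optionally returning a fallback, until a timeout elapses and one trial call is allowed through.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock          # injectable so tests can control time
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of waiting on a sick dependency.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("circuit is open")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each remote dependency in its own breaker, for example `breaker.call(fetch_recommendations, user_id, fallback=lambda: [])`, where `fetch_recommendations` is a hypothetical remote call.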
Outages
• Running very fast with scissors
  – Mostly self inflicted – bugs, mistakes
  – Some caused by AWS bugs and mistakes


• Next step is multi-region
  – Investigating and building in stages during 2013
  – Could have prevented some of our 2012 outages
Managing Multi-Region Availability

DNS providers: AWS Route53, UltraDNS, DynECT DNS

Two regions, each with:
  • Regional Load Balancers
  • Zone A, Zone B, Zone C
  • Cassandra Replicas in every zone

What we need is a portable way to manage multiple DNS providers…
Denominator
                     Software Defined DNS for Java

Use Cases: Edda, Multi-Region Failover
Common Model: Denominator
DNS Vendor Plug-in: AWS Route53, DynECT, UltraDNS, Etc…
API Models (varied and mostly broken):
  • AWS Route53 – IAM Key Auth, REST
  • DynECT – User/pwd, REST
  • UltraDNS – User/pwd, SOAP

    Currently being built by Adrian Cole (the jClouds guy, he works for Netflix now…)
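
The layering above (use cases on top of a common model, with one plug-in per DNS vendor) can be sketched as a small interface. Denominator itself is a Java library and this Python sketch does not reproduce its API: every class, method, and record name below is hypothetical, and the providers are in-memory stand-ins rather than real Route53, DynECT, or UltraDNS clients.

```python
from abc import ABC, abstractmethod


class DnsProvider(ABC):
    """Common model: the operations every vendor plug-in must offer."""

    @abstractmethod
    def upsert_cname(self, zone: str, name: str, target: str) -> None: ...

    @abstractmethod
    def lookup(self, zone: str, name: str): ...


class InMemoryProvider(DnsProvider):
    """Stand-in plug-in; a real one would call the vendor's REST or SOAP API."""

    def __init__(self, label: str):
        self.label = label
        self.records = {}

    def upsert_cname(self, zone, name, target):
        self.records[(zone, name)] = target

    def lookup(self, zone, name):
        return self.records.get((zone, name))


def fail_over(providers, zone, name, target):
    """Use case: repoint one name at a new region on every provider."""
    for provider in providers:
        provider.upsert_cname(zone, name, target)


providers = [InMemoryProvider("vendor-a"), InMemoryProvider("vendor-b")]
fail_over(providers, "example.com.", "www", "us-west.example.com.")
print([p.lookup("example.com.", "www") for p in providers])
```

The point of the design is that `fail_over` is written once against the common model, so vendor differences in authentication and transport stay inside the plug-ins.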
A Cloud Native Open Source Platform
Inspiration
Three Questions


 Why is Netflix doing this?

How does it all fit together?

   What is coming next?
Beware of Geeks Bearing Gifts: Strategies for an
         Increasingly Open Economy
      Simon Wardley - Researcher at the Leading Edge Forum
How did Netflix get ahead?
Netflix Business + Developer Org   Traditional IT Operations
•   Doing it right now             • Taking their time
•   SaaS Applications              • Pilot private cloud projects
•   PaaS for agility               • Beta quality installations
•   Public IaaS for AWS features   • Small scale
•   Big data in the cloud          • Integrating several vendors
•   Integrating many APIs          • Paying big $ for software
•   FOSS from github               • Paying big $ for consulting
•   Renting hardware for 1hr       • Buying hardware for 3yrs
•   Coding in Java/Groovy/Scala    • Hacking at scripts
Netflix Platform Evolution


  2009-2010                2011-2012                      2013-2014


Bleeding Edge              Common                         Shared
  Innovation                Pattern                       Pattern

          Netflix ended up several years ahead of the
          industry, but it’s not a sustainable position
Making it easy to follow
Exploring the wild west each time   vs. laying down a shared route
Goals
  • Establish our solutions as Best Practices / Standards
  • Hire, Retain and Engage Top Engineers
  • Build up Netflix Technology Brand
  • Benefit from a shared ecosystem
How does it all fit together?
NetflixOSS Continuous Build and Deployment

  Github NetflixOSS Source            Maven Central                AWS Base AMI

  Cloudbees Jenkins Aminator Bakery   Dynaslave AWS Build Slaves   AWS Baked AMIs

  Odin Orchestration API              Asgard (+ Frigga) Console    AWS Account
NetflixOSS Services Scope


AWS Account
  • Asgard Console
  • Archaius Config Service
  • Cross region Priam C*
  • Explorers Dashboards
  • Atlas Monitoring
  • Genie Hadoop Services

  Multiple AWS Regions
    • Eureka Registry
    • Exhibitor ZK
    • Edda History
    • Simian Army

    3 AWS Zones
      • Application Clusters, Autoscale Groups, Instances
      • Priam Cassandra – Persistent Storage
      • Evcache Memcached – Ephemeral Storage
NetflixOSS Instance Libraries

Initialization
  • Baked AMI – Tomcat, Apache, your code
  • Governator – Guice based dependency injection
  • Archaius – dynamic configuration properties client
  • Eureka – service registration client

Service Requests
  • Karyon – Base Server for inbound requests
  • RxJava – Reactive pattern
  • Hystrix/Turbine – dependencies and real-time status
  • Ribbon – REST Client for outbound calls

Data Access
  • Astyanax – Cassandra client and pattern library
  • Evcache – Zone aware Memcached client
  • Curator – Zookeeper patterns
  • Denominator – DNS routing abstraction

Logging
  • Blitz4j – non-blocking logging
  • Servo – metrics export for autoscaling
  • Atlas – high volume instrumentation
NetflixOSS Testing and Automation

Test Tools
  • CassJmeter – Load testing for Cassandra
  • Circus Monkey – Test account reservation rebalancing

Maintenance
  • Janitor Monkey – Cleans up unused resources
  • Efficiency Monkey
  • Doctor Monkey
  • Howler Monkey – Complains about expiring certs

Availability
  • Chaos Monkey – Kills Instances
  • Chaos Gorilla – Kills Availability Zones
  • Chaos Kong – Kills Regions
  • Latency Monkey – Latency and error injection

Security
  • Security Monkey
  • Conformity Monkey
Example Application – RSS Reader
What’s Coming Next?

More Features
  • Better portability
  • Higher availability
  • Easier to deploy
  • Contributions from end users
  • Contributions from vendors

More Use Cases
Vendor Driven Portability
     Interest in using NetflixOSS for Enterprise Private Clouds

“It’s done when it runs Asgard”
Functionally complete
Demonstrated March
Release 3.3 in 2Q13

Some vendor interest
Needs AWS compatible Autoscaler

Some vendor interest
Many missing features
Bait and switch AWS API strategy
AWS 2009 vs. ???




 Eucalyptus 3.3
Netflix Cloud Prize

Boosting the @NetflixOSS Ecosystem
In 2012 Netflix Engineering won this..
We’d like to give out prizes too
            But what for?
     Contributions to NetflixOSS!
     Shared under Apache license
          Located on github
How long do you have?

   Entries open March 13th
 Entries close September 15th
         Six months…
Who can win?

   Almost anyone, anywhere…
Except current or former Netflix or
         AWS employees
Who decides who wins?

   Nominating Committee
      Panel of Judges
Judges

  • Aino Corry – Program Chair for Qcon/GOTO
  • Simon Wardley – Strategist
  • Martin Fowler – Chief Scientist Thoughtworks
  • Werner Vogels – CTO Amazon
  • Joe Weinman – SVP Telx, Author “Cloudonomics”
  • Yury Izrailevsky – VP Cloud Netflix
What are Judges Looking For?
  • Eligible, Apache 2.0 licensed
  • Original and useful contribution to NetflixOSS
  • Code that successfully builds and passes a test suite
  • A large number of watchers, stars and forks on github
  • NetflixOSS project pull requests
  • Good code quality and structure
  • Documentation on how to build and run it
  • Evidence that code is in use by other projects, or is running in production
What do you win?
One winner in each of the 10 categories
  Ticket and expenses to attend AWS
      Re:Invent 2013 in Las Vegas
               A Trophy
How do you enter?
    Get a (free) github account
Fork github.com/netflix/cloud-prize
    Send us your email address
   Describe and build your entry

           Twitter #cloudprize
Timeline
  • Github – Registration Opens Today
  • Github – Apache Licensed Contributions
  • Github – Close Entries September 15
  • AWS Re:Invent – Award Ceremony Dinner, November

Participants and stages: Entrants, Netflix Engineering, Nominations, Judges, Categories, Winners

Checks: Conforms to Rules, Working Code, Community Traction

Prizes
  • Ten Prize Categories
  • $10K cash
  • $5K AWS
  • AWS Re:Invent Tickets
  • Trophy
Functionality and scale now, portability coming

   Moving from parts to a platform in 2013

       Netflix is fostering an ecosystem

      Rapid Evolution - Low MTBIAMSH
      (Mean Time Between Idea And Making Stuff Happen)
Takeaway

Netflix is making it easy for everyone to adopt Cloud Native patterns.

     Open Source is not just the default, it’s a strategic weapon.

                        http://netflix.github.com
                       http://techblog.netflix.com
                       http://slideshare.net/Netflix

                http://www.linkedin.com/in/adriancockcroft

                   @adrianco #netflixcloud @NetflixOSS

Netflix and Open Source

  • 1. Netflix and Open Source March 2013 Adrian Cockcroft @adrianco #netflixcloud @NetflixOSS http://www.linkedin.com/in/adriancockcroft
  • 2. Cloud Native NetflixOSS – Cloud Native On-Ramp Netflix Open Source Cloud Prize
  • 3. Netflix Member Web Site Home Page Personalization Driven – How Does It Work?
  • 4. How Netflix Streaming Works Consumer Electronics User Data Web Site or AWS Cloud Discovery API Services Personalization CDN Edge Locations DRM Customer Device Streaming API (PC, PS3, TV…) QoS Logging CDN Management and Steering OpenConnect CDN Boxes Content Encoding
  • 5. Content Delivery Service Open Source Hardware Design + FreeBSD, bird, nginx
  • 7. Real Web Server Dependencies Flow (Netflix Home page business transaction as seen by AppDynamics) Each icon is three to a few hundred instances across three Cassandra AWS zones memcached Web service Start Here S3 bucket Three Personalization movie group choosers (for US, Canada and Latam)
  • 8. Cloud Native Architecture Clients Things Autoscaled Micro JVM JVM JVM Services Autoscaled Micro JVM JVM Memcached Services Distributed Quorum Cassandra Cassandra Cassandra NoSQL Datastores Zone A Zone B Zone C
  • 9. Non-Native Cloud Architecture Agile Mobile iOS/Android Mammals Cloudy App Servers Buffer Datacenter MySQL Legacy Apps Dinosaurs
  • 10. New Anti-Fragile Patterns Micro-services Chaos engines Highly available systems composed from ephemeral components
  • 11. Stateless Micro-Service Architecture Linux Base AMI (CentOS or Ubuntu) Optional Apache frontend, Java (JDK 6 or 7) memcached, non-java apps AppDynamics Monitoring appagent monitoring Tomcat Log rotation Application war file, base Healthcheck, status to S3 GC and thread servlet, platform, client servlets, JMX interface, AppDynamics dump logging interface jars, Astyanax Servo autoscale machineagent Epic/Atlas
  • 12. Cassandra Instance Architecture Linux Base AMI (CentOS or Ubuntu) Tomcat and Priam on JDK Java (JDK 7) Healthcheck, Status AppDynamics appagent monitoring Cassandra Server Monitoring AppDynamics Local Ephemeral Disk Space – 2TB of SSD or 1.6TB disk GC and thread holding Commit log and SSTables machineagent dump logging Epic/Atlas
  • 13. Configuration State Management Datacenter CMDB’s woeful Cloud native is the solution Dependably complete
  • 14. Edda – Configuration History http://techblog.netflix.com/2012/11/edda-learn-stories-of-your-cloud.html Sources: Eureka (Services metadata), AWS (Instances, ASGs, etc.), AppDynamics (Request flow). Edda. Monkeys.
  • 15. Edda Query Examples
    Find any instances that have ever had a specific public IP address:
      $ curl "http://edda/api/v2/view/instances;publicIpAddress=1.2.3.4;_since=0"
      ["i-0123456789","i-012345678a","i-012345678b"]
    Show the most recent change to a security group:
      $ curl "http://edda/api/v2/aws/securityGroups/sg-0123456789;_diff;_all;_limit=2"
      --- /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351040779810
      +++ /api/v2/aws.securityGroups/sg-0123456789;_pp;_at=1351044093504
      @@ -1,33 +1,33 @@
       {
       …
         "ipRanges" : [
           "10.10.1.1/32",
           "10.10.1.2/32",
      +    "10.10.1.3/32",
      -    "10.10.1.4/32"
       …
       }
  • 16. Cloud Native Master copies of data are cloud resident Everything is dynamically provisioned All services are ephemeral
  • 19. Cloud Deployment Scalability. New Autoscaled AMI – zero to 500 instances from 21:38:52 - 21:46:32, 7m40s. Scaled up and down over a few days, total 2176 instance launches, m2.2xlarge (4 core 34GB). Min. 41.0, 1st Qu. 104.2, Median 149.0, Mean 171.8, 3rd Qu. 215.8, Max. 562.0
  • 20. Ephemeral Instances • Largest services are autoscaled • Average lifetime of an instance is 36 hours. Push, Autoscale Up, Autoscale Down
  • 21. Leveraging Public Scale. 1,000 Instances, 100,000 Instances. Public, Grey Area, Private. Startups, Netflix, Google
  • 22. How big is Public? AWS Maximum Possible Instance Count 3.7 Million Growth >10x in Three Years, >2x Per Annum AWS upper bound estimate based on the number of public IP Addresses Every provisioned instance gets a public IP by default
  • 23. Availability Is it running yet? How many places is it running in? How far apart are those places?
  • 24. Antifragile API Patterns Functional Reactive with Circuit Breakers and Bulkheads
  • 26. Outages • Running very fast with scissors – Mostly self inflicted – bugs, mistakes – Some caused by AWS bugs and mistakes • Next step is multi-region – Investigating and building in stages during 2013 – Could have prevented some of our 2012 outages
  • 27. Managing Multi-Region Availability AWS DynECT Route53 UltraDNS DNS Regional Load Balancers Regional Load Balancers Zone A Zone B Zone C Zone A Zone B Zone C Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas Cassandra Replicas What we need is a portable way to manage multiple DNS providers….
  • 28. Denominator: Software Defined DNS for Java. Use Cases: Edda, Multi-Region Failover. Common Model: Denominator. DNS Vendor Plug-in: AWS Route53, DynECT, UltraDNS, Etc… API Models (varied and mostly broken): AWS Route53 – IAM Key Auth, REST; DynECT – User/pwd, REST; UltraDNS – User/pwd, SOAP. Currently being built by Adrian Cole (the jClouds guy, he works for Netflix now…)
  • 29. A Cloud Native Open Source Platform
  • 31. Three Questions Why is Netflix doing this? How does it all fit together? What is coming next?
  • 32. Beware of Geeks Bearing Gifts: Strategies for an Increasingly Open Economy Simon Wardley - Researcher at the Leading Edge Forum
  • 33. How did Netflix get ahead? Netflix Business + Developer Org: • Doing it right now • SaaS Applications • PaaS for agility • Public IaaS for AWS features • Big data in the cloud • Integrating many APIs • FOSS from github • Renting hardware for 1hr • Coding in Java/Groovy/Scala. Traditional IT Operations: • Taking their time • Pilot private cloud projects • Beta quality installations • Small scale • Integrating several vendors • Paying big $ for software • Paying big $ for consulting • Buying hardware for 3yrs • Hacking at scripts
  • 34. Netflix Platform Evolution. 2009-2010: Bleeding Edge Innovation. 2011-2012: Common Pattern. 2013-2014: Shared Pattern. Netflix ended up several years ahead of the industry, but it’s not a sustainable position
  • 35. Making it easy to follow Exploring the wild west each time vs. laying down a shared route
  • 36. Goals: Establish our solutions as Best Practices / Standards. Hire, Retain and Engage Top Engineers. Build up Netflix Technology Brand. Benefit from a shared ecosystem
  • 37. How does it all fit together?
  • 38. NetflixOSS Continuous Build and Deployment. Github NetflixOSS Source. Maven Central. AWS Base AMI. Cloudbees Jenkins. Dynaslave AWS Build Slaves. Aminator Bakery. AWS Baked AMIs. Odin Orchestration API. Asgard (+ Frigga) Console. AWS Account
  • 39. NetflixOSS Services Scope. AWS Account: Asgard Console, Archaius Config Service, Cross region Priam C*. Multiple AWS Regions: Eureka Registry, Explorers Dashboards, Exhibitor ZK. 3 AWS Zones: Application Clusters, Autoscale Groups, Instances; Priam Cassandra Persistent Storage; Evcache Memcached Ephemeral Storage. Atlas Monitoring. Edda History. Simian Army. Genie Hadoop Services
  • 40. NetflixOSS Instance Libraries. Initialization: • Baked AMI – Tomcat, Apache, your code • Governator – Guice based dependency injection • Archaius – dynamic configuration properties client • Eureka - service registration client. Service Requests: • Karyon - Base Server for inbound requests • RxJava – Reactive pattern • Hystrix/Turbine – dependencies and real-time status • Ribbon - REST Client for outbound calls. Data Access: • Astyanax – Cassandra client and pattern library • Evcache – Zone aware Memcached client • Curator – Zookeeper patterns • Denominator – DNS routing abstraction. Logging: • Blitz4j – non-blocking logging • Servo – metrics export for autoscaling • Atlas – high volume instrumentation
  • 41. NetflixOSS Testing and Automation. Test Tools: • CassJmeter – Load testing for Cassandra • Circus Monkey – Test account reservation rebalancing. Maintenance: • Janitor Monkey – Cleans up unused resources • Efficiency Monkey • Doctor Monkey • Howler Monkey – Complains about expiring certs. Availability: • Chaos Monkey – Kills Instances • Chaos Gorilla – Kills Availability Zones • Chaos Kong – Kills Regions • Latency Monkey – Latency and error injection. Security: • Security Monkey • Conformity Monkey
  • 43. What’s Coming Next? Better portability Higher availability More Features Easier to deploy Contributions from end users Contributions from vendors More Use Cases
  • 44. Vendor Driven Portability Interest in using NetflixOSS for Enterprise Private Clouds “It’s done when it runs Asgard” Functionally complete Demonstrated March Release 3.3 in 2Q13 Some vendor interest Some vendor interest Many missing features Needs AWS compatible Autoscaler Bait and switch AWS API strategy
  • 45. AWS 2009 vs. ??? Eucalyptus 3.3
  • 46. Netflix Cloud Prize Boosting the @NetflixOSS Ecosystem
  • 47. In 2012 Netflix Engineering won this..
  • 48. We’d like to give out prizes too But what for? Contributions to NetflixOSS! Shared under Apache license Located on github
  • 50. How long do you have? Entries open March 13th Entries close September 15th Six months…
  • 51. Who can win? Almost anyone, anywhere… Except current or former Netflix or AWS employees
  • 52. Who decides who wins? Nominating Committee Panel of Judges
  • 53. Judges: Aino Corry, Program Chair for Qcon/GOTO; Martin Fowler, Chief Scientist Thoughtworks; Simon Wardley, Strategist; Werner Vogels, CTO Amazon; Yury Izrailevsky, VP Cloud Netflix; Joe Weinman, SVP Telx, Author “Cloudonomics”
  • 54. What are Judges Looking For? Eligible, Apache 2.0 licensed Original and useful contribution to NetflixOSS Code that successfully builds and passes a test suite A large number of watchers, stars and forks on github NetflixOSS project pull requests Good code quality and structure Documentation on how to build and run it Evidence that code is in use by other projects, or is running in production
  • 55. What do you win? One winner in each of the 10 categories Ticket and expenses to attend AWS Re:Invent 2013 in Las Vegas A Trophy
  • 56. How do you enter? Get a (free) github account Fork github.com/netflix/cloud-prize Send us your email address Describe and build your entry Twitter #cloudprize
  • 57. Timeline: Github Registration Opens Today; Github Apache Licensed Contributions; Github Close Entries September 15; Award Ceremony Dinner, AWS Re:Invent, November. Entrants, Netflix Engineering, Nominations, Categories, Judges, Winners. Ten Prize Categories. Prizes: $10K cash, $5K AWS, AWS Re:Invent Tickets, Trophy. Criteria: Conforms to Rules, Working Code, Community Traction
  • 58. Functionality and scale now, portability coming Moving from parts to a platform in 2013 Netflix is fostering an ecosystem Rapid Evolution - Low MTBIAMSH (Mean Time Between Idea And Making Stuff Happen)
  • 59. Takeaway: Netflix is making it easy for everyone to adopt Cloud Native patterns. Open Source is not just the default, it’s a strategic weapon. http://netflix.github.com http://techblog.netflix.com http://slideshare.net/Netflix http://www.linkedin.com/in/adriancockcroft @adrianco #netflixcloud @NetflixOSS
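
The two Edda queries on slide 15 both follow the same shape: a collection path, an optional resource id, then a chain of `;key=value` or bare `;flag` arguments. As a rough sketch only, the helper below assembles such URLs as plain strings. The function name `edda_url` and its parameters are hypothetical and not part of Edda itself; the host `http://edda` and the argument names are copied from the slide, and nothing here contacts a server.

```python
def edda_url(base, collection, resource_id=None, **matrix):
    """Build an Edda-style query URL (hypothetical helper, string building only)."""
    path = f"{base}/api/v2/{collection}"
    if resource_id:
        path += f"/{resource_id}"
    for key, value in matrix.items():
        # A value of True becomes a bare flag such as ;_diff,
        # anything else becomes ;key=value (keyword order is preserved).
        path += f";{key}" if value is True else f";{key}={value}"
    return path


# Instances that have ever had a given public IP address (first slide query)
by_ip = edda_url("http://edda", "view/instances",
                 publicIpAddress="1.2.3.4", _since=0)

# Most recent change to a security group (second slide query)
sg_diff = edda_url("http://edda", "aws/securityGroups", "sg-0123456789",
                   _diff=True, _all=True, _limit=2)

print(by_ip)
print(sg_diff)
```

Keeping the arguments as keyword parameters makes it easy to see which modifiers (`_since`, `_diff`, `_all`, `_limit`) a given query relies on; how Edda interprets each one is defined by its own documentation, not by this sketch.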

Editor's Notes

  • #35: When Netflix first moved to cloud it was bleeding edge innovation; we figured stuff out and made stuff up from first principles. Over the last two years more large companies have moved to cloud, and the principles, practices and patterns have become better understood and adopted. At this point there is intense interest in how Netflix runs in the cloud, and several forward-looking organizations are adopting our architectures and starting to use some of the code we have shared. Over the coming years, we want to make it easier for people to share the patterns we use.
  • #36: The railroad made it possible for California to be developed quickly; by creating an easy-to-follow path, we can create a much bigger ecosystem around the Netflix platform.
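
Slide 24 names circuit breakers among the antifragile API patterns, and slide 40 lists Hystrix as the NetflixOSS library in that space. The sketch below is a minimal, generic illustration of the circuit breaker idea only. It is not Hystrix's implementation or API; the class name, the `max_failures` and `reset_after` parameters and their defaults are all assumptions chosen for the example.

```python
import time


class CircuitBreaker:
    """Minimal illustrative circuit breaker (assumed design, not Hystrix)."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to stay open before a trial call
        self.clock = clock                # injectable clock, useful for testing
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()         # open: fail fast, skip the dependency
            # otherwise half-open: let one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            return fallback()
        # success closes the circuit and clears the failure count
        self.failures = 0
        self.opened_at = None
        return result
```

A caller wraps each outbound dependency call, for example `breaker.call(fetch_recommendations, lambda: [])` with a hypothetical `fetch_recommendations` function, so that a failing dependency returns a cheap fallback instead of tying up request threads.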