John Cline
Engineering Lead, Growth
November 7, 2017
Experimentation @ Blue Apron
2
Who am I?
►Engineering Lead for Growth Team
►Growth owns:
−Marketing/landing pages
−Registration/reactivation
−Referral program
−Experimentation, tracking, and email integrations
►Started at Blue Apron in August 2016
I’m online at @clinejj
3
Overview
►The Early Days
►Unique Challenges
►Let’s Talk Solutions
►Getting Testy
►Next Steps
4
The Early Days
5
The Early Days
Prior to ~Aug 2016, experimentation was done only with Optimizely Web.
►Pros
−Easy for non-technical users to create and launch tests
−Worked well with our SPA
►Cons
−Only worked with client-side changes
−Could not populate events in the Optimizely results view with backend events
This worked pretty well, but...
6
Challenges
7
Unique Challenges
Blue Apron is a unique business.
We’re the first meal-kit company to launch in the US, and a few things make our form of e-commerce a bit more challenging than for other companies selling products online:
► We currently offer recurring meal plans (our customers get 2-4 recipes every week unless they skip/cancel)
► We have seasonal recipes and a freshness guarantee
► We (currently) have a six-day cutoff for changing your order
► We have multiple fulfillment and delivery methods
► We support customers across web, iOS, and Android
It’s possible for a customer to sign up for a plan and then never need to log in to our digital product or contact support again.
That makes it hard to rely on a client-side-only testing setup.
8
Unique Challenges
Scheduled backend jobs power many critical parts of our business.
A client-side solution wouldn’t give us the flexibility to test this business logic.
9
Unique Challenges
Because of our unique business model, our KPIs for most tests require long-term evaluation.
Besides conversion and engagement, we also look at:
►Cohorted LTV by registration week (including accounting for acquisition costs)
►Order rate: what % of weeks has a customer ordered?
►Performance relative to various user segments
−Referral vs non-referral
−Two-person plan vs four-person plan
−Zip code
Tracking these KPIs required someone on our analytics team to run the analysis (anywhere from 2-4 weeks), creating a bottleneck for seeing test results.
10
The Solution
11
Enter Optimizely Full Stack
Around this time, Optimizely Full Stack was released, targeted precisely at our use case.
We looked into open source frameworks (e.g. Sixpack, Wasabi) but needed something with less of a maintenance cost. Our team also already knew how to use the product.
We looked at feature flag frameworks (like Flipper), but needed something built for the experimentation use case (vs a feature flag).
Our main application is a Ruby/Rails app, so we wrote a thin singleton wrapper for the Full Stack Ruby gem, which helped us support different environments and handle errors.
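A minimal sketch of what that wrapper might look like (the class name, config handling, and logging here are illustrative assumptions, not our actual code):

require 'singleton'
require 'net/http'
require 'optimizely'

# Hypothetical thin singleton wrapper around the Optimizely Full Stack Ruby SDK.
# Assumes the datafile URL comes from per-environment configuration.
class ExperimentClient
  include Singleton

  def initialize
    # The datafile is downloaded once at application startup.
    datafile = Net::HTTP.get(URI(ENV.fetch('OPTIMIZELY_DATAFILE_URL')))
    @client = Optimizely::Project.new(datafile)
  end

  # Bucket a user into an experiment; returns the variation key, or nil on failure.
  def activate(experiment_key, user_id, attributes = {})
    @client.activate(experiment_key, user_id, attributes)
  rescue StandardError => e
    Rails.logger.error("Optimizely activate failed: #{e.message}")
    nil
  end

  # Record a conversion event for a user.
  def track(event_key, user_id, attributes = {}, event_tags = {})
    @client.track(event_key, user_id, attributes, event_tags)
  rescue StandardError => e
    Rails.logger.error("Optimizely track failed: #{e.message}")
  end
end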
12
Integrating with Full Stack
We already had some pieces in place that made integration easier:
►An internal concept of an experiment in our data model
−A site test has many variations, which each have many users
►API for clients to log variation status to our internal system
►Including test variation information in our eventing frameworks (GA and Amplitude)
These helped ensure we had a good data pipeline to tag users and events for further analysis when required.
13
Integrating with Full Stack
The Optimizely results dashboard made it easy to get early directional decisions on whether to stop/ramp a test, while our wrapper gave us the information needed for a deeper analysis.
We wrote a wrapper service around the Optimizely client to integrate with our existing site test data model and log bucketing results for analytics purposes.
We added an asynchronous event reporter for reporting events to Optimizely (it runs in our background job processor).
Currently, the Optimizely datafile is only downloaded on application startup.
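A rough sketch of how such an asynchronous reporter could be structured, assuming a Sidekiq-style job processor and the hypothetical ExperimentClient wrapper sketched earlier:

require 'sidekiq'

# Hypothetical background job that forwards conversion events to Optimizely,
# so event reporting never blocks the web request that triggered it.
class OptimizelyEventReporterJob
  include Sidekiq::Worker
  sidekiq_options queue: :tracking, retry: 3

  def perform(event_key, user_id, attributes = {}, event_tags = {})
    ExperimentClient.instance.track(event_key, user_id, attributes, event_tags)
  end
end

# Enqueued from application code, e.g. after an order is placed:
# OptimizelyEventReporterJob.perform_async('order_placed', user.id.to_s, {}, { 'revenue' => order.total_cents })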
14
Integrating with Full Stack
15
Getting Testy
16
Testing with Full Stack
Creating a test in our application is fairly straightforward:
1. Run a migration to create the test/variations in our data model
class SiteTestVariationsForMyTest < ActiveRecord::Migration
  def self.up
    site_test = SiteTest.create!(
      name: 'My Test',
      experiment_id: 'my-test',
      is_active: false
    )
    site_test.site_test_variations.create!(
      variation_name: 'Control',
      variation_id: 'my-test-control'
    )
    site_test.site_test_variations.create!(
      variation_name: 'Variation',
      variation_id: 'my-test-variation'
    )
  end

  def self.down
    raise ActiveRecord::IrreversibleMigration
  end
end
17
Testing with Full Stack
2. Create a testing service to wrap bucketing logic
module SiteTests
class MyTestingService
include Experimentable
def initialize(user)
@user = user
end
def run_experiment
return unless user_valid?
return if bucket.blank?
# Take actions on the user
end
private
def user_valid?
# does user meet criteria for test (could also be handled with audiences)
end
def bucket
@variation_id ||= testing_service.bucket_user(@user)
end
end
end
18
Testing with Full Stack
3. Bucket users
SiteTests::MyTestingService.new(user).run_experiment
4. Read variation status
@user&.active_site_test_variation_ids.to_a
We generally bucket users at account creation or through an API call to our configurations API (which returns feature status/configurations for a user).
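For illustration, a per-user configurations endpoint might look roughly like this (the route, controller, and field names are hypothetical, not our actual API):

# Hypothetical Rails controller sketching a per-user configurations API.
# Clients call it to learn which variations and features apply to a user.
class Api::ConfigurationsController < ApplicationController
  def show
    render json: {
      # variation IDs the user has been bucketed into (e.g. 'my-test-variation')
      active_site_test_variations: current_user&.active_site_test_variation_ids.to_a,
      # enabled feature keys plus any configuration parameters
      features: { 'new_onboarding_flow' => { 'enabled' => true } }
    }
  end
end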
19
Testing with Full Stack
Some tests that we’ve run since integrating:
►New post-registration onboarding flow
►Second box reminder email
►More recipes/plan options
►New delivery schedule
►New reactivation experience
20
Testing with Full Stack
More recipes/plan options
Control Test
21
Testing with Full Stack
Control Test
More recipes/plan options
22
Testing with Full Stack
Results from more recipes/plan options test:
23
New Reactivation Flow
Testing with Full Stack
24
Results from new reactivation flow test:
Testing with Full Stack
25
Next Steps
26
Feature Flagging vs Experimentation
There is a lot of overlap between the two, but they serve different user groups.
►Feature flagging
−Primary user is engineering/product
−May be used for a variety of reasons (enabling a new service, fallback behavior, or a user feature)
►Experimentation
−Primary user is product/analytics
−Care about being able to track through other metrics/events
−Generally focused on customer impacting features
As a developer, I don’t care if a feature is enabled via a flag or a test. I only care about knowing how to enable/disable something.
As a PM or Analyst, I likely care more about experiments than feature flags (although I’d want to audit both).
27
Feature Flagging vs Experimentation
We use two separate tools for feature flagging: Flipper (open source, from GitHub’s platform team) and Optimizely Full Stack.
GOAL: Create a single source of truth with an easier-to-use dashboard for setting up features.
28
Feature Flagging vs Experimentation
Rough plan:
► Expose “feature configuration” to clients through an API (both internal code structure and our REST API)
−List of enabled features and any configuration parameters
►Consolidate features to be enabled if flipper || optimizely (see the sketch below)
►Add an administration panel to create features/configurations and test or roll them out
►Support better cross-platform testing
−App version targeting
−User segmentation
−“Global holdback”
−Mutually exclusive tests
Why do we still use Flipper? It’s local and changes take effect instantly, and it’s better for arbitrary % rollouts (vs the more heavyweight enablement through Optimizely).
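A minimal sketch of the combined check, assuming a globally configured Flipper instance and the hypothetical ExperimentClient wrapper from earlier (the helper and key names are illustrative):

# Hypothetical helper: a feature counts as enabled if either Flipper or an
# Optimizely experiment/rollout enables it for the given user.
module FeatureGate
  def self.enabled?(feature_key, user)
    return true if Flipper.enabled?(feature_key, user)

    # Treat being bucketed into a non-control variation as "enabled";
    # the 'control' variation key is an assumed convention.
    variation = ExperimentClient.instance.activate(feature_key.to_s, user.id.to_s)
    variation.present? && variation != 'control'
  end
end

# Usage:
# FeatureGate.enabled?(:new_onboarding_flow, current_user)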
29
Feature Management
Optimizely Full Stack just launched a new feature management system.
It supports:
►Defining a feature configuration
−A feature is a set of boolean/double/integer/string parameters
−Parameters can be modified by variation (or rollout)
►Enabling a feature in an experiment variation or rolling it out to a %/audience
We’re still testing it, but it looks promising (and being able to update variables without rolling code is incredibly helpful).
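For a sense of the API, here is a sketch of reading a feature and its variables through the Full Stack Ruby SDK (the feature and variable keys are made up, and exact method availability depends on SDK version):

# optimizely_client is an instance of the Full Stack SDK client
# (e.g. the client held inside the wrapper sketched earlier).
if optimizely_client.is_feature_enabled('recipe_recommendations', user.id.to_s)
  # Read a typed configuration parameter for this user's variation/rollout;
  # changing it in the Optimizely dashboard requires no code deploy.
  max_recipes = optimizely_client.get_feature_variable_integer(
    'recipe_recommendations', 'max_recipes', user.id.to_s
  )
  show_recommendations(user, limit: max_recipes)  # show_recommendations is a placeholder
end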
30
Tech Debt
We are still developing general guidelines for
engineers on how to set up tests, particularly
around which platform to implement and
how to implement (we use Optimizely Web,
Full Stack, iOS, Android, and have Flipper for
server side gating).
As we do more testing, we enable more
features, which makes our code more
complex.
On a quarterly-ish basis, we go through and
clean up unused tests (or do so when
launching).
You should definitely have a philosophy
on feature flags and tech debt cleanup.
31
Things to Think About
Optimizely specifically:
►Environments
−We currently have each environment (dev/staging/production) as a separate Optimizely project, which makes it difficult to copy tests between environments
►Cross-Platform Testing
−If you serve multiple platforms, you need a server-managed solution (even if it just drives client-only changes) to ensure a consistent experience
32
Things to Think About
At the end of the day, who are the users of your feature/experimentation platform?
►Testing gives you insights into user behavior - what are you going to do with that?
►How do you measure your KPIs?
►How do you make decisions?
►What’s the developer experience like?
33
Questions?
http://blueapron.io
