From Data Science to Production - deploy, scale, enjoy!
Sergii Khomenko, Data Scientist
sergii.khomenko@stylight.com, @lc0d3r
PyData Amsterdam - March 12, 2016
Sergii Khomenko
Data scientist at Stylight, one of the biggest fashion communities.
Data analysis and visualisation hobbyist, working on problems not only during working hours but also in my free time, for fun and for personal data visualisations.
Originally from a computer engineering background.
Speaker at Berlin Buzzwords 2014, ApacheCon Europe 2014, Puppet Camp London 2015, Berlin Buzzwords 2015, Tableau Conference on Tour 2015, Budapest BI Forum 2015, Crunchsconf 2015, FOSDEM 2016.
Fellow DevOps
Quentin Nerden, Milos Radovanovic, Patrick Roelke
Profitable Leads
Stylight provides its partners with high-quality leads, enabling partner shops to leverage Stylight as an ROI-positive traffic channel.
Inspiration
Stylight offers shoppable inspiration that makes it easy to know what to buy and how to style it.
Branding & Reach
Stylight offers a unique opportunity for brands to reach an audience that is actively looking for style online.
Shopping
Stylight helps users search and shop fashion and lifestyle products smarter across hundreds of shops.
Stylight – Make Style Happen
Core Target Group
Stylight helps aspiring women between 18 and 35 evolve their style through shoppable inspiration.
Stylight – acting on a global scale
Experienced & Ambitious Team
An innovative cross-functional organisation with a flat hierarchy builds a unique team spirit.
• 200+ employees
• 40 PhDs/Engineers
• 28 years average age
• 63% female
• 23 nationalities
• 0 suits
Data Scientist: Person who is better at statistics than any software engineer and better at software engineering than any statistician.
Agenda
• Early days of startups
• Software engineering
• Immutable infrastructure
• Serverless architecture
The Early Days of Startups
Problem definition:
• Many different technologies
• Hard to reproduce data science results
• Issues with backward compatibility
• Dependency hell
• Hard to scale products
• Hard to on-board new people
Software engineering
Our stack
built circa 2015-16
You are most likely doing it already
• Version control
• Cover code with tests
• nosetests, pytest, unittest2
- start small with doctests
- try out TDD: rednose, nose-watch
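Starting small with doctests and growing into pytest can look like the minimal sketch below. `normalize` is a hypothetical helper, not code from the talk: its docstring examples double as tests, and the `test_` function is the kind of plain-assert test pytest collects.

```python
def normalize(values):
    """Scale values so that they sum to 1.

    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    >>> normalize([0, 0])
    Traceback (most recent call last):
        ...
    ValueError: values must not sum to zero
    """
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]


def test_normalize_sums_to_one():
    # pytest collects functions prefixed with test_ and reports failing asserts
    assert abs(sum(normalize([3, 5, 7])) - 1.0) < 1e-9


if __name__ == "__main__":
    import doctest

    doctest.testmod()  # runs the examples embedded in the docstrings
```

Both layers can run in one go with `pytest --doctest-modules`, or the doctests alone with `python -m doctest -v <file>.py`.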
You are most likely doing it already
• Cover code with tests
• Yes, even your R application can have tests
- testthat
- devtools
• Code reviews
• Pair programming
Some of the mentioned problems
• Many different technologies
• Issues with backward compatibility
• Dependency hell
• Hard to on-board new people
Image from http://udaypal.com/
How it could help:
• Every technology has its own container
- just docker run
• Every package with its version defined in the Dockerfile
- have a base image for more advanced cases
• New people
- just docker run
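"Every package with its version defined" is easy to let slip, so a CI step can check it. A minimal sketch, assuming the image installs Python dependencies with `pip install -r requirements.txt` (the deck does not say how packages are installed); `unpinned_requirements` is a hypothetical helper that flags any requirement not pinned to an exact version with `==`.

```python
from typing import List


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that do not pin an exact version with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop inline comments and whitespace
        if not line or line.startswith("-"):
            continue  # blank lines and pip options such as -r or --index-url
        if "==" not in line:
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    import sys

    with open("requirements.txt") as handle:
        problems = unpinned_requirements(handle.read())
    for requirement in problems:
        print(f"not pinned: {requirement}")
    sys.exit(1 if problems else 0)  # a non-zero exit fails the CI build
```

Failing the build on loose specifiers such as `pandas>=0.17` keeps two builds of the same Dockerfile from silently pulling different library versions.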
r-base/Dockerfile
lc0/docker-shiny-server
Known issues
• Images can be really huge
• Try to skip anything you do not need
• Alpine Linux as a base image
• 5 MB base image (musl libc and BusyBox)
• Iron.io has pre-built images based on Alpine
• Python, Scala, Java, Elixir, etc.
Known issues
Image sizes compared: 16 MB vs. 232 MB
Some of the mentioned problems
• Hard to roll out
• Hard to maintain production dependencies
AWS ECR
CircleCI deployments
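The ECR and CircleCI slides are screenshots only. A minimal sketch, assuming CircleCI builds one image per commit and pushes it to ECR: the image is tagged with the commit hash that CircleCI exposes as `CIRCLE_SHA1`, and the ECR authorization token is decoded into the user and password that `docker login` needs. The repository name `model-api` and account ID `123456789012` used in the usage note are placeholders.

```python
import base64
import os
from typing import Mapping, Tuple


def image_tag(env: Mapping[str, str] = os.environ) -> str:
    """Tag images with the commit being built; CircleCI sets CIRCLE_SHA1."""
    return env.get("CIRCLE_SHA1", "local")  # fallback for builds outside CI


def ecr_image_uri(account_id: str, region: str, repository: str, tag: str) -> str:
    """Full image name in the form ECR expects for docker push/pull."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def decode_ecr_token(authorization_token: str) -> Tuple[str, str]:
    """Split an ECR authorization token into (user, password) for docker login.

    ECR's GetAuthorizationToken returns base64("AWS:<password>").
    """
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    user, password = decoded.split(":", 1)
    return user, password
```

With boto3, the token comes from `boto3.client("ecr").get_authorization_token()["authorizationData"][0]["authorizationToken"]`; the CI job then logs in, builds, and pushes `ecr_image_uri("123456789012", "eu-west-1", "model-api", image_tag())`. Tagging by commit rather than `latest` means every deployed image can be traced back to the exact code that produced it.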
Immutable infrastructure
Infrastructure as Code
Need to upgrade? No problem. Build a new, upgraded system and throw the old one away. New app revision? Same thing. Build a server (or image) with a new revision and throw away the old ones.
CloudFormation
cloudtools/troposphere
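troposphere generates CloudFormation JSON from Python objects. To stay dependency-free, the minimal sketch below builds the same kind of template as a plain dict, so it shows what such a generator emits rather than troposphere's own API. It applies the "build a new one, throw the old one away" idea: the application is baked into a machine image, and the CloudFormation UpdatePolicy attribute tells CloudFormation to replace instances in batches when that image changes. The resource names and the AMI ID are placeholders.

```python
import json
from typing import Any, Dict


def build_template(ami_id: str, instance_type: str = "t2.micro") -> Dict[str, Any]:
    """Return a CloudFormation template for an image-based auto scaling group.

    Changing `ami_id` replaces the launch configuration; the UpdatePolicy
    then makes CloudFormation roll out fresh instances in batches instead
    of modifying the running ones.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "LaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {"ImageId": ami_id, "InstanceType": instance_type},
            },
            "WorkerGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "LaunchConfigurationName": {"Ref": "LaunchConfig"},
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                    "MinSize": "2",
                    "MaxSize": "4",
                },
                "UpdatePolicy": {
                    "AutoScalingRollingUpdate": {
                        "MinInstancesInService": "1",  # keep serving during rollout
                        "MaxBatchSize": "1",  # replace one instance at a time
                        "PauseTime": "PT5M",  # ISO 8601 wait between batches
                    }
                },
            },
        },
    }


if __name__ == "__main__":
    # "ami-0123abcd" is a placeholder image ID, not a real AMI.
    print(json.dumps(build_template("ami-0123abcd"), indent=2))
```

Keeping the template in Python puts it under version control and code review like any other code, which is the main argument for infrastructure as code.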
Terraform
Kubernetes and Docker {Swarm, Compose}
Serverless architecture
Possibilities
• All Lambdas in one place, under version control
• Integration tests with real events
• A proper CI/CD setup
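The deck does not show its Lambda code. A minimal sketch, assuming a Kinesis-triggered function (as in the Kinesis and AWS Lambda related link) whose records carry JSON: Kinesis delivers each record's payload base64-encoded under `Records[i].kinesis.data`. The `type` field and the `make_kinesis_event` helper are hypothetical. Capturing a real event from the stream, saving it as a JSON fixture, and replaying it through the handler is one way to test against real events.

```python
import base64
import json
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Decode the JSON records of a Kinesis-triggered Lambda invocation."""
    decoded = []
    for record in event.get("Records", []):
        # Kinesis delivers each record's payload base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"]).decode("utf-8")
        decoded.append(json.loads(raw))
    # "type" is a hypothetical field naming the kind of business event.
    event_types = sorted({item.get("type", "unknown") for item in decoded})
    return {"processed": len(decoded), "event_types": event_types}


def make_kinesis_event(*payloads: Dict[str, Any]) -> Dict[str, Any]:
    """Build a minimal Kinesis event for tests; a recorded production
    event loaded from a JSON fixture can replace it."""
    return {
        "Records": [
            {
                "eventSource": "aws:kinesis",
                "kinesis": {
                    "partitionKey": str(i),
                    "data": base64.b64encode(json.dumps(p).encode("utf-8")).decode("ascii"),
                },
            }
            for i, p in enumerate(payloads)
        ]
    }
```

Because the handler is a plain function of `(event, context)`, the same test runs locally, in CI before deployment, and against fixtures recorded from production traffic.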
CircleCI deployments
Cloud functions
Use case: outlier detection
Diagram: internal processes producing a variety of event types and structures; a custom unification pipeline; Business Intelligence; departments
Outlier detection to Slack
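The slides name the use case but not the detector. As a stand-in, the minimal sketch below uses the modified z-score based on the median absolute deviation (MAD), with the commonly used cutoff of 3.5, and posts a plain `{"text": ...}` message to a Slack incoming webhook. The metric name and the webhook URL are placeholders, and this is not necessarily the method used at Stylight.

```python
import json
import statistics
import urllib.request
from typing import Dict, List, Sequence


def find_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return indices whose modified z-score exceeds `threshold`.

    The median and MAD are robust, so a single spike does not inflate
    the baseline it is measured against.
    """
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # At least half the values equal the median: any deviation stands out.
        return [i for i, v in enumerate(values) if v != median]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > threshold
    ]


def slack_payload(metric: str, values: Sequence[float], outliers: List[int]) -> Dict[str, str]:
    """Build an incoming-webhook message listing the outlying points."""
    lines = [f"- position {i}: {values[i]}" for i in outliers]
    return {"text": f"Outliers in {metric}:\n" + "\n".join(lines)}


def post_to_slack(webhook_url: str, payload: Dict[str, str]) -> int:
    """POST the JSON payload to a Slack incoming webhook; return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Inside a cloud function, the flow would be: compute `find_outliers` on the latest window of a metric and, only if the result is non-empty, call `post_to_slack` with the team's webhook URL, so the channel stays quiet on normal days.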
www.stylight.com
sergii.khomenko@stylight.com
@lc0d3r
Related links
1. Testing Your Code - The Hitchhiker's Guide to Python
2. https://hub.docker.com/_/r-base/
3. http://www.alpinelinux.org/
4. https://github.com/iron-io/dockers
5. Docker Hub: A new stack plus ecosystem partners automate developer workflows
6. Trash Your Servers and Burn Your Code: Immutable Infrastructure and Disposable Components
7. https://github.com/cloudtools/troposphere
8. CloudFormation UpdatePolicy Attribute
9. https://www.terraform.io/
10. (Docker Compose + Docker Swarm) or Kubernetes
11. Google Cloud Functions
12. https://github.com/apex/apex
13. Streaming Data Processing with Amazon Kinesis and AWS Lambda