SlideShare a Scribd company logo
Building a high-performance, scalable
ML & NLP platform with Python
Sheer El Showk
CTO, Lore Ai
www.lore.ai
Lore is a small startup focused on developing and applying
machine-learning techniques to solve a wide array of business
problems.
We are product-focused but we also provide consulting services
(because building and selling products is hard!)
We are self-funded so we need to be cost-effective but also flexible and scalable.
- Need to be ready to pivot if product is not working
- Need to support both consulting work and product development
Over time developed a complex but modular stack:
- ML & NLP tools/services
- Standard web-stack: DB, caching, API servers, web servers
- Fully scalable (all components can be parallelized)
This has allowed us to develop & maintain several products:
- A chatbot
- A content-based recommender engine
- Our flagship product, Salient, a powerful document analysis engine
The Task
Client Data
Convert &
Parse
ML/NLP Services
-Question-Answering
-Document Similarity
-Information extraction
NLP Processing
Annotated
Documents
Entity Linking
Client
Knowledge Graph
Upload Docs
ETL
Web/API
Servers
Client
NLP Pipeline
-Grammar parsing
-Parts-of-speech
-Entities (people,etc..)
Knowledge Graph
Applications
● Contract Management: automate labelling database of contracts (PDFs, word, OCR)
○ Contract type, expiration date, other parties, important clauses, etc..
● Analyze news
○ Build a database of product releases or funding rounds from press releases
○ Find companies fitting a profile -- competitors, acquisition, investors, etc..
● Patent and Policy Document Analysis
○ Build an ontology and network linking concepts and content
The Challenges
Lore’s stack has all the normal challenges of developing a
large web/business application in python.
The addition of ML/NLP brings additional challenges
whose solutions are less well known.
Scalability
● Some use cases require processing millions of documents
○ E.g. we have a news DB with 4 million articles in it
● Python considered “slow” and not scalable (hard to parallelize)
○ Java preferred language of enterprise
● ML brings new problems:
○ Need models to be very performant
○ Parallelizing harder: need to share models/data between servers
Maintainability
● Mixing many different technologies
○ ML/compute stack: theano, gensim, spaCy, etc..
○ Web stack: django, celery, mysql, etc..
● Multiple servers/services can mean expensive/difficult devops.
● Large code base with different kinds of code (JS/web vs ML/NLP) make it hard to
coordinate between developers.
Flexibility
● Business requirements change very quickly
○ Need to provide a wide range of services from same platform
○ Need to be able to easily upgrade/improve parts of system
● Agility often requires integrating existing off-the-shelf solutions to solve non-core tasks.
● Easy to deploy or scale a deployment.
Where we Started
Where we started...
➔ Monolithic python-based server + PHP UI
➔ MySQL DB backend
➔ Hard coded dependence on target document set.
➔ Basic off-the-shelf NLP tools
◆ NLTK, etc..
➔ Serial pipeline
➔ No redundancy or scalability
➔ Hard to install/maintain (all manual)
A few pivots and consulting gigs later...
The new stack...
Mongo
Cluster
MySQL
(soon
ElasticSearch)
Minio
(distributed blob
storage)
Input
Processor
Service
Broker
Load
Balancer
NLP
Services
Django
Web
Web
scrapers
Pipeline
Logging
Cron-jobs
KG
Service
Entity
Generator
➔ Kubernetes & docker for devops
➔ Celery for parallelization
➔ Micro-services Architecure
➔ In-house NLP engine
➔ Distributed data stores:
◆ Mongo, Minio, Redis
➔ Very modular, re-usable
➔ Rapid deployment (even cluster)
Django
API
Celery Workers
Redis
(distributed
caching)
What we learned along the
way...
Lessons
● Devops: kubernetes, docker, docker-compose
● Parallelization: celery, dask, joblib,...
● Modularity: (stateless) microservices architecture
● Persistence: scalable distributed caching/persistence (redis, mongo, ES,...)
● Future-proofing: wrapper patterns
● Performance: cython or pythran for bottleneck code
Docker & Kubernetes
Dockerize early, dockerize often
1. Uniform environment between devs
a. Use ipython to develop in stack
2. Easy to add services
a. Solve dev problems using devops!
3. Fast, consistent deploy to production
a. Deploying our stack is rate-limited by
download speed :-)
4. Cut costs by running on bare metal
a. Run your own “AWS” with k8ns
type
AWS
(reserved)
Hetzner
(bare metal)
16 threads
64 gb ram
$360/mo
(no storage)
$70/mo
(1tb ssd)
GPU server
$450/mo
(K80)
$120/mo
(1080)
Celery for Parallelization
10 minute parallelization
● Many interesting options for parallelization:
○ Dask.distributed, joblib, celery, ipython
● Celery “complicated” -- requires Redis/RabbitMQ, etc..
○ Very easy with docker/kubernetes
○ NOTE: Redis has much lower latency
● Transparent parallelization
○ Wrap “entry-point” functions in a task and Bob’s your uncle.
● Lots of fancy features but don’t need to use them until you need them.
Stateless Microservices
Unlimited Scalability and Flexibility
● High level analog of an Abstract Base Class
● Split codebase into independent microservices
○ Each microservices handles one kind of thing: NLP, logging, DB persistence, etc…
○ Wrap underlying service abstractly: redis→ kv_cache, minio → kv_store, mysql → sql_db
● Applications can include multiple microservices talking to each other
● Microservices should be stateless
○ All state (caching, persistence, etc..) should be handled by external services (DB, redis, etc..)
● Microservices encapsulate underlying implementation of a service
● Microservices are not micro-servers: services are just APIs within an API.
○ Should not be tied to an interface (REST, JSON, etc..)
NLP ServiceNLP Service
Distributed Persistence
Let others do the hard work
● All persistence handled by “layers” of persistence
services.
○ Layering helps performance
● Persistent services accessed via “wrappers” (see next
slide) for easy replacement.
● “State” of services managed by distributed cache
○ ML models can be shared by workers using e.g. Redis
(caching) and Minio (persistence)
● Different types of persistence for different problems:
○ Mongo stores JSON, Minio stores blobs, mysql stores
tables
Minio
(distributed blob
storage)
Redis
(distributed
caching)
cache
save
Neural
network
NLP Service
create/
train
Design Patterns
Wrapper Pattern
● Wrap access to all external services (db,
logging, etc..)
● Easy to swap in new versions
○ MySQL vs Postgres, etc.
● Easily add logging, performance, etc..
● Insert custom logic to modify the behavior, eg
○ Manually shard/replicate DBs
○ Add new logging destinations
● This pattern has saved us countless hours of
refactoring!
Examples
● Local → Centralized logging & config with just
a few lines of code.
● Migrating from NLTK to better (in-house) NLP
● Restricting user access to DB by transparently
replacing tables with views.
Performance
● Use native types correctly:
○ set vs list, iteritems vs items
● Pythonic code is almost always faster.
● Use %timeit to test code snippets everywhere
● Beware of hidden memory allocation
● For critical bottlenecks use:
○ Pythran (very easy but limited coverage)
○ Cython (harder, but more flexible)
● For ML use generators to stream data from
disk/db (see gensim).
● Know your times
Performance (II)
Example
Classifying (short) documents
● Initial rate: 2k docs/sec
● Initial Profiling:
○ Db read rate: 250k docs/s
○ Feature generation 2k docs/s
○ Classification 10k docs/s
● Feature generation involves for-loops &
complex logic.
Fixes
● Refactored feature generation:
○ Extract features in list comprehensions
○ Convert to vectors in Pythran code
● New times
○ Python part: 10k docs/s
○ Pythran: 40k docs/s
● Fix memory allocation in classifier: 50k docs/s
● New times: ~10k docs/s
● Going forward: cache features in Mongo?
Further Reading...
● Radim Řehůřek (gensim):
“Does Python Stand a Chance in Today’s World of Data Science”
(https://youtu.be/jfbgt3KjWFQ)
● High Performance Python (O’Reilly)
○ “Lessons from the Field” (chapter 12)
● Fluent Python (O’Reilly)
● Latency Numbers Every Programmer Should Know
https://gist.github.com/jboner/2841832
Bye!
Open Problems
● Cores vs RAM
○ Many models require lots of RAM (models can be Gbs in size).
○ Models read-only but because of python GIL hard to share memory between “processes/threads”
○ Use “sharedmem” module?
● Generating models from terabytes of data?
○ “Embedding models” can capture interesting information from huge amounts of data
○ Train very quickly (millions words/sec per server)
○ How can we distribute parameters/data between large number of workers?
Ad

More Related Content

What's hot (20)

Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011)
Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011)Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011)
Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011)
Fabrice Bernhard
 
SciPipe - A light-weight workflow library inspired by flow-based programming
SciPipe - A light-weight workflow library inspired by flow-based programmingSciPipe - A light-weight workflow library inspired by flow-based programming
SciPipe - A light-weight workflow library inspired by flow-based programming
Samuel Lampa
 
Pulumi. Modern Infrastructure as Code.
Pulumi. Modern Infrastructure as Code.Pulumi. Modern Infrastructure as Code.
Pulumi. Modern Infrastructure as Code.
Yurii Bychenok
 
Infrastructure-as-Code with Pulumi - Better than all the others (like Ansible)?
Infrastructure-as-Code with Pulumi- Better than all the others (like Ansible)?Infrastructure-as-Code with Pulumi- Better than all the others (like Ansible)?
Infrastructure-as-Code with Pulumi - Better than all the others (like Ansible)?
Jonas Hecht
 
Are you using an opensource library? There's a good chance you are vulnerable...
Are you using an opensource library? There's a good chance you are vulnerable...Are you using an opensource library? There's a good chance you are vulnerable...
Are you using an opensource library? There's a good chance you are vulnerable...
Codemotion
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
Adam Gibson
 
Infrastructure as "Code" with Pulumi
Infrastructure as "Code" with PulumiInfrastructure as "Code" with Pulumi
Infrastructure as "Code" with Pulumi
Venura Athukorala
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
gjcross
 
C++ on the Web: Run your big 3D game in the browser
C++ on the Web: Run your big 3D game in the browserC++ on the Web: Run your big 3D game in the browser
C++ on the Web: Run your big 3D game in the browser
Andre Weissflog
 
Introduction to go, and why it's awesome
Introduction to go, and why it's awesomeIntroduction to go, and why it's awesome
Introduction to go, and why it's awesome
Janet Kuo
 
How to rewrite the OS using C by strong type
How to rewrite the OS using C by strong typeHow to rewrite the OS using C by strong type
How to rewrite the OS using C by strong type
Kiwamu Okabe
 
Minko - Build WebGL applications with C++ and asm.js
Minko - Build WebGL applications with C++ and asm.jsMinko - Build WebGL applications with C++ and asm.js
Minko - Build WebGL applications with C++ and asm.js
Minko3D
 
Multiplatform C++ on the Web with Emscripten
Multiplatform C++ on the Web with EmscriptenMultiplatform C++ on the Web with Emscripten
Multiplatform C++ on the Web with Emscripten
Chad Austin
 
20151117 IoT를 위한 서비스 구성과 개발
20151117 IoT를 위한 서비스 구성과 개발20151117 IoT를 위한 서비스 구성과 개발
20151117 IoT를 위한 서비스 구성과 개발
영욱 김
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
Samuel Lampa
 
Netty training
Netty trainingNetty training
Netty training
Jackson dos Santos Olveira
 
.NET Core in the Real World
.NET Core in the Real World.NET Core in the Real World
.NET Core in the Real World
Nate Barbettini
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
inovex GmbH
 
Net core
Net coreNet core
Net core
Damir Dobric
 
.NET Core Blimey! Windows Platform User Group, Manchester
.NET Core Blimey! Windows Platform User Group, Manchester.NET Core Blimey! Windows Platform User Group, Manchester
.NET Core Blimey! Windows Platform User Group, Manchester
citizenmatt
 
Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011)
Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011)Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011)
Adopt DevOps philosophy on your Symfony projects (Symfony Live 2011)
Fabrice Bernhard
 
SciPipe - A light-weight workflow library inspired by flow-based programming
SciPipe - A light-weight workflow library inspired by flow-based programmingSciPipe - A light-weight workflow library inspired by flow-based programming
SciPipe - A light-weight workflow library inspired by flow-based programming
Samuel Lampa
 
Pulumi. Modern Infrastructure as Code.
Pulumi. Modern Infrastructure as Code.Pulumi. Modern Infrastructure as Code.
Pulumi. Modern Infrastructure as Code.
Yurii Bychenok
 
Infrastructure-as-Code with Pulumi - Better than all the others (like Ansible)?
Infrastructure-as-Code with Pulumi- Better than all the others (like Ansible)?Infrastructure-as-Code with Pulumi- Better than all the others (like Ansible)?
Infrastructure-as-Code with Pulumi - Better than all the others (like Ansible)?
Jonas Hecht
 
Are you using an opensource library? There's a good chance you are vulnerable...
Are you using an opensource library? There's a good chance you are vulnerable...Are you using an opensource library? There's a good chance you are vulnerable...
Are you using an opensource library? There's a good chance you are vulnerable...
Codemotion
 
Deep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the BayDeep Learning with GPUs in Production - AI By the Bay
Deep Learning with GPUs in Production - AI By the Bay
Adam Gibson
 
Infrastructure as "Code" with Pulumi
Infrastructure as "Code" with PulumiInfrastructure as "Code" with Pulumi
Infrastructure as "Code" with Pulumi
Venura Athukorala
 
How to write a well-behaved Python command line application
How to write a well-behaved Python command line applicationHow to write a well-behaved Python command line application
How to write a well-behaved Python command line application
gjcross
 
C++ on the Web: Run your big 3D game in the browser
C++ on the Web: Run your big 3D game in the browserC++ on the Web: Run your big 3D game in the browser
C++ on the Web: Run your big 3D game in the browser
Andre Weissflog
 
Introduction to go, and why it's awesome
Introduction to go, and why it's awesomeIntroduction to go, and why it's awesome
Introduction to go, and why it's awesome
Janet Kuo
 
How to rewrite the OS using C by strong type
How to rewrite the OS using C by strong typeHow to rewrite the OS using C by strong type
How to rewrite the OS using C by strong type
Kiwamu Okabe
 
Minko - Build WebGL applications with C++ and asm.js
Minko - Build WebGL applications with C++ and asm.jsMinko - Build WebGL applications with C++ and asm.js
Minko - Build WebGL applications with C++ and asm.js
Minko3D
 
Multiplatform C++ on the Web with Emscripten
Multiplatform C++ on the Web with EmscriptenMultiplatform C++ on the Web with Emscripten
Multiplatform C++ on the Web with Emscripten
Chad Austin
 
20151117 IoT를 위한 서비스 구성과 개발
20151117 IoT를 위한 서비스 구성과 개발20151117 IoT를 위한 서비스 구성과 개발
20151117 IoT를 위한 서비스 구성과 개발
영욱 김
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
Samuel Lampa
 
.NET Core in the Real World
.NET Core in the Real World.NET Core in the Real World
.NET Core in the Real World
Nate Barbettini
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
inovex GmbH
 
.NET Core Blimey! Windows Platform User Group, Manchester
.NET Core Blimey! Windows Platform User Group, Manchester.NET Core Blimey! Windows Platform User Group, Manchester
.NET Core Blimey! Windows Platform User Group, Manchester
citizenmatt
 

Similar to Building a high-performance, scalable ML & NLP platform with Python, Sheer El Showk (20)

Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Databricks
 
AirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadAirBNB's ML platform - BigHead
AirBNB's ML platform - BigHead
Karthik Murugesan
 
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
Hua Chu
 
Scaling symfony apps
Scaling symfony appsScaling symfony apps
Scaling symfony apps
Matteo Moretti
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadius
BoldRadius Solutions
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
Fei Chen
 
Instant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositoriesInstant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositories
Yshay Yaacobi
 
[WSO2Con EU 2018] Architecting for a Container Native Environment
[WSO2Con EU 2018] Architecting for a Container Native Environment[WSO2Con EU 2018] Architecting for a Container Native Environment
[WSO2Con EU 2018] Architecting for a Container Native Environment
WSO2
 
2016_04_04_CNI_Spring_Meeting_Microservices
2016_04_04_CNI_Spring_Meeting_Microservices2016_04_04_CNI_Spring_Meeting_Microservices
2016_04_04_CNI_Spring_Meeting_Microservices
Jason Varghese
 
Serverless Compose vs hurtownia danych
Serverless Compose vs hurtownia danychServerless Compose vs hurtownia danych
Serverless Compose vs hurtownia danych
The Software House
 
Commit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architecture
Commit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architectureCommit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architecture
Commit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architecture
Jordi Puigsegur Figueras
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Omid Vahdaty
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcript
foliba
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
vanphp
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
Omid Vahdaty
 
Stackato v5
Stackato v5Stackato v5
Stackato v5
Jonas Brømsø
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
Not my problem - Delegating responsibility to infrastructure
Not my problem - Delegating responsibility to infrastructureNot my problem - Delegating responsibility to infrastructure
Not my problem - Delegating responsibility to infrastructure
Yshay Yaacobi
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
Nimat Khattak
 
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
 Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa... Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Bighead: Airbnb’s End-to-End Machine Learning Platform with Krishna Puttaswa...
Databricks
 
AirBNB's ML platform - BigHead
AirBNB's ML platform - BigHeadAirBNB's ML platform - BigHead
AirBNB's ML platform - BigHead
Karthik Murugesan
 
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
PyCon HK 2018 - Heterogeneous job processing with Apache Kafka
Hua Chu
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadius
BoldRadius Solutions
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
aspyker
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
Fei Chen
 
Instant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositoriesInstant developer onboarding with self contained repositories
Instant developer onboarding with self contained repositories
Yshay Yaacobi
 
[WSO2Con EU 2018] Architecting for a Container Native Environment
[WSO2Con EU 2018] Architecting for a Container Native Environment[WSO2Con EU 2018] Architecting for a Container Native Environment
[WSO2Con EU 2018] Architecting for a Container Native Environment
WSO2
 
2016_04_04_CNI_Spring_Meeting_Microservices
2016_04_04_CNI_Spring_Meeting_Microservices2016_04_04_CNI_Spring_Meeting_Microservices
2016_04_04_CNI_Spring_Meeting_Microservices
Jason Varghese
 
Serverless Compose vs hurtownia danych
Serverless Compose vs hurtownia danychServerless Compose vs hurtownia danych
Serverless Compose vs hurtownia danych
The Software House
 
Commit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architecture
Commit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architectureCommit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architecture
Commit Conf 2018 - Hotelbeds' journey to a microservice cloud-based architecture
Jordi Puigsegur Figueras
 
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3  Big Data in 200 km/h | AWS Big Data Demystified #1.3
Big Data in 200 km/h | AWS Big Data Demystified #1.3
Omid Vahdaty
 
Mongo db transcript
Mongo db transcriptMongo db transcript
Mongo db transcript
foliba
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
vanphp
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
Omid Vahdaty
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
Omid Vahdaty
 
Not my problem - Delegating responsibility to infrastructure
Not my problem - Delegating responsibility to infrastructureNot my problem - Delegating responsibility to infrastructure
Not my problem - Delegating responsibility to infrastructure
Yshay Yaacobi
 
Ad

More from Pôle Systematic Paris-Region (20)

OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
Pôle Systematic Paris-Region
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
Pôle Systematic Paris-Region
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Pôle Systematic Paris-Region
 
Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?Osis18_Cloud : Pas de commun sans communauté ?
Osis18_Cloud : Pas de commun sans communauté ?
Pôle Systematic Paris-Region
 
Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin Osis18_Cloud : Projet Wolphin
Osis18_Cloud : Projet Wolphin
Pôle Systematic Paris-Region
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Pôle Systematic Paris-Region
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Pôle Systematic Paris-Region
 
Osis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritageOsis18_Cloud : Software-heritage
Osis18_Cloud : Software-heritage
Pôle Systematic Paris-Region
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
Pôle Systematic Paris-Region
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
Pôle Systematic Paris-Region
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
Pôle Systematic Paris-Region
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
Pôle Systematic Paris-Region
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
Pôle Systematic Paris-Region
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
Pôle Systematic Paris-Region
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
Pôle Systematic Paris-Region
 
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
OSIS19_IoT :Transparent remote connectivity to short-range IoT devices, by Na...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
OSIS19_Cloud : SAFC: Scheduling and Allocation Framework for Containers in a ...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
OSIS19_Cloud : Qu’apporte l’observabilité à la gestion de configuration? par ...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...OSIS19_Cloud : Performance and power management in virtualized data centers, ...
OSIS19_Cloud : Performance and power management in virtualized data centers, ...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
OSIS19_Cloud : Des objets dans le cloud, et qui y restent -- L'expérience du ...
Pôle Systematic Paris-Region
 
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
OSIS19_Cloud : Attribution automatique de ressources pour micro-services, Alt...
Pôle Systematic Paris-Region
 
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
OSIS19_IoT : State of the art in security for embedded systems and IoT, by Pi...
Pôle Systematic Paris-Region
 
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick MoyOsis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Osis19_IoT: Proof of Pointer Programs with Ownership in SPARK, by Yannick Moy
Pôle Systematic Paris-Region
 
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMAOsis18_Cloud : Virtualisation efficace d’architectures NUMA
Osis18_Cloud : Virtualisation efficace d’architectures NUMA
Pôle Systematic Paris-Region
 
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur BittorrentOsis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Osis18_Cloud : DeepTorrent Stockage distribué perenne basé sur Bittorrent
Pôle Systematic Paris-Region
 
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
OSIS18_IoT: L'approche machine virtuelle pour les microcontrôleurs, le projet...
Pôle Systematic Paris-Region
 
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riotOSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
OSIS18_IoT: La securite des objets connectes a bas cout avec l'os et riot
Pôle Systematic Paris-Region
 
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
OSIS18_IoT : Solution de mise au point pour les systemes embarques, par Julio...
Pôle Systematic Paris-Region
 
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
OSIS18_IoT : Securisation du reseau des objets connectes, par Nicolas LE SAUZ...
Pôle Systematic Paris-Region
 
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
OSIS18_IoT : Ada and SPARK - Defense in Depth for Safe Micro-controller Progr...
Pôle Systematic Paris-Region
 
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
OSIS18_IoT : RTEMS pour l'IoT professionnel, par Pierre Ficheux (Smile ECS)
Pôle Systematic Paris-Region
 
PyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelatPyParis 2017 / Un mooc python, by thierry parmentelat
PyParis 2017 / Un mooc python, by thierry parmentelat
Pôle Systematic Paris-Region
 
Ad

Recently uploaded (20)

How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Hands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordDataHands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordData
Lynda Kane
 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Hands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordDataHands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordData
Lynda Kane
 
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from AnywhereAutomation Hour 1/28/2022: Capture User Feedback from Anywhere
Automation Hour 1/28/2022: Capture User Feedback from Anywhere
Lynda Kane
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 

Building a high-performance, scalable ML & NLP platform with Python, Sheer El Showk

  • 1. Building a high-performance, scalable ML & NLP platform with Python Sheer El Showk CTO, Lore Ai www.lore.ai
  • 2. Lore is a small startup focused on developing and applying machine-learning techniques to solve a wide array of business problems. We are product-focused but we also provide consulting services (because building and selling products is hard!)
  • 3. We are self-funded so we need to be cost-effective but also flexible and scalable. - Need to be ready to pivot if product is not working - Need to support both consulting work and product development Over time developed a complex but modular stack: - ML & NLP tools/services - Standard web-stack: DB, caching, API servers, web servers - Fully scalable (all components can be parallelized) This has allowed us to develop & maintain several products: - A chatbot - A content-based recommender engine - Our flagship product, Salient, a powerful document analysis engine
  • 5. Client Data Convert & Parse ML/NLP Services -Question-Answering -Document Similarity -Information extraction NLP Processing Annotated Documents Entity Linking Client Knowledge Graph Upload Docs ETL Web/API Servers Client NLP Pipeline -Grammar parsing -Parts-of-speech -Entities (people,etc..) Knowledge Graph
  • 6. Applications ● Contract Management: automate labelling database of contracts (PDFs, word, OCR) ○ Contract type, expiration date, other parties, important clauses, etc.. ● Analyze news ○ Build a database of product releases or funding rounds from press releases ○ Find companies fitting a profile -- competitors, acquisition, investors, etc.. ● Patent and Policy Document Analysis ○ Build an ontology and network linking concepts and content
  • 8. Lore’s stack has all the normal challenges of developing a large web/business application in python. The addition of ML/NLP brings additional challenges whose solutions are less well known.
  • 9. Scalability ● Some use cases require processing millions of documents ○ E.g. we have a news DB with 4 million articles in it ● Python considered “slow” and not scalable (hard to parallelize) ○ Java preferred language of enterprise ● ML brings new problems: ○ Need models to be very performant ○ Parallelizing harder: need to share models/data between servers
  • 10. Maintainability ● Mixing many different technologies ○ ML/compute stack: theano, gensim, spaCy, etc.. ○ Web stack: django, celery, mysql, etc.. ● Multiple servers/services can mean expensive/difficult devops. ● Large code base with different kinds of code (JS/web vs ML/NLP) make it hard to coordinate between developers.
  • 11. Flexibility ● Business requirements change very quickly ○ Need to provide a wide range of services from same platform ○ Need to be able to easily upgrade/improve parts of system ● Agility often requires integrating existing off-the-shelf solutions to solve non-core tasks. ● Easy to deploy or scale a deployment.
  • 13. Where we started... ➔ Monolithic python-based server + PHP UI ➔ MySQL DB backend ➔ Hard coded dependence on target document set. ➔ Basic off-the-shelf NLP tools ◆ NLTK, etc.. ➔ Serial pipeline ➔ No redundancy or scalability ➔ Hard to install/maintain (all manual)
  • 14. A few pivots and consulting gigs later...
  • 15. The new stack... Mongo Cluster MySQL (soon ElasticSearch) Minio (distributed blob storage) Input Processor Service Broker Load Balancer NLP Services Django Web Web scrapers Pipeline Logging Cron-jobs KG Service Entity Generator ➔ Kubernetes & docker for devops ➔ Celery for parallelization ➔ Micro-services Architecure ➔ In-house NLP engine ➔ Distributed data stores: ◆ Mongo, Minio, Redis ➔ Very modular, re-usable ➔ Rapid deployment (even cluster) Django API Celery Workers Redis (distributed caching)
  • 16. What we learned along the way...
  • 17. Lessons ● Devops: kubernetes, docker, docker-compose ● Parallelization: celery, dask, joblib,... ● Modularity: (stateless) microservices architecture ● Persistence: scalable distributed caching/persistence (redis, mongo, ES,...) ● Future-proofing: wrapper patterns ● Performance: cython or pythran for bottleneck code
  • 18. Docker & Kubernetes Dockerize early, dockerize often 1. Uniform environment between devs a. Use ipython to develop in stack 2. Easy to add services a. Solve dev problems using devops! 3. Fast, consistent deploy to production a. Deploying our stack is rate-limited by download speed :-) 4. Cut costs by running on bare metal a. Run your own “AWS” with k8ns type AWS (reserved) Hetzner (bare metal) 16 threads 64 gb ram $360/mo (no storage) $70/mo (1tb ssd) GPU server $450/mo (K80) $120/mo (1080)
  • 19. Celery for Parallelization 10 minute parallelization ● Many interesting options for parallelization: ○ Dask.distributed, joblib, celery, ipython ● Celery “complicated” -- requires Redis/RabbitMQ, etc.. ○ Very easy with docker/kubernetes ○ NOTE: Redis has much lower latency ● Transparent parallelization ○ Wrap “entry-point” functions in a task and Bob’s your uncle. ● Lots of fancy features but don’t need to use them until you need them.
  • 20. Stateless Microservices Unlimited Scalability and Flexibility ● High level analog of an Abstract Base Class ● Split codebase into independent microservices ○ Each microservices handles one kind of thing: NLP, logging, DB persistence, etc… ○ Wrap underlying service abstractly: redis→ kv_cache, minio → kv_store, mysql → sql_db ● Applications can include multiple microservices talking to each other ● Microservices should be stateless ○ All state (caching, persistence, etc..) should be handled by external services (DB, redis, etc..) ● Microservices encapsulate underlying implementation of a service ● Microservices are not micro-servers: services are just APIs within an API. ○ Should not be tied to an interface (REST, JSON, etc..)
  • 21. NLP ServiceNLP Service Distributed Persistence Let others do the hard work ● All persistence handled by “layers” of persistence services. ○ Layering helps performance ● Persistent services accessed via “wrappers” (see next slide) for easy replacement. ● “State” of services managed by distributed cache ○ ML models can be shared by workers using e.g. Redis (caching) and Minio (persistence) ● Different types of persistence for different problems: ○ Mongo stores JSON, Minio stores blobs, mysql stores tables Minio (distributed blob storage) Redis (distributed caching) cache save Neural network NLP Service create/ train
  • 22. Design Patterns Wrapper Pattern ● Wrap access to all external services (db, logging, etc..) ● Easy to swap in new versions ○ MySQL vs Postgres, etc. ● Easily add logging, performance, etc.. ● Insert custom logic to modify the behavior, eg ○ Manually shard/replicate DBs ○ Add new logging destinations ● This pattern has saved us countless hours of refactoring! Examples ● Local → Centralized logging & config with just a few lines of code. ● Migrating from NLTK to better (in-house) NLP ● Restricting user access to DB by transparently replacing tables with views.
  • 23. Performance ● Use native types correctly: ○ set vs list, iteritems vs items ● Pythonic code is almost always faster. ● Use %timeit to test code snippets everywhere ● Beware of hidden memory allocation ● For critical bottlenecks use: ○ Pythran (very easy but limited coverage) ○ Cython (harder, but more flexible) ● For ML use generators to stream data from disk/db (see gensim). ● Know your times
  • 24. Performance (II) Example Classifying (short) documents ● Initial rate: 2k docs/sec ● Initial Profiling: ○ Db read rate: 250k docs/s ○ Feature generation 2k docs/s ○ Classification 10k docs/s ● Feature generation involves for-loops & complex logic. Fixes ● Refactored feature generation: ○ Extract features in list comprehensions ○ Convert to vectors in Pythran code ● New times ○ Python part: 10k docs/s ○ Pythran: 40k docs/s ● Fix memory allocation in classifier: 50k docs/s ● New times: ~10k docs/s ● Going forward: cache features in Mongo?
  • 25. Further Reading... ● Radim Řehůřek (gensim): “Does Python Stand a Chance in Today’s World of Data Science” (https://youtu.be/jfbgt3KjWFQ) ● High Performance Python (O’Reilly) ○ “Lessons from the Field” (chapter 12) ● Fluent Python (O’Reilly) ● Latency Numbers Every Programmer Should Know https://gist.github.com/jboner/2841832
  • 26. Bye!
  • 27. Open Problems ● Cores vs RAM ○ Many models require lots of RAM (models can be Gbs in size). ○ Models read-only but because of python GIL hard to share memory between “processes/threads” ○ Use “sharedmem” module? ● Generating models from terabytes of data? ○ “Embedding models” can capture interesting information from huge amounts of data ○ Train very quickly (millions words/sec per server) ○ How can we distribute parameters/data between large number of workers?