Open Source 𝜆-Architecture for Deep Learning
Use case
Patrick R Nicolas
Oct. 2020
pnicolasai@yahoo.com
Overview
“… and the wise man said,
thou shall embrace open source”.
21st century proverb
Patrick R. Nicolas - Open Source 𝜆 -Architecture for Deep Learning
Overview
Overview
Layers
Open-source components
Overview
The world of data scientists accustomed to Python
scientific libraries has been shaken up by the
emergence of ‘big data’ frameworks such as Apache
Hadoop, Spark and Kafka.
This presentation introduces a variant of the
𝜆-architecture and describes the seamless integration of
various open source components to train, validate and
test deep learning models.
Disclaimer
The concept and architecture are versatile enough to
accommodate a variety of open source and commercial
solutions and services besides the frameworks
prescribed in this presentation.
For instance, deep learning frameworks such as Keras
or TensorFlow are excellent alternatives to PyTorch.
Requirements
A ‘big data’ framework should be able to:
• Process batch and stream data concurrently
• Enforce data immutability
• Recover gracefully from human errors
• Handle hardware failures
• Minimize latency for real-time requests
• Scale to very large data sets
• Optimize the full life cycle of data sets
• Guarantee the quality and integrity of data
Optimizing data life cycle
The need for optimizing the data life cycle: 79% of a data
scientist's time is spent collecting and organizing data.
Source: Quora
Data quality
Guaranteeing data quality and integrity
Accuracy: Correct models and representative data
Completeness: No missing data
Consistency: Applies to both semantics and format
Timeliness: Up-to-date data and notifications
Accessibility: Ease of use and high availability
Validity: Compliance with constraints, rules and regulations
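A few of these dimensions can be checked automatically. The sketch below is one possible way to test completeness, validity and timeliness on individual records. The field names, the [0, 1] value range and the one-day freshness window are assumptions chosen for illustration, not rules taken from the presentation.

```python
from datetime import datetime, timedelta

# Hypothetical schema: every record must carry these fields.
REQUIRED_FIELDS = ("id", "value", "updated_at")


def check_record(record, now, max_age=timedelta(days=1)):
    """Return the list of quality dimensions a record violates."""
    issues = []
    # Completeness: no missing or empty fields.
    if any(record.get(field) is None for field in REQUIRED_FIELDS):
        issues.append("completeness")
        return issues  # the remaining checks need every field
    # Validity: an assumed business rule, the value must lie in [0, 1].
    if not 0.0 <= record["value"] <= 1.0:
        issues.append("validity")
    # Timeliness: the record must have been updated recently enough.
    if now - record["updated_at"] > max_age:
        issues.append("timeliness")
    return issues


now = datetime(2020, 10, 1, 12, 0)
records = [
    {"id": 1, "value": 0.4, "updated_at": now - timedelta(hours=2)},
    {"id": 2, "value": None, "updated_at": now},
    {"id": 3, "value": 1.7, "updated_at": now - timedelta(days=3)},
]
report = {r["id"]: check_record(r, now) for r in records}
```

Here `report` maps each record id to the dimensions it fails, so a pipeline could reject or quarantine records before they reach training.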
Solution …
The 𝜆-architecture is a large-scale data processing
architecture that balances batch and real-time streamed data.
It is a one-stop shop for various data sources that
balances latency, redundancy, ease of access and
throughput.
It breaks down into 3 layers:
• Speed (streaming, real-time, …)
• Batch (training, analysis, …)
• Serving (query, visualization, …)
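The interplay between the three layers can be sketched in a few lines. This is a toy, in-memory illustration: the event log, the counting logic and all names (`master_log`, `SpeedView`, `serve`) are hypothetical, and a real deployment would put a distributed engine behind each layer.

```python
from collections import Counter

# Immutable master data set: events are only ever appended (hypothetical events).
master_log = [("model_a", 1), ("model_b", 1), ("model_a", 1)]


def build_batch_view(events):
    """Batch layer: recompute the complete view from all archived events."""
    view = Counter()
    for key, count in events:
        view[key] += count
    return view


class SpeedView:
    """Speed layer: incrementally tracks events not yet covered by a batch run."""

    def __init__(self):
        self.view = Counter()

    def on_event(self, key, count):
        self.view[key] += count


def serve(key, batch_view, speed_view):
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch_view[key] + speed_view.view[key]


batch_view = build_batch_view(master_log)
speed = SpeedView()
speed.on_event("model_a", 1)  # arrives after the last batch run
speed.on_event("model_c", 1)

total_a = serve("model_a", batch_view, speed)
total_c = serve("model_c", batch_view, speed)
```

The batch view is always rebuilt from the immutable log, which is what lets the architecture recover from a faulty computation: fix the code and recompute.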
… using open source
𝜆-architecture using open source components?
The task consists of reviewing and evaluating the trove
of available open source libraries to build a robust
architecture that supports the rigor of training and
tuning deep learning models.
The libraries are woven together through a set of
language-agnostic REST APIs to form a coherent pipeline.
… for deep learning
𝜆-architecture for deep learning?
• Python scientific libraries have been the go-to tools
for data scientists to analyze data and build models.
• The PyTorch framework builds upon these libraries to
support the design and execution of deep learning
models.
• Apache Spark and Kafka complement these
frameworks for very large data sets and real-time
processing.
Bird's-eye view
Diagram: example open source 𝜆-architecture
Feel overwhelmed?
... Let’s break it down
Layers
Batch layer
Batch layer objective: load batches of data to be distributed
and preprocessed to train deep learning models.
Typical use case:
1. Apache Spark loads the training set from Amazon S3
2. The Spark master partitions the training data
3. Spark workers preprocess the data and notify
completion through the Kafka event queue
4. PyTorch updates the model parameters from the
preprocessed training data
5. PyTorch broadcasts the model parameters and quality
metrics through Kafka
6. Apache Hive, powered by Spark, stores model-related
data and metrics
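The steps above can be sketched with the standard library only. This is a simulation of the data flow, not the actual integration: a `queue.Queue` stands in for the Kafka topic, plain functions stand in for the Spark master and workers, a one-parameter gradient-descent loop stands in for PyTorch training, and a list stands in for the Hive table. All names and the synthetic training set are assumptions.

```python
import queue

events = queue.Queue()  # in-memory stand-in for a Kafka topic


def partition(rows, num_partitions):
    """Step 2: split the training set into roughly equal partitions."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def preprocess(part_id, rows, x_scale):
    """Step 3: a worker scales the features, then notifies completion."""
    cleaned = [(x / x_scale, y) for x, y in rows]
    events.put({"type": "preprocessed", "partition": part_id})
    return cleaned


def train_step(weight, rows, lr=0.5):
    """Step 4: one gradient-descent update of y ~ weight * x (squared error)."""
    grad = sum(2 * (weight * x - y) * x for x, y in rows) / len(rows)
    return weight - lr * grad


# Step 1: hypothetical training set (in the deck it is loaded from Amazon S3).
training_set = [(float(x), 0.3 * x) for x in range(1, 11)]

parts = partition(training_set, num_partitions=2)
cleaned = [row for i, p in enumerate(parts) for row in preprocess(i, p, x_scale=10.0)]

weight = 0.0
for _ in range(200):
    weight = train_step(weight, cleaned)

# Step 5: broadcast the model parameter and a quality metric.
mse = sum((weight * x - y) ** 2 for x, y in cleaned) / len(cleaned)
events.put({"type": "model", "weight": weight, "mse": mse})

# Step 6: a plain list stands in for the Hive table of model metrics.
model_store = [{"weight": weight, "mse": mse}]
```

After scaling, the targets satisfy y = 3x, so the learned `weight` should approach 3. The event queue ends up with one completion notice per partition plus the model broadcast.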
Speed layer
Speed layer objective: process queries to predictive
models with very low latency.
Use case:
1. Kafka routes data streams to the Spark master
2. Spark preprocesses requests and forwards them to
the deep model micro-service
3. Flask converts each request into a prediction query to
the PyTorch model
4. The PyTorch model generates a prediction
5. Run-time metrics are broadcast through Kafka
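The request path above can be sketched as follows. This is a standard-library simulation under stated assumptions: a `queue.Queue` replaces the Kafka metrics topic, a plain function replaces the Flask endpoint, and a two-weight logistic function with made-up parameters replaces the PyTorch model.

```python
import json
import math
import queue
import time

metrics = queue.Queue()  # in-memory stand-in for a Kafka metrics topic

# Hypothetical parameters of an already trained logistic model.
WEIGHTS = [0.8, -0.4]
BIAS = 0.1


def preprocess(raw_request):
    """Step 2: parse the request and extract the feature vector."""
    payload = json.loads(raw_request)
    return [float(v) for v in payload["features"]]


def predict(features):
    """Step 4: the model generates a prediction (a probability in (0, 1))."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1.0 / (1.0 + math.exp(-z))


def handle(raw_request):
    """Step 3: the micro-service turns a request into a prediction query."""
    start = time.perf_counter()
    score = predict(preprocess(raw_request))
    # Step 5: publish run-time metrics.
    metrics.put({"latency_s": time.perf_counter() - start})
    return json.dumps({"prediction": score})


response = json.loads(handle('{"features": [1.0, 2.0]}'))
```

Measuring latency inside the handler and publishing it to the same event bus keeps the monitoring path decoupled from the prediction path.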
Serving layer
Serving layer objective: process queries to analyze data and
model performance, and to execute statistical inference.
Use case:
1. The analyst queries the relational database, MySQL, for
the most recent data and statistics using the FineReport UI
(low latency)
2. The analyst asynchronously queries the Hive data
warehouse for archived data and statistics (high latency)
3. Hive processes queries through Spark datasets
4. Spark regularly updates the short-term data in MySQL
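The split between a short-term store and an archive can be sketched as below. The stand-ins are assumptions: an in-memory SQLite database plays the role of MySQL, a Python list plays the role of the Hive warehouse, and the `metrics` table and its accuracy values are invented for illustration.

```python
import sqlite3

# Stand-in for the Hive warehouse: the full archive of (day, accuracy) rows.
archive = [(1, 0.81), (2, 0.84), (3, 0.86), (4, 0.88), (5, 0.90)]

# Stand-in for MySQL: an in-memory SQLite database holding short-term data.
short_term = sqlite3.connect(":memory:")
short_term.execute("CREATE TABLE metrics (day INTEGER PRIMARY KEY, accuracy REAL)")


def refresh_short_term(conn, rows, keep_last):
    """Step 4: periodically copy the most recent rows into the short-term store."""
    recent_rows = sorted(rows)[-keep_last:]
    with conn:  # commits the transaction on success
        conn.execute("DELETE FROM metrics")
        conn.executemany("INSERT INTO metrics VALUES (?, ?)", recent_rows)


def recent_average(conn):
    """Step 1: low-latency query against the short-term store."""
    return conn.execute("SELECT AVG(accuracy) FROM metrics").fetchone()[0]


def archived_average(rows):
    """Step 2: slower query over the whole archive."""
    return sum(acc for _, acc in rows) / len(rows)


refresh_short_term(short_term, archive, keep_last=2)
recent = recent_average(short_term)
overall = archived_average(archive)
```

Keeping only a small, recent window in the relational store is what lets dashboard queries stay fast while the full history remains available in the warehouse.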
Open-source components
PyTorch
PyTorch is an optimized tensor library for deep
learning using GPUs and CPUs.
It offers NumPy-like tensor operations and complements
scikit-learn to support the training, evaluation and
commercialization of complex machine learning
models.
https://pytorch.org/tutorials/
Alternatives:
TensorFlow: https://www.tensorflow.org/
Keras: https://keras.io
MXNet: https://mxnet.apache.org
Apache Spark
Apache Spark is an open source cluster computing
framework for fast, large-scale batch and near-real-time
processing.
It supports the Scala, Java, Python and R programming
languages and includes streaming, graph and machine
learning libraries.
https://www.scala-lang.org
https://spark.apache.org
Python API:
PySpark: https://databricks.com/glossary/pyspark
Streaming
Apache Kafka is an open-source distributed event
streaming framework for large-scale, real-time data
processing and analytics.
It captures data from various sources in real time as a
continuous flow and routes it to the appropriate
processor.
https://kafka.apache.org
Alternatives:
Amazon SQS: https://aws.amazon.com/sqs/
RabbitMQ: https://www.rabbitmq.com
Model tuning
Ray Tune is a distributed hyperparameter
tuning framework particularly suitable for deep learning
models.
It significantly reduces the cost of optimizing the
configuration of a model. It is a wrapper around other
open source libraries.
https://docs.ray.io/en/master/tune/index.html
Alternatives:
Amazon SageMaker: https://aws.amazon.com/sagemaker/
HyperOpt: https://github.com/hyperopt/hyperopt
Optuna: https://optuna.readthedocs.io
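To make the idea of hyperparameter tuning concrete, here is a minimal random search written with the standard library. It does not use the Ray Tune API: the objective function, the search space and every name below are hypothetical, and a tuning framework would add distributed execution, early stopping and smarter search strategies on top of this basic loop.

```python
import random


def validation_loss(lr, hidden_units):
    """Hypothetical objective standing in for 'train a model, return its loss'."""
    return (lr - 0.01) ** 2 * 1e4 + (hidden_units - 64) ** 2 / 1e3


def random_search(num_trials, seed=0):
    """Sample configurations at random and keep the best one."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for _ in range(num_trials):
        config = {
            "lr": 10 ** rng.uniform(-4, -1),  # log-uniform learning rate
            "hidden_units": rng.choice([16, 32, 64, 128]),
        }
        loss = validation_loss(**config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


best_config, best_loss = random_search(num_trials=50)
```

Each trial is independent of the others, which is why this kind of search parallelizes well across a cluster.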
Python REST service
Flask is an easy-to-use web framework for exposing
Python applications through a RESTful interface.
It works with most web and deployment standards and
tools, such as Docker, React.js, Angular, HTML5 and WSGI
containers.
https://palletsprojects.com/p/flask/
Alternatives:
Falcon: https://falcon.readthedocs.io
FastAPI: https://fastapi.tiangolo.com
RDBMS
MySQL is an open source relational database
supporting partitioning, sharding and replication. It can
be extended with real-time analytics (HeatWave)
and enterprise clustering (MySQL Cluster CGE).
https://www.mysql.com
Alternatives:
PostgreSQL: https://www.postgresql.org
HyperSQL: http://www.hsqldb.org
Amazon RDS: http://aws.amazon.com/rds
Data warehouse
Apache Hive is a data warehouse framework that
leverages Spark to execute SQL queries distributed
at large scale.
It optimizes SQL queries through lazy evaluation of
an acyclic execution graph. It is integrated with
Spark datasets and HDFS.
https://hive.apache.org
Alternatives:
Vertica: http://www.vertica.com
Amazon Redshift: https://aws.amazon.com/redshift/
Dashboard
FineReport is a business intelligence and
dashboard tool that supports real-time analytics,
reporting and visualization. It accommodates the needs
of business managers and data scientists.
https://www.finereport.com
Alternatives:
Sisense: https://www.sisense.com
Tableau: https://www.tableau.com
Final disclaimer
This presentation is not an endorsement of the various
tools, libraries or frameworks it describes or suggests.
Although the tools listed in the slides are known to work
in the context of the architecture, there are excellent
alternative libraries that may better meet your specific
needs.
Thank you!
Q&A
Ad

More Related Content

What's hot (20)

Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
DataArt
 
NoSql
NoSqlNoSql
NoSql
Girish Khanzode
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
Databricks
 
MLOps with Kubeflow
MLOps with Kubeflow MLOps with Kubeflow
MLOps with Kubeflow
Saurabh Kaushik
 
A Beginner's Guide to Large Language Models
A Beginner's Guide to Large Language ModelsA Beginner's Guide to Large Language Models
A Beginner's Guide to Large Language Models
Ajitesh Kumar
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
Bethmi Gunasekara
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
MariaDB plc
 
XSEDE and National Cyberinfrastructure
XSEDE and National CyberinfrastructureXSEDE and National Cyberinfrastructure
XSEDE and National Cyberinfrastructure
John Towns
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
Sigmoid
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
DataScienceConferenc1
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
Navdeep Charan
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
ateeq ateeq
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
Databricks
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
Home
 
SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience
aniadkar
 
What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu...
What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu...What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu...
What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu...
Edureka!
 
5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database
ScyllaDB
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
DataArt
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
Databricks
 
A Beginner's Guide to Large Language Models
A Beginner's Guide to Large Language ModelsA Beginner's Guide to Large Language Models
A Beginner's Guide to Large Language Models
Ajitesh Kumar
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
Bethmi Gunasekara
 
Faster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDBFaster, better, stronger: The new InnoDB
Faster, better, stronger: The new InnoDB
MariaDB plc
 
XSEDE and National Cyberinfrastructure
XSEDE and National CyberinfrastructureXSEDE and National Cyberinfrastructure
XSEDE and National Cyberinfrastructure
John Towns
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
Sigmoid
 
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
[DSC Europe 22] Overview of the Databricks Platform - Petar Zecevic
DataScienceConferenc1
 
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
What Is Apache Spark? | Introduction To Apache Spark | Apache Spark Tutorial ...
Simplilearn
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
Navdeep Charan
 
Using Databricks as an Analysis Platform
Using Databricks as an Analysis PlatformUsing Databricks as an Analysis Platform
Using Databricks as an Analysis Platform
Databricks
 
Apache Spark Fundamentals
Apache Spark FundamentalsApache Spark Fundamentals
Apache Spark Fundamentals
Zahra Eskandari
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
James Serra
 
Introduction to spark
Introduction to sparkIntroduction to spark
Introduction to spark
Home
 
SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience SUN Network File system - Design, Implementation and Experience
SUN Network File system - Design, Implementation and Experience
aniadkar
 
What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu...
What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu...What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu...
What is Talend | Talend Tutorial for Beginners | Talend Online Training | Edu...
Edureka!
 
5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database
ScyllaDB
 

Similar to Open Source Lambda Architecture for deep learning (20)

04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
Marco Quartulli
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
Paco Nathan
 
Data Engineering A Deep Dive into Databricks
Data Engineering A Deep Dive into DatabricksData Engineering A Deep Dive into Databricks
Data Engineering A Deep Dive into Databricks
Knoldus Inc.
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
Paco Nathan
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Deep learning and Apache Spark
Deep learning and Apache SparkDeep learning and Apache Spark
Deep learning and Apache Spark
QuantUniversity
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann
 
Started with-apache-spark
Started with-apache-sparkStarted with-apache-spark
Started with-apache-spark
Happiest Minds Technologies
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_Veriticals
Peyman Mohajerian
 
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsfPyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
sasuke20y4sh
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
Dan Eaton
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
160606 data lifecycle project outline
160606 data lifecycle project outline160606 data lifecycle project outline
160606 data lifecycle project outline
Ian Duncan
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
Databricks
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
DataStax Academy
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
Databricks
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Luciano Resende
 
Databricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User GroupDatabricks Meetup @ Los Angeles Apache Spark User Group
Databricks Meetup @ Los Angeles Apache Spark User Group
Paco Nathan
 
Data Engineering A Deep Dive into Databricks
Data Engineering A Deep Dive into DatabricksData Engineering A Deep Dive into Databricks
Data Engineering A Deep Dive into Databricks
Knoldus Inc.
 
Strata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case StudiesStrata EU 2014: Spark Streaming Case Studies
Strata EU 2014: Spark Streaming Case Studies
Paco Nathan
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Deep learning and Apache Spark
Deep learning and Apache SparkDeep learning and Apache Spark
Deep learning and Apache Spark
QuantUniversity
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Jason Dai
 
ApacheCon 2021 Apache Deep Learning 302
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann
 
Media_Entertainment_Veriticals
Media_Entertainment_VeriticalsMedia_Entertainment_Veriticals
Media_Entertainment_Veriticals
Peyman Mohajerian
 
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsfPyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
Pyspark presentationsfspfsjfspfjsfpsjfspfjsfpsjfsfsf
sasuke20y4sh
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
Dan Eaton
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
eRic Choo
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Joe Stein
 
160606 data lifecycle project outline
160606 data lifecycle project outline160606 data lifecycle project outline
160606 data lifecycle project outline
Ian Duncan
 
Strata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache SparkStrata NYC 2015 - Supercharging R with Apache Spark
Strata NYC 2015 - Supercharging R with Apache Spark
Databricks
 
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
Big Data Open Source Security LLC: Realtime log analysis with Mesos, Docker, ...
DataStax Academy
 
Enabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and REnabling exploratory data science with Spark and R
Enabling exploratory data science with Spark and R
Databricks
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Michael Rys
 
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache BahirWriting Apache Spark and Apache Flink Applications Using Apache Bahir
Writing Apache Spark and Apache Flink Applications Using Apache Bahir
Luciano Resende
 
Ad

More from Patrick Nicolas (12)

Autonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformersAutonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformers
Patrick Nicolas
 
AI for electronic health records
AI for electronic health recordsAI for electronic health records
AI for electronic health records
Patrick Nicolas
 
Monadic genetic kernels in Scala
Monadic genetic kernels in ScalaMonadic genetic kernels in Scala
Monadic genetic kernels in Scala
Patrick Nicolas
 
Scala for Machine Learning
Scala for Machine LearningScala for Machine Learning
Scala for Machine Learning
Patrick Nicolas
 
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Stock Market Prediction using Hidden Markov Models and Investor sentimentStock Market Prediction using Hidden Markov Models and Investor sentiment
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Patrick Nicolas
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
Patrick Nicolas
 
Adaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning ClassifiersAdaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning Classifiers
Patrick Nicolas
 
Data Modeling using Symbolic Regression
Data Modeling using Symbolic RegressionData Modeling using Symbolic Regression
Data Modeling using Symbolic Regression
Patrick Nicolas
 
Semantic Analysis using Wikipedia Taxonomy
Semantic Analysis using Wikipedia TaxonomySemantic Analysis using Wikipedia Taxonomy
Semantic Analysis using Wikipedia Taxonomy
Patrick Nicolas
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Patrick Nicolas
 
Taxonomy-based Contextual Ads Targeting
Taxonomy-based Contextual Ads TargetingTaxonomy-based Contextual Ads Targeting
Taxonomy-based Contextual Ads Targeting
Patrick Nicolas
 
Multi-tenancy in Private Clouds
Multi-tenancy in Private CloudsMulti-tenancy in Private Clouds
Multi-tenancy in Private Clouds
Patrick Nicolas
 
Autonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformersAutonomous medical coding with discriminative transformers
Autonomous medical coding with discriminative transformers
Patrick Nicolas
 
AI for electronic health records
AI for electronic health recordsAI for electronic health records
AI for electronic health records
Patrick Nicolas
 
Monadic genetic kernels in Scala
Monadic genetic kernels in ScalaMonadic genetic kernels in Scala
Monadic genetic kernels in Scala
Patrick Nicolas
 
Scala for Machine Learning
Scala for Machine LearningScala for Machine Learning
Scala for Machine Learning
Patrick Nicolas
 
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Stock Market Prediction using Hidden Markov Models and Investor sentimentStock Market Prediction using Hidden Markov Models and Investor sentiment
Stock Market Prediction using Hidden Markov Models and Investor sentiment
Patrick Nicolas
 
Advanced Functional Programming in Scala
Advanced Functional Programming in ScalaAdvanced Functional Programming in Scala
Advanced Functional Programming in Scala
Patrick Nicolas
 
Adaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning ClassifiersAdaptive Intrusion Detection Using Learning Classifiers
Adaptive Intrusion Detection Using Learning Classifiers
Patrick Nicolas
 
Data Modeling using Symbolic Regression
Data Modeling using Symbolic RegressionData Modeling using Symbolic Regression
Data Modeling using Symbolic Regression
Patrick Nicolas
 
Semantic Analysis using Wikipedia Taxonomy
Semantic Analysis using Wikipedia TaxonomySemantic Analysis using Wikipedia Taxonomy
Semantic Analysis using Wikipedia Taxonomy
Patrick Nicolas
 
Taxonomy-based Contextual Ads Targeting
Taxonomy-based Contextual Ads TargetingTaxonomy-based Contextual Ads Targeting
Taxonomy-based Contextual Ads Targeting
Patrick Nicolas
 
Multi-tenancy in Private Clouds
Multi-tenancy in Private CloudsMulti-tenancy in Private Clouds
Multi-tenancy in Private Clouds
Patrick Nicolas
 

Open Source Lambda Architecture for deep learning

  • 1. Open Source 𝜆-Architecture for Deep Learning: Use case. Patrick R Nicolas, Oct. 2020, pnicolasai@yahoo.com
  • 2. Overview: “… and the wise man said, thou shall embrace open source”. 21st century proverb. Patrick R. Nicolas - Open Source 𝜆 -Architecture for Deep Learning
  • 3. Overview (agenda): Overview, Layers, Open-source components
  • 4. Overview: The world of data scientists accustomed to Python scientific libraries has been shaken up by the emergence of ‘big data’ frameworks such as Apache Hadoop, Spark and Kafka. This presentation introduces a variant of the 𝜆 architecture and describes the seamless integration of various open source components to train, validate and test deep learning models.
  • 5. Disclaimer: The concept and architecture are versatile enough to accommodate a variety of open source and commercial solutions and services besides the frameworks prescribed in this presentation. For instance, deep learning frameworks such as Keras or TensorFlow are excellent alternatives to PyTorch.
  • 6. Requirements: A ‘big data’ framework should be able to
      • Process batch and stream data concurrently
      • Enforce data immutability
      • Recover gracefully from human errors
      • Handle hardware failures
      • Minimize latency for real-time requests
      • Scale to very large data sets
      • Optimize the full life cycle of data sets
      • Guarantee quality and integrity of data
  • 7. Optimizing data life cycle: The need for optimizing the data life cycle: 79% of data scientist time is spent collecting and organizing data. Source: Quora
  • 8. Data quality: Guaranteeing data quality and integrity
      • Accuracy: Correct models and representative data
      • Completeness: No missing data
      • Consistency: Applied to semantics and format
      • Timeliness: Up-to-date data and notification
      • Accessibility: Ease of use and high availability
      • Validity: Comply with constraints, rules and regulations
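Two of the criteria above, completeness and validity, lend themselves to automated checks. The following is a minimal sketch under stated assumptions, not part of the original slides: the record layout, the `quality_report` function and the example rule are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[float]]


def quality_report(records: List[Record],
                   required: List[str],
                   rules: Dict[str, Callable[[float], bool]]) -> Dict[str, int]:
    """Count records that fail completeness or validity (hypothetical helper)."""
    incomplete = 0
    invalid = 0
    for rec in records:
        # Completeness: every required field must be present and not None.
        if any(rec.get(field) is None for field in required):
            incomplete += 1
            continue
        # Validity: every rule must hold for the field it constrains.
        for field, rule in rules.items():
            value = rec.get(field)
            if value is not None and not rule(value):
                invalid += 1
                break
    return {"total": len(records), "incomplete": incomplete, "invalid": invalid}


records = [
    {"age": 34.0, "score": 0.7},
    {"age": None, "score": 0.2},   # missing value: incomplete
    {"age": 41.0, "score": 1.8},   # score outside [0, 1]: invalid
]
report = quality_report(records, ["age", "score"],
                        {"score": lambda s: 0.0 <= s <= 1.0})
print(report)
```

In a real pipeline such checks would more plausibly run inside the preprocessing stage on distributed data; the plain-Python version only illustrates the two criteria.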
  • 9. Solution …: The 𝜆-architecture is a large-scale data processing architecture that balances batch and real-time streamed data. It is a one-stop shop for various data sources that balances latency, redundancy, ease of access and throughput. It breaks down into 3 layers:
      • Speed (streaming, real-time, …)
      • Batch (training, analysis, …)
      • Serving (query, visualization, …)
  • 10. … using open source: 𝜆 architecture using open source components? The task consists of reviewing and evaluating the trove of available open source libraries to build a robust architecture that supports the rigor of training and tuning deep learning models. The libraries are woven together through a set of language-agnostic REST APIs to form a coherent pipeline.
  • 11. … for deep learning: 𝜆 architecture for deep learning?
      • Python scientific libraries have been the go-to tools for data scientists to analyze data and build models.
      • The PyTorch framework builds upon these libraries to support the design and execution of deep learning models.
      • Apache Spark and Kafka complement these frameworks for very large data sets and real-time processing.
  • 12. Bird's-eye view: Example open source 𝜆 architecture. Feel overwhelmed? ... Let’s break it down
  • 13. Layers (agenda): Overview, Layers, Open-source components
  • 14. Batch layer: Objective: load batches of data to be distributed and preprocessed to train deep learning models.
  • 15. Batch layer: Typical use case:
      1. Apache Spark loads the training set from Amazon S3
      2. The Spark master partitions the training data
      3. Spark workers preprocess the data and notify completion through the Kafka event queue
      4. PyTorch updates model parameters from the pre-processed training data
      5. PyTorch broadcasts model parameters and quality metrics through Kafka
      6. Apache Hive, powered by Spark, stores model-related data and metrics
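The batch-layer steps above can be sketched in plain Python. This is an illustration under loud assumptions, not the presentation's implementation: `queue.Queue` stands in for the Kafka event queue, list slicing stands in for Spark partitioning, and a hand-written gradient-descent fit of a one-variable linear model stands in for PyTorch training. Loading from Amazon S3 and storing to Hive are omitted, and every function name is hypothetical.

```python
import queue
from typing import List, Tuple

Sample = Tuple[float, float]  # (feature, label)


def partition(data: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Master step: split the training set into one slice per worker."""
    return [data[i::num_workers] for i in range(num_workers)]


def preprocess(part: List[Sample], scale: float) -> List[Sample]:
    """Worker step: divide every feature by a shared scaling constant."""
    return [(x / scale, y) for x, y in part]


def train(data: List[Sample], epochs: int = 500, lr: float = 0.5) -> Tuple[float, float]:
    """Training step (stand-in for PyTorch): fit y = w * x + b by gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def run_batch_layer(raw: List[Sample], num_workers: int = 2) -> queue.Queue:
    """Run partition -> preprocess -> notify -> train -> broadcast."""
    events: queue.Queue = queue.Queue()  # stand-in for the Kafka event queue
    scale = max(x for x, _ in raw)
    processed: List[Sample] = []
    for worker_id, part in enumerate(partition(raw, num_workers)):
        processed.extend(preprocess(part, scale))
        events.put({"type": "preprocessed", "worker": worker_id})
    w, b = train(processed)
    mse = sum((w * x + b - y) ** 2 for x, y in processed) / len(processed)
    events.put({"type": "model", "params": (w, b), "mse": mse})
    return events


raw = [(float(x), 3.0 * x + 1.0) for x in range(1, 11)]
events = run_batch_layer(raw, num_workers=2)
items = []
while not events.empty():
    items.append(events.get())
print([e["type"] for e in items])
```

Each worker publishes a completion event before training starts, and the trained parameters and quality metric travel through the same queue, mirroring steps 3 and 5.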
  • 16. Speed layer: Objective: process queries to predictive models with very low latency.
  • 17. Speed layer: Use case:
      1. Kafka routes data streams to the Spark master
      2. Spark pre-processes requests and forwards them to the deep model micro-service
      3. Flask converts requests into prediction queries to the PyTorch model
      4. The PyTorch model generates a prediction
      5. Run-time metrics are broadcast through Kafka
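Steps 2 to 5 of the speed layer can be sketched as a single request handler. This is a minimal sketch, assuming a JSON payload with a `features` field (the slides do not specify a request format). It deliberately avoids the Flask and Kafka APIs: the handler is a plain function that a web route could call, the model is any callable, and `publish` is a hypothetical callback standing in for a Kafka producer.

```python
import json
from typing import Callable, Dict, List


def preprocess_request(raw: bytes) -> List[float]:
    """Pre-processing step: parse the payload and coerce features to floats."""
    payload = json.loads(raw)
    return [float(v) for v in payload["features"]]


def make_predict_handler(model: Callable[[List[float]], float],
                         publish: Callable[[Dict], None]) -> Callable[[bytes], str]:
    """Build a handler that turns a raw request into a JSON prediction."""
    def handle(raw: bytes) -> str:
        features = preprocess_request(raw)
        prediction = model(features)  # stand-in for the PyTorch model call
        # Run-time metrics are broadcast after every prediction.
        publish({"event": "prediction", "num_features": len(features)})
        return json.dumps({"prediction": prediction})
    return handle


metrics: List[Dict] = []
handler = make_predict_handler(lambda feats: sum(feats), metrics.append)
response = handler(b'{"features": [1, 2, 3]}')
print(response)
```

Passing the model and the publisher in as arguments keeps the handler independent of any particular framework, so it can be exercised without a running service.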
  • 18. Serving layer: Objective: process queries to analyze data and model performance, and to execute statistical inference.
  • 19. Serving layer: Use case:
      1. The analyst queries the relational database, MySQL, for the most recent data and statistics using the Fine report UI (low latency)
      2. The analyst asynchronously queries the Hive data warehouse for archived data and statistics (high latency)
      3. Hive processes queries through Spark datasets
      4. Spark regularly updates the MySQL short-term data
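The split between a low-latency store for recent data and a warehouse for archived data can be sketched as a small query router. This is an illustration only: two dictionaries stand in for MySQL and Hive, the seven-day window is an arbitrary assumption, and the class and method names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Tuple


class ServingLayer:
    """Route queries to a fast recent store or to the archive by date."""

    def __init__(self, today: date, recent_days: int = 7) -> None:
        self.cutoff = today - timedelta(days=recent_days)
        self.warehouse: Dict[date, float] = {}  # stand-in for the Hive warehouse
        self.recent: Dict[date, float] = {}     # stand-in for MySQL

    def archive(self, day: date, metric: float) -> None:
        """Store a daily metric in the warehouse."""
        self.warehouse[day] = metric

    def refresh_recent(self) -> None:
        """Periodic job: copy short-term data into the fast store."""
        self.recent = {d: m for d, m in self.warehouse.items() if d >= self.cutoff}

    def query(self, day: date) -> Tuple[float, str]:
        """Return the metric and the name of the store that served it."""
        if day >= self.cutoff:
            return self.recent[day], "recent"
        return self.warehouse[day], "warehouse"


layer = ServingLayer(today=date(2020, 10, 15), recent_days=7)
layer.archive(date(2020, 10, 14), 0.91)
layer.archive(date(2020, 9, 1), 0.84)
layer.refresh_recent()
print(layer.query(date(2020, 10, 14)))
print(layer.query(date(2020, 9, 1)))
```

The `refresh_recent` job plays the role of step 4: the recent store is only as fresh as its last refresh, which is the usual trade-off of this layout.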
  • 20. Overview (agenda): Overview, Layers, Open-source components
  • 21. PyTorch: PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. It extends the functionality of NumPy and scikit-learn to support the training, evaluation and commercialization of complex machine learning models. https://pytorch.org/tutorials/
      Alternatives:
      • TensorFlow: https://www.tensorflow.org/
      • Keras: https://keras.io
      • MXNet: https://mxnet.apache.org
  • 22. Apache Spark: Apache Spark is an open source cluster computing framework for fast real-time processing. It supports the Scala, Java, Python and R programming languages and includes streaming, graph and machine learning libraries. https://www.scala-lang.org https://spark.apache.org
      Alternative:
      • PySpark: https://databricks.com/glossary/pyspark
  • 23. Streaming: Apache Kafka is an open-source distributed event streaming framework for large-scale, real-time data processing and analytics. It captures data from various sources in real time as a continuous flow and routes it to the appropriate processor. https://kafka.apache.org
      Alternatives:
      • Amazon SQS: https://aws.amazon.com/sqs/
      • RabbitMQ: https://www.rabbitmq.com
  • 24. Model tuning: Ray Tune is a distributed hyperparameter tuning framework particularly suitable for deep learning models. It significantly reduces the cost of optimizing the configuration of a model. It is a wrapper around other open source libraries. https://docs.ray.io/en/master/tune/index.html
      Alternatives:
      • Amazon SageMaker: https://aws.amazon.com/sagemaker/
      • HyperOpt: https://github.com/hyperopt/hyperopt
      • Optuna: https://optuna.readthedocs.io
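To make concrete what a tuning framework automates, here is a minimal sketch of the simplest strategy, an exhaustive grid search over a small configuration space. It does not use the Ray Tune API, and it runs sequentially rather than distributed; the `grid_search` function, the search space and the toy loss are all hypothetical stand-ins for a real training-and-validation run.

```python
import itertools
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def grid_search(objective: Callable[[Config], float],
                space: Dict[str, List[float]]) -> Tuple[Config, float]:
    """Evaluate every combination in the space and keep the lowest loss."""
    names = list(space)
    best_config: Config = {}
    best_loss = float("inf")
    for values in itertools.product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        loss = objective(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


def toy_loss(config: Config) -> float:
    """Hypothetical validation loss with its minimum at lr=0.01, hidden=32."""
    return abs(config["lr"] - 0.01) + abs(config["hidden"] - 32) / 100


best, loss = grid_search(toy_loss, {"lr": [0.001, 0.01, 0.1],
                                    "hidden": [16, 32, 64]})
print(best, loss)
```

The number of trials grows multiplicatively with each added hyperparameter, which is the cost that distributed execution and smarter search strategies aim to reduce.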
  • 25. Python REST service: Flask is an easy-to-use implementation of the RESTful interface to Python applications. It supports most web and deployment standards such as Docker, React.js, Angular, HTML5 and WSGI containers. https://palletsprojects.com/p/flask/
      Alternatives:
      • Falcon: https://falcon.readthedocs.io
      • FastAPI: https://fastapi.tiangolo.com
  • 26. RDBMS: MySQL is an open source relational database supporting partitioning, sharding and replication. It can be extended with real-time analytics (HeatWave) and enterprise clustering (CGE). https://www.mysql.com
      Alternatives:
      • PostgreSQL: https://www.postgresql.org
      • HyperSQL: http://www.hsqldb.org
      • Amazon RDS: http://aws.amazon.com/rds
  • 27. Data warehouse: Apache Hive is a data warehouse framework that leverages Spark to execute large-scale distributed SQL queries. It optimizes SQL queries through lazy evaluation of an acyclic execution graph. It is integrated with Spark datasets and HDFS. https://hive.apache.org
      Alternatives:
      • Vertica: http://www.vertica.com
      • Amazon Redshift: https://aws.amazon.com/redshift/
  • 28. Dashboard: Fine report is a business intelligence and dashboard tool that supports real-time analytics, reporting and visualization. It accommodates the needs of business managers and data scientists. https://www.finereport.com
      Alternatives:
      • Sisense: https://www.sisense.com
      • Tableau: https://www.tableau.com
  • 29. Final disclaimer: This presentation is not an endorsement of the various tools, libraries or frameworks it describes or suggests. Although the tools listed in the slides are known to work in the context of the architecture, there are excellent alternative libraries that may better meet your specific needs.
  • 30. Thank you! Q&A