SlideShare a Scribd company logo
An Introduction to Machine Learning
with Python and scikit-learn
Julien Simon
Global Evangelist, AI and Machine Learning, AWS
@julsimon
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Agenda
• Machine Learning in 5 minutes
• Scikit-learn
• Algos & Demos: Linear Regression, Logistic Regression,
Decision Trees, K-Means, Principal Component Analysis
• Scikit-learn on Amazon SageMaker
• Resources
Machine Learning in 5 minutes
Artificial Intelligence: design software applications which
exhibit human-like behavior, e.g. speech, natural language
processing, reasoning or intuition
Machine Learning: using statistical algorithms, teach
machines to learn from featurized data without being
explicitly programmed
Deep Learning: using neural networks, teach machines to
learn from complex data where features cannot be explicitly
expressed
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Supervised learning
• Run an algorithm on a labeled data set.
• The model learns how to correctly predict the right answer.
• Regression and classification are examples of supervised learning.
Unsupervised learning
• Run an algorithm on an unlabeled data set.
• The model learns patterns and organizes samples accordingly.
• Clustering and topic modeling are examples of unsupervised learning.
Types of Machine Learning
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Data Visualization &
Analysis
Business Problem
ML problem framing Data Collection
Data Integration
Data Preparation &
Cleaning
Feature Engineering
Model Training &
Parameter Tuning
Model Evaluation
Are Business
Goals met?
Model Deployment
Monitoring &
Debugging
YesNo
DataAugmentation
Feature
Augmentation
The Machine Learning Process
Re-training
Predictions
Scikit-learn
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Scikit-learn
• Open Source library in Python released in February 2010
• Built on NumPy, SciPy, and matplotlib
• Simple tools for data analysis and Machine Learning
• Excellent collection of algorithms
• Very good documentation, tons of tutorials
• Limited scalability for data sets that don’t fit in RAM
• Not appropriate for Deep Learning (no GPU support)
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
http://scikit-learn.org/stable/tutorial/machine_learning_map/
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Linear Regression
https://en.wikipedia.org/wiki/Linear_regression
• Supervised learning algorithm
• Goal: fit data to a linear function in order to predict numerical values
• Data set: features + target (scalar or scalar vector)
• 1 feature à line, 2 features à plane, etc.
• Intuition: minimize the “distance” between data points and the linear
function
This can also be used for binary
classification: a sample is either “above”
or “below” the linear function
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Logistic Regression (1958)
https://en.wikipedia.org/wiki/Logistic_regression
• Supervised learning algorithm
• Goal: fit data to a linear function in order to predict the class of a sample
• Data set : features + binary label (yes/no, true/false, etc.)
• This algorithm can be extended to more than two classes
• Intuition: find a function computing a score between 0 and 1
and set a threshold separating both classes
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Decision Trees
https://en.wikipedia.org/wiki/Decision_tree
• Supervised learning algorithm
• Goal: build a decision tree for regression
or classification
• Data set : features + target value / class
• Intuition: find the “best” feature
thresholds to go left or right
• “Easy” to interpret, but prone to
overfitting
• Plenty of advanced variants with multiple
trees: Random Forests, XGBoost (2016),
etc.
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
K-means (1957)
https://en.wikipedia.org/wiki/K-means_clustering
• Unsupervised learning algorithm
• Goal: group samples in ‘k’ clusters
• Data set : features only
• Intuition: find ‘k’ cluster centers that
minimize the “distance” to their
respective samples
• This assumes “spherical” clusters of
similar ”radius”: maybe, maybe not!
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Principal Component Analysis aka PCA (1901!)
https://en.wikipedia.org/wiki/Principal_component_analysis
• Unsupervised learning algorithm
• Goal: build a new data set with a smaller number
of uncorrelated features (aka Dimensionality Reduction)
…keeping as much variance as the number of new features will allow
• Data set : features only
• Sample use cases:
• Visualize high-dimension data sets in 2D or 3D
• Remove correlation in high-dimension datasets
• Preliminary step to building linear models
Demos
- Linear Regression
- Logistic Regression
- Decision Trees
- K-Means
- PCA
- PCA + Logistic Regression on MNIST
https://gitlab.com/juliensimon/aws --> ML/scikit
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Scaling scikit-learn
• Scikit-learn runs on a single machine, loading the full data
set in memory
• Scaling options are quite limited
http://scikit-learn.org/stable/modules/computing.html
• Some algorithms can leverage multi-core (joblib)
• Some algorithms support incremental training
• Amazon SageMaker can help
• Use ML-optimized multi-core instances (C5)
• Use pipe mode, i.e. the ability to stream data from Amazon S3
© 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved.
Beyond scikit-learn
• Amazon SageMaker
• Train models on fully-managed infrastructure at any scale
• Built-in algorithms (17) for regression, classification, etc.
• Built-in environments for scikit-learn and Deep Learning librairies
• Apache Spark MLLib
• Available in Amazon Elastic Map Reduce (EMR)
• Distributed processing by design
• Nice collection of Machine Learning algorithms
• Seamless integration with Amazon SageMaker (Scala / PySpark SDK)
Resources
https://ml.aws
https://aws.amazon.com/sagemaker
https://scikit-learn.org
https://www.numpy.org
https://machinelearningmastery.com
https://medium.com/@julsimon
https://gitlab.com/juliensimon/aws
Thank you!
Julien Simon
Global Evangelist, AI and Machine Learning, AWS
@julsimon
Ad

More Related Content

What's hot (20)

Machine Learning with Apache Spark
Machine Learning with Apache SparkMachine Learning with Apache Spark
Machine Learning with Apache Spark
IBM Cloud Data Services
 
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle ManagementMLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
Databricks
 
CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...
CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...
CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...
Databricks
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Databricks
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Databricks
 
Agile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsAgile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender Systems
Johann Schleier-Smith
 
Advanced Hyperparameter Optimization for Deep Learning with MLflow
Advanced Hyperparameter Optimization for Deep Learning with MLflowAdvanced Hyperparameter Optimization for Deep Learning with MLflow
Advanced Hyperparameter Optimization for Deep Learning with MLflow
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Databricks
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
Databricks
 
Bootstrapping of PySpark Models for Factorial A/B Tests
Bootstrapping of PySpark Models for Factorial A/B TestsBootstrapping of PySpark Models for Factorial A/B Tests
Bootstrapping of PySpark Models for Factorial A/B Tests
Databricks
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at Pinterest
Alluxio, Inc.
 
Automate your Machine Learning
Automate your Machine LearningAutomate your Machine Learning
Automate your Machine Learning
Ajit Ananthram
 
Architecting for Data Science
Architecting for Data ScienceArchitecting for Data Science
Architecting for Data Science
Johann Schleier-Smith
 
Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Introducing apache prediction io (incubating) (bay area spark meetup at sales...Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Databricks
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Fred Madrid
 
NLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated TrainingNLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated Training
Databricks
 
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Databricks
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith
 
Azure Machine Learning
Azure Machine LearningAzure Machine Learning
Azure Machine Learning
Dmitry Petukhov
 
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle ManagementMLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
MLflow and Azure Machine Learning—The Power Couple for ML Lifecycle Management
Databricks
 
CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...
CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...
CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache S...
Databricks
 
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Analytics Zoo: Building Analytics and AI Pipeline for Apache Spark and BigDL ...
Databricks
 
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Real-Time Fraud Detection at Scale—Integrating Real-Time Deep-Link Graph Anal...
Databricks
 
Agile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender SystemsAgile Machine Learning for Real-time Recommender Systems
Agile Machine Learning for Real-time Recommender Systems
Johann Schleier-Smith
 
Advanced Hyperparameter Optimization for Deep Learning with MLflow
Advanced Hyperparameter Optimization for Deep Learning with MLflowAdvanced Hyperparameter Optimization for Deep Learning with MLflow
Advanced Hyperparameter Optimization for Deep Learning with MLflow
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce SettingApplied Machine Learning for Ranking Products in an Ecommerce Setting
Applied Machine Learning for Ranking Products in an Ecommerce Setting
Databricks
 
Auto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine LearningAuto-Pilot for Apache Spark Using Machine Learning
Auto-Pilot for Apache Spark Using Machine Learning
Databricks
 
Bootstrapping of PySpark Models for Factorial A/B Tests
Bootstrapping of PySpark Models for Factorial A/B TestsBootstrapping of PySpark Models for Factorial A/B Tests
Bootstrapping of PySpark Models for Factorial A/B Tests
Databricks
 
Pinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at PinterestPinterest - Big Data Machine Learning Platform at Pinterest
Pinterest - Big Data Machine Learning Platform at Pinterest
Alluxio, Inc.
 
Automate your Machine Learning
Automate your Machine LearningAutomate your Machine Learning
Automate your Machine Learning
Ajit Ananthram
 
Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Introducing apache prediction io (incubating) (bay area spark meetup at sales...Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Introducing apache prediction io (incubating) (bay area spark meetup at sales...
Databricks
 
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4jTransforming AI with Graphs: Real World Examples using Spark and Neo4j
Transforming AI with Graphs: Real World Examples using Spark and Neo4j
Fred Madrid
 
NLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated TrainingNLP Text Recommendation System Journey to Automated Training
NLP Text Recommendation System Journey to Automated Training
Databricks
 
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Databricks
 
An Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time ApplicationsAn Architecture for Agile Machine Learning in Real-Time Applications
An Architecture for Agile Machine Learning in Real-Time Applications
Johann Schleier-Smith
 

Similar to 16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning with Python and Scikit-Learn (17)

An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)
Julien SIMON
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopAWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
Julien SIMON
 
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
Amazon Web Services Korea
 
Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)
Julien SIMON
 
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Codiax
 
Practical data science
Practical data sciencePractical data science
Practical data science
Ding Li
 
Hands-on - Machine Learning using scikitLearn
Hands-on - Machine Learning using scikitLearnHands-on - Machine Learning using scikitLearn
Hands-on - Machine Learning using scikitLearn
avrtraining021
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
QuantUniversity
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Databricks
 
Elegant database tuning
Elegant database tuningElegant database tuning
Elegant database tuning
Scott Mead
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
Alok Singh
 
Apache MXNet and Gluon
Apache MXNet and GluonApache MXNet and Gluon
Apache MXNet and Gluon
Soji Adeshina
 
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUGIntroducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Sandesh Rao
 
An Introduction to Amazon SageMaker (October 2018)
An Introduction to Amazon SageMaker (October 2018)An Introduction to Amazon SageMaker (October 2018)
An Introduction to Amazon SageMaker (October 2018)
Julien SIMON
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshop
Julien SIMON
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
SrushtiSuvarna
 
An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)An introduction to Machine Learning with scikit-learn (October 2018)
An introduction to Machine Learning with scikit-learn (October 2018)
Julien SIMON
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker WorkshopAWS re:Invent 2018 - ENT321 - SageMaker Workshop
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
Julien SIMON
 
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
[AWS Innovate 온라인 컨퍼런스] 간단한 Python 코드만으로 높은 성능의 기계 학습 모델 만들기 - 김무현, AWS Sr.데이...
Amazon Web Services Korea
 
Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)Amazon SageMaker (December 2018)
Amazon SageMaker (December 2018)
Julien SIMON
 
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Codiax
 
Practical data science
Practical data sciencePractical data science
Practical data science
Ding Li
 
Hands-on - Machine Learning using scikitLearn
Hands-on - Machine Learning using scikitLearnHands-on - Machine Learning using scikitLearn
Hands-on - Machine Learning using scikitLearn
avrtraining021
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
QuantUniversity
 
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
AI as a Service, Build Shared AI Service Platforms Based on Deep Learning Tec...
Databricks
 
Elegant database tuning
Elegant database tuningElegant database tuning
Elegant database tuning
Scott Mead
 
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
ODSC18, London, How to build high performing weighted XGBoost ML Model for Re...
Alok Singh
 
Apache MXNet and Gluon
Apache MXNet and GluonApache MXNet and Gluon
Apache MXNet and Gluon
Soji Adeshina
 
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUGIntroducing new AIOps innovations in Oracle 19c - San Jose AICUG
Introducing new AIOps innovations in Oracle 19c - San Jose AICUG
Sandesh Rao
 
An Introduction to Amazon SageMaker (October 2018)
An Introduction to Amazon SageMaker (October 2018)An Introduction to Amazon SageMaker (October 2018)
An Introduction to Amazon SageMaker (October 2018)
Julien SIMON
 
Amazon SageMaker workshop
Amazon SageMaker workshopAmazon SageMaker workshop
Amazon SageMaker workshop
Julien SIMON
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Ad

More from Athens Big Data (20)

22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
Athens Big Data
 
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
Athens Big Data
 
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
Athens Big Data
 
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
Athens Big Data
 
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
Athens Big Data
 
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
Athens Big Data
 
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
Athens Big Data
 
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
Athens Big Data
 
19th Athens Big Data Meetup - 1st Talk - NLP understanding
19th Athens Big Data Meetup - 1st Talk - NLP understanding19th Athens Big Data Meetup - 1st Talk - NLP understanding
19th Athens Big Data Meetup - 1st Talk - NLP understanding
Athens Big Data
 
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
Athens Big Data
 
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
Athens Big Data
 
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
Athens Big Data
 
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
Athens Big Data
 
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
Athens Big Data
 
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
Athens Big Data
 
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
Athens Big Data
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
Athens Big Data
 
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
Athens Big Data
 
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
Athens Big Data
 
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
Athens Big Data
 
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
22nd Athens Big Data Meetup - 1st Talk - MLOps Workshop: The Full ML Lifecycl...
Athens Big Data
 
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
21st Athens Big Data Meetup - 2nd Talk - Dive into ClickHouse storage system
Athens Big Data
 
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
19th Athens Big Data Meetup - 2nd Talk - NLP: From news recommendation to wor...
Athens Big Data
 
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
21st Athens Big Data Meetup - 3rd Talk - Dive into ClickHouse query execution
Athens Big Data
 
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
21st Athens Big Data Meetup - 1st Talk - Fast and simple data exploration wit...
Athens Big Data
 
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
20th Athens Big Data Meetup - 2nd Talk - Druid: under the covers
Athens Big Data
 
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
20th Athens Big Data Meetup - 3rd Talk - Message from our sponsor: Velti
Athens Big Data
 
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
20th Athens Big Data Meetup - 1st Talk - Druid: the open source, performant, ...
Athens Big Data
 
19th Athens Big Data Meetup - 1st Talk - NLP understanding
19th Athens Big Data Meetup - 1st Talk - NLP understanding19th Athens Big Data Meetup - 1st Talk - NLP understanding
19th Athens Big Data Meetup - 1st Talk - NLP understanding
Athens Big Data
 
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
18th Athens Big Data Meetup - 2nd Talk - Run Spark and Flink Jobs on Kubernetes
Athens Big Data
 
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
18th Athens Big Data Meetup - 1st Talk - Timeseries Forecasting as a Service
Athens Big Data
 
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
17th Athens Big Data Meetup - 2nd Talk - Data Flow Building and Calculation P...
Athens Big Data
 
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
17th Athens Big Data Meetup - 1st Talk - Speedup Machine Application Learning...
Athens Big Data
 
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
16th Athens Big Data Meetup - 2nd Talk - A Focus on Building and Optimizing M...
Athens Big Data
 
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
15th Athens Big Data Meetup - 1st Talk - Running Spark On Mesos
Athens Big Data
 
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
5th Athens Big Data Meetup - PipelineIO Workshop - Real-Time Training and Dep...
Athens Big Data
 
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
14th Athens Big Data Meetup - Landoop Workshop - Apache Kafka Entering The St...
Athens Big Data
 
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
13th Athens Big Data Meetup - 2nd Talk - Training Neural Networks With Enterp...
Athens Big Data
 
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
11th Athens Big Data Meetup - 2nd Talk - Beyond Bitcoin; Blockchain Technolog...
Athens Big Data
 
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
9th Athens Big Data Meetup - 2nd Talk - Lead Scoring And Grading
Athens Big Data
 
Ad

Recently uploaded (20)

Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 

16th Athens Big Data Meetup - 1st Talk - An Introduction to Machine Learning with Python and Scikit-Learn

  • 1. An Introduction to Machine Learning with Python and scikit-learn Julien Simon Global Evangelist, AI and Machine Learning, AWS @julsimon
  • 2. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Agenda • Machine Learning in 5 minutes • Scikit-learn • Algos & Demos: Linear Regression, Logistic Regression, Decision Trees, K-Means, Principal Component Analysis • Scikit-learn on Amazon SageMaker • Resources
  • 4. Artificial Intelligence: design software applications which exhibit human-like behavior, e.g. speech, natural language processing, reasoning or intuition Machine Learning: using statistical algorithms, teach machines to learn from featurized data without being explicitly programmed Deep Learning: using neural networks, teach machines to learn from complex data where features cannot be explicitly expressed
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Supervised learning • Run an algorithm on a labeled data set. • The model learns how to correctly predict the right answer. • Regression and classification are examples of supervised learning. Unsupervised learning • Run an algorithm on an unlabeled data set. • The model learns patterns and organizes samples accordingly. • Clustering and topic modeling are examples of unsupervised learning. Types of Machine Learning
  • 6. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Data Visualization & Analysis Business Problem ML problem framing Data Collection Data Integration Data Preparation & Cleaning Feature Engineering Model Training & Parameter Tuning Model Evaluation Are Business Goals met? Model Deployment Monitoring & Debugging YesNo DataAugmentation Feature Augmentation The Machine Learning Process Re-training Predictions
  • 8. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Scikit-learn • Open Source library in Python released in February 2010 • Built on NumPy, SciPy, and matplotlib • Simple tools for data analysis and Machine Learning • Excellent collection of algorithms • Very good documentation, tons of tutorials • Limited scalability for data sets that don’t fit in RAM • Not appropriate for Deep Learning (no GPU support)
  • 9. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. http://scikit-learn.org/stable/tutorial/machine_learning_map/
  • 10. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Linear Regression https://en.wikipedia.org/wiki/Linear_regression • Supervised learning algorithm • Goal: fit data to a linear function in order to predict numerical values • Data set: features + target (scalar or scalar vector) • 1 feature à line, 2 features à plane, etc. • Intuition: minimize the “distance” between data points and the linear function This can also be used for binary classification: a sample is either “above” or “below” the linear function
  • 11. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Logistic Regression (1958) https://en.wikipedia.org/wiki/Logistic_regression • Supervised learning algorithm • Goal: fit data to a linear function in order to predict the class of a sample • Data set : features + binary label (yes/no, true/false, etc.) • This algorithm can be extended to more than two classes • Intuition: find a function computing a score between 0 and 1 and set a threshold separating both classes
  • 12. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Decision Trees https://en.wikipedia.org/wiki/Decision_tree • Supervised learning algorithm • Goal: build a decision tree for regression or classification • Data set : features + target value / class • Intuition: find the “best” feature thresholds to go left or right • “Easy” to interpret, but prone to overfitting • Plenty of advanced variants with multiple trees: Random Forests, XGBoost (2016), etc.
  • 13. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. K-means (1957) https://en.wikipedia.org/wiki/K-means_clustering • Unsupervised learning algorithm • Goal: group samples in ‘k’ clusters • Data set : features only • Intuition: find ‘k’ cluster centers that minimize the “distance” to their respective samples • This assumes “spherical” clusters of similar ”radius”: maybe, maybe not!
  • 14. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Principal Component Analysis aka PCA (1901!) https://en.wikipedia.org/wiki/Principal_component_analysis • Unsupervised learning algorithm • Goal: build a new data set with a smaller number of uncorrelated features (aka Dimensionality Reduction) …keeping as much variance as the number of new features will allow • Data set : features only • Sample use cases: • Visualize high-dimension data sets in 2D or 3D • Remove correlation in high-dimension datasets • Preliminary step to building linear models
  • 15. Demos - Linear Regression - Logistic Regression - Decision Trees - K-Means - PCA - PCA + Logistic Regression on MNIST https://gitlab.com/juliensimon/aws --> ML/scikit
  • 16. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Scaling scikit-learn • Scikit-learn runs on a single machine, loading the full data set in memory • Scaling options are quite limited http://scikit-learn.org/stable/modules/computing.html • Some algorithms can leverage multi-core (joblib) • Some algorithms support incremental training • Amazon SageMaker can help • Use ML-optimized multi-core instances (C5) • Use pipe mode, i.e. the ability to stream data from Amazon S3
  • 17. © 2018, Amazon Web Services, Inc. or Its Affiliates. All rights reserved. Beyond scikit-learn • Amazon SageMaker • Train models on fully-managed infrastructure at any scale • Built-in algorithms (17) for regression, classification, etc. • Built-in environments for scikit-learn and Deep Learning librairies • Apache Spark MLLib • Available in Amazon Elastic Map Reduce (EMR) • Distributed processing by design • Nice collection of Machine Learning algorithms • Seamless integration with Amazon SageMaker (Scala / PySpark SDK)
  • 19. Thank you! Julien Simon Global Evangelist, AI and Machine Learning, AWS @julsimon