© 2017 MapR Technologies
Spark Machine Learning
Carol McDonald
@caroljmcdonald
Agenda
•  Introduction to Machine Learning Techniques
–  Classification
–  Clustering
•  Use Decision Tree to Predict Customer Churn
What is Machine Learning?
Data (contains patterns) -> Train Algorithm (finds patterns) -> Build Model (recognizes patterns)
New Data -> Use Model (prediction function) -> Predictions
Examples of ML Algorithms
Machine Learning
Supervised
•  Classification
–  Naïve Bayes
–  SVM
–  Random Decision Forests
•  Regression
–  Linear
–  Logistic
Unsupervised
•  Clustering
–  K-means
•  Dimensionality reduction
–  Principal Component Analysis
–  SVD
Supervised Algorithms use labeled data
Data (features) -> Build Model
New Data (features) -> Use Model -> Predict
Supervised Machine Learning: Classification & Regression
Classification: identifies the category for an item
Classification: Definition
Form of ML that:
•  Identifies which category an item belongs to
•  Uses supervised learning algorithms
–  Data is labeled
Sentiment
If it Walks/Swims/Quacks Like a Duck … Then It Must Be a Duck
Features: walks, swims, quacks
Car Insurance Fraud Example
•  What are we trying to predict?
–  This is the Label or Target outcome:
–  The amount of Fraud
•  What are the “if questions” or properties we can use to predict?
–  These are the Features:
–  The claim Amount
Car Insurance Fraud Regression Example
Label (Y axis): amount of fraud
Feature (X axis): claimed amount
Data point: (claimed amount, fraud amount)
AmntFraud = intercept + coeff * claimedAmnt
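The line AmntFraud = intercept + coeff * claimedAmnt can be fitted with ordinary least squares. The following is a minimal plain-Scala sketch of that idea, not the MLlib implementation; the object name FraudRegression and its functions are hypothetical names chosen for illustration.

```scala
// Minimal sketch: fit AmntFraud = intercept + coeff * claimedAmnt by ordinary least squares.
// FraudRegression, fit and predict are illustrative names, not part of Spark.
object FraudRegression {
  /** Returns (intercept, coeff) for the least-squares line through (claimedAmnt, fraudAmnt) pairs. */
  def fit(points: Seq[(Double, Double)]): (Double, Double) = {
    val n = points.size.toDouble
    val meanX = points.map(_._1).sum / n
    val meanY = points.map(_._2).sum / n
    // Slope = covariance(x, y) / variance(x)
    val covXY = points.map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = points.map { case (x, _) => (x - meanX) * (x - meanX) }.sum
    val coeff = covXY / varX
    val intercept = meanY - coeff * meanX
    (intercept, coeff)
  }

  /** Applies the fitted line to a new claimed amount. */
  def predict(model: (Double, Double), claimedAmnt: Double): Double =
    model._1 + model._2 * claimedAmnt
}
```

Calling FraudRegression.fit on historical (claimed amount, fraud amount) pairs gives the intercept and coefficient; predict then estimates the fraud amount for a new claim.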
Credit Card Fraud Example
•  What are we trying to predict?
–  This is the Label:
–  The probability of Fraud
•  What are the “if questions” or properties we can use to predict?
–  These are the Features:
–  transaction amount, type of merchant, distance from and time since last transaction
Credit Card Fraud Logistic Regression Example
Label (Y axis): probability of fraud, from 0 (Not Fraud) to 1 (Fraud), with .5 in between
Features (X axis): transaction amount, type of store, time and location difference since last transaction
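Logistic regression turns a weighted sum of features into a probability between 0 and 1 with the logistic (sigmoid) function. The sketch below shows only that prediction step in plain Scala; the object name FraudLogistic, the coefficient values a caller would pass in, and the 0.5 default threshold are assumptions for illustration, not values from the slides.

```scala
// Minimal sketch of the logistic regression prediction function.
// FraudLogistic and its members are illustrative names, not part of Spark.
object FraudLogistic {
  /** Logistic function: maps any real number into the interval (0, 1). */
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** P(fraud) = sigmoid(intercept + sum of coeff_i * feature_i). */
  def probability(intercept: Double, coeffs: Seq[Double], features: Seq[Double]): Double = {
    require(coeffs.size == features.size, "one coefficient per feature")
    sigmoid(intercept + coeffs.zip(features).map { case (c, f) => c * f }.sum)
  }

  /** Turns the probability into a Fraud / Not Fraud decision (threshold is an assumption). */
  def isFraud(p: Double, threshold: Double = 0.5): Boolean = p >= threshold
}
```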
Supervised Learning: Classification & Regression
•  Classification:
–  identifies which category an item belongs to (e.g. fraud or not fraud)
•  Linear Regression:
–  predicts a value (e.g. amount of fraud)
•  Logistic Regression:
–  predicts a probability (e.g. probability of fraud)
Unsupervised Algorithms use Unlabeled data
Customer purchase data (contains patterns) -> Train Algorithm (finds patterns) -> Build Model (recognizes patterns) -> Customer Groups
New Customer Purchase Data -> Use Model (prediction function) -> Predict Group
Unsupervised Machine Learning: Clustering
Clustering example: group news articles into different categories
Clustering: Definition
•  Unsupervised learning task
•  Groups objects into clusters of high similarity
–  Search results grouping
–  Grouping of customers
–  Anomaly detection
–  Text categorization
Clustering: Example
•  Group similar objects
•  Use MLlib K-means algorithm
1.  Initialize the coordinates of the cluster centers (centroids)
2.  Assign all points to the nearest centroid
3.  Update each centroid to the center of its assigned points
4.  Repeat until the stopping conditions are met
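The four K-means steps above can be sketched in plain Scala on 2-D points. This is an illustration of the loop only, not MLlib's KMeans; the object name KMeansSketch, the caller-supplied initial centroids, and the maxIter default are assumptions.

```scala
// Minimal K-means sketch on 2-D points, following the four steps on the slide.
// KMeansSketch and its members are illustrative names, not the MLlib API.
object KMeansSketch {
  type Point = (Double, Double)

  /** Squared Euclidean distance between two points. */
  def dist2(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  /** Index of the centroid nearest to p. */
  def closest(p: Point, centroids: Seq[Point]): Int =
    centroids.indices.minBy(i => dist2(p, centroids(i)))

  /** Center (mean) of a non-empty group of points. */
  def mean(ps: Seq[Point]): Point =
    (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

  def run(points: Seq[Point], initial: Seq[Point], maxIter: Int = 20): Seq[Point] = {
    var centroids = initial                                     // 1. initialize centroids
    var changed = true
    var iter = 0
    while (changed && iter < maxIter) {                         // 4. repeat until stable or maxIter
      val groups = points.groupBy(p => closest(p, centroids))   // 2. assign points to nearest centroid
      val updated = centroids.indices.map { i =>                // 3. move centroid to center of its points
        groups.get(i).map(mean).getOrElse(centroids(i))         //    an empty cluster keeps its centroid
      }
      changed = updated != centroids
      centroids = updated
      iter += 1
    }
    centroids
  }
}
```

Here the loop stops when an update leaves every centroid unchanged or after maxIter passes, which is one possible reading of "until the conditions are met".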
Predict Churn
ML Discovery Model Building: Churn Modelling
Data sources: Customer Data, Call Center Records, Web Clickstream, Server Logs
Data Discovery, Model Creation: Historical Data -> Feature Extraction -> Training Set -> Model Training/Building; Test Set -> Test Model Predictions -> Evaluate Results
Production: New Data -> Feature Extraction -> Deployed Model -> Predictions
Telecom Customer Churn Data
•  State: string
•  Account length: integer
•  Area code: integer
•  International plan: string
•  Voice mail plan: string
•  Number vmail messages: integer
•  Total day minutes: double
•  Total day calls: integer
•  Total day charge: double
•  Total eve minutes: double
•  Total eve calls: integer
•  Total eve charge: double
•  Total night minutes: double
•  Total night calls: integer
•  Total night charge: double
•  Total intl minutes: double
•  Total intl calls: integer
•  Total intl charge: double
•  Customer service calls: integer
Customer Churn Example
•  What are we trying to predict?
–  This is the Label:
–  Did the customer churn? True or False
•  What are the “if questions” or properties we can use to predict?
–  These are the Features:
–  Number of Customer service calls, Total day minutes …
Decision Trees
•  Decision Tree for Classification prediction
•  Represents tree with nodes
•  IF THEN ELSE questions using features at each node
•  Answers branch to child nodes
Example tree (figure): the root node asks "If the number of customer service calls < 3"; its child nodes ask "If the total day minutes > 200" and "If the total day minutes < 200"; the T and F branches lead to the leaves Churned: T or Churned: F
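A decision tree is a nest of IF THEN ELSE questions on features. The sketch below hand-codes a tree with the questions from the slide in plain Scala. Which branch leads to which leaf is an assumption, since the figure does not survive in text form; the field names numcs and tdmins follow the Account case class used later in the deck, and ChurnTreeSketch is a hypothetical name.

```scala
// Minimal sketch of a hand-written decision tree for churn.
// The branch-to-leaf assignments are assumed for illustration only.
object ChurnTreeSketch {
  /** numcs: number of customer service calls; tdmins: total day minutes. */
  final case class Customer(numcs: Double, tdmins: Double)

  /** Each `if` is a node; each returned Boolean is a leaf (Churned: T or F). */
  def predictChurn(c: Customer): Boolean =
    if (c.numcs < 3) {
      // Few service calls: question about heavy daytime usage
      if (c.tdmins > 200) true else false
    } else {
      // Many service calls: question about light daytime usage
      if (c.tdmins < 200) true else false
    }
}
```

A trained DecisionTreeClassifier learns the questions and thresholds from the labeled data instead of having them written by hand.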
Example Decision Tree
Spark ML workflow
Spark ML workflow with a Pipeline
Train: Load Data -> DataFrame -> Extract Features (Transformer) -> Train Model (Estimator) -> Cross Validate (Evaluator); Pipeline fit produces a Pipeline Model
Test: Load Data -> DataFrame -> Extract Features -> Predict With model (Pipeline Model transform) -> Evaluate (Evaluator)
Zeppelin Notebook with Spark
Data Engineer, Data Scientist
Load the data into a Dataframe: Define the Schema
case class Account(state: String, len: Integer, acode: String,
intlplan: String, vplan: String, numvmail: Double,
tdmins: Double, tdcalls: Double, tdcharge: Double,
temins: Double, tecalls: Double, techarge: Double,
tnmins: Double, tncalls: Double, tncharge: Double,
timins: Double, ticalls: Double, ticharge: Double,
numcs: Double, churn: String)
Input CSV File sample:
KS,128,415,No,Yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
OH,107,415,No,Yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
Load the data into a Dataset
// schema is not shown on the slides; it is assumed to be a StructType matching the Account fields
val train: Dataset[Account] = spark.read.option("inferSchema", "false")
  .schema(schema).csv("/user/user01/data/churn-bigml-80.csv").as[Account]
Dataset merged with DataFrame
In Spark 2.0, the DataFrame API was merged with the Dataset API (in Scala, DataFrame is an alias for Dataset[Row])
Extract the Features
Image reference: O’Reilly Learning Spark
Training Data -> Featurization -> Feature Vectors and Label -> Training -> Model -> Model Evaluation -> Best Model
Label: Churned = T or Churned = F
Features: number of customer service calls, number of day minutes
Use StringIndexer to map Strings to Numbers
import org.apache.spark.ml.feature.StringIndexer
// Adds a numeric iplanIndex column to the DataFrame
val ipindexer = new StringIndexer()
  .setInputCol("intlplan")
  .setOutputCol("iplanIndex")
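By default, StringIndexer orders labels by descending frequency, so the most frequent string gets index 0.0. The plain-Scala sketch below imitates that mapping to show what the added column contains; it is not the Spark implementation, StringIndexSketch is a hypothetical name, and breaking ties alphabetically is an assumption made to keep the result deterministic.

```scala
// Minimal sketch of frequency-based string indexing (illustrative, not the Spark API).
object StringIndexSketch {
  /** Maps each distinct label to an index; the most frequent label gets 0.0. */
  def fitIndex(values: Seq[String]): Map[String, Double] =
    values.groupBy(identity).toSeq
      // Sort by descending count, then alphabetically (tie-break is an assumption)
      .sortBy { case (label, occurrences) => (-occurrences.size, label) }
      .zipWithIndex
      .map { case ((label, _), i) => label -> i.toDouble }
      .toMap

  /** Replaces each string with its index, like the indexer's output column. */
  def transform(values: Seq[String], index: Map[String, Double]): Seq[Double] =
    values.map(index)
}
```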
Use StringIndexer to map churn True False to Numbers
// Adds a numeric label column to the DataFrame
val labelindexer = new StringIndexer()
  .setInputCol("churn")
  .setOutputCol("label")
Use VectorAssembler to put features in vector column
import org.apache.spark.ml.feature.VectorAssembler
// The list of feature columns is elided on the slide
val featureCols = Array("temins", "iplanIndex", "tdmins", "tdcalls" /* … */)
val assembler = new VectorAssembler()
  .setInputCols(featureCols)
  .setOutputCol("features")
Create DecisionTree Estimator, Set Label and Features
import org.apache.spark.ml.classification.DecisionTreeClassifier
val dTree = new DecisionTreeClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
Put Feature Transformers and Estimator in Pipeline
import org.apache.spark.ml.Pipeline
val pipeline = new Pipeline()
  .setStages(Array(ipindexer, labelindexer, assembler, dTree))
Pipeline stages: ipindexer (feature transform) -> labelindexer (feature transform) -> assembler (assemble features) -> dTree (estimator, produces the model)
Spark ML workflow with a Pipeline
Train: Load Data -> DataFrame -> Pipeline of Transformers (Extract Features) and estimator (Train model) -> evaluator; fit produces the Pipeline Model
Test: Load Data -> DataFrame -> Pipeline Model transform (Use fitted model) -> evaluator
K-fold Cross-Validation Process
1.  Data is randomly split into K pairs of training and test datasets
2.  Train the algorithm with the training set
3.  Evaluate the model with the test set
4.  Repeat the train/test loop K times
5.  Select the model produced by the best-performing set of parameters
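The K-fold steps above can be sketched in plain Scala. This is an illustration of the splitting and averaging only, not Spark's CrossValidator; KFoldSketch, kFold, crossValidate, and the seed default are hypothetical names and values.

```scala
// Minimal K-fold cross-validation sketch (illustrative, not the Spark API).
object KFoldSketch {
  /** Randomly splits data into k (training, test) pairs; each element is in exactly one test set. */
  def kFold[A](data: Seq[A], k: Int, seed: Long = 42L): Seq[(Seq[A], Seq[A])] = {
    require(k >= 2 && k <= data.size, "need 2 <= k <= data.size")
    val shuffled = new scala.util.Random(seed).shuffle(data)
    // Deal the shuffled elements round-robin into k folds
    val folds: Seq[Seq[A]] =
      shuffled.zipWithIndex.groupBy(_._2 % k).toSeq.sortBy(_._1).map(_._2.map(_._1))
    folds.indices.map { i =>
      val test = folds(i)
      val training = folds.indices.filter(_ != i).flatMap(j => folds(j))
      (training, test)
    }
  }

  /** Trains and scores once per fold and returns the average score. */
  def crossValidate[A, M](data: Seq[A], k: Int)(train: Seq[A] => M)(score: (M, Seq[A]) => Double): Double = {
    val scores = kFold(data, k).map { case (training, test) => score(train(training), test) }
    scores.sum / scores.size
  }
}
```

To select a model, a caller would run crossValidate once per candidate parameter setting and keep the setting with the best average score.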
Cross Validation transformation estimation pipeline
Set up a CrossValidator with:
•  Parameter grid
•  Estimator (pipeline)
•  Evaluator
Perform grid-search-based model selection (Cross Validator fit)
Parameter Tuning with CrossValidator with a Paramgrid
CrossValidator
•  Given:
–  Estimator
–  Parameter grid
–  Evaluator
•  Find best parameters and model
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
// Try tree depths 2 through 7
val paramGrid = new ParamGridBuilder()
  .addGrid(dTree.maxDepth, Array(2, 3, 4, 5, 6, 7)).build()
// Scores the hard 0/1 "prediction" column; the evaluator's default column is "rawPrediction"
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("prediction")
val crossval = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)
Cross Validator: fit a model to the data
// ntrain is not defined on these slides; it is assumed to be the prepared training Dataset
val cvModel = crossval.fit(ntrain)
Cross Validator fit: fits the Pipeline to the data with the provided parameter grid and evaluator, producing a Pipeline Model
Evaluate the fitted model
Train: Load Data -> DataFrame -> Pipeline of Transformers (Extract Features) and estimator (Train model) -> evaluator; fit produces the Pipeline Model
Test: Load Data -> DataFrame -> Extract Features -> Predict With model (Pipeline Model transform) -> evaluator
Evaluate the Predictions from DecisionTree Estimator
// The fitted model transforms the test features into predictions
val predictions = cvModel.transform(test)
// The evaluator scores the predictions (default metric: area under the ROC curve)
val accuracy = evaluator.evaluate(predictions)
Area under the ROC curve
Model quality is measured here by the area under the ROC curve, which plots the true positive rate against the false positive rate.
The larger the area, the better the model separates the two classes
•  An area of 1 represents a perfect test
•  An area of .5 represents a worthless test (no better than random guessing)
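The area under the ROC curve equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The plain-Scala sketch below computes it that way by comparing every positive/negative pair; it is an illustration, not how BinaryClassificationEvaluator is implemented, and AucSketch is a hypothetical name.

```scala
// Minimal sketch: area under the ROC curve via pairwise comparison (illustrative, not the Spark API).
object AucSketch {
  /** scoredLabels: (score, isPositive). Ties between a positive and a negative count as 0.5. */
  def areaUnderRoc(scoredLabels: Seq[(Double, Boolean)]): Double = {
    val positives = scoredLabels.collect { case (s, true) => s }
    val negatives = scoredLabels.collect { case (s, false) => s }
    require(positives.nonEmpty && negatives.nonEmpty, "need at least one example of each class")
    val wins = positives.flatMap { p =>
      negatives.map { n => if (p > n) 1.0 else if (p == n) 0.5 else 0.0 }
    }
    wins.sum / (positives.size.toLong * negatives.size)
  }
}
```

A model that ranks every churned customer above every retained one scores 1; a model that gives everyone the same score gets .5.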
To Learn More:
•  Read about and download example code
•  https://mapr.com/blog/churn-prediction-sparkml/
•  End to End Application for Monitoring Uber Data using Spark ML
•  https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/
•  MapR Free ODT http://learn.mapr.com/
For Q&A:
•  https://community.mapr.com/
•  https://community.mapr.com/community/answers/pages/qa
MapR Converged Data Platform
Applications: Open Source Engines & Tools, Commercial Engines & Applications, Custom Apps, Cloud and Managed Services
Data Processing, Search and Others
APIs: HDFS API, POSIX, NFS, Kafka API, HBase API, OJAI API
Enterprise-Grade Platform Services: Web-Scale Storage (MapR-XD), Database (MapR-DB), Event Streaming (MapR Streams)
Platform properties: High Availability, Real Time, Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, Unified Management and Monitoring
Q&A
ENGAGE WITH US
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
MapR Technologies
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
MapR Technologies
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
MapR Technologies
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
MapR Technologies
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
MapR Technologies
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
MapR Technologies
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
MapR Technologies
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
MapR Technologies
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
MapR Technologies
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
MapR Technologies
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
MapR Technologies
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
MapR Technologies
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
MapR Technologies
 
Evolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and RainEvolving Beyond the Data Lake: A Story of Wind and Rain
Evolving Beyond the Data Lake: A Story of Wind and Rain
MapR Technologies
 
Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0Open Source Innovations in the MapR Ecosystem Pack 2.0
Open Source Innovations in the MapR Ecosystem Pack 2.0
MapR Technologies
 
How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications How Spark is Enabling the New Wave of Converged Cloud Applications
How Spark is Enabling the New Wave of Converged Cloud Applications
MapR Technologies
 

Live Machine Learning Tutorial: Churn Prediction

  • 1. © 2017 MapR Technologies Spark Machine Learning Carol McDonald @caroljmcdonald
  • 2. Agenda • Introduction to Machine Learning Techniques – Classification – Clustering • Use Decision Tree to Predict Customer Churn
  • 3. What is Machine Learning? Training: Data (contains patterns) → Train Algorithm (finds patterns) → Build Model (recognizes patterns). Prediction: New Data → Use Model (prediction function) → Predictions
  • 4. Examples of ML Algorithms. Supervised: • Classification – Naïve Bayes – SVM – Random Decision Forests • Regression – Linear – Logistic. Unsupervised: • Clustering – K-means • Dimensionality reduction – Principal Component Analysis – SVD
  • 5. Supervised Algorithms use labeled data. Training: Data (features) → Build Model. Prediction: New Data (features) → Use Model → Predict
  • 6. Supervised Machine Learning: Classification & Regression. Classification identifies the category for an item
  • 7. Classification: Definition. Form of ML that: • Identifies which category an item belongs to • Uses supervised learning algorithms – Data is labeled. Example shown: sentiment
  • 8. If it Walks/Swims/Quacks Like a Duck … Then It Must Be a Duck. Features: walks, swims, quacks
  • 9. Car Insurance Fraud Example • What are we trying to predict? – This is the Label or Target outcome: – The amount of fraud • What are the “if questions” or properties we can use to predict? – These are the Features: – The claim amount
  • 10. Car Insurance Fraud Regression Example. Label (Y axis): amount of fraud. Feature (X axis): claimed amount. Data point: (fraud amount, claimed amount). Model: AmntFraud = intercept + coeff * claimedAmnt
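The line fit on the slide above, AmntFraud = intercept + coeff * claimedAmnt, can be sketched in plain Scala with ordinary least squares on a single feature. This is a minimal illustration and not the deck's code: the object name `FraudRegressionSketch`, its method names, and any sample numbers are assumptions made for the example.

```scala
// Minimal sketch (assumed names): least-squares fit of
// AmntFraud = intercept + coeff * claimedAmnt for one feature.
object FraudRegressionSketch {

  // Returns (intercept, coeff) minimizing the squared error over the data points.
  def fit(claimedAmnts: Seq[Double], fraudAmnts: Seq[Double]): (Double, Double) = {
    require(claimedAmnts.size == fraudAmnts.size && claimedAmnts.size >= 2,
      "need at least two paired data points")
    val meanX = claimedAmnts.sum / claimedAmnts.size
    val meanY = fraudAmnts.sum / fraudAmnts.size
    // Covariance of feature and label, and variance of the feature (unnormalized).
    val covXY = claimedAmnts.zip(fraudAmnts).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val varX = claimedAmnts.map(x => (x - meanX) * (x - meanX)).sum
    require(varX > 0, "claimed amounts must not all be equal")
    val coeff = covXY / varX
    val intercept = meanY - coeff * meanX
    (intercept, coeff)
  }

  // Applies the fitted line to a new claimed amount.
  def predict(intercept: Double, coeff: Double, claimedAmnt: Double): Double =
    intercept + coeff * claimedAmnt
}
```

Calling `fit` on historical (claimed amount, fraud amount) pairs gives the two numbers that `predict` then applies to a new claim.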
  • 11. Credit Card Fraud Example • What are we trying to predict? – This is the Label: – The probability of fraud • What are the “if questions” or properties we can use to predict? – These are the Features: – transaction amount, type of merchant, distance from and time since last transaction
  • 12. Credit Card Fraud Logistic Regression Example. Label (Y axis): probability of fraud, from 0 (not fraud) to 1 (fraud), with .5 marked between them. Features (X axis): transaction amount, type of store, time and location difference since last transaction
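Logistic regression turns a weighted sum of the features into a probability between 0 and 1 by passing it through the logistic (sigmoid) function. The sketch below shows only that mapping; the object name, the coefficient values a caller would pass, and the 0.5 cut-off in `isFraud` are assumptions for illustration, not values from the deck.

```scala
// Minimal sketch (assumed names): logistic function applied to a weighted feature sum.
object FraudLogisticSketch {

  // Maps any real number to the open interval (0, 1).
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Probability of fraud for one transaction, given hypothetical fitted coefficients.
  def fraudProbability(intercept: Double, coeffs: Seq[Double], features: Seq[Double]): Double = {
    require(coeffs.size == features.size, "one coefficient per feature")
    val z = intercept + coeffs.zip(features).map { case (c, f) => c * f }.sum
    sigmoid(z)
  }

  // Turns the probability into a category; the default threshold is an assumption.
  def isFraud(probability: Double, threshold: Double = 0.5): Boolean =
    probability >= threshold
}
```

A weighted sum of 0 maps to a probability of exactly .5, and large positive or negative sums approach 1 or 0.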
  • 13. Supervised Learning: Classification & Regression • Classification: – identifies which category (e.g. fraud or not fraud) • Linear Regression: – predicts a value (e.g. amount of fraud) • Logistic Regression: – predicts a probability (e.g. probability of fraud)
  • 14. Examples of ML Algorithms. Unsupervised: • Clustering – K-means • Dimensionality reduction – Principal Component Analysis – SVD. Supervised: • Classification – Naïve Bayes – SVM – Random Decision Forests • Regression – Linear – Logistic
  • 15. Unsupervised Algorithms use unlabeled data. Training: Customer purchase data (contains patterns) → Train Algorithm (finds patterns) → Build Model (recognizes patterns) → Customer Groups. Prediction: New Customer Purchase Data → Use Model (prediction function) → Predict Group
  • 16. Unsupervised Machine Learning: Clustering. Example: group news articles into different categories
  • 17. Clustering: Definition • Unsupervised learning task • Groups objects into clusters of high similarity
  • 18. Clustering: Definition • Unsupervised learning task • Groups objects into clusters of high similarity – Search results grouping – Grouping of customers – Anomaly detection – Text categorization
  • 19. Clustering: Example • Group similar objects
  • 20. Clustering: Example • Group similar objects • Use MLlib K-means algorithm 1. Initialize coordinates to center of clusters (centroid)
  • 21. Clustering: Example • Group similar objects • Use MLlib K-means algorithm 1. Initialize coordinates to center of clusters (centroid) 2. Assign all points to nearest centroid
  • 22. Clustering: Example • Group similar objects • Use MLlib K-means algorithm 1. Initialize coordinates to center of clusters (centroid) 2. Assign all points to nearest centroid 3. Update centroids to center of points
  • 23. Clustering: Example • Group similar objects • Use MLlib K-means algorithm 1. Initialize coordinates to center of clusters (centroid) 2. Assign all points to nearest centroid 3. Update centroids to center of points 4. Repeat until conditions met
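The four K-means steps listed above can be sketched in plain Scala for 2-D points. This is not the MLlib implementation the slides refer to; it is a small single-machine illustration, and the object name, the `maxIter` and `tol` stopping parameters, and the choice to keep an empty cluster's old centroid are all assumptions.

```scala
// Minimal sketch (assumed names, not MLlib): K-means on 2-D points.
object KMeansSketch {
  type Point = (Double, Double)

  def sqDist(a: Point, b: Point): Double = {
    val dx = a._1 - b._1
    val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  // Step 2: index of the centroid nearest to a point.
  def nearest(p: Point, centroids: Seq[Point]): Int =
    centroids.indices.minBy(i => sqDist(p, centroids(i)))

  // Step 3: move each centroid to the mean of the points assigned to it.
  // A centroid with no assigned points keeps its previous position (an assumption).
  def update(points: Seq[Point], centroids: Seq[Point]): Seq[Point] = {
    val byCluster = points.groupBy(p => nearest(p, centroids))
    centroids.indices.map { i =>
      byCluster.get(i) match {
        case Some(ps) => (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)
        case None     => centroids(i)
      }
    }
  }

  // Step 1 is the caller-supplied `initial` centroids.
  // Step 4: repeat until no centroid moves more than `tol` or `maxIter` is reached.
  def fit(points: Seq[Point], initial: Seq[Point], maxIter: Int = 20, tol: Double = 1e-9): Seq[Point] = {
    require(points.nonEmpty && initial.nonEmpty, "need points and initial centroids")
    var centroids = initial
    var iter = 0
    var moved = Double.MaxValue
    while (iter < maxIter && moved > tol) {
      val next = update(points, centroids)
      moved = centroids.zip(next).map { case (a, b) => sqDist(a, b) }.max
      centroids = next
      iter += 1
    }
    centroids
  }
}
```

Once fitted, `nearest` plays the role of the prediction function: it assigns a new point to the group of its closest centroid.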
  • 24. Predict Churn
  • 25. Churn Modelling (workflow diagram). Data sources: Customer Data, Call Center Records, Web Clickstream, Server Logs. Data Discovery, Model Creation: Historical Data → Feature Extraction → Training Set → Model Training/Building; Test Set → Test Model Predictions → Evaluate Results. Production: New Data → Feature Extraction → Deployed Model → Predictions
  • 26. Telecom Customer Churn Data • State: string • Account length: integer • Area code: integer • International plan: string • Voice mail plan: string • Number vmail messages: integer • Total day minutes: double • Total day calls: integer • Total day charge: double • Total eve minutes: double • Total eve calls: integer • Total eve charge: double • Total night minutes: double • Total night calls: integer • Total night charge: double • Total intl minutes: double • Total intl calls: integer • Total intl charge: double • Customer service calls: integer
  • 27. Customer Churn Example • What are we trying to predict? – This is the Label: – Did the customer churn? True or False • What are the “if questions” or properties we can use to predict? – These are the Features: – Number of Customer service calls, Total day minutes …
  • 28. Decision Trees • Decision Tree for Classification prediction • Represents tree with nodes • IF THEN ELSE questions using features at each node • Answers branch to child nodes. Example tree (diagram): the root node asks whether the number of customer service calls < 3; its child nodes ask whether the total day minutes > 200 and whether the total day minutes < 200; each T/F branch ends in a leaf, Churned: T or Churned: F
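A decision tree of this kind can be sketched as a small recursive data structure: each inner node holds an IF question on the features, and the answer selects the child node to follow until a leaf is reached. The questions below reuse the ones on the slide, but which leaf hangs off which branch cannot be recovered from the flattened diagram, so the leaf assignments here are assumptions, as are all names.

```scala
// Minimal sketch (assumed names and leaf assignments): a decision tree as nested IF THEN ELSE nodes.
object ChurnTreeSketch {
  final case class Customer(customerServiceCalls: Int, totalDayMinutes: Double)

  sealed trait Node
  // A leaf carries the predicted label: did the customer churn?
  final case class Leaf(churned: Boolean) extends Node
  // An inner node asks a question and branches to a child on the answer.
  final case class Split(question: Customer => Boolean, ifTrue: Node, ifFalse: Node) extends Node

  // Walks from the given node down to a leaf, answering one question per level.
  @annotation.tailrec
  def predict(node: Node, c: Customer): Boolean = node match {
    case Leaf(churned)              => churned
    case Split(question, yes, no)   => predict(if (question(c)) yes else no, c)
  }

  // Questions taken from the slide; the T/F leaves are illustrative only.
  val tree: Node = Split(
    _.customerServiceCalls < 3,
    Split(_.totalDayMinutes > 200, Leaf(true), Leaf(false)),
    Split(_.totalDayMinutes < 200, Leaf(true), Leaf(false))
  )
}
```

In Spark ML the tree is learned from the training data rather than written by hand; this sketch only shows how a finished tree produces a prediction.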
  • 29. Example Decision Tree
  • 30. Spark ML workflow
  • 31. Spark ML workflow with a Pipeline (diagram). Train: Load Data → Data frame → Extract Features (Transformer) → Train Model (Estimator) → Cross Validate (Evaluator); calling fit on the Pipeline (an Estimator) yields a Pipeline Model (a Transformer). Test: Load Data → Test Data frame → Extract Features → Predict With model (transform) → Evaluate (Evaluator)
  • 32. Zeppelin Notebook with Spark: Data Engineer, Data Scientist
  • 33. Load the data into a Dataframe: Define the Schema
        case class Account(state: String, len: Integer, acode: String, intlplan: String, vplan: String,
          numvmail: Double, tdmins: Double, tdcalls: Double, tdcharge: Double, temins: Double,
          tecalls: Double, techarge: Double, tnmins: Double, tncalls: Double, tncharge: Double,
          timins: Double, ticalls: Double, ticharge: Double, numcs: Double, churn: String)
        Input CSV file sample:
        KS,128,415,No,Yes,25,265.1,110,45.07,197.4,99,16.78,244.7,91,11.01,10.0,3,2.7,1,False
        OH,107,415,No,Yes,26,161.6,123,27.47,195.5,103,16.62,254.4,103,11.45,13.7,3,3.7,1,False
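To make the mapping between a CSV row and the `Account` fields concrete, the sketch below parses one sample line by hand in plain Scala. Spark does this through its CSV reader; the `AccountCsvSketch` object and its `parse` helper are hypothetical and exist only to show which column lands in which field.

```scala
// Minimal sketch (hypothetical helper, not Spark's CSV reader):
// turn one CSV line into the Account case class from the slide.
object AccountCsvSketch {
  case class Account(state: String, len: Integer, acode: String, intlplan: String, vplan: String,
    numvmail: Double, tdmins: Double, tdcalls: Double, tdcharge: Double, temins: Double,
    tecalls: Double, techarge: Double, tnmins: Double, tncalls: Double, tncharge: Double,
    timins: Double, ticalls: Double, ticharge: Double, numcs: Double, churn: String)

  // Returns None when the line does not have 20 fields or a number fails to parse.
  def parse(line: String): Option[Account] = {
    val f = line.split(",").map(_.trim)
    if (f.length != 20) None
    else scala.util.Try {
      // Columns 5 to 18 are the fourteen numeric usage fields, read as Double.
      val d = f.slice(5, 19).map(_.toDouble)
      Account(f(0), Int.box(f(1).toInt), f(2), f(3), f(4),
        d(0), d(1), d(2), d(3), d(4), d(5), d(6),
        d(7), d(8), d(9), d(10), d(11), d(12), d(13),
        f(19))
    }.toOption
  }
}
```

For the first sample row, `parse` yields an `Account` with `state = "KS"`, `len = 128`, `tdmins = 265.1` and `churn = "False"`.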
  • 34. Load the data into a Dataset
        import org.apache.spark.sql.Dataset
        import spark.implicits._  // needed for .as[Account]
        // schema: the column schema for the CSV file (its definition is not shown on this slide)
        val train: Dataset[Account] = spark.read.option("inferSchema", "false")
          .schema(schema).csv("/user/user01/data/churn-bigml-80.csv").as[Account]
  • 35. Dataset merged with DataFrame: in Spark 2.0, the DataFrame API was merged with the Dataset API (in Scala, DataFrame is an alias for Dataset[Row])
  • 36. Extract the Features (image reference: O’Reilly Learning Spark). Training Data → Featurization → Feature Vectors and Label → Training → Model → Model Evaluation → Best Model. Label: Churned = T or Churned = F. Features: number of customer service calls, number of day minutes
  • 37. Use StringIndexer to map Strings to Numbers (adds a column to the Data Frame)
        import org.apache.spark.ml.feature.StringIndexer
        val ipindexer = new StringIndexer()
          .setInputCol("intlplan")
          .setOutputCol("iplanIndex")
  • 38. Use StringIndexer to map churn True/False to Numbers (adds a column to the Data Frame)
        val labelindexer = new StringIndexer()
          .setInputCol("churn")
          .setOutputCol("label")
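What StringIndexer does to a string column can be sketched in plain Scala: by default it orders the distinct labels by descending frequency and assigns them the indices 0.0, 1.0, and so on. The code below imitates that default on an in-memory sequence; it is not Spark's implementation, the names are assumptions, and the alphabetical tie-break is a choice made for this sketch.

```scala
// Minimal sketch (assumed names, not Spark's implementation):
// frequency-ordered string-to-index mapping, as StringIndexer does by default.
object StringIndexSketch {

  // "fit": learn the label -> index mapping; the most frequent label gets 0.0.
  def fitIndex(values: Seq[String]): Map[String, Double] =
    values.groupBy(identity).toSeq
      .map { case (label, occurrences) => (label, occurrences.size) }
      .sortBy { case (label, count) => (-count, label) } // ties broken alphabetically here
      .zipWithIndex
      .map { case ((label, _), idx) => label -> idx.toDouble }
      .toMap

  // "transform": replace each string with its learned index.
  def transform(values: Seq[String], index: Map[String, Double]): Seq[Double] =
    values.map(index)
}
```

With a churn column where "False" is the majority value, "False" maps to 0.0 and "True" to 1.0, which is the numeric label column the classifier then trains on.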
  • 39. Use VectorAssembler to put features in a vector column (DataFrame + Features)
        import org.apache.spark.ml.feature.VectorAssembler
        val featureCols = Array("temins", "iplanIndex", "tdmins", "tdcalls"…)
        val assembler = new VectorAssembler()
          .setInputCols(featureCols)
          .setOutputCol("features")
  • 40. Create DecisionTree Estimator, Set Label and Features
        import org.apache.spark.ml.classification.DecisionTreeClassifier
        val dTree = new DecisionTreeClassifier()
          .setLabelCol("label")
          .setFeaturesCol("features")
  • 41. Put Feature Transformers and Estimator in Pipeline. Stages: ipindexer (feature transform) → labelindexer (feature transform) → assembler (assemble features) → dTree (estimator, produces model)
        import org.apache.spark.ml.Pipeline
        val pipeline = new Pipeline()
          .setStages(Array(ipindexer, labelindexer, assembler, dTree))
  • 42. Spark ML workflow with a Pipeline (diagram). Train: Load Data → Data frame → Pipeline (Transformers: Extract Features; Estimator: Train model) → fit → Pipeline Model. Test: Load Data → Test Data frame → use fitted Pipeline Model (transform) → Evaluator
  • 43. K-fold Cross-Validation Process. Data is randomly split into K partitions (folds), which form K training and test dataset pairs
  • 44. K-fold Cross-Validation Process. Train the algorithm with the training dataset
  • 45. ML Cross-Validation Process. Evaluate the model with the Test Set
  • 46. K-fold Cross-Validation Process. Train/Test loop: repeat K times, then select the model produced by the best-performing set of parameters
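The K-fold loop described on the last four slides can be sketched without Spark: split the rows into K folds, train on K-1 folds and test on the remaining one, average the K scores for each parameter setting, and keep the setting with the best average. The object, its function names, and the generic `train` / `evaluate` callbacks are assumptions made for this illustration; Spark's CrossValidator handles all of this internally.

```scala
// Minimal sketch (assumed names, not Spark's CrossValidator): K-fold splitting and model selection.
object KFoldSketch {

  // Randomly splits the data into k (training, test) pairs;
  // every row lands in exactly one test set.
  def folds[A](data: Seq[A], k: Int, seed: Long = 42L): Seq[(Seq[A], Seq[A])] = {
    require(k >= 2 && k <= data.size, "k must be between 2 and the number of rows")
    val indexed = new scala.util.Random(seed).shuffle(data).zipWithIndex
    (0 until k).map { fold =>
      val (test, train) = indexed.partition { case (_, i) => i % k == fold }
      (train.map(_._1), test.map(_._1))
    }
  }

  // Trains and evaluates once per fold and returns the average metric.
  def crossValidate[A, M](data: Seq[A], k: Int)(
      train: Seq[A] => M, evaluate: (M, Seq[A]) => Double): Double = {
    val scores = folds(data, k).map { case (tr, te) => evaluate(train(tr), te) }
    scores.sum / scores.size
  }

  // Grid search: returns the parameter value with the highest average metric.
  def selectBest[A, P, M](data: Seq[A], k: Int, params: Seq[P])(
      train: (P, Seq[A]) => M, evaluate: (M, Seq[A]) => Double): P =
    params.maxBy(p => crossValidate[A, M](data, k)(tr => train(p, tr), evaluate))
}
```

With 3 folds and six candidate values for a parameter such as tree depth, this trains 18 models in total, which is why cross-validated grid search is expensive on large data.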
  • 47. Cross Validation. Set up a CrossValidator with: • Parameter grid • Estimator (pipeline) • Evaluator. Perform grid-search-based model selection (diagram: the Cross Validator wraps the transformation/estimation Pipeline, the Parameter Grid and the Evaluator, and is trained with fit)
  • 48. Parameter Tuning with CrossValidator and a ParamGrid. CrossValidator • Given: – Estimator – Parameter grid – Evaluator • Find best parameters and model
        import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
        import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
        val paramGrid = new ParamGridBuilder()
          .addGrid(dTree.maxDepth, Array(2, 3, 4, 5, 6, 7)).build()
        val evaluator = new BinaryClassificationEvaluator()
          .setLabelCol("label")
          .setRawPredictionCol("prediction")
        val crossval = new CrossValidator()
          .setEstimator(pipeline)
          .setEvaluator(evaluator)
          .setEstimatorParamMaps(paramGrid)
          .setNumFolds(3)
  • 49. Cross Validator: fit a model to the data with the provided parameter grid (Cross Validator with Pipeline, Parameter Grid and Evaluator → fit → Pipeline Model)
        val cvModel = crossval.fit(ntrain)
  • 50. Evaluate the fitted model (diagram). Train: Load Data → Data frame → Pipeline (Transformers: Extract Features; Estimator: Train model) → fit → Pipeline Model. Test: Load Data → Test Data frame → Extract Features → Predict With model (transform) → Evaluator
  • 51. Evaluate the Predictions from the DecisionTree Estimator: the fitted model transforms the test features, and the Evaluator scores the predictions
        val predictions = cvModel.transform(test)
        // area under the ROC curve, the default metric of BinaryClassificationEvaluator
        val accuracy = evaluator.evaluate(predictions)
  • 52. Area under the ROC curve. Accuracy is measured by the area under the ROC curve. The area measures how well the model ranks true positives above true negatives • An area of 1 represents a perfect test • An area of .5 represents a worthless test
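One way to see what the area under the ROC curve measures is its pairwise interpretation: it equals the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. The sketch below computes it that way in plain Scala. It is an illustration under assumed names, quadratic in the number of examples, and not how Spark's evaluator computes the metric.

```scala
// Minimal sketch (assumed names): area under the ROC curve via pairwise comparison.
object AucSketch {

  // scored: (model score, true label) per example; true means the positive class (churned).
  def areaUnderRoc(scored: Seq[(Double, Boolean)]): Double = {
    val pos = scored.collect { case (s, true) => s }
    val neg = scored.collect { case (s, false) => s }
    require(pos.nonEmpty && neg.nonEmpty, "need at least one positive and one negative example")
    // 1 when the positive outranks the negative, 0.5 on a tie, 0 otherwise.
    val wins = for (p <- pos; n <- neg) yield {
      if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    }
    wins.sum / (pos.size.toLong * neg.size)
  }
}
```

A model that scores every churner above every non-churner gets an area of 1, and a model that gives every customer the same score gets .5, matching the two reference points on the slide.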
  • 53. To Learn More: • Read about and download example code • https://mapr.com/blog/churn-prediction-sparkml/
  • 54. To Learn More: • End to End Application for Monitoring Uber Data using Spark ML • https://mapr.com/blog/monitoring-real-time-uber-data-using-spark-machine-learning-streaming-and-kafka-api-part-1/
  • 55. To Learn More: • MapR Free ODT: http://learn.mapr.com/
  • 56. For Q&A: • https://community.mapr.com/ • https://community.mapr.com/community/answers/pages/qa
  • 57. MapR Converged Data Platform (architecture diagram). Top layers: Open Source Engines & Tools; Commercial Engines & Applications; Custom Apps; Cloud and Managed Services. Data Processing: Web-Scale Storage (MapR-XD), Database (MapR-DB), Event Streaming (MapR Streams), Real Time, Search and Others. Enterprise-Grade Platform Services: Unified Security, Multi-tenancy, Disaster Recovery, Global Namespace, High Availability; Unified Management and Monitoring. APIs: HDFS API, POSIX, NFS, HBase API, OJAI API, Kafka API
  • 58. Q&A: Engage with us