Distributed Logistic Model Trees, Stratio Intelligence
Mateo Álvarez and Antonio Soriano
@StratioBD
Aerospace Engineer, MSc in Propulsion Systems (UPM), Master in Data Science (URJC). Working as a data scientist and Big Data developer in the data science department at Stratio Big Data.
mateo-alvarez
Ph.D. in Telecommunications, MSc in Electronic Systems Engineering and Telecommunication Technologies, Systems and Networks (UPV), and MSc “Big Data Expert” (UTAD). Working as a data scientist and Big Data developer in the data science department at Stratio Big Data.
@Phd_A_Soriano
• Why use interpretable algorithms instead of “black boxes”
• Logistic Regression
• Decision Trees
• Variance-Bias tradeoff
• Metrics
• Demo
• Logistic Model Trees
• Distributed implementation
• Cost function & configuration parameters
• Demo
• Why use interpretable algorithms instead of “black boxes”
• Logistic Regression
• Decision Trees
• Variance-Bias tradeoff
Accuracy vs. explainability
Example domains: medical studies, power management, financial environments, criminal activity.
[Figure: logistic regression, predicted probability vs. feature with a decision threshold; a single sigmoid fits the data poorly in some local regions.]
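A minimal sketch of how a fitted logistic regression turns a feature value into a class: the sigmoid maps a linear score to a probability, and a threshold turns that probability into a label. The coefficients `W`, `B` and the threshold values are hypothetical, chosen only for illustration; they are not parameters from the talk.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x: float, w: float, b: float) -> float:
    """Positive-class probability for a single feature value x."""
    return sigmoid(w * x + b)


def predict_label(x: float, w: float, b: float, threshold: float = 0.5) -> int:
    """Return 1 when the probability reaches the decision threshold, else 0."""
    return int(predict_proba(x, w, b) >= threshold)


# Hypothetical coefficients: the boundary w*x + b = 0 sits at x = 2.
W, B = 1.5, -3.0
for x in (0.0, 2.0, 4.0):
    print(x, round(predict_proba(x, W, B), 3), predict_label(x, W, B))
```

Because the model has a single boundary for the whole feature range, a region where the true probability bends the other way cannot be captured, which is the poor local fit in the figure. Moving the threshold only shifts where that one boundary falls.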
[Figure: decision trees of increasing depth: a root node split on Feature 1 into two leaf nodes, then a second level split on Feature 2 into four leaf nodes; the piecewise-constant predictions also fit the data poorly in some local regions.]
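A minimal sketch of how a decision tree chooses a split on one feature: try every candidate threshold and keep the one that minimizes the weighted Gini impurity of the two children. Gini is one common criterion (Spark's decision tree also offers entropy); the data below is made up for illustration.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a set of binary labels: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(xs: List[float], ys: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted child impurity) of the best split x <= t."""
    n = len(xs)
    best = (float("nan"), float("inf"))
    # Splitting at the largest value would leave the right child empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]
print(best_split(xs, ys))  # (3.0, 0.0)
```

Each resulting leaf then predicts one constant (its majority class), so the model is a step function of the features, which is why it can also fit poorly inside a region.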
[Figure: training and test error vs. model complexity; the total (mean) error combines bias² and variance, with underfitting at low complexity and overfitting at high complexity.]
$$\text{Total error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible error}$$
• Bias: missing variables that are important for making the predictions (underfitting)
• Variance: overfitting to the sample/training data
• Irreducible error: error in the prediction that no model can remove
• Logistic Model Trees
• Distributed implementation
• Cost function & configuration parameters
• Demo
[Figure: growing a logistic model tree: a logistic regression is fitted at the root node and its performance metrics are computed; the node is split on Feature 1 and a logistic regression is fitted and evaluated in each child; the tree keeps growing with splits on Feature 2, guided by the performance metrics.]
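A minimal sketch of how a logistic model tree makes a prediction once it is built: the splits route a sample to a leaf, and the logistic regression stored at that leaf produces the probability. This is an illustration, not the DLMT code; the `Node` class, its fields, the feature indices and the coefficients are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Internal nodes use feature/threshold/left/right; leaves use weights/bias.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    weights: Optional[List[float]] = None
    bias: float = 0.0

    def is_leaf(self) -> bool:
        return self.weights is not None


def lmt_predict_proba(node: Node, x: List[float]) -> float:
    """Route x down the tree, then apply the leaf's logistic regression."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    z = node.bias + sum(w * xi for w, xi in zip(node.weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical tree: split on feature 0 at 0.5; each leaf has its own model.
tree = Node(
    feature=0, threshold=0.5,
    left=Node(weights=[0.0, 4.0], bias=-2.0),   # probability rises with feature 1
    right=Node(weights=[0.0, -4.0], bias=2.0),  # opposite trend on the right
)
print(round(lmt_predict_proba(tree, [0.2, 1.0]), 3))
print(round(lmt_predict_proba(tree, [0.8, 1.0]), 3))
```

The two leaves capture opposite trends in feature 1, something neither a single sigmoid nor a tree with constant leaves can represent on its own.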
DISTRIBUTED IMPLEMENTATION
• Spark’s Decision Tree grows the tree (Spark builds it on its distributed random forest implementation).
• Spark’s Logistic Regression or Weka’s Logistic Regression is trained on the nodes.
• An LMT cost function fixes the logistic regression threshold:
  • AccuracyCostFunction
  • ConfusionMatrix
  • PrecisionCostFunction
  • PrecisionRecallCostFunction
  • RocCostFunction
• The same cost function serves as the pruning criterion.
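Fixing a node's threshold with a cost function can be sketched as a search over candidate thresholds that keeps the best score. The function names below are hypothetical and are not the DLMT classes listed above (such as `AccuracyCostFunction`); the sketch scores with accuracy and, as a stand-in for a precision/recall trade-off, F1. Labels and scores are made up.

```python
from typing import Callable, List, Tuple


def confusion(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FP, FN, TN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp: int, fp: int, fn: int, tn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true: List[int], scores: List[float],
                   metric: Callable[..., float]) -> Tuple[float, float]:
    """Try each observed score as a threshold; keep the best metric value."""
    best_t, best_v = 0.5, float("-inf")
    for t in sorted(set(scores)):
        y_pred = [int(s >= t) for s in scores]
        v = metric(*confusion(y_true, y_pred))
        if v > best_v:
            best_t, best_v = t, v
    return best_t, best_v


y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.7, 0.9]
print(best_threshold(y_true, scores, accuracy))
print(best_threshold(y_true, scores, f1))
```

The same score could also drive pruning: keep a split only if the children's models, at their own best thresholds, beat the parent's score.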
ADVANTAGES OF THIS IMPLEMENTATION
• Big datasets: Spark distributes both the tree building and the logistic regressions.
• Medium datasets: distributed tree growth combined with Weka’s logistic regression.
• Small datasets: although distributing the data for the decision tree can be slow, the cost functions can still be used, along with specific optimizations for particular cases.
Example of the DLMT algorithm on a synthetic dataset
• Metrics
• Demo
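Confusion matrix (rows: true condition; columns: prediction)

| True condition \ Prediction | Positive | Negative |
|---|---|---|
| Positive | True Positives (TP) | False Negatives (FN) |
| Negative | False Positives (FP) | True Negatives (TN) |

• True Positive Rate (Recall): TPR = TP/(TP+FN), insensitive to class imbalance
• False Positive Rate: FPR = FP/(FP+TN), insensitive to class imbalance
• Precision = TP/(TP+FP), sensitive to class imbalance
• Accuracy = (TP+TN)/(TP+TN+FP+FN), sensitive to class imbalance

TPR and FPR are each computed within a single true class, so changing the class proportions leaves them unchanged; precision and accuracy mix both classes, so they shift with the proportions.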
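The imbalance claims can be checked with hypothetical counts: keep the same per-class behaviour of the classifier but multiply the negatives by ten (FP and TN scale together). TPR and FPR stay put, while precision and accuracy move.

```python
def rates(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Metrics from a confusion matrix laid out as in the table above."""
    return {
        "tpr": tp / (tp + fn),
        "fpr": fp / (fp + tn),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical counts for a balanced test set (100 positives, 100 negatives).
balanced = rates(tp=80, fn=20, fp=10, tn=90)
# Same classifier, ten times as many negatives.
imbalanced = rates(tp=80, fn=20, fp=100, tn=900)
for name in ("tpr", "fpr", "precision", "accuracy"):
    print(f"{name:9s} {balanced[name]:.3f} {imbalanced[name]:.3f}")
```

Precision halves (0.889 to 0.444) because the extra negatives add false positives, while accuracy rises because the majority class dominates the count.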
AUROC (AUC): area under the ROC curve, which plots TPR against FPR; insensitive to class imbalance.
[Figure: ROC curve (TPR vs. FPR); best performance at the top-left corner.]
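A minimal sketch of AUROC using its rank interpretation: the area under the ROC curve equals the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half). The labels and scores are made up. Replicating every negative ten times changes the class balance but not the result, which illustrates the insensitivity to imbalance.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


y = [0, 0, 1, 0, 1, 1]
s = [0.1, 0.3, 0.35, 0.4, 0.7, 0.9]
print(auroc(y, s))

# Ten times as many negatives, each original negative score repeated.
neg_scores = [v for yi, v in zip(y, s) if yi == 0]
y_imb = y + [0] * len(neg_scores) * 9
s_imb = s + neg_scores * 9
print(auroc(y_imb, s_imb))
```

The pairwise loop is quadratic and meant only to show the definition; a sort-based computation scales better on real data.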
AUPRC: area under the precision-recall curve, which plots precision against recall (TPR); sensitive to class imbalance.
[Figure: precision-recall curve; best performance at the top-right corner.]
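AUPRC has several estimators; the sketch below uses average precision (the mean of precision@k taken at the rank of each positive), one common choice. It assumes no positive shares its score with a negative, and `add_negatives` is a hypothetical helper. With the same made-up data as before, adding negatives lowers the score even though the ranking of positives against negatives is unchanged.

```python
from typing import List, Sequence, Tuple


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k at the rank k of every positive example.

    Assumes no positive shares its score with a negative (ties among
    negatives do not change the result).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for k, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / k  # precision at this cut-off
    return total / hits if hits else 0.0


def add_negatives(y: Sequence[int], s: Sequence[float],
                  factor: int) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: repeat every negative so there are `factor` times as many."""
    neg = [v for yi, v in zip(y, s) if yi == 0]
    return list(y) + [0] * len(neg) * (factor - 1), list(s) + neg * (factor - 1)


y = [0, 0, 1, 0, 1, 1]
s = [0.1, 0.3, 0.35, 0.4, 0.7, 0.9]
y_imb, s_imb = add_negatives(y, s, 10)
print(round(average_precision(y, s), 3))
print(round(average_precision(y_imb, s_imb), 3))
```

The drop comes from the ten copies of the 0.4 negative now ranked above the 0.35 positive, which cuts the precision at that positive from 3/4 to 3/13.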
[Figure: Automatic Benchmarking Framework (ABF): data and algorithms f1, ..., fn are fed into the ABF, which produces a benchmark.]
Accuracy vs. explainability
Performance metrics: AUROC, AUPRC, accuracy
Automatic Benchmarking Framework (ABF)
[Figure: ABF benchmarking algorithms f1, ..., fn in four numbered steps.]
THANK YOU
UNITED STATES
Tel: (+1) 408 5998830
EUROPE
Tel: (+34) 91 828 64 73
contact@stratio.com
www.stratio.com
people@stratio.com
WE ARE HIRING