SlideShare a Scribd company logo
Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed
quibus aut hiligenet ut ea debisci eturiate poresti vid min core, vercidigent.
Uber Risk Team
Semi-supervised Learning In An
Adversarial Environment
Karthik Ramasamy
Gaurav Agarwal
About us
• Data Science Manager and founding member of - Account Security and Privacy
• Previously:
‐ 2 years at my own startup
‐ 4.5 years at LinkedIn (founding member of the security team)
• Interests:
‐ Applying ML, recently deep learning, in security and fraud problems
‐ Building scalable infra for ML systems
Karthik Ramasamy
About us
• Senior engineer and founding member of Account security team
• 2 years at Uber working on fraud problems
• Previously:
‐ 4 years at Microsoft working on NLP Question answering systems
• Interests:
‐ Intersection of distributed systems and ML
‐ Natural language processing
Gaurav Agarwal
Focus of the talk
• DS algorithms and process
• Features
‐ Not covered in this talk
‐ Arbitrary names will be used to describe features
• Engineering architecture
• Engineering challenges
Semi-Supervised Learning
Classification Clustering
Adversarial Problem
Not-Hotdog Classifier Fake Account Classifier
Adversarial Problem
Recall for a specific model % of null features
Account Takeovers (ATO)
Single IP IPs in same
class c/b
Targeted Attacks on
specific accounts
Phishing and Malware
Attacks
Massive Botnet
with >100K IPs
Proxy IPs
across world
Easy to detect Hard to detect
Clustering: K-Means
Credits: www.naftaliharris.com
https://youtu.be/vY5q20NXVJM
Clustering: DBSCAN
Credits: www.naftaliharris.com
https://youtu.be/h53WMIImUuc
Clustering Algorithm: DBSCAN
1. Method 1 - use labels to tune hyperparameter
a. Solves the hyperparameter tuning issue
2. Method 2 - use labels as constraints in the clustering algorithm
a. Challenges
i. Working with week labels
ii. Scalability
Semi-Supervised Clustering Approach
Login Clusters 2016
*PCA used for visualization
Login Clusters 2017
*PCA used for visualization
ML Challenges
• Feature Selection
‐ Manual feature selection
‐ Aggregate features are better
‐ Feature normalization is very important
▪ Features like trip fare and #trips are bad features
▪ % of UberX trips for a user is a good feature
▪ All features having same scale like % of X
• Feature evolution in adversarial environment
• Scalability
‐ DBSCAN for large dataset (10’s of millions) takes long time to fit
• No online DBSCAN
Edit or delete footer text in Master ipsandella doloreium dem isciame ndaestia nessed
quibus aut hiligenet ut ea debisci eturiate poresti vid min core, vercidigent.
Engineering Architecture
Goals
● Minimum human input for unknown anomalies
● Automatic action support
● Low latency for sync scenarios eg: Login
● Easily reusable for other use cases eg: Bot signups
● Good ML library support
Query Flow
Login
Client
Feature
Gathering
ML
Models
Actions
Risk
Gateway
Rule
Engine
Other
Stores
Error code
Challenge
thrown
Async call
Parallel calls
Sync callChallenges:
● 2FA
● Captcha
● ... Streaming
Counters
Clustering
Features
Feature Computation
Feature
Normalization
Categorical
Feature
Transformation
Spark
Mllib
(k-means)
Spark
Clustering
Features
(Cassandra)
Offline
Features
Attempt
Thresholding
DBSCAN
(sklearn)
Parameter
Tuning
Login
Attempts
(Kafka)
Challenge
Feedback
Signals
(Kafka)
Streaming pipeline
Hourly job pipeline
Engineering Challenges
● Python vs Scala perf for streaming case
● DBScan limitations in Spark
● Window aggregations limited to 30 minutes
● JOINS with feedback signals in realtime
Production Setup
● Batch: 7 days worth of data, run DBSCAN hourly
● Streaming: 60 minutes moving window, run streaming k-means
● Used feedback signal success ratios to mark clusters as good, bad or unknown
● Bad clusters: Always throw
● Good clusters: Small % of attempts
● Unknown clusters: X% of attempts
Results
Good Clusters
Bad Clusters
Unknown Clusters
GPU DBSCAN using Faiss from FB
Thank you!
Ad

More Related Content

What's hot (20)

Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Sources와 Sinks를 Confluent Cloud에 원활하게 연결
Sources와 Sinks를 Confluent Cloud에 원활하게 연결Sources와 Sinks를 Confluent Cloud에 원활하게 연결
Sources와 Sinks를 Confluent Cloud에 원활하게 연결
confluent
 
Spm unit2
Spm unit2Spm unit2
Spm unit2
sweetyammu
 
Data streaming fundamentals
Data streaming fundamentalsData streaming fundamentals
Data streaming fundamentals
Mohammed Fazuluddin
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
Databricks
 
Batch 21(14,64,66)
Batch 21(14,64,66)Batch 21(14,64,66)
Batch 21(14,64,66)
swethadln
 
Project Planning in Software Engineering
Project Planning in Software EngineeringProject Planning in Software Engineering
Project Planning in Software Engineering
Fáber D. Giraldo
 
Cohesion and coupling
Cohesion and couplingCohesion and coupling
Cohesion and coupling
Aprajita (Abbey) Singh
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
 
Software Engineering ppt
Software Engineering pptSoftware Engineering ppt
Software Engineering ppt
shruths2890
 
DevOps
DevOps DevOps
DevOps
Hakan Yüksel
 
Project scheduling and tracking
Project scheduling and trackingProject scheduling and tracking
Project scheduling and tracking
Computer_ at_home
 
University/College Transport management system Documentation
University/College Transport management system DocumentationUniversity/College Transport management system Documentation
University/College Transport management system Documentation
Gaurav kumar rai - student
 
Online Quiz System Project Report ppt
Online Quiz System Project Report pptOnline Quiz System Project Report ppt
Online Quiz System Project Report ppt
Kishan Maurya
 
Cloud computing using Eucalyptus
Cloud computing using EucalyptusCloud computing using Eucalyptus
Cloud computing using Eucalyptus
Abhishek Dey
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
Jean-Paul Azar
 
2.software requirement specification
2.software requirement specification2.software requirement specification
2.software requirement specification
Deepak Sharma
 
Attendance management system
Attendance management system Attendance management system
Attendance management system
SHIVANGI GOEL
 
Introducing DevOps
Introducing DevOpsIntroducing DevOps
Introducing DevOps
Nishanth K Hydru
 
PySpark Training | PySpark Tutorial for Beginners | Apache Spark with Python ...
PySpark Training | PySpark Tutorial for Beginners | Apache Spark with Python ...PySpark Training | PySpark Tutorial for Beginners | Apache Spark with Python ...
PySpark Training | PySpark Tutorial for Beginners | Apache Spark with Python ...
Edureka!
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
Jeff Holoman
 
Sources와 Sinks를 Confluent Cloud에 원활하게 연결
Sources와 Sinks를 Confluent Cloud에 원활하게 연결Sources와 Sinks를 Confluent Cloud에 원활하게 연결
Sources와 Sinks를 Confluent Cloud에 원활하게 연결
confluent
 
DASK and Apache Spark
DASK and Apache SparkDASK and Apache Spark
DASK and Apache Spark
Databricks
 
Batch 21(14,64,66)
Batch 21(14,64,66)Batch 21(14,64,66)
Batch 21(14,64,66)
swethadln
 
Project Planning in Software Engineering
Project Planning in Software EngineeringProject Planning in Software Engineering
Project Planning in Software Engineering
Fáber D. Giraldo
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
 
Software Engineering ppt
Software Engineering pptSoftware Engineering ppt
Software Engineering ppt
shruths2890
 
Project scheduling and tracking
Project scheduling and trackingProject scheduling and tracking
Project scheduling and tracking
Computer_ at_home
 
University/College Transport management system Documentation
University/College Transport management system DocumentationUniversity/College Transport management system Documentation
University/College Transport management system Documentation
Gaurav kumar rai - student
 
Online Quiz System Project Report ppt
Online Quiz System Project Report pptOnline Quiz System Project Report ppt
Online Quiz System Project Report ppt
Kishan Maurya
 
Cloud computing using Eucalyptus
Cloud computing using EucalyptusCloud computing using Eucalyptus
Cloud computing using Eucalyptus
Abhishek Dey
 
Kafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema RegistryKafka and Avro with Confluent Schema Registry
Kafka and Avro with Confluent Schema Registry
Jean-Paul Azar
 
2.software requirement specification
2.software requirement specification2.software requirement specification
2.software requirement specification
Deepak Sharma
 
Attendance management system
Attendance management system Attendance management system
Attendance management system
SHIVANGI GOEL
 
PySpark Training | PySpark Tutorial for Beginners | Apache Spark with Python ...
PySpark Training | PySpark Tutorial for Beginners | Apache Spark with Python ...PySpark Training | PySpark Tutorial for Beginners | Apache Spark with Python ...
PySpark Training | PySpark Tutorial for Beginners | Apache Spark with Python ...
Edureka!
 

Similar to Semi-Supervised Learning In An Adversarial Environment (20)

Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
zekeLabs Technologies
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Edunomica
 
Agile india2018 exp_report
Agile india2018 exp_reportAgile india2018 exp_report
Agile india2018 exp_report
Vinayak Joglekar
 
Open source ml systems that need to be built
Open source ml systems that need to be builtOpen source ml systems that need to be built
Open source ml systems that need to be built
Nikhil Garg
 
Satheesh.G_IDM
Satheesh.G_IDMSatheesh.G_IDM
Satheesh.G_IDM
satheesh64
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
Fei Chen
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
MLconf
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
Adam Gibson
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
GoDataDriven
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
Nikhil Garg
 
Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015
Turi, Inc.
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
Awantik Das
 
Software architecture, Patterns for Scale
Software architecture, Patterns for ScaleSoftware architecture, Patterns for Scale
Software architecture, Patterns for Scale
iGbanam
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
Bill Liu
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
vitm11
 
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in PracticeGDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
James Anderson
 
Simha-Resume2
Simha-Resume2Simha-Resume2
Simha-Resume2
Narasimha Reddy
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Spark Summit
 
How I became ML Engineer
How I became ML Engineer How I became ML Engineer
How I became ML Engineer
Kevin Lee
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
Ivo Andreev
 
Machine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabsMachine learning at scale - Webinar By zekeLabs
Machine learning at scale - Webinar By zekeLabs
zekeLabs Technologies
 
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine LearningPaige Roberts: Shortcut MLOps with In-Database Machine Learning
Paige Roberts: Shortcut MLOps with In-Database Machine Learning
Edunomica
 
Agile india2018 exp_report
Agile india2018 exp_reportAgile india2018 exp_report
Agile india2018 exp_report
Vinayak Joglekar
 
Open source ml systems that need to be built
Open source ml systems that need to be builtOpen source ml systems that need to be built
Open source ml systems that need to be built
Nikhil Garg
 
Satheesh.G_IDM
Satheesh.G_IDMSatheesh.G_IDM
Satheesh.G_IDM
satheesh64
 
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning InfrastructureML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
ML Platform Q1 Meetup: Airbnb's End-to-End Machine Learning Infrastructure
Fei Chen
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
MLconf
 
World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018World Artificial Intelligence Conference Shanghai 2018
World Artificial Intelligence Conference Shanghai 2018
Adam Gibson
 
Building a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlowBuilding a Scalable and reliable open source ML Platform with MLFlow
Building a Scalable and reliable open source ML Platform with MLFlow
GoDataDriven
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
Nikhil Garg
 
Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015Strata London - Deep Learning 05-2015
Strata London - Deep Learning 05-2015
Turi, Inc.
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
Awantik Das
 
Software architecture, Patterns for Scale
Software architecture, Patterns for ScaleSoftware architecture, Patterns for Scale
Software architecture, Patterns for Scale
iGbanam
 
C19013010 the tutorial to build shared ai services session 1
C19013010  the tutorial to build shared ai services session 1C19013010  the tutorial to build shared ai services session 1
C19013010 the tutorial to build shared ai services session 1
Bill Liu
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
vitm11
 
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in PracticeGDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
GDG Cloud Southlake #3 Charles Adetiloye: Enterprise MLOps in Practice
James Anderson
 
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Using SparkML to Power a DSaaS (Data Science as a Service): Spark Summit East...
Spark Summit
 
How I became ML Engineer
How I became ML Engineer How I became ML Engineer
How I became ML Engineer
Kevin Lee
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
Ivo Andreev
 
Ad

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Ad

Recently uploaded (20)

tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptxIncreasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Increasing Retail Store Efficiency How can Planograms Save Time and Money.pptx
Anoop Ashok
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)Into The Box Conference Keynote Day 1 (ITB2025)
Into The Box Conference Keynote Day 1 (ITB2025)
Ortus Solutions, Corp
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 

Semi-Supervised Learning In An Adversarial Environment