@ITCAMPRO #ITCAMP17 Community Conference for IT Professionals
Scaling Face Recognition
with Big Data
Bogdan BOCȘE
Solutions Architect & Co-founder VisageCloud
https://VisageCloud.com
https://www.linkedin.com/in/bogdanbocse/
https://twitter.com/bocse

Agenda
• How to learn?
• What to learn?
• Defining learning objectives
• How to scale learning?
• Gotchas
• VisageCloud
–Architecture
–Use Cases

Objectives
• What questions to ask before writing the code?
• How to look at the data before feeding it to the machine?
• What is the state of the art regarding ML?
• What frameworks to use?
• What are the common traps to avoid?
• How to design for scale?

HOW TO LEARN?
Vision
• Convolutional Neural Networks
• Inception paper
NLP
• Word2Vec
• GloVe: Global Vectors for Word Representation
Generic
• Classification
• Prediction
Convolutional Neural Networks: Big Picture

Convolutional Neural Networks: Components
• Pooling / Max Pooling
• Convolution
• Fully Connected Layer
– Activation Function, e.g. ReLU
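A minimal sketch of the components listed above, written in plain Python with no framework. The 5x5 image, the edge kernel and the fully connected weights are made-up values for illustration and are not taken from the talk.

```python
def conv2d(image, kernel):
    """Valid 2D convolution (cross-correlation, as used in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Activation function: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def fully_connected(inputs, weights, bias):
    """One output per weight row: dot product with the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


# Hypothetical image with a bright vertical stripe in columns 2-3.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
# Hypothetical kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1], [-1, 1]]

features = max_pool(relu(conv2d(image, edge_kernel)))
flat = [v for row in features for v in row]
scores = fully_connected(flat, weights=[[0.5, 0.5, 0.5, 0.5]], bias=[0.0])
```

In a real network the kernel and the fully connected weights are learned during training rather than written by hand.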

Taking a Step Back: The Math
• Learning is an optimization problem
–Find the parameters of a system (neural network) that minimize a fixed error function
–Not unlike planning orbital paths
• Defining the network architecture
• Defining the training algorithm
–Stochastic Gradient Descent
• With momentum
• With noise
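As an illustration of learning as optimization, the sketch below fits a single parameter w in y ≈ w·x with stochastic gradient descent, momentum and optional gradient noise. The data set, learning rate, momentum and epoch count are assumptions chosen for the example, not values from the talk.

```python
import random


def sgd_momentum(samples, lr=0.01, momentum=0.5, noise_std=0.0,
                 epochs=500, seed=0):
    """Minimize the squared error (w*x - y)^2, one sample at a time."""
    rng = random.Random(seed)
    samples = list(samples)
    w, velocity = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)                     # "stochastic": random sample order
        for x, y in samples:
            grad = 2.0 * x * (w * x - y)         # d/dw of (w*x - y)^2
            grad += rng.gauss(0.0, noise_std)    # optional noise on the gradient
            velocity = momentum * velocity - lr * grad
            w += velocity
    return w


# Hypothetical training data generated from y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w = sgd_momentum(data)
```

The velocity term carries part of the previous step into the next one, which is what lets the search roll through small bumps in the error surface.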

Frameworks
• DeepLearning4j
– Independent company
– Java interface with C bindings for performance
• TensorFlow
– Python & C++ API
– Developed by Google
– Compatible with TPU
• Torch
– Maintained by Facebook, among others
– Written in Lua (LuaJIT) over a C/CUDA backend; PyTorch offers a Python interface
WHAT TO LEARN?

Data Sets
• Public data sets
–Labeled Faces in the Wild (LFW)
–YouTube Faces
–Kaggle
• Private data sets
• Build your own
–Outsourcing: Mechanical Turk
–Crowdsourcing: reCAPTCHA model

Preparing Data
Clean data
• Cropping
• Structure
• Homogeneity
• Normalization
• Histograms
• Filtering

Preparing Data
• Machine learning is not magic
• If you can’t understand the data, a machine probably won’t either
• Preprocessing often makes the difference between good and poor results
• Applying filters, normalization and anomaly detection is computationally inexpensive
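To show how cheap such preprocessing can be, here is a small sketch of min-max normalization followed by a histogram of the result. The pixel values are invented, and the choice of min-max scaling is an assumption; the talk does not say which normalization it used.

```python
def min_max_normalize(pixels):
    """Rescale values linearly so the darkest is 0.0 and the brightest 1.0."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]   # flat image: nothing to stretch
    return [(p - lo) / (hi - lo) for p in pixels]


def histogram(values, bins=4):
    """Count values in [0, 1] into equal-width bins."""
    counts = [0] * bins
    for v in values:
        index = min(int(v * bins), bins - 1)   # v == 1.0 goes into the last bin
        counts[index] += 1
    return counts


# Hypothetical low-contrast grayscale pixels (0-255 scale).
dark_image = [50, 60, 70, 80, 90]
normalized = min_max_normalize(dark_image)
counts = histogram(normalized)
```

Looking at the histogram before and after normalization is a quick way to spot images that are too dark, too bright, or otherwise unlike the rest of the data set.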

DEFINING LEARNING OBJECTIVES
• Supervised
–Classification
–Scoring and regression
–Identification
• Unsupervised
–Clustering

Classification
• Projecting input onto a fixed set of classes
• “Don’t use a cannon to kill a fly”
–Support Vector Machines
• Linear
• Radial Basis Function (RBF)
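The two kernels named above differ only in how they measure similarity between inputs. The sketch below shows both, plus the decision rule of an already trained SVM; the support vectors, coefficients and gamma are hypothetical values, and training itself is out of scope here.

```python
import math


def linear_kernel(a, b):
    """Dot product: similarity grows with alignment of the two vectors."""
    return sum(x * y for x, y in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function: 1.0 for identical points, decaying with distance."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


def svm_decision(x, support_vectors, dual_coefs, bias, kernel):
    """Class +1 or -1 from sum_i coef_i * K(sv_i, x) + bias."""
    score = sum(c * kernel(sv, x)
                for sv, c in zip(support_vectors, dual_coefs)) + bias
    return 1 if score >= 0 else -1


# Hypothetical trained model: one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
dual_coefs = [-1.0, 1.0]   # alpha_i * label_i

near_positive = svm_decision([1.9, 1.9], support_vectors, dual_coefs, 0.0, rbf_kernel)
near_negative = svm_decision([0.1, 0.1], support_vectors, dual_coefs, 0.0, rbf_kernel)
```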

Identification
• Embedding
–Projecting input (image) onto a vector space with a known property
• Triplet Loss Function
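A sketch of the triplet loss in its common squared-distance form: the anchor and positive are embeddings of the same person, the negative is someone else. The 2D embeddings and the margin of 0.2 are assumptions for illustration; real face embeddings typically have far more dimensions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by the margin."""
    return max(0.0,
               squared_distance(anchor, positive)
               - squared_distance(anchor, negative)
               + margin)


anchor = [0.0, 0.0]
positive = [0.1, 0.0]          # same identity, close to the anchor

easy_negative = [1.0, 0.0]     # different identity, already far away
hard_negative = [0.2, 0.0]     # different identity, still too close

loss_easy = triplet_loss(anchor, positive, easy_negative)
loss_hard = triplet_loss(anchor, positive, hard_negative)
```

Only the hard triplet produces a non-zero loss, so only it would push the network to move the embeddings apart. This is the "known property" of the vector space: distance between embeddings reflects identity.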

Clustering
• Splitting a set of items into non-overlapping subsets, based on item attributes
• Counting people in video streams
• Algorithms:
–Fixed threshold
–K-means
–Rank-order clustering
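The simplest of the listed algorithms, fixed-threshold clustering, can be sketched as a greedy pass over the embeddings. This is one possible reading of "fixed threshold", not necessarily the variant used by VisageCloud; the points and the threshold are made up.

```python
import math


def cluster_fixed_threshold(embeddings, threshold):
    """Assign each item to the first cluster whose representative is close enough."""
    clusters = []   # each cluster is a list; its first item acts as representative
    for e in embeddings:
        for cluster in clusters:
            if math.dist(cluster[0], e) <= threshold:
                cluster.append(e)
                break
        else:
            clusters.append([e])   # no cluster was close enough: start a new one
    return clusters


# Hypothetical 2D face embeddings taken from successive video frames.
faces = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.1)]
clusters = cluster_fixed_threshold(faces, threshold=1.0)
people_count = len(clusters)
```

Counting people in a video stream then reduces to counting clusters. The result depends on the threshold and on the order of the input, which is a known weakness of this approach compared to K-means or rank-order clustering.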

HOW TO SCALE LEARNING?
• Scaling training
– Requires shared memory space
– Vertical scaling
• GPU
• Soon-to-come: TPU (tensor processing unit)
• Scaling evaluation
– Shared-nothing architecture
– Neural network/classifier rarely changes
– Load balancing pattern
– Partitioning data if needed

Ideas for horizontal scaling
• There is no “reduce” for neural networks
• Averaging weights/parameters
– Usually not a good idea
• Genetic algorithms
– Requires a lot of processing power
– Running independent iterations on different machines
– Crossover between weights/parameters of independently trained neural networks after each epoch
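The crossover idea can be sketched as follows, treating each independently trained network as a flat list of weights. Uniform crossover, keeping the better half of the population, and the stand-in fitness function are all assumptions; the talk does not specify the selection or crossover scheme.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each weight is copied from one parent at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def next_generation(population, fitness, rng):
    """Keep the better half, refill with children of random surviving pairs."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: max(2, len(ranked) // 2)]
    children = []
    while len(survivors) + len(children) < len(population):
        a, b = rng.sample(survivors, 2)
        children.append(crossover(a, b, rng))
    return survivors + children


def toy_fitness(weights):
    """Stand-in for validation accuracy: weights closer to zero score higher."""
    return -sum(w * w for w in weights)


rng = random.Random(0)
# Hypothetical weight vectors, one per machine, after an epoch of training.
population = [[1.0, 1.0], [0.1, 0.2], [3.0, 3.0], [0.5, 0.1]]
new_population = next_generation(population, toy_fitness, rng)
```

In a distributed setting each machine would train its own copy for an epoch, report its weights and validation score, and receive a survivor or a child to continue from.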
GOTCHAS

The Curse of Dimensionality
• Our 2D and 3D intuition often fails in high dimensions
• Distances tend to become relatively “the same” as the number of dimensions increases
• Dimensionality reduction
– Embedding functions
– Principal component analysis
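The claim that distances become relatively "the same" can be checked with a small experiment on uniformly random points. The point count, seed and dimensions below are arbitrary choices for the demonstration.

```python
import math
import random


def relative_contrast(dimensions, points=200, seed=0):
    """(farthest - nearest) / nearest distance from a random query point."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dimensions)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dimensions)])
        for _ in range(points)
    ]
    return (max(distances) - min(distances)) / min(distances)


contrast_2d = relative_contrast(2)
contrast_1000d = relative_contrast(1000)
```

In two dimensions the farthest point is many times farther away than the nearest one; in a thousand dimensions the two are nearly equidistant, so "nearest neighbour" carries much less information. This is one reason to compare faces in a compact embedding space rather than in raw pixel space.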

Local Optima
• “The bottom of a valley is not necessarily the lowest point on Earth”
• Learning algorithms may get stuck in local optima
• Using momentum or some random noise reduces this possibility
• Using genetic algorithms can be even more robust, but it’s computationally expensive

Visualizing Local Optima
(figure: monkey saddle)

Evaluation of Learning
“Based on state-of-the-art machine learning, our weather forecast system can predict tomorrow’s weather with 72% accuracy”
You get the same results by saying “it’s going to be the same as today”

Evaluation of Learning
• Don’t test on the data you train on
– Use a different data set
– Split the data sets you have
• Beware of data biases
– Confirmation bias
– Survivorship bias
– Selection bias
• Compare against a benchmark, even a dummy one
– Coin flip
– Linear algorithms
– “Same-as-before”
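Both pieces of advice are cheap to put into code. The sketch below shows a shuffled train/test split and the "same-as-before" dummy benchmark; the weather series is invented and the 80/20 split is an assumption.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(predictions, actual):
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)


def same_as_before(series):
    """Dummy forecaster: tomorrow will be the same as today."""
    return series[:-1]


train, test = train_test_split(range(10))

# Hypothetical daily weather observations.
weather = ["sun", "sun", "sun", "rain", "rain", "sun",
           "sun", "sun", "rain", "sun", "sun"]
baseline_accuracy = accuracy(same_as_before(weather), weather[1:])
```

A model is only interesting if it beats `baseline_accuracy` on data it has not seen. For time series such as weather, a chronological split is usually a better fit than the random shuffle shown here.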
Architecture and Use Cases

High Level Architecture
(diagram: VisageCloud Production)
• API Consumer (Customer Infrastructure)
• HAProxy (reverse proxy)
• Service (API Controller)
• Containers (Docker)
• Neural Networks (OpenCV, Dlib, Torch, pixie magic)
• Cassandra
• Image Storage (AWS S3)
• Connection labels: HTTPS, HTTP, CQL Binary

Processing Pipeline
Detect faces → Align faces → Pre-processing → Feature extraction → Feature comparison

Partitioning Data: Application Level Logic
• The collection
–Slice of data used together
–10K-100K records
• The Cache-Inside Pattern
–Loading / preloading collection in one application server
–Content-based routing/balancing to maximize cache hits
–No logic in the database layer
–Requires periodic polling for updates
• Weaker consistency
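One way to sketch content-based routing with a per-server cache is shown below. Hashing the collection ID to pick a server is an assumption; the talk does not say how the routing key is mapped. The class, function and server names are hypothetical, and the loader stands in for a database read.

```python
import hashlib


def route(collection_id, servers):
    """Pick a server from a stable hash, so a collection always lands on the same one."""
    digest = hashlib.sha256(collection_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


class ApplicationServer:
    """Holds whole collections in memory; the database only stores and returns rows."""

    def __init__(self, name, load_collection):
        self.name = name
        self._load = load_collection
        self._cache = {}
        self.loads = 0   # number of times the database had to be read

    def get_collection(self, collection_id):
        if collection_id not in self._cache:
            self._cache[collection_id] = self._load(collection_id)
            self.loads += 1
        return self._cache[collection_id]


def fake_database_read(collection_id):
    # Stand-in for loading 10K-100K face records from the database.
    return ["record-1 of " + collection_id, "record-2 of " + collection_id]


servers = [ApplicationServer("app-%d" % i, fake_database_read) for i in range(3)]

# Two requests for the same collection reach the same server and hit its cache.
first = route("customer-42", servers)
first.get_collection("customer-42")
second = route("customer-42", servers)
second.get_collection("customer-42")
total_loads = sum(s.loads for s in servers)
```

Plain modulo hashing remaps most collections when a server is added or removed; consistent hashing is a common alternative when that matters. Polling for updates, which the pattern requires, is left out of this sketch.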

Partitioning Data: Application Level Logic
(diagram)
• Content-based balancing/routing
• Application Layer: Application, Application, Application
• Cassandra (Database Layer): Cassandra Node, Cassandra Node, Cassandra Node, Cassandra Node
• Arrows between the layers: Preload collection, Poll for updates, Write updates

Partitioning Data: Database Level Logic
• Perform comparison logic in database
–User Defined Aggregate Functions
• Removes the need to move data around between application and database
• Harder to deploy/test
• Stronger consistency

Key Takeaways
• It’s math, not magic
• If you don’t understand the data, neither will the machine
• Preprocessing makes the difference
• Test against a benchmark, any benchmark
• Evaluate first, scale later

Let’s keep in touch
Bogdan@VisageCloud.com
+(40) 724 714 234
https://www.linkedin.com/in/bogdanbocse/
https://twitter.com/bocse
Q & A
