© 2018 Adobe. All Rights Reserved. Adobe Confidential.
Elastic @ Adobe: Making Search Smarter with Machine Learning at Scale

--- Baldo Faieta & Gaurav Kukal
Adobe is unique in the search space: a large part of its content is non-textual (images, videos, 3D templates, PSD and DCX files), yet billions of content pieces also carry text.
A Buffet of Search Use Cases
1. Search based on computer vision & metadata
2. Deep textual and hybrid content search
3. Video and richer-format search
4. Enterprise search
5. Discovery and recommendations
Computer Vision, ML, AI to the Rescue
“Computer vision is an interdisciplinary field that deals with how computers can be made to gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to automate tasks that the human visual system can do”
Landscape of Search, Product-Wise
[Diagram: Adobe Experience Cloud, Adobe Creative Cloud, Adobe Document Cloud; Adobe Marketing Cloud, Adobe Analytics Cloud, Adobe Advertising Cloud; Experience Manager, Campaign, Primetime, Target, Audience Manager, Analytics]
Landscape of Search, Product-Wise: Current State
[Diagram: Adobe Experience Cloud, Adobe Creative Cloud, Adobe Document Cloud; Adobe Marketing Cloud, Adobe Analytics Cloud, Adobe Advertising Cloud; Experience Manager, Campaign, Primetime, Target, Audience Manager, Analytics]
SEARCH EXPERIENCE MATTERS MORE THAN EVER
Architecture of Search @ Adobe: Bird's-Eye View
Adobe Search Elasticsearch Stats
* As of June 2018
• 3 geographical regions
• ~10 billion documents
• 18 production Elasticsearch clusters
• 200 shards (max # of shards)
• ~400 virtual machines in AWS and Azure
• 2 public clouds
• ~6,000/second live ingestion rate
• ~25,000/second ingestion capacity for reindexing
• ~600 queries per second
Adobe Search Elasticsearch: Self-Managed
* As of June 2018
1. Self-managed clusters on public clouds
2. Moved Lightroom (Lr) search from hosted AWS Elasticsearch to self-managed, 3.4 billion docs
Why Elasticsearch?
* As of June 2018
1. We have seen the SolrCloud code, and there cloud support is an afterthought
2. Stability and resilience are very high (if done right)
3. Right balance of open source with a tight review process
Adobe Search Play with Elasticsearch: Elasticsearch as a White Box
* As of June 2018
1. ES plugin for image/video similarity & search ranking
2. Cost saving: hot & cold indices, static and live indices
3. Index any generic entity into Elasticsearch with zero code change
4. Out-of-order events: optimistic locks using ES scripting
5. Custom ACLs in Elasticsearch for enterprise search
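One common way to make updates robust to out-of-order events is sketched below; the slide does not show Adobe's actual script, so treat this as an assumption-laden illustration. Each event becomes a scripted update whose Painless script applies the change only if the event is newer than the last applied one, and otherwise sets `ctx.op = 'noop'`. The field name `event_ts` and the helpers `update_body` and `apply_locally` are hypothetical; the local function mirrors the script's guard so the ordering logic can be tested without a cluster.

```python
import json
from typing import Any, Dict, Optional

# Painless guard (hypothetical field `event_ts`): apply only newer events, else no-op.
GUARD_SCRIPT = (
    "if (ctx._source.event_ts == null || params.event_ts > ctx._source.event_ts) {"
    " ctx._source.putAll(params.doc); ctx._source.event_ts = params.event_ts;"
    " } else { ctx.op = 'noop'; }"
)


def update_body(doc: Dict[str, Any], event_ts: int) -> Dict[str, Any]:
    """Request body for the Elasticsearch update API; scripted_upsert runs the script for new docs too."""
    return {
        "scripted_upsert": True,
        "script": {
            "lang": "painless",
            "source": GUARD_SCRIPT,
            "params": {"doc": doc, "event_ts": event_ts},
        },
        "upsert": {},
    }


def apply_locally(stored: Optional[Dict[str, Any]], doc: Dict[str, Any], event_ts: int) -> Dict[str, Any]:
    """Pure-Python mirror of the script's guard, for testing ordering behavior."""
    current = dict(stored or {})
    if current.get("event_ts") is None or event_ts > current["event_ts"]:
        current.update(doc)
        current["event_ts"] = event_ts
    return current  # a stale event leaves the stored document unchanged


# Events arrive out of order: the stale "edited" event (ts=2) must not win.
events = [({"status": "uploaded"}, 1), ({"status": "deleted"}, 3), ({"status": "edited"}, 2)]
state = None
for doc, ts in events:
    state = apply_locally(state, doc, ts)
print(state)  # {'status': 'deleted', 'event_ts': 3}
print(json.dumps(update_body({"status": "edited"}, 2), indent=2))
```

The guard makes the final state depend on event timestamps rather than arrival order; Elasticsearch's own version checks on the scripted update handle concurrent writers.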
ML Deep dive
© 2018 Adobe Systems Incorporated. All Rights Reserved.
Stock powered by Elasticsearch
License images for use in projects, ads, websites, etc.
• Over 130M images
• Images indexed by tags, price, type, …
• Novel visual search applications to surface more results and act as a differentiator
Do these better using deep-learned representations (DLRs)
Deep-Learned Representation: Embedding
Project into a continuous space where similarity in a particular property maps to (Euclidean) distance:
• Dense, small (1k-dimensional) vectors
• Similarity score corresponds to squared distance
• Dimensions usually don't have meaning
• Usually trained using (deep) neural networks
• Used to power the deep search engine
[Figure: image embedding space]
Input → nearby embeddings correspond to:
• Image → similar tags (“semantic”)
• Word (word2vec) → similar word context
• Face → similar face attributes
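As a minimal illustration, assuming NumPy and random vectors in place of real 1k-dimensional embeddings, the score below is the squared Euclidean distance (smaller means more similar). Squaring is monotonic on non-negative values, so it ranks candidates exactly like plain Euclidean distance, and the expansion ‖q − x‖² = ‖q‖² − 2⟨q, x⟩ + ‖x‖² is what makes precomputed tables possible later. Function names are illustrative.

```python
import numpy as np


def sq_distance(q: np.ndarray, x: np.ndarray) -> float:
    """Squared Euclidean distance: the similarity score (smaller = more similar)."""
    diff = q - x
    return float(diff @ diff)


def sq_distance_expanded(q: np.ndarray, x: np.ndarray) -> float:
    """Same value via ||q||^2 - 2<q, x> + ||x||^2, the form that enables precomputation."""
    return float(q @ q - 2.0 * (q @ x) + x @ x)


rng = np.random.default_rng(42)
dim = 1024  # "1k dimensions"
query = rng.standard_normal(dim)
candidates = rng.standard_normal((5, dim))

scores = np.array([sq_distance(query, c) for c in candidates])
euclid = np.array([np.linalg.norm(query - c) for c in candidates])

# Squaring is monotonic for non-negative values, so both orders agree.
print(np.argsort(scores), np.argsort(euclid))
```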
Image Embedding
Trained using a neural network (CNN) on an auxiliary task (e.g., classification).
[Figure: CNN (Layer 1 … Layer 5) followed by an embedding layer and tag-indicator outputs (DOG, SURF, SUNSET, SUN, WOMAN, PALM, HEAD, 50s, BEACH, STARFISH, SMILE, RETRIEVER, …); predicted tags: WOMAN, DOG, BEACH, …]
Training:
• Feed an image; the network tries to predict the tag indicators, and errors are backpropagated
Embedding layer = layer before last:
• The embedding is the output of the embedding layer
• Captures an abstract representation of the image's tags
• 1k-dimensional vector
• Learns filters that activate when patterns arise at various layers
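The training setup can be sketched with a toy model. This is a hypothetical NumPy stand-in, not Adobe's CNN: one hidden ReLU layer plays the role of the embedding layer (the layer before last), a sigmoid output predicts one indicator per tag, and binary cross-entropy errors are backpropagated by hand. The names `TagNet`, `embed`, and `train_step`, the sizes, and the synthetic data are all assumptions for illustration.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


class TagNet:
    """Toy multi-label tagger: features -> embedding layer (ReLU) -> tag indicators."""

    def __init__(self, in_dim: int, emb_dim: int, n_tags: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (in_dim, emb_dim))
        self.b1 = np.zeros(emb_dim)
        self.W2 = rng.normal(0.0, 0.1, (emb_dim, n_tags))
        self.b2 = np.zeros(n_tags)

    def embed(self, x: np.ndarray) -> np.ndarray:
        """Output of the layer before last: this vector is the embedding."""
        return np.maximum(0.0, x @ self.W1 + self.b1)

    def predict(self, x: np.ndarray) -> np.ndarray:
        """Per-tag probabilities (one independent sigmoid per tag)."""
        return sigmoid(self.embed(x) @ self.W2 + self.b2)

    def train_step(self, x: np.ndarray, y: np.ndarray, lr: float = 1.0) -> float:
        """One gradient-descent step on mean binary cross-entropy; returns the pre-step loss."""
        h = self.embed(x)
        p = sigmoid(h @ self.W2 + self.b2)
        eps = 1e-9
        loss = -np.mean(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
        # d(mean BCE)/d(logits) for sigmoid outputs is (p - y) / number of entries.
        g_logits = (p - y) / y.size
        g_W2, g_b2 = h.T @ g_logits, g_logits.sum(axis=0)
        g_h = g_logits @ self.W2.T
        g_h[h <= 0.0] = 0.0  # backpropagate through the ReLU
        g_W1, g_b1 = x.T @ g_h, g_h.sum(axis=0)
        self.W1 -= lr * g_W1
        self.b1 -= lr * g_b1
        self.W2 -= lr * g_W2
        self.b2 -= lr * g_b2
        return float(loss)


# Synthetic data: 12 tags whose indicators depend on hidden linear features.
rng = np.random.default_rng(1)
X = rng.standard_normal((256, 32))
Y = (X @ rng.standard_normal((32, 12)) > 0.0).astype(float)

net = TagNet(in_dim=32, emb_dim=16, n_tags=12)
losses = [net.train_step(X, Y) for _ in range(300)]
embeddings = net.embed(X)  # one 16-dim embedding per "image"
```

After training, only `embed` is needed for search; the tag head exists to shape the embedding space.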
Deep Search Engine
Get embeddings for all images and index them.
[Figure: query → query embedding → deep search engine over the indexed embeddings]
Query using the embedding of the query image. Relevant results are the images whose embeddings are closest to the query embedding in Euclidean distance; nearby embeddings correspond to “semantically” similar images.
Finding similar images:
• Calculate similarity for all candidates
• Pick top k
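The two steps above (score every candidate, pick the top k) can be sketched as an exhaustive NumPy search. The toy corpus size and the name `top_k_similar` are assumptions; `np.argpartition` avoids fully sorting all N scores.

```python
import numpy as np


def top_k_similar(query: np.ndarray, index: np.ndarray, k: int) -> np.ndarray:
    """Exhaustive search: score every candidate, return ids of the k nearest (best first)."""
    # Squared Euclidean distance from the query to every row of the index.
    d2 = np.sum((index - query) ** 2, axis=1)
    k = min(k, len(d2))
    # argpartition puts the k smallest first in O(n); only those k are then sorted.
    part = np.argpartition(d2, k - 1)[:k]
    return part[np.argsort(d2[part])]


rng = np.random.default_rng(7)
index = rng.standard_normal((10_000, 64))            # toy corpus of 10k embeddings
query = index[123] + 0.01 * rng.standard_normal(64)  # slightly perturbed copy of item 123
print(top_k_similar(query, index, k=5)[0])  # 123
```

Every query still touches all N embeddings, which is the cost the next slide sets out to avoid.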
Towards a Reverse Index for Embeddings
Too expensive to calculate distances to 100M+ embeddings per query.
[Figure: numbered embeddings grouped into buckets (Bucket1, Bucket2, Bucket3, …) around a query]
So, bucketize (cluster) the embeddings, assigning each to one of 1k buckets.
Find the 20 buckets nearest to the query using the bucket centroids.
Only compare the query with embeddings in those 20 nearest buckets (2% of the corpus).
Reverse index:
• Bucket → image embeddings
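A minimal sketch of the bucketing idea, assuming plain k-means (Lloyd's algorithm in NumPy) as the clustering method and toy sizes instead of 1k buckets over 100M+ images. `build_ivf` and `ivf_search` are hypothetical names; `n_probe` plays the role of the slide's "nearest 20 buckets".

```python
import numpy as np


def sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise squared Euclidean distances between rows of a and rows of b."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def kmeans(x: np.ndarray, k: int, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns k centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = sq_dists(x, centroids).argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a bucket empties
                centroids[j] = members.mean(axis=0)
    return centroids


def build_ivf(x: np.ndarray, n_buckets: int):
    """Cluster embeddings into buckets; the reverse index maps bucket id -> item ids."""
    centroids = kmeans(x, n_buckets)
    assign = sq_dists(x, centroids).argmin(axis=1)
    inverted = {b: np.flatnonzero(assign == b) for b in range(n_buckets)}
    return centroids, inverted


def ivf_search(query, x, centroids, inverted, n_probe: int, k: int):
    """Compare the query only with items in the n_probe buckets nearest to it."""
    nearest = np.argsort(sq_dists(query[None, :], centroids)[0], kind="stable")[:n_probe]
    cand = np.concatenate([inverted[b] for b in nearest])
    d2 = sq_dists(query[None, :], x[cand])[0]
    order = np.argsort(d2, kind="stable")[:k]
    return cand[order], len(cand)


rng = np.random.default_rng(3)
data = rng.standard_normal((5_000, 16))
centroids, inverted = build_ivf(data, n_buckets=50)
ids, n_scanned = ivf_search(data[42], data, centroids, inverted, n_probe=5, k=3)
print(ids[0], n_scanned)  # 42, and only a fraction of the 5,000 items scanned
```

With balanced buckets, scanning n_probe of n_buckets touches about n_probe / n_buckets of the corpus, which is where 20 of 1k buckets gives roughly 2%.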
PQ-Codes
Embeddings are still too big to keep in the index (4 KB × 100M+)
• Also, lots of floating-point distance calculations
[Figure: a 1k-float (4 KB) embedding split into subspaces s1, s2, …, s63, s64, each encoded as one byte, giving a 64-byte code]
Compress each embedding using a trained encoder:
• Subdivide the embedding space into 64 subspaces
• Cluster each subspace into 256 clusters
• Encode every subspace vector of the embedding with the ID of its nearest cluster (as a byte)
• The PQ code is the concatenation of the subspace IDs
• Store the pqcode and bucket ID in the index
During search:
• Because codes quantize sub-vectors, we can precalculate values that depend on the quantized centroids and bucket centroids
• Distance to a candidate is fast because we can leverage lookup tables (LUTs) calculated once per query per bucket
$s = \mathrm{tabQC} - 2\,\mathrm{ql} + \mathrm{tl} + 2\,\mathrm{tcl}$
• tabQC: query & quantized centroids (once per query)
• ql: query & bucket (once per query per bucket)
• tl: bucket (precalculated)
• tcl: bucket & sub-centroids (precalculated)
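The encoding and table-lookup distance can be sketched as follows, assuming a standard product quantizer with one k-means codebook per subspace. For scale: 1k float32 values take about 4 KB, so 100M embeddings need roughly 400 GB, versus about 6.4 GB for 64-byte codes. To run quickly, the sketch uses toy sizes (32-dim vectors, 8 subspaces, 16 centroids per subspace) and encodes raw vectors; the bucket-ID and bucket-centroid terms are left out. Function names are hypothetical.

```python
import numpy as np


def train_pq(x: np.ndarray, n_sub: int, n_centroids: int, iters: int = 10, seed: int = 0):
    """Train one k-means codebook per subspace; returns array (n_sub, n_centroids, sub_dim)."""
    n, dim = x.shape
    assert dim % n_sub == 0 and n_centroids <= 256  # IDs must fit in one byte
    sub_dim = dim // n_sub
    rng = np.random.default_rng(seed)
    books = np.empty((n_sub, n_centroids, sub_dim))
    for m in range(n_sub):
        xs = x[:, m * sub_dim:(m + 1) * sub_dim]
        c = xs[rng.choice(n, size=n_centroids, replace=False)].copy()
        for _ in range(iters):
            a = ((xs[:, None, :] - c[None]) ** 2).sum(-1).argmin(1)
            for j in range(n_centroids):
                if np.any(a == j):
                    c[j] = xs[a == j].mean(0)
        books[m] = c
    return books


def encode(x: np.ndarray, books: np.ndarray) -> np.ndarray:
    """PQ code = ID of the nearest sub-centroid in every subspace, one byte each."""
    n_sub, _, sub_dim = books.shape
    xs = x.reshape(len(x), n_sub, sub_dim)
    d = ((xs[:, :, None, :] - books[None]) ** 2).sum(-1)  # (n, n_sub, n_centroids)
    return d.argmin(-1).astype(np.uint8)


def decode(codes: np.ndarray, books: np.ndarray) -> np.ndarray:
    """Approximate reconstruction: concatenate the chosen sub-centroids."""
    n_sub = books.shape[0]
    return books[np.arange(n_sub), codes].reshape(len(codes), -1)


def adc_distances(query: np.ndarray, codes: np.ndarray, books: np.ndarray) -> np.ndarray:
    """Build a LUT once per query; each candidate then costs n_sub lookups and additions."""
    n_sub, _, sub_dim = books.shape
    lut = ((query.reshape(n_sub, 1, sub_dim) - books) ** 2).sum(-1)  # (n_sub, n_centroids)
    return lut[np.arange(n_sub), codes].sum(axis=1)


rng = np.random.default_rng(5)
base = rng.standard_normal((2_000, 32))
books = train_pq(base, n_sub=8, n_centroids=16)
codes = encode(base, books)  # (2000, 8) uint8: 8 bytes per vector instead of 256
query = rng.standard_normal(32)
approx = adc_distances(query, codes, books)
```

The table lookup returns exactly the squared distance from the query to each decoded vector, so no floating-point distance is computed per candidate beyond the additions.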
Elasticsearch Pqcode Plugin
Deep search has to work in conjunction with other asset data in ES, so it is implemented as a plugin and the pqcodes are stored in ES.
The plugin implements the comparison between the query embedding and the candidates' pqcodes.
The CAS analyzer outputs the query embedding.
The reverse index is used to limit the search to the nearest 20 buckets.
Scores are calculated for all candidates in those buckets.
[Figure: CAS model and encoder → embedding + nearest buckets → Elasticsearch (ES Pqcode Plugin); images → encoder → indexed pqcodes]
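The per-candidate scoring such a plugin performs can be sketched as below. A real Elasticsearch plugin would be Java; this Python sketch only shows the arithmetic, under the assumption that each pqcode encodes the residual r of an embedding relative to its bucket centroid c (the usual IVF-PQ setup). Then ‖q − c − r‖² = ‖q − c‖² − 2⟨q, r⟩ + (‖r‖² + 2⟨c, r⟩): one term per query per bucket, one table per query, and one table that can be precalculated. That the slide's tabQC, ql, tl, and tcl terms split along these lines is an assumption. The index contents, sizes, and `score_candidates` are hypothetical, and the codebooks are random, which is enough to check the algebra.

```python
import numpy as np

rng = np.random.default_rng(11)
DIM, N_SUB, N_CENT, N_BUCKETS = 32, 8, 16, 4
SUB = DIM // N_SUB

bucket_centroids = rng.standard_normal((N_BUCKETS, DIM))
books = rng.standard_normal((N_SUB, N_CENT, SUB))  # sub-centroids for residuals

# Hypothetical index contents: (bucket id, pqcode) per image.
bucket_of = rng.integers(0, N_BUCKETS, size=500)
codes = rng.integers(0, N_CENT, size=(500, N_SUB)).astype(np.uint8)

# Precalculated at index time: ||s||^2 + 2<c, s> per bucket, subspace, sub-centroid.
c_sub = bucket_centroids.reshape(N_BUCKETS, N_SUB, 1, SUB)
precalc = (books ** 2).sum(-1)[None] + 2.0 * (c_sub * books[None]).sum(-1)  # (B, M, K)


def score_candidates(query: np.ndarray, buckets_to_scan):
    """Squared distances for all candidates in the given buckets, via table lookups."""
    # Once per query: -2<q_m, s_mk> for every subspace m and sub-centroid k.
    q_tab = -2.0 * (query.reshape(N_SUB, 1, SUB) * books).sum(-1)  # (M, K)
    m = np.arange(N_SUB)
    ids, scores = [], []
    for b in buckets_to_scan:
        # Once per query per bucket: ||q - c_b||^2.
        qb = float(((query - bucket_centroids[b]) ** 2).sum())
        members = np.flatnonzero(bucket_of == b)
        cand = codes[members]
        # Per candidate: 2 * N_SUB lookups and additions, no float vector math.
        ids.append(members)
        scores.append(qb + q_tab[m, cand].sum(1) + precalc[b][m, cand].sum(1))
    return np.concatenate(ids), np.concatenate(scores)


query = rng.standard_normal(DIM)
nearest = np.argsort(((bucket_centroids - query) ** 2).sum(1))[:2]  # "nearest buckets"
ids, scores = score_candidates(query, nearest)
top = ids[np.argsort(scores)[:5]]  # best candidates across the scanned buckets
```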
Demo – Find Similar Controls
Exploration/Refinement
• The majority of queries have 1 or 2 words
[Chart: number of searches (M) by number of words per query, 1 to 5]
• 1-word queries
  • banana, christmas, family, beach, food, flowers, car, …
  • Very general
  • Exploration
• 2-word queries
  • happy family, wood background, doctor patient, business man, …
  • Still too general
  • Refinement by query rewrite
• Both modes can be helped by clustering
Clustering PQ Codes
Use the pqcodes of the top 5k results to cluster them using k-means:
Iteratively assign each pqcode to the cluster of the closest centroid and recalculate the centroids until convergence.
Centroids act like queries, and assignments like find-similar.
The decision to assign to a cluster uses only additions and subtractions (the tl and tcl terms depend only on the candidate, so they cancel):
$s(y_1, x_i) - s(y_2, x_i) = \mathrm{tabQC}_1 - 2\,\mathrm{ql}_1 - \mathrm{tabQC}_2 + 2\,\mathrm{ql}_2$
• s(y1, xi), s(y2, xi): distance of candidate xi to centroid Y1 and to centroid Y2
• tabQC: once per cluster centroid
• ql: once per cluster centroid per bucket
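A sketch of k-means run directly on PQ codes, assuming centroids are kept as full float vectors and each code is compared through a per-centroid lookup table, so a centroid behaves like a query and an assignment like a find-similar call. For simplicity it ignores the bucket terms, uses random (untrained) codebooks and toy sizes, and the names `code_distances` and `kmeans_on_codes` are hypothetical. The assignment step only sums table entries and compares the sums.

```python
import numpy as np


def code_distances(centroids: np.ndarray, codes: np.ndarray, books: np.ndarray) -> np.ndarray:
    """Squared distances from every PQ code to every centroid via per-centroid LUTs."""
    n_sub, _, sub_dim = books.shape
    y = centroids.reshape(len(centroids), n_sub, 1, sub_dim)
    # One LUT per centroid: distance from each centroid sub-vector to each sub-centroid.
    luts = ((y - books[None]) ** 2).sum(-1)  # (n_clusters, n_sub, n_centroids)
    m = np.arange(n_sub)
    # Each distance is n_sub table lookups summed: additions only.
    return luts[:, m, codes].sum(-1).T  # (n_codes, n_clusters)


def kmeans_on_codes(codes: np.ndarray, books: np.ndarray, n_clusters: int, iters: int = 10, seed: int = 0):
    """Lloyd's k-means where points are PQ codes and centroids are float vectors."""
    n_sub = books.shape[0]
    recon = books[np.arange(n_sub), codes].reshape(len(codes), -1)  # decoded, for updates
    rng = np.random.default_rng(seed)
    centroids = recon[rng.choice(len(codes), size=n_clusters, replace=False)].copy()
    for _ in range(iters):
        assign = code_distances(centroids, codes, books).argmin(axis=1)
        for j in range(n_clusters):
            if np.any(assign == j):
                centroids[j] = recon[assign == j].mean(axis=0)
    # Final assignment against the final centroids.
    return centroids, code_distances(centroids, codes, books).argmin(axis=1)


rng = np.random.default_rng(2)
books = rng.standard_normal((8, 16, 4))  # 8 subspaces x 16 sub-centroids (toy)
codes = rng.integers(0, 16, size=(5_000, 8)).astype(np.uint8)  # stand-in for top-5k pqcodes
centroids, assign = kmeans_on_codes(codes, books, n_clusters=10)
```

Comparing two centroids for the same code reduces to the difference of two sums of table entries, which is the property the slide relies on.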
Demo – Clustering Stock Images