SEQUEL: Query Completion via Pattern Mining
on Multi-Column Structural Data
Chuancong Gao, Qingyan Yang, Jianyong Wang
Tsinghua University, Beijing, China
Structural Data Description
Mined Pattern Structure
Suggestion Process
STEP 1: Search the index of each column to find at least one combination
(matching order) of columns that matches the input query.
E.g., the query “www da” is matched as follows (using the indexes shown on the right):
Advantages Compared to Other Systems
Pattern Index Structure – Trie Tree
Example on Column Title Phrase and Venue
[Figure: system flow]
Offline Part: Structural Data → Formalize → Mine & Index → Mined Patterns and Indexes for Each Column
Online Part: Query → Preprocess → Try to Match Greedily on Each Column Index → Patterns for m Match Combinations → Top-k Selection on Last-Matched Column for m Combinations → Top-k Selection from m×k Candidates → Output
Legend: ≥ denotes Ranking Score Comparison
The DBLP Computer Science Bibliography (DBLP)
• More than 1,400,000 Publication Entries
• Four Attributes for each Publication Entry:
  • Authors (e.g. Jiawei Han, Guozhu Dong, Yiwen Yin)
  • Title (e.g. Efficient Mining of Partial Periodic Patterns in Time Series Database)
  • Venue (e.g. ICDE)
  • Year (e.g. 1999)
1. Title Phrase “frequent patterns” appears 17 times in Venue “icdm”
2. Title Phrase “pattern” appears 14 times for Authors “jian pei” and “jiawei han”
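To make the two pattern examples above concrete, the following minimal sketch counts how many records support a cross-column pattern. The `Publication` type, the `support` function, and the toy records are all hypothetical illustrations (they are not DBLP data and not the authors' code), and matching a title phrase as a contiguous, case-insensitive word run is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Publication:
    """One record with the four attributes listed for DBLP entries."""
    authors: tuple
    title: str
    venue: str
    year: int


def contains_phrase(title, phrase):
    """True if the words of `phrase` occur contiguously in `title` (case-insensitive)."""
    words = title.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def support(records, phrase=None, venue=None, authors=()):
    """Count the records that satisfy every column constraint that was given."""
    wanted_authors = {a.lower() for a in authors}
    count = 0
    for r in records:
        if phrase is not None and not contains_phrase(r.title, phrase):
            continue
        if venue is not None and r.venue.lower() != venue.lower():
            continue
        if not wanted_authors <= {a.lower() for a in r.authors}:
            continue
        count += 1
    return count


# Toy records, invented for illustration only.
RECORDS = [
    Publication(("author a", "author b"), "Mining Frequent Patterns in Graphs", "ICDM", 2001),
    Publication(("author a",), "Frequent Patterns over Streams", "ICDM", 2002),
    Publication(("author c",), "Frequent Subgraph Patterns", "ICDM", 2003),
    Publication(("author b",), "Frequent Patterns on the Web", "WWW", 2003),
]

print(support(RECORDS, phrase="frequent patterns", venue="icdm"))
print(support(RECORDS, phrase="patterns", authors=("author a",)))
```

A pattern would be kept only if this count reaches the minimum support threshold.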
• Suggests Patterns Mined from the Underlying Data instead of Query Logs
  • More Accurate and Meaningful
  • Query Logs on Structural Data are Low in Amount and Quality
• No Need to Explicitly Specify Different Columns in the Query
• Suggests Phrases instead of Single Terms
• Fast for both Offline Pattern Mining and Online Suggestion
[Figure: Title Phrase Index and Venue Index, drawn as character-level trie trees (e.g. d-a-t-a, w-w-w) with three node types: Blank Node, Normal Node, Phrase-end Node. Nodes are annotated with lists of pattern ids (e.g. 1 2 3 ..., 2 5 6 ...).]
[Table: Some Selected Patterns, with columns id, Title Phrase, Venue, sup. Pattern texts shown: data; data icde; data www; data web www; database icde; icde; www. Support values shown: 50263, 514, 14, 14, 312, 2666, 880, 4, 1262.]
Example entry: www data 17
http://dbgroup.cs.tsinghua.edu.cn/chuancong/sequel
STEP 2: Suggest on the last matched column of each matching order.
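Steps 1 and 2 can be sketched together as follows. This is a simplified illustration, not the system's implementation: the pattern table `PATTERNS` and its support values are invented, plain sets stand in for the column indexes, each column is assumed to be matched at most once per matching order, and a candidate pattern is assumed to cover exactly the matched columns. Support is used as the ranking score.

```python
import heapq

# Hypothetical global pattern table: (column -> value, support).
PATTERNS = [
    ({"title": "data"}, 900),
    ({"title": "data", "venue": "www"}, 40),
    ({"title": "data web", "venue": "www"}, 15),
    ({"title": "database", "venue": "icde"}, 300),
    ({"title": "database", "venue": "www"}, 25),
    ({"venue": "www"}, 700),
    ({"venue": "icde"}, 800),
]


def build_index(patterns):
    """Collect the distinct values seen in each column (a stand-in for the tries)."""
    index = {}
    for pattern, _sup in patterns:
        for column, value in pattern.items():
            index.setdefault(column, set()).add(value)
    return index


def matching_orders(tokens, index, used=()):
    """STEP 1: yield lists of (column, text) pairs that cover all query tokens.

    Every pair but the last must equal an indexed value; the last pair only
    needs to be a prefix of one, since the user may still be typing it.
    """
    if not tokens:
        return
    for column, values in index.items():
        if column in used:
            continue
        rest = " ".join(tokens)
        if any(v.startswith(rest) for v in values):
            yield [(column, rest)]
        # Greedy: take the longest exact match in this column, then recurse.
        for cut in range(len(tokens) - 1, 0, -1):
            head = " ".join(tokens[:cut])
            if head in values:
                for tail in matching_orders(tokens[cut:], index, used + (column,)):
                    yield [(column, head)] + tail
                break


def suggest(query, patterns, k=3):
    """STEP 2: top-k per matching order, then top-k over the m×k candidates."""
    index = build_index(patterns)
    candidates = []
    for order in matching_orders(query.lower().split(), index):
        *fixed, (last_col, prefix) = order
        columns = {c for c, _ in order}
        matches = [
            (sup, pattern) for pattern, sup in patterns
            if set(pattern) == columns
            and all(pattern[c] == text for c, text in fixed)
            and pattern[last_col].startswith(prefix)
        ]
        candidates.extend(heapq.nlargest(k, matches, key=lambda m: m[0]))
    best = heapq.nlargest(k, candidates, key=lambda m: m[0])
    return [(pattern, sup) for sup, pattern in best]


for pattern, sup in suggest("www da", PATTERNS, k=2):
    print(pattern, sup)
```

With the toy table, the query “www da” yields one matching order (Venue “www”, then the Title Phrase prefix “da”), and the suggestions complete the title phrase within that venue.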
Pattern Mining Algorithm
Based on the frequent sequential pattern mining algorithm PrefixSpan:
• Treat Authors as an Itemset
• Treat Title as a Sequence
• Treat Venue & Year as Single Items
• Concatenate all the columns together as a new Sequence
• Mine and Index
Minimum Support (Frequency) Threshold Used: 10
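The formalization and mining steps can be sketched as below, under loud assumptions: `formalize` and `prefixspan` are hypothetical names, the records are toy data, and the miner is a plain single-item PrefixSpan. The Authors itemset is approximated by sorting author names into a canonical order so that any subset appears as a subsequence, and the sketch does not force title words to be contiguous. The toy threshold is 2 rather than 10.

```python
def formalize(authors, title, venue, year):
    """Concatenate all columns of one record into a single tagged sequence."""
    items = [("author", a) for a in sorted(a.lower() for a in authors)]
    items += [("title", w) for w in title.lower().split()]
    items += [("venue", venue.lower()), ("year", str(year))]
    return items


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every frequent subsequence."""
    results = {}

    def mine(prefix, projected):
        # `projected` holds (sequence id, start offset) pairs: the suffixes
        # that remain after matching `prefix` as early as possible.
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, sup in counts.items():
            if sup < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = sup
            next_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((sid, pos + 1))
                        break
            mine(pattern, next_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


# Toy records, invented for illustration only.
SEQUENCES = [
    formalize(["B Author", "A Author"], "Mining Frequent Patterns", "ICDM", 2001),
    formalize(["A Author"], "Frequent Patterns in Streams", "ICDM", 2002),
    formalize(["C Author"], "Graph Search", "WWW", 2002),
]

MINED = prefixspan(SEQUENCES, min_support=2)
for pattern, sup in sorted(MINED.items(), key=lambda kv: (-len(kv[0]), kv[0])):
    print(sup, pattern)
```

Tagging each item with its column keeps a title word such as “www” distinct from the venue “www”, which is one way to let a single sequence miner produce cross-column patterns.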
Trie Tree Indexes:
• Used for fast column text matching
• Every column has one corresponding Trie tree
• All the indexes share a global table storing all the patterns
• Close to 2 GB of memory in total
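A minimal sketch of this layout, assuming one character-level trie per column whose phrase-end nodes hold ids into a shared pattern table. The class names, the table contents, and the support values are hypothetical; the pattern texts reuse phrases from the poster's example, and ids here are 0-based list positions.

```python
class TrieNode:
    """One character position; `pattern_ids` is non-empty only at phrase-end nodes."""
    __slots__ = ("children", "pattern_ids")

    def __init__(self):
        self.children = {}
        self.pattern_ids = []


class ColumnTrie:
    """Trie over the phrases of a single column."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, phrase, pattern_id):
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
        node.pattern_ids.append(pattern_id)

    def complete(self, prefix):
        """Return (phrase, pattern ids) for every indexed phrase starting with `prefix`."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [(prefix, node)]
        while stack:
            text, current = stack.pop()
            if current.pattern_ids:
                found.append((text, current.pattern_ids))
            for ch, child in current.children.items():
                stack.append((text + ch, child))
        return sorted(found)


# Hypothetical shared pattern table: (column -> phrase, support).
PATTERN_TABLE = [
    ({"title": "data"}, 100),
    ({"title": "data", "venue": "icde"}, 50),
    ({"title": "data", "venue": "www"}, 20),
    ({"title": "data web", "venue": "www"}, 12),
    ({"title": "database", "venue": "icde"}, 30),
    ({"venue": "icde"}, 200),
    ({"venue": "www"}, 150),
]


def build_indexes(table):
    """Build one trie per column; every trie points into the same table."""
    tries = {}
    for pattern_id, (pattern, _sup) in enumerate(table):
        for column, phrase in pattern.items():
            tries.setdefault(column, ColumnTrie()).insert(phrase, pattern_id)
    return tries


TRIES = build_indexes(PATTERN_TABLE)
print(TRIES["title"].complete("da"))
print(TRIES["venue"].complete("www"))
```

Storing only ids at the trie nodes means each pattern is stored once in the table, however many columns reference it.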

CIKM 2010 Demo - SEQUEL: query completion via pattern mining on multi-column structural data
