Applied Enterprise Semantic Mining
Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
PASS SQL Saturday #198 Vancouver BC
February 16, 2013
Photos © 2013 Mark Tabladillo, All Rights Reserved
Networking
Interactive
About MarkTab
Training and Consulting with http://marktab.com
Data Mining Resources and Blog at http://marktab.net
Ph.D. – Industrial Engineering, Georgia Tech
Training and consulting internationally across many industries – SAS and Microsoft
Contributed to peer-reviewed research and legislation
Mentoring doctoral dissertations at the accredited University of Phoenix
Presenter
Quick Look
My Semantic Search
Interactive
Name three things you want from enterprise text mining
Introduction
SQL Server 2012 has new Programmability Enhancements
  Statistical Semantic Search
  File Tables
  Full-Text Search Improvements
These combined technologies make SQL Server 2012 a strong contender in text
mining
Outline
Why Microsoft is competitive for data mining
Definitions: what is text mining?
History: how Microsoft’s semantic search was born
What is inside semantic search
 Logical model
 Demos
 Performance
Microsoft Resources
Why Microsoft is Competitive for Data Mining
Based on 2012 and 2013 Surveys
Gartner 2013
Magic Quadrant for Business Intelligence and Analytics Platforms
  Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb, February 5, 2013
Gartner 2013
Magic Quadrant for Data Warehouse Database Management Systems
  Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb, January 31, 2013
KDNuggets 2012
http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
Definitions
What is text mining?
Definition
Data mining is the automated or semi-automated process of
discovering patterns in data
  Text mining is the automated or semi-automated process of
  discovering patterns from textual data
Machine learning is the development and optimization of
algorithms for automated or semi-automated pattern discovery
Purposes
    Phrase                          Goal
    “Data Mining”, “Text Mining”    Inform actionable decisions
    “Machine Learning”              Determine best performing algorithm
MarkTab Decision Cycle
  [Diagram: a cycle starting at GO and alternating between Synthesis (art) and Analysis (science)]
  Science needs science fiction -- MarkTab
History
How Microsoft’s semantic search came to be
History
July 2008
  Microsoft purchases Powerset for US$100 Million
  Google dismisses semantic search
  http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-powerset-for-100m-plus/
  http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel-cx_ag_0701powerset.html
History
March 2009
 Google announces “snippets” as relevant to search
 The media picks this story up as “semantic search”
 http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html#!/2009/03/two-new-improvements-to-google-results.html
History
February 2012
  Google announces Knowledge Graph, an explicit application of semantic search
  http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
History
April 2012
  Microsoft purchases 800+ patents from AOL for US$1 Billion
  Among them are semantic search and metadata querying patents that are older than Google
  http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
What is inside Semantic Search
Text Mining introduced for SQL Server 2012
Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both
Statistical Semantic Search
Comprises some aspects of text mining
Identifies statistically relevant key phrases
Based on these phrases, can identify (by score) similar documents
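The two ideas on this slide, scoring key phrases and scoring similar documents, can be sketched in a few lines of Python. This is only an illustration: the slides do not describe SQL Server's internal statistical model, so plain TF-IDF weights and cosine similarity are swapped in as a generic stand-in, and the sample documents and function names are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using plain TF-IDF.

    Assumption: TF-IDF stands in for the product's own scoring model.
    """
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors

def key_phrases(vector, k=3):
    """Top-k terms by weight, highest first; ties are broken alphabetically."""
    return sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical three-document corpus.
docs = [
    "full-text search indexes keywords in documents",
    "semantic search scores key phrases in documents",
    "file tables store files inside the database",
]
vectors = tfidf_vectors(docs)

for i, vec in enumerate(vectors):
    print(i, key_phrases(vec))
print("similarity 0-1:", cosine(vectors[0], vectors[1]))
print("similarity 0-2:", cosine(vectors[0], vectors[2]))
```

Documents 0 and 1 share terms, so their similarity is positive, while documents 0 and 2 share none and score zero. In SQL Server the equivalent results come back as scored rowsets rather than Python dictionaries.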
FileTables
Built on existing SQL Server FILESTREAM technology
Files and documents
   Stored in special tables in SQL Server
   Accessed as if they were stored in the file system
Full-Text Search Enhancements
Property search: search on tagged properties (such as author or title)
Customizable NEAR: find words or phrases close to one another
New Word Breakers and Stemmers (for many languages)
Logical Model
How semantic search works
From Documents to Output
  Documents: Office, PDF, varchar, nvarchar (iFilter required)
  iFilters
  Full-Text Keyword Index “FTI”
  Semantic Database
  Semantic Key Phrase Index – Tag Index “TI”
  Semantic Document Similarity Index “DSI”
  Rowset Output with Scores
Languages Currently Supported
Traditional Chinese   Simplified Chinese
German                British English
English               Portuguese
French                Chinese (Hong Kong SAR, PRC)
Italian               Spanish
Brazilian             Chinese (Singapore)
Russian               Chinese (Macau SAR)
Swedish
Phases of Semantic Indexing
  Full-Text Keyword Index “FTI”
  Semantic Key Phrase Index – Tag Index “TI”
  Semantic Document Similarity Index “DSI”
  http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
Interactive Demo
SQL Server Management Studio
Semantic Search and SQL Server Data Mining
SQL Server Data Tools: data mining plus text mining
Performance
The Million-Dollar Edge
Integrated Full Text Search (iFTS)
Improved Performance and Scale:
  Scale-up to 350M documents for storage and search
  iFTS query performance 7-10 times faster than in SQL Server 2008
  Worst-case iFTS query response times less than 3 sec for corpus
  Similar or better than main database search competitors
(2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
  [Chart: Time in Seconds vs. Number of Documents]
  (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
Text Mining References
Video
  http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
  http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online) – explains the demo
  http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
  http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Microsoft Resources
Links
Software
SQL Server 2012 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
 http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Microsoft Office Professional 2013
 http://office.microsoft.com/en-us/try
Organizations
 Professional Association for SQL Server http://www.sqlpass.org
   Atlanta MDF http://www.atlantamdf.com/
   Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
PASS Business Analytics Conference http://www.passbaconference.com
Microsoft TechEd North America http://northamerica.msteched.com/
Interactive
Takeaways
Conclusion
SQL Server 2012 provides both data mining and semantic search
The core technology allows document similarity matching
The results can be combined with SQL Server Data Mining (such as
Association Analysis)
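As a rough picture of how key phrases might feed an association analysis, the sketch below counts which phrases co-occur across documents and reports support and confidence for each pair. It is an assumption-laden illustration: it does not reproduce the association algorithm in SQL Server Data Mining, and the phrase lists and the `pair_rules` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(doc_phrases, min_support=2):
    """Return (antecedent, consequent, support, confidence) tuples.

    support    = number of documents containing both phrases
    confidence = support / number of documents containing the antecedent
    """
    # Count each phrase once per document.
    item_counts = Counter(p for phrases in doc_phrases for p in set(phrases))
    # Count each unordered phrase pair once per document.
    pair_counts = Counter(
        pair
        for phrases in doc_phrases
        for pair in combinations(sorted(set(phrases)), 2)
    )
    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        rules.append((a, b, support, support / item_counts[a]))
        rules.append((b, a, support, support / item_counts[b]))
    # Strongest rules first: by confidence, then support, then name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))

# Hypothetical key phrases, one list per document.
doc_phrases = [
    ["index", "search", "semantic"],
    ["search", "semantic"],
    ["index", "filetable"],
    ["search", "filetable"],
]
rules = pair_rules(doc_phrases)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support}, confidence={confidence:.2f}")
```

Only the pair "search" and "semantic" appears in at least two documents here, so it yields the two rules that survive the support threshold.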
Connect
Data Mining Resources and blog http://marktab.net
Data Mining Training and Consulting (especially Microsoft and SAS)
http://marktab.com
