Secrets of Enterprise Data Mining
Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
PASS SQL Saturday #220 Atlanta GA
May 18, 2013
Networking
Interactive
About MarkTab
Training and Consulting with http://marktab.com
Data Mining Resources and Blog at http://marktab.net
Twitter @marktabnet
Interactive
Name (up to) three things you want from enterprise data mining
Secret: Excel data mining
Excel add-in for SQL Server data mining
Secret: More than just SQL Server
Microsoft continues to add machine learning technology
Microsoft Offers
Bing
Maps
Xbox Kinect
Hacker Magnet
SQL Server 2012
Analysis Services (Multidimensional and Data Mining)
Integration Services
Semantic Search
Hadoop Partnership
Excel Projects from Microsoft Research
Microsoft Data Lab: http://passfiles.sqlpass.org/vc/ba/PASSBAVC042513/PASSBAVC042513.pdf
Definitions
What is data mining?
Definition
Data mining is the automated or semi-automated process of discovering patterns in data.
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
Purposes
Phrase                Goal
“Data Mining”         Inform actionable decisions
“Machine Learning”    Determine best performing algorithm
Secret: Give artists art
Data mining is part of a complete decision cycle
MarkTab Decision Cycle
Analysis
(science)
Synthesis
(art)
GO
Science needs science fiction -- MarkTab
XKCD: Shopping Teams
Secret: Microsoft is an analytics competitor
Industry Comparisons 2012-2013
Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb (February 5, 2013)
Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb (January 31, 2013)
KDNuggets 2012
http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
SQL Server 2012
Business Intelligence and Business Analytics
New Platform options: managed services
Stack layers (identical in every model): Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking
Models compared: Platform (Self Managed), Infrastructure (as a Service), Platform (as a Service), Software (as a Service)
[Figure: the four models side by side; the figure carries three "Managed Services" labels]
SQL Release timelines
SQL Server releases (1989 to 2012)
1989: SQL Server 1.0 (OS/2)
1991: SQL Server 1.1 (OS/2)
1993: SQL Server 4.21 (NT)
1995: SQL Server 6.0
1996: SQL Server 6.5
1998: SQL Server 7.0 (Dynamic Locking, Auto-Tuning, Full-text search, Replication, Analysis Services)
2000: SQL Server 2000 (Reporting Services)
2005: SQL Server 2005 (Unicode Support, Native XML, SQLCLR, Service Broker, Integration Services)
2008: SQL Server 2008 (Sparse Columns, Spatial Types, FILESTREAM)
2010: SQL Server 2008 R2 (Data-tier Apps, StreamInsight, PowerPivot, Master Data Services)
2012: SQL Server 2012 (AlwaysOn, Columnstore, FileTable, Semantic Search, Power View)

SQL Azure releases (February 2010 to August 2011)
Feb 10: SQL Azure RTW
Feb 10: SQL Azure SU1 RTW (Alter Edition)
Apr 10: SQL Azure SU2 RTW (MARS)
Jun 10: SQL Azure SU3 RTW (50 GB Db, Spatial Type, HierarchyId Type)
Jul 10: DataSync CTP1
Aug 10: SQL Azure SU4 RTW (Database Copy, Web Admin)
Nov 10: DataMarket RTW, SQL Azure Reporting CTP1
Dec 10: SQL Azure SU6 RTW (DataSync CTP2)
Feb 11: SQL Azure Reporting CTP2, DataSync CTP2 Update
Apr 11: SQL Azure SU V.Next (Multiple Servers, Server Mgmt API, JDBC, DAC Upgrade)
Aug 11: New Portal Experience, Sparse Columns, SQL Azure Reporting CTP3, SQL Azure DataSync CTP3, DAC Import/Export Service, Denali TSQL
Secret: Many already have Microsoft analytics
Business Intelligence and Business Analytics are included with most SQL Server licenses
Data platform: SQL Server 2012
Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*
Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project “Austin”*
Analytical Services: Analysis Services*, Data Mining, PowerPivot*
Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
* New / improved in SQL Server 2012
SQL Server 2012 Editions
Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx (February 2013)
Secret: Microsoft offers three enterprise tools
All three tools support scaled solutions
What Enterprise Tools support Microsoft Data Mining?
Data Mining: SSMS, SSIS, PowerShell
[Figure, repeated across four slides: a variable with values 0 through 7, shown as Discretized (two examples), Continuous, and Discrete]
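The figure above contrasts continuous, discrete, and discretized treatments of the same variable. As a rough illustration only, the sketch below buckets the values 0 through 7 into equal-width ranges. The function name `discretize` and the bucket counts are assumptions made for this example; the slides do not say which discretization method or bucket count was shown, and this is not a reproduction of how Analysis Services discretizes columns.

```python
def discretize(values, bucket_count):
    """Assign each numeric value to one of `bucket_count` equal-width buckets.

    Returns a list of 0-based bucket indexes, one per input value.
    (Hypothetical helper for illustration; not an Analysis Services API.)
    """
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    low, high = min(values), max(values)
    if low == high:
        # All values are identical, so everything falls in the first bucket.
        return [0 for _ in values]
    width = (high - low) / bucket_count
    buckets = []
    for v in values:
        index = int((v - low) / width)
        # The maximum value would land one past the last bucket; clamp it.
        buckets.append(min(index, bucket_count - 1))
    return buckets


values = [0, 1, 2, 3, 4, 5, 6, 7]
two_buckets = discretize(values, 2)
four_buckets = discretize(values, 4)
print(two_buckets)
print(four_buckets)
```

The same eight values can be modeled as eight distinct states (discrete), as numbers on a scale (continuous), or as a handful of ranges (discretized); the choice changes what patterns an algorithm can find.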
Data Mining Capacities
SQL Server 2008 R2 Analysis Services Object                          Maximum sizes/numbers
Maximum data mining models per structure                             2^31-1 = 2,147,483,647
Maximum data mining structures per solution                          2^31-1 = 2,147,483,647
Maximum data mining structures per Analysis Services database        2^31-1 = 2,147,483,647
Maximum data mining attributes (variables) per structure             2^31-1 = 2,147,483,647
Reference:
http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
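Every limit in the capacities table is the same number, 2^31-1. That value matches the largest signed 32-bit integer, which suggests (an inference, not something the slides state) that these counts are held in 32-bit signed fields. A quick check of the arithmetic, using only the Python standard library:

```python
import struct

# The limit quoted in the capacities table.
limit = 2**31 - 1
print(f"{limit:,}")

# A signed 32-bit little-endian field ("<i") can hold the limit...
packed = struct.pack("<i", limit)

# ...but not one more than the limit.
try:
    struct.pack("<i", limit + 1)
    overflowed = False
except struct.error:
    overflowed = True
print(overflowed)
```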
Semantic Search
Text Mining
Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both
Full-Text Search Enhancements
Property search: search on tagged properties (such as author or title)
Customizable NEAR: find words or phrases close to one another
New Word Breakers and Stemmers (for many languages)
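To make the idea of a customizable NEAR concrete, here is a toy proximity check in Python. It is only a sketch of the concept: the function `near`, its parameters, and the sample sentence are assumptions for this example, and it says nothing about how SQL Server's full-text engine implements proximity search.

```python
import re


def near(text, first, second, max_distance):
    """Return True if `first` and `second` occur with at most
    `max_distance` other words between them, in either order.

    (Hypothetical helper for illustration; not a SQL Server API.)
    """
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) - 1 <= max_distance
        for i in first_positions
        for j in second_positions
        if i != j
    )


doc = "Semantic search extends full-text search in SQL Server"
print(near(doc, "text", "sql", 2))
print(near(doc, "semantic", "server", 3))
```

Making the distance a parameter is what "customizable" refers to here: the caller decides how close two terms must be to count as a match.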
[Figure: semantic search architecture. Components shown: Documents (iFilter required), iFilters, Full-Text Keyword Index “FTI”, Semantic Key Phrase Index – Tag Index “TI”, Semantic Document Similarity Index “DSI”, Semantic Database]
Languages Currently Supported
Traditional Chinese
German
English
French
Italian
Brazilian
Russian
Swedish
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
Phases of Semantic Indexing
1. Full Text Keyword Index “FTI”
2. Semantic Key Phrase Index – Tag Index “TI”
3. Semantic Document Similarity Index “DSI”
http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
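The three indexes named above build on one another: keywords first, then key phrases, then document similarity. The Python sketch below mimics that layering on three tiny documents. Everything in it is an assumption made for illustration (the stop-word list, the use of term frequency as a stand-in for key phrases, cosine similarity, and all function names); it is not how SQL Server computes its semantic indexes.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list only; real word breakers are language-specific.
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "for", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def keyword_index(docs):
    """Keyword-index analogue: map each keyword to the ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index


def key_terms(text, top_n=2):
    """Key-phrase analogue: the most frequent terms stand in for key phrases."""
    return [term for term, _ in Counter(tokenize(text)).most_common(top_n)]


def similarity(text_a, text_b):
    """Document-similarity analogue: cosine similarity of term-count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = {
    1: "Data mining discovers patterns in data",
    2: "Text mining discovers patterns in text",
    3: "Reporting services render reports",
}
index = keyword_index(docs)
print(sorted(index["mining"]))
print(key_terms(docs[1], top_n=1))
print(similarity(docs[1], docs[2]) > similarity(docs[1], docs[3]))
```

Each later stage reuses the tokens produced for the earlier one, which is the sense in which the indexing proceeds in phases.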
Secret: Semantic Search scales linearly
Performance
Integrated Full Text Search (iFTS)
Improved Performance and Scale:
Scale-up to 350M documents for storage and search
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times less than 3 sec for corpus
Similar or better than main database search competitors
(2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
Time in Seconds vs. Number of Documents
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
Text Mining References
Video
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online), which explains the demo
http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Microsoft Resources
Links
Software
SQL Server 2012 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Microsoft Office 2013 Professional
http://office.microsoft.com/en-us/try
Organizations
Professional Association for SQL Server http://www.sqlpass.org
Atlanta MDF http://www.atlantamdf.com/
Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
PASS Business Analytics Conference http://www.passbaconference.com
Microsoft TechEd North America http://northamerica.msteched.com/
Interactive
Takeaways
Conclusion: Seven Secrets
Excel data mining
More than just SQL Server
Success involves everyone
Microsoft is an analytics competitor
Many already have Microsoft analytics
Microsoft offers three enterprise tools
Semantic search scales linearly
Connect
Data Mining Resources and blog http://marktab.net
Data Mining Training and Consulting (especially Microsoft and SAS) http://marktab.com
Abstract
If you have a SQL Server license (Standard or higher) then you already have the ability
to start data mining. In this new presentation, you will see how to scale up data
mining from the free Excel 2013 add-in to production use. Aimed at beginning to
intermediate data miners, this presentation will show how mining models move from
development to production. We will use SQL Server 2012 tools including SSMS, SSIS,
and SSDT.

