Data Quality Services in SQL Server 2012
(An Introduction)
Stéphane Fréchette
Friday April 26, 2013
Matching
Cleansing
DQS
Who am I?
My name is Stéphane Fréchette
I’m a Database & Business Intelligence Professional and CEO | Founder of
I have a passion for architecting, designing and building solutions that matter.
Self-proclaimed Open Data Hacker/Advocate. I founded Gatineau Ouverte, a citizen-led
initiative that aims to promote open access to the civic data of the city of Gatineau.
Twitter: @sfrechette
Email: stephanefrechette@ukubu.com
Blog: stephanefrechette.com
Session Outline
• Microsoft Business Intelligence (The Stack)
• Dirty Data…
• SQL Server Data Quality Services (DQS)
• Data Steward
• Knowledge Base and Domains
• Data Quality Projects
• Data Cleansing Transform – SSIS
• DQS (Install & Architecture)
• Enterprise Information Management (EIM)
• Resources
Microsoft Business Intelligence
• Analysis Services
• Reporting Services
• Integration Services
• Master Data Services
• Data Quality Services
• SharePoint Collaboration
• SharePoint Dashboards & Scorecards
• Excel Workbooks
• PowerPivot Applications
• OData Feeds
• Line of Business Applications
• Hadoop Big Data
Dirty Data…
Do you have dirty data?
(All projects have it! It's inevitable.)
Dirty Data…
Causes?
Bad data entry
Poor Data Governance
Duplicate entities in different LOB systems
Sample Data Representation
• Prospect in CRM System:
Mark Smith | 613.111-1234 | Ottawa | ON | K1P 1K1
• Prospect buys goods now entered in POS System:
Markus Smith | 1234 Stilton Ave | Kanata | ON | K1P 1K1
• Record also entered into Accounting System:
Markus Smith | 1234 Stilton Avenue | Kanata | ON | K1P 1K1
ETL process imports these records into the Data Warehouse / Data Mart
FirstName | LastName | Phone | Address | City | Province | PostalCode
Mark | Smith | 613.111-1234 | | Ottawa | ON | K1P 1K1
Markus | Smith | | 1234 Stilton Ave | Kanata | ON | K1P 1K1
Markus | Smith | | 1234 Stilton Avenue | Kanata | ON | K1P 1K1
Sample Data Representation
• Duplicate records and inaccurate, incomplete data
• What we want is a golden record (one version of the truth)
Source records:
FirstName | LastName | Phone | Address | City | Province | PostalCode
Mark | Smith | 613.111-1234 | | Ottawa | ON | K1P 1K1
Markus | Smith | | 1234 Stilton Ave | Kanata | ON | K1P 1K1
Markus | Smith | | 1234 Stilton Avenue | Kanata | ON | K1P 1K1
Golden record:
FirstName | LastName | Phone | Address | City | Province | PostalCode
Markus | Smith | 613-111-1234 | 1234 Stilton Ave | Kanata | ON | K1P 1K1
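The step from three source rows to one golden record can be sketched in a few lines of Python. This is only an illustration, not how DQS works internally: the `standardize` rules (phone formatting, "Avenue" to "Ave") and the "most frequent non-empty value wins" survivorship rule are assumptions chosen so that the output reproduces the golden record shown on the slide.

```python
import re
from collections import Counter

# The three source rows from the slides; empty strings mark missing values.
records = [
    {"FirstName": "Mark", "LastName": "Smith", "Phone": "613.111-1234",
     "Address": "", "City": "Ottawa", "Province": "ON", "PostalCode": "K1P 1K1"},
    {"FirstName": "Markus", "LastName": "Smith", "Phone": "",
     "Address": "1234 Stilton Ave", "City": "Kanata", "Province": "ON", "PostalCode": "K1P 1K1"},
    {"FirstName": "Markus", "LastName": "Smith", "Phone": "",
     "Address": "1234 Stilton Avenue", "City": "Kanata", "Province": "ON", "PostalCode": "K1P 1K1"},
]

def standardize(field, value):
    """Apply simple, hypothetical standardization rules to one value."""
    if field == "Phone":
        digits = re.sub(r"\D", "", value)
        if len(digits) == 10:
            return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"
    if field == "Address":
        return re.sub(r"\bAvenue\b", "Ave", value)
    return value

def golden_record(rows):
    """Keep the most frequent non-empty standardized value for each field."""
    golden = {}
    for field in rows[0]:
        values = [standardize(field, r[field]) for r in rows if r[field]]
        golden[field] = Counter(values).most_common(1)[0][0] if values else ""
    return golden

print(golden_record(records))
```

A real project would let the data steward decide which source wins for each field; a frequency vote is just the simplest rule that works for this sample.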
SQL Server Data Quality Services (DQS)
• New in SQL Server 2012
• Enables cleansing, matching, standardizing and enriching data
• Delivers trusted information for business intelligence, data warehouse, and transaction
processing workloads
• Knowledge-Driven Solution (create/edit)
• A knowledge management process that builds the knowledge base
• A data quality project that proposes changes to source data based on the knowledge in the knowledge
base (cleansing and matching)
• A key component to an Enterprise Information Management (EIM) solution
Answering the Need with DQS
• DQS enables you to resolve issues involving incompleteness, lack of conformity, inconsistency,
inaccuracy, invalidity, and data duplication
• Provides the following features to resolve data quality issues:
 • Data Cleansing
 • Matching
 • Reference Data Services
 • Profiling
 • Monitoring
 • Knowledge Base
Data Steward
• Key role: usually a business user rather than someone from the Information Technology side
• Nutshell: Responsible for maintaining data elements in a metadata registry…
• Data Steward -> DQS Client
• Create and edit Knowledge Bases
• Run and process data through them, continually and iteratively improving the Knowledge Bases
• Knowledge Bases can be consumed and used by other Data Stewards and IT (SSIS / ETL Developers)
(Diagram: DQS Data Steward, MDS Data Steward, SSIS Developer; Matching, Cleansing)
Knowledge Bases and Domains
The knowledge base is a repository of knowledge about your data that enables you to understand
your data and maintain its integrity.
• Processes:
• Computer-assisted
• Interactive
• Components:
• Knowledge Discovery
• Domain Management
• Reference Data Services
• Matching Policy
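To give a feel for what computer-assisted knowledge discovery does, here is a minimal Python sketch. It is not the DQS algorithm: the sample values, the `discover_domain` helper, and the `min_count` threshold are all hypothetical. It only shows the idea of scanning sample data, proposing frequent values as domain values, and leaving the rest for the data steward to review interactively.

```python
from collections import Counter

# Hypothetical sample of raw City values taken from source systems.
sample = ["Ottawa", "Kanata", "kanata", "Kanata ", "Otawa", "Ottawa", "Kanata"]

def discover_domain(values, min_count=2):
    """Split observed values into proposed domain values and values to review."""
    # Trim whitespace and normalize casing before counting.
    counts = Counter(v.strip().title() for v in values if v.strip())
    proposed = sorted(v for v, n in counts.items() if n >= min_count)
    review = sorted(v for v, n in counts.items() if n < min_count)
    return proposed, review

proposed, review = discover_domain(sample)
print("Proposed domain values:", proposed)
print("Needs steward review:", review)
```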
Demo
Knowledge Base Management
(Creating a Knowledge Base)
Data Quality Projects
Improve quality of source data by performing data cleansing and data matching activities
using defined knowledge bases
• Cleansing Activity (two-step process)
• Computer-assisted: data is categorized (suggested, new, invalid, corrected, or correct)
• Interactive: the data steward approves, rejects, or modifies the results proposed by the
computer-assisted cleansing process
• Matching Activity
• Using existing knowledge base matching policy
• Prevent and remove data duplication
• Data Profiling and Notifications
• Profiling provides data quality statistics, such as completeness and accuracy
• Notifications point to actions that can be taken to enhance operations
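The computer-assisted cleansing step and a basic profiling figure can be sketched in Python as follows. This is an assumption-laden illustration, not the DQS engine: the `VALID`, `CORRECTIONS`, and `INVALID` sets stand in for a knowledge base domain, `difflib` stands in for the DQS similarity logic, and the 0.8 cutoff is an arbitrary choice.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical "City" domain: valid values, known corrections, invalid values.
VALID = {"Ottawa", "Kanata", "Gatineau"}
CORRECTIONS = {"Otawa": "Ottawa"}  # known error -> correct value
INVALID = {"N/A", "Unknown"}

def cleanse(value):
    """Return (status, output value) for a single source value."""
    if value in VALID:
        return "correct", value
    if value in CORRECTIONS:
        return "corrected", CORRECTIONS[value]
    if value in INVALID:
        return "invalid", value
    # No exact knowledge: look for a close valid value to suggest.
    close = get_close_matches(value, sorted(VALID), n=1, cutoff=0.8)
    if close:
        return "suggested", close[0]
    return "new", value

source = ["Ottawa", "Otawa", "Kanatta", "N/A", "Nepean", ""]
results = [cleanse(v) for v in source if v]

# Profiling: how many values fell into each status, and how complete the column is.
print(Counter(status for status, _ in results))
completeness = sum(1 for v in source if v) / len(source)
print(f"Completeness: {completeness:.0%}")
```

In the interactive step, the data steward would then review the "suggested" and "new" values and approve, reject, or modify them.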
Demo
Data Quality Project
(Cleansing and Matching)
DQS Cleansing Transform in SSIS
• When you want to automate the cleansing and matching process
and not use the DQS Client
• Use SSIS for batch data cleansing
• Matching can be done with Master Data Services (MDS)
• SSIS can be leveraged to bring DQS and MDS together
*DQS does not expose matching functionality for SSIS, but you can use the Fuzzy Grouping transformation to
identify duplicate data
*The Cleansing Transform is single-threaded – use multiple transforms for parallelism
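The idea behind fuzzy duplicate detection can be illustrated with a small Python sketch. This is not the algorithm used by the SSIS Fuzzy Grouping transformation or by DQS matching; it swaps in `difflib.SequenceMatcher` as the similarity measure, and the sample rows, the `group_duplicates` helper, and the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical customer rows: (name, address).
rows = [
    ("Mark Smith", "1234 Stilton Ave"),
    ("Markus Smith", "1234 Stilton Avenue"),
    ("Jane Doe", "99 Bank St"),
]

def similarity(a, b):
    """Average character-level similarity of name and address, from 0.0 to 1.0."""
    scores = [SequenceMatcher(None, x.lower(), y.lower()).ratio()
              for x, y in zip(a, b)]
    return sum(scores) / len(scores)

def group_duplicates(items, threshold=0.8):
    """Greedy grouping: attach each row to the first group whose pivot row is similar enough."""
    groups = []
    for item in items:
        for group in groups:
            if similarity(item, group[0]) >= threshold:
                group.append(item)
                break
        else:
            # No existing group matched, so this row starts a new group.
            groups.append([item])
    return groups

for group in group_duplicates(rows):
    print(group)
```

Each resulting group is a candidate set of duplicates; choosing the surviving record from a group remains a separate decision.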
Demo
Data Cleansing Transform
(Automating the Cleansing and Matching using SSIS)
Installing DQS
• Requires the Business Intelligence, Enterprise, or Developer edition of SQL Server 2012
• During SQL Server setup:
• Instance Features -> Data Quality Services
• Shared Features -> Data Quality Client
• Execute the Data Quality Server Installer:
• C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\DQSInstaller.exe
• Data Quality Service – Data Quality Server Installer
(Apps - Microsoft SQL Server 2012)
DQS Architecture
DQS Server
DQS Catalog (3 databases)
• DQS_MAIN (Knowledge Bases)
• DQS_PROJECTS (Projects)
• DQS_STAGING_DATA (Sandbox, scratch pad area)
Security – Database Roles
• dqs_administrator
• dqs_kb_editor
• dqs_kb_operator
Windows Azure Marketplace
Reference Data Services -> validating, cleansing and enriching your data
Performance considerations - FYI
• Major performance improvements from RTM to CU1 release of SQL Server 2012 (strongly
recommend patching and upgrading) http://bit.ly/11eEhHC
• Must read -> DQS Performance Best Practice Guide http://bit.ly/16Gwenl
• Understand data volumes and hardware requirements… plan wisely!
Enterprise Information Management (EIM)
The EIM stack as a whole is the ‘Master Data Management’ solution from Microsoft and
consists of the following:
• SQL Server Data Quality Services (DQS) - Capture and record knowledge, rules, and actions
• SQL Server Master Data Services (MDS) - Master Data Management repository, Dimension data
• SQL Server Integration Services (SSIS) – Moves data, integration
Enterprise Information Management (EIM)
‘Master Data Management’
Resources
• Data Quality Services Team Blog (MSDN) http://bit.ly/WCI2nO
• SQL Server Data Quality Services (TechNet) http://bit.ly/ZaUO8k
• DQS Performance Best Practices Guide http://bit.ly/16Gwenl
• Enterprise Information Management (EIM): Bringing Together SSIS, DQS, and
MDS (Video – Channel 9) http://bit.ly/NJXvKr
• Matt Masson – Getting Started with DQS and MDS http://bit.ly/149Ga9n
• Paras Doshi's Blog (DQS) http://bit.ly/YoLthh
What Questions Do You Have?
Thank You
For attending this session
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In FranceManifest Pre-Seed Update | A Humanoid OEM Deeptech In France
Manifest Pre-Seed Update | A Humanoid OEM Deeptech In France
chb3
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Electronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploitElectronic_Mail_Attacks-1-35.pdf by xploit
Electronic_Mail_Attacks-1-35.pdf by xploit
niftliyevhuseyn
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 

Data Quality Services in SQL Server 2012

  • 1. Data Quality Services in SQL Server 2012 (An Introduction) Stéphane Fréchette Friday April 26, 2013 Matching Cleansing DQS
  • 2. Who am I? My name is Stéphane Fréchette I’m a Database & Business Intelligence Professional and CEO | Founder of I have a passion for architecting, designing and building solutions that matter. Self-proclaimed Open Data Hacker/Advocate, I founded Gatineau Ouverte, a citizen-led initiative which aims to promote open access to civic data of the city of Gatineau. Twitter: @sfrechette Email: stephanefrechette@ukubu.com Blog: stephanefrechette.com
  • 3. Session Outline • Microsoft Business Intelligence (The Stack) • Dirty Data… • SQL Server Data Quality Services (DQS) • Data Steward • Knowledge Base and Domains • Data Quality Projects • Data Cleansing Transform – SSIS • DQS (Install & Architecture) • Enterprise Information Management (EIM) • Resources
  • 4. Analysis Services Reporting Services Integration Services Master Data Services SharePoint Collaboration Excel Workbooks PowerPivot Applications SharePoint Dashboards & Scorecards Data Quality Services OData Feeds Line of Business Applications Hadoop Big Data Microsoft Business Intelligence
  • 5. Dirty Data… Do you have dirty data? (All projects have it! It's inevitable)
  • 6. Dirty Data… Causes? Bad data entry Poor Data Governance Duplicate entities in different LOB systems
  • 7. Sample Data Representation
    • Prospect in CRM System: Mark Smith | 613.111-1234 | Ottawa | ON | K1P 1K1
    • Prospect buys goods, now entered in POS System: Markus Smith | 1234 Stilton Ave | Kanata | ON | K1P 1K1
    • Record also entered into Accounting System: Markus Smith | 1234 Stilton Avenue | Kanata | ON | K1P 1K1
    ETL process imports these records into the Data Warehouse / Data Mart:
    FirstName | LastName | Phone | Address | City | Province | PostalCode
    Mark | Smith | 613.111-1234 | | Ottawa | ON | K1P 1K1
    Markus | Smith | | 1234 Stilton Ave | Kanata | ON | K1P 1K1
    Markus | Smith | | 1234 Stilton Avenue | Kanata | ON | K1P 1K1
  • 8. Sample Data Representation
    • Duplicate records and inaccurate, incomplete data
    • What we want is a golden record (one version of the truth)
    Source records:
    FirstName | LastName | Phone | Address | City | Province | PostalCode
    Mark | Smith | 613.111-1234 | | Ottawa | ON | K1P 1K1
    Markus | Smith | | 1234 Stilton Ave | Kanata | ON | K1P 1K1
    Markus | Smith | | 1234 Stilton Avenue | Kanata | ON | K1P 1K1
    Golden record:
    FirstName | LastName | Phone | Address | City | Province | PostalCode
    Markus | Smith | 613-111-1234 | 1234 Stilton Ave | Kanata | ON | K1P 1K1
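
The golden record on slide 8 can be illustrated with a small Python sketch. This is not how DQS works internally. It assumes a hypothetical survivorship rule (keep the most frequent non-empty value per field, and on a tie keep the value seen first) and a hypothetical `normalize_phone` helper. The three input rows are the sample records from the slide.

```python
import re
from collections import Counter

FIELDS = ["FirstName", "LastName", "Phone", "Address", "City", "Province", "PostalCode"]

# The three source rows from the slide; empty strings mark missing values.
RECORDS = [
    {"FirstName": "Mark", "LastName": "Smith", "Phone": "613.111-1234", "Address": "",
     "City": "Ottawa", "Province": "ON", "PostalCode": "K1P 1K1"},
    {"FirstName": "Markus", "LastName": "Smith", "Phone": "", "Address": "1234 Stilton Ave",
     "City": "Kanata", "Province": "ON", "PostalCode": "K1P 1K1"},
    {"FirstName": "Markus", "LastName": "Smith", "Phone": "", "Address": "1234 Stilton Avenue",
     "City": "Kanata", "Province": "ON", "PostalCode": "K1P 1K1"},
]


def normalize_phone(raw):
    """Hypothetical standardization: render 10-digit numbers as NNN-NNN-NNNN."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return raw  # leave anything unexpected untouched
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def golden_record(records):
    """Assumed rule: most frequent non-empty value per field, first seen wins ties."""
    golden = {}
    for field in FIELDS:
        values = [r[field] for r in records if r.get(field)]
        if field == "Phone":
            values = [normalize_phone(v) for v in values]
        counts = Counter(values)
        # max() returns the first maximal key in insertion order, so ties go to the earliest value.
        golden[field] = max(counts, key=counts.get) if counts else ""
    return golden


print(golden_record(RECORDS))
```

Under these assumed rules the result matches the golden record shown on the slide. A real project would take its survivorship rules from the data steward rather than from a frequency count.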
  • 9. SQL Server Data Quality Services (DQS) • New in SQL Server 2012 • Enables cleansing, matching, standardizing and enriching data • Delivers trusted information for business intelligence, data warehouse, and transaction processing workloads • Knowledge-Driven Solution (create/edit) • A knowledge management process that builds the knowledge base • A data quality project that proposes changes to source data based on the knowledge in the knowledge base (cleansing and matching) • A key component of an Enterprise Information Management (EIM) solution
  • 10. Answering the Need with DQS • DQS enables you to resolve issues involving incompleteness, lack of conformity, inconsistency, inaccuracy, invalidity, and data duplication • Provides the following features to resolve data quality issues: • Data Cleansing • Matching • Reference Data Services • Profiling • Monitoring • Knowledge Base
  • 11. Data Steward • Key role: usually a business user, not someone from the Information Technology side • In a nutshell: responsible for maintaining data elements in a metadata registry… • Data Steward -> DQS Client • Create and edit Knowledge Bases • Run and process data, continually and iteratively improving the Knowledge Bases • Knowledge Bases can be consumed and used by other Data Stewards and IT (SSIS / ETL Developers) DQS Data Steward MDS Data Steward SSIS Developer Matching Cleansing
  • 12. Knowledge Bases and Domains The knowledge base is a repository of knowledge about your data that enables you to understand your data and maintain its integrity. • Processes: • Computer-assisted • Interactive • Components: • Knowledge Discovery • Domain Management • Reference Data Services • Matching Policy
  • 14. Data Quality Projects Improve the quality of source data by performing data cleansing and data matching activities using defined knowledge bases • Cleansing Activity (two-step process) • Computer-assisted: data is categorized (suggested, new, invalid, corrected, and correct) • Interactive: the data steward approves, rejects, or modifies the proposed results from the computer-assisted cleansing process • Matching Activity • Uses an existing knowledge base matching policy • Prevents and removes data duplication • Data Profiling and Notifications • Profiling provides data quality stats and info: completeness and accuracy • Notifications on actions that can be taken to enhance operations
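
To make the five computer-assisted categories on slide 14 concrete, here is a toy Python sketch. It is not the DQS algorithm. The knowledge base contents, the `cleanse` function, and the 0.8 similarity cutoff are all assumptions made for illustration, and string similarity comes from the standard-library `difflib` module.

```python
import difflib

# Hypothetical knowledge base for a single "City" domain.
KNOWN_VALUES = {"Ottawa", "Kanata", "Gatineau"}   # values known to be correct
CORRECTIONS = {"Otawa": "Ottawa"}                 # known error -> correct value
INVALID_VALUES = {"N/A", "Unknown"}               # values known to be invalid


def cleanse(value, suggest_cutoff=0.8):
    """Return (category, proposed_value) for one source value."""
    if value in KNOWN_VALUES:
        return ("correct", value)
    if value in CORRECTIONS:
        return ("corrected", CORRECTIONS[value])
    if value in INVALID_VALUES:
        return ("invalid", value)
    # No exact knowledge: look for a sufficiently similar known value.
    close = difflib.get_close_matches(value, sorted(KNOWN_VALUES), n=1, cutoff=suggest_cutoff)
    if close:
        return ("suggested", close[0])
    return ("new", value)


for source_value in ["Kanata", "Otawa", "N/A", "Gatinau", "Toronto"]:
    print(source_value, "->", cleanse(source_value))
```

In the interactive step, a data steward would then approve, reject, or modify each proposal, and the approved ones could feed back into the knowledge base.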
  • 16. DQS Cleansing Transform in SSIS • When you want to automate the cleansing and matching process and not use the DQS Client • Use SSIS for batch data cleansing • Matching can be done with Master Data Services (MDS) • SSIS can be leveraged to bring DQS and MDS together *DQS does not expose matching functionality for SSIS, but you can use the Fuzzy Grouping Transform to identify duplicate data *The Cleansing Transform is single-threaded – use multiple transforms for parallelism
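
The idea behind fuzzy duplicate detection, which slide 16 delegates to the Fuzzy Grouping Transform, can be sketched in Python. This does not reproduce the SSIS transform's algorithm. It assumes a hypothetical match key (first name, last name, postal code), a greedy grouping against the first record of each group, and an arbitrary 0.85 threshold. The "Jane Doe" row is invented to show a non-match.

```python
from difflib import SequenceMatcher

RECORDS = [
    {"FirstName": "Mark", "LastName": "Smith", "PostalCode": "K1P 1K1"},
    {"FirstName": "Markus", "LastName": "Smith", "PostalCode": "K1P 1K1"},
    {"FirstName": "Markus", "LastName": "Smith", "PostalCode": "K1P 1K1"},
    {"FirstName": "Jane", "LastName": "Doe", "PostalCode": "K2K 2K2"},  # invented row
]


def match_key(record):
    """Hypothetical match key: lowercased name plus postal code."""
    return " ".join(record[f] for f in ("FirstName", "LastName", "PostalCode")).lower()


def group_duplicates(records, threshold=0.85):
    """Greedily group record indexes whose keys resemble a group's first record."""
    groups = []
    for index, record in enumerate(records):
        key = match_key(record)
        for group in groups:
            representative = match_key(records[group[0]])
            if SequenceMatcher(None, representative, key).ratio() >= threshold:
                group.append(index)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([index])
    return groups


print(group_duplicates(RECORDS))  # [[0, 1, 2], [3]]
```

Each resulting group is a candidate set of duplicates that could then be collapsed into a single surviving record.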
  • 17. Demo Data Cleansing Transform (Automating the Cleansing and Matching using SSIS)
  • 18. Installing DQS • Requires the Business Intelligence or Enterprise/Developer edition of SQL Server 2012 • During SQL Server setup: • Instance Features -> Data Quality Services • Shared Features -> Data Quality Client • Execute the Data Quality Server Installer: • C:\Program Files\Microsoft SQL Server\MSSQL11.MSSQLSERVER\MSSQL\Binn\DQSInstaller.exe • Data Quality Services – Data Quality Server Installer (Apps - Microsoft SQL Server 2012)
  • 19. DQS Architecture DQS Server DQS Catalog (3 databases) • DQS_MAIN (Knowledge Bases) • DQS_PROJECTS (Projects) • DQS_STAGING_DATA (Sandbox, scratch pad area) Security – Database Roles • dqs_administrator • dqs_kb_editor • dqs_kb_operator
  • 20. Windows Azure Marketplace Reference Data Services -> validating, cleansing and enriching your data
  • 21. Performance considerations - FYI • Major performance improvements from RTM to CU1 release of SQL Server 2012 (strongly recommend patching and upgrading) http://bit.ly/11eEhHC • Must read -> DQS Performance Best Practices Guide http://bit.ly/16Gwenl • Understand data volumes and hardware requirements… plan wisely!
  • 22. Enterprise Information Management (EIM) The EIM stack as a whole is the ‘Master Data Management’ solution from Microsoft and consists of the following: • SQL Server Data Quality Services (DQS) - Capture and record knowledge, rules, and actions • SQL Server Master Data Services (MDS) - Master Data Management repository, Dimension data • SQL Server Integration Services (SSIS) – Moves data, integration Enterprise Information Management (EIM) ‘Master Data Management’
  • 23. Resources • Data Quality Services Team Blog (MSDN) http://bit.ly/WCI2nO • SQL Server Data Quality Services (TechNet) http://bit.ly/ZaUO8k • DQS Performance Best Practices Guide http://bit.ly/16Gwenl • Enterprise Information Management (EIM) Bringing Together SSIS, DQS, and MDS (Video – Channel 9) http://bit.ly/NJXvKr • Matt Masson – Getting Started with DQS and MDS http://bit.ly/149Ga9n • Paras Doshi’s Blog (DQS) http://bit.ly/YoLthh
  • 24. What Questions Do You Have?
  • 25. Thank You For attending this session