Data Profiling Screen

Trillium Software

Solution Guide

Data Profiling Basics

What is Data Profiling?
Data profiling is a process for analyzing large data sets. Standard data profiling automatically compiles statistics and other summary information about the data records. It includes analysis by field for minimum and maximum values and other basic statistics, frequency counts for fields, data type and patterns/formats, and conformity to expected values. Advanced profiling techniques also analyze the relationships between fields, such as dependencies between fields in a single data set and between fields in separate data sets.

Key Considerations for Selecting a Data Profiling Tool
1. Who is profiling: business users, IT, or both
2. Common environment to communicate, review, and interpret results
3. Complexity of analysis, number of sources
4. Security of data
5. Ongoing support and monitoring
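As a rough illustration of the field-level statistics that standard profiling compiles (missing values, minimum and maximum, frequency counts, and patterns/formats), the following Python sketch profiles a single field. The function names, the pattern notation (9 for a digit, A for a letter), and the sample postal codes are assumptions made for this example; they are not taken from any particular profiling tool.

```python
import re
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to a format pattern: digits -> 9, letters -> A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile_field(values):
    """Compile basic statistics for one field (column) of raw string values.

    Empty strings and None are counted as missing. Minimum and maximum
    are compared as strings, not as numbers.
    """
    present = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "top_values": Counter(present).most_common(3),
        "patterns": Counter(value_pattern(v) for v in present).most_common(3),
    }


# Hypothetical sample data: a postal-code field with mixed formats.
postal_codes = ["01821", "01821", "02139", "", "2139", "N/A", None]
profile = profile_field(postal_codes)
```

In this sample, the pattern counts show at a glance that most values follow the five-digit format while a few do not, which is the kind of conformity check the definition above describes.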

Why Do People Profile?

People may want to profile for several reasons, including:
- Assessing risks: Can data support the new initiative?
- Planning projects: What are realistic timelines, and what data, systems, and resources will the project involve?
- Scoping projects: Which data and systems will be included based on priority, quality, and level of effort required?
- Assessing data quality: How accurate, consistent, and complete is the data within a single system?
- Designing new systems: What should the target structures look like? What mappings or transformations need to occur?
- Checking/monitoring data: Does the data continue to meet business requirements after systems have gone live and changes and additions occur?

Who Should Be Profiling the Data?

Data profiling is primarily considered part of IT projects, but the most successful efforts involve a blend of IT resources and business users of the data. IT, business users, and data stewards each contribute valuable insights critical to the process:
- IT system owners, developers, and project managers analyze and understand issues of data structure: How complete is the data? How consistent are the formats? Are key fields unique? Is referential integrity enforced?
- Business users and subject matter experts understand the data content: what the data means, how it is applied in existing business processes, what data is required for new processes, and what data is inaccurate or out of context.
- Data stewards understand corporate standards and enterprise data requirements as a whole. They can contribute to the requirements for both specific projects and the corporation.

How Do People Profile Data?

The techniques for profiling are either manual or automated via a profiling tool:
- Manual techniques involve people sifting through the data to assess its condition, query by query. Manual profiling is appropriate for small data sets from a single source, with fewer than 50 fields, where the data is relatively simple.
- Automated techniques use software tools to collect summary statistics and analyses. These tools are the most appropriate for projects with hundreds of thousands of records, many fields, multiple sources, and questionable documentation and metadata. Sophisticated data profiling technology was built to handle complex problems, especially for high-profile and mission-critical projects.

How Do Data Profiling Tools Differ?

Data profiling tools vary both in the architecture they use to analyze data and in the working environment they provide for the data profiling team.

Architecture option: Query-based profiling
Some profiling technologies involve crafting SQL queries that are run against source systems or against a snapshot copy of the source data. While this generates some good information about the data, it has several limitations:
- Performance risks: Queries strain live systems, slowing down operations, sometimes significantly. When additional information is required, or if users want to see the actual data, a second query executes, creating even more strain on the system. Organizations reduce this risk by making a copy of the data, but this requires replicating the entire environment, both hardware and software systems, which can be costly and time-consuming.
- Traceability risks: Data in production systems changes constantly. The statistics and metadata captured from query-based profiling risk being out of date immediately.
- Completeness risks: It is difficult to gain comprehensive insights using query-based analysis. Queries are based on assumptions, and their purpose is to confirm and quantify expectations about what is wrong and right in the data. Given this, it is easy to overlook problems that you are not already aware of.
Profiling by query is valuable when you want to monitor production data for certain conditions. But it is not the best way to analyze large volumes of data in preparation for large-scale data integrations and migrations.
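To make profiling by query concrete, here is a minimal Python sketch that runs two SQL queries against an in-memory SQLite database standing in for a snapshot copy of a source system. The table name, columns, and rows are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database standing in for a snapshot copy of a source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "US"), (2, "US"), (3, "UK"), (3, None), (5, "us")],
)

# One profiling query: row count, missing values, and distinct key values.
# It only answers the questions the analyst thought to ask.
total, missing_country, distinct_ids = conn.execute(
    """
    SELECT COUNT(*),
           SUM(CASE WHEN country IS NULL THEN 1 ELSE 0 END),
           COUNT(DISTINCT id)
    FROM customers
    """
).fetchone()

# A second query is needed to see the actual values behind the counts,
# which means a second round trip to the source.
country_counts = conn.execute(
    "SELECT country, COUNT(*) FROM customers GROUP BY country ORDER BY country"
).fetchall()
conn.close()
```

The first query confirms what the analyst already suspected (a missing country, a duplicated id). The inconsistent casing of "us" only surfaces because a second query happened to list the values, which illustrates how query-by-query analysis can overlook problems nobody thought to look for.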


Architecture option: Data profiling repository
Other profiling technologies profile data as part of a scheduled process and store results in a profiling repository. Stored results can include content such as summary statistics, metadata, patterns, keys, relationships, and data values. Results can then be further analyzed by users or stored for later trending analysis. Profiling repositories that allow users to drill down on information and see original data values in the context of source records provide the most versatility and stability for non-technical audiences. Independence from operational source systems, coupled with the vast amount of metadata and information derived from a point-in-time profile, gives a cross-functional team of business and IT resources a common, comprehensive view of source system data on which traceable decisions can be based.

Volume considerations: Should tables or files enter into the range of hundreds of millions of records, a profiling repository strategy should be considered. With volumes this large, the best strategy may be a blend of weekend-scheduled profiling processes and focused, non-contentious query-based profiling, closely monitored by IT.

Work environment: Multi-user workspace
Some profiling tools are designed as desktop solutions for resources to use as a team of one. How many resources will be involved in your data profiling efforts? For large projects, there is generally a cross-functional team involved. Consider the environment a profiling tool provides, since multiple users with different expertise and varying levels of technical skill all need to be able to access and clearly see the condition of the data. Even if some prospective data profilers are skilled in SQL and database technologies, profiling tools that foster collaboration between business users and IT offer greater value overall. With a common window on the data sets, people with diverse backgrounds can concretely and productively discuss the data, its current state, and what is required to move forward.

Work environment: Graphical interface
Because users may not be familiar with database structures and technologies, it is important to find a tool that provides an intuitive, easy-to-learn graphical user interface (GUI). Appropriate security features should also be part of the work environment, so that access to restricted fields or records containing sensitive information can be allowed or denied.
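The repository architecture described in this section can be pictured as a store of point-in-time profile results kept apart from the source system. The sketch below is a toy in-memory version under that assumption; the class names, the source name "crm.customers", and the numbers are all hypothetical, and a real repository would hold far more (patterns, keys, relationships, data values).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProfileSnapshot:
    """Results of one scheduled profiling run, frozen at a point in time."""
    run_date: date
    source: str
    row_count: int
    missing_by_field: dict  # field name -> number of missing values


@dataclass
class ProfilingRepository:
    """Keeps snapshots apart from the source system so users can review
    and trend them without querying production again."""
    snapshots: list = field(default_factory=list)

    def store(self, snapshot: ProfileSnapshot) -> None:
        self.snapshots.append(snapshot)

    def missing_rate_trend(self, source: str, field_name: str):
        """Return (run_date, missing rate) pairs for one field, oldest first."""
        runs = sorted(
            (s for s in self.snapshots if s.source == source),
            key=lambda s: s.run_date,
        )
        return [
            (s.run_date, s.missing_by_field.get(field_name, 0) / s.row_count)
            for s in runs
            if s.row_count
        ]


repo = ProfilingRepository()
repo.store(ProfileSnapshot(date(2024, 1, 6), "crm.customers", 1000, {"email": 50}))
repo.store(ProfileSnapshot(date(2024, 1, 13), "crm.customers", 1200, {"email": 120}))
trend = repo.missing_rate_trend("crm.customers", "email")
```

Because each snapshot carries its run date, any statistic a team member quotes can be traced back to a specific profiling run, and the trend query never touches the operational system.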

What Follows Data Profiling?

Once the task of data profiling is complete, there is more to do. Keep in mind both the short- and long-term goals driving the need to profile your data. Leverage your investments by understanding what follows, and see if there are logical extensions to your profiling efforts that can be executed within the same tool.
- ETL projects for data integration or migration use profiling results to design target systems, define how to accurately integrate multiple data sets, and efficiently move data to a new system, taking all data conditions into consideration.
- Data quality processes that improve the accuracy, consistency, and completeness of data use results to identify problems or anomalies and then develop rules for automated cleansing and standardization.
- Data monitoring initiatives use profiling results to establish automated processes for ongoing assessment of key data elements and acute data conditions in production systems. The profiling repository captures results, sends alerts, and centrally manages data standards.
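One way to picture how monitoring builds on profiling results is a set of rules, each pairing a data standard with a tolerated failure rate, that raises an alert when the threshold is exceeded. The following Python sketch assumes that model; the rule names, thresholds, and sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MonitoringRule:
    """A data standard plus the failure rate that should trigger an alert."""
    name: str
    field_name: str
    is_valid: Callable[[object], bool]
    max_failure_rate: float


def check_rules(records, rules):
    """Return an alert message for every rule whose threshold is exceeded."""
    alerts = []
    for rule in rules:
        failures = sum(
            1 for rec in records if not rule.is_valid(rec.get(rule.field_name))
        )
        rate = failures / len(records) if records else 0.0
        if rate > rule.max_failure_rate:
            alerts.append(
                f"{rule.name}: {failures}/{len(records)} records failed "
                f"({rate:.0%} > {rule.max_failure_rate:.0%})"
            )
    return alerts


# Hypothetical rules derived from earlier profiling results.
rules = [
    MonitoringRule("email present", "email", lambda v: bool(v), 0.10),
    MonitoringRule("known status", "status", lambda v: v in {"A", "I"}, 0.25),
]
records = [
    {"email": "a@example.com", "status": "A"},
    {"email": "", "status": "I"},
    {"email": None, "status": "X"},
    {"email": "d@example.com", "status": "A"},
]
alerts = check_rules(records, rules)
```

Here only the email rule fires: half the records lack an email against a 10% tolerance, while the single unknown status sits exactly at its 25% threshold and does not exceed it.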

TS Discovery for Data Profiling
TS Discovery is unlike many other data profiling tools on the market. It performs as a best-of-breed data profiling tool and also has several key differentiators that advance its overall value:
- User interface: The interface is designed specifically for a business user. It is intuitive, easy to use, and allows for immediate drill-down for further analysis, without hitting production systems.
- Collaborative environment: Team members can log into a common repository, view the same data, and contribute to prioritizing and determining appropriate actions to take for addressing anomalies, improvements, integration rules, and monitoring thresholds.
- Profiling repository: The repository stores metadata created by reverse-engineering the data. This metadata can be summarized, synthesized, drilled down into, and used to recreate original source record replicas. Business rules and data standards can be developed within the repository to run against the metadata or deployed to run systematically against production systems, complete with alert notifications.
- Robust profiling functions: In addition to basic profiling, TS Discovery provides advanced profiling options such as pattern analysis, soundex routines, metaphones, natural key analysis, join analysis, dependency analysis, comparisons against defined data standards, and regulation against established business rules.
- Improve data immediately: Data can be cleansed and standardized directly using TS Discovery. Name and address cleansing, address validation, and recoding processes can be run using TS Discovery. Cleansed data is placed in new fields, never overwriting source data. The cleansed files can be used immediately in other systems and business processes.
- Helpful modeling functions: Data architects and modelers rely on results from key integrity, natural key, join, and dependency analyses. Physical data models can be produced by reverse-engineering the data, to validate models and identify problem areas. Venn diagrams can be used to identify outlier records and orphans.
- Monitoring capabilities: Monitor data sources for specific events and/or errors. Notify users when data values do not meet predefined requirements, such as unacceptable data values or incomplete fields.

These powerful features give users the environment necessary to understand the true nature of their current data landscape and how data relates across systems.
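Two of the advanced analyses named above, dependency analysis and the search for orphan records in join analysis, can be illustrated generically in a few lines of Python. This is a sketch of the general techniques only, not a description of how TS Discovery implements them; the function names and sample data are hypothetical.

```python
from collections import defaultdict


def dependency_violations(records, determinant, dependent):
    """Find determinant values that map to more than one dependent value.

    If `determinant -> dependent` held perfectly (for example, postal code
    determines city), the result would be empty.
    """
    seen = defaultdict(set)
    for rec in records:
        seen[rec[determinant]].add(rec[dependent])
    return {key: values for key, values in seen.items() if len(values) > 1}


def orphan_keys(child_records, child_key, parent_records, parent_key):
    """Return child key values with no matching parent: the part of the
    child set that falls outside the overlap in a Venn diagram."""
    parent_values = {rec[parent_key] for rec in parent_records}
    return {rec[child_key] for rec in child_records} - parent_values


# Hypothetical sample data.
customers = [
    {"id": 1, "zip": "01821", "city": "Billerica"},
    {"id": 2, "zip": "01821", "city": "Bilerica"},  # misspelling breaks zip -> city
    {"id": 3, "zip": "02139", "city": "Cambridge"},
]
orders = [{"order": 10, "customer_id": 1}, {"order": 11, "customer_id": 9}]

violations = dependency_violations(customers, "zip", "city")
orphans = orphan_keys(orders, "customer_id", customers, "id")
```

The dependency check surfaces the postal code that maps to two spellings of the same city, and the set difference surfaces the order whose customer does not exist in the customer data.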

TS Discovery: Investing in the Future

A data profiling solution cannot exist in a vacuum because it is also a part of a larger process. While data profiling is a way to understand the condition of data, TS Discovery provides bridges to larger initiatives for data integrations, data quality, and data monitoring. Trillium Software is committed to expanding those bridges. It continues to innovate and provide new ways to track, control, and access data across the enterprise. It also continues to integrate TS Quality functionality into TS Discovery to establish and promote high-quality data in complex and dynamic business environments.

Harte-Hanks Trillium Software

www.trilliumsoftware.com
Corporate Headquarters: +1 (978) 436-8900, [email protected]
European Headquarters: +44 (0)118 940 7600, [email protected]
