Ascertain Accuracy of Data (1)
Data is a set of values of qualitative or quantitative variables that do not carry complete
meaning on their own and need further analysis.
Data and information are often used interchangeably; however, the extent to which a set of data
is informative to someone depends on the extent to which it is unexpected by that person. The
amount of information content in a data stream may be characterized by its usage and
application.
• The result of an accuracy assessment for a data item in a test data set was 84%
A DQ Dimension is different from, and should not be confused with, other dimension terminologies
such as those used in:
• other aspects of data management, e.g. a data warehouse dimension or a data cube dimension
• physics, where a dimension refers to the structure of space or how material objects are located
in time
• Characteristic, attribute or facet
CONTEXT
The best practice laid out in this document is designed to assist data quality practitioners when
looking to assess and describe the quality of the data in their organisations. This document
defines the six best practice definitions as generic data quality dimensions. This will help to
reduce uncertainty and confusion that may arise when considering data quality. It is suggested
that these dimensions and definitions should be adopted by data quality practitioners as the
standard method for assessing and describing the quality of data. However, in some situations
one or more dimensions may not be relevant. The intention is for organisations to use these
dimensions to measure the impact of poor data quality in terms of cost, reputation,
regulatory compliance, etc.
Before attempting to use data quality dimensions, an organisation needs to agree the quality rules
against which the data needs to be assessed. These rules should be developed based upon
the six data quality dimensions, organisational requirements for data and the impact on an
organisation of data not complying with these rules. Examples of such impact include:
• incorrect or missing email addresses would have a significant impact on any marketing
campaigns
• inaccurate personal details may lead to missed sales opportunities or a rise in customer
complaints
• incorrect product measurements can lead to significant transportation issues, e.g. the product
will not fit into a lorry, or too many lorries may have been ordered for the size of the
actual load
Data generally only has value when it supports a business process or organisational decision
making. The agreed data quality rules should take account of the value that data can provide to
an organisation. If it is identified that data has a very high value in a certain context, then this
may indicate that more rigorous data quality rules are required in this context.
Organisations select the data quality dimensions and associated dimension thresholds based on
their business context, requirements, levels of risk etc. Note that each dimension is likely to have
a different weighting and in order to obtain an accurate measure of the quality of data, the
organisation will need to determine how much each dimension contributes to the data quality as
a whole.
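As a minimal sketch of how dimension weightings might be combined into a single measure, the example below computes a weighted average of per-dimension scores. The function name, the 0.0 to 1.0 score scale, the choice of a weighted average and all figures are assumptions made for illustration; they are not prescribed by this document.

```python
def weighted_quality_score(scores, weights):
    """Combine per-dimension scores (0.0 to 1.0) into one weighted average.

    Both arguments are dicts keyed by dimension name. The weights do not
    need to sum to 1; they are normalised by their total.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[d] * weights[d] for d in scores) / total_weight


# Hypothetical scores and weights, for illustration only.
scores = {"completeness": 0.9, "accuracy": 0.8, "timeliness": 0.5}
weights = {"completeness": 2, "accuracy": 2, "timeliness": 1}

overall = weighted_quality_score(scores, weights)
print(f"Overall data quality score: {overall:.2f}")
```

A weighted average is only one possible way to aggregate; an organisation may instead prefer to report each dimension separately against its own threshold.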
1. Identify which data items need to be assessed for data quality; typically these will be data
items deemed critical to business operations and associated management reporting
2. Assess which data quality dimensions to use and their associated weighting
3. For each data quality dimension, define values or ranges representing good and bad quality
data. Please note that, as a data set may support multiple requirements, a number of different
data quality assessments may need to be performed
6. Where appropriate, take corrective actions, e.g. clean the data and improve data handling
processes to prevent future recurrences
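The step of defining values or ranges for good and bad quality data could be sketched as follows. The threshold values, the dimension scores and the intermediate "review" band are all hypothetical; an organisation would set its own ranges from its agreed data quality rules.

```python
def classify(score, good_at_least, bad_below):
    """Label a dimension score as 'good', 'bad' or 'review'.

    Scores at or above `good_at_least` are good, scores under `bad_below`
    are bad, and anything in between is flagged for review (an assumed
    middle band, not something this document defines).
    """
    if bad_below > good_at_least:
        raise ValueError("bad_below must not exceed good_at_least")
    if score >= good_at_least:
        return "good"
    if score < bad_below:
        return "bad"
    return "review"


# Hypothetical thresholds per dimension: (good_at_least, bad_below).
thresholds = {"completeness": (0.95, 0.80), "uniqueness": (0.99, 0.95)}
# Hypothetical measured scores for one data item.
measured = {"completeness": 0.97, "uniqueness": 0.90}

results = {d: classify(measured[d], *thresholds[d]) for d in measured}
for dimension, label in results.items():
    print(f"{dimension}: {label}")
```

Because a data set may support several requirements, the same scores could be classified against more than one set of thresholds.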
The outputs of different data quality checks may be required in order to determine how well the
data supports a particular business need. Data quality checks will not provide an effective
assessment of fitness for purpose if a particular business need is not adequately reflected in data
quality rules. Similarly, when undertaking repeat data quality assessments, you should check to
determine whether business data requirements have changed since the last assessment.
Whilst most data quality dimensions can be assessed by analysing the data itself, assessing
accuracy of data can only be achieved by either:
• Assessing the data against the actual thing it represents, for example, when an employee visits a
property; or
• Assessing the data against an authoritative reference data set, for example, checking customer
details against the official list of voters
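The second option, checking against an authoritative reference data set, might look something like the sketch below. The record layout, the field names `id` and `postcode`, the sample values and the decision to exclude records that have no reference entry are assumptions for illustration only.

```python
def accuracy_against_reference(records, reference, key, field):
    """Return the share of records whose `field` matches the reference.

    Records whose key is absent from the reference cannot be verified and
    are left out of the calculation (an assumption of this sketch).
    Returns None when no record can be checked.
    """
    reference_values = {row[key]: row[field] for row in reference}
    checked = 0
    matched = 0
    for row in records:
        if row[key] not in reference_values:
            continue  # no authoritative value to compare against
        checked += 1
        if row[field] == reference_values[row[key]]:
            matched += 1
    if checked == 0:
        return None
    return matched / checked


# Hypothetical customer records and reference list.
customers = [
    {"id": 1, "postcode": "AB1 2CD"},
    {"id": 2, "postcode": "EF3 4GH"},
    {"id": 3, "postcode": "ZZ9 9ZZ"},  # differs from the reference
    {"id": 4, "postcode": "XX1 1XX"},  # not present in the reference
]
reference = [
    {"id": 1, "postcode": "AB1 2CD"},
    {"id": 2, "postcode": "EF3 4GH"},
    {"id": 3, "postcode": "IJ5 6KL"},
]

accuracy = accuracy_against_reference(customers, reference, "id", "postcode")
print(f"Accuracy: {accuracy:.0%}")
```

Exact string comparison is the simplest possible match rule; real reference checks often need normalisation (case, spacing) before comparing.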
1. Completeness
2. Uniqueness
3. Timeliness
4. Validity
5. Accuracy
6. Consistency