
The Process of Data Quality

Why You Don’t Get What You Thought You Asked For
By Michael P. Meier

©2013, Michael P. Meier


Copying or dissemination of the content of this publication, in whole or in part, is expressly forbidden without the written permission of the copyright holder. Such permission will be granted for non-profit uses in which the source is clearly identified.
Contents
Foreword
Prologue
Your Mission
Two Approaches
Selecting the Best Approach
Recognizing Root Causes
The History of Data Management
An Approach
Genesis: Planting our Feet
The Role of Perspective
What is Data Management?
But What’s Involved?
Laying the Groundwork
The Start-up Business (or Household)
The Small Business
The Larger Corporate (Enterprise) Scale
The Big Picture
Process Management
Governance
Information Infrastructure
Metadata Management
Data Quality
Two Sides of a Coin
About Quality
What is Quality?
Quality vs. Error
Data and Quality
A Goal
The Core of the Problem
The Lesson(s)
You Are Here: Dimensions of Data Quality
Contextual Consistency (the Collection)
Fitness for Purpose
Definition and Purpose
People as a Dimension
Access and Interoperability
Comparison
Complexity and Utility
A Word About Methodologies
In the beginning…
The Leader
The Manager
The Governor
The Doer
The Opportunity
Data Quality in Context
Data Quality—More Than Meets the Eye
Foundation—Machines and Logic
Data Quality in the Logical World
Motivation 1: Business Need
Motivation 2: Innovation
Motivation 3: Creating Something
The Human Factor
Where Does Quality Come From?
Detection and Remediation
Strategic (Root Cause) Re-engineering
Initial (First) Definition of a Data Item
Use of Varchar as Default Type
Record vs Set Operations
Programmers Like Process
Record vs Set
But what about process?
Complexity (Taking a Step Back)
Bridge
Current and On-going Status of Data Management
Setting Course
14 Points (W. Edwards Deming)
1. Constancy of Purpose
2. Adopt the New Philosophy
3. Cease dependence on mass inspection
4. End the practice of awarding business on price tag alone
5. Constantly and forever improve the systems of production and services
6. Institute modern methods of training on the job
7. Institute modern methods of supervision and leadership
8. Drive out fear
9. Break down barriers between departments
10. Eliminate numerical goals for the work force
11. Eliminate work standards and numerical quotas
12. Remove barriers to pride of workmanship
13. Institute a vigorous program of education and training for everyone
14. Create a structure in top management that will push every day on these 13 points.
Abstraction
Understanding Data and an Introduction to Relationship
Relationship Concepts
Motivation
Some Examples
Modeling
The Relationship Pattern
Implications for Data Quality
Two Approaches to a Relationship
Obstacles to Relationship
The Spectrum
Type or Instance?
The Anatomy of a Relationship
Two Names are Better than One
Advanced Relationships
Architecting the Advanced Relationship
Unlocking the Mystery
A Data Example
The Tale of the New Blood Bank System
The Moral
More About the Architecture Process
Principles for Life in a Data World
Words are not meaning. Words are not truth.
Complexity is now and simplicity is potential.
Data has more in common with words than with meaning
Meaning (and truth) is personal and subjective.
Relationships are how we find meaning
Current Hot Technology
Commitment to Action
Truths
Your Mission
Two Approaches
Selecting the Best Approach
Recognizing Root Causes
The Process
Epilogue
Appendices
Appendix A: Analysis of Error (1995)
Appendix B: Blogs, Essays and Articles
Introduction
More on Perspective
Tools and Products
Questions and Answers
Information or Systems?
Governance and Data Governance
Oxymorons in Abundance
Oxymorons in Abundance (continued)
Oxymorons in Abundance (part 3)
Measuring Governance
Pragmatism and Sacred Cows
Guerrillas and Governance
Practical, Pragmatic, Productive
Assistance vs. Solution
A Hammer in Search of a Nail
Fear, Accountability and Approval
Begin at the Beginning (but with the end in mind)
The Beginning (2)
The Beginning (3)
The Beginning (4)
The Beginning (5)
Why Management Isn't Enough
Programmers Need Leadership
Advice Re: Healthcare IT
Healthcare: More on I.T. Capabilities
Standards Dread
Standards Clarification
Arm Waving and Obfuscation
The Health Care (or Healthcare) Vision
The Aging Workforce and Your BI
Best Practice and Best
People First—Technology Somewhere in There
Keeping Your Own Counsel
Everything From The Center
The Problem With Telepathy
Governance in Context
Re-Branding
BI and Re-Branding
The Status of [Data] Governance
Changing
Note to Sec. Sebelius
Coaching
Guerrilla Governance
Governance and Control
Feng Shui and Data Governance
Winning the World Series
We The People
Can and Should
Haves and Have Nots
Healthcare and Health Care
Managing Technology and People
Enforcement and Accountability
The Control Myth
Complexity and [Over]Simplification
Answers and Questions
The Problem With Quality
Programmer as Data Quality Champion
Disturbances in the Force
What Is "Data" Anyway?
Business, Information and Technology
Survival, Error & Technology
Principles of Data Governance
Christmas Wish List
The Theory of Everything
Data Quality: Getting Started
What Do You Do When Things Aren't Working The Way You'd Hoped?
Principle Before Practice
Data Governance Is...
Leadership, Management, Governance
Whose Job Is It, Anyway?
Bibliography

Foreword
This book is about relationships. There, I said it. I’m going to make the case that business
management principles such as alignment, governance, leadership, and yes, quality, are based
almost exclusively on relationships, both in the modeling sense and in human relations. Then I’ll
show you what goes into a high-functioning relationship. Just as neglected or unrecognized or
misunderstood relationships can cause problems in our personal lives, they can cause disastrous
problems in our business lives. I’m going to come at this through the back door of information
(data) quality.
Like all data people[1], I would like to think that the whole world is vitally concerned with the
quality of the information delivered to them. I’ll begin here by acknowledging this to be a
fantasy. That doesn’t mean that the world shouldn’t be vitally concerned—only that in many
cases the much more pressing concerns may be avoiding starvation or imprisonment or violent
death or losing a promotion or a bonus.
Still, one would think that in the U.S. and Europe at least, there might be some general and
consistent effort made toward ensuring the reliability and credibility of information. To those of
us who are data people, it can look as though there is indeed concerted effort. Alas though,
delusion—no matter how persistent—is still delusion.

[1] Data People are those who understand (though there are degrees of understanding) that this thing called data is real and is not limited by example. In other words, there is a real difference between data and this data.
In order that you might know more of the perspectives that inspire and give form to this book:
the Myers-Briggs personality assessment puts me well into the Intuitive category, in the profile
INTP[2]. The Gallup Strength Finder (Rath, 2007), taken in 2013, reveals [Maximizer,
Connectedness, Ideation, Strategic, Learner] as my top five strengths. These strengths can be
looked at in two ways:
1. As filters through which we view the world
2. As tools that we use to attempt to manage our world
If you have a copy of The Gallup Strength Finder you will find guidance on how to “interpret”
and work with strengths other than your own. I attempt here to appeal to readers who are using
filters other than these. My hope is that at least one of the explanations, examples or illustrations
will succeed in passing understanding through your filters.
In 1982, upon “retiring” from the Air Force and a career as a German/Russian voice processing
specialist and intelligence analyst, I set myself a goal of understanding system development,
gaining command of all facets. Later, quality became part of the quest when the press began to
publish articles asking why system development results were frequently not well received. A
“study”[3] (The CHAOS Report) executed by the Standish Group in 1995 was widely quoted, and
executives began asking why they were investing so much money into projects that were likely
to be “successful” only 16% of the time.
I quickly determined that information systems are precisely about information—that is, data.
Today this is not a universally held view. In system development it is frequently the view of the
developers that an information system is a big puzzle in which all pieces have to fit together
without damage to one another. In this view, the data is merely a component of the puzzle but
one which can be manipulated. Data often becomes the chewing gum that makes it look like the
pieces fit together but, as anyone knows who has actually used chewing gum to hold something
together, it leaves a residue that can make more permanent fixes problematic or even impossible.
Chewing gum fixes also have a tendency to fail at the most inopportune of times.
While I studied software engineering, its methods and tools, and the economics of software
development and project management, I kept my eye firmly on the prize—data. I will use the
terms data and information interchangeably for the most part, always recognizing that there is no
information without data but that the converse is not true.
As it turns out, the approach was sound because information/data is the core of every information
system and it is wrapped in layers upon layers of technology, science, psychology, communication,
business, management and economics. The information systems of an enterprise touch
everyone in it at every level. The task of managing the systems and the information wrapped in
them is as complex a task as any that exists on the planet today. It may be that to future
archeologists, historians, archivists and anthropologists, the large-scale systems in use today will
look much as the pyramids of ancient Egypt look to us. How did they do that with the tools they
had available to them?

[2] INTP: Seek to develop logical explanations for everything that interests them. Theoretical and abstract, interested more in ideas than in social interaction. Quiet, contained, flexible, and adaptable. Have unusual ability to focus in depth to solve problems in their area of interest. Skeptical, sometimes critical, always analytical.

[3] You are left to decide for yourselves as to the value of this survey/study. The fact remains, though, that it did cause a stir in the trade press that leaked over into mainstream management publications. The fact is that the perception of quality in a system project is tied directly to delivery delays. Perceived quality is inversely proportional to the number and duration of delays.
I’ll go out on a limb here and assert that your data/information problems including inconsistency,
reliability, utility and simple lack of confidence and credibility are symptoms of widespread
problems in your business. A few years ago, it seemed that something called data governance
might have the potential to bring some of the information problems under control. What I learned
when I began to immerse myself in the world of data governance was that, like other business
ideals such as alignment, it is nothing other than an idea. Please note that I do not belittle
ideas as powerful forces. Certainly liberty, freedom and justice are “only” ideas. At the same time
we need to recognize that those ideas are nothing if they do not become part of a vision that, in
turn, motivates action.
The appendices tell the story of my journey. The bottom line is that it is not possible to assert
governance in one community in the absence of governance in the surrounding communities.
Research into corporate governance will yield a treasure trove of organizational charts with
different labels on the boxes and extra dotted lines being the chief differentiators. The sad
conclusion was that governance in a business context is non-existent. Wait, though, there is one
form of governance that does exist almost universally in corporations and that is feudalism. You
may recall that in the feudal system, what is right and true is whatever the king or duke or baron
said was right or true. This insight caused me to adopt an altogether new approach to [data]
quality.
This is my magnum opus, the result of 30 years (as of today) in the information systems industry
and more than twice that in life. This work applies life experience to information systems issues
in general but specifically to issues of data and its quality.
We should all be able to accept that an information system of any utility at all in any industry
whatsoever is of value only to the extent that what we receive from it (information) is reliable
and credible. Anything less is the equivalent of a magic 8 ball or a Ouija board.

Simple and foolproof information systems


Notice that, while the Magic 8 Ball and the Ouija Board are simple and foolproof, they don’t
deliver what most people would consider to be credible information. Would you commit $100
million based on a “Yes” from the Ouija Board?
You will find that my approach to these issues is not driven by technology, nor by methodology
(in the sense of a formalized, tool supported approach to defining and building information
systems). Rather, all such are considered part of the problem.
If you have come into contact with data people or their subgroups, data quality people and data
governance people, you have been told about many capabilities and processes that are vitally
important, critical or essential. It takes decades before a data person like me finally recognizes
that, while all of these capabilities, tools, methods, etc. are important, the only essential is
communication and agreement—first among people and then among systems. The solution
depends on your willingness and ability to step back from technology concerns and devote
yourself to understanding human communication.
This book has the goal of making the reader conversant with the important issues and preparing
you for the essential. You have to be able to communicate with data people to the extent that
you can recognize in broad terms what they are talking about and ask for better explanations
where needed. You’ll also need to involve them in agreements which are understood by them as
well as everyone else concerned.
Someone has to assume the responsibility and accountability for agreement. And, to make
certain that this is clear, agreement is
• not dictated
• not coerced
• not subject to signing memoranda and hidden agendas
• not subject to unilateral revocation.
If you want clear, credible, timely and useful information (which is what your data people want)
then you have to work with them.
It is also important to recognize that developers (also known as programmers or software
engineers) are not data people though they do share some common vocabulary and culture. You
must never assume that agreeing with one is the same as agreement with both. The two groups
or cultures are driven by different forces and those forces are often in conflict.
I don’t want to take this experience, knowledge, and insight with me to the grave. The best I can
do is to make it available with the hope that eventually these insights will catch on in a wave of
sufficient length and amplitude to change the way we think about technology-based information
systems.
We begin with this thought:
Data and information and their management have nothing whatsoever to do with technology.
True, technology has some application where large quantities are concerned, but the underlying
principles of data management, data quality and data governance are entirely independent of
technology. When you read, you will encounter technology-based scenarios and discussion
centering on data and its quality. Your task is to see these as representative of your processes as a
whole. Remember, the problems are symptoms of a disease affecting the body as a whole.
Relationship with one subgroup will not deliver the result you desire. Not only must you relate
with all subgroups, but you must ensure that they have relationships with one another as well. I’ll
avoid diving deeply into anything wholly technical while trying to give you enough information
to help you appreciate the issues.

Prologue
[Apologies to James Michener, the master of the historical saga]
The business landscape lay unchanged for thousands of years. A Babylonian merchant’s bill of
lading captured in cuneiform on a clay tablet, once translated, would have been easily
recognizable to a merchant anywhere in the world of the early twentieth century. The status quo
seemed safe. The skies of commerce were clear, broken only by the occasional nimbus that
grudgingly leaked a bit of change to keep everyone on their toes.
Human to human communication, progressing from impressions in wet clay to ink on paper to
print to wire-limited electronic pulses, began to accelerate at a breathtaking rate. The shockwave
of improving communication capabilities pushed everything ahead and created a vacuum behind
that began to distort the landscape.
Timeliness of business communication had seen great improvement thanks to the telegraph,
telephone and radio. Goods still moved physically but more rapidly than ever before thanks to
the railroad and the independence from wind power on the seas.
The big money was in charge and barriers were in place to keep the financial flow in its designed
channels. As always, ensuring an adequate flow of wealth into “my” purse was the primary
concern. Skills were valued (though not too highly) and those possessing them could rely on a
steady, if unspectacular, income.
Record-keeping was based on long-established practice with data captured on the pages of ledger
books in pencil or ink. Those who created the records were often called upon to interpret them
as well. Even in medicine, nurses were taught how to keep records concerning their patients
though these were always subject to change by the physicians, each of whom had his (invariably
his) own likes and dislikes which had to be accommodated.
When people disagreed concerning the meaning of the records of their business, it didn’t matter
much since there was little actual sharing of data—after all, and assuming that both were ethical,
of what use would it be to Aloysius to have the ledger books of Ambrose’s business? Of course,
if Aloysius was negotiating with Ambrose to buy his business, the stakes were increased.
Gradually, beginning with the harnessing of electricity, the world of business information (data)
began to expand. The great wars of the 20th Century influenced the process as well. Mechanical
computation machines came into existence and then into common usage. The complexity of
these machines, and the expense that resulted, limited them to special-purpose use by the
very wealthy and by governments.
With the advent of the transistor near the mid-point of the century, the expensive calculators
were rendered obsolete and the world of information exploded in a supernova, the extent of
which is still being discovered. The rate of improvement in communications—already
breathtaking—accelerated at an undreamed-of rate. Analog voice transmissions, broadcasts of
real-time images of action and packetized digital data limited only by the receiver’s ability to
decode combined to create new shockwave and vacuum (push and pull) forces that tested the
ability of business to adapt.
Anything could now be treated as information, discrete units of which were called data. The
term came to be applied to the entire spectrum of information expressible as encoded changes in
voltage.
The information age had begun and for the first time in history data began to be felt as a financial
component of business.
Began is the operative word in the previous sentence. Within three decades a new field, software
engineering economics, had come into being and new sub-disciplines of software engineering
called validation and verification (V&V) were developed to begin to apply programming logic to
creating programs that would identify error conditions as they ran and, if forced to terminate,
would do so gracefully.
Things moved along at a hectic pace as we advanced from
• programming by flipping switches
• encoding machine (or assembly) language commands on punched cards
• “higher-level” (more like natural) languages, which were themselves treated as data and given as input to software programs called compilers that emitted machine language as the output
It became possible to combine simple programs in a spreadsheet cell, link it to other cells for input
and output and, without knowing anything about software engineering, create arbitrarily complex
logical constructs that could be saved or transmitted with or without their accompanying input
data. Finally it became possible to graphically manipulate icons representing programs on a
computer monitor, connect them using various kinds of connectors and essentially create a
pictorial spreadsheet.
But the spreadsheet worked fine when digits were entered from the keypad until a decimal point
(or a zero or a parenthesis or a quote or…) was inserted. Then the result became #Err. This
moment is when data first appeared on the cognitive radar of the average person.

Your Mission
You’ve decided you are the one to take on and solve all those data quality problems that plague
your organization. Well, somebody has to do it, right? The lack of data quality is costing us a
lot of money. Holding on to that money for the company would mean a more profitable
company and who doesn’t want a better profit margin?
There are a few things you should know before you commit yourself to this.
1. Data Quality is like dusting. Some of you will understand this and others won’t. What I
mean to say is that there will never be an end to it.
2. Data Quality is NOT about technology. Technology is the spotlight that makes [the lack
of] data quality so visible (and so expensive).
3. In times of rapid change, Data Quality issues are inevitable. Today, a company that isn’t
changing is dying so… The good news is that if you get good at this, your career is as
secure as any career can be.
4. Even though Data Quality is NOT about technology, you’re going to find that you will
need a very good foundation in technology processes and especially the processes
employed in your company in order to have any chance of identifying the right places to
apply pressure.
5. An approach to Data Quality that flits from one bunch of low-hanging fruit to another is
going to become a net additional cost (not what you want to be associated with).
6. In practice, there is no difference between a Data Quality program and a Data
Governance program. The goals are only slightly different and the methods may be
identical. The goal of Data Governance is the establishment of an auditable process
leading to consistently high quality and reliable data. The goal of Data Quality is
consistently high-quality and reliable data (which will require auditable processes).
Auditable means provably consistent.
7. You are going to learn that no one (and I mean absolutely NO one) wants to talk about
data. Learn to talk about other things and use them to illustrate the concepts you want to
teach.
8. You will not be able to do this alone. It’s going to take leadership on your part to
mobilize support and participation across the company.
9. It’s a really good idea to be able to communicate the data quality vision for consumption
by any audience. This will require you to be able to express it so that your audience gets
it (in their language, using their metaphors…)
10. Finally, your Vision is the only thing you will have to sustain you in this so make sure
that it is clear in your mind (and heart).

Two Approaches
There are two ways to approach data quality. Both involve a process that looks like this.

The difference between the two approaches is that

• One deals with some instance of data (customer, patient, procedure, visit, lab, order, invoice, etc.). This one has short cycle times.

while

• The other deals with the processes that surround any data within the organization. Fixing processes lengthens the cycle time.
The difference lies in the word “this” within the decision that “We have to do something about
this.”
The one that is used most of the time is to attack a specific issue that has become apparent in
terms of its cost to the organization. This is repeated as new issues arise. The other is used only
in organizations that are high-functioning. These are the organizations that have adopted a
capability/maturity-based vision for themselves and are on the path to the Malcolm Baldrige
Award, ISO quality certification or CMMI Level 5. The diagram shows, in pyramid form, the
relationship between the targets of the two approaches.

Costs (and Returns) increase as we address more foundational processes

Efforts focused on the top of the pyramid (bad customer addresses, for example) are generally
less costly on a per-project basis and can be turned around faster, while efforts involving the
lower segments of the pyramid will have broader scope and higher costs. In both cases, the long-
term rewards are proportional to the cost.
Resolving bad addresses to reduce the cost of mailings, for example, will be effective for only a
relatively short time before it must be done again (like dusting). To extend the time before a new
project team must be formed to repeat the fix, we could move downward to (for example) the
Technology Processes. We could institute checks to ensure that the validity of data (such as a
customer address) is guaranteed by the front end (user interface) of the data collection
system.
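
To make that front-end idea concrete, here is a minimal sketch in Python. It is not taken from any particular system; the field names and the ZIP-code rule are illustrative assumptions standing in for whatever validity rules your business actually requires.

import re

# Illustrative required fields for a customer address (an assumption, not a standard).
REQUIRED_FIELDS = ("street", "city", "state", "postal_code")

def validate_address(record):
    """Return a list of problems; an empty list means the record may pass."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not (record.get(field) or "").strip():
            problems.append("missing " + field)
    # Example rule: US-style ZIP code, a stand-in for real postal rules.
    zip_code = (record.get("postal_code") or "").strip()
    if zip_code and not re.fullmatch(r"\d{5}(-\d{4})?", zip_code):
        problems.append("malformed postal_code: " + repr(zip_code))
    return problems

# Reject at the point of capture instead of repairing later.
problems = validate_address({"street": "12 Elm St", "city": "Boulder",
                             "state": "CO", "postal_code": "8030"})
if problems:
    print("Rejected:", "; ".join(problems))

The point is not the particular rules but where they live: the check runs before the record ever reaches the database, so every downstream process can rely on it.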
Note that the sharp end of the pyramid can be attacked independently by one person or a very
small team, while attacking lower (more foundational) levels will require coordination. For
instance, it would do little good in terms of reducing mailing cost to institute processes that
guarantee good addresses on input while leaving existing bad addresses uncorrected. All that
would accomplish would be to fix the cost at a level defined by the number of known
incorrect addresses. These considerations are quite apparent and lead to a high frequency of
what, from an historical perspective, can best be described as misguided choices. The efforts
near the top are often seen as “low-hanging fruit.”
Harvesting low-hanging fruit is not necessarily bad although, as a standard practice, it
discourages us from developing the infrastructure we need to harvest and use the bulk of our
information resource.
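
By contrast, a sketch of the sharp-end (detection and remediation) approach might look like the following. The table layout is an assumption for illustration; the essential feature is that this is a one-off scan that will need to be repeated (like dusting).

import sqlite3

def find_bad_addresses(conn):
    """Yield (id, postal_code) for rows that fail a simple validity test."""
    for row_id, zip_code in conn.execute("SELECT id, postal_code FROM customer"):
        zip_code = (zip_code or "").strip()
        if len(zip_code) != 5 or not zip_code.isdigit():
            yield row_id, zip_code

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, postal_code TEXT)")
conn.executemany("INSERT INTO customer (postal_code) VALUES (?)",
                 [("80301",), ("8030",), (None,)])
for row_id, bad in find_bad_addresses(conn):
    print("row", row_id, "has unusable postal_code", repr(bad))

One person can run something like this in an afternoon, which is exactly why this approach dominates; nothing about it prevents tomorrow’s bad addresses.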

Selecting the Best Approach


In general, the lower down we go on the pyramid, the higher we have to go on the org chart to
gain the needed level of support. Remember that “best” is a quality assertion. It means
something different to each person you meet so the object is to gain a meeting of the minds
(consensus) on an approach.
You don’t have to work too hard or be an expert in data management to convince a manager that
their department costs can be dramatically reduced with a two-week project that identifies
unusable addresses and eliminates them from the database(s). If they complain about losing
contact with customers (which, of course, they never had anyway) and say that the addresses
must be fixed instead of deleted, then the two-week project becomes an eight-week project with
additional expenditures for specialized address validation tools and possibly additional people
who will track down the customers and discover their correct address.
If the manager agrees and asks, “So the problem will be gone for good, right?” then we will have
to fix the systems that allow the bad addresses through to the database. Now we have a three- to
six-month project involving a bigger team. We’re still OK, though, because the goals are well
defined and the end state is understood.
If the issue becomes one of not producing (or accepting) systems that allow bad data values into
our corporate data resource, we have moved to another level in the pyramid. We’re asking for
better requirements-to-design processes as well as better testing processes on all system
development and/or acquisition. It’s only fair to warn you at this point that here is where you
can easily get in over your head.
Your technology probably comes under the heading “Information Systems” on your org chart.
More recently, “Information Technology” has dominated. This is called camouflage, or
misdirection. Current levels of budget could not possibly have been attained without the promise of
something that you, the holder of the purse strings, were really interested in—like information
for example. It’s something that has interest for everyone but has the benefit of being so well
understood (this is sarcasm) that it doesn’t need to be defined. In reality, Information Systems is
100% about the systems (or even better—technology) with any left-over effort being devoted to
information (more sarcasm). This just means that you must be prepared to negotiate with a
foreign entity, the tech people.
The development (programming) staff is going to claim that they can only design using the
specifications they receive from the business side or from the data architects. They’ll be right of
course, but you can’t let them off the hook now. They need to create and adopt processes that
leave an auditable trail of documentation. They will wail and tear their clothes and say, “We’re
not staffed for that and it will mean projects will take longer to turn around.” This isn’t
something they really care about but they believe you care about it and, because you do, you will
let them off the hook. The sad part is that it usually works.
If they were more honest (with themselves as well as with you) they would say that they have no
appetite for processes that can’t be executed by a computer. Your I.T. department would be very
happy never to hear the word “information” uttered in their cubicles. “Input” and “output,” OK,
but quality should consist in:
• Doesn’t choke the computer
• Program continues to run
• Reports continue to run
• No alarming screen artifacts like “#DIV/0!” or “!Value”
• Preserves the appearance of normality
I am NOT saying this is wrong. I am saying that this attitude, no matter how well it has served
in the past, will not produce what you are after. The better your grasp of this fact, the more
successful you are likely to be.
So, you will have to define what “auditable” means. You should also be aware that the new
processes are going to evolve rather quickly for the first 6-12 months (or the first 2-3 projects).
The processes will have to be designed to accommodate this evolution. It’s quite easy to build
something that will work once. It’s a few orders of magnitude more difficult to build
something that will work from now on.
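
As one hedged sketch of what “auditable” could mean in practice (the schema here is an assumption, not a prescription), every change to a data item might leave a who/what/when/why record that can later be replayed and checked for consistency:

import json
from datetime import datetime, timezone

AUDIT_LOG = []  # in practice, durable append-only storage

def record_change(actor, item, old, new, reason):
    # Append-only: entries are added, never edited, so the trail can be audited.
    AUDIT_LOG.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "item": item,
        "old": old,
        "new": new,
        "reason": reason,
    })

record_change("jdoe", "customer.42.postal_code", "8030", "80301",
              "corrected during address cleanup project")
print(json.dumps(AUDIT_LOG, indent=2))

Whatever form your definition takes, it has to survive the evolution described above, which argues for recording reasons and not just values.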
You may have been able to hold the line at the technology process level for a short time.
Eventually, though, the need for process change is going to involve the data management process
level. In truth, you may want to get them moving as you initiate the technology process changes.
Even with little or no background in data management, a review of data documentation
(meanings, usages, relationships) is going to show that the programmers have several valid
points. In many, if not most, cases the requirement calling for definitions for data objects
appears to be satisfied if there is any text at all, however meaningless or useless, in the
description property. It will be exceedingly rare to find any descriptive information about a
relationship beyond mandatory/optional and cardinality. We’ll discuss this in more detail in the
New Testament.
Once again, the data architects will attempt to shift the responsibility for this state of affairs onto the
business architects. Don’t allow this. Data management must also design and implement
auditable processes that establish criteria for completeness and ensure that the criteria are met.
Failure to accomplish this, and to do so in a way that allows for evolution, will lead to the collapse of
everything above. It’s going to take several attempts to pry open the communication portals
because, as much as the programmers don’t want to hear about “data” or “information,” the data
people will shy away from “process” like a calf at a new gate.
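
Here is a sketch of the kind of completeness criterion meant here, assuming Python and some invented placeholder patterns: a check that a definition is not merely present but substantive.

# Flag data-object definitions that technically satisfy "has a description"
# while being useless. The placeholder patterns and five-word minimum are
# illustrative assumptions, not a standard.
PLACEHOLDERS = {"", "tbd", "n/a", "todo", "see above"}

def description_is_substantive(name, desc):
    text = (desc or "").strip().lower()
    if text in PLACEHOLDERS:
        return False
    if text == name.lower():        # description merely repeats the name
        return False
    return len(text.split()) >= 5   # arbitrary minimum word count

catalog = {
    "customer_id": "TBD",
    "order_date": "order_date",
    "postal_code": "Postal code supplied by the customer, validated at entry.",
}
for name, desc in catalog.items():
    if not description_is_substantive(name, desc):
        print(name + ": definition fails the completeness criteria")

A check like this is crude, but it turns “there must be a definition” into something auditable rather than a matter of opinion.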
When you reach the business process layer, you’ll be dealing with questions from the COO
(Chief Operating Officer) or CEO (Chief Executive Officer) and they’ll want different kinds of
answers. These folks have also grown adept at detecting fuzziness of thought and verbal arm-
waving (in others). They won’t be easily distracted by canned responses and will want more
than “this is standard practice.” They may even ask whether a standard practice can be
implemented “here.” Of course, their weakness is an inability to admit that they don’t
understand. Decades of experience have taught the tech people and the data people that new
acronyms and techno-speak in sufficient quantity will bring down most barriers.
Which part of the pyramid you will attack is a line you can draw anywhere that is comfortable
for you. Something to keep in mind, however, is that you may not get to draw the line. Maybe
the line has already been drawn and you’re trying to move it. Executives and managers who
have been around for a while will have an idea where the sweet spot is for them and they will
resist fiercely any attempts to redraw that border.
Selecting the approach and scope that’s right for your situation depends on the entire
organizational context including key personalities who must be involved. The deeper you go
into the pyramid, the more accomplished you must be, the more knowledgeable, the more
credible and the more persuasive.

Recognizing Root Causes


You are going to be told several times every day that x or y or z is the cause of the data quality
issue you are working on. The “cause” that you will hear about is going to be something that
some new software or some new technology will fix, and it’s just a matter of obtaining the
funding…
Nine times out of ten this will not be the cause. How will you recognize the real cause so that
you can save all the effort of attacking the other one(s) only to find that the quality issues persist?
That’s the purpose of this book. We want you to understand the historical causes of data quality
issues because those causes have never been eliminated. Here’s one heuristic to get you started:
the technology is never the cause.

The History of Data Management


An Approach
Whenever we need to find our way to something, it always helps to understand where we are
now. When we post a map in a shopping mall or a complex of some kind, we always make sure
to indicate the point where the map is posted. “You Are Here” is the first thing to look for, right
after we have established that our destination is actually present. Even when we are creating a
map, we need to have our frame of reference clearly defined.
We begin with an Old Testament treatment that will establish the frame of reference for the New
Testament which follows.

Genesis: Planting our Feet


In business as in life, many, if not most, of us resemble Dr. Sheldon Cooper in the television
series The Big Bang Theory. We are bright, knowledgeable in our chosen field and completely
focused on ourselves. It doesn’t help that we are utterly clueless in terms of navigating social
relationships. Just as in the TV program our friends and associates are so much like us that they
can offer little in the way of assistance as we move from one disappointment to another,
convinced that if they only knew what I know and weren’t so completely incompetent, the world
would be a much better place. If you have seen the program you have probably been
uncomfortable from time to time as you recognized Sheldon’s frustration or problem as similar
to one you were involved in recently. Of course Sheldon is such a caricature that we don’t easily
identify with him or his approach to life, but then why the discomfort?
In a sentence, the purpose of this book is to provide sufficient historical foundation, together with
carefully designed knowledge reinforcement, to allow those who obsess about data, its management
and quality, to realize that they are not singlehandedly holding the world together. We’re
seeking a moment of revelation: the epiphany that those boxes on the org chart actually
represent people and that those people each have lives, desires and motivations that are almost
certainly different from mine. Finally, we cultivate the insight that my own desires can be
advanced if I can create relationship with those people and show them how our relationship can
improve their lives as well.
Grasping this insight will give all of us a chance to relax long enough to engage in conversation
about actually solving some of the problems.
We won’t find the diamonds if we are constantly distracted by the rhinestones.
The proverbial boy who used his finger to plug the hole in the dyke might have been better off
had he taken a moment to consider the larger picture. How often have you or I found
ourselves held captive in an “emergency” that we created so that we could be seen as heroes?
Just because a manager notices a problem (almost always a low-hanging-fruit kind of problem)
is no reason for others to put on blinders in order to ignore up- and downstream issues as
well as collateral issues, all of which influence and may even have created the “leak in the dyke”
that we have been asked to plug.
We can distill everything we know about the management of data down to six critical principles
which, if followed, will allow us to shed a vast amount of non-productive (useless) effort and
stress and put us in position to benefit from our use of technology instead of cursing its cost and
impenetrability.
The six principles are offered here in order that they may be sent to your subconscious to ripen
until we bring them out again for discussion in the New Testament:
1. Words are not meaning. Words are not truth.
2. Complexity is now and simplicity is potential.
3. Data has more in common with words than with meaning.
4. Meaning and truth are personal and subjective.
5. Relationships are how we find meaning.
6. Technology is about volume[4] only. Everything that is about meaning, truth and
value/quality comes from people.
Many years of working in the world of computer assisted data management have led to certain
observations concerning the way people and organizations function. Data and information are of
interest to everyone though each individual seems to have a different understanding of what data
is and why it’s important. One of the purposes, then, of this book is to communicate the idea of

[4] More accurately, throughput, or volume per time interval, is the measure of technology. Think of your organization as a pipe. A pipe of a given internal diameter is capable of carrying a certain volume of liquid content. Now think of technology as the means of increasing the pressure to make more of that liquid flow out the end of the pipe in a day. Like pipes, organizations are constructed differently, have surfaces that are more or less smooth, more or less resistant to corrosion, etc. It is most often the pipe that fails and not the pressurizing technology.
data. We won’t create experts but we will provide enough background so that you will grasp the
issues. Another purpose is to apply principles already known to the business world to data
quality. Data Quality (DQ) is a hot topic and it represents significant cost. It will be worth the
effort to get a better handle on it.
Alignment of all parts of an organization has emerged in the business literature as something of a
mythical objective. Executives are urged to optimize the performance of their organization(s) by
aligning the component functions so that they can operate in concert instead of in competition
with one another. In concert is a nice metaphor in fact. A popular vision that is called up is the
orchestra prior to a performance. If you’ve been there you’ll recall the noise that fills the venue
as each performer makes certain that his instrument is tuned and ready to play. Then the
conductor asks for attention and everything becomes quiet and the silence is filled by
anticipation. On the downbeat the anticipation is replaced with music. This is the very picture of
alignment.
What the metaphor tends to obscure are the hours of relationship building that have preceded the
downbeat. Each individual musician has learned to relate to those around him and the conductor
has made certain that each now possesses a single vision of the work and the impressions that the
performance will produce for the customer. Individual mastery is simply the entrance
requirement. The performance demands mature relationships at several levels and those
relationships cannot come into being by accident.
Like so many business concepts, alignment has intuitive appeal and many consulting firms have
become wealthy by convincing executives that they have the key to obtaining alignment. Maybe
you have learned by now that they can’t deliver it. The best they can do is coach you through its
creation, no matter how long it takes.
The one thing you can expect from this book is that it will, on every subject, reflect the reality of
the information world. If it doesn’t reflect your reality, you are encouraged to take a closer look
at what you believe is real. So much of what we read today is best described as wishful thinking.
No wishes here. We’re going to start from the assumption that there may not be any such thing
as organizational alignment other than as an intuitively appealing concept. Take a deep,
cleansing breath now and roll up your sleeves because we’re going to create alignment
everywhere we go. We’re going to do it from the bottom up and the inside out.
• We’ll spotlight some of the critical relationships to be developed.
• We’ll learn how to build a high quality, effective relationship.
• We’ll reveal some motivations of key players that will strengthen your hand as you negotiate the relationship.
• We’ll offer hints at what an aligned organization looks like in operation.
If your role is that of executive management, you only have to learn one sentence, supported by
one goal and implemented by committed action. The sentence is, “How can I help?” Of course
you should be satisfied that the person you’re talking with really intends to improve alignment.
If that person has a plan to get two or more segments (or even individuals) of your business
cooperating with each other—that’s alignment.
We’ll discuss this in more detail in the New Testament but the executive summary is this:
• Alignment will happen when people understand what you’re looking for.
• People will understand what you’re looking for when you can point to a handful of examples.
The secret, then, is to
create or find a handful of examples.
The value in being able to say, “This is what I want,” can’t be overestimated.

The Role of Perspective


Perspective is another key idea. Over time, one central tendency has dominated: we often lack
perspective. We get so focused on our own responsibilities and to-do list that
we forget that others may be accountable for different things. We have also come to believe that
the past has nothing useful to offer us. We no longer remember (if we ever knew) how we got
where we are today. This is nothing that should cause us to feel bad. We all develop blind spots
caused by forgetting that our perspective isn’t the only and certainly isn’t the “right” one. If you
want to see an example of such blind spots, again I recommend an episode or two of The Big
Bang Theory.
These single-perspective blind spots are especially prevalent wherever technology is involved.
To understand why, it is only necessary to think about the many layers between your eyes and
the microprocessor. We don’t really use technology like we use a hammer or a screwdriver.
Rather we use ideas that are built on technology. Ideas are powerful as we have already
mentioned but they have one drawback in that they appear different to different people. When
the same thing appears different to me than it does to you, the opportunities for blind spots to
intrude are virtually boundless.
Let’s take a look at some different perspectives. Those who remember the early days of
computer-based data processing are shuffling off this mortal coil at an alarming rate. The early
days are not pre-computer days. We need go back only as far as 1980 to get beyond the
memories of most who practice today. One of our problems is that in the technology world
which provides the background for DQ (data quality) discussions, even two years can have the
weight of an entire generation.
What follows is NOT a technical treatment. It is not a reference source on the history of
computing and data. It is not and cannot be an exhaustive discussion of any single aspect of data
management, data governance or data quality. You will find plenty of terms and acronyms that
can be used in any web search engine to produce all the details you may desire. Rather, it is an
overview of the problem from many perspectives. Among those perspectives are anthropological,
cultural and natural in addition to economical, organizational and methodological.
The issues do not arise from a single direction but from all of these directions more or less
simultaneously. The answer, if one is to be found, must face all of these directions as well.
We may also want to entertain the possibility that THE ANSWER does not exist and that what we
have come to recognize as “data quality (DQ) problems” are part of the fabric of the world in
which we function.
The Bible (yes, that Bible) has provided inspiration in the creation of this treatment—not so
much the content—but the way it’s organized. There is an “old testament” that provides history
and perspective to aid in understanding the “new testament,” which offers a formula for the
future.
Although I have found over the course of my life that “Why?” is a spectacularly ineffective
question, it seems that the buyer/reader of this book is entitled to know what motivated it.
The simple answer is that, where Data Management is concerned, the world is characterized by
ignorance—a knowledge vacuum—people are not aware even of what can and can’t be done, not
to mention why (the reasons). In fact, the scarcity of knowledge and prevalence of
misunderstanding suggests that the level of expectation may be entirely too high. It also suggests that no
one, least of all you, should feel any defensiveness or even reluctance to admit your state of
ignorance or, if it feels more comfortable, your lack of mastery. The technology bookshelves at
Barnes & Noble are liberally sprinkled with books addressed to “idiots.” The Idiot’s Guide to…
is a very popular formula.
That being said, it also suggests that the market for a book such as this one may be enormous.
Certainly you are part of the market; after all, you have picked up the book. Everyone in
business should understand something of data management. Data and information, their
organization, storage and use, will affect every business, no matter whether its revenues are
$10,000 or $10 billion per year.

What is Data Management?


In simplest terms, Data Management starts with the recognition of data as something real—a
resource to be conserved just as you steward other business resources, from paper clips to
capital, from cubicles to the people who inhabit them. At some point it should occur to everyone
that forms, rolodexes, customer letters, complaints, production reports, purchase orders, invoices,
receipts are all kinds of data. Furthermore, contracts, customers, vendors, employees, etc., either
are data or have essential data components that must be recognized.
Once our thinking has progressed this far, it’s a short jump to dissatisfaction. There is always
something new or better we wish we could do if only we had some additional information. A
new business, for example, often has drawers full of receipts and some idea of costs as a whole
but soon wants to become more profitable by allocating costs to different kinds of products or
services. It is fascinating, and illustrative of the steep learning curve that must be climbed, that
this business owner will, without fail, focus on the processes and not the data.
“I need to do a better job of organizing my receipts so I have a better chance of understanding
my costs.” The non-data person (which is virtually everybody) will answer the siren call of
process tinkering without ever pausing to consider what “understanding my costs” actually
means. Because “understanding my costs” means something different to each business and each
business owner and—even more importantly—because it changes over time, there is no way to
supply a recipe that will work for every situation.
What we can do is learn to recognize that data and process are distinct from one another, though
so closely linked that it can become difficult to distinguish them. We can learn principles that
can be applied no matter what the situation. Finally, we can learn our own capabilities and
limitations. We have frequently been reminded by news organizations that “rescue” operations
continue as long as there is hope of victims remaining alive. “Recovery” operations begin when
either a corpse is discovered or when no hope of rescue remains. Knowing when to call for help
may be the difference between rescue and recovery.
To begin, our business owner will have to expand the idea of useful information (data) and create
some new kinds. It will be necessary to improve our handling of the data as well. We’ll need to
associate every purchase we make with a product or service that we offer, perhaps by creating a
purchase order tied to the product, and we’ll need better information about the product or job
itself so that we can do some useful analysis. If, for example, our business involves excavation
or is somehow weather-dependent, we may find it useful to collect data on the effects of weather
on a particular job. Over time, we can use the additional information to support better estimates,
better pricing, better revenue projections, etc. In short, we make our business better.
The figure below illustrates the relationship between process and data. They are discrete but
symbiotic. Some prefer to take a process-centric approach and then proceed to data. Others will
start with the data and move on to the processes. It helps immeasurably, as you think about and
plan for data management, to keep this picture prominently in view.

Start where you will

There is no point in planning for data if there is no way to implement the process(es) that will
generate it and maintain its integrity. Likewise there is no reason to try to design a process for
which the input data will not be available.
As we think about process, keeping the data in mind will help us stay closer to reality. We have
a tendency, when thinking about process, to fantasize. We document the process we’re supposed
to have or the process we wish we had and make it the basis of our data management plans. If
your data management plans never go anywhere, the first place to look is the processes that are
their foundation. If they aren’t real, measurable and consistent (within specified parameters),
start over by injecting reality. This is the most prevalent cause of stagnation and perceived
failure.
You can see that much non-productive effort (cost) can be avoided. The same thought processes
will go into planning and scheduling as you make sure that the enabling data and processes are in
place and synchronized before beginning on dependent ones.

But What’s Involved?


Data Management (and quality in general) is at its most rudimentary a quest for consistency.
The rule of thumb is
Consistent processes produce consistent results.
Data flows through your organization

A quick example will make the case for the importance of consistency. The figure above shows
a watershed and indicates how water flows across and through a landscape. A landscape is
described by its topography, which includes such information as elevations, physical
characteristics (sand, rock, etc.) and any man-made features (dams, roads, buildings…). Knowing
something about a watershed and its topography can help us avoid making some serious mistakes
if we are planning to build something. Heavy rains in the mountains near Boulder, CO, in 2013
resulted in massive inconsistencies in water run-off and are evidence that investing in
consistency guarantees can pay off in a big way.
Certainly the topography of the area had been mapped and major building projects over decades
had considered the flow of run-off, whether from snow melt or rain. Government (or
governance to use a term more familiar to the data landscape) had invested in check dams,
culverts, ditches, retention ponds, levees, etc. to make the flow more predictable in order to
minimize the risk for property owners.
Likewise, Data Management seeks to understand the organization’s topography and to constrain
the flow of data in order to make certain that everyone has the information they need and that
certain kinds of data can be diverted for special uses. We also want to ensure that we don’t get
overwhelmed by sudden surges in known kinds of data or the sudden appearance of a new kind
of data.
Think now of each process in your organization as a pipe, culvert, ditch, creek or river. These
channel the flow to produce a planned result. Boulder may want to take steps to increase the
capacity of the surrounding watershed to carry what is now recognized as a new peak flow. We
might do that in our organization as well by adding FTEs to absorb increasing flow. The
downside is that the flow may have inherent inconsistency, so that our new FTEs may find
themselves next month with nothing to do. This is a far bigger problem for Boulder than it is for
the average business organization because we can use technology to increase the throughput
capacity of our processes. If we do it right, it will be scalable so that, for example, doubling
capacity next year will cost only a small fraction of the original effort.
Now you have an idea of the role of data management and a better idea of its value in the overall
organizational plan.
There are several practices that will make up your Data Management effort, including:
• data design—ensuring that you understand what data you (will) need
• data capture—making sure that you have or are able to get all the data you will need
• data storage—creating and operating the mechanisms to file your data and retrieve it efficiently
• data reporting—creating the ability to retrieve your data in context to paint the kinds of pictures you need
While none of these practices requires the use of technology, it is almost certain that you will
find yourself incorporating technology and becoming increasingly dependent on it, if for
no other reason than that the sheer volume of data in any typical business soon becomes
overwhelming without some sort of automation. This gives rise to some additional
practices as part of your Data Management effort. You may find yourself wanting to improve
your grasp of the following (a brief sketch follows the list):
• database management systems and their use—this skill is often labeled “database administration” or DBA
• basic programming principles
• basic file management and mass storage principles
• basic networking concepts
You’ll find that creating consistency within these practices and then holding on to it will
consume most of your efforts. As a side note, it is far easier to create consistency (or any
change) than it is to hold on to it. If you don’t start out with the intention of making your process
changes permanent, entropy caused by the travails of life in business will soon make sure that
you’re right back where you started (or worse). When you first start, you won’t see this energy
expenditure as valuable, but it doesn’t take many iterations of rebuilding your data files from
scratch to begin to grasp the need.
Consistency is created through the use of “standardized” procedures. In a small company, this is
relatively easy. As the owner, you have only to create an effective procedure and then see that it
is used by you and a small number of employees. In a larger company this is much more
involved and may consume the efforts of multiple employees in entirely dedicated departments
such as Quality Assurance or Data Architecture or Software Methodologies or Technology
Infrastructure… In both the large and the small companies we invest in the standardization of
process because the payoff is so big.
As you begin to educate yourself, you’ll encounter terms such as Data Governance, Data
Integrity, Data Quality, Master Data Management, Metadata Management, Data and System
Architecture and so on. This book will provide a context for everything above and give you
some ideas for what to expect (or demand) from these various functions and practices.
This is the place for a word of caution. The more you learn about data and its management, the
greater the risk of vertigo. Just when you think you have everything nailed down, something
really important is certain to change. You may add a new product line that changes your
definition of customer. The government may issue a new regulation that forces you to handle
your data differently or to add one or more new types of data. There is an old story about the
cardiac surgeon who had taken his car in to have some repair made. The mechanic found out
that his customer was a heart surgeon and remarked that their jobs were much alike. They both
had to keep the engine going. The surgeon replied that there was a big difference—he had to do
his repairs while the engine was running.
My own feeling is that the data manager’s job is an order of magnitude more difficult because,
not only is the engine always running, it is also evolving at a blinding pace AND the patient is
wide awake and acutely aware of any pain that a data operation may cause. Add to that the fact
that the patient may be actively fighting against the operation and you have a task, the
satisfaction of which ought to be worth some serious money. Of course, it’s only data so you
may be willing to save the money and do it yourself. A mistake by the surgeon could mean
death for the patient. A mistake by the data manager wouldn’t result in anyone’s death and
probably not even the company’s death.
This is what makes this work necessary.
Laying the Groundwork
Let’s begin with the assumption that you’ll be doing what needs to be done by yourself. In a
small business this is very likely to be true. In a large corporation it will be functionally true
since you’ll find yourself in the midst of large numbers of specialists who will be absorbed in
their own responsibilities. You have either realized that there is a need to make sense of all of
these specialties and sub-specialties or you have stumbled upon the path through the maze.
Data Management demands coordination (alignment) among all the functions of a business
operation or corporation. The majority of the work may be the responsibility of I.T. while the
majority of the decisions must come from functions on the front lines (Sales, Purchasing, Human
Resource…). The larger organization is handicapped by the fact that many boundaries must be
crossed in planning and executing Data Management. As we all know, organizational
boundaries can be stubborn barriers to getting things done. More meetings are required at more
levels. More people must modify their plans and budgets.
We will address political, organizational and even cultural issues before we’re done, but we need
to build a solid foundation first—one that provides a context for everything that comes later.
Although Data Management will have certain characteristics no matter what the scale, broader or
more complex contexts bring their own unique problems and solutions. Let’s consider the most
focused and least complex scope first so that we can gain some mastery of foundation principles
before we try to “feed the world.”
The Start-up Business (or Household)
In our personal lives today we need to keep track of people and relationships, schedules
(including payment schedules), financial status, collections of “stuff”, various user ids and
passwords and to-do lists. A start-up business is not significantly different in terms of the kinds
of information to be managed but may be a bit more challenging in terms of the amount of
information to be managed as well as in the penalties for failure to manage the information
effectively.
Missing an appointment with the hairdresser—no matter how important we may consider the
appointment to be—doesn’t have the same downside potential as missing a meeting with a
prospective customer. An inability to produce an inventory of my music collection or my library
isn’t the recipe for disaster that mismanagement of product or materials inventories might be.
We do have to admit though, that on occasion we can feel our data management failures every
bit as acutely on an individual level as we might on a corporate level. A lost relationship caused
by a missed appointment can produce an “if only” memory that haunts us for the rest of our life.
The Small Business
As the scale of the business grows, complexity of the data management needs also grows. This
is an important fact to remember because it means that today’s solution will almost certainly not
be the one that will get us through tomorrow’s crises. When you realize that you need an
accountant, you should also be thinking about someone to take care of your data.
For a small business, scale changes when we add new revenue streams, new customers, new
products or services, new kinds of debt… As a small business owner, you recognize that this is
your way of life. Adding and changing is how a small business stays alive (profitable). As the business grows in complexity, the relationships within the business tend to get simpler. We can see this in our personal lives as well. The problem, of course, and the reason for so many failed relationships, is that simplified and streamlined relationships aren’t very satisfying. While finding the proper level is sometimes difficult, mutual involvement that gives us something to care about other than ourselves is absolutely essential to a healthy relationship. In the beginning
everyone knows everyone else on a personal basis and can see and feel the stress and frustration,
contentment and happiness in each other. People talk to each other and collaborate on problems.
At some point departments emerge and managers are employed. This is a watershed moment. If
the managers fear one another they will create boundaries around their departments and insist
that communication between departments flows through them. This doesn’t happen suddenly or
all at once, the situation evolves over weeks and months. Once it begins, however, it is very hard
to go back.
For data management, this means an increasing reliance on data flowing through the organization. It requires additional effort to coax the necessary information out of all the parties to a relationship. It means that our relationship-based processes must be flexible and must have the ability to stretch without breaking. Sometimes we can recover from a broken process or relationship (when times are good) while at other times a broken process can snowball into a situation that is not recoverable and leads to loss of customers, revenues, credit, or perhaps even business failure and bankruptcy.
The Larger Corporate (Enterprise) Scale
The larger and more diversified the corporation, the more complex its problems—and the more
simplistic5 its relationships. For the data architect, teasing out relationship-based processes,
expectations and intentions can be impossible within the constraints of project timelines and
budgets. More and more today we see that “social media” is taking the place of relationship.
5 Please note: simplistic is a perception, as opposed to simple, which is obtained by comparison. Relationships in large organizations tend to be seen through the filter of the org chart, which hides the truth of their complexity and their importance.
The fact that this seems to be acceptable is evidence of the degree to which relationships have
been “simplified.” For example, the friend relationship, in real life a constantly changing and
infinitely complex human interaction, has become a simple link in the social media world. It
isn’t possible (or even useful) to blame social media for the problem though. Recall Ford’s
quality improvement program in the 80’s. Ford had seen the quality advances made by the
Japanese auto makers and noted a preference among potential customers for Ford cars with
Japanese components. A key component was the Quality Circle. (Ishikawa, 1981) The Quality
Circle was an attempt to recreate lost relationship in order to solve specific problems. A side
effect that was noted but never really measured was the reduction or elimination of suboptimization. When we can actually see someone else suffering we tend not to make our own life easier at their expense.
Ford had its share of troubles in getting the Quality Circles to work because of the level of mistrust that had grown up during the period when actual relationship was absent. Another possible reason for the difficulty was that the goal (defined by management) was improved quality. Instead of focusing on the relationships and allowing them to influence quality, they attempted to use a quality focus to motivate better relationships. At any rate, what had worked well in Japan was more difficult to implement in the US.
We might productively ask why the quality circle concept worked in Japan but not in the US.
Since all of the research seems to be management-based rather than social- or relationship-based
we might find it difficult 30 years later to replicate the scenario. An anecdote from my own
experience may offer some insight.
In conversing with an application manager concerning the meaning and value of some pieces of
data that came out of that application, I was told that at any point in time the data was unreliable
but that over time it was reliable. I must have looked confused because an explanation was
offered in the form of a little story. At some time before, this manager had been doing data entry
with their staff and had received a batch of forms that were filled out incorrectly. They had
simply entered the incorrect data into the system, knowing that it would create work for someone
else. This action was rationalized with the remark that they were expected to clear (enter) the
forms by a certain time each day and could not take the time to investigate and correct the
discrepancies.
The department that had sent the forms had no process to correct them after the fact since the
patient was no longer present. The department that received the data after entry was accustomed
to getting incorrect data and had several processes and tools for clearing up discrepancies. “So
you see,” I was told, “we have processes for making the data correct but we don’t know exactly
when it becomes correct.” They were able to rely on its correctness eventually (for example
when the billing was sent out to the insurance company) but even then incorrect data was a fact
of life and everyone downstream had processes for correcting it.
The bottom line was that the idea of consistently reliable data was a non-starter. If this story
makes sense to you and you’re OK with the bottom line, then you can stop reading here. If you
begin to glimpse the cost of poor data quality here then you will definitely want to continue.
Because the data resource is an essential component to every process, its management becomes
visible to almost everyone. The only reason to apply the restriction “almost” is because in the
trenches data management problems often look like something else. Some will get blank looks
and scratch their heads when they encounter the term but if we can get them together and endure
the initial finger-pointing and accusations, we will demonstrate to their satisfaction that data
management—or its lack—is the real culprit.
Because of the incomprehensible complexity of the data management environment, effective
data management will become a question of priorities. In this world, the big picture is all the
more important because it will not be possible to advance at the same rate or even according to a
comprehensive plan across the entire enterprise. It will become absolutely critical to advance in
areas of opportunity without causing undue stress on blocked or currently stable areas.
Data management becomes a symphony in which the equivalent of strings, woodwinds, brass
and percussion receive planning and direction that is appropriate both individually and as a
whole.
The Big Picture
A big picture perspective is very helpful when discussing and planning data management.
Imagine for example that you live in Lodi, Ohio, and your goal is to visit the west coast branch
of the family in Lodi, California. You do your research and create a plan consistent with the
time and resources available. The strategic goal determines the parameters of the plan. Of
course, when you begin to execute the plan, nothing looks quite the way it did in Google or
Orbitz.
Fortunately, you have the big picture so that when an airport is closed, a flight is cancelled, or a
bridge is out you can still make progress in the general direction of the goal and improve the
chances of eventual success while you wait for things to sort themselves out. Without that big
picture—a basic knowledge of geography, airline operations, the highway system—you would
have no choice but to stop and generate a new plan or return home and wait for conditions to
improve.
Data Management context
Below is a diagram showing the major components of Data Management and their interrelationships. We can move from one of these component functions to another and be very
productive if we keep in mind that they are symbiotic. In case “symbiotic” causes any
confusion, I mean that when one is healthy, all can be healthy and when one is enfeebled, none
can achieve its full potential. Each produces something that the others need and each needs
things generated by the others. To be perfectly clear, effective data management is a goal that
will not be attained until we have attained some level of competency in all of the symbiotic
disciplines.
The widest scope addressed in our big picture is summarized in the box at the top left of the
diagram. Everything else is rooted in the recognition that data and information and their
associated processes are resources in the same way that financial capital or skills and experience
are resources.
We will walk through this diagram and examine each function individually. Note that the small
box in the upper left provides the context for the diagram. “Data as resource” is an underlying
assumption without which we would find it difficult to justify any investment at all.
Please note, also, that these are functions and not projects. They are analogous to sales or
accounting. By extension, Data Management is likewise not a project but a function. Functions
are continuous. You’ll never be able to have a Data Management Wrap Party.
We can improve our abilities to do these functions and thereby improve the quality of the
product. If we like parties, we can set goals for improvement and celebrate the achievement of
those goals.
Process Management
We’ll start with Process Management since process is something with which most are already
familiar. Note that Process Management is an enabler for all of the other functions. If we can’t
adequately manage our processes, then we will not be able to
• create and maintain a [data] warehouse
• make use of an information infrastructure
• ensure data quality
• govern
• make use of metadata to do the above
The diagram below shows how Process Management may be decomposed into more manageable
components. These components, and the components of the other data management functions,
are processes.
Process management is a foundation for governance activities. Eventually, data governance can be subsumed into process management. Virtually all activities that we currently think of as data governance are really part of process management. I.S./I.T. processes MUST BE INCLUDED. The components:

• Inventory: We need to collect some metadata about our processes. SIPO (supplier, input, process, output) information is the minimum subset of metadata. Eventually we’ll need to know about process stewardship. Initially we may find that little or no information about processes exists. In a worst case scenario, we find that “standard” processes have been documented but there is no evidence that they are being used. This is where we bring in the Compliance Officer.

• Stewardship: Processes are just as much in need of stewardship as data. Stewardship teams can be readily identified from the SIPO information. A stewardship team for a process must include representation from the upstream and downstream processes. This is similar to the “Quality Circles” formed at Ford in the 80’s. It offers good leverage for SPC efforts such as Six Sigma or Lean, and input/output quality is a hook into Data Quality.

• Measurement: Each process must have associated KPIs. Improvement of large-scale processes will demand that we understand their component processes. There is no understanding without measurement. We’ll want to institute a measurement and reporting program including dashboards for each supervisor/lead.

• Information Capture: Information gathered or created in Process Management will be invaluable to an SOA initiative, identification of stakeholders for other efforts, identification of additional perspectives, metadata efforts, Compliance and Governance, and Quality Improvement initiatives.
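To make the Inventory component concrete, here is a minimal sketch (Python; the field names are my own illustration, not a standard) of a SIPO-style process inventory record:

    from dataclasses import dataclass, field

    @dataclass
    class ProcessRecord:
        name: str                     # e.g. "Enter patient vitals"
        supplier: str                 # upstream process or role providing the input
        inputs: list                  # what the process consumes
        outputs: list                 # what the process produces
        steward: str = "unassigned"   # who answers for the process
        kpis: list = field(default_factory=list)   # hooks for Measurement

    vitals_entry = ProcessRecord(
        name="Enter patient vitals",
        supplier="Nursing intake",
        inputs=["handwritten vitals notes"],
        outputs=["systolic", "diastolic", "temperature"],
        steward="Nursing supervisor",
        kpis=["% of values passing range checks"],
    )
    print(vitals_entry.name, "->", vitals_entry.steward)

Even this much structure makes stewardship gaps visible: any record still marked “unassigned” is a process no one answers for.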
The definitions of the processes are not as important at this point as is the idea that a function is
composed of processes and that it is the processes themselves that make the function useful.
Imagine, for example, that the Accounting or Bookkeeping function in your business was
defined only by its name. You may have experienced this in fact. Many start-ups have nothing
more than a title to give direction to activities concerned with keeping track of income and
expenses. Often the owner tries to do this work personally, with no defined process at all.
A process is a sequence of activities that is repeated. For example, a recipe represents a process. It’s worth noting here that a checklist IS NOT a process. A box on a checklist indicates completion of some process, and completing the checklist makes the pre-flight process an auditable one. A pilot’s preflight checklist doesn’t specify how to establish that the aileron functions—only that its operation has been verified. For this reason, a checklist is only as good as the discipline of the person who uses it. It’s obvious that if the completed preflight checklist is found at the crash site, an investigator would have to begin by learning something about the pilot’s nature, the circumstances during the preflight, and what kind of training he or she had received—this before any investigation into the aircraft itself.
Processes are frequently represented as process flow charts and these charts (or diagrams) are
used both to document the process for management purposes and to educate or train those who
are to execute the process. The diagram is especially useful when a process is executed
infrequently (as with a recipe).
The art of Data Management is the ability to juggle data and process while walking a tightrope of
resource constraints over a torrent of business needs and market fluctuations.
It will be necessary to create new process as well as to examine and repair existing ones. At first
glance this seems an easy task—after all, we say, processes are merely what we do. Nothing
could be further from the truth and if there is one key thing to take away from this book it is this:
Principle: Your processes are the heart of your business and you are
about to do open heart surgery.
There is a corollary to this admonition and it is so closely linked as to be inseparable. We can replace the word process with the word data and nothing will be lost in terms of truth or importance or urgency. A picky person might replace heart with blood in order to preserve the metaphor, but the bottom line is that one has no meaning without the other.

Corollary: Your data are the lifeblood of your business and you are about to do a transfusion.
Of course any business owner or executive will say that profits (or at least a reliable revenue
stream) are the lifeblood of a business and that is also true. This book is concerned in large part
with demonstrating the relationship between process/data and profit. Improvement in data
management will yield improvement in profits.
The heart/blood metaphor can be extended. Great care must be taken when doing a blood
transfusion in order to avoid killing the recipient by mixing blood of different types. Mixing
data of different types or different meaning (semantics) can be just as deadly to your business.
The death won’t be sudden; it will be gradual as sub-systems bog down and become increasingly
expensive. Often there is a human cost as managers and employees are held responsible for the
cost increases, over which they have absolutely no control. Frustration and turnover increase.
Governance
Governance is the set of processes by which we guarantee consistency in all things related to data. Its components:

• Database Administration: DBA is the first and most critical capability to be developed.

• Relationships: Similar to definitions, the relationships between data sources, data items and even processes must be thoroughly understood and documented in such a way as to make the information available for review and use.

• Council(s): As governance matures, a set of councils or standing committees will emerge to assume responsibility and authority for various governance activities.

• Quality: Quality in terms of meta-information, processes and the actual data values must be addressed everywhere.

• Definitions: We must accumulate definitions, but the heavy lifting is in creating a mechanism for making the definitions available for review and use.

• Architecture: We will define an overall architecture to allow governance mechanisms to be inserted where needed.

• Vendors: Governance activities must be extended into the structures and processes of our vendor partners.

• Life Cycle Architecture: We must formalize our life cycle activities from development to test to production and maintenance. Testing structures can be standardized and made less labor-intensive.

• Stewardship: The process of agreeing on and standardizing terminology, definitions and usage for the purpose of assuring good communication between people and systems. A stewardship initiative is essential to the success of a governance program. The steward would be someone with a close relationship to the vendor or an originator of data values.

• Compliance Auditing: The governance initiative must fit into the overall compliance architecture and will actively facilitate compliance activities.
Governance (frequently called Data Governance or DG) is the function that is all about
consistency. Think of it as the role of government. We want and need consistency and
predictability in our lives and businesses need those properties to an even greater extent than
individuals. In the midst of chaos (complete absence of consistency) planning is not even
possible. We are restricted to reacting to the most recent stimulus. Common functions can’t be
implemented because they rely on planning. The best we can do is to make resources available
and hope they are used. Waste is a given when consistency is absent.
When business leaders speak of governance, it is quite common for them to point to an organization chart as though such a chart embodied governance. It does not. We should always
bear in mind that governance is essential in any community if we don’t want the alternative. The
alternative is anarchy or chaos in which might makes right.
Governance comes in many forms, some of which are indistinguishable from might makes right
(or anarchy). The one most often practiced in corporations is a feudal model in which a
“nobility” holds absolute power and “rules” (governs) by fiat. This is what is meant when org
chart is equated with governance.
This isn’t meant to be a treatise on forms of governance so let’s look at some ways of handling a
subset of governance, that is data governance, within the bounds of the prevailing form of
overall governance. One thing we have to recognize is that consistency is the goal. Everyone
will agree that they can adapt to virtually anything that is predictable. No matter how much we
may want to be adaptable though, adaptation is impossible in an environment of uncertainty.
Attempting to conform to uncertainty leads to unbearable stress and all of its side effects.
People—even our best employees or managers—will adapt to unbearable stress by ignoring or
tuning out that which is perceived as causing the stress. The most conscientious employees will
not be able to do that and will simply shut down or leave. What this means for the person
responsible for data governance is that consistency efforts will be created or imposed by the will
of the “noble” closest to the process. This, in turn, requires that the data management function
hold the biggest of big picture perspectives since it will be necessary to ask of a given manager
or supervisor only things that make sense to them and do them no harm. At the same time, all
such requests must be orchestrated in such a way that overall consistency of data creation, use
and management processes is achieved and maintained.
To all of you Data Governance people who think that DG is about creating and enforcing
standards, I remind you that History is chock full of lessons about trying to apply the rule of law
(that is what you are trying to do) in a feudal system. Even monarchs have been unable to make
it happen (Tuchman, 1978). Rule of Law is only possible when there is faith that my life will be
improved under the law. Where this faith is lacking, there is no rule of law and there are no
standard processes or auditable results.
Information Infrastructure
Infrastructure is a familiar concept. In the technology world it most often refers to the servers
(processing capacity), storage (disk farms) and networking (with or without cables)—in other
words, the hardware, used in our enterprise. From a data management perspective, the hardware
is only part of the infrastructure and it all falls within Automation.
Infrastructure should be thought of as the enabling or foundational components of a function.
Such components must be accessible on an as-needed basis similar to the water supply (or
electricity or sewer) in your home or to the road system in your community. When infrastructure
fails it is often viewed as catastrophic.
Remember that all of this is functioning in the feudal system that is corporate governance today.
We must be constantly marketing (eliciting support) to each manager (and supervisor) with
targeted efforts recognizing the unique accountabilities and constraints that define their world.
Information Infrastructure is an enabling activity that defines, creates and maintains the roads, utilities and sewers needed to make the other DM activities viable. Its components:

• Architecture: Define the roadmap from source to transport to warehousing to repackaging to delivery.

• Plan: Plan the implementation. Planning is continuous.

• Capabilities: Define the capabilities that will be needed to implement and maintain the infrastructure. Monitor continuously for adequacy.

• Automation: Without significant automation, we cannot hope to keep the data shelves stocked. Continuously monitor for new opportunities.

• Marketing: We will need to bring the entire organization on board in order to motivate governance activities. We must also reach out to vendors in order to gain alignment from top to bottom and wall to wall. WIIFM must be addressed for all customer segments.
Planning must be constant as those accountabilities and constraints change. Even if one—or a
handful—remain constant, the overall picture is like the sea or the atmosphere. Change is
constant and is the only reliable or predictable property within our big picture.
Metadata Management
No surprise that Data Management carries its own data management burden. This burden is
known as metadata management. Metadata is most often defined as data about data. How
appropriate that we ourselves must practice what we preach.
Everything that we ask of any other function in the enterprise, we must already have asked of ourselves. We will be the pilot for all processes, standards, and tools. We will validate all training, measurement, and reporting. The components of metadata management:

• Meta Data: Metadata must be acknowledged and managed in accordance with the data it describes. This may mean separate facilities and synchronization processes.

• Definitions: We must accumulate definitions, but the heavy lifting is in creating a mechanism for making the definitions available for review and use.

• Relationships: Similar to definitions, the relationships between data sources, data items and even processes must be thoroughly understood and documented in such a way as to make the information available for review and use.

• Meta Data Warehouse: Warehouse capability must be devoted to metadata. The metadata must be accessible to the consumers and will be used in formulating warehouse dimensions and queries.

• Consistency and Reliability: This is a hook into Governance. Metadata must be consistently defined and reliably available. Process Management (also part of Governance and Compliance) is the critical capability.

• Vendor Management: Vendors will be a significant source of various metadata regarding their (source) systems.

• Tools and Capacity: We must plan for, acquire and manage new tools and additional capacity (mass storage and processing) in order to manage this new resource.

• Plan: We must document a plan for bringing metadata into the mainstream of DM and data warehousing. Planning is continuous as other processes within Data Management change.

• Policy and Procedure: We must create new policy and standard procedures, set up governance mechanisms and provide assistance for originators of metadata.
Because Data Management is so outward-focused, this will be the most difficult of all DM
functions to implement.
Data Quality
Consistency is the essence of quality and is the outcome of good management practices (read
processes).
Quality data must be a product of Data Management; the assurance of quality is a central reason for doing data management. The components of the data quality function:

• Documentation: What do we have? How good is it? How complete? What resources must be tapped to bring it to the required level? Inventory and assessment is continuous.

• Metrics and Monitoring: We can’t rely on manual inspection to discover data quality issues. Once improvement begins, we need to ensure that we are holding the gains.

• Standards: Overlaps with Meta Data and Governance. Consistency is the goal without which there can be no improvement.

• Process Management: Overlaps Governance. Without designed and documented processes, there is no framework for applying standards. Processes are designed around the standards.

• Subject Area Stewardship: Subject areas will be the basis for stewardship and the context within which documentation (metadata) will be created.

• Education, Training, Competency: Employees must become knowledgeable at an appropriate level concerning the impact of poor data quality. Function-specific training should be established together with competency tools and processes.

• Quality Assessment Training: Develop staff skilled at recognizing and analyzing data quality issues.

• Remediation: What will we do with poor quality data? Can it be improved? How will we account for it in our business processes?

• Cost and ROI: What does bad data cost us? What will elimination of quality issues cost? What will it cost to hold the gains? Look beyond the data itself for process and product costs, improved profitability, intangibles such as attitudinal changes...

One of the most important overlooked components is remediation. When we’re reporting some metrics or statistics (let’s hope that those are synonyms), we often spend a fair amount of time
looking for suspect values so that we can exclude them from our report. We often fail to
recognize that those values should be reflected in the report—just under a different heading. For
example, suppose we need a report showing how much time (on average) was spent with patients
by each physician in a clinic and we want to report by month to even out fluctuations caused by
time off, etc. The first thing we realize is that we can’t answer the central question because the
actual time that the physician was in the exam room is not recorded.
We get the OK to use a standard time for a visit based on the coding for the visit (simple,
complex, new patient, physical…). Based on this we then find that we can’t generate the report
for the prior month until after the 15th of the month because that is the deadline for the coders.
We note that this is not a guarantee but merely a goal and that it is likely that a percentage of
visits will be uncoded when the report is run.
How much of this discovered information belongs on the report? How will missing information
be accounted for on the report? How many disclaimers can be included without damaging the
perception of reliable information that we work so hard to create?
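Before going further, here is a minimal sketch (Python; the visit codes and standard minutes are hypothetical stand-ins, not values from the example) of a report that surfaces uncoded visits under their own heading instead of quietly excluding them:

    from collections import defaultdict

    # Hypothetical standard times per visit code, as approved in the example.
    STANDARD_MINUTES = {"simple": 10, "complex": 25, "new_patient": 30, "physical": 45}

    def visit_time_report(visits):
        totals = defaultdict(lambda: {"minutes": 0, "coded": 0, "uncoded": 0})
        for v in visits:
            row = totals[v["physician"]]
            code = v.get("visit_code")
            if code in STANDARD_MINUTES:
                row["minutes"] += STANDARD_MINUTES[code]
                row["coded"] += 1
            else:
                row["uncoded"] += 1   # reported under its own heading, not dropped
        for doc, row in sorted(totals.items()):
            avg = row["minutes"] / row["coded"] if row["coded"] else None
            print(f"{doc}: avg {avg} min over {row['coded']} coded visits; "
                  f"{row['uncoded']} visits not yet coded")

    visit_time_report([
        {"physician": "Dr. A", "visit_code": "simple"},
        {"physician": "Dr. A", "visit_code": None},
        {"physician": "Dr. B", "visit_code": "physical"},
    ])

The uncoded count is precisely the disclaimer material the questions above are asking about.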
Now imagine that, instead of time, we want to track blood pressure by billing code. If you aren’t
involved in health care, you may be asking, “What does a billing code have to do with health?”
Remember what we said in the governance section. Consistency by fiat depends on making it
worthwhile to comply with a particular process. As it turns out, one of the few common
denominators in health care is compensation (sometimes known as money). In order to receive
compensation for a visit, the visit must be coded because that’s how insurance companies
determine what the visit is worth. In order to get the visit coded at the appropriate level, the
physician must supply enough information for the coder to make a determination.
Because of this, coding information gets more attention than most other kinds of information
about the visit, it has been the focus of attention longer, and it is the most reliable. Please note
that we have not yet addressed the quality of the blood pressure data. Now that you understand
why billing codes are so important in the reporting of health data, we continue.
When we try to average blood pressures, we find that the report won’t run. It terminates
unexpectedly with a data type error. As an investigative tool, we run a profile on the systolic and
diastolic fields in our Visit table and learn that there are many values in those fields that could
not possibly be valid. We know that these values should be integers (counting numbers with no fractional parts). Yet we see decimal fractions (measuring rather than counting numbers). How
can this be? Further investigation shows that the fields are typed (stored in the database) as
character strings. Typing is what tells the computer how to handle the values of a field. Certain
operations are possible on text fields but not possible on number fields.
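If it helps to make that profile concrete, here is a minimal sketch (Python, with invented values) of the kind of profiling that exposes the mixture:

    def profile(values):
        buckets = {"integer": 0, "decimal": 0, "other": 0}
        for v in values:
            try:
                int(v)                      # whole numbers only, e.g. "120"
                buckets["integer"] += 1
            except ValueError:
                try:
                    float(v)                # measuring numbers, e.g. "98.6"
                    buckets["decimal"] += 1
                except ValueError:
                    buckets["other"] += 1   # e.g. "120/80" typed into one box
        return buckets

    print(profile(["120", "98.6", "80", "120/80", ""]))
    # -> {'integer': 2, 'decimal': 1, 'other': 2}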
Well, no problem, we can always do an explicit conversion of the text field to a number field
before attempting the calculation of the average. But wait, there is no conversion to numeric
(text to number). We have to convert to either an integer (counting) type or a floating point6
(measuring) type and we have both present. Now the question is how do floating point numbers
get into a box that should contain only integers? To answer this there are two paths to follow.
Programming is far less expensive the less it concerns itself with what people actually do. If the
concern is only what the system should do, the developer only has to talk to non-technical people
about specifications. If we have to worry also about how different people will use the system,
the cost doubles (at a minimum). As it happens, the nurse enters the vital signs, among which is
blood pressure, from notes they have taken and they typically don’t look at the screen when
doing so. A missing or extra <tab> puts the cursor in a box designated for systolic pressure
when they are entering temperature.
This should be fixable though—right? The short answer is, No. Recall again that consistency
only happens when the important person feels a benefit. The physician looking at the screen
frequently didn’t even notice the error. They were looking for a set of values and their mind immediately put what they were seeing into an appropriate template for “Vitals.” When it is suggested that this kind of error causes problems for someone else, the response may well be that they don’t want to mess with a process that gets them the vitals that they want.

6 In mathematics these are called real numbers.
Now what can be done in terms of remediation? We have to spend a lot more on our report
because we need to filter out everything that can’t possibly be a blood pressure value. We can’t
go looking somewhere else for that blood pressure since we aren’t allowed to make assumptions
where a patient’s health information is concerned—that’s a good rule by the way.
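A minimal sketch of that filtering step (Python; the plausible-range bounds are my own assumption, not a clinical standard):

    def average_systolic(raw_values):
        usable, rejected = [], 0
        for v in raw_values:
            try:
                n = int(v)                # systolic should be a whole number
            except (TypeError, ValueError):
                rejected += 1
                continue
            if 50 <= n <= 300:            # assumed plausible systolic range
                usable.append(n)
            else:
                rejected += 1             # e.g. a temperature typed into the BP box
        avg = sum(usable) / len(usable) if usable else None
        return avg, rejected              # the rejected count feeds the disclaimer

    avg, n_bad = average_systolic(["120", "98.6", "135", "8"])
    print(f"average systolic: {avg} ({n_bad} visits had no usable value)")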
So now we have to include a disclaimer that the displayed average is not the true average since n
visits had no usable blood pressure values. When you visit with the nurses you should be
prepared for some pushback, too. They’ll insist that they took and recorded the BP—and they’ll
be correct. The fact that it didn’t get into the appropriate field in a database will be largely
irrelevant to them. And whose problem is it, anyway? The physician’s? The nurse’s? The
programmer’s? The business analyst’s? The data steward’s? Data Quality’s? Data
Governance’s? Data Management’s? Here is an orphan. An issue without a problem or an
owner. The usual course in these situations is, “You saw the problem—it’s yours.” So the
unfortunate report writer has some big decisions to make. What is the probability that the
resulting report will be viewed as a high quality product? Of course we can increase that
probability if we make the report look like there are no problems.
This is what a data quality issue looks like in practice. This example is from real life and is used
because it involves all of the normal suspects. Perhaps you’ll understand better now why your
report requests take so long to turn around. You’ll find that your DQ problems will involve the
same suspects:
• human tendencies
• economics
• loose process
• mis-filed or mis-typed information
To put a bow on this particular case study, dramatic improvement was realized only after
• financial incentives became available from insurers for consistency in gathering vitals during a patient visit
• tools were provided for nursing supervisors and managers to see mis-filed values in real time
• nursing managers and supervisors followed up with the same tools to ensure that the problem was fixed
It is clear that technology could not be held responsible for the problem. It could be said that
when the patient’s chart was on paper there was no problem so, in that sense, technology is
precisely the problem. That path is a dead end given that the healthcare business is going to
depend on databases to meet the needs of patients and no one is going to be able to go back to
paper. Where does the problem lie? It lies in the tendency for all of us to simplify and
streamline wherever possible. And the beneficiary of that simplification and streamlining is
never “we” or “us.” The beneficiary is always “me.” This is the picture of suboptimization.
Two Sides of a Coin
You may have inferred at this point that data management is equal parts data and process. If so,
bravo! The implication is intentional and the inference is, therefore, intended. Data and process
are two sides of the same coin. Every process both consumes and produces data and every bit of
data comes from some process.
You may feel that this is pummeling an already deceased equine but be assured—there is no
possibility of over-emphasizing the importance of these concepts. Bitter experience has shown
over and over again that failure to grasp the essential relationship between data and process is the
cause of virtually all the problems that we see in initiatives such as process improvement, data
quality, data governance and so on.
You can’t depend on your I.T. department (if you have one) to help you out of this or even to
adequately explain it to you. I.T. has yet to realize after all these years that implementing a
development methodology is a data management problem. That is to say that it is a process
management problem.
The next time your CIO or IT Director is pressing for process management or a data quality
initiative or governance, you might casually ask how it has worked for her. If you really want to
turn the screws, ask for evidence of improvement in the form of data. This is not to question
motives but to test commitment. We have seen many attempts at governance or management
improvement fail because this foundation was compromised through trivialization, inattention or
simply self-delusion.
A typical approach might contain these steps or phases:
1. What isn’t working?
2. Why? What is causing sub-optimal performance?
3. What should we do? What changes are needed?
4. Implement the changes.
5. Measure the improvement.
Invariably—that is, 100% of the time—we will stall on phase 4, lose momentum and the
initiative will die.
What is the cause of the stalling, you ask. Well, I’m glad you asked; the answer will vary from
investigator to investigator but the root cause is always a combination of:
• economics
• commitment (or lack of commitment)
• suboptimization
Agile development (Alliance, 2013) is a perfect example. Hold that thought, though, and we’ll
visit it in more detail a bit later.
About Quality
What is Quality?
Quality is a topic that generates animated and occasionally heated discussion whenever it is
raised. Zen and the Art of Motorcycle Maintenance (Pirsig, 1974) has spawned an international forum on
the metaphysics of quality (MOQ) and the discussion there is wide-ranging and includes
representation from all the major trajectories of philosophy.
[Figure: an attempt to bound discussion of quality. Quality sits at the center, surrounded by Cultural, Absolute (universal), Beauty, Elegance, Originality, Essence, Personal, and Comparative (better of, best of).]
In the late 1970’s and 1980’s there was a groundswell of interest in quality worldwide. This
came about because Japanese manufacturing, since 1945 the poster child for “cheap junk,”
suddenly (or so it seemed) began to woo customers away from US manufacturers. Automobiles
and then electronics taught us to view “made in Japan” with new respect. It seemed that
consumers were willing to pay extra for something they couldn’t find in US-made goods—the
perception of quality. Deming became a household name. In 1987 the Malcolm Baldrige
National Quality Award Program was established by an act of Congress. The purpose was to
promote the quality of US companies, in no small part because of the growing reputation for
quality of Japanese manufacturers.
Dr. W. Edwards Deming had been brought to Japan as a census consultant by Gen. Douglas
MacArthur and he introduced the concepts of Statistical Process Control (SPC), first developed
at Bell Labs by Dr. Shewhart, to Japan (Deming, 1982). Since the end of WWII, Japan had been
trying to get its manufacturing sector going again but by the late 1950’s had succeeded only in
flooding world markets with what was mostly recognized as junk. “Made in Japan” signified in
1957 something that was cheaply made of inferior materials and destined within hours for the
trash.
Titles such as Quality is Free (Crosby, 1979) and Quality Without Tears (Crosby, 1984) made
“zero defects,” “do it right the first time (DIRTFT)” and “price of nonconformance” part of the
business lexicon. In Search of Excellence (Peters, 1987) did the same for “business process
overhead” and “management by wandering/walking around.”
Ford adopted “quality circles” under the slogan “Quality is Job One.” After a rocky start
accompanied by critical press and stories of employee dissatisfaction, Ford pushed SPC
(statistical process control) out to its vendors and cut materials costs and rework dramatically. In
a decade they went from Fix Or Repair Daily to a highly respected product identity that was
competing on a quality basis with the Japanese giants, Toyota and Honda.
It is worthy of note that, in the 25 years since the Malcolm Baldrige National Quality Program
was launched, only 90 companies have qualified for the award. This despite the fact that, “Up to
18 awards are given annually across six eligibility categories: manufacturing, service, small
business, education, health care, and nonprofit.” (NIST) It would be interesting to test the
award’s name recognition today. Quality seems to have lost substantial luster in the past 25
years. Today (2013) we like to talk about quality but the will to address the root causes of
defective product is largely lacking.
Try initiating a conversation about quality over lunch. Unless your lunch partners are all from
the Quality Department, you are going to be nonplussed at the response. First, you’ll get a
chorus of negative remarks about Quality being a drag on production or engineering. Then,
when they realize you’re serious about having the conversation, you’ll get a lot of NOTHING.
They haven’t even thought about quality.
If you force the issue, you’ll hear about appearance, costs, machine limitations, raw materials,
customers…but it becomes quite clear that quality is not seen as something over which they have
control. Now what?
The trade press (whatever the industry) is salted with stories of quality debacles although they
are not always identified as such. When gas tanks rupture in collisions or when baggage is lost
or the wrong limb is amputated we recognize a problem. It is unfortunate that the problem is
most often labeled as an error or mistake. Why is this unfortunate? Again, I’m glad you asked.
Quality vs. Error
Some very interesting and revealing books have been published in the past 20 years on the topic
of error. Human Error (Reason, 1990), How We Know What Isn’t So (Gilovich, 1993), and Set
Phasers On Stun and Other True Tales of Design, Technology, and Human Error (Casey, 1998)
were published in the 1990s in response to events like Bhopal, Dutch ferries capsizing, oil
tankers running aground off the coast of France and Alaska, Three Mile Island and Chernobyl,
etc.
More recently we have seen new treatments coming off the presses in reaction to police
shootings, hostage rescue attempts, loss of life in tsunamis, freeway pileups and even fatalities
on Mount Everest. A central theme emerges as we read these books. People have always made,
are making, and will continue to make mistakes—often with fatal consequences for themselves
or others.7
The inevitability of human error, whether of perception, judgment, understanding or preparation,
is what makes implementation of effective quality programs so challenging. This challenge
perhaps reaches its pinnacle in the area of data or information products.
A quality program has the goal of predictable (reliable) consistency in product generation.
Please note that this is a modern definition, having replaced the time honored process of
inspection (e.g., “It isn’t Hanes until I say it’s Hanes”). Inspection was intended to produce one
of three results:
1. a product to which we can attach the company name without reservation
2. scrap, which transforms the cost of production of the item from the asset column to the
liability column
3. rework, which sends the product back for additional investment in cost of production
7 See Appendix A for a discussion of errors: how to recognize them, correct them, and remediate them.
A Quality Function aims to eliminate #2 and #3. This is accomplished by formalizing the
production process and inserting measurement at critical points. Each of these measurements is
known as a KPI (Key Process Indicator). When one of these KPIs begins to trend “out of
control” (Deming, 1982), immediate action is taken to correct the problem. When we do this
right, we avoid all scrap and rework and produce only the kind of quality product we want to put
our name on.
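For readers who want to see the mechanics, here is a minimal sketch (Python, with illustrative numbers) of the Shewhart-style three-sigma control-limit check behind that “out of control” language:

    from statistics import mean, stdev

    def control_limits(baseline):
        m, s = mean(baseline), stdev(baseline)
        return m - 3 * s, m + 3 * s       # lower and upper control limits

    def out_of_control(baseline, new_points):
        lo, hi = control_limits(baseline)
        return [x for x in new_points if x < lo or x > hi]

    baseline = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]   # stable KPI history
    print(out_of_control(baseline, [10.1, 9.9, 12.5]))          # -> [12.5]

A point outside the limits is the signal for the immediate corrective action described above.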
When we say, “Oops!,” the focus is on remembering enough about the situation to be able to
avoid the mistake in the future. We all do this dozens of times a day. The problem, of course, is
that this is primarily an individual process. It also goes by the name Experience. As we know,
some people are better at learning from experience than others. Sometimes we try to preserve
and pass on experience through constructs like checklists. A smart pilot walks around the
aircraft with a pre-flight checklist in hand EVERY time the aircraft is about to leave the ground.
5000 (or 50,000) feet in the air is a really bad place to utter, “Oops!”
When, for example, lost baggage is viewed as a quality issue, then the organization will devote
itself (to the limit of available resources) to identifying the processes involved, making changes
in those processes, and training employees in the execution of the new processes.
On the other hand, if we view lost baggage incidents as employee issues (errors), we put up
posters calling all employees to greater vigilance. We humiliate employees by documenting
performance issues. We post the number of bags handled without a loss or the number of days
without a baggage issue. All of these are intended to motivate employees to avoid mistakes.
Deming’s Red Bead Experiment (CITEC, 2012) demonstrates very clearly the futility of
focusing on employee performance or motivation.
Note the “limit of available resources” loophole. One of the contributions of Peters and Crosby
was to make a financial case for investing in this kind of quality effort. The cost of nonconformance is a concept that opened eyes and checkbooks to get beyond the error paradigm and
make a real difference.
Data and Quality
Data Quality has come to be recognized as a pervasive issue in the world today. The International Association for Information and Data Quality (IAIDQ) has grown since its formation in
2004 to include members from all corners of the globe.
Tom Redman (Redman, 1992) and Larry English (English, 1999) began to treat DQ (data
quality) as something that needed special attention due to its foundational nature. They
recognized that many of the problems that plague organizations were, in fact, attributable to DQ
issues.
Without detracting from this focus, which was a new perspective (not so long ago known as a
paradigm shift) it should be understood that the relationship between information and success or
at least efficiency in business endeavors has always been a given. What was new was the idea
that we could (and should) improve our control of data quality in order to positively affect the
bottom line of business. In other words DQ became an investment for which a return could be
expected. Prior to this point, DQ was an individual rather than a corporate concern.
As a programmer in the early 80’s, I took pride in the fact that the output of my program was
uniformly usable. I would ensure that my program could recognize various input anomalies
(quality problems) and deal with them in ways that not only produced the specified output but
did so in a consistent and predictable manner. Of course I was usually surprised by new
anomalies that I had not considered and eventually began to treat the unexpected input as a
given. My goal became avoidance of a state known as ABEND (abort-end or abnormal end)
with an accompanying SYSDUMP which was equivalent to the computer vomiting up
everything in its storage (in hexadecimal representations of data bytes).
In the early days of “data processing”, a business didn’t have high expectations of software. The
purpose of software was simply to bridge the gap between a business need and the power of the
computing machine to do simple, repetitive tasks very quickly, consistently, and tirelessly. We
bought computer processing because it replaced the staff needed to deal with ever increasing
quantities of people and things. The mega corporations of today are made possible because of
these basic capabilities of computing. In turn, these capabilities were made widely available
because of software.
Of course, no business was ever satisfied with its current level of profit. In order to increase
profit, two paths are available. Obviously we can sell more product (assuming that each sale
produces profit8). The second path is cost reduction. Cost reduction itself can be accomplished
along several paths but one that is frequently used is staff reduction. If we can accomplish our
production operations and sales with fewer people, we can potentially save a substantial amount
of money and, with everything else remaining the same, that savings turns directly into profit.
It didn’t take long for the simple, straightforward, repetitive tasks to be handed over permanently
to the computer. The search for new costs to cut led to consideration of slightly less simple but
no less repetitive tasks and the people who performed them. Several technology generations
later, we are targeting pretty complex (though still heavily repetitive) tasks for automation.
Here’s where DQ comes into play.
Complex tasks require decision-making capability and decision-making demands better
information. The model is: Simple tasks need simple input and support simple decisions.
Complex tasks mean complex input for supporting complex decisions.
A key point here is that if I, as a programmer, can’t guess what I will receive next as input to a
well-defined and constrained process how much less likely will it be that I, as CIO (or CEO,
COO…) can predict what I will receive as input. Top management receive big salaries for being
able to produce lemonade from whatever lemons they are handed. Unfortunately a corporation
requires many people in many functions on many levels to finally generate that lemonade.
Getting the lemonade requires that all of those people, functions and levels and their processes
be aligned. If your business requires that sometimes you will have to generate something other
than lemonade, alignment becomes even more critical.
Problems with data quality are not caused by technology and they won’t be cured by technology
although, as with every other problem domain that is big and complex, technology can play a
role due to its capacity for executing millions of process steps in the blink of an eye (a non-technical term that can be translated as milli-, micro-, or even nano-seconds).

8 There is a joke about a business that set out to increase market share by means of price reductions. The rationale was that, “We’ll lose a little on each sale but we’ll make it up in volume.” This is a long-term joke but a short-term “strategy.”
Having consumed the ideas presented in this book, you will recognize that you are not in control,
have never been in control and will never be in control. You will realize that control is not the
goal and is not necessary. Further, you will understand that you have in your hands right now all
the quality you can use and that when you can use better quality it will be there for you.
You will have absorbed many insights that will aid you as you seek to make your way in the
overlapping worlds of business and technology. You will be a more relaxed, peaceful and
productive person.
A Goal
The original impetus for this book came from the field of data and information quality. There
was a notion that principles of science and mathematics could be applied to data in the following
way:
We desire a definition of data/information quality that will facilitate the creation of automated
algorithms for the purpose of assigning a scalar value to the quality of an arbitrary data item or
data set. Assignment of a scalar “quality value” would allow anyone to choose between
equivalent data sources based on quality.
In simpler terms, we would like to be able to measure the quality of data so that we can compare
one with another.
Such a definition would also allow those who control such data sources to better market and to
receive appropriate compensation for the use of those sources having a greater quality value.
The initial purpose of this investigation was to determine whether such a definition is possible.
Many of us felt the need to establish inflexible boundaries around data quality for several
reasons.
• To increase the level of productivity in the field which, like many technology-related disciplines, was in the process of splintering into ever smaller sub-fields
• To be able to apply technology to the process of creating and preserving quality
• To provide commonly understood paths for research and scholarly efforts to create deeper foundations for business applications
The Core of the Problem
Mankind has always struggled to grasp the complexity of the world around us. No matter how much we learn, with that learning comes the understanding that there is just as much that remains to be learned. We have developed rules of thumb, heuristics and algorithms designed to
provide a manageable window into the complexity in which we live and work. We are (or
should be) always alert to the possibility that a rule of thumb can’t be applied “in this case.” It is
usually an unhappy event when we use one of these rules only to realize that we missed
something. Some rules are more universal than others. "Measure twice, cut once" can save cutting, backtracking and waste in nearly every circumstance—except when there is no time or opportunity to measure at all, let alone twice.
All of these shortcut methods leave something to be desired. Oh, we have achieved astounding
levels of success in creating things that work and work well. They are useful, reliable, extensible
and expensive—very, very expensive.
When it comes to business systems, expense becomes a hard limit. Businesses do not have the
same ability to allocate and spend that governments do (although some business consortia rival
all but the largest national economies in their ability to envision and implement complex systems
on a large scale).
The way these consortia work provides object lessons into the conquest of complexity. The first
principle is one recognized by Archimedes a few thousand years ago. “Give me a place to stand
and with a lever I will move the whole world.” What man can envision, he can do (given the
necessary resources). Someone experiences a vision, then recruits the resources needed to make
it a reality. The resources may be expertise or money. Often an ownership interest is part of the
inducement.
Sometimes these efforts fail, though we don’t hear as much about those. In these cases one or
another key resource is neglected or missed entirely. Sometimes enormous effort and sums of
money may be expended before it becomes apparent that the resources or the will available are
insufficient to get to the finish line.
The Lesson(s)
The lesson to be learned is summed up in another aphorism that is found in every language:
If I had only known…
This will be the lament of one who is experiencing a result that was not foreseen. We all believe,
and for good reason, that the more information we have and the more reliable it is, the better our
decisions will be. Better decisions correlate strongly with hypothetical, or predicted, results.
When we get the result we hoped for, we claim to have made a good decision.
It is not as apparent but even more important to understand that results are but one way to judge
the quality of a decision. The legal system in the U.S. makes use of the standard of “the
reasonable person” to assist in evaluating the quality of a decision. A decision is “good” or
defensible if a hypothetical reasonable person in possession of the same information and in the
same circumstance would decide in the same way.
You may be thinking, and it's obviously true, that we don't know your Board of Directors or your boss (or your spouse). "Aye, there's the rub" was written for Hamlet by Shakespeare in acknowledgement of the effect of the unknown. If your job or career depends on the result of the decisions you make, you will feel acute pressure as you contemplate a decision.
We agree that mistakes should be celebrated as a tool of learning while simultaneously being
ever eager to publicly flog (or worse) those whose decision has negatively impacted our life. In
fact, we don’t even need a demonstrable negative impact—the perception, or even the belief is
sufficient.
All of this—fear of the impact of the unknown, fear of being held accountable for a decision with
resulting loss of credibility or status or power, and fear of the trustworthiness of the information
we have available—is what leads to today’s concern for DQ (data quality).
You Are Here: Dimensions of Data Quality
Bear with me for just a bit as we get further into mathematical concepts than will be comfortable for most. The goal is simply to paint a picture to show the cause of our difficulties in bringing data quality under control.
Things that can be measured are typically measured along accepted dimensions and these
dimensions are standardized so that, for example, if I say that a box measured 3x4x5 anyone can
imagine not only the basic shape (length, width, height) of the box but its relative size. Of
course we need to specify units for our measurements to form the complete picture.
An initial approach then would consist in identifying dimensions of data quality that might be
measured. Both the dimensions and the units of measure would have to be consistent across all
types of data or information so that comparisons could be made.
As part of our investigation of data quality and our attempt to define exactly what it is, we should
start by considering how to pin it down within the context of our universe. You may have heard that some of the most recent theories of the nature of the universe conclude that there are not four dimensions but ten or more. When we try to pin down data quality, it is easy to understand the need for more than four dimensions.
What at first seems to be a trivial task (I lost money or opportunity because of this data) becomes
complex very quickly when we try to stop this kind of loss from happening in the future. As you
know from your studies of arithmetic in elementary school, we are able to quantify things by assigning their quantity to a number line: a simple scale consisting of a sequence of integer (or scalar) values originating at zero and continuing for an interval equivalent to the quantity or count (imagine a ruler with marks only at inch intervals). Let's imagine that we can precisely quantify data quality—that is our aim, after all.
How many scalars—but let’s make them vectors9 because we know that they are different and
don’t all run along sequences of integers—will it take? What are the ordered sets that define
them? In other words, can we define the dimensions of data quality?
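To make the difficulty concrete, here is a minimal sketch in Python; the dimension names (completeness, consistency, timeliness) are assumptions for illustration, not a settled list. Even three well-behaved axes refuse to collapse into a single scalar: two data sets can each be better on a different axis, leaving no defensible ranking.

    # A minimal sketch of a quality "vector" (hypothetical dimension names).
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class QualityVector:
        completeness: float  # fraction of required values present, 0.0-1.0
        consistency: float   # fraction of records satisfying the set's definition
        timeliness: float    # freshness relative to the decision at hand

        def dominates(self, other: "QualityVector") -> bool:
            """True only when this vector is at least as good on every axis.
            When the axes disagree, no single scalar ranking is defensible."""
            pairs = zip(
                (self.completeness, self.consistency, self.timeliness),
                (other.completeness, other.consistency, other.timeliness),
            )
            return all(a >= b for a, b in pairs)

    a = QualityVector(completeness=0.98, consistency=0.80, timeliness=0.50)
    b = QualityVector(completeness=0.90, consistency=0.95, timeliness=0.60)
    print(a.dominates(b), b.dominates(a))  # False False: incomparable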
Discrete pieces of data cannot be said to possess quality
A piece of data is like a speck of dust. All that can be said about it is, “There it is.” It is not until
these specks begin to aggregate that properties (including credibility or quality) begin to emerge.
I grant that those miracle workers of qualitative analysis on CSI (whether Las Vegas, Miami or
New York City) solve television crimes every week by deducing context from virtual specks of
dust. Realize that those analytical results are possible only by aggregating all of the discrete specks of dust into a context that makes sense and points to a culprit. Nevertheless, if we come
across the speck of data, “red,” while we can bound it by discovering its definition and potential
applications, we can’t possibly assign to it anything remotely resembling a quality value.
9 A vector includes direction as well as quantity or measurement. Vectors are very useful because we can add them together to get a result that is also a vector. A vector sum (or product—they can also be multiplied) can tell us a lot about what is actually happening. Imagine a vector representing the force that moves your car. Now imagine another vector which represents the forces opposing that movement (friction, braking force, wind resistance…). Since these two vectors point in opposite directions, their sum is the difference of their magnitudes. Of course we first need to express both forces in consistent units. When the opposing force is greater than the force causing the motion, we slow to a stop and it is possible to determine how long the stopping process will take.
Even when we can associate it without any ambiguity to another piece of data, “wagon,” it is still
extremely risky to assert that [red, wagon] has any particular meaning let alone quality of
meaning. Another way of looking at this is that a word has meaning(s) and those meanings can
be altered by associating other words. A word alone is completely dependent on language and
culture.
[Chinese ideogram]
This ideogram may be readily understood by a person versed in the Chinese language and culture
but it carries virtually no meaning for anyone who doesn’t read Chinese.
As we continue to aggregate new pieces of data to what we already have, we build a framework
or context within which it becomes possible to make judgments with respect to quality. We may,
for example, be able to say whether this context (aggregation of data pieces) has physical
properties or is a virtual construct. We might suggest uses or applications of this context and we
could begin to develop expectations around those uses.
Contextual Consistency (the Collection)
While a discrete piece of data (or information) cannot be said to have (or lack) quality, a set of
data, like any collection, derives quality from its consistency. Consistency in this case means
that each member of the set/collection possesses all of the properties that define the set.
When we set out to collect something—to create a collection—we begin by stating what it is that
we are going to collect. As we come across new things, we apply our definition to determine
whether the new thing belongs in our collection or not. This definition of context becomes the
basis for all future discussion of quality with respect to the collection.
The more rigorously defined the context, the more valuable the collection. If I advertise a
collection of first-day covers from South America, I may be able to obtain a monetary value for
it. If the collection is advertised as a complete collection of 20th century South American first-day covers, the monetary value should be greater. Now, if it is found that my collection is
inconsistent with its definition because, for example, it includes issues from Mexico, Guatemala
and Sierra Leone, then we may well expect its value to decrease because the very premise of the
collection is in doubt.
The logical inference to be drawn from this is that we should examine the defining properties for
the collection rather than the individual members of the collection to determine whether they
satisfy our need for quality. To put it another way, it makes sense to talk about the quality of
members only in the context of the defining principles.
One dimension that we must consider then is the nature of the collection. What are the
parameters for the collection? Quantity of members? No problem: a simple scalar. Of course it may be claimed that if quantity is our only property, then we don't have a collection at all but a heap (think junk drawer). At any rate, once we get beyond counting things we do run into problems.
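Here is a sketch of the idea in Python; the definition (20th-century South American first-day covers) and the sample records are invented for illustration. The collection's definition becomes an explicit membership test, and consistency is simply the share of members that pass it.

    # Sketch: a collection derives quality from consistency with its definition.
    SOUTH_AMERICA = {"Brazil", "Argentina", "Chile", "Peru", "Colombia"}

    def fits_definition(cover: dict) -> bool:
        """The stated definition of the collection, as an explicit test."""
        return cover["country"] in SOUTH_AMERICA and 1900 <= cover["year"] <= 1999

    collection = [
        {"country": "Brazil", "year": 1958},
        {"country": "Argentina", "year": 1930},
        {"country": "Mexico", "year": 1955},        # violates the definition
        {"country": "Sierra Leone", "year": 1961},  # violates the definition
    ]

    consistency = sum(fits_definition(c) for c in collection) / len(collection)
    print(f"consistency: {consistency:.0%}")  # 50% -> the premise is in doubt

Note that the test, not the individual covers, is where the quality conversation starts: change the definition and the very same members yield a different consistency.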
Fitness for Purpose
There is a lot of interest in making data quality about fitness for purpose. A particularly good
expression of this is
Data may be said to have quality when it minimizes uncertainty in the decisions to which it is
applied.
If the decision is stated as, “How many books in the library were written by F. Scott Fitzgerald?”
and our data consists of a list of all the books in the library and their authors, then we can
confidently answer the question (make the decision). If, however, we have books for which the
author information is unknown or missing or if the count of records in our collection is not the
same as the count of books in our library then our decision is more complicated and contains
uncertainty. We might describe (a description or definition is a good thing, right?) our data collection as "a list of volumes in [my library] including title, author, publisher, publication-date".
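A sketch of that decision in Python (the volume records are invented): when author values are missing, the best answer the data supports is an interval, and the width of the interval is a direct measure of the uncertainty the question inherits.

    # Sketch: uncertainty enters the decision exactly where the data is incomplete.
    library = [
        {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald"},
        {"title": "Tender Is the Night", "author": "F. Scott Fitzgerald"},
        {"title": "My Antonia", "author": "Willa Cather"},
        {"title": "Anonymous Pamphlet", "author": None},  # author unknown
    ]

    known = sum(1 for v in library if v["author"] == "F. Scott Fitzgerald")
    unknown = sum(1 for v in library if v["author"] is None)

    # The honest answer is an interval, not a number.
    print(f"Fitzgerald volumes: at least {known}, at most {known + unknown}")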
Also part of the quality assessment process is the action of change and entropy. In the same way
that the physical landscape may change as the result of either erosion or development, our
confidence in our collection may change due to outside influences. For example, if the question
handed to us is, “How many pieces in the library were written by F. Scott Fitzgerald?” and we
deliver the same answer as before, the very next question will be, “Does this include articles and
anthology pieces?” At this point we have two choices and both Scylla and Charybdis have sharp
teeth and a long reach.
Choice One is to stand on our collection as is, retaining the definition as already stated.
Choice Two is to change our collection to include anything in print that we happen to own.
Notice in the latter case that the record structure may remain unchanged but the nature of the
collection has changed. As we include articles, letters, short stories, etc., our new collection is
no longer consistent with the old description.
The implications of Choice One are:
• We risk becoming irrelevant. Others will no longer bring their questions to us.
• Our collection will certainly be perceived as having diminished quality. Note that the only thing that changed was the question. This is the "fit for purpose" rubric in a nutshell.
The effects of Choice Two are:
• We have to change the description of our collection.
• We have to re-construct all the processes around the collection.
• We will be immediately forced into another Choice One or Choice Two.
If we subscribe to fitness for purpose, we invite everyone to judge the quality of our collection.
Definition and Purpose
We should pause here to note that the definition of our collection and the purpose to which it is
applied must be in harmony. If we have clearly defined the parameters of our collection and if
the purpose takes into account that definition, then we can expect success from our purpose. If
we attempt to use the collection for a purpose that contradicts or extends the definition, we may
well anticipate failure or loss and we assume the associated risk.
If this is so, then definition and purpose must be parts of Quality, possibly even dimensions. This
can get complex in a big hurry. Think of a system of user-defined data types (or don't if you're getting a headache).

[Figure: Semantics of two discrete collections]

Each type must be represented by an axis. A vector might be defined by two ordered sets of values, where each value represents a point on one of the axes. It is not possible to depict on paper, in two dimensions, something that exists in n dimensions. These collections are certainly not representable as single vectors, though they might bend to the idea of a heap of vectors related by context (illustrated by the simple drawing above).
We will not be able to define either Definition or Purpose such that they can be quantified. Can
you see a way to make the values for these critical “dimensions” a series? Do we have the same
problem with other dimensions? What we could do is break Description into constituent parts
(e.g., name, definition, cardinality…) and, in fact, many software tools exist that do this. None
of those tools insist on a value for these parts, however10. They leave it up to the user whether or not a value will be supplied, trusting, one supposes, in the desire for quality that lives within each of us. What happens to our quality assessment when some of the information needed is missing?

10 That is, supplying values for these attributes is not mandatory in the tools' data schema.
People as a Dimension
As long as we’ve started down the path of dimensions, don’t we need to account for people in
some way? Any purpose that we might define will be dependent on one or more persons. If
we’re very lucky, we may find that many people share a single purpose. Realistically, though,
when is the last time, especially in a data quality context, that you encountered a situation in
which everyone involved agreed on the purpose of a data set?
There is an almost irresistible need to use data for more than its intended purpose. (Wand, 1996)
Very often, the architects and designers are called to a meeting and asked to add some additional
functionality to the [information] system. Frequently they are simply informed that this will be done. It's a rare situation when the architect says "no." Sometimes the architects might dig their
heels in a bit and insist on additional time to examine the implications. If this is the message you
are getting, you should listen. If it isn’t what you are hearing, you might want to begin planning
for delays anyway.
We have been disparaging the single-purpose data set (data silo) for more than two decades now
and it’s easy to become confused over the issues involved. There is a difference between
• data as a sharable (enterprise) resource
• functional compatibility of a data asset.
The first is about access and availability. The second is about interoperability.
Access and Interoperability
Interoperability has become a hot topic, especially in healthcare. In usage it is difficult to
separate it from access. Vendors of systems often claim interoperability of components within
their product list and sometimes even claim interoperability with products from other vendors.
When we dig a little we find that they have a database that is shared or, more likely, sharable.
Getting a look at the database schema prior to purchase will take a lot of commitment on your
part. The vendor will claim that the schema represents their competitive advantage and, as such,
must remain secret.
Do not be deterred. You will experience attempts at misdirection. For example you may get an
early model of the schema that has no relationship to the actual product. You will almost
certainly be asked why you need it. You might even be told that they have no such thing and, if
so, that should be a signal to back away. The schema is the heart of the product offering to be
sure, and it is the only evidence you are likely to find concerning true interoperability and fitness
for intended use. Insist until you prevail and then find someone both knowledgeable and
credible who will study it and then sit down with you. You want a briefing on what the schema
will support and what it will not support in terms of interoperability.
Being able to get at the data does not guarantee interoperability but it does make it theoretically
possible. When the schema offers fitness for use with respect to all of the components that will exchange information from the database, then and only then will you have interoperability.
The issue of interoperability versus availability is an issue also within the data management
function of your business.
We’ve all heard of silos of information and we know that they aren’t a good thing. In a sequence
of defensive moves, the information industry has created new, more sharable silos. Master Data
is one and Metadata is another. The content of these new silos is intended from the beginning to
be useful to any function within the enterprise so long as that function accepts the definition of
the collection. The purpose of Master Data is to isolate precisely that data that is used across the
enterprise. It is made available (read only) to any application with the understanding that any
updates or additions will only be done according to rigorously monitored processes within the
silo.
Metadata has not yet achieved the status accorded to Master Data and the reason is precisely the
same as the reason for this book—confusion. From the standpoint of Master Data, everything
else is Metadata. One kind of Metadata describes the content of a dataset and should have the
information needed to determine fitness. Another kind describes the data container and is useful
for defining automated quality inspection or implementing a Quality process that guarantees a
data product that meets specifications.
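As a sketch of that distinction (dataset and field names are invented): content metadata answers "is this data fit for my purpose?", while container metadata is what an automated quality inspection can be driven from.

    # Sketch: the two kinds of metadata distinguished above.
    content_metadata = {       # describes the content; supports fitness decisions
        "dataset": "patients",
        "subject": "registered patients of the clinic",
        "coverage": "2005-present",
        "known_gaps": ["middle names sparsely populated"],
    }

    container_metadata = {     # describes the container; supports automated checks
        "dataset": "patients",
        "columns": {
            "patientID": "int, not null, unique",
            "birthDate": "date, not null",
        },
    }

    print(container_metadata["columns"]["patientID"])  # drives a not-null/unique check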
Consider that Master Data is rarely accompanied by Metadata. Neither the container type nor the
content type is available as data because those purposes have been built into the processes
insulating the Master Data silo. This is further evidence of the slippery nature of process and
data. Given all of this ambiguity, it’s difficult to conceive of any way to treat people in this
context as scalar values. In other words people represent the unknown in any DQ effort.
Comparison
Once we have identified the critical dimensions of data quality, we must develop at least one
way of assigning a quality value using those dimensions. We require a value that can be
compared in such a way that one data set may be credibly said to be of greater or lesser quality
than another.
Ideally, we would be able to rank several data sets. We might do this by comparison to a “gold
standard” of quality and assessing the subject data set’s variance from the standard.
We see an opportunity to use money to guide our thinking. After all, people have been able to
use money for millennia and its evolution has been studied thoroughly. Of course we have
recently seen the money equivalent of a data quality disaster when a key type of money
(mortgages and their derivatives) suffered a loss of credibility that caused its quality and
therefore its value to fall right through the basement floor.
With this in mind, it would be useful to be able to monitor the quality value of a data set over
time. We need to know whether our quality value is remaining constant, increasing or decreasing. In order to do this, it would only be necessary to monitor those dimensions that change with time.
Time may not be the only variable or dimension useful for tracking changes in quality. Any of
the dimensions might equally well be selected for monitoring. Our dimensions must be well
defined so as to enable comparison of individual dimensions.
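A sketch of one such comparison (dimension names, data-set names, and scores are all invented): if every dimension could be scored on the same 0-to-1 scale, variance from the gold standard collapses into a distance, and data sets can be ranked by it. The sketch also exposes the assumption doing all the work, namely that the dimensions are commensurable and equally weighted.

    # Sketch: ranking data sets by distance from a "gold standard" profile.
    import math

    gold = {"completeness": 1.0, "consistency": 1.0, "timeliness": 1.0}

    candidates = {
        "vendor_feed":    {"completeness": 0.97, "consistency": 0.90, "timeliness": 0.95},
        "legacy_extract": {"completeness": 0.99, "consistency": 0.70, "timeliness": 0.40},
    }

    def distance(profile: dict) -> float:
        """Euclidean distance from the gold standard; smaller is better."""
        return math.sqrt(sum((gold[d] - profile[d]) ** 2 for d in gold))

    for name, profile in sorted(candidates.items(), key=lambda kv: distance(kv[1])):
        print(f"{name}: {distance(profile):.3f} from gold standard")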
Complexity and Utility
In order for any quality assessment method to be useful (to have utility), it must be applicable by someone with a moderate amount of education and training, or it must allow the creation of lookup or reference tables. For example, risk and mortality are highly complex fields of study requiring postgraduate degrees, or at least a postgraduate level of knowledge. They are complex, and at the same time so important in assigning credibility to yet another form of money (insurance), that actuaries have converted them to tabular form. Someone with minimal knowledge or education can simply select the appropriate table(s) and gain an understanding of the level of risk in a given situation. If the exact situation can't be found in a table, the table describing the most similar situation is used.
We would expect something similar to happen with data quality. Assessment of the quality of a
data set will be a highly complex undertaking attempted only by organizations with deep
pockets. Within that organization, the results may well be documented in tables allowing any
manager to make an informed choice concerning the quality tradeoffs that he or she must accept.
We may need the equivalent of actuaries to analyze the risk inherent in data and to create the
tables that enable informed decision making. It should be clearly understood that the tables
themselves would be data. Such tables should have quality/credibility values near the gold
standard.
It is clear that the understanding of data/information quality is in its infancy. We have become
accustomed to operating on the premise that “I know what I want and I want it now.” Virtually
everything “known” about data quality today is at the level of common knowledge, also called
garage logic, common sense, home remedy… Objectivity is absent from the discussion but is
lurking at the edges and looking for an entry.
We would like to employ an objective approach that will result in the establishment of well-
defined research paths that might be separately managed and yet coordinated to produce the kind
of formalized understanding of data quality that is so desperately needed.
We have seen that real gains can be obtained through an effort to codify Description and
Purpose so that we can objectively determine
1. The completeness and utility of a description
2. The completeness, clarity and utility of a purpose
3. The degree to which a stated purpose agrees with the description
4. The degree to which a decision fits a purpose and therefore might reasonably find support
in an information collection
To date all that we have accomplished on the description side is the assumption of a description
by providing a place to store one. We haven’t addressed the purpose side at all. Anyone who
has read or tried to use the descriptions provided with any data collection today will realize that
what has been supplied can most charitably be called a placeholder entry.
When the concept patientID is described as “a value that uniquely identifies a Patient” everyone
has wasted their time. How is patientID used in the collection? How is it created? Can it be
modified? What happens to the patientID when records are merged? Is any meaning encoded in
the patientID? These questions and others should be answered in the description of any ID field.
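Here is a sketch of a description that actually answers those questions, captured as structured metadata rather than prose; every field name and answer below is hypothetical.

    # Sketch: what a non-placeholder description of an ID field might capture.
    patient_id_description = {
        "name": "patientID",
        "definition": "Surrogate key uniquely identifying a Patient record",
        "how_created": "Assigned by the registration system at first encounter",
        "modifiable": False,
        "merge_behavior": "Surviving record keeps its patientID; the retired "
                          "value is preserved as an alias for lookups",
        "encoded_meaning": None,  # legitimately None: no meaning is embedded
        "used_in": ["encounters", "orders", "billing"],
    }

    # A description is usable only if nothing essential is left blank.
    missing = [k for k, v in patient_id_description.items()
               if v in ("", None) and k != "encoded_meaning"]
    print("unanswered description parts:", missing or "none")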
The true origins of data quality issues lie in the humans involved, without whom there would be
no need for quality or even for data. As we will see, the issues are easily recognized and
described but difficult to correct. Though it has been said that we live in the Age of Information,
we have not yet learned all the implications of that.
As humanity transitioned from the age of crafts to the industrial age, there was a great deal of
turmoil lasting for decades if not centuries as we adapted to new ways of thinking, living and
working. It may even be argued that we still have not entirely made our peace with
industrialization.
Little wonder then that we are experiencing discomfort as we try to find a new rhythm in the
Information Age. There are many people alive today who will die before they ever get
comfortable with the technology of the information age. Our problem is that too many of those
people are in position to affect or to judge the quality of the data within their environment. We
may well have spoken the data quality problem into existence and perhaps we can speak it out of
existence.
Data Quality Channels
We can divide the problem space into three domains or channels called detection, mitigation and
remediation. Detection involves everything we do or use to recognize that a quality issue exists.
This can take the form of profiling—generally producing a statistical analysis of an entire data
set. The analysis can be of a pass/fail nature, counts of all the values in a field/column, or some
more complex analysis based on combinations of fields/columns.
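As a sketch of what such a profiling pass might produce (column names, rows, and the range rule are invented): per-column null counts and value frequencies, plus a pass/fail tally against one simple rule.

    # Sketch of the simplest kind of profiling.
    from collections import Counter

    rows = [
        {"state": "MN", "age": 34},
        {"state": "WI", "age": None},
        {"state": "mn", "age": 221},  # suspicious values
    ]

    def profile(rows: list, column: str) -> dict:
        values = [r[column] for r in rows]
        return {
            "nulls": sum(v is None for v in values),
            "counts": Counter(v for v in values if v is not None),
        }

    print(profile(rows, "state"))  # exposes the 'MN' vs 'mn' inconsistency

    # A pass/fail check against a single rule:
    failures = sum(1 for r in rows
                   if r["age"] is not None and not 0 <= r["age"] <= 120)
    print(f"age range rule failures: {failures}")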
Mitigation includes the processes and tools used to correct or repair quality problems. Care must be taken in mitigation activities to avoid making the situation worse. To ensure that this doesn't happen, we need a solid understanding of the meaning(s) and use(s) of the data. In some cases, we have no opportunity to change data, even though we might be improving its consistency and/or overall quality. This is particularly true in medical, legal and scientific applications.
The final domain, remediation, means what we do to prevent further quality problems. Remediation might include improvement in processes, standard procedures, policies, methods, education or a host of other activities that affect data and its quality.
None of these channels is effective in isolation. These activities must be executed in concert in order to achieve the desired result: a consistently reliable data and information environment spanning the enterprise.
A Word About Methodologies
We humans are nothing if not resourceful. Some subset of any community (no matter how large
or small) is on a permanent quest to make things better—to improve. They have already
improved themselves to the limit allowed within the community and now they need to improve
the community itself in order to continue to grow.
These people are absolutely necessary though usually thought of in negative terms by the rest of
the community until they finally attract a critical mass11 of support.

11 A critical mass is the amount of fissionable material needed to sustain a chain reaction (in a reactor or in a nuclear weapon). In common usage it is that number of adherents to an idea that enables the idea to become self-sustaining.

Software or system development methodologies are created by such people. They first of all recognize the need for consistency as a foundation for automated support. They recognize that system development without such support can become hopelessly mired in complexity and data overload. Naturally, the involvement of multiple people implies multiple ideas about the "right" way to do things.
First of all, they had goals in common. The goals included:
• Consistency
• Project management
• Quality of result
• Documentation
They really only differed because of the kinds of suboptimization resulting from market
competition. Each chose to focus on some aspect that they could claim to be handling better than
anyone else. Years of competition resulted in too many choices for the development manager or
programmer and the almost inevitable choice to rely on the talent they had already invested in.
Sufficient critical mass was created around each of the main players to make them well-known
and financially successful (though they may quibble about this particular characterization).
The problem was that they had priced themselves out of the market. They were accepted in larger organizations, but most of the development was happening in smaller organizations that often couldn't afford the products that made the methodologies feasible and productive.
Eventually, driven by that need to improve the community, they got together and began to
discuss how they could combine the critical masses to positively affect the community as a
whole. The product of this collaboration, the Agile Alliance, set out to promote a development
methodology that could be scaled to the organization using commonly available tools.
In one respect they have enjoyed success—it is doubtful that a developer or development manager exists in the world today who has not heard of Agile and become familiar with its unique dialect of scrum, story, and sprints. In some other critical aspects they have been less successful.
It's an exciting concept for developers and programmers. Just look at the poster with all of the implied motion. Remember, humans (and all other animals) are attracted to movement. A close look reveals nothing that we can relate to data management. Oh, sure, we find the term
architecture within the Strategic loop but it’s quite a leap from this context of activity, [iterative]
movement and instant gratification to the concepts we have been discussing here.
Agile is a compromise—a negotiated solution—in which those who command the most attention
have come away with all of the concessions. I’m not saying Agile isn’t a valid choice for
development. I am noticing that consistency appears nowhere on the poster, nor do standards or
governance. From a data management perspective, it’s difficult to see any points where I could
interface my processes with those of Agile.
In an ideal world, the CIO would be sitting on this and making sure that the interfacing did
happen, was working and producing consistency. In the many worlds we live in, we can't even rely on the existence of a CIO role and certainly not on the willingness or ability of that role to make these things happen.
Agile seems to suffer from the same problems concerning economics, hidden agendas and personalities as any other business endeavor. Understanding these problems is the reason for the
Old Testament. One individual is collecting reports of Agile issues with the intent of finding
mitigation approaches. You can see these reports at
http://www.mountaingoatsoftware.com/blog/please-help-me-list-the-problems-with-using-agile-or-scrum.
[Figure: Agile poster]
The following examples are included here in case this web page should disappear. Syntax,
grammar and spelling are unedited.
• No Documentation at all please this is agile and it mean no documentation at all
• No vision from Product owner "You have this hurdle let me get back" and we end up spilling that task to next sprint and eventually the scapegoat
• Kick the testers and analysts off the teams. They don't know the technologies we're using and if they aren't writing code they just slow us down. Testing can be done after we've finished the real work
• Kick the developers out of the Sprint Reviews. These meetings are for program managers and customers and only serve to distract developers from their real work
• Sprints, no one ever Sprinted a world record in a Marathon
• Do we have to estimate user stories when we have a fixed budget to meet the customer requirements?
• Estimation paralyzes team member(s) with fear of being wrong (especially when there seems to be no penalty for estimation "errors")
• It is a metric, and if we get it wrong we will be punished. (e.g in bonus/ appraisal). Paradoxically, we are told weekly that story points are only a measurement of complexity, our productivity is defined by the number of story points. Agile may say this is not right, but the business pays our pay cheque. And guess what: We do not have time for lunch any more - Story points drop, then your salary may go to someone else
These are but a few suggestions. From the perspective of 30 years in the industry, my
impression is that these are the same things developers have always complained about except
that the dialect used to describe them has changed. A big problem in getting useful feedback is
that the managers seldom if ever respond and probably never even know that these forums exist.
We could theorize at length about the reasons, but the bottom line is that a key point of view is
missing.
Use Agile if you so desire, just make sure that you are integrating development and data
management processes. The purpose of this book is to provide you with the tools to be able to at
least direct this if not actually do it.
Agile is exactly like other development methodologies in this respect: the process is far more
exacting than its users are prepared to be. This is the least appreciated dimension of all things having to do with technology: active management and governance are required.
People and technology have their own needs and tendencies and, without constant vigilance, will
move in those directions. If you are responsible for management or governance and your goals
are not fully aligned with those of the methodology and the people using the methodology, then
your own goals are likely to eventually supersede those goals to the detriment of all concerned.
You now have the foundation for an appreciation of what “data quality” and “data management”
mean. One pass isn’t enough to make anyone an expert but it will provide sign posts to the in-
depth knowledge that will get you to expertise. You are ready to see the world with new eyes.
In the beginning…
The goal here is to provide an overview of the history of “data processing.” By doing this it is
hoped that some readers will accept the challenge and change their focus from technology to
people. Those who do this will equip themselves for the long haul. They will become the
community that provides stability, continuity and consistency to the quest for quality information.
A sailboat can be an enjoyable and effective mode of transportation. It does, however, have a
limitation in that it can only progress when the wind blows. The manner in which progress is
made also depends on the wind. Sometimes progress must be made against the prevailing winds
and we are forced to tack repeatedly, never challenging the wind directly but using some of its
energy to achieve our goal by an indirect route.
[Figures: Tacking (progress against a prevailing wind); Easy sailing, a beam reach]
Anyone who develops information technology knows that the trip isn’t all a beam reach. We
have to be alert to changes in wind direction as well as changes in destination.
Imagine how much more difficult it would be if our boat had no centerboard or ballast. We need
people who will be the centerboard to keep us on track and ballast to keep us from capsizing as
well as rudder and sail.
The purpose of this book (Book Three) is to plant the seeds that will produce such people and to
arm them with new tools and approaches more suitable for navigating in the Information Age.
We cannot assume clear sailing. We must be prepared to change direction all too frequently
while keeping the goal constant.
There is no human endeavor that is not subject to external forces. There is also no external force that can defeat a clearly identified goal. When people share commitment to a goal and don't insist on suboptimizing, all goals are eventually attainable. Along the way it's going to take four
kinds of people:
1. The Leader
2. The Manager
3. The Governer (one who governs, not the elective office)
4. The Doer
The Leader
A leader appears when most needed. This person is able to function when others are petrified
with indecision. This is the person who can prioritize and execute the details while preserving
the big picture. He or she is able to assess resource availability “instantly” and direct the
available resources to preserve the viability of big picture goals.
The Leader is able to accomplish this by decomposing the big picture goal into multiple sub-goals or objectives and selecting the one that:
• Has a good probability of success in the available timeframe
• Has the best risk-reward value
• Produces the result that is nearest the goal result
The Leader gathers as much information as the situation allows, assesses the credibility and
utility of the information, and makes one or more decisions in a time frame over which she often
has no control. Situations involving human life-or-death frequently must be handled within a
very limited timeframe. Situations in which the future of a business enterprise is at stake are
usually much more complex and have more extended time allotments.
Creating Leaders has been a goal for centuries, if not millennia. In a given life-or-death
situation, it is possible (though not necessarily likely) for almost anyone to emerge alive without
the exercise of any leadership abilities whatsoever. As the scenario becomes more complex,
greater need for leadership emerges. Often the acknowledgement of a leader becomes a survival
situation in itself. If there is no consensus concerning leadership, a ship full of leaders may still
sink.
From a data management standpoint, decision timeframes measured in minutes are irrelevant
unless the data is implicitly credible and of known utility—in other words, unless the situation
was anticipated. This is another thing that distinguishes leaders—they anticipate and prepare for
the moment when leadership will be required. They also recognize that the crisis is not likely to
take a form that is exactly like the one(s) they anticipate so they work hard to gain command of
all the resources that could be needed.
They prepare in advance for the information that will be needed when the crisis comes; making
sure that it is useful and credible.
The Manager
The Manager is handed an objective which is often part of a much bigger goal. The Manager’s
job is to achieve the objective at the lowest possible cost.
The critical information needed by the Manager includes:
• Current cost of production
• Current production rate(s)
• Cost of materials
• Inventory
• Labor cost(s)
as well as others of more transitory interest. From a data management standpoint, the common
theme is currency or timeliness. The Manager can make decisions based on trends of historical
costs but to really achieve objectives at the lowest possible cost up-to-the-minute data is needed.
If I have built an organization that is flexible and adaptable, I can take advantage of declining
costs in one area to offset increasing cost in another. To do this I need to know what the costs
are NOW rather than as of last month’s close. It seems obvious that timeliness of information
availability is important to anyone in business but to a Manager, it has real value and the closer
the manager is to the production line, the more value it has.
The Governer
This person is sometimes known—usually without much affection—as a bureaucrat. When we
speak of Data Governance we are talking about a bureaucracy. Now wait! Before you skip this
section, vowing to have nothing to do with Data Governance, you should understand the function
of the bureaucracy and the bureaucrat.
Let’s examine governance in general. We are all familiar with the concept of governance and we
regularly hold elections to determine who the Leaders of the governance effort will be. Note that
we do not elect the bureaucrats themselves though we do expect the elected leaders to create or
at least preserve a streamlined bureaucracy.
The function of governance is to make certain that the community has what it needs to function,
that it is secure from attack from within and without, and that the community experiences as few
unpleasant surprises as possible. The role of bureaucracy is to make certain that those things
continue regardless of who is elected leader. In other words—consistency.
Bureaucrats do not make the rules and they do not enforce the rules, they simply make sure that
the rules are carried out. If a governer (or bureaucrat) is given a procedure to follow, they will
follow it. This can be frustrating for the Managers and Doers but they also benefit because the
alternative to bureaucrats is autocrats (my way or the highway) or anarchy (might makes right).
A vital bureaucracy insulates the community from changes in leadership. In business today,
changes in leadership are frequent and can seem quite arbitrary. Governance in the form of a
bureaucracy is essential to consistency. As we shall see, consistency is essential to data management. One small quibble though with the notion of data governance—all governance
must take place at the process level. Governance demands process and can’t exist without
process. The term data governance is confusing at best and misleading in fact. We cannot
govern data though we must govern the processes that surround data.
Governers are focused on trends. They need to know that whatever they have in their hands
meets the process specification. Then they need to know whether the process is stable,
improving or deteriorating. Business today focuses on leadership to the detriment of
consistency. It isn’t a good thing when everyone is asked to make decisions. First, many
employees don’t want to make decisions. Second, most employees don’t have the experience or
knowledge to make good decisions. Finally, consistency is impossible when everyone is
deciding.
Most of the decisions in an organization (or a community) must be embedded in process. Doing
this empowers everyone who executes the process. An example will shed light here.
In a small hospital setting there are a handful of surgeons and 3-4 operating rooms. Outpatient surgeries (cataracts, arthroscopies, and joint implants) are the bulk of the procedures. There is a
process by which a patient is scheduled for a procedure to be performed by a surgeon in an OR.
The process involves a basic health assessment to determine whether the patient is a candidate
for anesthesia or has any infection that could pose a risk for the patient or the hospital.
All of the paperwork (data) associated with the various assessments, histories and questionnaires
goes to a ward secretary whose instructions are to collect the paperwork and make sure that all
the pieces are present the morning of the surgery.
The ideal (fantasy) process calls for the secretary to cancel the procedure if certain data is
missing or has values outside of acceptable ranges. Of course when the secretary does cancel,
she is immediately accosted for wasting the resources that have already been allocated. The
secretary hears, “We could have worked around that.” If sufficient time is available, the surgeon
or anesthesiologist may review the data and, with the benefit of greater knowledge and
experience, override the “decision” of the secretary. This only has to happen once or twice
before the secretary recognizes that the process she has been given is a fantasy. She begins to
gather the data as early as possible and pass it on to the medical staff.
Now she is berated for not canceling, thereby wasting resources. When it is pointed out to the
surgeons and anesthetists that they are asking a $15/hour, relatively uneducated (compared to the
surgeons) person to make a medical decision without any guidance, they realize their error.
They rebuild the process so that decisions are much more automatic and are made much earlier.
The process has decisions built into it and branches as appropriate to involve the proper medical
decision makers as soon as possible. The secretary no longer has to make decisions and is now
happy with her job which is now simply filing, copying and transmitting information.
Surgical teams are now assured of a procedure when they report for work and patients are no
longer sent home after appearing at 5:30 a.m. for pre-operative preparations. Good processes
make happy communities.
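A sketch of what "decisions built into the process" can mean in practice follows; the forms, fields, and thresholds are invented for illustration, not the hospital's actual rules. The process itself evaluates the paperwork as it arrives and routes the case to the right decision maker, so the clerk is never the one deciding.

    # Sketch: embedding the decision in the process instead of in the secretary.
    REQUIRED_FORMS = {"health_assessment", "anesthesia_questionnaire", "history"}

    def route(paperwork: dict) -> str:
        """Return who must act next; the clerk only files and forwards."""
        missing = REQUIRED_FORMS - set(paperwork)
        if missing:
            return f"notify scheduler: missing {sorted(missing)}"
        if paperwork["health_assessment"].get("active_infection"):
            return "route to surgeon for review"
        if paperwork["anesthesia_questionnaire"].get("asa_class", 1) >= 3:
            return "route to anesthesiologist for review"
        return "cleared for OR schedule"

    paperwork = {
        "health_assessment": {"active_infection": False},
        "anesthesia_questionnaire": {"asa_class": 2},
        "history": {},
    }
    print(route(paperwork))  # cleared for OR schedule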
The Doer
Everyone is a Doer at times. In fact no one is any of these roles all the time and everyone is each
of these roles at some time. What we’re talking about here is tendency. In particular, a Doer is
someone who is given an objective and achieves it.
How does Data Management accommodate Doers? We can’t predict what kind of information
the Doer will need nor when or where it will be needed. In this respect the Doer and the Leader
have similar profiles. We find, however, that Doers need information at a finer level of detail
than do Leaders.
While a Leader or a Manager may be able to use inventory information at the level of dollar
value as a whole or by type, the Doer needs to know whether he can supply a customer with the
two units being requested. A lot of data management and IT time is spent satisfying the needs of
managers and leaders but the Doer is really the one who butters the bread.
Many roles in a business have a tendency to think in commodity terms about data and
information. We speak of records and terabytes and pages and other collective terms. None of
that is useful to the Doer. They need only one piece of information and it had better be credible
and delivered quickly and reliably. No other role is so dependent on the quality of the
information they are handed. No other role gets less attention from the data people.
When the Doer gets information that is suspect, they can’t just toss it back and demand better.
They have to roll up their sleeves and try something else or they have to verify the information or
they have to actually create the information they need. For example, the Doer may have to go to
the warehouse and actually count the number of pieces of a product that are currently available.
In more than one scenario, involving more than one industry, we have seen evidence (ranging from guesses to estimates to survey results) that your Doers are spending 20-60% of their time on the job getting the quality they need in the information they work with. For most of
them this is not directly part of their objective and is considered a negative impact on their
productivity. Here is an opportunity for the Manager or Leader to make their numbers look
much better, essentially at zero cost!
The Opportunity
Let’s spell it out. The opportunity is to take that 20-60% of labor cost and make it into additional
productivity. It’s zero cost because we can simply redirect already lost productivity into
relatively short-term efforts that will improve processes with resulting improvement in
information quality. It requires surprisingly little effort to make significant improvement in the
credibility of our information.
The most important reason for the lack of quality we currently experience is the notion that if I take the time to fix a problem it will mean that I may not meet my objective. "Somebody will fix it," "It's not my job," and "We'll fix it later" are all equivalent to burning money. There is a name for the problem—it's called suboptimization and we will discuss it in greater detail in
Book Eight.
Data Quality in Context
For the vast majority of people in the world the concept of data is meaningful only when it is
represented by examples. One example that was used in less politically correct times was this:
26 36 37 (sorted data)
37 26 36 (information)
Dress form, 37, 26, 3612
The fact is that almost everyone seems to need a context in order to discuss data. We are
comfortable discussing data about sports team performance or the financial performance of a company or market sector. All of these data are known by a name such as RBI, ERA, P/E Ratio, Days in A/R, FGPct, FTPct, Yds/Game, Yds/Att, Comp/Att, TD/Int… Most people bog down quickly and lose interest when the discussion turns to describing the data itself.

12 This, in fact, is the principle behind XML, in which a set of data is tagged with a context so that it can be readily interpreted.
The reason for this is that data, like money, numbers, law, process, and executable logic (to
name but a few of the really useful ones we see every day), is an abstraction.
What do we mean by Abstraction?
We can describe these things and provide examples (instances or manifestations) but we cannot
manipulate them directly. Abstractions are extremely useful and yet, because they are ideas, they
can create much confusion. When we confuse the idea with the example, we get instant
confusion.
We are accustomed to calling characters such as “4”, “8”, etc., numbers and for most
purposes, that causes no problems. Sometimes, though, we need to be more precise
and remember that those characters are numerals (Arabic numerals to be exact) and
that they represent numbers.
Arabic numerals by themselves can only represent one kind of number called an
integer (a counting number). This is why, in days gone by, there were so many names
for units of measure.
There was no way to determine, express or even conceive of a decimal fraction (a real
number). A field dimension was x feet and y inches or x chains and y rods (or yards,
furlongs, leagues...). At some point the notion of less than a whole unit became fractions. The first fractions were ratios of integers, such as 1 part in 2, or ½. Thus a new kind of number was created—the rational number.
There’s no need to go deeper into numbers here but let’s apply the same thinking to money.
Most of us think of money and get a mental picture of coins or bills (currency). Some of us
might form an image of a bank statement or a Quicken screen or even a check or deposit slip.
Only those who understand money as an abstraction are equipped to imagine equities, credit
default swaps, futures, mortgages or debt in general as money.
[Figure: Money instances]
We have a similar situation with respect to data. Those who are able to use data once it is given form, whether as a headline, a report, an article or a chart, are on one side (let's call it the concrete side). Those who are able to manage and create data, or to recognize it in manifestations that don't look like data, are on the other (the abstract side).
Obviously, many of us are in the in-between territory where we wander back and forth, sometimes catching a glimpse of the larger world but content for the most part to confine ourselves to
headlines, tweets and statuses. Those on the concrete side accept the data as it appears and are
often confused when (or if) they act on their data and find that things don’t turn out as expected.
On the abstract side there is continuous concern over the reliability of data. This group
recognizes that when we build with inferior products (unreliable data) the result will be
unreliable. They want to increase the reliability (confidence, trust, predictability) in their lives
by identifying and eliminating whatever it is that causes unreliability.
Over the millennia we humans have improved our understanding of number and developed
mathematics, which is a system of reliable rules—that is, rules that consistently produce results
that are objectively and pragmatically useful. One of the means by which this was accomplished
was the introduction and refinement of an algebra. An algebra is a system of symbolic notation
and operations that can be used to uncover and record the nature of the abstraction.
We see again and again that abstractions are “tamed” by the introduction of some symbolic
formalism that allows for precision in communication about the abstraction. A flow chart, for
example is much more precise than a narrative (natural language) description of a process. This
is particularly true when the flow chart is produced by someone with a grasp of the abstraction.
Executable logic yields to a combination of process and data flow diagrams.
Not so very long ago, there was considerable discussion and debate over the efficacy of one
symbolism over another. There were several different versions that each spotlighted some
specific aspect of process or data flow. Now we choose our tools, not because of the rigor of the
symbolic algebra at their root, but because of their cost or their ease of use. We are expected to
use these tools and so we do use them, without ever recognizing their place in the world of
abstraction.
Perhaps no warning is needed, but to avoid any liability situation and to increase your appetite for what is to come, this is a good time to point out that thing people (those who live in the world of the concrete) are often viewed by the idea people (those who can live in abstraction) as their legitimate prey. This is easily seen when we consider swindles and confidence (con) games.
One person has all the power based on the ability to make unreliable data (usually about money)
look reliable. The concrete person accepts the data at face value and winds up transferring
money to a person he will never see again.
The case of Bernie Madoff illustrates the spectrum involved. His prey were not all
currency/concrete people. Some had a degree of sophistication about money and data and were
functioning in the abstract world. All were temporarily blinded (having recovered their ability to
see since Madoff’s bankruptcy and conviction) by the desire to increase their money holdings
(wealth).
Perhaps you don’t see the relevance of these warnings now, but be assured that this happens
every day in the world in which we live. It isn’t always about money. Sometimes it’s about
control or power. Sometimes it’s about influence or promotion or simply credit or appreciation.
People use data to gain an advantage and our job is to help them (on this side of legal).
We might quibble about ethics or morality, but we must be clear about what it is we’re doing.
We must be aware that even “bad” data (known to be unreliable) can be and is being put to use.
What do we call a person who makes a living by inducing people to part with something of value
in exchange for something of lesser value? This occupation is known as sales and the difference
in values is known as profit. Please, this is not a value judgment—the customer gets to satisfy a
need and both parties agree with their eyes (mostly) wide open. The purpose here is simply to
make sure that we’re all clear about our purpose.
If marginally reliable data (which is, for practical purposes, the same thing as known unreliable
data) can be put to use in legally returning a profit, and may in fact be the key to that profit,
then what is the value of making all
data reliable? The real value of a data quality effort may be the ability to discern which data are
reliable and the level of reliability.
We’ll revisit this later, but keep this thought on the back burner of your mind as you proceed.
Lately we have seen a new kind of data user emerge, the hoarder. “Big Data” is a concept
created to appeal to the information hoarder. “What might you be missing?” is the question that
engages them.
Imagine that we live in a sea of data. We are like fish in that sea who need data to live. Or
maybe we are dolphins in that sea who come to the surface to live and only immerse themselves
to find food. Most of our data quality effort is like the person who attacks any impurity that he
sees as he swims along. He has absolutely no impact on the quality of the sea. By the way, if
it’s more comfortable for you, you can replace “sea” with “lake” or even “pond.” The big data
people are using specially designed strainers which they place in interesting currents within the
sea.
The person who desires to clean up the sea must become knowledgeable about many things that
are not, themselves, of the sea. Because the sea is the result of every process within our world,
governance of those processes must be instituted and consistently applied over an extended
period of time before she will see the results she is after. Even when the desired results become
visible, relaxing our bureaucratic vigilance will result in an immediate loss of quality.
Data Quality—More Than Meets the Eye
As we move forward toward a view of data quality that allows us to create and use a language
specific to DQ issues, descriptions and solutions, let’s take a minute here to examine the
behavior of data.
Certainly, one of the attributes of quality data is that it is well-behaved. In other words, it
consistently delivers value according to principles that are applicable because of its type, domain,
range, relationships, maturity, purpose(s)…
It is useful at this point to differentiate between static and dynamic properties of data. Any DQL
(data quality language) that we might define should work well where static properties are
concerned. When we begin to consider dynamic properties, the task becomes much more
complex. The greater the number of dynamic properties, the greater will be the complexity.
Our chances of designing a DQL will be significantly greater if we can restrict ourselves to static
properties only. Before we can do that, we have to understand the dynamic properties and assess
their relative importance. Can we carve them out of the discussion? Will excluding them
compromise our DQL’s capabilities?
Looking back at the list in paragraph 2 above, the first three properties (type, domain, range)
might be thought of as static. These are the focus of our modeling efforts or, if we only pretend
to do modeling, of our programming efforts. There is a tangent here that we’ll resist for now, but
at some point we have to come back to it. The question of how data is initially defined is huge
and the effect of initial definition on the lifetime of a datum and in particular on its quality is not
to be underestimated.
For now, though, we’ll put that on the back burner. We expect the individual pieces of data to
possess a definition (usually called a description), and our DBMS requires that we say what kind
of data it is (in terms of structure and permitted computational uses). Is it a variable-length text
string, a fixed number of characters, an integer, floating point, money, date/time, etc.?
It is vital that we remember that, even though data/information and its management can be
defined and discussed without reference to technology, technology can have a lot to say about
the costs associated with failure to manage. The human mind can take a set of numerical values
(ex. 1, 1.5, twenty, pi, -12, 3/8) and make immediate sense of them. If asked, we could add them,
multiply them, average them, order them or perform a host of other operations on the set. The
computer CAN NOT. If we command a computer to add (1.5, twenty), we will get a failure. If
we command our computer to average the values found in a field or column in a table and one of
the values is 3..2, it will fail.
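To make this concrete, here is a minimal sketch of the averaging failure just described (T-SQL syntax assumed; the table and values are hypothetical):

CREATE TABLE reading (val VARCHAR(10));
INSERT INTO reading (val) VALUES ('1'), ('1.5'), ('3..2');

-- A human averages these at a glance; the machine halts at '3..2':
SELECT AVG(CAST(val AS FLOAT)) FROM reading;
-- Error converting data type varchar to float.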
Technology demands consistency. Humans like consistency but can function in its absence.
Therein lie all of our problems. Humans must exert extra effort to achieve the consistency that
computers and technology demand. To the extent that we rely on technology, we must be
willing to exert the extra effort required to keep it happy.
It is surprising how many data are defined to the DBMS as varchar. It probably shouldn’t be so
surprising since all of our modeling tools allow us to set a default type and the default for the
default is always varchar(n). This is the default because it guarantees that any value supplied by
a user will be accepted by the DBMS. In other words, it makes life easier to use the varchar data
type. Other decisions that compromise eventual quality are made at this point, also because they
make life easier for someone in the near term, while quality of result is a long-term concern. This
will also be an avenue for future exploration.
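For now, a sketch of the contrast (SQL DDL; the column names are hypothetical) shows what is at stake:

-- Everything defaulted to varchar: the DBMS accepts anything at all.
CREATE TABLE visit_lazy (
    visit_date VARCHAR(50),   -- will accept 'tomorrow' or '02/30/2013'
    charge_amt VARCHAR(50)    -- will accept 'n/a' or '12,5'
);

-- Types chosen deliberately: the DBMS rejects nonsense at the door.
CREATE TABLE visit_typed (
    visit_date DATE          NOT NULL,
    charge_amt DECIMAL(10,2) NOT NULL
);

The first table is easier to build and load today; the second is the one that can still answer questions a year from now.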
The final three items in the list (relationships, maturity, purpose) are dynamic in the sense that
their values can and will change, sometimes rapidly and usually unexpectedly. Let’s take the last
first. Purpose, as “fit for…,” will change whenever we’re not paying attention. We hope that
our stewards will be on top of this but pragmatically (everyone likes pragmatism), they may be
too close to the business itself so that changing business needs or drivers loom so large as to
overshadow defined purpose which then fades to insignificance.
Maturity is also dynamic. We expect maturity to change over time. When we think of data
maturity (if we do), we include stability in all the other properties, quality metrics that have
flattened out, recognition within the enterprise and probably several other aspects.
Finally, we have to face relationships. We’re not very good at relationship management. This is
as context-free as any assertion can get. Some of us wouldn’t recognize a relationship if it sent
us a valentine. Others pile all sorts of unwarranted expectations on top of our relationships and
then wonder where the quality has gone.
It all starts in the modeling or definition phase. Chen (Chen, 1976), when he invented a
graphical notation for describing data, gave equal weight to entities and relationships. Both had
a two-dimensional symbol and the opportunity to possess attributes. For many reasons, not least
perhaps that tool developers didn’t grasp the importance of relationship, “data modeling” tools
eventually turned a multi-dimensional, real thing into a single line segment that is only present at
all as a clue to the schema generation software to copy the identifier from one of the linked
entities into the attribute list of the other. It is labeled a foreign key so that the database engine
can build an index.
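A sketch of the generated schema (SQL DDL; the names here are hypothetical) shows how little of the relationship survives:

CREATE TABLE physician (physician_id INT PRIMARY KEY);
CREATE TABLE patient (
    patient_id   INT PRIMARY KEY,
    -- the entire PATIENT-PHYSICIAN relationship, reduced to one copied identifier;
    -- the engine sees an index to build, not the business meaning
    physician_id INT REFERENCES physician (physician_id)
);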
Although examples are often counter-productive in the discussion of data quality, one example
may illustrate the role of relationship in completing the semantic of a data set. PATIENT is such a
common entity in the health care marketplace that no one even bothers to define it. It is a set of
“demographics” by which we mean the attributes and it has relationship with PHYSICIAN or
PROVIDER. It probably also has relationship with VISIT or ADMISSION, ORDER, PROCEDURE,
PRESCRIPTION, SPECIMEN and other “entities” of specific interest to the enterprise such as
EDUCATION_SESSION, CLAIM…
It doesn’t take long to figure out that the relationship between patient and physician is more
complex than can be accommodated by a single foreign key. A physician can “see” a patient,
refer a patient, treat a patient, consult (with) a patient, admit a patient…the list goes on and on.
Each of these relationships has real meaning or semantic value and may even be regulated by an
outside body. Typically, these are implemented by a single foreign key attribute for each.
Sometimes they are called out explicitly as associative entities—frequently in events such as
VISIT, ADMISSION, ORDER, PROCEDURE, PRESCRIPTION. The only problem in this case is that
they are no longer recognized as relationships.
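Continuing the hypothetical tables above, an associative entity can make each distinct relationship explicit and give it attributes of its own:

CREATE TABLE patient_physician_role (
    patient_id   INT         NOT NULL REFERENCES patient (patient_id),
    physician_id INT         NOT NULL REFERENCES physician (physician_id),
    role_code    VARCHAR(10) NOT NULL,  -- 'SEES', 'REFERS', 'TREATS', 'CONSULTS', 'ADMITS', ...
    began_on     DATE        NOT NULL,
    PRIMARY KEY (patient_id, physician_id, role_code, began_on)
);

The remaining danger, as noted, is that a table like this comes to be treated as just another entity rather than as a relationship carrying business meaning.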
Now, imagine a situation in which an in-utero procedure is scheduled on a fetus. You may be
aware that transfusions, heart valve repair and a host of other medical procedures are actually
being performed on the fetus while it is still within the mother’s womb. So, who is a patient? If
the facility also terminates pregnancies for any reason you can see the conundrum. Medicine
doesn’t allow for terminating the life of a patient (Dr. Kevorkian is an interesting case study but
not for this discussion). At the same time, we would like to sometimes treat the fetus as a
patient, perhaps for reasons of safety. We also experience the lack of values for attributes that
we may have tagged as mandatory, e.g., DOB, SSN.
It is only when we explicitly talk about relationships that these issues emerge. Relationships cast
light on the entity from all angles.
Relationships also represent the business processes that inform the purpose of the data. Often,
undocumented meaning gets attached to data. Two analysts will get together and agree that for
the purpose of this analytic, this combination of attribute values will be included (or excluded).
For a given ETL (Extract, Transform, Load) job, we decide that an attribute value that isn’t on
the approved list will be replaced with “&”. The adjustments to business processes are constant
and usually undocumented and unnoticed. Until we can point to a documented process or
relationship, we have no way of capturing and dealing with changes.
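In code, that agreement is often nothing more than a line like the following (a hypothetical sketch of such an ETL step):

UPDATE staging_feed
SET attribute_value = '&'
WHERE attribute_value NOT IN (SELECT approved_value FROM approved_list);

Nothing in the statement records why '&' was chosen, who agreed to it, or when the agreement should be revisited.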
What’s the difference between an association and a relationship? Somewhere in there we’ll find
clues about dynamic quality properties. One thing leaps out as a property of quality and a
property of relationship—expectation. When we claim that something has quality, we establish
an environment in which it is permitted to have certain kinds of expectations. The same is true
of relationship. When two parties or entities enter into relationship they agree as to the
expectations they will have of each other.
In our quest to define quality for data, we will be forced to document expectations and to monitor
accountability with respect to those expectations. We’ll give the relationship much more
attention later.
Foundation—Machines and Logic
We begin our investigation at the most basic level—that of the machine. Whatever Matter (as in
“Does it really matter?”) means to a machine, it is quite different from what it might mean to a
higher consciousness or even to your boss (the Dilbert influence). At the machine level,
mattering takes the form of whether or not to activate an error condition (switch).
As you may be aware, the computing machine is actually a continuum that is anchored by
silicon-based integrated circuits and memory, hosting an instruction set (the ISA or instruction
set architecture) which in turn is the basis for firmware. Firmware is the platform on which the
operating system (OS) is built. The firmware layer is specific to the processor type.
At the level of the machine, all data resolves to a linked sequence of bytes represented by an
address in memory. Obviously, we’re leaving out a lot of details. This chapter is a synopsis of
an entire semester of Microcomputer Architecture.
The byte is an 8-digit binary value. Binary means that there are only two values in our
mathematics, 0 (zero) and 1 (one); therefore each digit can be represented by an electronic
construct that is either on (1) or off (0). The machine neither knows nor cares what the value
represents and simply performs commanded operations, e.g., and, nand (not and), or, xor (one
but not both), +, -, /, * (higher order operations that can be expressed in terms of and, or, …), etc.
The only time any of the data matters is when the machine performs an operation resulting in a
memory (or register) overflow. In this case an operation produces a result that is too large to fit
into the assigned memory without losing some of the bits (binary digits). To the Logic that
requested the operation, this is a symptom of what is potentially a bigger problem. It means that
the data wasn’t understood well enough to reserve a memory extent adequate to hold the result.
Each stroke of a key on the keyboard generates a byte-value. Each value is a number from 0 to
255 that represents a character in the ASCII (or some other) character set. Somewhere in the
logic between the keyboard and the hardware, there must be recognition that, for example, a set
of keystrokes represents a number rather than a string of characters. If you use a spreadsheet
program you probably have experience in telling the program that a cell’s content is to be treated
as a number with zero or more decimal places.
You will have seen that cells behave differently if they hold numbers versus characters. For one
thing, text defaults to left-justified within the cell while numbers are right-justified by default.
An attempt to perform an arithmetic operation on a cell containing text will produce an error
(except in a few programming languages—most notably M or MUMPS—which make
assumptions that allow the program to continue executing).
The goal of the Logic (higher-level program) is to identify and trap errors or potential errors
before they are delivered to the machine. This effort is worthwhile because the Logic should be
able to determine the cause of the error condition and take action to warn or correct. Allowing it
to pass through to the machine will result in an “overflow” error condition. This error is usually
accompanied by a memory or register address and possibly a screen full of byte values. The
machine will halt its handling of the input data at this point because that is the only safe thing to
do. This is rarely seen any more as these potential error conditions (bad data) are trapped and
dealt with at a higher level. In the case of the spreadsheet, “#REF” is the recognition of an error
condition before it is passed on to the machine.
Please accept sincere apologies for this chapter, but it is necessary to create an awareness that the
machine isn’t the problem. The chapter is short because it is dense (a semester in a few
paragraphs) but more detail is not needed for our purposes.
We’ve satisfied ourselves that the machine is not the object of our focus. Its requirements
(expectations) are simple and straightforward. In essence it throws all the responsibility back on
the participant who delivers the data. It faithfully and efficiently carries out operations on bytes
(pieces of data), confident that it won’t be given data that it can’t handle. It promises to avoid
potential harm by simply stopping and waiting for the problem to be cleared.
The characteristics of quality that matter to the machine are only that the instructions given must
fit the data. Fit is interpreted very literally. The instruction must set aside enough memory to
hold the result of an operation without loss of information. The machine’s only responsibility is
to minimize damage by halting operations as soon as an error condition is recognized.
Our investigation must proceed upward in the abstraction continuum to the Logic or software
layer(s). We have seen here that things that appear from a safe distance to be monolithic or
homogenous may, in fact, be layered and highly complex. As we proceed toward Logic, we will
come to recognize almost unbounded complexity.
Data Quality in the Logical World
We have said that data management is independent of technology. Having said that, we have to
backtrack just a bit to understand the basis of some of the problems we experience and which
arise because of the use of technology to help us in managing information.
When we are first introduced to computing machines, their logical nature is often the initial
hurdle to be negotiated. We really have to get used to the idea that something will execute the
commands we supply. Maybe that’s not the hard part though. It’s the idea that this machine will
never, under any circumstances, do anything that is not explicitly commanded. None of our
dealings with other humans or with nature will have prepared us to be so literal.
We expect that others will grasp the nuance and the implications of what we ask them to do and
will exercise something called judgment based on their own experience, their grasp of nuance
and their resulting ability to infer the best course of action in light of the instructions they were
given. The machine—whether fortunately or unfortunately—is incapable of nuance, inference,
judgment and the accumulation of experience. But wait, you say, I’ve heard of neural networks
and knowledge-based systems.
These things are actually layers of logic laid over the machine and although research continues
into making a machine that functions like the human brain, that machine does not exist today.
What we learn from our first attempts to direct a machine is that our own ability to think and act
logically is not very reliable. We se thngs that arnt thre. We unthinkingly forgive, though we
may notice, typographic errors, poor grammar and muddled syntax. We are masters at filling in
the blanks and ignoring what doesn’t matter, though we may be so good at it that it gets us into
trouble. Who hasn’t missed the word not and completely misunderstood an instruction?
So far we’ve spent our time in this chapter dedicated to the Logical talking about human
tendencies. This is solely for the purpose of setting the stage.
We humans soon provided ways to avoid dealing directly with the machine. Because we weren’t
comfortable or even very good at describing what we wanted in terms of binary numbers and
combinatory logic (and, or, nand, nor, xor…) we created interpreters and compilers as a way to
use higher level language and have it translated for the machine to execute. These allow us to
express ourselves in a language closer to the one we speak. For example, rather than having to
tell the machine
Step 1: do A
Step 2: do B
Step 3: do C
Step 4: if register-n value does not equal register-m value then go to Step 1
Step 5: do D
By using a higher level language we can simply tell the machine
Repeat
A; B; C;
Until n = m
D;
While the very early languages were only a half-step above machine language (or assembler),
they served to make the code that was the Logical more readable and therefore easier to “debug.”
Languages for programming computers have evolved steadily through four generations.
Currently, the availability of graphical (WYSIWYG—pronounced “wizzie wig”—what you see
is what you get) languages represents a late 4th or 5th generation.
All of this is important to our discussion because each advancement was motivated by a need for
more understandable (to humans) code. Errors in programs have always been expensive and
some have cost hundreds of millions of dollars. Each new development in language provided
new syntactic constructs to aid in making the programs written in it more understandable. More
understandable meant that the developer could “walk through” his program with a group of peers
and they could focus on looking for errors relative to the requirements rather than relative to the
machine. If the compiler accepted it, it would run on the machine.
At some point, mistakes in procedural instructions ceased to dominate, while mistakes in the
understanding and proper handling of data took center stage. It was at this time that relational
data management and entity-relationship-attribute modeling began to emerge.
Data Administration became mandatory in every organization as we began to search for ways to
make the appropriate handling of data more transparent. The focus was on definitions and rules
and it was assumed that if a data item were defined and we knew the rules that governed its use,
then most if not all of our problems would go away.
When this proved not to be the case, we dug ever deeper and created new concepts (similar to the
development of programming languages) to make our communication regarding data ever more
clear and less ambiguous. Data was too generic and we created new varieties of data to
understand: master, reference and metadata. We had to be able to draw more abstract pictures
of our data so that we could see how the pieces were expected to fit together. We created
architecture and architects for data as we continued to seek ways of explaining to non-data
people the kinds of things that needed to happen and when. Lest we forget, some of those non-
data people are writing the code that is our information management system.
We can struggle mightily to get the “business” people to understand what is needed in order to
have “good” data, only to have it all undone by logic-writers who believe they know what needs
to be done. Many of the response time problems that often plague database applications are
actually caused by logic-writers who insist on doing things one-record-at-a-time instead of the
way the relational DBMS (database management system) was designed to work—in sets. That,
however, is another tangent at this point in the story.
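For the curious, the contrast looks roughly like this (a T-SQL sketch; table and column names are hypothetical):

-- Record-at-a-time: a cursor walks the table, one operation per row.
DECLARE @id INT;
DECLARE stale CURSOR FOR
    SELECT patient_id FROM patient WHERE last_visit < '2010-01-01';
OPEN stale;
FETCH NEXT FROM stale INTO @id;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE patient SET status = 'INACTIVE' WHERE patient_id = @id;
    FETCH NEXT FROM stale INTO @id;
END;
CLOSE stale;
DEALLOCATE stale;

-- Set-based: one statement, the way the relational engine was designed to work.
UPDATE patient SET status = 'INACTIVE' WHERE last_visit < '2010-01-01';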
Programming languages (and DDLs, data definition languages) provided typing constructs so that
compilers and DBMSs (database management systems) could catch potential overflow errors
before they brought the machine to a halt. These constructs also encouraged developers to think
ahead about the data they were serving up, manipulating, modifying, storing, or destroying. Life
was good and the end of the dark ages seemed just over the horizon.
A lemma is a subsidiary proposition assumed to be valid and used to demonstrate a principal
proposition.13
(13: http://www.answers.com/topic/lemma#ixzz1E9lGwhxN)
Lemma: No tool delivers value until it is used.
Lemma: Tools without obvious value will not be used.
Lemma: “Doesn’t matter” is the same as “no obvious value.”
Lemma: The value delivered must be seen to exceed the value of the effort expended or the tool
will not be used.
Theorem: Tools for improving the quality of data exist but are not being used; therefore the level
of data quality is at best static.
Theorem: If the value to be obtained from using a tool is not intrinsic to the process, the value can
still be elevated to a threshold level through extrinsic motivation.
It helps in understanding the issues of data quality handling in Logicals if we first understand the
various motivations and processes that produce the Logical. We now understand that many tools
and constructs are commonly available to developers and that their appropriate use could be
expected to have a positive impact on the quality of our data assets.
Motivation 1: Business Need
“We need a system to [do something].” This is far and away the most common reason to build a
Logical. Of course, if part of the something that is to be done were to involve quality data, we
would have little to talk about. The specification for a new system very rarely establishes service
levels for data quality. Let’s assume though that we do wish our system to be associated only
with quality data.
Logic design with DQ in mind
We would begin with a conceptual-level model that establishes the complete context for the
implementation and use of the system. We would make sure to identify subtype situations and
all relationships. We would spend a lot of time on the relationships because we recognize that
they represent key business processes that incorporate “business rules.” We would establish key
process indicators associated with each relationship to inform us when relationship (process)
expectations are being violated.
Developers (logic-writers) would include relationship maintenance in their specification and
design. The model would also provide information for the developers concerning domains for
each attribute—the choice of data type should never be allowed to default. Developers would
include appropriate data types in their design. Logical models would preserve the meaning of
the context including relationships and domains. They would be normalized.
Physical models would be created by adding application-specific entities to the logical model.
Application- or program-specific data must be kept separate from the “real-world” data. We
may need to know who did what to a PATIENT record and when they did it but that information
has nothing to do with the PATIENT. Normalize, normalize, normalize!
Normalization rules were created with data quality in mind.
Developers would validate each data item created in the system and would not allow storage of
un-validated or invalid data. Data analysts would participate in design and code reviews and be
part of the QA and QC processes. No data item intended for use in analysis would be captured
or stored as free text. Error statuses received from the DBMS as the result of type conflicts
would be handled within the Logical, either by forcing re-entry of the data or by canceling the
transaction. Mandatory data items would be enforced and validated.
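As a sketch of what "designed with DQ in mind" might produce (SQL DDL; the names are hypothetical):

CREATE TABLE lab_result (
    result_id    INT           NOT NULL PRIMARY KEY,
    patient_id   INT           NOT NULL,      -- mandatory item, enforced by the DBMS
    result_value DECIMAL(10,3) NULL,          -- analytical value: typed, never free text
    result_note  VARCHAR(500)  NULL           -- narrative kept in its own, clearly narrative, column
);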
If attributes get their values from drop-down lists (this is known as an enumerated type), the
Logic should not allow the end-user to free-text new values for the drop down. If a new value is
needed and the user is authorized to create one, then distinct functionality (logic) should be
supplied that will apply adequate controls. By the way, all drop-down lists should be sorted in a
way that recognizes their meaning and use (alphabetical, numeric…).
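One way to apply those controls is to give the enumerated type its own table and let the DBMS refuse anything that is not on the list (a hypothetical sketch):

CREATE TABLE specimen_type (
    code        VARCHAR(10)  NOT NULL PRIMARY KEY,
    description VARCHAR(100) NOT NULL
);
CREATE TABLE specimen (
    specimen_id INT         NOT NULL PRIMARY KEY,
    type_code   VARCHAR(10) NOT NULL REFERENCES specimen_type (code)
);

-- The drop-down is populated from specimen_type; a free-texted value fails:
INSERT INTO specimen (specimen_id, type_code) VALUES (1, 'BLOOOD');
-- Rejected: the INSERT conflicts with the FOREIGN KEY constraint.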
In summary, allowing free-text entry of data that has (or should have) analytical value may be
the single largest source of DQ issues. This shortcut can cost an organization millions of dollars.
Imagine, for example, an application (Logic) that allows a laboratory result to be stored as text.
At some point, we want to trend those results for a PATIENT or for a group/population of
PATIENTs. We have a text field to work with but we expect numeric values. In other words, we
expect to encounter only the characters 0-9, “+”, “-“, and “.”. We are experts in the use of SQL
so we do a type conversion on the fly to compute an average and a standard deviation. Our
analytical query collects perhaps millions of values and begins the computation.
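The naive analytic might look like this (T-SQL sketch; lab_result_text is a hypothetical stand-in for the text-typed table just described):

SELECT AVG(CAST(result_value AS FLOAT))   AS avg_result,
       STDEV(CAST(result_value AS FLOAT)) AS sd_result
FROM lab_result_text;
-- Runs for a long while, then: error converting data type varchar to float.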
After some period of time—usually this happens after nearly all the records have been processed
(this must be an unknown corollary to Murphy’s Law)—a value is encountered that contains a
character that can’t be interpreted by the query processor Logic as a number. When this
happens, an error condition is generated and all the work done to that point is abandoned. This
avoids an even worse scenario that would have developed had the operation been shipped to the
machine for execution. So we roll up our sleeves and go to work to root out the offending value
which eventually turns out to be “see note”.
Undaunted, we draw upon our vast knowledge of SQL and do a bit of a quality check before we
include a value in our computation. We add a bit of logic (isnumeric) that tests whether the
value can be interpreted as a number before we compute with it. Presto! Our analytic is
delivered. Our pride is short-lived, however, because anyone interested in average and standard
deviation will always want to know the size of the population. The count included in the analytic
attracts attention because it seems too low. “We did more tests than that.”
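The patched analytic, roughly (T-SQL; the same hypothetical table):

SELECT AVG(CAST(result_value AS FLOAT))   AS avg_result,
       STDEV(CAST(result_value AS FLOAT)) AS sd_result,
       COUNT(*)                           AS n_included
FROM lab_result_text
WHERE ISNUMERIC(result_value) = 1;
-- ISNUMERIC is an imperfect test (it accepts strings such as '$' and '1e3'),
-- and n_included now silently excludes every non-numeric result.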
Now we do some profiling to discover just how big a problem this is. We learn that only 78% of
our records are actually numeric. The other 22% are a mixture of text values and nulls. What to
do?
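A first profiling query can be as simple as this (same hypothetical table):

SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN ISNUMERIC(result_value) = 1 THEN 1 ELSE 0 END) AS numeric_rows,
       SUM(CASE WHEN result_value IS NULL THEN 1 ELSE 0 END) AS null_rows
FROM lab_result_text;
-- In the scenario above, numeric_rows / total_rows comes back at about 78%.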
Investigation determines that the values are machine-generated so we can’t even make it a
training issue. Can we simply ignore the non-numeric values? It turns out that we can’t because
they represent testing actually done. How can we adjust the analytic to account for what at first
looked like bad data but now turns out to be useful though non-compliant? This decision will
involve several meetings with different stakeholder groups and a lot of work for us as we
develop recommendations. We can’t even add this to the data quality costs we are accumulating
because how can we call it bad data?
It turns out that the analyzer (the laboratory equipment that actually extracts results from
specimens by chemistry or some other means) has some flexibility in the format of the result.
An interface Logic is also involved and has the flexibility of a maintenance fix to reformat what
it hands to the DBMS for storage. The load Logic could be changed to separate the numeric
results from the non-numeric ones. We could agree to assign a numeric value to the various non-
numeric results. There are many things that can be done as fixes, should have been done during
design and implementation, and now will go on the books as IT maintenance costs.
Motivation 2: Innovation
We’ve got this great idea for a software “solution” that will take the marketplace by storm.
Everything that we said about Motivation 1 applies to this one as well. Now, though, Marketing
gets impatient and, in order to be first to market, begins to sell something that really only exists
as an idea—or maybe a prototype. This results in punishing timelines and induced
shortcuts.
We’ll consider the human side of this in more detail later, but for now it’s enough to understand
that all of that validation that we discussed above and much of the modeling will start to look
like dispensable fluff. We don’t need it if we use text data types and control the data used in our
demonstrations. It’s only important to our aim of being first to market with our solution that we
demonstrate a product that appears to be doing something that our customers need. Need we
mention that our customers have never told us that the quality of their data is important to them?
In fact, in many cases (only fear of being labeled a cynic prevents me from saying most) the
customer has never thought about the quality of their data and how their data might provide a
more realistic test. They look at the sample data and see that it looks just like the ideal data that
they produce. It’s an easy sell to get them to believe, “This is the result you will see from this
product.”
Please don’t misunderstand. We aren’t blaming the seller of the solution. If anyone is to blame
in this it’s the buyer who doesn’t understand his own situation. Let the buyer beware is a sound
warning that has been ignored for millennia. Remember the Bernie Madoff story in the chapter
on abstraction?
Motivation 3: Creating Something
The remaining reason to create Logic is because it is a creative act and a source of pride in the
same way that authors experience pride and satisfaction from completing a story or book. In the
case of Logic, the developer is validated by the machine that executes the Logic. Often no other
validation is required. “It works!”
Again, more will be said later about the human side of this, but it suffices to remark here that
data and its quality are not often at issue. This is easy to see in the development of a prototype.
A prototype is a kind of Logic that is assembled to prove a point. The idea is to show that
something can be done. Everyone should understand that this result can be far removed from the
real-world doing.14
(14: As an aside, this seems to cast a different light on pragmatism. Is a prototype pragmatic?
How pragmatic is it?)
Unless the purpose of this Logic has to do specifically with data quality, we will never see any
effort spent on validating, preserving, or enhancing data quality.
We have seen that the Logical world has immense but largely unrealized potential to have a
positive impact on data quality. Some of the inhibitors are time and cost, but there are many
others that are more ephemeral. For the most part, though, the impact of Logic on DQ is a
negative one.
Strong typing (data typing to be enforced by the compiler) was introduced in the 70’s as a
powerful tool in the programming language to avoid run-time failures caused by poorly
understood data. Verification and Validation (V&V) processes were identified decades ago.
Edsger Dijkstra (Dijkstra, 1982) was one of the first and perhaps the most influential proponents
of provably correct programs as early as the 1950’s.
It’s not that we don’t know (as a discipline) what to do or how to do it. Why then, do we still
have data quality problems? Let’s move on to the next chapter and see if we can’t discover what
the problem really is.
The Human Factor
The human factor is the one that determines whether the DQ function, as in DQ = f(h), can be
evaluated or not. How can the human factor change? What changes would make it impossible to
solve for DQ?
Let’s begin by making a list.
 Humans come and go
 Humans are not interchangeable
 Each is unique
 Each has a unique way of perceiving anything
 Maturity level (and therefore, perspective) changes over time
 Skill level changes over time
 Knowledge level changes over time
Human means changing. Humans must change and adapt just to live with one another
successfully. For a large portion of humanity, though, change equates to uncertainty and
uncertainty evokes fear. For those who advocate change then, it is critical to do so in a way that
reduces uncertainty and therefore fear.
Fear can be overcome or avoided by some tried-and-true methods, among them:
 Education
 Training
 Practice
 Motivation
 Leadership
Fear is one of the forces that act to inhibit change and although it is a powerful force, it is still
but one. Another that is even more difficult to address is inertia. If you were a physics student,
you may be saying, “Inertia isn’t a force.” In physics terms it is not a force. Inertia is the
tendency of a mass at rest to remain at rest or for a mass in motion to stay in motion. Inertia
must be overcome by some force. In human terms, inertia is often known by other names such
as lazy, unmotivated, satisfied, content, comfortable, inattentive, ignorant, powerless, and on and
on or, conversely, driven, committed, goal-oriented, focused, etc.
Whenever a person or an organization is described with one of these adjectives, we can be sure
that inertia is in control.
There is another aspect of inertia that we should be aware of because, while some of the
adjectives imply lack of motion, others allow for undeviating motion. Let’s remember that the
other side of inertia is that a body in motion tends to remain in motion (along the same
undeviating path).
In any case, it takes a force to overcome or change (re-direct) inertia.
We have a tendency to organize things, to classify, label, count and file. Moreover, we like to
specialize and be the best. Specialization and being the best go hand-in-hand. Think of the
Guinness Book of World Records. If I want to be in the book I look at all of the existing records
and then pick one that I already like and am proficient at. Then I specialize just a bit. I juggle 14
cleavers but I do it while standing on one foot. In the same way, if I am marketing myself I can
claim to be the best at Agile (Alliance, 2013) development in a manufacturing market (or a
healthcare market…)
The first few programmers, when computing was in its infancy, could call themselves the best, and
who could argue? As the potential of the computing machine came to be understood, more and
more people who were intelligent and talented began to call themselves programmers and they
were in demand. Gradually, though, competition for the available work grew and people began
to look for ways to claim that they were the best.
At the same time, computer processing (as it was then called) was maturing as a discipline, if not
a profession. The methods for producing good software were being researched, analyzed and
improved. New process steps were created which created the need for new experts.
Marketing was also maturing as a function and this had a direct effect on specialization. As new
products were introduced it was necessary to differentiate them from already-existing products.
What better way to do that than by claiming that the new product is “the best.” Frequently a new
capability or functionality had to be created in order to claim “new and improved.” The new
functionalities provided an avenue for a technician or developer to claim it as his own and once
again be the best or the only.
Decades of this have led us to where we find ourselves today—in an impenetrable morass of
largely meaningless acronyms and technical jargon. Now, a programmer can claim expertise
merely by listing the hottest new technologies on his resume or by liberally seeding his
conversation with acronyms and jargon. Of course the impression of expertise may not be
sustainable but that’s not the point.
The flip side is that the consumers of information technology are easily cowed by references to
things that aren’t understandable to them—even to the point of claiming, “I’m non-technical!”
and making the claim a badge of honor. The ugly result of unnecessary specialization is that a
huge gulf has been created between the technology people and everyone else. In a conversation
with someone from the “business side” a mention of I.T. will very often elicit some eye-rolling
and statements like,
“I can’t even talk to them about something like this.”
“I.T. wants to do it their own way.”
“What comes out of I.T. won’t be what I asked for or what I need.”
“I hate dealing with I.T.”
The fundamental problem, and the reason why [data] quality is intransigent as an issue today, is
precisely the gulf that was exposed in the previous section. A little story might help to illustrate.
A certain church had “adopted” (in 2008) a tribe of hunter-gatherers in east Africa. The hunter-
gatherer culture is nomadic in order to follow the food supply. Nomadic means that the tribe has
no permanent structures and no need for any such.
Your inclination may be to dismiss this tribe as an anachronism with a comment like, “They
should wake up and get into the 21st Century.” A longer term viewpoint might be that since these
are some of the last nomadic people on the planet, any problem will take care of itself if we
simply allow time to pass.
A church (such as this particular church) wants to demonstrate to these people that they are linked to
other people whom they don’t know and that these other people care about them. In an effort to
do this, the church created a “mission trip” and offered its members the opportunity to be
emissaries (and a chance to go to east Africa).
Meetings were organized to discuss the trip and what would happen. In these meetings, a
recurring topic dealt with what the emissaries would “do for” the people of the tribe. Some
wanted to build (a church or school or clinic) and proceeded rapidly to discussing materials and
fund-raising until they were reminded that this tribe had no need or desire for such structures and
wouldn’t know what to do with them if they suddenly appeared.
The tribe lives within an inertial bubble of tens of thousands of years of nomadic culture in which
the only useful knowledge pertained to finding sufficient food and water to stay alive. Medical
and other services were delivered within that bubble by going to where the people were and
providing what they needed.
The Amish and other groups here in the U.S. are similar to this tribe in terms of the very visible
gulf between them and the technology-dominated world in which most of us live.
What we are saying is that this same kind of gulf exists between the typical “end-user” and those
who are comfortable with the manipulation of technology. Now, obviously we are over-
simplifying and there is a spectrum that runs from technophile to technophobe. The
programmers are at one extreme and chances are that your V.P. is at the other. Even
“manipulation of technology” involves a spectrum. There is manipulation as in productive use
and manipulation as in creating from new ideas.
All of this is simply background for the observation that it’s no wonder that data quality or
technology quality in general doesn’t seem to matter to most people.
We, to whom it does matter, seem unable to grasp the existence of a culture that doesn’t see
things in the same way we do. Even though we can’t see the quality of the observation skills of a
hunter-gatherer or the quality of a particular draft horse or a quilt or the use of light and the brush
strokes of a piece of art, we don’t feel at all inadequate. Now we want to infer that others are
somehow less because they can’t see the quality, not just in a piece of data, but in a data
architecture, a process step or a data profile.
In order for that kind of quality to matter, a person must see the whole. A holistic orientation is
an absolute requirement in order for quality (of any variety) to matter. How much it matters
depends on the scope of the portion of the whole that my holistic perspective is able to
accommodate.
How does one become able to expand his holistic view? Experience enhanced by learning is the
only way to bring more of everything into your personal universe. Another secret: once you
think you have nothing left to learn—you won’t. Beware, though, life has a way of moving you
off of that pedestal and it is frequently unpleasant. It’s only when that happens and you accept
that everyone you meet has some new knowledge or new perspective that will expand your
universe—only then will your understanding of holistic begin to change and expand. Holistic
must be defined in terms of all the stakeholders collectively. To restrict it to the perspective of
only one—even if that one is you—is to guarantee a sub-optimal result.
We have all seen and lived these sub-optimal results. Whenever we trade some aspect of data
quality for hitting a time-box or a budget or for making someone’s life just a little easier, we
have suboptimized. We do this again and again because we focus on the local benefit and get
tunnel vision.
Why do we insist on getting tripped up by this? We have only to look at the benefits to
understand the why. What is the benefit of making someone’s life a little bit easier? If that
someone is you, then the benefit is in the sigh of relief or the “one less thing to worry about” or
the few extra minutes each day.
If it’s someone else who benefits, we get the credit, the accolades, the commendations,
recognition, enhanced reputation, credibility, or simply the chip that we collect and hold to
exchange for some future favor. And what did we have to trade for this? Only something that
no one may ever notice and if they do, it will be well into the future. In fact, we hope to have
moved on to greener pastures by then. It’s an easy trade—a no-brainer.
It takes immense self-discipline not to suboptimize. The choice for suboptimization often isn’t
even apparent and is buried deep within the candy and baked goods and covered in chocolate.
We have seen that suboptimization is the natural outcome of human nature. How, then, can
suboptimization be driven out in the face of the natural, unthinking and self-validating processes
and tendencies that produce it in the first place?
There can be but one answer but that answer, discipline, is itself burdened with connotations of
effort and suffering that seem to make it a non-starter. We all know that we need discipline and
especially self-discipline. If it were easy though, we wouldn’t have an epidemic of tobacco use
or obesity on our hands; the US Army would not have had to water down the physical
conditioning in its basic training programs; and addiction would be a much smaller problem.
We have very few models of discipline that we can hold up as standards or goals. The
organization itself will be accused of hypocrisy since it is clear to all observers that the discipline
it attempts to force on itself through policies and standard practices is only intended to apply to
those who must be controlled. Those policies and procedures are most often used as a club to
force submission under threat of termination of employment as specified in the employee
handbook.
Budgeting processes provide one example. Managers devote considerable time to assembling
and justifying a budget request each year. Even before they get a chance to present their request
they are notified that all requests must be cut x%. Frequently these demands to “trim the fat” are
repeated multiple times. A manager must find ways to get through several rounds of trimming
and still arrive at the endpoint of this process with sufficient funding to deliver on
accountabilities or else risk being seen as ineffective or redundant.
All of this comes in the guise of bottom up budgeting. Let’s see what we need to make the
business successful—that is the motivating tease. You, the manager, are important and it's
important for us to understand what is needed to make your organization productive.
Of course, the CFO has long since created a forecast and all of the repetitive fat trimming of the
“budget process” has been designed to get to the forecast number. In this scenario,
suboptimization is guaranteed. Even if the various department managers had been working
toward some common goal, they will have forsaken those expenditures as extraneous “fat” long
before the end of the process.
Attentive observers will also have noticed that the biggest expenditures tend to bypass the
budgeting process completely.
The corporate world is chock full of examples such as this. Individuals are consistently pitted
against one another—not by design—because of inattention and lack of discipline. We insist on
allowing ourselves to be victimized by arbitrary time lines and budgets.
The most effective defense against the tendency to suboptimize is unrestricted information flow.
When people lack information or believe that they do, they assume a defensive posture. In
defense, they begin to hold back information themselves and then to adopt self-centered
approaches to guarantee their own survival. “How do I know you’re not playing me?” leads to
“Your problems are your problems. I have to make sure I have what I need.”
It begins to look like we will always need to work in a suboptimized environment. This has been
a known problem for decades. The Rand Corporation published a study of suboptimization in
1952 (Rand Corporation, 2012). Even with wide-spread recognition of the problem it continues
to flourish. Given its hardiness, it would be wise for us to learn how to co-exist, minimize
damages, and succeed anyway.
The central, though unspoken, theme of suboptimization is self-interest. Self-interest is the
primordial goo we often hear about in discussion of the origins of life (on Earth). In fact, self-
interest is the basis of evolution and suboptimization is the mechanism.
We shouldn’t wonder then when we encounter it in discussion of data quality. Because
suboptimization will always be with us, our best chance of coping is simply to identify it and
plan around it. Enlightened self-interest is still self-interest but it means softer boundaries and
expanded opportunities.
So much of our lives is expended (wastefully) by attempting to stand in the path of self-interest.
The lesson to be learned is that self-interest is a train and that our only hope is to get out ahead of
it and lay some new track with a switch so that we can divert the train.
A train without track is a train wreck. This is what you will have if you attempt a course change
without laying the groundwork.
When we stand in front of a train believing that we can (and should) divert it from its path the
potential outcomes are limited in number. The inertia of the train is such that re-directing it
while keeping it intact requires a different approach to the application of force. Unless we have
built new track, there are only two outcomes and both are unpleasant.
The most likely outcome is that we become history. Of course it won’t be the history that we
envisioned. It will be a kind of history that will produce the exact opposite of the outcome we
really wanted. For years afterward we will be that someone who tried it and disappeared. We
will have simply become an impediment to change, guaranteeing that the new day we were
seeking is pushed even farther into the future.
The other outcome is a train wreck. We’ve all seen train wrecks in the movies. It’s fascinating
to watch and we marvel that the locomotive is in the river while those in the last car continue to
dine, or converse or nap, oblivious to what is about to happen to them.
A [good] manager never forgets those in the last car even while applying power and then brakes
at the right moments to make optimal progress without going off the track. Those in the last car
hurtle along with no sensation of hurtling. In order to maintain that illusion, the manager must
avoid sudden changes in direction, sudden braking, sudden acceleration—sudden anything.
How do you do that when your most reliable tool is the org chart? If all of the track, both old
and new, is in your box on the org chart you can proceed with some hope of success. But is your
box composed of boxes? Are other managers, other groups, other communities, other interests
involved? If so, the job of surveying and track-laying just got much more complicated.
We begin to see that the organization may not be up to the task. Organization simply lacks any
insight into interests and motives. Those who draw borders on maps often look for guidance
from terrain features such as mountains and rivers, assuming that these natural boundaries will
have divided interests as well. How surprising then that drawing artificial boundaries, even
when they follow natural ones, becomes a motive for strife when cultures/communities with
disparate attitudes and customs are divided and thrown together in new mixes that “aren’t
natural.”
Those divergent cultures may be made to cooperate in order to get the train on the new path but
they will always be resentful of one another, will follow different leaders and will seek ways to
make themselves prosperous even (and maybe especially) if it means taking something from
their neighbors.
An additional opportunity for problems comes from the fact that in the information age, the train
and its tracks may be entirely virtual. “Neighbors” may not have physical proximity.
Communities/Cultures along the virtual track may differ not only in attitudes but also in
language. To a certain extent we acknowledge and accept suboptimization as
inevitable. Then we fall victim to dissociative identity disorder, passing from a pragmatic
acceptance of and appreciation for the difficulties to a self-centered mode in which we fume at
our inability to get what we need.
Later we’ll examine the concept of “what I need” in some detail but for now we leave the subject
of [human] people to resume our exploration of quality itself.
Where Does Quality Come From?
Detection and Remediation
What is the source of quality? There are at least two ways to approach this question. One is to
assume quality and then examine instances of not-quality. This perspective can appear very
productive. This approach is the absolute best one available to a list-person—someone who
derives satisfaction from making lists and then checking things off of the list.
 Detection (Assume quality: find NOT quality)
o Notice (and collect): remediate instances
o Search (and collect): remediate common causes
Detection may take the form of noticing (as mentioned above) or it may be accomplished by
purposeful searching. The end result, a list of not-quality examples, may be the same in both
cases, but there are major differences in the how, and these differences can be leveraged in the
interests of quality improvement.
The noticing method seemingly takes little preparation, although some training, education or
experience is needed to sensitize people to what not-quality looks like and why it’s important to
notice. Without this minimal preparation, most people would see only the most egregious
examples of not-quality.
Searching requires much more in the way of preparation. Not only does the searcher require all
the knowledge that the noticer has, but they also will need to have mastered the searching tools.
It should be apparent that they must also have developed a template to use in the search. They
will look for data examples that don’t fit the template. It follows also that they will need
templates for each kind of data they will be searching. A CUSTOMER template won’t help to
search out not-quality in LABORATORY RESULTING.
In the noticing approach, each time we encounter an instance of not-quality, we add it to the list.
Searchers also keep lists and associate each one with the search template that produced it. We
then scan the lists for instances that must be fixed. The determination as to whether a given
instance of not-quality must be mitigated or remediated is based upon many factors, each of
which may be considered alone or in combination with others. Some factors are
 Do we know how to fix it?
 Who is affected?
 Who else has noticed?
 How much will the fix cost?
 How much cost saving will we derive?
 Can I do it myself?
 How complicated will it be? How much coordination is involved?
 If it is repaired, who will notice? Could this be used as political capital?
 Will I be recognized? Can I get more recognition by mitigating or remediating?
These are only some of the more important factors. The number is limited only by the
imaginations of those involved. Note, too, that these factors may be used at any level in the
organization. It seems obvious that those closest to the problem are in the best position to keep
costs down. BUT this is true only in the case of relatively trivial repairs.
For example, we may have a table column (field/attribute) that is expected to contain integer
(whole/counting) numbers but which has been set up as a character or text field. Our
overworked staff must enter data into many fields of a form and often confuses this one with an
adjacent one which should contain floating point (real/decimal) numbers. Now both contain
obviously incorrect values. Viewers of the form see the data values as a larger construct (e.g.,
vital signs in a medical application) and automatically transpose the values so that they have the
proper meaning. Now, however, we are interested in generating statistical analysis of the values.
Our product is called into question because the results seem to be skewed.
We run some tests and find that a significant portion of our values are greater (or less) than any
naturally occurring value. When we look at the outliers, we see that some are obviously
typographic in nature, but we see evidence of a larger problem as well.
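A simple frequency profile is often enough to surface both problems (a T-SQL sketch; the table and column names are hypothetical):

SELECT value_entered, COUNT(*) AS occurrences
FROM vital_sign
GROUP BY value_entered
ORDER BY COUNT(*) ASC;
-- Rare values surface first: simple typos and systematically transposed
-- fields both stand out long before any statistics are computed.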
It is a source of amazement to consider that the vast majority of the discussion referencing “data
quality” is not about quality data and how to get it but about poor quality and what to do about
it. This is very like trying to decide what kind of lotion to apply to a skin condition. When all
we see is symptom, the best we can hope for in applying topical treatment is to relieve irritation
(itching, for example).
Strategic (Root Cause) Re-engineering
If we want to cure the problem—not just relieve a symptom—we need to know whether the itchy
bumps are the result of mosquitoes, bed bugs, poison ivy, measles, shingles or smallpox.
Treating the wrong cause (or just the symptom) can have disastrous consequences.
We also have to ask whether it is quality we seek to promote or non-quality that we seek to
eliminate. The results, though similar may produce decidedly different cultural perspectives as
we move forward. One path leads to a culture that will tend toward suboptimization, culprit-
seeking, uneasiness, and past-focus. The other path is more likely to lead to future-focus,
community, communication and pride.
 Create quality
o Define quality
o Create processes to produce quality
Our first task, then, is to decide whether our quality glass is half full or half empty. It is difficult
for individuals to change the way they think about quality although organizational thinking may
be affected by the attitudes of opinion leaders. Over time, an organization’s approach to quality
can be changed via attrition and hiring practices, probably as a result of the vision of one or more
leaders.
What do you do if you’re a half-full person in a half-empty organization? You are likely to be
unhappy because you get frustrated with those around you who want to solve the problem for
tomorrow instead of for today. You may find a role in which you can be happy if you can focus
on your current tasking and stick to defined processes and standards.
If you happen to be a half-empty person in a half-full organization you will be frustrated by the
tendency to apply Band-Aid solutions to problems. The focus is on getting through today and
you can’t find anyone who wants to talk about tomorrow. You may be engaged by a particular
task but will often end up frustrated because you want to take it just a little farther. You may
never be able to find satisfaction or happiness in this organization.
If you think this discussion is odd and out of place in a book purporting to be about quality in
general and data quality specifically, then you will probably self-select for the half-full
population. It is true that we don’t ordinarily see these ideas brought up in data quality
discussions, but the whole purpose of this book is to take those discussions in a new and
potentially much more productive direction. We must be willing to grapple with the really big
ideas as well as the really small ones (like invalid customer address).
There is a basic methodology for elimination of non-quality products. These products are often
called poor quality or bad or scrap and the associated cost is often referred to as re-work. Until
Shewhart, Deming and Juran came along and Japanese industry became desperate enough to
abandon the old principles, the way to get a quality product was by means of inspection and
testing.
Either products were individually examined or they were sampled using statistical algorithms to
select subjects. The remedy for products not meeting the quality standard was scrap. For
individually inspected product, failing items would be scrapped, while for sampled products an
entire production run would have to be scrapped or each item in the run would be tested and
selectively scrapped.
Note that inspection costs are much higher for individual inspection while scrap cost is higher for
statistical inspection. Sometimes, where the level of quality was known for a production run, the
product could be diverted to a different market where the standard was lower.
The so-called Information Age meant changes to this paradigm. Very often it was not possible
to scrap product and we had no sliding scale in our quality standard that would allow us to divert
the product to another market, so we had no choice but re-work. Someone had to go back to the
sub-standard information and bring it up to the level of acceptance.
In several informal polls conducted within various kinds of organizations, it has been found that
this rework consumes as much as 60% of the workday for many employees, a huge competitive
opportunity for most companies. This kind of poll is easy to conduct in your own environment.
Just ask workers how much time they spend each day in
 verifying information
 fixing obviously incorrect information
 tracking down missing information
 matching or un-matching records
You will be surprised. In Quality terms this cost is known as re-work.
If you include your I.T. organization in this survey, you may find the cost much higher. Every
large organization today makes use of ETL (the overnight programs that extract data from one
database, match it with data from one or more other databases and load it into yet another
database). When ETL jobs fail because a value couldn’t be converted to a number, the costs can
quickly escalate to big numbers. Sometimes you only get one chance per night/week/month to
run these time-consuming jobs. When they fail to run, your data warehouse is no longer current
and various reports can’t be run or they have holes in them.
A data warehouse also forces choices on us. When we encounter a quality problem, we can
often do some instantaneous re-work to make it acceptable to the warehouse but the intended
value is unknown and lost. In the case of healthcare data, there is usually no tolerance for
amending values as originally recorded. A missing or obviously incorrect value casts doubt on
the whole screen, form, or report.
We are faced with dilemma after dilemma. We can keep the warehouse working and
information flowing if we exclude known bad data. If we do this however, the credibility of the
warehouse as a whole can be irreversibly damaged. “I know we did more than that!” is the
precursor to a series of investigations followed by a schedule of involved and potentially heated
discussions in which you try to explain what happened and why in such a way that the customer
can still see a future for his investment. This particular issue will sometimes recede into the
background as more and more data is accumulated but be assured that someone will eventually
notice.
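Neither choice escapes the credibility problem, but if bad rows must be excluded they should at least be preserved as evidence rather than silently dropped. A hedged sketch of such a quarantine step (T-SQL style; all table, column and variable names are assumptions made for illustration):

-- Rows that cannot be converted go to a quarantine table with a reason;
-- @batch_id identifies tonight's run and is declared elsewhere.
INSERT INTO vitals_quarantine (id, raw_body_temp, reason, load_batch)
SELECT id, body_temp, 'body_temp not convertible to REAL', @batch_id
FROM staging_vitals
WHERE TRY_CAST(body_temp AS REAL) IS NULL;

-- Only convertible rows proceed to the warehouse.
INSERT INTO warehouse_vitals (id, body_temp, load_batch)
SELECT id, TRY_CAST(body_temp AS REAL), @batch_id
FROM staging_vitals
WHERE TRY_CAST(body_temp AS REAL) IS NOT NULL;

The quarantine table is what lets you answer “I know we did more than that!” with facts instead of apologies.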
The alternative is to “massage” the offending values in such a way that they are accepted into the
warehouse. This path is also fraught with peril. “What the heck does this mean?” or “This isn’t
believable!” will also mean investigations followed by meetings in which you try to repair your
own credibility and the credibility of the warehouse itself.
Given these painful scenarios, it is little wonder that the “today”, “half-full” people get excited
about poor quality in the data that they have to manage and look for ways to “fix it” now.
They seldom go upstream in the data process beyond the point where they first encounter the
data. They aren’t stupid or lazy or even ignorant. They simply need to get something fixed
before the next ETL run. Give them credit, too, for considering modification of the ETL logic to
make it more tolerant of errors. The problem here is that it is impossible in most cases to make a
decision that will preserve the credibility of the data. Credibility, like quality, is most easily
defined by its absence. Sufficient intelligence regarding standards of credibility simply does not
exist.
DQ is born in what we call I.T. It is born in an atmosphere of blown expectations, drawing its
first breath when a database programmer says (or mutters), “That shouldn’t be there.” As a
database programmer, I know of few things more frustrating than learning that I can’t make any
assumptions about the consistency of the data I am working with. This is a good time to remark
that “We can’t afford to make error detection part of the system” is just another way of saying
“We’ll take care of it later.” We don’t have much data on the cost of postponing error
detection/correction but there is a large body of evidence that tells us that the earlier in a
system’s life that we find and fix errors, the better off we are.
An entire industry (Computer Aided Software Engineering, 2013) emerged to produce tools that,
because of their graphical nature and the code generation functionality beneath them, allowed
developers to correct discovered issues in the graphical representation (or model) and then
regenerate the system code. This is by far preferable to “patching” the portion of the code in
which the error manifests itself. There are many reasons why this is so but they are beyond the
scope of this book. Research into Software Engineering Economics and Software Engineering
Methodology may be worthwhile but be warned that most, if not all, sources assume a level of
knowledge about system development that is well beyond that of most people practicing in the
field today.
We like to tell ourselves that our customers recognize problems in their reports and analytics as
data quality issues. Here’s another revelation: the stories we tell ourselves are designed for one
purpose only. These stories are created to make ourselves feel better.
We need to believe that others are hearing our message and are empathetic with respect to the
issues we have identified. To believe that makes us feel that we are making progress. If you are
a parent, you may have purchased a few moments of peace from your child by allowing them to
believe that they will get what they want. You may even have fallen for the promise that, “If you
do this one thing for me, I’ll never ask for anything else ever again.” The underlying premise is
the same as the one used in, “It’s better to ask forgiveness than to ask permission.”
We should understand that we may sound just like whining children in our attempts to convince
our customers that the problems they are experiencing are their own fault. “We’ll do our part if
they’ll just do theirs” isn’t convincing and makes no friends—especially when our part is quite
specifically “information management” or “information technology.” In other words it sounds
like we’re asking them to help us do our job.
We are being pacified with agreement and the occasional project that allows us to feel as though
we’re doing something. And why should this be? Why should “the organization” wish to pacify
us in the same way that we pacify our children? They do it for exactly the same reason—to buy
a period of peace. If they’re lucky, we’ll fail to produce dramatic results, thereby damaging our
credibility to the extent that they never have to listen to us again. Of course the problems don’t
go away and soon another champion of data quality shows up to start all over again.
Please understand that we’re talking about a typical scenario here. There are certainly
organizations in which information and its quality are taken seriously from the top on down.
These organizations are the motivation that keeps the rest of us going.
Unfortunately, there isn’t enough history to demonstrate that gains can be held. We also have to
contend with cultural personalization—the fact that solutions which might be effective in one
corporate culture may be, and usually are, completely ineffective in any other corporate culture.
This makes it unlikely that any sort of best practice investigation will produce THE solution.
We can extract useful ideas but the application of those ideas will frequently demand cultural
change. Cultural modification—whether subtle or wholesale—is not something that can be
taken for granted.
Initial (First) Definition of a Data Item
The initial attempt at defining a data item in a given context deserves focused attention for the
following reasons:
 This is the laying of the foundation
 No structure can last without a sound foundation
 This moment is very much like the first nudge of the snowball down the mountainside.
 “Do overs” can be extremely costly
If it’s true that “You never get a second chance to make a first impression” then it may also be
true that you never get a second chance to produce the initial definition. The first definition that
is captured in a database or a software system is of critical importance.
This is so because forever afterward this definition will be the stake in the ground to which
future definitions must be tethered. For example, when someone asks “who is a customer?” or
“what is an order?” or “what is an episode of care?” we should recognize that a moment of truth
is upon us. It’s unlikely that anyone from the business side (as opposed to the technical side)
will ever ask this kind of question but they will ask for an information system that has customer,
order, or episode of care at its core.
The technical side has insisted that the definition of key entities is essential and the business side
has refused to take it seriously. This impasse has resulted in historically poor and always
inadequate definitions. Those who ask for the definitions usually don’t have the business
experience and those with the business experience are unable to grasp the criticality of nuance in
the key entities. Invariably the first fix, enhancement or upgrade is underway when the system is
rolled out for the first time. This happens when business side customers see the system in
development—perhaps as a prototype or maybe as a pilot project—and they recognize that
something important is missing or not right.
They never see and can’t be expected to see that quitting too soon in the definition process is the
reason15. Full disclosure: defining terms is difficult and generally unexciting work and all parties
are ready at the first suggestion to put a bow on the process and move on. The process of initial
definition will always benefit from including the most experienced data person available. This
person will be far less likely to prematurely end the definition process because they will have
experienced the downside. It isn’t easy to know how much is enough but the chances of
finishing the effort with something useful are far greater when experienced data people are
involved.

15 The 1980s and 1990s were the golden age of software development methodologies. We had learned a lot about how to create high quality, low
maintenance systems and there were many CASE (computer-aided software engineering) products available that could be configured to enforce
your methodology of choice. There was a problem though—putting all that quality in was slow. The process was so slow that sometimes the
need for the system had evaporated before anything was ever delivered. At this point a group of well-known methodologists got together and
created the Agile Alliance. The Agile Alliance distilled what were considered to be the most important aspects of all of the methodologies in use
and repackaged them in an iterative (as opposed to a “waterfall”) model that was designed to deliver results quickly and use the end
user/customer for quality control prior to launching the next iteration. The result of this is that a new kind of “quality” was born. Unfortunately
Agile didn’t resolve any of the issues we are dealing with here.
It is always a good idea to draw attention to definition issues when dissatisfaction is expressed or
changes suggested. There is no best way to do this because it depends on the personalities and
culture of the organization. Use the desired result—future support for extended definition
processes—to guide your approach and methods. Regardless of how you approach this it will be
a tough sell. Everyone has more interesting or exciting things to occupy their time.
It has taken decades for this to become clear and even now there is no guarantee that it is
recognized in many contexts today. In days gone by, the programmer was most often the one
who was responsible for the initial definition. They often didn’t realize that this was what they
were doing. Quite often definition consisted of a brief phrase or sometimes a sentence or two
attached to or co-located with a data declaration. A declaration statement is part of almost all
programming languages. It is the means by which the compiler is instructed concerning how to
reserve memory space for values of the data item and what operations the item will be allowed to
participate in. Data declarations might take (for example) any of the following forms depending
on the programming language:
 Data Division
Working Storage Section
<data name> PIC X(20)
 DECL CHAR(20) <data name>
 Type person_name <data name>
 Dim <data name> AS String
This is but a very small sample of the ways in which data is declared in a computer program.
Every language provides for shortcuts and some don’t even require a declaration or will infer the
type (storage format) from the name given to the data item.
Compilers do not require meaningful names for data items. They are quite content to accept any
identifying sequence as long as they know how to allocate memory. In the earliest days of
programming, programmers used meaningless sequences of letters and/or digits to identify a data
item. In fact, in those days programmers actually recognized that they were tagging or labeling a
location in memory. The entire concept of a data item was an abstraction still hidden in the mists
of the future.
What was found at the address represented by the tag? A binary value (a sequence of zeros and
ones representing a number in base-2) could represent anything. In the beginning it was numbers
only. IBM extended BCD (Binary Coded Decimal) to six bits rather than four and by doing
so was able to encode uppercase alpha (for alphabetic) characters as well. By expanding to
eight bits (Extended Binary Coded Decimal Interchange Code or EBCDIC) lower case alpha
characters as well as some special characters could be encoded. The Univac computers from
Sperry Corporation used a 9-bit byte in order to make use of an expanded instruction set.
ASCII (American Standard Code for Information Interchange) is a 7-bit code representing 128
different characters; stored in an 8-bit byte, its extended variants represent 256. This family of
codes is used worldwide today. All of this is interesting but not
critical knowledge. The important concept here is that the Latin letters and the Arabic numerals
that we think of as data are recorded in electronic memory, transported as electronic impulses
and decoded by programs (Logical constructs). These are the grains of sand on which our
quality foundation must be built.
At this point we had the ability to store virtually any kind of data in computer memory. This
wealth of capability naturally came with a price. How could a program differentiate between
different kinds of tagged memory? What if one program stored integer numbers in memory and
then another program retrieved that memory and tried to interpret it as alpha[numeric] values?
The need for data typing was born.
Because memory was severely rationed (due to its cost), programmers developed many tricks to
overload a memory address in such a way that one program could see a computational value
while another could see a character or non-computational value. Some of these tricks are still in
use today although the difference in interpretation and use may be much more subtle.
A programmer may define a data item, usually called a variable, which really represents a tag for
an area of memory. They may insert a comment that this variable will be used to hold values
representing, for example, a person’s body temperature. The assumption is that this program
will interpret the sequence of bits assigned to this tag as a real or floating-point number which is
simply a number with a decimal point in it. The programmer sees no need to go into greater
detail in the definition because all who use this system know that a body temperature will have
one digit to the right of the decimal and that valid body temperatures are those read from a fever
thermometer and for a living human will range between about 75F (23.9C) and 112F (44.4C).
Maybe you can see a problem looming already. How do we know whether the value we find
was expressed in Celsius or Fahrenheit? A human looking at the value would assume based on
known valid ranges. A computer program can’t assume and would need to apply rules present in
the code. What will be the basis for the rules? The definition would be the place to begin the
search.
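As a sketch only, such rules might be encoded as a simple classification (the table and column names are assumptions):

-- Classify each stored value by the only ranges that make sense.
SELECT id,
       body_temp,
       CASE
         WHEN body_temp BETWEEN 23.9 AND 44.4 THEN 'Celsius'
         WHEN body_temp BETWEEN 75.0 AND 112.0 THEN 'Fahrenheit'
         ELSE 'suspect'
       END AS interpretation
FROM vitals;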
This problem is manageable, though applying rules would certainly increase the number of
processor cycles needed and therefore the response time of the program. To complicate things a
bit more, we may find values like 60 or 120. These are not believable as body temperatures.
Could our rules differentiate? What would our program do with such values? How would they
be interpreted? Over time an examination of the data might suggest that the input program was
allowing intended respiration values to be stored in our body temp variable. Maybe we would
even have enough associated information to track these entries to a specific nurse who always
took a patient’s vitals in a certain order and always recorded them in the same order.
What is our defense? Think of the definition as a package, like an egg carton, that helps the
customer to decide what should go into it and also helps in deciding whether what we find in
the container actually belongs there.
The best possible defense is to include in the definition enough specifications to determine
without ambiguity whether a given value should be stored in this variable and, with the same
lack of ambiguity, determine whether a value returned in this variable is valid. But wait, it isn’t
enough to create this specification list, we also have to make it available to future programmers
who might want to modify either the input or the output logic. This is all possible within the
schema definition for a table in any relational database management system. We can specify that
a value is required, that it be a real number, that it be between 75.0 and 112.0 (if Fahrenheit is the
standard) or between 23.9 and 44.4 (if Centigrade). To simplify we can store values in
Fahrenheit units and allow the program to convert to Centigrade at the user’s option.
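A sketch of such a container in standard SQL DDL (the names and the exact ranges are illustrative; the point is that the definition becomes enforceable):

CREATE TABLE vitals (
    patient_id INT NOT NULL,
    reading_at TIMESTAMP NOT NULL,
    body_temp REAL NOT NULL,  -- required and numeric by declaration
    CONSTRAINT body_temp_range
        CHECK (body_temp BETWEEN 75.0 AND 112.0)  -- Fahrenheit as the storage standard
);

With this in place the egg carton rejects anything that is not an egg.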
Oh, yes, and one more thing—we have to build such definition/specification into our quality
control procedures so that we make sure they are applied wherever this data item is used.
When an inadequate definition is associated with a data item, the next person who tries to use it
will have questions. It must be assumed that the person who created the initial definition is
unknown or, if known, no longer available. The questions won’t be answered. The best case
scenario has you tracking down the person responsible for the definition only to find out that you
have already seen everything they accumulated.
The diligent person will analyze the data values in context to deduce or simply infer the answers
needed to fill in the blanks. In many cases the person, believing that they possess sufficient
knowledge and experience and in the interests of convenience or time-savings, will assume the
answers, claiming insight that will later turn out to be unreliable. When we discover that the
values currently in our database are not consistent with the definition published for them, what
more can be said? What do we really know now about the ones that appear to be consistent with
the definition?
The original definition—if it exists at all—will be expanded to cover the new situation. Business
will proceed as usual until someone tasked with creating a report or someone searching for
actionable information in a report suddenly notices that subtotals are incorrect. Minimal
investigation reveals that the headings for subtotals include several different kinds of things, not
all of which are expected. We have uncovered a DQ issue.
We can recognize that the definition we have is inadequate in some way, but what definition was
being used when the unexpected values were enshrined? We have at least two choices in
addition to pretending we never noticed (which is a far more common choice than we might
wish).
We can root out the offending values (this is known as data cleansing), but consequences come
immediately to mind. What do we do with them? Can they simply be deleted? What about
related information which, by the way, also does not fit the original definition as we have
understood it?
The second choice, segregating the offending data values and their associated data to a new data
item, also has consequences. Will they need to be associated with the “real” data? Can we
extend the definition to cover the unexpected values? What effect will that have on existing
reports, forms, etc.?
This kind of problem is routine. Nothing remains the same for long and particularly in the
domain of business. Mergers and acquisitions are a never ending source of DQ issues.
Entrepreneurial initiatives within a company comprise another fertile garden.
It will not be productive to try to fix blame or even to do a root-cause analysis. The cause is
change, whether the change be due to growth (planned or unplanned), market pressures,
technology, or simply opportunity. The real questions require answers that will not come from
I.T. Unfortunately, it is going to be very difficult to get anyone to pay attention long enough to
understand the questions, let alone develop answers that will demand analysis, planning and
decision-making. People just have a blind spot where data and specifically DQ issues are
concerned.
Recasting this as a process issue will result in a much increased level of comfort. People very
naturally understand that there may be a better way to do something. If we can show them what
a quality result looks like and then show them a process by which they can produce such a
quality result, the enemy is on the run. Turning the issue into something tangible will create
longer attention spans and increased tolerance for engagement. Everyone grasps the need to re-
examine processes when a raw material or component is changed. The need for new processes
for new products is apparent. The problem, then, is to make a DQ issue look like a process
issue.
This is the point at which current data quality efforts fail. We have shown ourselves unable to
view quality problems through a process filter. Instead, we bravely step forward and volunteer
to fix the bad data. It will take a strong leader to resist this tendency and get the herd of cats
back into the cage.
When we begin to view data as a raw material, component or product, we will immediately begin
to see opportunities for the quality control (QC) of data. Like any other QC effort, though, we
will find that we can’t anticipate everything. We will be able to provide guidelines that will help
us sort out the unanticipated situation and get us back on track.
The original definition of a datum, set of data, or information object in an organization is
absolutely critical in laying a foundation for quality. A good foundation is not a guarantee of
successful data management but a poor or non-existent foundation is a predictor of problems.
The problems may range from occasional unreliability to macro level failure affecting an entire
organization, market segment, or even a nation.
It may and probably will be impossible to put the genie back in the bottle. Repairing “bad” data,
while it may get us over one more hurdle, is fraught with danger at best and at worst, the process
of repair erases clues about the cause of the problem. Too many times, the repair actually
succeeds in creating even bigger problems that may not be noticed until long after it is too late to
undo the operation.
Knowing that it will be costly and very likely impossible to walk back a problem to its source
when the source has never been documented, we should be highly motivated to spend whatever
time is needed to ensure that we have documented the source of the data and the uses and
relationships in which it is involved.
Why don’t we do this then?
Use of Varchar as Default Type
Once again, let’s remind ourselves that technology is not the cause of data quality issues, but
merely the catalyst that gets the quality issues to manifest themselves. The root cause of all data
quality problems (and quality problems in general) is in the way humans get things done.
Data modeling tools and indeed the database management products themselves began almost
immediately to offer the ability to define default settings for the generation of new data items.
These products began their lives with a more rigorous notion of typing since the designers were
tuned in to some of the needs we have been discussing.
Either just before or just after the initial release of the first products of these types, someone in
sales or an initial customer said, “Wait, this is too limiting. We won’t be able to crank things out
fast enough if we have to think and analyze before generating our database.” OK, no one is
going to admit to having said this or even to having heard this. Nevertheless, it has been said in
nearly this exact form in meetings when it was suggested that a policy of avoiding any use of
default typing be adopted.
It is easy to see how the use of defaults could make a programmer or modeler appear more
productive. Let us look just a little deeper, however, because this choice, like all choices, has
consequences.
The first thing to understand is that the modeler or programmer is never the one who must suffer
the consequences. In this respect, this choice is a good example of suboptimization. The cure is
to inhibit the ability to choose suboptimal strategies by using standard procedures and tracking
their use (quality assurance).
Recalling our discussion of data declarations, you may have noticed that it is possible to create a
data item that will accept any value that is supplied. Knowing that a Long Integer is eight bytes,
Char(8) will hold either a very large number or a sequence of letters and/or digits. The
CHARACTER type accepts any value from 0 to 255 (00000000 to 11111111). CHARACTER is
useful only in the limited instances in which a data item will always have values that can be
represented by a single character.
A string is the most common type of value that we encounter. “Columbus” is a string of length
eight (8) for example. “OH” is a string of length two (2). This type has one distinct limitation.
You must specify its length. How many characters will be needed to store the value of the data
item? Because early programmers were inconvenienced again and again by having to go back
and change the declaration, recompiling the entire program, when new information came to light
that required more characters, they learned to declare the length to be a value that they thought
would be larger than would ever be needed16. The drawback was that the computer always
returned n characters where n is the number of bytes reserved. When the forces of change made
even their most extravagant estimates inadequate, tool developers looked for another and better
way.

16 They had to walk a fine line because they could always be criticized for wasting memory which was quite expensive in those days—but that’s
another story. This is also the source of the infamous Year 2000 or Y2K problem. Programmers allocated two characters for storing the year
portion of a date. This was entirely reasonable in 1959 since who could have guessed that this code would still be in use 41 years hence.
Thus was born the VARCHAR or variable length character string. This data type solved a host of
problems in that a programmer could name a maximum length for the data item and leave it up to
the compiler to ensure that memory would not be wasted. The computer only used memory
enough to contain the contents of the string and kept track of how long it was. It retrieved only
the actual string value and not all of the memory reserved for it. Those who weren’t there cannot
possibly imagine the sigh of relief that ensued.
Almost immediately VARCHAR became the type of choice for a default setting. We had the best
of all possible worlds (or so it was thought in the glow of newfound freedom). It actually took a
bit of time to realize that after-glow and myopic fog have much in common. In truth, were we to
examine the default settings used by our modelers and programmers right this minute, we would
find that upwards of 99.99% are using VARCHAR (n), where n is often 255, as their default. You
may actually want to make a note of any that you find who do not use default settings because
they may be able to explain to you some of the problems you are experiencing (or they haven’t
discovered yet that defaults are a feature).
Recall our discussion of body temperature and its definition. The use of a VARCHAR data type to
store what is actually a real (floating point) number is the root of that problem. If a data item, for
example BODY_TEMP, were declared as REAL (or FLOAT or SINGLE or DOUBLE…), with
appropriate range values established, it would be impossible to store values of “NR” (not
recorded) or 120 or 50. If we made it mandatory as well, we could ensure that our nurses or
clinicians always provided a BODY_TEMP value.
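The contrast is easy to demonstrate with a sketch (hypothetical names, not anyone’s production schema):

-- With the default, anything goes.
CREATE TABLE vitals_loose (body_temp VARCHAR(255));
INSERT INTO vitals_loose VALUES ('98.6');  -- accepted
INSERT INTO vitals_loose VALUES ('NR');    -- accepted
INSERT INTO vitals_loose VALUES ('120');   -- accepted

-- With a real type and a range, the database itself becomes the
-- first line of quality control.
CREATE TABLE vitals_strict (
    body_temp REAL NOT NULL
        CHECK (body_temp BETWEEN 75.0 AND 112.0)
);
INSERT INTO vitals_strict VALUES (98.6);   -- accepted
INSERT INTO vitals_strict VALUES (120);    -- rejected by the CHECK
-- INSERT INTO vitals_strict VALUES ('NR');  would be rejected: not a number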
Why don’t we do these things? The only reason is suboptimization. We shift the cost of error
detection and handling to someone else far removed. We keep the cost of development down
and create a much larger cost that includes distrust, lack of credibility, inconsistency and real
money for some faceless person in some other part of the company.
The fix is to adopt quality assurance processes that assure that the data product won’t just be
thrown into any available container but will be placed into a container designed specifically to
protect it. Be assured that attempts to do this will get a lot of pushback accompanied by wailing
and gnashing of teeth. This is the standard response to any attempt to eliminate suboptimization
and the best way to deal with it is constancy of purpose together with trust. “You’re not going to
be held to old metrics. We’re going to monitor this for a while before we establish new metrics.”
There is an economic or financial tradeoff to be discussed. If we know that we must pay now or
pay later, we very often choose the later option simply because we hope that we’ll be better able
to afford it later or that we’ll be the lucky ones who never experience the breakdown.
The default use of VARCHAR(n) is an excellent illustration of this tradeoff. We trade relatively
instant gratification in the form of a usable system for the certainty of high maintenance costs
and DQ issues later. The tradeoff is especially easy because no one talks about future costs.
Everyone congratulates himself for being part of a successful project, everyone updates their
resume, and a few are promoted. When the costs hit, there is an entirely new cast of characters.
The new cast goes to work without adequate specifications or definitions, making “educated”
guesses about the intention of the original project team. They often change the meaning of
things just enough to make the next maintenance effort even more costly. The result of this
iterative model is an entirely new system after 1-2 maintenance iterations. A side effect is a
substantial amount of data that is useless or nearly so. How do we even begin to calculate the
cost of this?
Record vs Set Operations
You are going to hear the term “de-normalization” or “de-normalize.” Usually it is heard in
discussion of “performance,” by which is actually meant response time. This is the time between
the instant you click the button to get your report and the instant the report appears on your
monitor. Great care must be taken when these terms are being thrown around. They are the
equivalent of arm-waving. A magician will wave his arms in grand gestures to distract the
audience from the small movements that are his bread and butter.
When relational database management was first unveiled to the world at large in the 70s, it
promised a more understandable way of envisioning your data and its organization. It included
three principal operations for retrieving data: project, select and join. Without going into
unneeded detail, these operations in combination came to be known as a query and were
guaranteed to produce a set of data given that some very basic rules were followed when storing
the data.
The rules were collectively called normalization. Their purpose was to guarantee that
 the project, select and join operations could be interpreted by the database engine without
ambiguity
 all the data could be accessed via a query (project/ select/join)
 a set of data could always be returned in response to a query
Normalization is a set of characteristics to be applied to relational data structures. If
normalization is ignored, the resulting structure is not truly relational no matter that the engine
being used for access is a relational DBMS (database management system). If the data structures
are not normalized there is no guarantee that select/project or even SQL will be able to return the
desired data set from the structures (the database). Five normal forms (1st through 5th) have been
defined. 1NF through 3NF ensure that queries (retrievals) will return predicted (and predictable)
results. 4NF and 5NF help keep your data consistent because they eliminate the possibility of
update anomaly by eliminating unneeded redundancy of stored values.17

17 A readable description of the normal forms can be found on Wikipedia (www.wikipedia.com) with a search for “database normalization.”
It isn’t possible to make a database relational just by using a relational language (like SQL) to
query it. A fastener does not become a nail merely because we use a hammer to drive it, and a
nail does not become a screw simply by virtue of driving it with an electric
screwdriver. Nails and screws have different properties and those properties are what make each
a good choice for specific applications.
“De-normalization” has its roots in the typically conservative human response to any change.
We resist and when the resistance begins to no longer make sense, we rationalize to create the
sense we need. When relational data management emerged it was thrust into an already well-
developed and mature data management environment. Many compromises were then introduced
in order to create a better fit into that environment. The more we understand of the principles
that led to relational data management, the better equipped we are to recognize the compromises
and decide when to use them and when to ignore them or even to exert the effort needed to
overcome them.
As data processing advanced from its infancy, the need to store and access ever increasing
volumes of data virtually demanded an increasing reliance on technology. Remember,
technology is about speed. We were doing everything that we now do before the computer was
invented but it took longer and used more people.
The history of programming is dominated by the idea of processing. In fact, data processing was
an early name for what is now known as IT (information technology) or IS (information
systems). Programmers carried the burden of technology application. They eventually became
known by new names like developers, software developers or software engineers. They were
differentiated as systems vs. application programmers. Sometimes they were assembly language
programmers or firmware programmers or PLC (programmable logic controller) programmers.
Programmers by whatever name continue to dominate the technology world though they may
now do it from Mumbai or Singapore.
Programmers Like Process
All of this is a lead-in to the rest of the story which is that programmers are not, as a rule, driven
by data requirements. Oh, sure, they like data requirements, but only because they provide a
convenient escape when fingers are being pointed. It’s not that they ignore data—far from it.
They are sensitive to data because the algorithm must assume input that conforms to the data
specification and the output of the algorithm must be satisfactory to a customer.
Programmers like things that they can control (who doesn’t) and they are focused very narrowly
on the algorithm itself—the process. The psyche and actions of the programmer are similar in
every respect to those of the chef. Both use imagination and creativity to combine specified
ingredients into something that people will appreciate. Do this and this and then do that. It isn’t
a recipe—neither a chef nor a programmer would deign to follow a recipe. An algorithm is
different, though.
A chef (or a painter or sculptor) studies the building blocks of his art which are in the form of
algorithms like flaky crusts, use of fresh herbs, luminous backgrounds, use of various tools, etc.
Then he or she combines the basic algorithms in new ways (creation) to create aesthetically
pleasing results. Success depends on mastery of the basic algorithms and the proper selection of
materials or ingredients. Every now and again, the artist will make use of substandard materials
but will do so in a new way that shows off a hitherto unknown quality. The product may be
acclaimed for that quality despite the sub-standard material.
The programmer’s materials include
 Tools—the various software products including the programming language, libraries of
reusable code, change management products, configuration management products to
keep track of which pieces go with what systems, versions and releases, and others
 Customer Processes—these are generally known as specifications. They often start very
vague (screen resumés) and gradually accumulate greater detail as the programmer
begins to ask questions.
 Development Processes—Usually known as methods or methodologies but sometimes as
standard processes. There is actually a difference but for purposes of this audience we
can assume that methods and standards are equivalent.
 Data—input to and output from the system.
In our example (“screen resumés”), note that we have a verb and a transitive object. The object,
resumés, receives the action of the verb (screen). Humans (and other animals) take note of
action before anything else and may use the action to help define the object. (Helbig,
Steinwender, Graf, & Kiefer, 2010). Programmers, despite occasional counter-indications, are
human and they focus first and foremost on the action or process. This is worth remembering:
humans see the action (process) first.
A programmer (and I have personal experience here) develops algorithms that fit the process and
THEN looks for the data to feed the algorithms. By that point, he is feeling like his job is just
about done. Imagine the surprise when the data feed turns out to be inconsistent! Now the
completion horizon suddenly looks much more distant.
He is faced with two choices:
A. He can solicit assistance to trace the inconsistency back to its source and ensure that the
inconsistency will no longer be allowed to occur.
B. He can add new algorithms to his program designed to recognize inconsistency and
accommodate it somehow.
Try to imagine yourself in this situation. Knowing what “everybody knows” about
communication, cooperation and business structures, which option looks like the simpler (and
therefore easier and less costly) path? That’s the choice that programmers, left to their own
devices, almost always make.
Theoretically, choosing B won’t prohibit us from later (after the deadlines have been met) going
back and implementing A. The only side effect of doing things in the order B then A is that the
system now contains a certain amount of “dead” code that makes maintenance more costly. The
code to accommodate inconsistency will never execute if the inconsistencies no longer exist and
no one will ever notice the few thousandths of a second that it takes to recognize that the
inconsistency is not present.
The real cost comes from the fact that this scenario repeats itself in mind-boggling quantity and
each time it does the result is a significant increase (up to 100% or more) in the size of the
system as measured in LOC (lines of code). Every time we increase LOC we increase errors,
especially because these weren’t designed LOC. Rather they are “finger in the dike” solutions
invented on the spot and often outside the boundaries of any methodology or SOP (standard
operating procedure).
Record vs Set
In the dark ages of system development the system accepted data records on paper cards (or
tape). The data could contain all the information that could be recorded on an 80-column card
carrying one character per column. The invention of magnetic tape input allowed the records to
be arbitrarily long as long as they were separated by a special character. The last record was also
followed by another special character, the EOF (end of file).
The next development was the disk drive which allowed for reading what appeared to be random
records (read record number 11223344 for example). Further advances included new structures
for files so that the records could be stored in smaller segments scattered about the disk (or even
several disks) and still be accessed sequentially or randomly as needed.
Developer/Programmers, creative as they are, found new algorithms to better manipulate the data
in all of these varied formats (because, due to the frugal nature of business folks, they were all
kept around until a “business case” could be made for replacing each one). These algorithms
soon became part of the languages used to write the programs. The languages, therefore, were
almost universally oriented to record-at-time processing.
Now comes a new development that trumped all the media developments (cards, magnetic tape,
disks), structure developments (sequential, ISAM, linked list, hashed…), and the algorithm/language
developments (assembler, Fortran, COBOL, Modula, Ada, C (and all its variants), Smalltalk,
Java…). We no longer needed to process one record at a time. With the advent of the relational
dbms (database management system), not only could we leave the how of accessing data records
to the dbms, we could actually specify via a relational query language (the most widely known of
which is Structured Query Language or SQL) what kind of records we wanted and even which
part of the record.
We could ask for
 The set of order records created since 5:00 PM on July 15 of this year.
 The set of customers for order records created since 5:00 PM on July 15 of this year.
 The set of products for order records created since 5:00 PM on July 15 of this year.
And the really beautiful part—we could manipulate the resulting set in the same way.
We could ask for
 The (zip codes from the Customer_Address table) where customer ID is the same as
customer ID of the (Orders table where Order_Date is greater than 5:00 PM on July 15 of
this year).
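In SQL those requests look something like the following sketch (the table and column names are assumptions, not taken from any real schema, and a concrete date stands in for “July 15 of this year”):

-- the orders themselves
SELECT * FROM Orders
WHERE Order_Date > '2013-07-15 17:00:00';

-- the customers for those orders
SELECT DISTINCT c.*
FROM Customers c
JOIN Orders o ON o.Customer_ID = c.Customer_ID
WHERE o.Order_Date > '2013-07-15 17:00:00';

-- the zip codes from the example above
SELECT DISTINCT a.Zip_Code
FROM Customer_Address a
JOIN Orders o ON o.Customer_ID = a.Customer_ID
WHERE o.Order_Date > '2013-07-15 17:00:00';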
Further, we could change the entire set with a single query! No longer were we forced to plod
through each record of the Orders file checking whether the creation date met our specification
and, if so, writing the record to a new temporary file. Then plod through each record of the
Customer file, comparing the customer ID to the customer IDs in our temporary file and writing
the Zip Code of the matching records to yet another temporary file for eventual reporting. Now
we could tag each of those customers with some marketing indicator (such as “Current” or
“Recent”, etc.) with a single command and let the dbms decide how best to accomplish it. And,
by the way, the dbms has optimization algorithms built into it that far exceed the abilities of the
average programmer.
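The tagging itself is a single statement (same hypothetical names as above); how best to find and change the rows is the dbms’s problem, not ours:

UPDATE Customers
SET Marketing_Status = 'Recent'
WHERE Customer_ID IN (SELECT Customer_ID
                      FROM Orders
                      WHERE Order_Date > '2013-07-15 17:00:00');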
This is a good time to give database administrators a plug. The DBA comes with an
understanding of the internal workings of the dbms and can guide both programmers and
architects to deliver the quickest possible response times. If you are dependent on relational
databases, then you need a DBA. This role will pay for itself very quickly. Remember, too, that
the DBA’s raw materials are memory and processor cycles. With sufficient raw materials a good
DBA can deliver unimagined consistency.
The bottom line is that now we only have to define the set of data we want to work with and it
will be delivered to us.
But what about process?
No sooner was this exciting capability rolled out to the world than requests (no, demands) began
to come in from the programming world. They didn’t trust a black box to do what they already
knew how to do. They were willing to use an rdbms (relational database management system)
but only if they could use it in the same way they were accustomed to.
The concept of a “cursor” was added to enable the programmer to step through a table, just as he
had always stepped through files, in order to examine each row/record to see if it qualified for
further processing. This marked a low point in the history of data management.
To be scrupulously fair, programmers were accustomed to being the “ghost busters” (who’re you
gonna call?) of the technology world which, in the eyes of almost everyone else, was the world
behind the monitor. Maybe the world just didn’t realize yet that, while the processing side of
technology would still be important, the information side was taking over. New paradigms were
involved as well as new knowledge and new skills. Yet another point must be made in the
interest of fairness to programmers. Remember those normalization rules we discussed earlier?
Their sole purpose was to ensure that the data would be accessible using the select/project/join
operations of the relational algebra. Without normalized data, record-at-a-time is the only
option.
We have all heard the maxim that, to someone whose only tool is a hammer, the world begins to
look like a nail. In our story the tool is record-at-a-time processes and all the various algorithms
that went along with it. These unfortunate people were asked to build business structures with a
new tool, the relational database. As we have also heard and doubtless experienced, even the
most advanced tool can be used as a hammer.
Existing data was often simply dumped into relational tables in order to get the conversion
initiative off the ground as quickly as possible. The relational vendors didn’t help matters when
they made it seem that no special knowledge was necessary and they designed and delivered new
types of hammers. It is understandable that executives and sales managers reach for market
share at all times and with any means available to them. Entirely new fields—technology
transition, change management to name two of the best known—arose in part because of the
difficulties experienced here.
Recall that the inventors of relational database delivered a language (or languages, for there were
at one time several competitors18) that were designed specifically for defining sets of data.
Extensions such as cursor and trigger and stored procedure were added at the insistence of
programmers and their use meant that a query expressed in the given language no longer
produced a predictable (that is, consistent) result. In addition, the entire concept of a relational
dbms was predicated on the assumption that the data was to be organized following relational
principles. These principles became known as normalization rules. (Database Normalization,
2013)

18 In fact, SQL emerged as the “winner” not least because it was one of the first languages to incorporate the cursor. It was also receiving heavy
backing from IBM at the same time.
If the tables of a database do not conform at least to 1st normal form (1NF), it may not even be
possible to specify a query that will produce the set we want. The first rule of relational design is
that a relation (or table) may only contain information about one kind of thing. Often a file in
pre-relational days might consist of records that represented a history of some activity. This
allowed the programmer (or the file clerk) to pull a single record (think file folder) about a client
(for example) and thereby have all information available about that client. Medical records were
(and are) a good example of this.
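A sketch of both shapes (hypothetical names): first the file-folder record flattened into a single table, then the reorganization the next paragraph describes.

-- One row per patient, with the visit history flattened into the row.
CREATE TABLE patient_file (
    patient_id INT,
    visit1_date DATE, visit1_diagnosis VARCHAR(40),
    visit2_date DATE, visit2_diagnosis VARCHAR(40),
    visit3_date DATE, visit3_diagnosis VARCHAR(40)  -- and so on
);

-- One kind of thing per table.
CREATE TABLE patient (
    patient_id INT PRIMARY KEY
);
CREATE TABLE visit (
    visit_id INT PRIMARY KEY,
    patient_id INT REFERENCES patient (patient_id),
    visit_date DATE,
    diagnosis VARCHAR(40)
);
-- "All visits since July 15" is now a query instead of a program.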
If files like this are loaded into relations (tables) most queries will require some programming to
get the answer we want. Obviously the “right” thing to do would have been to reorganize the
data, loading it into multiple, related tables with each containing one kind of thing. If we did the
obvious thing, we could enjoy all the benefits of relational database management BUT if I’m a
programmer I’m not getting paid to punt problems off to someone else; in any given
suboptimized organization, that kind of handoff is an extremely rare occurrence. My own
schedule is practically a sure bet to finish first in any competition among priorities.
If I’m a programmer, I may not know enough about normalized relational design to actually be
able to improve things. In many areas of technology a poor effort can be worse than no effort. I
do, however, have at my disposal cursors and stored procedures and triggers that, together, will
let me overcome the data design deficiencies.
It can’t possibly be emphasized enough that
 One of the most important motivations for the creation of the relational model for data
manipulation is to avoid the need to call on a programmer every time you want some
data.
 Use of cursors, stored procedures and triggers guarantees job security for the
programmer.
It saves no time at all to perform a single operation repetitively when it can be done once on an
entire set. For example, in a crowd of people we need to identify those with a valid ticket to
move them to another venue. We could poll each member of the crowd, ask for their ticket,
examine the ticket for validity and then move them, one by one, to the alternate venue.
Or we could ask the crowd to move to the alternate venue if they are holding a ticket that meets a
set of specifications. Which would be your choice? A programmer will see the value of set-at-a-
time in the “real world” but will often fall back on what he knows best in the software
development world.
This is creating problems for you. The problems include
1. Unnecessary complexity which resolves to cost
a. In development
b. In maintenance
c. In communication
2. Inconsistency in the database (we know this as poor data quality)
3. Longer response times (frequently MUCH longer)
One last time now, fairness dictates a reminder at this point. We have done much to encourage
this behavior on the part of our developers. We could have made the effort to reorganize our
data resource (normalization) to deliver set results in all cases but, because of ignorance and
perceived cost, we didn’t (and don’t). We could become a little more knowledgeable about the
entire subject of data management (as you are doing by reading this book) so that we are better
equipped to ask relevant questions and judge whether the answer is smoke or substance. We can
at least recognize that the problems do not have technology causes so that we can insist on
answers that are not couched in techno-speak.
Apply update x to all <your state here> customers
not
If this is a <your state here> customer apply update x
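Rendered in SQL the difference is stark. A sketch (T-SQL style for the cursor; the names and the update itself are hypothetical):

-- Set-at-a-time: one statement; the optimizer decides how.
UPDATE Customers
SET Tax_Region = 'X'
WHERE State = 'OH';

-- Record-at-a-time: the cursor version plods through every row.
DECLARE @id INT, @state CHAR(2);
DECLARE cust_cur CURSOR FOR
    SELECT Customer_ID, State FROM Customers;
OPEN cust_cur;
FETCH NEXT FROM cust_cur INTO @id, @state;
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @state = 'OH'
        UPDATE Customers SET Tax_Region = 'X'
        WHERE Customer_ID = @id;
    FETCH NEXT FROM cust_cur INTO @id, @state;
END;
CLOSE cust_cur;
DEALLOCATE cust_cur;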
Complexity (Taking a Step Back)
Very often we find ourselves up to our eyebrows in some messy activity and it occurs to us to
ask how we got there and, even more to the point, what we are doing there. It may be appropriate
to ask these questions right now.
What is it that we’re after in our quest for Data Quality? What are the characteristics that will
tell us that we’ve arrived?
A partial list might include:
 ETL (extract-transform-load) jobs do not fail because of an unexpected data value
 Mass mailings result in fewer than x% returned pieces
 Source and validity (including completeness and correctness) of specific data are auditable
and audits identify no issues
 The purpose of a report is clearly identified and data included is credible for that purpose
 Duplicate customers are always identified and merged within x hours of initial capture
 The age of any data is readily available and most recent values are always displayed
unless a history is specifically requested
 Master data values are consistent (conformed) wherever used.
 All data values are consistently represented
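Several of these characteristics can be checked mechanically. The duplicate-customer item, for instance, begins with a query on the order of the following sketch (the matching rule and the names are assumptions; real duplicate matching is usually much fuzzier):

-- Customers sharing an e-mail address are candidate duplicates.
SELECT LOWER(email) AS match_key, COUNT(*) AS copies
FROM Customers
GROUP BY LOWER(email)
HAVING COUNT(*) > 1;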
What role does complexity play in each of these problems? Complexity is unavoidable in any
cooperative venture and the more participants taking part in the venture or the more ventures we
take on, the faster complexity increases. If I want to go to a movie, I probably have a particular
one in mind and I simply have to find a theater where it is playing.
As soon as a second person is involved, each decision requires discussion and resolution and
each answer becomes—not the best answer but the best compromise. Many factors now become
part of the compromise until I may find myself buying a ticket for something I no longer care
about. In the world of information systems there is often a gulf between
 what I want and what we need
 what we need and what we can afford
 what we need and what is possible
These can be difficult negotiations at best but if we allow unnecessary complexity into the party,
everything becomes drudgery and good decisions may be impossible to achieve.
The best leaders are the best decision makers. They don’t get distracted by someone else’s
complexity. Please be aware as well that each party to a new system has their own concerns and
that sometimes the concerns involve complexity. Each party must be allowed to say, “I can’t
deliver that.” It may be that we need to find another party who is able to deliver it or it may be
that we have asked for something that is not possible within the constraints that have been
defined. Caveat emptor! The buyer must beware because a vendor will often put themselves at
risk in order to get the sale. Previously satisfied customers can cast no light on your purchase
unless you are exactly like those customers and are buying exactly the same product.
Information systems are not the same as nails, nuts and bolts, or laundry detergent. The buyer
can never afford to relax.
In the world in which we operate there are many circumstances in which one party has enough
leverage to compel another to agree to something that is impossible. It may happen that a party
discovers after agreeing that what they have agreed to is impossible.
All parties owe it to each other to be as clear as humanly possible about their goals and, because
ambiguity is a given, to answer questions about contexts, constraints and goals until the other
parties are satisfied.
Through all of this, the best answer to complexity is going to be consistency. Inconsistency
immediately causes doubt and multiple inconsistencies will cause a complete breakdown of trust
with eventual failure. Every authority on Quality tells us that Consistency is the only starting
point for Improvement. If we can’t make a process consistent, then we need a new process.
Consistency. Consistency. Consistency. Consistency.
Now you may say that what has been outlined here is an ideal and is not attainable in “my”
world. If that thought did cross your mind then here are some things to think about as we
conclude our survey of how we got to this point.
1. No single thing that has been suggested so far is unrealistic or unattainable.
2. Nothing suggested is wrong.
3. Implementing any of the suggestions is liable to be difficult.
4. Implementing all of the suggestions may not be feasible.
5. Absolute quality requires full implementation of all suggestions.
6. You must choose and prioritize, recognizing the quality compromises you are making.
7. The quality you end up with is what you chose.
8. #7 is true no matter what, but at least now the choices can be conscious ones.
Bridge
If we want to truly progress, the ability to let the past be past is a very useful thing to work on.
We can’t make improvements without being present in the now. Dr. Deming compared
management based on historical data to driving a car by looking only in the rearview mirror. If
you don’t know where you’re going, it really doesn’t matter where you’ve been. The
management problem is the definition of the goal.
That said, and taking quality improvement methods as a model, effective improvement requires
that we know what has happened in the past. That data is what we use to establish control
limits for our process. Control limits, like the lines on a highway, are what enable us to advance
with assurance. Once we have established those limits we can forget about the past and focus
entirely on the present via our Key Process Indicators. We continue to record data because the
control limits are constantly being revised. Eventually the limits will take us to our goal.
Nothing in life or in business is quite as simple as driving down a highway; even so, we have
reached the point where we can entrust travel on public roadways to a computer. We need the
metrics in order to know that the changes we implement are, in fact, improvements. The
computer driving an automobile is constantly engaged in testing trends. “Am I getting closer to
the boundary?” “Am I getting farther from the boundary?” “Are obstacles present?” “Is the
obstacle moving?” “How much time do I have to make a decision?” In any quality process (one
that is operating within established limits) there must be a great deal of information available
about the process itself. The good news is that this information is simple data and can nearly
always be monitored by a machine (a logical machine). Remember “simple data, simple
decisions?” We can even ask our controller logic to initiate simple corrective action to help keep
the process on track.
Do you have metrics describing your past performance? Can the metrics be associated with a
specific process? When someone shows you numbers indicating dramatic improvement (or
degradation) in productivity or in cost control or profitability, what is your first question? The
probability is high that you will ask one or more of the following:
• What changed?
• How did that happen?
Or simply
• Why?
The astute manager might ask why these numbers weren’t on the radar. If they had been on the
radar as an expectation, the conversation would be entirely different. Or
maybe your organization doesn’t have that kind of radar. Maybe no one would commit to an
expected improvement for fear of being wrong, or to an expected degradation for fear of being
right.
The new and improved, six-sigma19 organization will have radar. Suboptimization will be
recognized. Information will be freely available and no agenda will be hidden. When someone
hypothesizes about a process change, anticipated results will become key process indicators
(KPI) and we will be alert to unexpected trending from the moment of change. In the SPC
world, three successive measurements moving in the same direction are evidence of a trend. Our
processes will be designed such that KPIs are defined for real-time action, for human action and
for financial action.
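A minimal sketch of such a monitor (Python; the KPI readings and the data are invented) shows how little machinery is needed to compute control limits from history and to apply the three-successive-moves trend rule:

from statistics import mean, stdev

history = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1]  # past KPI readings (invented)
center = mean(history)
sigma = stdev(history)
ucl = center + 3 * sigma  # upper control limit
lcl = center - 3 * sigma  # lower control limit

def check(readings):
    """Flag points beyond the control limits and any run of three
    successive moves in the same direction (the SPC trend rule)."""
    alerts = []
    for i, x in enumerate(readings):
        if not (lcl <= x <= ucl):
            alerts.append((i, "beyond control limits"))
    for i in range(3, len(readings)):
        diffs = [readings[j] - readings[j - 1] for j in range(i - 2, i + 1)]
        if all(d > 0 for d in diffs) or all(d < 0 for d in diffs):
            alerts.append((i, "three successive moves in one direction"))
    return alerts

print(check([10.0, 10.1, 10.3, 10.4, 11.9]))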
An example might be sandpaper production with the customer being a large automobile
manufacturer. We will have processes (and KPIs) that will be monitored by the production
control system so that it can take immediate action to correct a trend toward diminished
reflectance, a predictor of cut or how well the product will perform in its intended application. If
the process control system has to make too many such adjustments or if it can’t bring the
reflectance KPI back into control, the human processes will make adjustments to direct the
product away from the large auto manufacturer and toward a more tolerant customer, the home
handyman. If too much product is diverted from the premium market, then fiscal adjustments,
possibly requiring new material sources, or retooling of the production line might be necessary.
It should be clear that the sooner we can identify a problem or even a potential problem, the less
it will cost. It should also be clear that data is what makes all of this possible. We can’t
emphasize too much that credible, reliable, consistent—what we call quality—data is going to be
absolutely essential in making these improvements work.
Because of this recognition, we will want to make improving and guaranteeing the quality of our
data the first priority. Be warned though, that the executive management (or owners) of your
business are not thinking about data or even information. They are going to be focused on costs
and profits. When talking with these folks, you will get farther faster by avoiding any reference
to data altogether. Be sure to frame everything in terms of costs and bottom line. You may
benefit from incorporating organizational alignment, consistency and reliability but above all,
remember that this is a sales presentation. There is no need to show how the product is made.
Make them ask and only answer the questions that are asked. You can make the sale without
referring to data at all but be prepared to deliver your product.
It should have also become clear that throughout our discussion of quality the exploration of
each path led to people, their attitudes, intentions and expectations. It is not you and the
organization; it is you and the people in the organization. If you want to have lunch and kick
around some ideas, the org chart is a serviceable tool. If you want a roadmap that can give you
some assurance that everything is covered, the org chart is a serviceable tool. If you want to feel
important, the org chart is a serviceable tool. If, however, you want to lay track in new territory
and meet time and budget constraints, you had better focus on relationship with the inhabitants of
the communities that will be affected.
There is good news! The next section will tell you everything you need to know.
19 Note that this term is used generically to indicate a high-functioning, quality organization and not in reference to the
trademarked training and certification programs. Six standard deviations, represented in statistics as σ (sigma), from the mean of
a process whose performance can be represented as a bell-shaped curve will include virtually 100% of results. This process is the
ideal of consistency.
Current and On-going Status of Data Management
When you become lost or disoriented, the best first step is to sit down where you are and take an
inventory. This is a MacGyver approach. MacGyver was a TV phenomenon in the late 1980s.
He was known for using materials at hand to escape from seemingly inescapable scenarios. The
scenarios often involved bombs with timers counting down the minutes and seconds to doom.
Our situation may not be as deadly but the simple methodology of understanding what you have
and how you can use those things to construct what you don’t have is a useful one in real life.
Survival experts advise
• Don’t panic
• Stay where you are unless you are sure you can make your situation safer
• Use what you have at hand to improve your situation
• Prepare for rescue
• Prepare to stay alive until help arrives
In our history, we learned to recognize some of the perils that have forced us into the current
situation. We reviewed some approaches to staying alive by harvesting low-hanging fruit if
necessary and planting crops if possible. We learned the names of some of the “beasts” that we
have to look out for—inconsistency and suboptimization.
Now it’s time to learn some techniques, not only for staying alive but for finding our way back to
civilization. Recall the example of the orchestra, where rehearsal is the key to beautiful
alignment and individual mastery is the ticket for admission. This book does not represent
individual mastery in any area. It is full of signposts and trail markers pointing the way to the
areas that must be mastered. It’s too much to hope that you will find or become the one person
who has mastered all of the instruments. You will probably need several different virtuosos but
you must remember that rehearsal together is essential. Such rehearsal must be part of the job
description for each.
Setting Course
Here are some roadblocks that have already been encountered by others in your position. The
more we know about roadblocks, the better we can overcome them.
1. Corporations are spending a rather large portion of their budget on data quality
mitigation—it just isn't recognized as such.
2. A substantial portion of the FTEs within your company (which is just like any other in
this respect) owe their jobs to the need for quality information—once they realize
that you mean to move them to the excess pool, they may not be supportive of your
efforts.
3. We fail to take action that will reduce "problem" quality because it costs too much even
though the actions are well understood and within the grasp of even junior employees.
4. We can't present a value statement that can be implemented unless it relates to a very
specific subset of the data resource.
5. We create new data quality issues every day by not doing what we ourselves are capable
of doing and then attempting to make others accountable.
When your vision for your organization explicitly includes quality data, you already have 80% of
what you need. W. Edwards Deming is famous for many reasons but one of his contributions is
useful no matter where you stand on Statistical Process Control. He gave us the famous 14 Points
which, like Covey’s 7 Habits, are a worthwhile addition to any office wall.
14 Points (W. Edwards Deming)
1. Create constancy of purpose for improvement of product and services
2. Adopt the new philosophy
3. Cease dependence on mass inspection
4. End the practice of awarding business on price tag alone
5. Constantly and forever improve the systems of production and services
6. Institute modern methods of training on the job
7. Institute modern methods of supervision and leadership
8. Drive out fear
9. Break down barriers between departments
10. Eliminate numerical goals for the work force
11. Eliminate work standards and numerical quotas
12. Remove barriers to pride of workmanship
13. Institute a vigorous program of education and training for everyone
14. Create a structure in top management that will push every day on the above 13 points.
Although Deming himself elaborates on the 14 Points in (Deming, 1982), let’s consider each of
the Points from the standpoint of data and its management.
1. Constancy of Purpose
A read of Total Quality Control for Management, (Nemoto, 1987), shows very clearly the result
of adherence to these principles. The question is whether you, within the business culture of the
United States (or Great Britain, Germany, France, China…) in 201x, can generate the same
constancy of purpose.
Let’s be clear that we are discussing a transformation here. Nothing less will get the results we
require. It takes 21-28 days to form a new habit according to PsyBlog (How Long To Form A
Habit?, 2009) and the attention span of our culture seems to be growing shorter rather than
longer, so this is no small thing. Constancy of Purpose is what drives all of the other principles.
Constancy of Purpose isn’t a completely unfamiliar concept. It is very similar to Vision and
Mission. A “man on a mission” has constancy of purpose that lasts until the mission is
accomplished. A person with a vision may sustain constancy of purpose driven by the vision for
a lifetime. Constancy of purpose isn’t the same as single-mindedness, which won’t allow us to
address anything else until that focus has been resolved. Constancy of Purpose can be goal-
driven or it can be method-/process-driven. It’s best to allow the method to emerge but
sometimes we know in advance the method that must be used.
If our purpose is to stand at the summit of Everest, many intermediate goals suggest themselves
and we will have a choice of methods up to a certain point. From that point onward, however,
we know that putting one foot in front of the other over and over again is the only method.
If our purpose is to walk across the Golden Gate Bridge, our methods will depend on
• Where we are now
• Whether there is a target date
• Financial resources available
• Physical resources available
Our purpose is to have demonstrably credible, reliable, timely information where it is needed
within our organization. Again, we must be clear that this isn’t simply a nice thing that will give
us an advantage over certain of our competitors. This is a matter of survival—or at least
profitability which is closely related to survival.
2. Adopt the New Philosophy
This one sounds pretty simple—trivial even—until we realize that Principle 1, constancy of
purpose, must also be satisfied. In that light, adopt takes on a new meaning. Adopting the new
philosophy will require that we internalize all 14 Principles and use them to guide all decisions
within the context of our enterprise.
It’s all in or fold. There is no in between. (Nemoto, 1987) is a good resource as a case study in
what adopt the new philosophy means.
3. Cease dependence on mass inspection
Most businesses have gone to a sampling methodology for any inspections. They may believe
that they have already implemented this principle. They would be mistaken. As Shewhart,
Deming and Juran have claimed and Toyota and others have demonstrated, inspection in any
form or quantity is practically (meaning pragmatically) useless in terms of quality assurance.
Control of quality is accomplished by monitoring the key process steps. When they go out of
control (meaning beyond computed limits), quality of output is de facto out of control.
4. End the practice of awarding business on price tag alone
The only reason that this is still done is that people lose sight of their product and its quality and
focus exclusively on short-term dollar-cost metrics. It should be clear that sub-standard raw
material (input data) will almost invariably result in sub-standard product.
Whenever we shop for a new information system, those who understand ALL the eventual costs
are not part of the decision path. Speaking from multiple experiences, one of two things will
happen:
1. Decision makers will become emotionally attached to a particular solution much as new
car buyers get attached to the red convertible.
2. The team gets tired of the activity of evaluating options and takes the easy path with a
known vendor or simply chooses the option with the lowest initial cost.
The way out of this situation (Point 4) is to have data people on the point. Once data-
knowledgeable people have endorsed a solution (and only then), it can be added to the market
basket for consideration by those who will actually use the new system.
When outside influences (politics, hidden agendas, suboptimization…) dictate that the low bid
must be used, quality issues, whether of internal or external product, become irrelevant.
5. Constantly and forever improve the systems of production and services
This principle recognizes that product is the outcome of production and services and that
becoming single-minded concerning the product while ignoring the system of production and the
system of service cannot result in the outcome we want.
Further, we must acknowledge that quality is a moving target, requiring constant effort aimed at
improving the level of quality that we have now. Last, but often overlooked, we must ensure that
the improvements we implement are permanent.
In order to accomplish this principle we have to make certain that the improvements do not
depend on a specific leader (who may be here today and gone tomorrow), but rather become part
of the culture so that new employees understand that nothing less is acceptable.
6. Institute modern methods of training on the job
The way to get cultural change is to have new employees observe and participate in the quality
culture until they have internalized it. This is best accomplished by having them mentored by
those who have already bought in. This concept is well known but frequently ignored in the rush
to full productivity. Ignoring this always carries a cost.
7. Institute modern methods of supervision and leadership
The implication of this principle is that we should examine our existing methods of supervision
and leadership to make certain they are in tune with these principles. If our current methods
promote or condone suboptimization, for example, we need new “modern” methods.
8. Drive out fear
“The beatings [or layoffs] will continue until morale improves!” Fear has no place in the
workplace. People (and processes) perform best and most consistently when there is intrinsic
motivation in the form of (for example) feelings of competence, pride of workmanship, positive
feedback, or immediate feedback of any kind, etc.
In particular, we do not get quality as a byproduct of fear. Fear is often the result of inadequate
specification or inadequate training. Uncertainty generates fear and anxiety while confidence
born of mastery and fostered by good training and good processes produces consistent results.
Remember that consistent results are the only basis for improvement.
9. Break down barriers between departments
This is perhaps the most effective way to root out suboptimization. When people see themselves
as receiving inputs tossed at them over a wall, or as throwing their outputs over a wall to others,
they feel free to set their process up to make their own lives easier. When fear is present, they feel
compelled to do so. Feelings of isolation encourage all of the negative behaviors we need to
eradicate.
Amazing cost reductions and quality improvements can arise simply by having representatives of
supplier departments sit down with representatives of customer departments. In general,
whenever we are experiencing process issues without clear root causes, the first place to look is
at the boundaries. Just as in world politics, the boundaries between states (or processes), also
known as the frontier, are the areas of least confidence and most anxiety.
These barriers are obstacles to relationship.
10. Eliminate numerical goals for the work force
What’s the difference between “128 days without a lost time accident” and “No lost time
accidents!”? One is a metric and the other is a goal. Both are useful. How about “40 widgets
per hour”? How about this: “Your hourly pay rate is based on production of 40 widgets. When
(if) your production deviates from 40/hr, your pay rate will be adjusted in proportion to your
hourly production.”?
In general, people like keeping score so that they can improve on their best score. As Deming’s
Red Bead Experiment (CITEC, 2012) shows clearly, exhortations about quotas will produce only
anxiety, stress and fear. It is management’s job to make certain that the processes in use can
produce the desired quota and then let the innate desire to improve take care of the rest.
11. Eliminate work standards and numerical quotas
De facto work standards (expected output) and quotas (specified output) are in place virtually
everywhere. If, for example, we have an annual performance review for all employees and one
of the review items is “productivity”, then we do have a work standard. In most cases the
standard is undefined and left to the discretion of the reviewer. This is the absolute worst kind of
standard. It completely ignores work processes as a factor and leaves the employee vulnerable to
the current whim of the boss.
Some bosses are fair and just and some are not. The real problem is that the process is no longer
under scrutiny. The principle attempts to put the focus back where it belongs.
If we have one employee who consistently outperforms all others by a substantial margin, what
should we do? If we like the employee, we may see to it that they get a better (different) job—
usually called a promotion. Maybe we give them an increase in compensation or some non-
financial recognition. Whatever we do is probably embedded in the company culture. What
these principles try to do is to make certain that whatever else we may do, we first understand
how this performance is accomplished.
Whatever part of the outstanding performance can be incorporated into the work process, should
be. That which can’t be passed on should be noted for improvement to the hiring processes.
12. Remove barriers to pride of workmanship
Many people find satisfaction in surveying their accomplishments at the end of the day or at
some interval during the day. As part of our measurement, we should find a way to enable and
encourage this. If we can also find a way to allow the employee to compare himself to others
doing the same job, we will be better off. An employee must be able to identify with her work
and take pride in it.
Company, department and work group measurements are necessary but no more necessary than
individual ones. It is critical though, that the individual measurements are only for use by that
individual. When bosses compare measurements across a work group or department, the values
must never be identified with individuals. The workers will know if they are near the top, near the bottom or in
the safe middle.
As processes improve, you will see a rising tide effect that raises all ships.
13. Institute a vigorous program of education and training for everyone
Every employee must become immersed in the new culture. You will need several different
curricula so that each person sees and hears that part of the culture in which he works. For
example, workers and supervisors do not benefit from knowing about cost coefficients. They
(and you) do benefit when they understand a process that uniformly produces the result that is
desired.
14. Create a structure in top management that will push every day on these 13 points.
When all of management has bought into the new culture (including these Points), the new
culture has been born. A new language will emerge in the boardroom, in internal
communications, in lunch table conversation…
Managers will notice that it is easier to communicate ideas with subordinates. People will begin
to exchange data on purpose and they will speak in terms of measurements. New ideas will be
tested by measurement and only the best will survive, regardless of their source. New things will
assume critical importance while the “facts” that we formerly considered important will vanish
altogether.
Please recognize that it is not possible to bring into control processes that are not defined. Most
managers and business owners would stand before a group of peers and maintain that they know
what the processes are for which they are accountable. They would almost certainly be wrong.
Unless you have instituted audit procedures that consistently produce process metrics that
accurately reflect production experience, you simply cannot say that you know what your
processes are.
Abstraction
The reason things get difficult for many people is something that should make things easier. If
you have studied philosophy at all you will have heard of Idealism. In layman’s terms,
our world—at least the part we have built—is constructed of ideas. The natural world may be
less malleable. Even Kant would have run from a hungry tiger.
The notion of abstraction is in such widespread use that most will not even be aware of it. Part
of being human is the ability to use abstraction to reduce complexity.
We all should become a little more comfortable with various kinds of abstraction. The discussion
below names some common abstractions; for each of them we could name (or picture) several
instances of the abstraction.
Abstraction is both an idea and a tool. When we ask what things have in common we are
creating a collection of things. The commonality is the unifying idea or theme. A collection is an
abstraction. When we actually build the collection each item that we add is an instance of the
unifying idea for the collection. When we ask, “What do these things have in common?” we are
engaging in the process of abstracting. Abstraction is an extremely useful tool in problem
analysis and problem solving.
It is often difficult to discern the difference between the abstraction or idea and an instance of
that abstraction. For example, someone holding a $5 bill (or note) asks “What is this?” and we
answer, “It’s money.” We would be exactly correct if we answered that it is an instance of
money. Money is instantiated in so many forms that it represents a real problem for most people.
If we put on a table a $5 bill, a postal money order, a stock certificate, a bank statement, and an
active insurance policy and asked, “What do these have in common?” we might wait a long time
before anyone recognized that they are all instances of money.
Similarly, if we laid on the table a grocery list, a spreadsheet, a customer list, a catalog, a USB
drive, and a graph/chart and asked the same question we might have to ask a hundred people
before any recognized that they are all data. We might never hear that they are all instances of
data.
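Programmers make this distinction explicit every day. In a toy sketch (Python; the Money class and its values are invented), the abstraction is the class and each concrete thing on the table is an instance of it:

class Money:
    """The abstraction: anything that stores or conveys monetary value."""
    def __init__(self, value):
        self.value = value

# Instances of the abstraction, in very different physical forms.
five_dollar_bill = Money(5.00)
postal_money_order = Money(100.00)
stock_certificate = Money(2500.00)

for instance in (five_dollar_bill, postal_money_order, stock_certificate):
    print(type(instance).__name__, instance.value)  # each one "is money"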
The concept of instance goes hand-in-hand with abstraction. We can all empathize with former
Justice Potter Stewart of the Supreme Court who, in attempting to define “obscene” in 1964,
wrote, "I shall not today attempt further to define the kinds of material I understand to be
embraced…[b]ut I know it when I see it…" What he was saying is that we can recognize
instances without fully understanding the abstraction. Note that he apparently was engaged in
listing various instances in an attempt to get to the abstraction.
When it comes to data, this is what we all do. We can all recognize what we have before us as
data without recognizing that it is an instance of data. By now you are probably thinking that
this particular discussion has gone far enough. While appreciating the confusion and
bewilderment that discussions like this one create in the casual or neophyte observer (which is a
group that includes almost everyone), it is still important to drive home the idea of data as
abstraction.
Just as there are rules or algorithms about proper handling of money that do not depend on the
instance before us, there are rules about managing data that transcend any instance. These rules
need not be understood by everyone but, if data is important to us in the same way that money is
important, we need to find someone who does understand them and include that person in
decision-making.
Do you know how the big change you are contemplating will affect your money? You will make
sure you do understand the potential impact before you launch the change. Do you also know
how your data will be affected? You might want to discover those potential impacts as well.
Understanding Data and an Introduction to Relationship
If you’re reading this you almost certainly have at least heard about something called data
modeling.
The sound of this—modeling—is attractive. We understand modeling and its uses. We model to
create something that looks like reality but is manageable because a model need only include the
essential. We model in the mathematical sense to give us a way to experiment—to see what
would happen if… We model to gain an understanding of how something is constructed. Each
of these motivations as well as others more attuned to the personal needs of the modeler can be
useful approaches where data is concerned.
This section on understanding data will make use of models and modeling notation as a means of
clarifying some concepts and improving communication. Fear not! If you aren’t an accomplished
modeler, you’ll learn everything you need to know as we proceed.
Along the way we’ll address some widespread misconceptions, fill in some missing pieces and
bring all readers to a common understanding of the whys and the hows of modeling data.
Relationship Concepts
Len Silverston and Paul Agnew have written a book entitled The Data Model Resource Book
Volume 3: Universal Patterns for Data Modeling (Silverston, 2008), which is self-explanatory if
you happen to be a data modeler.
If you aren’t a data modeler, please bear with me because I have something for you in return for
your patience.
This is an excellent book, complete and thorough. I bought the book (only an author can really
appreciate the commitment this represents) and eagerly looked through the chapter titles. What I
was looking for was a chapter on relationships.
I have been of the opinion for about 20 years and have been talking to anyone who will listen for
about 15 years about the role of relationships in ER (which stands for entity-relationship)
modeling, more commonly known as data modeling. I started this book in 2006 (it is now 2013)
but it simply wasn’t ready to be written yet.
In 2012, Danette McGilvray (Granite Falls Consulting) presented a “Meet the Expert” session for
the IAIDQ (International Association for Data and Information Quality). She did a great
presentation on 12 dimensions of data quality based on her recently published book, Executing
Data Quality Projects (McGilvray, 2008). I had submitted a question asking whether the 12
dimensions applied to relationships as well as entities and their attributes. Danette gave the
answer that I expected—she had not considered relationships.
The last and best motivation was the short shrift delivered by the Universal Patterns book. Now,
I know that relationships are different than entities and I’m willing to let Len and Paul and
Danette off the incompleteness hook (I’m sure they’re relieved) but not until I publicly
acknowledge their role in motivating the work you are engaged in reading (actually Book 3
through Book 6).
Now, for you “normal” people (non-data people), this chapter will offer insights into
relationships in general. You might deduce from the name “relational database” that relations
play a significant part. We know that relationships play a critical part in our “real-world” lives as
well, so this is one of those serendipitous times when we can improve our understanding of our
organizational (work) life as well as our personal life.
I find that it is not possible to talk about relationships in the data design sense without diving
fairly deeply into the mechanics of relationships in general. So, my offering to you is that if you
are “working on” a relationship with someone or if someone is insisting that you “work on” a
relationship with them, you will find concepts here that will be both useful and productive.
So, if your interest is in gaining some additional insight concerning your interactions with
another person or you are struggling with the organization of information (data), this is for you.
For the more technology-oriented, a short (13 slides) slide deck is available for download.
The word, interactions, was used above as a poor substitute for relationships. Because I didn’t
want to overuse relationship, I chose a word that means much less. Relationship as a concept
has a power that can’t be easily replaced. Microsoft, in the thesaurus built into Office, suggests
the following as possible replacements for relationship.
[list of thesaurus suggestions not reproduced]
None of these expresses what we mean by the word relationship. None even come close. At the
risk of alienating half of my potential readers: if there is one thing that is the essence of the
difference between female and male thinking, it is the recognition of relationship as something
real, something that can be “worked on.” Sisters, I’m going to do my best here to make
relationships real for everyone. Brothers, trust me, this is going to help you.
We all want improved relationships with our loved ones, friends, and coworkers. We all need to
organize the flood of information in our world—some of us do it for a living. In either case,
relationships are the key and understanding what they are and how they work is critical for us to
feel, and be, successful.
If you are looking for a deeper understanding of entity-relationship modeling because that’s how
you earn your living, you won’t mind the inter-personal parts because you also have inter-
personal relationships in your life and they always need deeper understanding and attention.
If you are interested only in the inter-personal aspect, but are deeply and genuinely interested in
understanding and improving, then you will benefit from the more “technical” treatment because
it will give you additional tools that you may use as you “work on” your relationship. I also
promise to warn you if the discussion might go beyond relationships and into purely technical
issues.
Understanding the relationships around us is vital, whether you are in workplace politics or a
romance.
If we work in the world of business and information technology and we want processes and
systems that continue to function even when a new department pops up or an old one disappears,
then we need to go deeper into the underlying relationships and avoid being distracted by
traditions or personalities.
The stability in our lives comes from the core—our relationships. Your author and expert asks
that you grant him credibility in the area of relationships based upon the following:
• 44+ years of marriage to one woman
• more than 30 years of experience with relational database
• training as a group leader with Marriage Enrichment, Inc.
• more than 30 years of experience with entity-relationship modeling and analysis
• training with Befriender Ministry and Stephen Ministry
• author of “A Philosophy of Data Modeling”, Database Programming & Design, 1988
• many presentations on modeling for meaning
• work with 3M, Mayo Clinic, Unisys Defense Systems, BNSF, Northwest Airlines and several
smaller corporations in multiple industries involving every aspect of entity-relationship
modeling
If you are a woman, you’re probably thinking that there is no way a man could ever have a clue
about relationships. If you are a man, you may be thinking that you should have picked up a
different book.
Lesson one, then, is that both men and women can be wrong about relationships.
For those interested in modeling data, you have been creating “relational” models with minimal,
misleading and for the most part useless information about relationships. Stick with me and
learn how to create models that are intuitive, expressive and, above all, useful. Why? Once
again, I’m glad you asked.
Data Models (or ERDs or ER Diagrams or whatever you want to call them) are extremely useful
in these ways:
• As a discussion and clarification tool
• As a means of establishing boundaries
• As context for
  o Defining terms
  o Defining relationships
  o Testing assumptions
• If the modeling is done with the business users it makes an excellent blueprint for system
user interface design
Generating a database schema (the definition of the tables in a database) is potentially a useful
byproduct of modeling, but it should never be the sole purpose. My recommendation would be to create your model
for the above reasons (you can call it a conceptual model if you like) then create another model
from that in order to have something to feed the schema generator. The problem is that the
“physical” model that we want in order to generate the schema has to contain a large quantity of
information that isn’t of use to the business audience. Additionally, there is never any
differentiation in the physical model between entities and relationships that have business origins
and those that have programming origins. This lack of clarity makes the physical model nearly
useless for the purposes of quality in information.
Motivation
Does a human being exist who has not, from time to time, been baffled by a particular
relationship? It must be true. Females the world over roll their eyes when the topic is males and
relationships. The males do the same thing, but for different reasons.
We can imagine many reasons why this may be true. Females live in relationship and the
nuances are critical. Males seem to live life in between relationships—that is, in the “space”
between relationships. Males, perhaps a majority of them, seem to prefer that competition define
all relationship. The “Norwegian bachelor farmer” made famous by Garrison Keillor is the
prototype. Someone so befuddled by the intricacies of relationship that he chooses to forego
them entirely.20
We know, for example, that males and females have different ways of communicating. It is also
true that two people of the same gender may be equally unable to communicate clearly because
they both speak and listen through the filter of their past. When their pasts are sufficiently
different, they find themselves stuck in a frustrating battle to be understood.
Consider that humans populate every continent of the world (save Antarctica) and every type of
terrain and climate. They build homes of mud, sticks, skins, leaves, lumber, steel and cardboard.
They dine on virtually anything that grows or moves. They suffer from common and uncommon
maladies. They come from small families, large families, extended families, single-parent
families, loving families and dysfunctional families.
Given all that variety and diversity, the probability of any two people being able to communicate
effectively without long practice is vanishingly small. But to make matters even more
complicated, mix in gender and insist that the topics to be communicated include emotion and
feelings.
Nearly every human interaction (nearly is only to avoid the quibbling that would result if I said
all) has an emotional overtone that can cause one of the parties to get hooked into an emotion
from their past. These hooks are huge barriers to communication.
We see the same problems in work relationships. Communication difficulties lead Sally to
expect something that Mike is unprepared to deliver. The expectation may be based on
experience with a previous employer, a previous boss, a previous job description, or even family
experience.
20 By the way, I have actually met two such men and visited with them. They were brothers who learned to live with each other from the cradle
and never saw a good reason to include anyone else in their lives. Yes, they were of Norwegian descent and yes, they were farmers. There was
not a single frill in their home which was heated by the same wood stove they cooked on. The walls were covered with newspaper purely as a
way of reducing drafts. The lesson is that any relationship is possible and potentially long-lasting when the parameters have been agreed to by
the parties.
We don’t often think of process-process boundaries or organization-organization boundaries as
comprising relationship, but those regions are, in fact, rich in relationship. We ignore these
relationships in our modeling efforts at great peril to the fidelity of resulting systems. Failure to
treat these relationships in our models leads to frustration, dissatisfaction and abandonment.
Relationship is vitally important in the management of data and information within a business. If
we are unable to relate pieces of information to one another, we won’t be able to create the
comprehensive picture that the leaders need in order to make good decisions.
Relationships come in three different types, which might be called:
• Chosen: These are the relationships we dream of and seek out.
• Mandated: These are dictated to us. They include both family and work relationships.
• Constructed: The ones we get to define from the ground up.
There is great potential for debate on the question whether Chosen relationships are not also
Constructed. I want to separate them based on the perception that, while Chosen relationships
may be customized, they already have some form, including a name, roles and basic expectations
defined by social convention before any party or parties elect to enter. These are broad
characterizations and, depending on you, there may be wide overlap between the types or
classes.
The premise of this chapter is that all of these types of relationship share a common pattern. If
we can gain mastery of one type, we could use that mastery to generate success in the other
types.
Because every kind of relationship involves at least two parties, it stands to reason that every
relationship includes and is dependent on communication. This chapter focuses on the
communication that must happen. We’ll take apart relationship to examine its components. The
answer is there somewhere. What is the nature of the dependence? Are certain kinds of
communication better than others?
The bottom line is that we must find the key areas where communication must happen—for
instance in business process—and adopt a formula that will ensure that complete, necessary
communication takes place.
This rationale may find a sympathetic ear in business analysts and data modelers, but, if your
interest is confined to getting your husband, boyfriend or partner to fill an emotional need in you,
it may be a stretch at this point. Fear not! The same approach will be productive no matter what
the relationship. I promise a payoff for you.
Let’s say that your need is for commitment from your partner. That’s a fairly common
complaint. Many relationships dissolve when one person is unable to feel commitment from the
other(s). This is no less true in business relationships.
We’re going to treat each other like adults and we’re going to be honest with ourselves and with
each other as we move forward. Relationship, whether business, personal or data, is NOT depen-
dent on emotion or feelings. Rather, relationship is about choices. I’m not going to pretend that
I understand how a woman thinks or feels, nor will I pretend that I understand how any given
man thinks or feels. That simply isn’t necessary.
We’ll assume here that we all have the same basic kinds of needs as defined by Maslow
(Maslow, 1943) and the physical survival needs are being met.
The program that we’ll follow is to tease apart relationship at various phases of its lifecycle.
We’ll examine
1. Relationship formation
2. Relationship growth and maturity
3. Relationship dissolution
We’ll also look at what it takes to get a usable status report at any point in time during the
lifecycle.
Finally, we’ll use interpersonal relationships as the specimen to be dissected. We do this
because these are the most familiar. If we can gain a better understanding of personal
relationship and can see that business relationships are no different, we’ll finally be in position to
create some real improvement. We’ll extend the lessons out to other types as needed to ensure
that the model stays consistent.
Everyone has experience with types of personal relationships, which makes them ideal examples.
Business relationships (Mandated and Constructed), especially the kinds that are documented in
databases, have more formal overtones that require only slightly different handling.
Some Examples
From here on, we’ll use a simple notation to represent relationships. In simplest form, a
relationship is a link between two people (or things including events, but we’ll stick with people
for now). Below is a representation of a relationship.
[diagram: two boxes joined by a horizontal bar]
The horizontal bar in between the two boxes represents the relationship. Of course, this diagram
says nothing at all about the nature of the relationship and doesn’t acknowledge the possibility of
more than one relationship between me and you.
Here is another diagram (a simple model) that possibly conveys a bit more information.
[diagram: a slightly more detailed model of two parties joined by a named relationship]
An unmarried woman who is a romantic will immediately recognize some possible names for
this relationship including meet, find, and marry.
It’s worth brief consideration here to wonder about “someone” and how he or she might name
the relationship. “Why?” you may ask. Remember that question.
Let’s assume that no one wants to “trap” anyone else into any relationship. Such a relationship,
involving an unwilling or unknowing party, could not produce positive results. Are you still
with me?
So, if we accept this and want to build a relationship involving willing and even enthusiastic
partners so that we can improve our lives and theirs, then we must understand what the building
blocks are. If we do not, then we run a very real risk of creating what may be perceived as a trap
or snare and resented as such.21
What are the properties of a relationship that we can sense? How are these sensations perceived
by those involved?
A relationship in which all parties are not willing participants is not capable of delivering value
for any of the participants.
Modeling
The modeling of relationships using a graphical notation took off in 1976 with the publication of
Peter Chen’s article (Chen, 1976) on Entity-Relationship Modeling. It came out at the perfect
time to help fill a perceptual gap between adherents of the new (at that time) “relational”
database structure and the established network or CODASYL22 structure.
One of the attractions of the relational structure was its mathematical completeness. Operations
on “relations” are predictable mathematically. Of course this had immense appeal to academics
of the computing world, not least because it allowed for the construction of a “query language”
with operations similar to addition, subtraction, multiplication and division.
The ER notation suggested by Chen drew attention because a diagram could be turned into
(“mapped onto”) a relational schema (database file structure) in a very straightforward way.
ER notation used the mathematics of set theory in such a way that a relational schema derived
from an ER model diagram retained all of its mathematical predictability.
One of the features of Chen’s graphical notation was that relationships had their own
two-dimensional symbol (Chen, 1976). This gave relationships weight equivalent to entities. It
made them real. It gave them substance.
In the years that followed, the ER notation evolved for several reasons. Among them:

21 The format of these two statements is known as a value proposition and I acknowledge the contribution of Gwen Thomas who first showed me
the power of this construct.
22 “Network” and “CODASYL” are distinguished from relational data management in that their implementation is not portable. All access to data
is accomplished by means of programs. There are primitive commands that go with the model, chief of which is “next” or “get next”.
• It was streamlined so that more information could be conveyed with fewer symbols (less
ink)23.
• Emphasis shifted to the automated generation of program code from diagrams. This was
called CASE (Computer Aided Software Engineering) and included several modeling
notations including Process Flow, Data Flow and ERD, each of which was often provided in
multiple formats to fit your favorite development methodology.
• Emphasis shifted from processes to data and an entirely new approach to software
development emerged called Information Engineering. This was really only made possible
because of ER modeling.
• The sands continued to shift, resulting in the dominance of data organization and management
concerns and the relegation of process to the design of the user interface.
As we attempted to model more and more complexity, diagram real estate became increasingly
valuable. It turned out that the relational completeness of the diagram apparently didn’t need
anything substantial from a relationship. In fact, a single line connecting them could express all
relationships between two entities. This perception, though untrue, has had a profound effect on
both data design and data management. It is not enough to simply link two tables of data. The
amount of meaning (and therefore quality) that can be extracted from such a simplistic
“relationship” is absolutely trivial. Many of the issues that have seemingly popped up since then
are a direct result of trivializing relationship.
With that change, the meaning of relationship was gone for good. Relationship had become
merely the means by which a record or set of records in one database table could be linked to a
record or set of records in another table. This was all very well for database administrators, but it
has seriously hampered the efforts of those who persist in their attempts to model the world that
an information system serves rather than the system itself.24
All of this change was driven by the folk wisdom that “processes change, data doesn’t.” While I
have uttered these words myself on many occasions and I still believe them, if pressed hard we
have to admit that it is the kinds of data that don’t change. In other words, we have found that
this bit of folk wisdom, like most, has a grain of truth at its core and that failure to comprehend
the limits of that truth has led to widespread misapplication.
The individual pieces of data are highly subjective in their qualities and therefore quite
changeable. When the new CEO assumes control, it isn’t just the organizations and processes
that change, but the very definition of key pieces of data.
I won’t bore you to death with additional history. The net result of these tides has been that the
relationship itself has virtually disappeared from the practice of ER modeling. I suspect that if
females had been involved in greater numbers while all this was going on, the result may have
been a bit different. They would not have been so quick to devalue relationship.

23 Edward Tufte (Tufte, 1997) advocates making all visual distinctions as subtle as possible while still being clear and effective. “Less ink” is a
rule of thumb in the quality of a representation.
24 This begs the question, “Why Do We Model?” This question was the subject of at least one doctoral thesis (Simsion G., 2006). It seems that
design rather than description is the purpose for modeling data. Beyond this, however, is the question of what to model. It is argued that the
greatest design benefits result from an approach which begins by modeling the portion of the world into which the desired system will fit
(descriptive modeling). We then extract the portion of the world directly affecting or affected by the system and, within that [descriptive] model,
we design the data structures for the system itself. For a summary of modeling whys see (Simsion G. w., 2005).
The only reason that relationships still exist at all in the typical model is referential integrity, the
utility of automatically generating database procedures or triggers to ensure the ability to
successfully link two files in the database. For those of a pragmatic nature, that is enough of a
reason.
Unfortunately, for reasons that will soon be demonstrated, successfully linking two or more files
may not provide the basis of a sound business system. Referential integrity is not enough.
Let’s look at a simple example.
Consider the family tree (or pedigree). Many are interested in this and would like to manage the
data for various reasons. We identify three kinds of things that will be important: person,
parentage, children. You may notice that parent and child are also persons. This is true, but as
you will see, in our database they are relationships.
I made it sound as though family tree and pedigree were equivalent. They are not, although they
share many characteristics. A Pedigree is only concerned with genetic relationships while a
family tree is concerned with family relationships (which need not be genetic, as in the case of
adoptive children). Because we’re preserving our options, we want to make our database serve
both purposes.
Person

ID      Gender  DOB       DOD       Full name                 Given name  Family name  MotherID
111111  M       1/2/1902  2/3/1979  Johan Jacob Schmidt       Johan       Schmidt
111112  F       3/4/1907  4/5/1986  Hannah Maria Richter      Hannah      Richter
123321  F       5/6/1919  6/7/1990  Pauline Carlotta Schmidt  Pauline     Schmidt      111112
543456  M       7/8/1938            Carl Arthur Smithson      Carl        Smithson     123321
This is a table of Persons and contains attributes of a person. These attributes are pieces of
information (data) that we might want to associate with a Person. Note that we relate Person to
itself by including a field, MotherID, which is a pointer to another row in the table.
There are many problems with this “relationship” which can be expressed in ER form in the
notation at left or, with slightly more meaning, in the notation at right.
What about the Person instances for which MotherID is unknown (null)? For a family tree we
can fill in the PersonID of virtually anyone. “The Person thought of as Mother” would be an OK
definition or name for the attribute. For Pedigree purposes we’ll need more rigor. We know that
every Person is linked to a female Person who gave birth. Knowing this rule doesn’t necessarily
help us to populate our table. For one thing, how do we interpret null values for MotherID? For
another, how do we know whether the Person referred to by MotherID is the “birth mother” or
“Person thought of as Mother”?
How can we redraw our model so that we can answer these questions? We first recognize that
we need to define some attributes of the relationship itself. We are naturally led to the notation
at right above, which actually gives us something to which we may attach attributes. We could
treat the Mother relationship as just another entity and this is the path our tools force us to take.
In Erwin, a very popular “data modeling” tool, we can tell the schema generator to generate all
relationships as entities. Another option is to generate only many-to-many relationships as
entities.
Do you see how the nature of the relationship itself is becoming obscured? The DBA only wants
to make sure that relational integrity is defined. Every time a Mother is inserted into our
database, we want the dbms to insure that a corresponding Person record exists. What about the
reverse? If we want to be assured that (for our Pedigree) every time a Person record is inserted
into the database, a Mother record gets created, then we make the relationship mandatory. If we
must create a Mother record, what attributes will be required? If we make the MotherID a
required field then every time we insert a Person, we must be prepared with the PersonID of the
Mother. Probably we can’t guarantee that but that ability depends on the “business processes”
surrounding the system. Finally, what should happen if the Mother record is deleted or updated?
Do we want the dbms to automatically apply updates made to Mother to corresponding Persons?
What if we delete a Person record or a Mother record? If we find we have made some mistake
and we delete a Person, do we also want any connected Mothers to be deleted? If we delete a
Mother, do we want all Persons for whom she functions as Mother to also be deleted?
In many cases the business processes are not consistent enough to allow the dbms to
automatically do anything. Well then, if we can’t do it automatically, what are the options? We
either have to pretend that these needs will never trouble us or we have to build logic that will
prompt us when/if they do emerge and then carry out our desires, whatever they may be, at that
time. Knowing what we now know about motivations and economics, which—if any—of the
choices will be made? Do you think that knowledge of those business processes should be part
of what we know about the relationship? How can we get that to happen?
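
These questions are exactly what the referential-action clauses of a foreign key declaration exist to answer. Here is a sketch, again in Python with sqlite3 and invented names, of how the answers might be written down; the choices shown are illustrations, not recommendations.

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

    con.execute("""
        CREATE TABLE person (
            person_id INTEGER PRIMARY KEY,
            full_name TEXT NOT NULL,
            -- Optional relationship: null is allowed, and if the mother's row
            -- is deleted the pointer is cleared rather than left dangling.
            mother_id INTEGER REFERENCES person(person_id)
                      ON DELETE SET NULL
                      ON UPDATE CASCADE
        )""")

    # Making the relationship mandatory means NOT NULL plus a harder decision:
    #   mother_id INTEGER NOT NULL REFERENCES person(person_id)
    #             ON DELETE RESTRICT  -- forbid deleting a mother with children, or
    #             ON DELETE CASCADE   -- delete her descendants with her (dangerous)
    # Every Person insert would then require the mother's PersonID up front,
    # which the surrounding business processes may not be able to guarantee.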

The Relationship Pattern


The purpose of a pattern in general is to provide a means of readily reproducing the essential
aspects of something. The purpose of the universal patterns for data modeling, as Silverston and
Agnew (Silverston, 2008) remind us, is as a tool to extend and develop just about any type of
data model. One of the drawbacks of a data modeling product is that the product (model) is too
often viewed as a static artifact, much like a roadmap. A roadmap may be indispensable in
finding our way from A to B but it doesn’t guarantee to make the trip uneventful.
Travel guides help by incorporating along with the maps some information that helps us to
formulate expectations for the trip. Does it make a difference whether the travel will be accom-
plished by air, rail, automobile, bicycle or on foot?
As someone who has tried to use a highway map to plan a bicycle trip and as the father of
someone who walked from Iowa to Washington, D.C. in the month of January, I can say with
great authority that it does indeed matter how you propose to make the trip. It’s tempting here to
wander off into a discussion of how and the many variations in meaning one could assign to that
word, but I leave the multi-dimensional nature of how to another article or book.
What does the data model lack that renders it lifeless? What prompts us to treat it as an artifact?
It lacks any genuine reflection of relationship. No human endeavor can be described without
referring to the relationships that bring it to life. Those one-dimensional, annotated symbols on
the data model bear no resemblance to the thing they are intended to represent. We require a
much fuller grasp in order to imagine the model alive.
Modeling offered some background on the minimalist notation currently being used to represent
relationship in an ERD (entity-relationship diagram) or, as it is more commonly known today,
the data model. Let’s take a stab at a universal pattern for relationship. I hope you find this
exciting. Certainly, if we can shed some light on relationship and bring the concept to life, it’s
just possible that a substantial amount of stress can be lifted from our lives. It’s also possible
that we can relieve similar stress on our data systems.

[Figure: an ERD meta-model]
Beginning with the properties of the relationship class itself, we note that it has both a
description and a purpose. The description will lay out the basic ground rules as in: A baseball
team will consist of a sufficient number of members to compete under the rules of the game of
baseball (league rules being the final authority) with a maximum of nine active players and a
sufficient number of reserve players to guarantee that the required minimum number is always
available. You might wish to add additional description regarding the various positions and
skills that must be included or suggest a set of roles without which the baseball team
(relationship) will be incomplete. That’s up to you as a user of the pattern.
When a new member joins the team, he or she will accept a role and subscribe to the purpose.
The purpose is an essential part of the relationship since, without agreement as to purpose there
will never be agreement as to role expectations. For example, a team whose purpose is to return
a profit to the owner(s) will have far different expectations than the team whose purpose is to
win a championship.
Even in a relationship such as marriage
def: Marriage is a lifetime commitment to join the lives of two persons.
it is critical that the parties agree on a purpose at the time the relationship is instantiated. If one
person is purposed to raise a family and the other to live a life of romance, some negotiation is
going to have to take place in order to make the relationship last. Clearly, we have pared the
definition down to the bare bones, and not all would agree with it as it stands here.
One further example will help to show the importance of purpose and definition to a relationship.
Nearly all of us have experienced an attraction to another person. This is almost always
accompanied by anxiety, uncertainty, indecision, and sometimes outright fear as we realize that
this one is different. What are the questions we need answers to?
• Name
  o What name shall I give to this?
  o What name is he/she giving to this?
• Role
  o What shall I call myself in relation to him/her?
  o What shall I call him/her in relation to me (friend, boyfriend…)?
• Purpose
  o What do I want?
  o What does he/she want?
The answers to these questions suggest expectations and obviously, if the parties are giving
incompatible answers, their expectations can’t be aligned. Without serious negotiation, the
relationship is doomed. Every expectation must be answered by a corresponding intention, also
known to both parties.
Our pattern denotes the many-to-many situation that exists with respect to relationships and roles
by means of an entity type called ROLE-EXPECTATION. Every relationship involves the
expectations of the participants. When expectations are clear everything can flow along
smoothly. When they are not clearly articulated or when they are not put into words at all, the
parties have virtually no chance to create a positive and productive relationship.
A specific entity (an instance) may be filling one or more roles in various relationships. A role is
filled by one or more entity instances at any point in time and over time. We all understand that
the role of husband can have a history over the lifetime of a marriage relationship. Similarly, the
role of buyer in a contractual relationship may be held by multiple individuals at any time and is
almost certain to be filled by more than one person over the lifetime of the contract.
The complete picture of any relationship consists of potentially many roles. Much as we
appreciate simplicity, it also seems likely that a given role might be found in more than one
relationship.
For example, BUYER might be a role in a retail relationship, a real estate transaction or in a
contractual relationship. To bring this home, MAN or MALE are roles found in many
relationships, as are WOMAN and FEMALE. An important aside: does your model include one of
these as an entity? These roles are often confused with entity types and may often be found as
categories. Do you see how the MALE and FEMALE roles are different than the values for gender
that we might assign to a PERSON entity type?
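
One way to see the moving parts of the pattern is to write them down as types. Here is a minimal sketch in Python, with all names invented for illustration: a relationship carries a description and a purpose, roles are reusable across relationships, and the ROLE-EXPECTATION link resolves the many-to-many.

    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Role:
        name: str                      # e.g. "buyer": reusable across relationships

    @dataclass
    class RoleExpectation:
        role: Role
        expectation: str               # a behavior the other roles may rely on

    @dataclass
    class Relationship:
        name: str
        description: str               # the basic ground rules
        purpose: str                   # what the parties agree they are for
        expectations: list[RoleExpectation] = field(default_factory=list)

    buyer, seller = Role("buyer"), Role("seller")
    retail = Relationship(
        name="retail sale",
        description="seller offers goods; buyer selects and pays",
        purpose="exchange goods for payment",
        expectations=[
            RoleExpectation(buyer, "pays the listed price"),
            RoleExpectation(seller, "delivers the goods as described"),
        ],
    )

Nothing here says more than the prose does, but once the expectations are data they can be listed, reviewed and checked.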
You will note that the “traditional” line segment(s) are used here only to convey information
about cardinality and optionality. A line segment, no matter how richly adorned with avian feet,
bars or circles—even labels, cannot convey the meaning of a relationship. At best it can only
convey the idea of a relationship, which must be fleshed out if we are to make our model really
useful.
We all know that expectations are part of any relationship. Each role expects certain behaviors
from the other role(s). Speaking only for myself, this is the source of frequent irritation and
occasional unhappiness. It isn’t the expectations we know about that cause problems, it’s the
ones we don’t know about. In our personal relationships we can’t hope to get all of the
expectations out in the open up front. In fact, one of the expectations is that you will learn to
anticipate my expectations. It sounds pretty hopeless, doesn’t it? I wonder how many would
choose the relationship knowing this one up front. Think of the typical data quality breakdown.
Isn’t the issue that we (or someone) failed to anticipate the expectations of someone else?
Our pattern says to expect expectations and that’s the benefit of the pattern. It’s only when we
don’t have any idea that there will be expectations that we get into serious trouble. Knowing
allows us to, at a minimum, be aware of situations that could be expected to involve
expectations.
It would be unreasonable to think that all of this could be negotiated at the inauguration of a
relationship. Relationships (and their expectations) evolve over time. Having the relationship
pattern in mind helps us to be alert to hints of changes and should prompt a renegotiation
(discussion) when changes are noticed. In the personal realm, these discussions should satisfy
the “working on” expectation.
We might also notice the idea that someone who is not a party to the relationship may be affected
by it in some way. In the case of a marriage relationship, we will have to introduce in-law roles
and manage those expectations, but what about bystander roles?
A pure bystander will not be in any formal relationship with our parties and for that reason
cannot be allowed to have any expectation regarding the relationship. They should only be
considered to the extent that the “bystander” entity is involved with one of the parties in yet
another relationship of some kind. For example, my boss and I share that relationship and our
roles might include expectations regarding marriage relationships of which one or the other is a
party.
It is critical to note at this point one other essential difference between a pattern and an instance.
We have to remember—and this cannot be over-emphasized—that the pattern describes a class
of relationship instances. This means that what is being documented is any instance within the
class. What we are doing is generalizing.
We have been warned about generalization since childhood and yet we find it so useful that we
risk the problems that generalization may present in order to reap the reward of being able to
simplify the way we deal with the world. Be warned: although all marriages share some critical
roles and the expectations for those roles are formalized to the extent that audiences of thousands
will laugh uproariously at a joke that references one of the expectations, it is still dangerous to
rely on that expectation in your particular instance. It certainly aids in negotiating expectations,
though, to have a starting point already defined.
This is one of the ways in which the first class, Chosen relationships, differs from the other two
classes. As we reflect further on the three classes of relationship, we will see that as we move
from Chosen through Mandated to Constructed, conformance to expectation becomes more and
more the norm. We will also see that if we handle our Constructed relationships the same way
we have been accustomed to handling our Chosen ones, we will experience the same kinds of
turmoil in our work life that we have in our personal life.

Implications for Data Quality


Have you ever scanned a page or a screen full of data without the benefit of any column headings
or other documentation to provide a frame of reference? Amazingly, you soon begin to develop
an idea for a plausible interpretation of the various kinds of data and their relationships. If
everything is good, you begin to recognize that this column is probably customer numbers and
that one is the year, quarter, month. This column looks like a dollar amount and that number is
probably a quantity purchased. It all starts to make sense.
Then, in the real world, you see a row that doesn’t fit the pattern. Now what? There are a host
of possible next steps, but no matter which one you choose, the bloom is off the rose—some
credibility has been lost. Your level of confidence in the data has been eroded.
What if you suddenly discover the column headings at this point? What if that one row still
sticks out? Remember, no single row-column value looks wrong. It’s only when they’re put
together that a line or row looks wrong. You have just discovered a mis-documented
relationship. Either that record doesn’t belong in the collection or else the defining principle of
the collection needs to be modified.
In the days of paper filing, a document or a folder filed in the wrong drawer could be lost
literally forever. The same thing can happen in the virtual world. The only advantage we have
today is that, if we can describe the anomaly we’re looking for, the search can proceed very
quickly. Often, though, we can’t describe the anomaly. The values are all correct and current—
they just don’t belong together. We know that the business rules we are aware of don’t permit
the configuration we’re seeing.
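
When a business rule can be stated, the rows that violate it can be found mechanically, even though every individual value is valid on its own. A toy sketch in Python; the rule and the rows are invented:

    # Rows whose individual values are fine but whose combination breaks a rule.
    rows = [
        {"customer": 1001, "quarter": "2012Q4", "amount": 250.00, "quantity": 5},
        {"customer": 1002, "quarter": "2012Q4", "amount": 0.00,   "quantity": 12},
    ]

    # A business rule we are aware of: a nonzero quantity implies a nonzero amount.
    def violates(row):
        return row["quantity"] > 0 and row["amount"] == 0

    for row in rows:
        if violates(row):
            print("values that don't belong together:", row)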
We start to trace the offending record back to its origin and that’s when we hear, “Oh, I thought
that meant…”
How long have you been doing this? How many times have you entered the data like this? We
finally realize that it isn’t this person’s fault and that they aren’t the only one with a different
understanding.
Poorly understood and therefore poorly implemented data relationships lead to the most serious
kind of problems. The problem goes by the name of “that’s not what I meant” or “I thought it
meant” or, most often, “oh.” The symptom is records indicating relationships that “can’t be” or
“shouldn’t be possible.”
This kind of data presents a reporting nightmare. There is simply no way to be sure how
widespread the problem is and, once discovered, it throws an entire data source into doubt.
Contrast this with the more common data quality problems in which a fat finger error causes a
phone number, a birth date, an address, social security number or a driver’s license number to be
invalid. Those mistakes are generally repairable. Many, if not most, data quality errors can be
trapped and prevented with better attention given to the user interface and edit checks or even
(perish the thought) properly typed database fields and well-defined constraint definitions.
Character types are not always the best solution; often they are only the easiest, or the one with
the lowest initial cost.
Misunderstood relationship is different in that the data frequently can’t be repaired. Even though
some instances may stand out as wrong, many others will escape detection until we rely on them
for something critical.
For example, an ambulatory care facility (an insider name for a clinic or a doctor’s office) tracks
each patient’s primary care physician. A patient is allowed by the data model and resulting
system/database implementation to have exactly one physician who is their “primary care
physician” or PCP. A field, labeled PCP, is on the registration screen used by the receptionists in
each department to check in patients. There is no guidance on the screen and the training for
new receptionists doesn’t cover this field specifically. For whatever reason one specialty
department or one or two receptionists, who may move from department to department, make it
their practice to select the name of the doctor who is seeing the patient for the current visit as the
PCP.
After a relatively short time, a substantial portion of the patient population has a PCP
relationship with a physician who is not in a primary care specialty. The bottom line is we
cannot say with certainty who the PCP is for any given patient. What will happen is that a legal
issue will arise and the attorneys will need to get a statement from the patient’s primary care
provider. It will become a matter of judicial note and public record that the clinic’s record
keeping is unreliable. This is not considered good publicity for a medical facility. But maybe
we’ll be more fortunate. Maybe it will only be the CMO (Chief Medical Officer) who wants a
report showing the relative panel sizes for all of the primary care clinicians on staff. There will
be a lot of interest in the number of patients in the allergist’s or the surgeon’s panel.
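
Once the rule is stated, the damage is at least measurable even where it is not repairable. A sketch in Python with sqlite3, using an invented miniature schema that stands in for the clinic’s system:

    import sqlite3

    con = sqlite3.connect(":memory:")   # stand-in for the clinic's real database
    con.executescript("""
        CREATE TABLE physician (physician_id INTEGER PRIMARY KEY,
                                name TEXT, specialty TEXT);
        CREATE TABLE patient   (patient_id INTEGER PRIMARY KEY,
                                pcp_id INTEGER REFERENCES physician(physician_id));
        INSERT INTO physician VALUES (1, 'Dr. Adams', 'family medicine'),
                                     (2, 'Dr. Baker', 'surgery');
        INSERT INTO patient VALUES (100, 1), (101, 2), (102, 2);
    """)

    # Every row returned is a PCP relationship that "can't be" under the rule
    # that a PCP must practice a primary-care specialty.
    suspect = con.execute("""
        SELECT p.patient_id, d.name, d.specialty
          FROM patient p JOIN physician d ON d.physician_id = p.pcp_id
         WHERE d.specialty NOT IN ('family medicine', 'internal medicine',
                                   'pediatrics')
    """).fetchall()
    print(len(suspect), "patients have a non-primary-care PCP on record")

Counting the suspect rows tells us how widespread the problem is; nothing in the data can tell us who the real PCP is, because that fact was never reliably captured.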
How can our pattern help? If we have implemented role-expectations in our systems, we will
have the ability to validate that a given entity instance actually has a role in the relationship. We
will have documented the expectation for that role and be able to enforce the current
expectations. By the way, this is one point where we have to acknowledge that some
expectations in the real world can probably never be adequately specified and, indeed, may
change with different participants according to the day of week or time of day. For that reason,
we should focus on expectations that are critical to the value of the data itself and that can be
enforced.
Let’s call these expectations business rules. How to capture, store and use business rules has
always been a point of debate. A few of the more common ways include
• table-driven rules engines
• triggers
• constraints
• software (program logic) controls
Virtually all of the business rules you might want to define are really role-expectations that are
part of a relationship. Using our pattern will greatly improve our ability to find, document and
enforce them.
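
The first of these, the table-driven rules engine, is simply the role-expectation made executable. A minimal sketch in Python with invented rules:

    # Each rule is a role-expectation made executable: a name plus a predicate.
    RULES = [
        ("PCP practices a primary-care specialty",
         lambda rec: rec["pcp_specialty"] in ("family medicine", "pediatrics")),
        ("registration names the department seen",
         lambda rec: bool(rec["department"])),
    ]

    def broken_expectations(record):
        """Return the name of every expectation this record fails to meet."""
        return [name for name, test in RULES if not test(record)]

    print(broken_expectations(
        {"pcp_specialty": "surgery", "department": "allergy"}))
    # -> ['PCP practices a primary-care specialty']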

Two Approaches to a Relationship


If we live long enough, we eventually come to understand that there are at least two ways to
come at any problem. Those who begin to achieve wisdom also realize that more than one of the
approaches may prove productive. In fact, we learn that it is only at great risk that we ignore or
disrespect any perception of a relationship.
The two broad avenues along which we will be moving in our attempt to understand relationship
better are inhabitant and architect. Another way of thinking about these two perspectives is that
one is from the inside, while the other is from the outside.
An inhabitant of a relationship, someone who is playing an active role within the relationship, is
affected by it in the same way that someone who inhabits a house is affected by everything about
the house. The floor plan of a house has much to do with how the inhabitants interact with one
another. The more restricted the floor plan, the more constraints are placed on the inhabitants.
The house in which I grew up was about 1200 square feet with a single bathroom and the seven
inhabitants were required to operate with many expectations of one another in order for everyone
to get through each day.
When one of the inhabitants failed to fulfill an expectation, turmoil resulted that lasted as long as
it took for everyone else to adapt. For example, when someone got sick, the expectations
regarding the bathroom had to be adapted.
This was an important lesson. The inhabitants of a relationship will live according to the
expectations of the others in the relationship with two conditions:
• They must understand the expectation.
• They must be capable of achieving the expectation.
Woe unto the inhabitant who arbitrarily decided to ignore everyone else’s expectation and act
unilaterally and (as others saw it) selfishly. Submarine crews also understand this. People in
understaffed and over-tasked departments in a company understand this. But, take people out of
those situations and put them into one-on-one relationships based on “emotion” and they
immediately experience stress.
A simple solution to relationship difficulties suggests itself. What all inhabitants of a
relationship need is a mother to make all of the expectations clear. Sad as it is to say, most adult
relationships do not seem to benefit from the presence of a mother.
What can take the place of a mother—someone to facilitate communication and act as traffic
cop? We need someone to make sure that we each get a turn and that everything is shared
equally.
For some, the rules about communicating and sharing were ingrained by our mother’s constant
and loving reinforcement. Others may have had a mother who was unwilling or unable to
provide that reinforcement. Still others were stifled by a mother taking over the entire process.
They never had to learn to communicate because Mother did it for them.
The point here is that, while relationships demand constant, honest and bi-directional
communication, many of us simply do not have a model for doing that. We have no experience,
no practice. We are all like someone who could be a world class figure skater, except that we
live in sub-Saharan Africa and have never even seen a sheet of ice, ice skates or the motivation
of the winter Olympics.
When it comes to interpersonal relationships, only the inhabitants can actually change the way
the relationship works.
How will the relationship’s inhabitants change the way the relationship works? There are two
ways.
1. I can change the way I function in the relationship.
2. I can change my expectations with respect to the other inhabitant(s).
In the world of human life, demanding that someone else change is guaranteed to cause a
relationship to falter. This is true whether the relationship is a chosen, mandated or constructed
one. When one inhabitant says, “Things have got to be different and I’m not going to change,”
(someone actually told me that once) things can go in no direction but down. No relationship can
be based on win-lose.
A win-lose interaction between people is called a contest. A contest differs from a relationship
in one major way. In a contest, the parties have no need to communicate with one another and it
is frequently in their best interest either to avoid communication or to be deceptive in the
communication to avoid revealing their plans to the other side.
It is essential that we understand the difference between a relationship and a contest.
    Relationship                                 Contest
    win-win                                      win-lose
    honesty                                      deception
    communication mandatory                      communication optional
    motivations (goals) are complex, negotiated  motivations hidden, discovered
Now ask yourself: how many of the interactions where you work are contest and how many are
relationship? Too many work situations are characterized by
• hidden agendas/motivations
• communication that is intentionally misleading or incomplete
• expectations that shift without warning
• unclear purpose
• win-lose thinking
Is it any wonder that work can be stressful? We all want to believe that we share the objectives
of the company but we all know that it’s the objectives of the boss that we’d better pay attention
to. Even that might work out OK if the boss would actually reveal his or her objectives. When
managers and supervisors conceal their true objectives, they are forced to provide tasking that is
fragmented and incomplete—the recipe for stress and wasted effort. Additionally, the lack of a
consistent vision and common objectives forces all employees to rely on the manager for
guidance. This, in turn, means that the work of 20 or 200 or more proceeds only as fast as the
manager can absorb what is happening and devise new instructions. What a waste of talent and
motivation!
If the manager had a team, they would have a purpose as a team and each member would know
their role.

Obstacles to Relationship
Many women believe themselves to be experts in the practice of relationship, but have no ability
to be honest with themselves. How could they? They never had a model for open, vulnerable
and honest communications.
Most men, on the other hand, prefer to be oblivious to the needs of relationship. We may not
even understand that there is a relationship until we begin to experience some of the effects.
Perhaps we become the object of anger when we meet a friend at the Home Depot and spend the
rest of the day on the golf course. Maybe we come face to face with what can only be
recognized as disappointment directed at us and we have no idea concerning the cause of that
disappointment.
We simply allow our “significant other” to dictate the rules. Our SO tells us that it’s our fault
and if we would only be different, everyone would be happy.
There is a saying that is old and trite, and it holds a lot of truth. “A woman picks a man hoping
he’ll change and a man picks a woman hoping she won’t [change].” What if this is literally true?
What problems could it cause? What is the best way to deal with this before it becomes a
problem for my relationship?
Come with me for a minute while I take your situation out to the extremes in order to illustrate
this point.
Imagine for a moment that you are from a mountain village in the Himalayas, have never even
seen anyone whom you haven’t known from birth, never seen a light switch or electrical outlet,
nor anything printed and that you wake up one day in the middle of a desert. You are in the
company of a Someone who has lived a life similar to yours in its isolation but in the midst of the
Amazon basin.
That’s a picture of most relationships. You find yourselves “together” for some reason that is
not immediately apparent and you know that your life could be better or at least easier if you
could rely on the other. Can you imagine that your odds of surviving are vanishingly small if
you try to do it on your own in a world full of things that you don’t understand?
You try to talk to the Someone but it results only in confusion and frustration for both of you.
You wonder if it’s worth it. Then you look around you and see an unforgiving and bewildering
emptiness in every direction.
The idea of turning your back on the Someone and simply walking away loses its appeal. You
can clearly see yourself withering to insignificance along that path. What are your choices then?
Some of us find that the frustration and discontent are too much to bear and we do turn our backs
and walk away. We hope that somewhere in that emptiness is a Someone who will be able to
communicate with us. Maybe the Someone agrees with this decision—or maybe not. Maybe
s/he follows at a distance, staying close, but not too close, hoping not to be left alone.
Others of us can’t stand the thought of being alone in the world. We resolve to make the best of
a bad situation. We contribute what we can and grudgingly accept the contributions of the
Someone, even though we don’t like what s/he has to offer. We never learn to communicate
although we do learn to get along. We develop habits and rituals to take the place of
communication. We coexist.
Then there are those who throw themselves into the creation of a relationship. They eagerly
listen to what the Someone has to say and eventually learn to communicate fluently in the
Someone’s language. They find valuable traits, abilities and knowledge in the other. They offer
themselves freely and learn how their own traits, abilities and skills complement those of the
other. Together they find that they can do things that neither could do alone. They survive and
then they thrive.
Of course the variety of approaches to relationship includes many more than these three, but we
can think of these three cases as being near the extremes and somewhere in the middle.
How do we bring this example back to the workplace? The work situation isn’t really that much
different from the personal/social example. We accept a job and show up full of hope and
expectations. After a few days or weeks (or maybe a few hours) we suddenly realize that we
aren’t where we assumed. We realize with a sense of panic that we might be in completely
unfamiliar territory.
We look around at our co-workers and see for the first time that they each have their own hopes
and expectations and that some of those are pretty radically different from ours. Some of these
co-workers are part of our team and it becomes apparent that the team may have a different
understanding of where we’re headed than we do.
The three options are still available and the only real difference is the level of emotion involved
(although some of us can create a lot of emotion, often called drama, in any given situation).
What we often find it most difficult to do is to simply commit to the relationship and request the
help of the other inhabitants to understand where we are, where we’re headed, and what resources
we have available to get there.
What about an inter-department relationship? How might the expectations of the Sales
department change? Will the intentions of the Fulfillment department ever change? Do any of
these changes affect Shipping and Receiving or the Warehouse? How do these changes impact
Management expectations? Can you see why these deserve attention? How much attention are
they currently getting? Management expectations change and new reports (or new dashboards)
are requested. Does Fulfillment get new dashboards to help their intention to meet management
expectations (goals)?
We are generally content if we identify a foreign key (order_number) that is included in an
invoice record or in a shipment record so that the order can be linked to the invoice and/or to the
shipment. That simple tactic conveys nothing about the promises made by Sales or the
expectations of the Customer.
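
A sketch in Python with sqlite3, columns invented, makes the gap visible. The foreign key guarantees only that every invoice points at a real order; the promises and expectations would need a home of their own:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")
    con.execute("CREATE TABLE orders (order_number INTEGER PRIMARY KEY)")
    con.execute("""
        CREATE TABLE invoice (
            invoice_id   INTEGER PRIMARY KEY,
            order_number INTEGER NOT NULL REFERENCES orders(order_number)
        )""")

    # The link above records existence only. Recording what Sales promised or
    # what the Customer expects would take rows such as these:
    con.execute("""
        CREATE TABLE order_expectation (
            order_number INTEGER NOT NULL REFERENCES orders(order_number),
            role         TEXT NOT NULL,   -- 'sales', 'customer', 'fulfillment'
            expectation  TEXT NOT NULL    -- e.g. 'ships within two days'
        )""")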

The Spectrum
In case you aren’t familiar with the idea of a spectrum, let’s take just a few seconds to create a
mental image. Even if you already know what a spectrum is, it’s still useful to make certain that
we’re all on the same mental page.
In order to see a spectrum, we need a prism. In the most familiar example, water droplets in the
air act as a prism, bending sunlight so that all of its color components are visible. This is a
spectrum and we know it as a rainbow. In general, we use the term spectrum to give the idea that
something is not necessarily as homogeneous (which means having a single structure or
composition) as we might be tempted to think. Don’t worry; we’re not going to go into a detailed
explanation of the science. In fact, while it’s good to remember that most things are not as simple
as they seem, it’s also very useful to remember that it isn’t necessary to go into a lot of detail
about how and why when we can get quite far on knowing what.
Imagine then that relationship, like sunlight, has components that require something like a prism
to see. What would make a good relationship prism? I nominate time.
If we look at a relationship (or even a type of relationship) over a period of time, we will see its
characteristics change. Picture a first date relationship and compare it to a newlywed
relationship and a golden anniversary relationship. On the business side, compare a new
customer, an established customer and a loyal customer. These are all relationships.

[Figure: Time as a prism]
Let’s just stick to interpersonal relationship for now. I think that if I can provide some help for
you personally, you’ll be more willing to accept my help in the world where you earn your
living.
[Figure: a relationship, unnamed, extending over an interval of time]
Here is a picture of the relationship. Notice that no name is assigned to it. What name would
you give it? Some potential names may seem appropriate based on the event or point-in-time
names that we choose while others may be inappropriate for the same reasons. While anyone,
regardless of gender identification or orientation or age or experience can probably describe this
relationship, they may use different words to identify the events or states of the relationship.
You don’t need to have experienced this kind of state or even this relationship to discuss it or to
talk about your expectations at a given point in the spectrum.
The important thing is to see that this is a single relationship extending over an interval of time.
A core set of expectations is involved throughout the lifetime of the relationship. Other
expectations change or evolve. Is this kind of relationship unique, or even unusual? It takes
only the briefest of reflection to realize that any personal relationship must change as its
inhabitants change.

Type or Instance?
So far, we have been discussing relationship types rather than instances. Types of interpersonal
relationships include friend, spouse, sibling, parent-child, family, neighbor, babysitter, teacher-
student, teacher-parent and so on. We can understand without explanation (intuitively) that I
may have a sibling (type) relationship with five others (in my case), but my relationship
instances with each of those five are distinct and unique.
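
In object-oriented terms, this is exactly the distinction between a class and its objects. A minimal sketch of the sibling example in Python, with names invented:

    class Sibling:
        """The type: the mold that defines what every instance shares."""
        def __init__(self, name, birth_order):
            self.name = name                 # each instance fills these in
            self.birth_order = birth_order

    # One type; five distinct and unique instances.
    siblings = [Sibling(n, i) for i, n in
                enumerate(["Ann", "Ben", "Cal", "Dee", "Eva"], start=1)]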
Type is another kind of abstraction that is extremely important. You may have heard of
something called “object oriented” in the context of system development. In the earliest days of
computing, there was but one type, the binary integer. It took almost no time to realize that one
type just wouldn’t work. There are many kinds or types of numbers in addition to integers. To
refresh yourself regarding the history of types in computing, you may want to take another look
at the Old Testament, Book 6, Machines and Logic. As we continue to work at getting
comfortable with the notion of types of relationships versus instances of those types, it’s time to
set aside instances as a topic of discussion here. Our goal here is not to dissect any specific
relationship instance. What we want to do here is to learn enough about the basic building
blocks of any relationship to be confident that we can take one apart and put it back together—
better than it was before.
[Figure: Another perspective]
Maybe you have considered this relationship. What is my relationship to the world? How do I
fit in? It is useful for each of us to spend some time considering this relationship. This really
isn’t a good example for exposing any of the building blocks—except one. No relationship can
be disassembled without acknowledging the perspective from which we are working. We can
recognize this relationship while recognizing, too, that “the world” has no perspective that is
useful to the discussion. Also notice that switching the position of the parties makes no
difference. What we take away is the realization that my relationship to the world is entirely
dependent on my perception. My perception of the world (which is really a class) will affect my
relationship with any instance from the class the world. In the same way my perception of
manager or sales or customer will affect my relationship with any individual from one of those
classes. We easily recognize this law in our lives and we have a name for it, prejudice. Clearly
prejudice has its own spectrum and is inescapable. It’s one thing to act out of prejudice as a
reflex and still another thing to stay aware of the prejudice as we decide what our actions will be.
Before we start the dissection, a quick review of terminology used so far will be useful.
Remember that communication is one of the keys to a successful relationship, so it’s vitally
important that we continue to speak the same language as we move forward.
We are still taking the inhabitant perspective. The inhabitant is one of the parties involved in a
relationship and perspective is the point of view of one of the inhabitants.
A type or class is like the form or mold used to define all instances. A type contains the features
or properties shared by all instances. An instance is a specific case of the type in which all of the
type’s features or properties have values—in other words, the names are all filled in.
Our goal is to get you into a position such that, if you select the type of relationship you want to
inhabit, you can feel capable of managing the parts that are your responsibility and offering
support in the parts that aren’t your responsibility. We all recognize—don’t we?—that all
inhabitants of a relationship succeed or fail together.
Let’s just repeat that before we proceed:
All inhabitants of a relationship succeed or fail together.

The Anatomy of a Relationship


A relationship is a connection between at least two people (or places or events or things). That’s
how we are accustomed to thinking about relationships.

The figure below is the simplest possible depiction of a relationship. It shows two parties
connected by something. We call the something a relationship.

    Entity A ------ Entity B

This diagram provides no information about the nature of the relationship. It could represent
“farmer milking cow” or
“man committed to woman” or “customer complaint about product.” We need to get more
meaning from our diagram.
Women like to think of “working on our relationship.” By this is meant something very
complex. They seem never to tire of discussing relationships. Men, on the other hand, seem
entirely willing to ignore the complexities of a relationship, preferring to simply adapt to or cope
with each nuance as it presents itself.
To males, a relationship is frequently something that someone else is talking about. It is often a
mystery. To females, a relationship is something that is intuitive and real, though it may still be
largely mysterious and elusive.
You have already received an introduction to the idea that a relationship can be represented in a
diagram. A simple pictorial representation has been developed for documenting and analyzing
the nature of the relationships between entities. From now on, I am going to use the words entity
and party interchangeably to represent a person, place, event or thing that may be involved in a
relationship type.
A picture of a relationship might look like those below. This picture simply says that the Entity
on the left is related to the Entity on the right. So far, not very useful. We need to know—even
if we are male—more about this relationship in order to bring it into our life and use it.

    woman ------ man

    Sally ------ Mike

The first way to give meaning to a relationship is to name the entities involved. If we add names
to our example we might get something like the Sally-Mike diagram.
I’m still not satisfied that I know what the relationship is. Many different relationships suggest
themselves. This is the point in interpersonal relationships where uncertainty and fear enter the
equation.

    woman --- married to --- man

    woman --- engaged to --- man

    woman --- reports to --- man

    woman --- supervises --- man

    woman --- sister of --- man

Relationship Examples
We find it very difficult to name a relationship most of the time. Is the relationship one of
friendship or is it love? Is there a difference? If we don’t name it, we’re done. We can go no
farther. Think of the name as a handle that we can use to begin “working on” the relationship.
This is the premise that lies at the foundation of this book. First, that it is frequently difficult to
put a name on a relationship and second, that naming it is essential for understanding and the key
to getting the most from the relationship. Naming all of the parts is essential if we are to derive
any benefit from the relationship. Are there additional parts that we haven’t met? Follow me.
Those who model relationships between things for a living often don’t. That is, they claim and
may even believe that they are modeling entities and their relationships, but their models often
go no deeper than the examples above.
Nearly all of their efforts go into understanding the entities themselves. Don’t start to feel
superior now. That is apparently the most natural approach. In fact, in our own personal
relationships isn’t it true that it may be impossible to actually name a relationship without first
getting to know ourselves? If what I really need is someone to cook meals for me, can I enter
into a relationship named “love” with any expectation of mutual benefit? Possible, maybe, but
it’s a distinctly improbable outcome.
In general, relationships are the most complex things that we deal with on a daily basis.
Successful relationships are the foundation of everything important that humans and their virtual
counterparts, corporations25, do.
We work hard to gather up all of the attributes or properties of an entity. We can learn her name,
her phone number, her address, her birthday, her astrological sign, her dress size, her social
security number, her driver’s license number, her height and weight—we can record every
possible fact about her but, without naming the relationship, we will never know what to do with
all of that information.
From these examples you can see the huge increase in meaning that comes from naming the
relationship. In the figures on the preceding page we see ambiguous relationships. This is the
kind of relationship in which many people spend their lives. If we happen to know Sally and
Mike, the unnamed relationship is not quite so ambiguous. We can imagine some potential
names for it simply because of the additional contextual clues provided by the entity names.
It is generally frowned upon to belabor a point, beat a dead horse, preach to the choir, or carry
coal to Newcastle, but the idea that relationships must be named is so critical that it’s worth the
risk.
We have discovered the first essential property of the relationship type. Relationship instances
have a name.

25
A recent candidate for the US Presidency said that “corporations are people.” This is true in the courts but it is clearly dangerous for any
individual human to think that they can behave as a corporation does.
Two Names are Better than One

    woman --- married to --- man

    woman --- engaged to --- man

    woman --- reports to / manager of --- man

    woman --- supervises / supervised by --- man

    woman --- sister of / brother of --- man
Isn’t it frustrating when you have named your relationship but he has a different name? If you
have felt that frustration, then you understand perfectly the rule that a relationship instance has
two names.
In fact, a relationship is composed of two equal and opposite parts that need to be in harmony.
Each participant sees the relationship differently. Each may have a different name for the
connection that exists between them.
When we acknowledge, understand and accept both names, then it becomes possible to exploit
the relationship. The word exploit is used here after much thought. The word tends to have
negative connotations today, but it means simply “to utilize productively.” That’s what we’re
after. We want the relationship to work for us—to produce what we need from it.
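
In a model, the two names are simply two labels on one connection, each reading in a different direction. A small sketch in Python; the structure is invented for illustration:

    from dataclasses import dataclass

    @dataclass
    class NamedRelationship:
        party_a: str
        party_b: str
        name_a_to_b: str        # how A sees the connection
        name_b_to_a: str        # how B sees it

        def describe(self):
            return (f"{self.party_a} {self.name_a_to_b} {self.party_b}; "
                    f"{self.party_b} {self.name_b_to_a} {self.party_a}")

    print(NamedRelationship("Sally", "Mike",
                            "reports to", "is manager of").describe())
    # -> Sally reports to Mike; Mike is manager of Sally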
The relationship below looks much like the middle one in the previous diagram except that one
of the relationship names has been changed.

    woman --- reports to / mentors --- man

What is the difference between the two? Mentors gives a much different impression than
manager of. Which one is true? Which one gives more information about the possible
interactions? Does it make any difference whether the man is doing the mentoring or the
woman is?
In our relationships with others, it is the need for both parties to provide a name for the
relationship that gives rise to so much difficulty. When one person says “I love you” and the
other responds with “I like you a lot, too” there is very clearly a mismatch. The labels we assign
are much more important than Shakespeare would have us believe.
When Romeo says, “What’s in a name?” his commentary is about the importance or lack of
importance of names to entities or parties. His friends and family have made a big deal out of
family names. In their world, the relationships available to a party named Montague could
include a party named Capulet only if the nature of the relationship was negative—one of
enmity.
Capulet can have nothing to do with Montague. Romeo has already made his mind up to ignore
all the history and present tension around those names. He is determined that, despite all of that,
Romeo may love Juliet and expect love to be returned. He comes up with a good example that
carries the day.
Shakespeare goes on to show us, however, that Romeo is wrong. It is about recognizing the
essence of a thing. We ignore names at our own risk. The names Capulet and Montague, with
all their history, caused problems that made appropriate communication impossible. Lack of
communication led to misunderstanding and the deaths of both Romeo and Juliet.
To communicate the essence of something, we can apply a name that both describes and evokes
the essential properties of the thing. When we say “rose,” we create the expectation of a form
and an aroma. We are left to wonder only about color or developmental stage (bud or full
bloom).
What is the essence of a relationship? Friend, boyfriend, fiancé, husband, enemy,
victim—all are possible names for a given relationship. Each different name leads us to expect
something different. The existence of a name provides a handle that can be used to begin the
work but it is vital to understand the essence of a relationship before we can give it the proper
name. If we settle on a name too quickly, we run the very real risk that the set of
expectations that the name evokes will not be acceptable to both parties. Often we choose a non-
threatening name first and then negotiate the real name. Because females are so much more
adept than males, they may be willing to take advantage and maneuver the male party to get
agreement way before the understanding arrives. Remember what was said earlier about
entrapment?
Romeo is right when he tells us that the characteristics or properties of something don’t change
when we use a different name. The relationship(s) between Capulet and Montague families do
not necessarily govern the relationship(s) between Romeo and Juliet. Please note, however, that
while Romeo and Juliet are free to establish their own relationship, they are never entirely free
from the relationships between their families.
Romeo misleads us terribly when emotions and expectations enter the picture. Of course, we
would all be much better off if we could rid ourselves of any expectations in our relationships,
but in our world when someone says, “I love you,” for example, she—especially the first time—
hears something like, “I’m going to adore you, put you on a pedestal and satisfy your every need
for romance…”
He may hear, “I’m ready to take care of you and be there when you come home.” Unfortunately,
when I say “I love you” I may actually believe that I’m prepared to meet that expectation. This
is a good time for a reminder. Each party in a relationship has to work hard at honesty and not
just with the other party(ies). It’s equally hard to maintain honesty with yourself. But, if we’re
to be successful at relationship, honesty is crucial.
All of this obviously applies to personal relationships, but what does it have to do with data and
information?
Have you ever told someone what you thought they wanted (or needed) to hear? How did that
work for you? For them? Do you think that the problems may have arisen because your
relationship wasn’t what you thought it was?
Anyone who believes that the quality of data and information is independent of the humans
involved is certain to experience a life of frustration and disappointment. On the other hand,
those who work hard at the relationships, cultivating honesty and understanding expectations,
will experience far less of that frustration and disappointment.

Advanced Relationships
Very few relationships ever proceed to the advanced stage. If you have been married for 30, 40,
50 or more years, it doesn’t mean that you have developed an advanced relationship. You may
have become co-dependent over the years. Or you may have remained in that relationship only
because any other choice involved too much fear or pain.
In the case of an organizational or architected relationship, it may persist long after it has ceased
to add value, simply because no one accepts ownership or the architect’s role. In many cases, the
inhabitants each perform according to the expectations they have come to understand through
habituation. Each person measures success by the degree to which they are satisfied with what
they receive from their supplier and the apparent satisfaction of the person to whom they deliver.
The problem in these relationships is that there is not sufficient personal pain or disruption to the
organization to trigger a desire for change. A relationship can be said to develop inertia or
momentum.
Recall from your time spent in high school science class that inertia is the property of a mass
that must be overcome in order to stop its motion or change its direction. Inertia is what causes
the pain when a falling body hits the ground. Strictly speaking, momentum (the product of mass
and velocity) is the more precise term for what we are describing, but for this analogy we can
treat the two as one idea. Can we think of a relationship as having mass and
velocity? Think of the mass of a relationship as definition and expectations. We can think of a
relationship as having velocity in the sense that it is moving in a certain direction because of the
intentions of the parties. A relationship’s inertia is created from its birth. It is a combination of the
expectations and the intentions of the inhabitants (parties). Like a snowball rolling downhill,
inertia increases the longer the expectations and intentions remain unchanged.
A relationship can develop a great deal of inertia early if the inhabitants work to define it and
develop healthy expectations. Movement can be quite rapid. Early inertia can last a long time.
Inertia has the property of direction which it inherits from its velocity. Many relationships start
in a positive direction and the ones that don’t usually have a short life. How much energy do you
think it will take to change the direction (and thus the inertia) of a relationship that has existed
unchanged for a long time?
Think about all the relationships in your organization and how they have contributed to the
current state of your data. We exposed those relationships in the Old Testament. It may well be
the case that many, most or even all of these relationships may need to change in order to effect a
noticeable change in the quality of the data. One thing is certain: ignoring the relationships and
inserting a big, complex, new piece of software or a new methodology is going to produce
turmoil in the relationships without materially affecting the quality of the output.
Like the snowball halfway down the mountain, we can’t hope to simply stop it without the
application of considerable force and the result will be destructive. What we can do is to apply
series of “nudges”, relatively small changes that, together over time, will change the direction.
Consider the marriage relationship which goes on for some period of time before one party (or
both for potentially different reasons) comes to realize the momentum is not in a positive
direction. At this point we have a choice to either jump out or begin to do what is necessary to
change the momentum. The most frequent mistake is to demand that “we” (or more honestly,
“you”) need to put more effort into the relationship. Even if “you” is more honest, it is also the
least effective. We all want to believe that we are doing everything we can and are therefore
blameless regarding the momentum of the relationship. Do you see, though, that a
relationship—any relationship—is a balancing act? The expectations and intentions of one party
must always be coordinated with those of the other party (or parties). It’s a tug-o-war with intent
balancing expectation such that neither party gets pulled into the mud hole.
Sometimes the coordination (adaptation) just happens and sometimes it involves negotiation and
compromise. It always takes time. Often the time frame begins to test the patience of one or
more of the parties. This is where experience becomes important. Newer marriages dissolve
when a party becomes impatient and has no experience with gradual accommodation. One party
will try to stop the snowball or make a large change in its direction with disaster as the result. A
series of patiently applied nudges coupled with negotiation and compromise will often produce
the desired momentum change with a minimum of turmoil.
We also must understand that change is a given and therefore adaptation and the willingness to
adapt must also be givens in order to preserve the relationship long term. What kind of people
does this take? In a nutshell, it takes mature people.
“Mature” is one of those words whose meaning seems clear on the surface, but which can cause
a lot of confusion without a carefully established context. Emotional Maturity is a relatively
recent concept created because of this need for better context. Simply, emotional maturity is the
ability to recognize and manage your emotions before they get control. It requires the ability to
be present in interactions through openness, honesty and demonstration of concern for the well-
being of the other.
Consider the following two ways of dealing with a problem.
1. “When I see a backlog piling up, I am concerned about my ability to deal with it. Can we
find a way to reduce or eliminate those backlogs?”
2. “You need to stop letting things pile up. It makes extra work for me.”
It doesn’t matter whether the backlog is composed of dirty laundry or warehouse orders, on the
one hand we will feel like a collaborator while on the other we will feel like the scapegoat. One
encourages negotiation and the other throws up a wall.
Do the parties in your relationships exhibit this kind of maturity? Do you see it in the
relationships of others around you? Most people have never seen maturity in action and so have
no way to model it in their own lives. People who exhibit maturity should be placed where their
example can affect as many others as possible. What’s the alternative?
If years together don’t make an advanced relationship, what does? By what standard do we
measure the quality or strength of a relationship? This question needs to be answered whether
we are the inhabitant of a relationship or the architect of it.
Let’s go back to the template model for a relationship. Recall that there are at least two roles
involved and potentially many expectations for each role. The advanced relationship is one in
which the inhabitants have collaborated on naming the relationship and the various roles they
will play. Beyond that, they have clearly stated their expectations and each has agreed to do his
or her best to meet the other’s expectations and has formed the intent to do so.
In business, we may not find much tolerance when expectations are not met. There may even be
penalties. This may also be true in personal relationships, but the advanced relationship often
exhibits much more tolerance of failure. Of course if there is never success or if one inhabitant
simply stops trying to meet expectations, there may be severe penalties, up to and including
dissolution of the relationship. In fact, after enough failed expectations, the inhabitants will
simply cease to acknowledge the relationship altogether. Each will begin to behave as if the
others don’t exist. This is the beginning of active subspecialization.

Architecting the Advanced Relationship


Unlocking the Mystery
Stop for a minute and reflect on the perniciously recurring problems that make life less than
happy and sometimes even intolerable. You can lay the blame (responsibility) off on him (her,
them) and most people do, but all of that blaming has never made those problems go away.
What would you give to have them go away or fade into the background? Come to think of it,
what wouldn’t you give? The problems are all based in unrecognized and/or undefined
relationships. Up to now we have run away in terror from this idea. Even now you may be
thinking, “Oh, please! Just give me a best practice, or ten things to do/avoid, or any kind of
roadmap that I can follow.” If we buy in to the idea that relationships are the source of our
problems, then we have bought into the inescapable realization that we own at least half of the
problem.
At some point in your life you have heard (or maybe even said) that the bigger (read more
mature) person must take the first step in mending a relationship. This is absolutely true and you
are now that person, having come this far with me. That means that the next step is ENTIRELY
YOUR RESPONSIBILITY. The people and organizations (be they neighbors, coworkers, teams,
departments, divisions or competitors) may not even be aware that they are in a relationship with
you. Were you aware when you started reading this book?
The carrot dangling in front of you consists of
• reduced frustration
• better communication
• fewer (ugly) surprises
• improved efficiency and effectiveness
• reduced costs (both in money and energy)
If these are worth it to you then you’ll want to start with these steps:
• Make a list of those relationships. (Maybe you want to focus on just one at the beginning.)
• Name them as they are now and as you want them to be.
• What are your expectations for each? Again, it may help to list current and desired
expectations.
• What are your intentions for each? Current and desired is still a good idea.
• What information do you need to keep about each? For example, you may want to have
accessible the inception date of the relationship (or anniversary), schedules, net payment
requirement, etc.
• Who is the person you want to partner with?
• Are your expectations and intentions different for this person than for the relationship? If
the problem relationship involves a department, for example, you may pick out a person
in the department who is an opinion maker and build a relationship with that person as a
first step in getting to the inter-departmental relationship that is the goal. Remember that
group that you wanted to be part of when you were a kid? You could find a way in by
making friends with one of the members and letting them “include” you in. That’s what
we’re talking about here.
• How formalized does the relationship need to be? We often go too far in formalizing,
thereby setting the stage for the eventual breakup. Intentions are key here. Do you
intend to enforce all expectations all the time? If so, you need a contract, which is a
special kind of relationship that can be enforced in court. If, however, the expectations
are simply that and not requirements, you may intend to communicate in those instances
when expectations are not met, and to negotiate improvement. In that case you need
much less in terms of formality. Beware of formality. It is much easier and less costly to
get what you need by talking about it over coffee than by using attorneys and courts.
• What do you know or think you know about the potential partner?
• What do you think their expectations and intentions are?
• What do you want them to be? Look at your expectations and intentions again. Are they
still reasonable?
• When you are clear about the relationship you want and why, approach the other party
and lay out your cards. Show them the steps you’ve gone through and the carrots that are
motivating you. Do not try to “close the deal” at this point. It took you time to get here
and you may have to walk them back to where you started and give them time to get back
here on their own. Remember the spider web.
• If you have come here by considering “we” and “us” instead of simply “me,” you will at
least have made an impression and softened resistance. You must expect, though, that
when they come back, their expectations and intentions may be somewhat different than
those you attributed to them. That is the first step in a successful relationship.
• The negotiation is never completed in a strong and well-architected relationship. Think
of it as preventive maintenance. The biggest (and ugliest) surprises happen when we take
something for granted. The name on the relationship doesn’t matter in that case.
These steps will work for any relationship at all. Do you see that this kind of architecting will
improve your business systems as well? Do you think you will have more or fewer
disappointments once you have gone through this process? Recall, too, that we have discussed
some tools that can be put to use as we work our way through the process. The most useful is the
entity-relationship diagram (ERD) notation which can be put to work with a pencil and paper or
a blackboard or a whiteboard…
The best way to avoid and eventually stamp out the waste of subspecialization or me-oriented
thinking is to seek to have only advanced relationships in your organization and in your life.
Advanced relationships must be designed (architected). This is the only way to call out and
document all important expectations and intentions from the parties. Properly done, this is also
going to be a model for emotional maturity for everyone involved.
An architected relationship frequently has considerable mass. This comes from the time and
money spent in the architecting. If the architects have done a good job, the relationship will start
with some velocity from its environment. These relationships are ones that already existed,
unrecognized and undeveloped, in the organization.
Other architected relationships are artificial constructs with little or no connection to the
organizational environment. Occasionally, though rarely, these involve innovation that is
recognized as useful and productive and they immediately begin to pick up velocity. More
usually, they are a mistake, initiated by one party and then “assumed” into existence by that
party. It is easy to find part of a relationship and then make assumptions about the rest. This can
happen when inhabitants do not make themselves available for the interview process.
If you have ever dealt with an architect, you know that they want to gather a lot of information
before they begin the creative part of their job. It’s important—critically so—for the product to
fit the needs of the client. If we sit down with an architect who is focused on his own needs and
only minimally interested in ours, we will be well-advised to continue our search for an architect.
This is no less true for relationships involving data. “What is it that you need?” “How do you
accomplish that?” “What are your biggest problem areas?” “Walk me through that process.”
Your client is not concerned with what you need to get your job done any more than they care
about what Accounting needs. The accomplished architect makes everything about the
client’s needs.
If we use language that is oriented to our processes and activities rather than those of the client,
nobody gets what they need. It should never be forgotten in the architectural process that the
architect must communicate equally well with the client and the builder.
There is plenty of room for creativity when we get to implementation, but the names, roles, and
expectations must belong to the clients or they will have no value for anyone. Let’s look at an
example that is neither simple nor overly complex.

A Data Example
As you know, there is considerable interest today in genes and genetics. An important
component of this interest is the pedigree. A pedigree is the record of the genetic precursors of
an individual. We know of its application to animal breeding. Someone who is breeding dogs or
thoroughbreds is vitally concerned about the pedigrees of the mating pair.
Many people today are also interested in ancestry and actively build and maintain family trees.
A family tree is similar to but not as rigorous as a pedigree. A family tree will often include
family relationships that are legal rather than genetic. For example, a family tree will generally
include marriages and adopted and/or step children, none of which is of interest to the
geneticist.
Let’s build a database that will let us store and retrieve pedigree data. In order to increase the
market for our product, let’s also stipulate that it may be used to document family trees and that
the customer must be able to extract one or the other without ambiguity. It’s important to
remember that we will never be able to retrieve information that isn’t stored in the database and,
if it isn’t in the model, it isn’t in the database.
[Figure: Person linked to itself through Maternity and Paternity relationships, each carrying a Genetic attribute]
Our architects (data modelers) interview everyone involved and come up with the model shown
above. It is understood that Person will have additional attributes such as date of birth, family
and given names, and current contact information. We may also want to keep a date of death as
well as other dates.
If we convert this model to relational tables we will wind up with three tables.
Person (DOB, Family_Name, Given_Name, Death_date, Current_Address)
Maternity (Mother, Child, Genetic)
Paternity (Father, Child, Genetic)
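To make the schema concrete, here is a hedged sketch of the three tables in SQL DDL. The
PersonID surrogate key, the column types and sizes, and the encoding of the Genetic flag are
illustrative assumptions; the model itself dictates only the attributes and the relationships.

-- Person holds the individuals; Maternity and Paternity carry the
-- relationships, each preserving its Genetic attribute.
CREATE TABLE Person (
    PersonID        INTEGER PRIMARY KEY,   -- surrogate key (an assumption)
    DOB             DATE,
    Family_Name     VARCHAR(100),
    Given_Name      VARCHAR(100),
    Death_Date      DATE,
    Current_Address VARCHAR(200)
);

CREATE TABLE Maternity (
    MotherID INTEGER NOT NULL REFERENCES Person (PersonID),
    ChildID  INTEGER NOT NULL REFERENCES Person (PersonID),
    Genetic  CHAR(1) NOT NULL,             -- 'Y' = genetic, 'N' = legal/other
    PRIMARY KEY (MotherID, ChildID)
);

CREATE TABLE Paternity (
    FatherID INTEGER NOT NULL REFERENCES Person (PersonID),
    ChildID  INTEGER NOT NULL REFERENCES Person (PersonID),
    Genetic  CHAR(1) NOT NULL,
    PRIMARY KEY (FatherID, ChildID)
);

Because the relationships are tables in their own right, a strict pedigree (Genetic = 'Y') can be
separated from the fuller family tree without ambiguity, which was our stated requirement.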
At this point almost everything we know about the relationships (Maternity, Paternity) is implied
by the configuration of the diagram. The modeler (architect) is satisfied that all of the
requirements have been met. If we give this schema to the programmers, we have at best a 50-50
chance of being satisfied (let alone happy) with the result. Unfortunately, the programmers are
more likely to receive a model such as this
[Figure: a model in which mother and father appear simply as attributes of Person]
or this, which conveys a bit more meaning but would not produce the same database schema.
[Figure: a model with Genetic_Mother, Genetic_Father, Other_Mother, and Other_Father relationships drawn among Person, Person1, and Person2]
In the case of the first model the schema would contain a single table
Person (mother, father, DOB, Family_Name, Given_Name, Current_Address).
We will have lost our genetic information. It is possible to tell the generator to implement
relationships as tables. In that case we will have three tables.
Person (DOB, Family_Name, Given_Name, Current_Address)
Mother (MotherPerson, ChildPerson)
Father (FatherPerson, ChildPerson)
The second model will generate either one or five tables. (Person1 and Person2 are not real
entities but copies of Person created to make the diagram read better.)
The tables would be either
Person (Genetic_Mother, Genetic_Father, Other_Mother, Other_Father, DOB,
Family_Name, Given_Name, Current_Address)
Or
Person (DOB, Family_Name, Given_Name, Current_Address)
Genetic_Mother (MotherPerson, ChildPerson)
Genetic_Father (FatherPerson, ChildPerson)
Other_Mother (MotherPerson, ChildPerson)
Other_Father (FatherPerson, ChildPerson)
About this time someone notices that “Other Mother” and “Other Father” don’t exactly equate to
“Adopted” or “Step” and we have completely misplaced the notion of marriage. Even though
our database could answer the request to generate a family tree for Person A, the information that
makes it look like a family tree, the information that warms it up and adds family to it is
completely missing.
Why is this happening? How did the process fail?
“I have a meeting to go to.” This statement is usually accompanied with a slight emphasis on the
word meeting and a bit of an eye roll that lets us know that the speaker really has better things to
do. If you were thinking about a new dream house for your family and the architect asked for a
meeting, would you reschedule other things and make the meeting a priority? Of course the
architect must respect your time and make the meeting about you and your dream.
People don’t like to be summoned to a meeting where they will be asked to think about
something they had never even considered and then be chastised for not having ready or
appropriate answers to questions about someone else’s dream. If you really want to sour people
on meetings, make them attend a series of meetings that really produce something and then give
the final decision over to someone who was never a participant, has no knowledge of the
meetings and has a completely different focus.
Let me tell you a story about an actual experience.

The Tale of the New Blood Bank System


A large medical facility operated a blood bank. The Blood Bank was accountable for delivering
blood and blood products for the facility’s surgical procedures. There were additional
accountabilities in support of the primary one.
It was determined that the computer-based information system in use was a problem that they
would need to solve. It required substantial training. It was cumbersome to use and thereby cost
time. Most important, it was not able to keep up with the rapid pace of change in the medical
world.
The process to be used for the acquisition of the replacement information system was
 Create an RFP (Request for Proposal)
 Send the RFP to known vendors of such systems
 Attend demonstrations of the vendors’ products
 Select a product/vendor
 Negotiate a price for initial installation and annual support
 Implement the new system
Raise your hand if this process looks familiar. Keep your hand up if you think it is likely to
succeed. For those who still have their hand up, do you see any potential for problems to pop
up? We know that problems always pop up, but it’s the ones we didn’t foresee that cause the
most difficulties.
One problem that fairly jumps out is that this seems like someone in a marriage who has decided
that divorce is the answer. Whether it is or it isn’t, it is important to find out why the marriage
relationship is no longer sustainable so that we can have a better chance of sustaining the next
one.
Everyone could see that technology had advanced beyond the “green screen” and there was an
unspoken faith in new user interface paradigms as a cure for the issues of time cost and training.
There also seemed to be considerable faith that those same technology advances would solve the
problem of system adaptability.
I was part of the team to help ensure that nothing foolish was done relative to technology. There
was no recognition in the early stages of any requirement other than “ease of use.” The
assumption was that we could verify that the new system would let us deliver on our
accountability to the hospital. After all, we could talk to other customers and even visit them to
see the system in action.
We even had a jump on others who had gone through this process. For a year I had been
meeting with a group of supervisors within the Blood Bank to create a model of their operation.
That effort was very productive and had even led to some major changes in organization and
process. We had agreed on what the critical information needs were both in terms of the data
chunks and the relationships amongst them. We had posed questions that would need answers
and then found the answers in the model. One such exercise had produced a question that
couldn’t be answered using existing relationships. The supervisors recognized that new
relationships would be needed and that the only way to get them was to change the processes
used. A new organizational structure was created as the best way to make the new processes
feasible.
So we entered the acquisition process feeling confident that we knew our operation and its needs
and capabilities very well and could get the system we needed. We had still not yet asked why
we needed to divorce the old system.
When the question was asked, it became apparent that if the system could change, they would be
OK with keeping it. So could it change? They liked the vendor and had a long history with both
vendor and system. The answer was no and the reason was rooted in technology. The system
had been born in a time before database management systems and was composed of files (many
of them) that were linked (related) via programming logic (code).
Relationships were visible as code but intentions and expectations were undocumented and
therefore not understood. The vendor’s difficulties were exhibited in extremely long delivery
times and high costs for the needed enhancements. In fact the vendor was one impetus for
divorce and remarriage. Of course the vendor had a nephew in mind in the form of a newer
product in their family. We allowed for the possibility but wanted to make the nephew jump
through all the same hoops as everyone else.
We agreed that in order to get the flexibility and adaptability we needed and to get access to the
information without going back to the vendor, we needed a system that incorporated a database
management system. Further, we agreed that it should be an RDBMS (relational database
management system) that used SQL for data definition and access. The rationale was basic—
there were (and are) many products and tools that make finding and reporting data easier and
cheaper as well as internally manageable once we assumed an SQL database.
Weeks and months passed while we interviewed candidates and made site visits. The result was
an engagement to a suitor from a new family. The suitor had all the right stuff and a resume that
was very attractive. We invited him in to meet the family.
Raise your hand if you believe that a long and happy marriage resulted.
If your hand isn’t up, maybe you have some ideas about what could happen to this apparently
well-thought-out relationship. We had our expectations and intentions in order so what’s
missing from our architected relationship?
There are actually two pieces missing. First, we have not addressed the expectations and
intentions of the other party and second, though we didn’t know it yet, maturity was also lacking.
Our suitor was a relatively young, privately held company that had worked hard to build a
product that was robust and flexible enough to earn customers both in Canada and the US. It
probably could have, with a translator to rewrite the labels on the screens, found customers
worldwide. The announcement of their selection and the upcoming marriage made
them very desirable to other healthcare vendors and they soon announced that they had been
bought by a larger family.
At this point, regardless of what their expectations and intentions might have been, both became
defined in terms of money. Revenue and profits clearly began to dominate their thinking.
It was also at this point that our own lack of maturity began to make an appearance. Our team of
matchmakers, which had performed well up to this point and had made a good match, was
pushed to the background because, while we clearly knew what we were about with respect to
processes and blood bank accountabilities, we just as clearly could not be trusted when it came to
money.
On our side, the expectations and intentions changed and became about getting the best “deal”
possible. On their side, preserving the engagement to preserve the expectations of investors
became the intention and expectations were confined to confidence in the product’s ability to
satisfy the customer’s blood banking needs. They failed to get to know their betrothed well
enough.
You see, at the money level we were convinced that we couldn’t remarry unless we could
preserve our own unique personality. Since our first marriage had been conducted entirely to our
demands and approval, we behaved as though the new marriage would be conducted the same
way. We believed our brand of blood banking was superior to all others and that, while we
might share certain basic processes with other blood banks, to simply accept those processes as
standard would be to give up our identity as the standard of practice to which all others must
only aspire.
Our partner could see no need to modify something that was already working well for many
others unless the others could be induced to accept the changes as well. Despite this they
continued to promise that they would change if we would only go through with the marriage. It
came to pass that the people at the table were no longer the ones with the power to commit.
Eventually there were promises of changes that should have been seen (were we mature enough
to care) as suicidal.
In the end, they tried as our money people knew they would and failed as the more mature could
see they would. This was seen as a predictable result stemming from not selecting a suitor from
a known family. Too bad, but that’s what happens when you choose a mate from outside the
extended family. I guess Shakespeare would have predicted it.

The Moral
The moral for the advanced relationship and its architecture is this. If what you’re reading here
has been making sense and if you want to bring advanced relationships to life in your
organization and in your life, then it is essential that you give the architecting process a chance to
work. Tragedies such as the one in the story can be avoided and they must be avoided if we wish
to preserve credibility in any relationship.
We can’t be partly or occasionally or mostly mature. Here is truth.
If I withhold truth because I believe you are withholding truth, then there is no truth. As soon as
I withhold truth, you are forced to make decisions without adequate information. Even if you
intend truth, it isn’t truth. When either party withholds truth from the relationship there is no
truth. In the absence of truth there are no good decisions.
In the absence of truth, no relationship will survive.
The winners become the ones with the most money when the dust settles. This may seem
appealing but remember that the winners will have sacrificed their credibility. Having “won” in
this way, they may find it very difficult, if not impossible, to architect a productive relationship
in the future. Everything is going to cost more in the absence of trust.

More About the Architecture Process


This section is intended to flesh out some ideas for data modelers, data architects and others
closer to the technology side of the problem. Others may skip this section without fear of
missing anything.
This example is intended to demonstrate some of the hows of relationship modeling. Here is
another attempt at a model for our Pedigree/Family Tree database.

[Figure: a refined model in which Maternity, Paternity, Child, and Relationship are entities in their own right]

This one exhibits additional
effort and contains some symbols that will be useful to a database administrator or a
programmer. The bottom line is that it enables us to document virtually any kind of relationship
that a person may be party to. We can see who the partner was in that relationship as well as its
duration.
Because we have made each of the original relationships an entity/table in its own right, we are
now able to assert much better control in terms of expectations and intentions. For example, you
will notice symbols at either end of the connecting line which no longer represents a relationship
to us. Now it is simply a vehicle to convey some of the Expectation information.

We indicate the expectation that a Person will be the mother of zero or more other persons.
Another way of expressing this expectation is that a Person may be the Mother of another
Person. In the Maternity relationship we indicate the mother by recording the MotherID which is
really a PersonID as shown by the FK1 notation. ChildID is also a PersonID (FK2). Note that if
the Person is not a mother, there will be no instance of Maternity in which the Person’s PersonID
will match a MotherID. On the other hand we do have an expectation that each Person’s mother
is known. For Pedigree purposes this is essential; hence we expect that a Person has exactly one
Mother. This is sometimes expressed as “one and only one”.
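As a sketch of how such expectations can be made enforceable rather than merely diagrammatic,
consider pushing the “one and only one” rule into the keys. The DDL below is an assumption
built on the names in the diagram; it is one possibility, not the only one.

-- Making ChildID the primary key means no Person can have two recorded
-- mothers; NOT NULL means a Maternity row always names its mother.
CREATE TABLE Maternity (
    ChildID  INTEGER NOT NULL PRIMARY KEY REFERENCES Person (PersonID),
    MotherID INTEGER NOT NULL REFERENCES Person (PersonID)
);
-- Note: this enforces "at most one" mother. "At least one" (every Person
-- has a Maternity row) typically needs application logic or a deferred
-- constraint, which is exactly the kind of expectation worth documenting.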

You’ll see that there is a Child relationship that does not assume a genetic link. It allows for
arbitrarily defined instance types (ChildType) such as “adopted” or “foster” and provides the
opportunity to record a duration for the relationship so that a person could be a child (ChildID) in
more than one instance. A Person need not have any children, therefore that participation is
optional (zero or more).
The Relationship relationship is included for Family Tree purposes in order to record marriages
or other such relationships of interest. We could record the end of a marriage by supplying an
end_date for the instance. These arbitrarily defined relationship instances may well be of use to
the person researching the pedigree and might be used to record background information learned.
The Notes section here states the expectations with respect to the relationship.
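A hedged rendering of these two structures in SQL follows. ChildType, the duration dates, and
the instance types come from the text above; the remaining names and types are illustrative.

CREATE TABLE Child (
    ParentID   INTEGER NOT NULL REFERENCES Person (PersonID),
    ChildID    INTEGER NOT NULL REFERENCES Person (PersonID),
    ChildType  VARCHAR(20) NOT NULL,      -- e.g. 'adopted', 'foster'
    Start_Date DATE NOT NULL,
    End_Date   DATE,                      -- NULL while the relationship endures
    -- Including Start_Date in the key lets a person be a child in more
    -- than one instance, as the expectation above requires.
    PRIMARY KEY (ChildID, ParentID, Start_Date)
);

CREATE TABLE Relationship (
    Person1ID  INTEGER NOT NULL REFERENCES Person (PersonID),
    Person2ID  INTEGER NOT NULL REFERENCES Person (PersonID),
    Rel_Type   VARCHAR(20) NOT NULL,      -- e.g. 'marriage'
    Start_Date DATE NOT NULL,
    End_Date   DATE,                      -- supplied to record, say, the end of a marriage
    Notes      VARCHAR(500),
    PRIMARY KEY (Person1ID, Person2ID, Start_Date)
);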
It was my goal here to choose a situation that any reader could understand and provide examples
to illustrate what is involved in “working on” a relationship. There is no single correct solution
and there are nearly unlimited ways to arrive at a solution. What each path will have in common
is that it will have documented as many expectations as possible (or at least as necessary) and it
will have recorded the intentions of the parties in terms of fulfilling the expectations.
The notations used in these examples are useful but you can certainly create your own (although
these have the advantage of having been tested through much use by many people over a
relatively long time).
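One payoff of architecting relationships as tables is that a strict pedigree can then be extracted
with a single query. The sketch below assumes an RDBMS that supports recursive common
table expressions and the Maternity/Paternity tables sketched earlier; the starting PersonID is
hypothetical.

-- Climb the genetic tree from one starting person, generation by generation.
WITH RECURSIVE Ancestry (AncestorID, Generation) AS (
    SELECT p.PersonID, 0
    FROM Person p
    WHERE p.PersonID = 42                  -- hypothetical starting person
    UNION ALL
    SELECT par.ParentID, a.Generation + 1
    FROM Ancestry a
    JOIN (SELECT MotherID AS ParentID, ChildID FROM Maternity
          UNION ALL
          SELECT FatherID, ChildID FROM Paternity) AS par
      ON par.ChildID = a.AncestorID
)
SELECT AncestorID, Generation FROM Ancestry;
-- Under the earlier three-table schema, a WHERE Genetic = 'Y' filter in
-- each branch keeps legal parents out of the pedigree.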

Principles for Life in a Data World


We promised in the beginning (Planting Our Feet) to expose six principles for living a
productive life in a data world. To review, the six are:
1. Words are not meaning. Words are not truth.
2. Complexity is now and simplicity is potential.
3. Data has more in common with words than with meaning.
4. Meaning and truth are personal and subjective.
5. Relationships are how we find meaning.
6. Technology is about volume only. Everything that is about meaning, truth and
value/quality comes from people.
Let’s take these one by one and relate each to what we have already presented.

Words are not meaning. Words are not truth.


This may come as a shock so brace yourself. Words and language in a human context are at best
imprecise and at worst confusing to the point of inducing harm.
A given word in a language has two kinds of meaning. The word denotes something—that is, it
has one or more definitions. We use a dictionary as a tool when we are unsure about what a
particular word denotes.
A word also connotes something. The connotation is distinct from and may be at odds with the
denotation. Connotation may be implicit within a dictionary definition and is often addressed by
the use of an example sentence. Those familiar with sarcasm will be aware that the meaning
conveyed by a word may be changed completely by emphasis, vocal tone or inflection.
Imagine that you have a pocket full of coins from the nation of Bhutan. What difficulties will
you have when you attempt to use the coins to make a purchase in Finland? We understand that
coins have meaning and we—no matter what culture we come from—share the meaning of
“coin.” For the moment, let’s ignore the possibility that a culture may not contain the idea of
“money” or “purchase.”
We can see that it is a long step from the idea of “coin” to one of “value” and that we will need
some training or education or at least some explanation that we can use to convert our Bhutanese
currency into something that a Finnish shopkeeper might find to be adequate compensation for
the souvenir we want to buy.
Now remember that a coin has a physical presence and something called “banking” behind it so
that we could go to an intermediary bank and exchange coins of one kind for coins of another.
This process is cumbersome and subject to inaccuracies, both intentional and accidental on the
part of all three parties. We expect some difficulty and we would be willing to put up with that
difficulty to accomplish our purpose.
Now recall that words have no physical manifestation and nothing like a bank that could in some
sense guarantee a fair exchange between transmitter and receiver. Now ask yourself why people
experience such frustration when communication is less than completely successful.
Think of a word as a marker that indicates a kind of territory to be explored. When we put words
together (producing syntax) we are creating a fence or enclosure that further constrains the
territory we want to define. The more the cultural backgrounds of sender and receiver are
similar, the smaller the semantic territory enclosed by our language.
One of the most effective communication tools is the metaphor. Metaphors represent an
impressionist rather than a realist approach to language and communication. The intent of a
metaphor is to better fence the area we intend to work. A good metaphor aids communication by
constraining the number of denotational and connotational choices the audience needs to
consider to arrive at an understanding of the idea we want to communicate.
Why is this principle important? We have all been frustrated by communication that appears to
assume some kind of knowledge or experience that we don’t possess. In spite of this common
experience, we continue to assume that our audience is “just like me.” Fully integrating this one
principle would lead to new communication strategies which would involve several kinds of
feedback loops so that the sender gets clues that the receivers are not locked on to the signal.

Complexity is now and simplicity is potential.


We tend to assume complexity. Often we miss the meaning of something when it is much
simpler than we anticipated. We do this because in the vast majority of instances, things are
complex. An excellent rule of thumb is this: Anyone can create complexity but it takes a master
to create simplicity.
You’ve heard it said that a camel is a horse designed by a committee. The way we do things
frequently guarantees complexity. In a world in which politics plays such a big part in every
human dynamic we negotiate, we compromise, we concede, we settle… and the result is nearly
always more complex. We begin with, “We should…,” and we end with, “We could…”
There is a world of difference between what we should do and what we, organizationally and
individually, are actually able to do.
We frequently run into situations in which we ask why. Sometimes we are more diplomatic and
change “why?” to “how?”. If you have ever engaged in this exercise, you will recognize that,
although these questions are attractive—we want so badly to believe that we can affect the future
by discovering that answer which lies in the past—if we are completely honest with ourselves we
must admit that all the digging we do and all the people we alienate have not helped in the least
to guarantee a brighter future.
In fact, “never ask why” is a corollary to this principle. When we ask about decisions made in the
past, we force people into a defensive posture. Even when we genuinely desire understanding so
that we don’t unintentionally design in the same mechanisms that got us to where we are today,
there is something about asking “Why?” that stimulates the limbic brain in a way that prompts a
fight or flight response. (Bernstein & Rosen, 1989) We can ask whether anyone still present was
involved and then we can go to that person and encourage stories about the early days of x. We
can gain just as good an idea of how x came into being in this way and we never need to put
anyone on the spot by asking why.

Data has more in common with words than with meaning


Meaning associates with data only as we assemble discrete pieces of data into something that
resembles language. Just as with words, each new one adds to the meaning. We ask too much of
our data and we give it too little attention. Handing off data with the expectation that the
recipient will interpret it as we intended will often end in confusion, hurt feelings and anger.
This is part of the why of metadata. The idea has been to attach the metadata context or at least a
link to that context to any information that we pass on.
This is no doubt the reason we have so many problems communicating via email. In email as in
business intelligence, short is good. Application of this principle provides a warning that brevity
and communication may be at odds.

Meaning (and truth) is personal and subjective.


All meaning is created in the mind and is a combination of words, gestures, emphasis,
expression, history, emotion, desire, and a host of other dimensions. As a communicator you
have control over some of these depending on the medium.
There will always be more dimensions that you don’t control than those you do control.
Effective communication is difficult even when you know the recipient and nearly impossible
when you don’t. The odds are best for simple concepts (more or less, bigger or smaller, etc.).
The more we embellish or explain, the greater the risk of introducing unwanted responses.
Proper use of the recipient’s native language or dialect will aid in communication. Interpreters
are helpful.
Truth is meaning that is found to be useful. Useful is subjective. If you tell me that the
information I just delivered to you is not useful, you should be prepared to say why or describe
what useful information would look like. I have been asked for a chart showing completely
unreliable (and known by the requester to be so) data in a way that told a story to a particular
audience. To the extent that the audience bought the story, the information was true. To the
extent that the requester got the desired result, the information was true.
How much of the information we see on TV news programs is true? We are familiar with the
idea of slanted or biased coverage of an event by “news” outlets. If we agree with the bias it is
seen as true; if we do not agree, we can be driven to apoplexy.

Relationships are how we find meaning


When we only have ourselves to be concerned with, meaning doesn’t even matter. Recall the
Tom Hanks movie, Cast Away, in which Hanks portrayed a man living completely alone (unless
we count a volleyball with a face on it) for years on an uncharted island. He simply did what
needed to be done and had no need to explain or define.
Now imagine that story with a second survivor from the plane. Immediately the story changes as
two people expect and are disappointed, negotiate and are disappointed, communicate and find
meaning for the negotiations and expectations. Across the months and years they have no choice
but to become mature and learn to behave in mature ways toward each other. If they hope to
escape and survive together they can’t allow mistrust within their relationship. Everything must
be out in the open all the time.
Maybe they get away quicker and are rescued sooner. Maybe they can’t rise above mistrust and
one leaves the other behind or simply smashes his head with a rock. What meaning will you get
from your relationships?
Technology is about volume (throughput) only. Everything that is about meaning, truth
and value/quality comes from people.
Many of the things we have to do are boringly simple and mind-numbingly repetitive. That’s why
technology was invented. We have come to believe that technology makes us smarter and more
capable. In reality, technology makes us more efficient and possibly more effective by making
needed information more accessible.
Technology can either help us to do a lot more of a good thing or a lot more of a bad thing. It
doesn’t decide which. If we can’t describe what we want to do to another human, we have no
chance of describing it to a computer.

Current Hot Technology


We must recognize that the world of technology and its uses is in constant flux. A principal
characteristic of this turmoil is the daily introduction of “new” concepts and technology. For the
most part these “innovations” (like Big Data and NoSql and even Agile) should be viewed as
marketing constructs designed to capture the attention of people like us in order to sell the idea
that the magic carpet really exists to carry us away from and over all the problems, pitfalls and
traps that make up our daily lives.
Big Data is the idea that very large data sets require different handling than more “normal” size
data sets. This may or may not be true. The human needs and desires involving the data may
well require innovation. Imagine for example, the desire of the manager of a political campaign
to statistically sample trends from a very large population and do it in “real time.”26 How do we
identify important data in the vast stream of all available data, capture it, store it, and make it
available to satisfy the needs of our campaign manager?
NoSql, like Agile, is a response to the need for rapid turnaround in specific situations. NoSql is
really a throwback to pre-relational days when data systems were designed and built for a
specific use. It took programmers to build them and programmers to change them (if they could
be changed). Today, as always, the handling of very specific kinds of data can be facilitated by

26
When the term “real time” first came into use it meant electron-real and not human-real time.
There was a need when designing and building electronics for the space program or for defense
systems, for example, to ensure that one component could communicate with another
instantaneously to avoid creating loops or deadlocks that would render an entire system
inoperable. Today most uses of the term refer to a timeframe measured in seconds (or even
minutes) rather than micro-seconds—literally thousands or millions of times slower.
custom building a data handler. The rules of relational data design need not apply. There can be
no doubt that unusual circumstances may call for unusual solutions; however, serious and
protracted thought should be given before abandoning the gains represented by relational data
management. If you are already in need of a programmer whenever you have a new question to
answer from your data or if you hear the term denormalized from your staff, you probably have
already abandoned many if not most of the advantages of relational data management. You may
be in a NoSql situation while still relying on SQL.
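As a hypothetical illustration of that warning: in a normalized design a brand-new question is
just a query, while a denormalized design that has copied parent names into each row can
answer only the questions it was built for.

-- Normalized: "which grandmothers were born before 1900?" needs only joins.
SELECT DISTINCT gm.Given_Name, gm.Family_Name
FROM Maternity m1
JOIN Maternity m2 ON m2.ChildID  = m1.MotherID
JOIN Person   gm  ON gm.PersonID = m2.MotherID
WHERE gm.DOB < DATE '1900-01-01';
-- A denormalized Person(..., Mother_Name, Grandmother_Name, ...) has no
-- way to ask about great-grandmothers without a programmer restructuring
-- the data first.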
Marketing will always result in new names for things if not new things per se. Be assured that
very little that comes onto the market is actually new. Technophiles will always advocate for the
new tool because they know that the next time they interview for a position, all of those new
names will have become essential as a way of demonstrating their commitment to learning and
adaptation. Interviewers typically have no way of ascertaining a candidate’s level of mastery and
must rely on gauging their level of interest. Your needs are not the needs of the previous
employer and only mastery will give value. Learning to handle every new tool that appears on
the horizon takes a lot of time during which the master would have solved your issue by bringing
the appropriate people together.

Commitment to Action
Every aspect of human life, from the most elemental to the esoteric and spiritual, is plagued by a
tendency to wait for some favorable combination of events. “As soon as…” “If only…”
“When…”
We’ll launch our effort at the right time and until then we’ll keep doing what we have been
doing. There’s a story about a frog sitting on a log who decided, after a long time between
passing flies, to jump off and go to another part of the pond where the insects were more
abundant. “As soon as the sun goes down,” he said to himself.
When the sun settled below the cattails, he prepared to jump into the pond. “Maybe I should
wait until the middle of the night when that big pike is resting.” This thought held him on the
log. At midnight as he once again began to gather himself to begin his swim, he thought of the
raccoons that came to the pond to hunt each night. “It really would be better to wait until dawn.”
As the sun began to shine through the birch trees, thoughts of daytime perils kept him firmly
attached to his log. Day after day, the circumstances never seemed just right until one day a big
dragonfly came by and the frog watched it circle around until it was within reach. At the proper
moment he opened his mouth to snag his meal—and learned that he had become too weak. Now
he saw clearly the value of all of his decisions. They were worth nothing because none had been
accompanied by action. He was committed to eating and staying alive but not committed to the
actions that made those goals attainable.
You are the only one who can do this. If you don’t do it, who will?
Each person and group in the chain from “wouldn’t it be nice if” or “we really need” to “this
isn’t a complete picture” or “this isn’t what I needed” has reasons or rationales for doing what
they did. In most cases the reasons were based on guesses about what was really wanted and
why. Often the instigator of the whole thing spent less than two minutes thinking about what
was needed, why it was needed, and how it would be used. Does that person really expect
everyone down the line to know what’s in his mind? Of course not. They assume without
thinking that those who will produce the result will come back with additional questions and that
the discussion will bring things into focus. It’s true but very sad that those at the ends are often
much more aligned than those in the middle. So called “middle management” are frequently
caught in a no-man’s land where they believe themselves without reliable allies. Truth may
appear non-existent and relationships go unrecognized.
The beginning of the chain expects discussion and negotiation, the end of the chain needs
discussion and negotiation. All of the relationships as we move down the line are undefined with
the possible exception of time box and budget expectations. No one knows the upstream person
or organization well enough to guess what they need and no one knows the downstream person
or organization well enough to guess what they need or what they are capable of. It’s time we
worked on aligning the processes through architected relationships.
If the data architects or modelers worked on architecting those relationships, much more useful
questions and discussion would be the result. If the architects recognized those relationships and
sought information about them as though they existed, the parties would be forced to
acknowledge them as well. This could only improve alignment.
If governance processes included architecting, documenting and monitoring relationships,
organization structure might look and function completely differently.
What if…
An historian compared history to a parade. We, in the present, are at the tail of the parade but
clearly part of it. The parade follows a winding route up a mountain. As we march along,
we can see bits of the past as it passes by. Sometimes we can even see long segments of it. The
revelation is that each time we catch a glimpse of the past, it is from a new perspective. History
seen from the perspective of today’s march may look entirely different than it did yesterday and
so may have a different meaning.
The information technology industry very rarely looks back. The route seems to be changing in
front of us and it seems to take all of our concentration just to stay on the path. It is a shame
because seeing the forests and swamps that were traversed in the past, and the way in which they
were surmounted, could help us find better ways past today’s detours and roadblocks.

Truths
In the information/data marketplace we are accustomed to hearing the expression “a single
version of the truth.” This expression seems like a mantra but there are those among us who
appear to believe that this is a useful goal.
When we’re young and ambitious and all-knowing it can seem like the truth is ours to safeguard.
We hold it out in front of us as we smash through obstacles and crush opposition. At some point,
though, we exchange that initial truth for another—and then another and another. We never go
back to mend the damage that we did with our immature, less developed truths.
Here are some Truths that we can ride to the finish line.
 It’s not about me.
 “I know enough to be dangerous” is a prophecy
 We arrived where we are now by some process. If we ignore that process we will be back
here again.
 “Cynical” is a label rarely applied by the mature person but often applied to the mature
person.
 The quality of data is a useful concept if and only if
1. There exists a standard that can be applied
2. The processes that produce the data are understood and under control
3. The uses of the data are understood and agreed upon
4. Issues concerning quality are based on exception(s) to one or more of the above.
 Governance of data or anything else is for the purpose of predictability and consistency.
 Governance is by the consent of the governed. That is, those within the scope of governance
efforts must perceive that they are benefitting from the governance. Without such consent
we have a domination system. In a domination system quality is whatever the person at the
top says it is.
 Management of data or anything else is for the purpose of effectiveness and efficiency.
These may be reflected in lower costs but this may not be the case in suboptimized scenarios.
 Leaders, like teachers, emerge when they are needed and it is incumbent on the rest of us to
acknowledge them.

Your Mission
You’ve decided you are the one to take on and solve all those data quality problems that plague
your organization. Well, somebody has to do it, right? I mean the lack of data quality is costing
us a lot of money. Holding on to that money for the company would mean a more profitable
company and who doesn’t want a better profit margin?
There are a few things you should know before you commit yourself to this.
1. Data Quality is like dusting. Some of you will understand this and others won’t. What I
mean to say is that there will never be an end to it.
2. Data Quality is NOT about technology. Technology is the spotlight that makes [the lack
of] data quality so visible (and so expensive).
3. In times of rapid change, Data Quality issues are inevitable. Today, a company that isn’t
changing is dying so… The good news is that if you get good at this, your career is as
secure as any career can be.
4. Even though Data Quality is NOT about technology, you’re going to find that you will
need a very good foundation in technology processes and especially the processes
employed in your company in order to have any chance of identifying the right places to
apply pressure.
5. An approach to Data Quality that goes from one bunch of low-hanging fruit to another is
going to be a net additional cost (not what you want to be associated with).
6. In practice, there is no difference between a Data Quality program and a Data
Governance program. The goals are only slightly different and the methods may be
identical. The goal of Data Governance is the establishment of an auditable process
leading to consistently high quality and reliable data. The goal of Data Quality is
consistently high-quality and reliable data which will require auditable processes.
Auditable means provably consistent.
7. You are going to learn that no one (and I mean absolutely NO one) wants to talk about
data. Learn to talk about other things and use them to illustrate the concepts you want to
teach.
8. You will not be able to do this alone. It’s going to take leadership on your part to
mobilize support and participation across the company.
9. It’s a really good idea to be able to communicate the data quality vision for consumption
by any audience. This will require you to be able to express it so that your audience gets
it (in their language, using their metaphors…)
10. Finally, your Vision is the only thing you will have to sustain you in this so make sure
that it is clear in your mind (and heart).

Two Approaches
There are two ways to approach data quality. Both involve a process that looks like this.
[Figure: the common data quality process]
The difference between the two is that one deals with some instance of data (customer, patient,
procedure, visit, lab, order, invoice, etc.) while the other deals with the processes that surround
any data within the organization. The difference lies in the word “this” within the decision that
“We have to do something about this.”
The one that is used most of the time is to attack a specific issue that has become apparent in
terms of its cost to the organization. This is repeated as new issues arise. The other is used only
in organizations that are high-functioning. These are the organizations that have adopted a
capability/maturity based vision for themselves and are on the path to the Malcolm Baldrige
Award, or ISO quality certification, or CMMI Level 5. The diagram shows, in pyramid form, the
relationship between the targets of the two approaches.

Fig. 1 Costs (and Returns) increase as we address more foundational processes
Efforts focused on the top of the pyramid (bad customer addresses, for example) are generally
less costly on a per-instance basis and can be turned around faster, while efforts involving the
lower segments of the pyramid will have broader scope and higher costs. In both cases, the long-
term rewards are proportional to the cost.
Resolving bad addresses to reduce the cost of mailings, for example, will be effective for only a
relatively short time before it must be done again. To extend the time before a new project team
must be formed to repeat the fix, we could move downward to (for example) the Technology
Processes. We could institute checks to ensure that the validity of data (such as customer
address) is guaranteed by the front-end (user interface) of the data collection system.
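To make that guarantee auditable, some of the checks can also live in the database itself so that
bad values are rejected no matter which application writes them. A minimal sketch, assuming a
hypothetical Customer table and US-style postal codes (SIMILAR TO is standard SQL; some
products use their own pattern syntax):

-- Reject rows with no usable street address or a malformed postal code
-- at the database boundary, regardless of which front-end sent them.
ALTER TABLE Customer
    ADD CONSTRAINT chk_address_present
    CHECK (Street_Address IS NOT NULL AND TRIM(Street_Address) <> '');

ALTER TABLE Customer
    ADD CONSTRAINT chk_postal_code
    CHECK (Postal_Code SIMILAR TO '[0-9]{5}(-[0-9]{4})?');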
Note that the sharp end of the pyramid can be attacked independently while attacking lower
(more foundational) levels will require coordination. For instance, it would do little good in
terms of reducing mailing cost to institute processes to guarantee good addresses on input while
leaving existing bad addresses uncorrected. All this would accomplish would be to fix the cost
at a certain level.

Selecting the Best Approach


In general, the lower down we go on the pyramid, the higher we have to go on the org chart to
gain the needed level of support. Remember that “best” is a quality assertion. It means
something different to each person you meet so the object is to gain a meeting of the minds
(consensus) on an approach.
You don’t have to work too hard or be an expert in data management to convince a manager that
their department costs can be dramatically reduced with a two-week project that identifies
unusable addresses and eliminates them from the database(s). If they complain about losing
contact with customers and say that the addresses must be fixed instead of deleted, then the two-
week project becomes an eight-week project with additional expenditures for specialized address
validation tools.
If the manager agrees and asks, “So the problem will be gone for good, right?” then we will have
to fix the systems that allow the bad addresses through to the database. Now we have a three- to
six-month project involving a bigger team. We’re still OK though because the goals are well
defined and the end state is understood.
If the issue becomes one of not producing (or accepting) systems that allow bad data values into
our corporate data resource, we have moved to another level in the pyramid. We’re asking for
better requirements-to-design processes as well as better testing processes on all system
development and/or acquisition. It’s only fair to warn you at this point that here is where you
can easily get in over your head.
The development (programming) staff is going to claim that they can only design using the
specifications they receive from the business side or from the data architects. They’ll be right of
course, but you can’t let them off the hook now. They need to create and adopt processes that
leave an auditable trail of documentation. You will have to define what “auditable” means.
You should also be aware that the new processes are going to evolve rather quickly for the first
6-12 months (or the first 2-3 projects). The processes will have to be designed to incorporate this
evolution.
You may have been able to hold the line at the technology process level for a while. Eventually,
though, the need for process change is going to involve the data management process level. In
truth, you may want to get them moving as you initiate the technology process changes. Even
with little or no background in data management, a review of data documentation (meanings,
usages, relationships) is going to show that the programmers have several valid points. In many,
if not most cases, the requirement for definitions for data objects appears to be satisfied if there is
any text at all, however meaningless or useless, in the description property. It will be
exceedingly rare to find any descriptive information about a relationship beyond
mandatory/optional and cardinality.
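One way to turn that observation into an auditable process is a completeness query run against
the metadata repository on every release. The Data_Dictionary table and its columns here are
hypothetical; the point is that “has a meaningful definition” can be tested rather than merely
asserted.

-- Flag data objects whose "definition" is missing or is filler text.
SELECT Object_Name, Column_Name
FROM Data_Dictionary
WHERE Description IS NULL
   OR CHAR_LENGTH(TRIM(Description)) < 20
   OR LOWER(TRIM(Description)) IN ('tbd', 'n/a', 'todo', 'see name');

A report like this gives data management the same kind of auditable trail it is asking the
programmers to produce.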
Once again, the data architects will attempt to lay the responsibility for this state off on the
business architects. Don’t allow this. Data management must also design and implement
auditable processes that establish criteria for completeness and ensure that the criteria are met.
Failure to accomplish this and do so in a way that allows for evolution will lead to the collapse of
everything above.
When you reach the business process layer, you’ll be dealing with questions from the COO or
CEO and they’ll want different kinds of answers. These folks have also grown adept at detecting
fuzziness of thought and verbal arm-waving. They won’t be easily distracted by canned
responses and will want more than “this is standard practice.” They may even ask whether a
standard practice can be implemented “here.”
You can draw the line anywhere that is comfortable for you. The thing to keep in mind,
however, is that you may not get to draw the line. Maybe the line has already been drawn and
you’re trying to move it.
Selecting the approach and scope that’s right for your situation depends on the entire
organizational context including key personalities who must be involved. The deeper you go
into the pyramid, the more accomplished you must be, the more knowledgeable, the more
credible and the more persuasive.

Recognizing Root Causes


You are going to be told several times every day that x or y or z is the cause of the data quality
issue you are working on. Nine times out of ten this will not be the cause. How will you
recognize the real cause so that you can save all the effort of attacking the other nine only to find
that the quality issues persist?
That is the purpose of this book. We want you to understand the historical causes of data
quality issues because those causes have never been eliminated. Here’s one heuristic to get you
started: the technology is never the cause.

The Process
Here is a process for reducing costs, eliminating complexity, asserting control and living happily
(if not joyously) with your data. At some point in your life you have heard (or maybe even said)
that the bigger (read more mature) person must take the first step in mending a relationship. This
is absolutely true and you are now that person, having come this far with me. That means that
the next step is ENTIRELY YOUR RESPONSIBILITY. The people and organizations (be they
neighbors, coworkers, teams, departments, divisions or competitors) may not even be aware that
they are in a relationship with you. Were you aware when you started reading this book?
The carrot dangling in front of you consists in
 reduced frustration
 better communication
 fewer (ugly) surprises
 improved efficiency and effectiveness
 reduced costs (both in money and energy)
If these are worth it to you then you’ll want to start with these steps:
 Make a list of those relationships. (Maybe you want to focus on just one at the
beginning.)

 Name them as they are now and as you want them to be.

 What are your expectations for each? Again, it may help to list current and desired
expectations.
 What are your intentions for each? Current and desired is still a good idea.
 What information do you need to keep about each? For example you may want to have
accessible the inception date of the relationship (or anniversary), schedules, net payment
requirement, etc.

 Who is the person you want to partner with? Recognize that organization-to-organization
relationships are always based on one or more person-to-person relationships. Even a
historically bad inter-organization relationship can be re-architected and restored by
choosing different people to be the core. These people need not be visible on the org
chart. It is only necessary that they be opinion leaders.
 Are your expectations and intentions different for this person than for the relationship? If
the problem relationship involves a department, for example, you may pick out a person
in the department who is an opinion maker and build a relationship with that person as a
first step in getting to the inter-departmental relationship that is the goal. Remember that
group that you wanted to be part of when you were a kid? You could find a way in by
making friends with one of the members and letting them bring you in. That’s what
we’re talking about here.
 How formalized does the relationship need to be? We often go too far in formalizing,
thereby setting the stage for the eventual breakup. Intentions are key here. Do you
intend to enforce all expectations all the time? If so, you need a contract which is a
special kind of relationship that can be enforced in court. If, however, you intend to
communicate in those instances when expectations are not met, and to negotiate
improvement, then you need much less in terms of formality. Beware of formality. It is
much easier and less costly to get what you need by talking about it over coffee than by
using attorneys and courts.
 What do you know about the potential partner?
 What do you think their expectations and intentions are?

 What do you want them to be? Look at your expectations and intentions again. Are they
still reasonable?
 When you are clear about the relationship you want and why, approach the other party
and lay out your cards. Show them the steps you’ve gone through and the carrots that are
motivating you. Do not try to “close the deal” at this point. It took you time to get here
and you may have to walk them back to where you started and give them time to get back
here on their own. Remember the spider web.
 If you have come here by considering “we” and “us” instead of simply “me” you will at
least have made an impression and softened resistance. You must expect, though, that
when they come back, their expectations and intentions may be somewhat different than
those you attributed to them. That is the first step in a successful relationship.
 The negotiation is never completed in a strong and well-architected relationship. Think
of it as preventive maintenance. The biggest (and ugliest) surprises happen when we take
something for granted and the name on the relationship doesn’t matter in that case.

Epilogue
Those who become lost in the wilderness always find themselves going in a circle. It can be
extremely debilitating mentally and spiritually to invest yourself completely in something for an
extended period, risking your well-being and your career, only to find yourself back where you
started.
The first law of survival when you are lost is to find a place that is safe and offers the possibility
of water and food AND THEN STAY PUT! Wandering in circles is good exercise but it also
represents risk. What if we get stuck in a place that doesn’t offer the basic survival needs?
You are going to become lost. This book should be considered a basic survival guide. Equipped
with it you should be able to recognize your surroundings and situation. Using it, you should be
able to plot a path that will get you closer to your destination.
At the very least, it is evidence that someone has been here before and got out, scarred but alive.
Consider this, then, the blaze mark on a tree or the message in a bottle. Someone has been here
once and gotten through relatively intact and able to describe the journey. Those entering this
wilderness would be wise to listen. There are plenty of hazards not discussed here, but heeding this advice will help you get farther faster, and you may be the one who conquers the problems.

Appendices
The following appendices are offered as bread crumbs to be followed for those interested in
seeing how this book came into being. They are a collection of writings that represent
developmental steps. As such they may be useful to provide a stable foundation for some of the
intuitive leaps. If you need them, here they are.

Appendix A: Analysis of Error (1995)


We often seem to believe that errors are caused by some lack or fault on the part of the
“responsible” person. In reality, responsibility for error is a wild goose chase. It’s a chase best
not begun unless we intend to run the quarry to ground by getting to the root cause. No one sets
out to make mistakes and eliminating mistakes is not possible. The best use of time and
resources, therefore, is to devise methods and processes that catch mistakes as early as possible
(even just before they happen) and contain the damage.
An error is described across several dimensions, including type, loss status, loss value, ability to mitigate, and possibly others. This applies to known errors. It is only when an error occurs
repeatedly that we can justify the resources to eliminate it. This is why Special Causes of
Variability are ignored in the Statistical Process Control model. Could we have predicted it?
Could we have avoided it?
Occasionally the potential damage or loss involved is of such severity that we are forced to take
its avoidance into account in our process design.
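One way to make those dimensions concrete is sketched below in Python. The enum mirrors the taxonomy that follows; every name here is an illustrative assumption rather than a fixed standard.

# A rough sketch of an error described across the dimensions named above.
from dataclasses import dataclass
from enum import Enum

class ErrorType(Enum):
    OMISSION = "omission"            # I didn't do something I should have done
    COMMISSION = "commission"        # I did something I shouldn't have done
    INTERPRETATION = "interpretation"
    COMMUNICATION = "communication"
    # ...the remaining types from the taxonomy below

@dataclass
class ErrorRecord:
    error_type: ErrorType
    loss_status: str    # e.g., "none yet", "accumulating", "realized"
    loss_value: float   # best current estimate of the damage
    can_mitigate: bool  # is containment still possible?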
I. Types of errors
A. Errors of omission
1. I didn’t do something I should have done
B. Errors of commission
1. I did something I shouldn’t have done
C. Errors in interpretation
1. I took that to mean ...
D. Errors in communication
1. I thought you said ...
E. Errors of generalization
1. Failure to recognize the special case
2. Prejudice — All clouds are white
F. Errors of specialization
1. Failure to recognize commonality
2. My job is unique. We are different.
G. Errors of ignorance/inexperience
1. But, I thought ...
2. Next time, ...
H. Errors of rejection
1. Opportunity knocked but I wouldn’t answer the door
2. I should have…
I. Errors of acceptance
1. I never checked. He seemed OK.
J. Errors of oversimplification
1. Integration is easy, all we have to do is ...
K. Errors of over complication
1. Failure to recognize progress
2. We’ll never be able to do all that.
L. Errors of haste
1. The light was about to change.
2. I didn’t read the directions.
M. Errors of delay
1. If only I had sold yesterday.
2. What do you mean, it’s no longer available?
N. Errors of sequence
1. First light the match, then turn on the gas.
O. Errors of intent
1. Deliberate (Watch this!)
2. Induced (Watch this. Oh, Joan, ...)
II. How to recognize when an error has occurred
A. Predictive
1. Uh-oh, I’d better change that
2. This won’t work
3. This can’t work
B. Retrospective
1. This hasn’t worked
2. Look at this garbage
C. Real-time (the worst/most expensive kind)
1. Oops (the best of the worst)
2. This isn’t working
3. I was expecting... (Something may be wrong.)
4. I thought we’d be done by now. (We’d better check to see if everything is
OK.)
III. The proper corrective measure
A. Does corrective measure depend on error type?
B. Does corrective measure depend on recognition mode?
C. Which is more important/useful
1. Error recognition?
2. Error classification?
D. Recognition to minimize damage
E. Classification to
1. maximize predictive recognition
2. avoid need for future recognition
3. avoid risk of future damage
F. Some errors can’t be corrected
1. Accumulated damage = 0
2. Some errors can’t be remediated
G. Accumulated damage reduced
IV. Accumulated damage: What we do next depends on what is happening now.
A. 0 (What does the future look like?)
1. Low probability for > 0
a. Assess worst case
b. Establish upper bound on damage
2. High probability for > 0
a. Imminent
b. Plenty of time
3. Probability changes with time
a. increasing complexity
b. Increasing dependence
B. > 0 (What is the trend?)
1. Static
2. Getting better
3. Total damage
4. Rate of accumulation
5. Getting worse
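Section IV reads naturally as decision logic: what we do next depends on the accumulated damage now and on its trend. Here is a hedged sketch of that reading in Python; the labels and recommended actions are my assumptions, offered only to show the shape of the decision.

# What to do next, given accumulated damage now and its trend.
def next_action(damage: float, trend: str) -> str:
    if damage == 0:
        # No damage yet: bound the worst case and watch the probability.
        return "assess the worst case and establish an upper bound on damage"
    if trend == "getting better":
        return "continue current containment and track the rate of improvement"
    if trend == "static":
        return "contain the damage and look for the root cause"
    return "escalate: total damage and its rate of accumulation are both growing"

print(next_action(0.0, "static"))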

Appendix B: Blogs, Essays and Articles


The following are included as an aid in constructing that fence around the truth—the meaning—
that is the objective of this book. No single article, chapter, or paragraph can communicate
clearly the truth about data, information and quality but perhaps a sufficient number organized
around the theme can do what one cannot.
Introduction
I never had any intention of becoming a blogger—not that there's anything wrong with blogging.
It may be too little, too late and I certainly feel like the boy with his finger in the dike, but there
needs to be a voice of reason in cyberspace when it comes to the latest HOT topic, Business
Intelligence.
I am leveraging more than 20 years of data management experience here, so I'm going to try to
avoid the bling and get to the meat. Since so much of the BI (short for business intelligence)
stream is bling, frills, bells and whistles, I should be able to keep these relatively short.
I'll inaugurate this spot with the assertion that if there is no meat to what you're seeing—if the
information isn't "actionable", then it doesn't matter how it's presented. That means that if you've
blown your budget on presentation (e.g., dashboard) tools but you've never spent a cent on a data
quality assessment and you don't have process consistency or even documented processes, then
you won't get intelligence.

More on Perspective
I worked for a company that had the hardest time getting new things started on schedule. Some
new product or line of business would be announced with a grand opening date. Often the date
was only a few weeks away. The problem was—over and over again—that the people who had
to actually make the new thing happen found out about it at the time of the announcement.
So guess what happened? Either the schedule slipped and slipped again or the doors opened on
something that was incomplete and holding all the pieces together was extremely hard on
everyone involved. Is that what you guessed?
The root cause of this, as it turned out, was lack of the needed perspectives. Highly placed
persons believed that they could make all the commitments for many of the enabling functions.
In principle, this was true and needs were recognized and responded to AT A HIGH LEVEL. As
any general or coach will tell you, the best strategy is only as good as the troops or players who
have to execute it.
When we finally got a group together to analyze the situation, it became clear that no one person
could know all that must be known to plan and implement the project. What was required was a
meeting—as soon as possible—of representatives of all the business functions involved as well
as those who are involved in everything (facilities, telecom, network...). All the perspectives only
emerged in a group setting where people with specialized knowledge could bounce ideas around.
Personal perspective expands in a group setting.
Again I ask, what perspectives are required for a successful BI effort?

Tools and Products


Very briefly, I have been seeing many instances of confusion about tools and products. If you're
going to make chisels or scissors, you really have to keep the eventual product firmly in mind.
A tool maker who loses track of the application for his tool runs the risk of producing something
that has no utility. Something that has no utility or whose utility can't be easily recognized is not
a tool. It may be an art object or it may just be a poorly designed piece of junk. It may also
simply become so costly that the product buyer can no longer afford it.
How good does it have to be? Good enough is the right answer. As with so many things in life,
the right answer isn't nearly as satisfying as we might have hoped.

Questions and Answers


No learning is taking place if there are no questions.
How good does it have to be? Good enough.
Good enough begs for more questions. Who will decide? What will they base the decision on?
What will be used for comparison? What is the standard? Each of these questions will lead to
more questions and—here is the key—someone will tire and yield the field.
The answerer, let's call him A, and the questioner, Q, will continue the dialogue until either A or
Q tires or loses interest or runs out of time. When the process doesn't run to completion the
result—let's say BI, but it could be almost anything—will be based on whatever perspective Q
has developed from the answers already received.
But here we are back at "how good does it have to be?" How many questions and answers does it
take? Of course it varies with the quality of both the questions and the answers. Some people are
good at 20 questions and some never get it. In order to zero in, we should look at the track
record.
If you're A and you're never satisfied—never get what you need—no matter who is in the Q role,
then I suggest that we focus in on A as the probable source of whatever problems there are. If
you're Q and what you're delivering is never good enough, then we should focus on Q.
What's your track record? Can you offer any insights? Do you need to stick with it longer? What
can you do to keep the other guy engaged longer?

Information or Systems?
For those of you who work in I.S. or I.T. or any of the variants, a question. Is your organization
about information or is it about systems or maybe it's about technology?
My sense is that many more people are in it for the systems (programming) or the technology
(networking, servers, wires, boxes) than for the information.
Just to get all the cards on the table, I'm asking this from the perspective (there's that word again)
of someone who has been having his nose bloodied for years because of a stubborn streak that
keeps on insisting that it is about the information and that everything else is supporting cast,
walk-ons and extras (the Academy Awards are a recent memory).
The term, ontology, has become trendy in the relatively recent past. It simply means a
specification for a concept. A concept is an idea and many times it never progresses beyond that.
Rarely, an idea like freedom or liberty needs little or no specification to make it useful. Many ideas, like stewardship, on the other hand, need quite a lot of specification before they become useful.
Information Systems/Technology appears to be in need of some ontological work. I.S./I.T. are
ideas that require a context. They are found in the context of a business. The business, in turn,
has a context, but we don't need to go that far for the purposes of this discussion.
Businesses need to produce, dispense, store, manage many kinds of things and all of them are
physical save one—information. Because information is a concept in its own right, it quite often
gets pushed out of the way in favor of the physical things that compete for our attention by virtue
of form, color or sound. These things require physical space and unless we do something, they
will soon pile up and make it impossible to get anything done.
Quiet information or data, on the other hand, makes no demands and is consequently ignored.
Remember, though, that the business has an I.S. or I.T. organization because every now and then
someone needs a specific piece of data or a chunk of information and needs it now. Sometimes
the data has just come into existence and other times it has been languishing in a "data file" for
days, months or years.
How do we find that set of ones and zeros and turn it back into the concrete abstraction that the
business needs? Friends, that takes data as well. Device names, drives, folders, files, instances,
records, fields, indices, values—all of that is data. In I.S., we understand the need to keep that
kind of data reliable. We create systems and they are data as well. We understand the need to
maintain our system data: product, version, build, component, QA status... and the implications
of not doing so.
Frequently the Data people (data administration, data architecture, data stewardship, data
governance, database administration...) are part of I.S. or I.T. and we're content with that as long
as they are directing their attention outward, toward the business. As soon as they begin to
exhibit interest in us and our handling of our own data, we start to feel resentment, frustration
and even anger. "Who are they to tell us how to do our job?"
Friends, and I am sincere in my use of the term, programming is programming and data is data.
The Data people can help you and they want to help you and, most of all, they need to help you
in order to close the loop. They are being held responsible for the quality of the data resource and
the processes that create and manage the resource. You represent a huge exposure as far as they
are concerned. When you re-learn to associate your system with the information that flows
through it, I hope you will also learn to value what the Data people are offering.
Information Systems, Information Services, Information Technology: let's refocus on the reason
and purpose of those efforts. You can benefit from the consistency that results from standard
processes. You can benefit from better data management capabilities. We can all benefit from
understanding our shared purpose—the best information for the business we're part of.

Governance and Data Governance


It occurred to me today that one of the reasons for so much confusion in the data governance
ranks today is that businesses have a hard time with governance in general.
The literature I've seen (and I'll admit I'm not ready for a thesis defense) is focused on what I
would call the mechanical aspects.
•	What kind(s) of committee(s) and at what level
•	Who should be members
•	What are the roles of the committee(s)
•	Who holds the "decision rights"
•	What are the decision domains
When discussing committee membership, the choices are framed in terms of role and level
within the business.
Nowhere is the question of competency introduced. To me, this is cause to wonder about the
purpose of governance. If competency is not a requisite quality of governance, then why do it? It
seems clear that, even in areas that most would gladly cede to executive management such as
strategy formulation and prosecution, there is an aspect of competence that, if missing, will cause
decisions to be ineffective and/or impossible to implement.
We are accustomed to seeing leadership, management and governance used interchangeably when, in fact, they are three different activities with three separate purposes.
1. Leadership has the mission of (productive or positive) change.
2. Management has the mission of effectiveness and efficiency.
3. Governance has the mission of stability, consistency and predictability.
I submit that the question of whether a certain initiative should be funded is NOT a question for governance but for management. The governance question is whether doing this will
upset the applecart. Can we continue to produce expected results if we do this? If we need to do
this because of a leadership imperative, how can it be accomplished such that predictability is
preserved?
Without intending any disrespect to executives, it is doubtful whether they could productively be
involved in answering those questions.

Oxymorons in Abundance
No Governance in Data Governance.
No Intelligence in Business Intelligence.
No Leadership in Corporate Leadership.
Let me hasten to say that these are not intended to be value judgments. To be fair and truthful
(which are tough sells) I should modify these statements a bit.
1. There is little governance in data governance, little intelligence in business intelligence
and little leadership in corporate leadership.
2. The question to be asked is not "why" but "how". Whenever we are faced with something
unexpected, we are used to responding, "Why?" I've learned that why? almost always
puts people on the defensive and that communication effectively shuts down when people
become defensive.
If we change our question to how?, we can focus on processes and look for cause rather than
fault. So, how does it come about that data governance so often lacks any vestiges of
governance?
The first task is to differentiate between leadership, management and governance. This isn't
about people—an individual may be capable of doing all three—but it is about tendencies.
Here's a breakdown that might help this to make sense.
 Leadership is about change.
 Management is about effectiveness and efficiency.
 Governance is about consistency and stability.
My take, after watching these processes work for many years, is that we find it nearly impossible
to keep these three functions separated. When we are doing management while talking about
leadership or doing leadership when we are talking governance, we not only confuse ourselves
but also the community we are working within.
For example, we make a leadership decision to create data governance. Then we turn the task
over to managers. In reality, leadership is required all the way out to the line organizations.
Managers cannot make effective or efficient something that they do not understand and have not
bought into. A too-early transition from leadership to management will give birth to confusion,
frustration, burn out and the failure of the initiative before it even gets to the governance phase.
Continued tomorrow.

Oxymorons in Abundance (continued)


Leadership is about Change.
Management is about Effectiveness and Efficiency.
Governance is about Consistency and Stability.
The discipline of Change Management is well worth our time. It consists of theory and practice
involved in getting a change implemented.
The diagram below makes an excellent reference whether you are a Manager, Governor or
Leader. In everyday life, we can find ourselves in situations where there is an atmosphere of
confusion around some initiative. As the chart directs, this is evidence that the vision driving the
change has not been communicated well enough. In other words, the effort still needs leadership.
If the general mood is one of frustration, then this is where the manager must step in to make
certain that sufficient resources are available.
To understand the role of Governance better, let's examine the life of a guerrilla movement such
as the early days of our own American Revolution. The various colonies existed under charter
from the King of England—not unlike departments in any corporation. As far as the Leadership
was concerned, the colonies' purpose (the vision) was to produce wealth to allow expansion of
the empire. The colonies were OK with that since they were able to keep a portion of the wealth
for themselves.
The Leadership had placed Governors in each of the colonies to make sure that everything
functioned well and that the colonist/workers could and did focus on their production. Leaders
like to be able to rely on their empire to generate wealth consistently and predictably. The
problem was that the Leaders and the Governors were out of touch with life at the boundaries.
Governance attenuated pretty rapidly as you moved westward, away from the seaport
communication links and seats of governance.
It was hard to focus on making the leaders wealthier when you had to worry about coming back
from a hard day in the fields to find your home burned down and your family gone. Even a
relatively small thing like an illness meant that you might have to cease all "normal" activities in
favor of ministering to the ill or traveling days to either bring the ill to a physician or the
physician to the ill.
In all of this, people learned to manage their own situation. In so doing, they developed their own
leadership and governance skills and processes. They learned to manage the processes and
assessed their effectiveness by the stability they delivered. Ineffective processes were modified
or abandoned.
The lesson here is that there is little need for governance in the boardroom. Governance is most
valuable at the boundaries, where control is weakest. If there is no stability, no consistency and
no predictability at the boundaries, then there is no governance. If Leaders attempt to hold
Managers and "official" Governors accountable for something that doesn't exist, they will find
themselves more and more out of touch as the accountable individuals scramble to create
evidence that they are being effective.
What is data governance accountable for? What do we measure? What are the goals? What are
the trends? Is business intelligence defined for data governance? Do we have data governance?
Tomorrow: Measuring Data Governance

Oxymorons in Abundance (part 3)


Yesterday I stated that governance already exists and we ignore it to the detriment of all. A bit
more discussion on this will be helpful.
Back to the frontier first. A settler had the problem of governing the household. There isn't much
that need be said about that since we still deal with that issue today and probably in much the
same ways. When a settler (at the boundaries of the empire) encountered others like him/her,
they had to come up with ways to govern those relationships. I'm not going to go into those
mechanisms here—cultural anthropologists have published a lot of theory and case studies about
this process. I will say that whatever was negotiated fell into one of two broad categories:
•	One party was clearly dominant and dictated the terms.
•	The parties created a contract that was seen as mutually beneficial.
Even in the cases of mutually beneficial agreements, there is often a competitive aspect. One of
the parties will think, "Well, I had to agree to this but I'm going to stick to the letter of the
agreement and if they think I'm going to go out of my way to make their lives easier, they'd
better think again."
Fast forward to our modern equivalent. It's easy to see that, at the boundaries of our corporate
empire, where the other party is the source of revenue (client or customer) this is counter-
productive and a sure path to failure. It isn't quite as easy to see that the same holds true at
organizational boundaries and, most importantly, process boundaries. Because of the way
businesses operate culturally, each employee is competing with every other employee in the
same way that settlers at the frontier were often forced to compete with each other.
The result (Dr. W. Edwards Deming exposed this very effectively) is sub-optimal performance
for the process, organization and empire. We can hear this any time we choose to listen. "I know
that data is incomplete, but I don't have time. They're the ones who need it—let them clean it
up." "Yes, it's wrong but it's what they asked for. They're the ones who will have to do it over."
So here is the role of a governance program. Governance exists, but its goals are not those of the
process, organization, empire. In order to replace naturally occurring, organic governance with
governance that is aligned with the corporate vision and carefully designed to further the
strategies and goals of that vision, the organic governance structures, including the attitudes that
created them, must be identified and understood. Then they must be replaced with equally
effective structures.
Are you familiar with the idea of a cow path? This picture shows cow paths—no, not the broad
"road" that the cows are on—the cow paths are those faint lines that meander across the fall line
of the hill. Why are there so many? The cows have their reasons but they aren't talking. Those
paths are like the organic governance in your organization. You've heard the expression paving
the cow paths, which is not considered a good thing to do.
I'll leave you with a final thought. Is redesigning the cow paths a productive effort? The answer
is that it may be and it depends on the objectives, BUT if there is no way to train the cows to use
the new 21st century solution, then even the best vision, strategy and goals will be for naught.

Measuring Governance
I apologize. I said I would address this yesterday. We do have to get back to the guerrilla
movement on the frontiers of the empire, but let's take a little time to look at measurement of
governance.
First, let's agree that data governance is like any other governance except that it focuses on data.
A governance program directed at process or at competency or whatever would have the same characteristics. OK, I'll attempt a justification for that statement.
What do we ask of a data governance process? What are the objectives? By the way, I use the
term process here in the sense of a set of activities that are ongoing and have a consistent
purpose. The purpose of the data governance process is to:
Optimize the value of the data resource by ensuring that the capture, storage, retrieval, destruction and use of the resource are done in accordance with established policy, procedure and standards.
Do you buy it? If not, I'd be pleased to discuss alternative purposes, but the remainder of this
discussion is based on this purpose.
Based on the purpose of data governance then, several perspectives on measurement suggest
themselves. The most obvious one is the QA (quality assurance) perspective. How are we doing
at following established standards? It is tempting to count the number of standards, policies and
procedures because counting is easy to do and there is a tendency among the governors to equate
many laws with good government. Strangely enough, among the governed the emphasis is on the
quality of the laws rather than their quantity. A small number of effective and easily understood
standards may deliver more benefit than a larger number of over-specialized or esoteric ones.
The most effective measurement will be part of the standard or process itself, but some
organizations may find it useful in getting governance going, to do retrospective analysis to see
how well/consistently processes are being applied. Health care makes extensive use of the "chart
review" to gather this kind of data retrospectively. Measurement intrinsic to the process or
standard has the potential to be much more nuanced and useful than that done retrospectively
simply because all of the context is available.
Clearly, though, the nature of the metric(s) is very much determined by the process or standard
itself. For this reason, it makes no sense to discuss metrics or KPIs (key process indicators), a
special kind of metric, without first establishing the process context.
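As a hedged illustration of measurement intrinsic to the process, here is a minimal Python sketch: each application of a standard records its own compliance event while the full context is still available. The standard identifier and the event fields are my assumptions.

# Measurement built into the process step itself, not done retrospectively.
import datetime

compliance_log: list[dict] = []

def apply_standard(standard_id: str, context: dict, passed: bool) -> None:
    """Record one application of a standard, in line with the work itself."""
    compliance_log.append({
        "standard": standard_id,
        "when": datetime.datetime.now().isoformat(),
        "context": context,   # nuance that a later chart review cannot recover
        "compliant": passed,
    })

apply_standard("ADDR-001", {"process": "customer intake", "field": "postal_code"}, True)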
Other perspectives might differentiate among standard, process, and policy or might measure in
conjunction with the data life cycle, specific subject areas or specific usages.
One last point, should you be tempted to think in terms of measuring accountability: accountability in the absence of a standard is really approval.
No governance mechanism can exist for long based on approval. Each change in "leadership"
will create massive turmoil as everyone seeks to reorient to a new approval model.

Pragmatism and Sacred Cows


Something happened to me today that has happened before. I thought I had filed the barb off the
hook and that I wouldn't be snagged by this anymore, but I was wrong.
I made the statement that my work would be done when people no longer talked about data
governance having realized that all varieties of governance are "governance." The response was
to remark that statements like that are blue sky and that people will turn me off and look for
someone more pragmatic.
Knowing, as I do, that the statement represents the ultimate in pragmatism—the application of
what is known to make sense of what is unknown—I proceed (very quickly) through several
emotions. I guess I have blunted the hook to the extent that I recognize the emotions as they
appear—which is good. Where I wound up, after passing through the "I don't need this" phase,
was standing in appreciation of Galileo, Darwin and Einstein.
I'm not comparing my contribution to theirs. I am comparing my emotions to what they almost
certainly experienced as they struggled to break through the thickest of all possible walls—
common knowledge.
Sometimes we have other names for common knowledge. We say "everyone knows that..." or
"best practice states..." or any of a host of other code words for what amounts to a sacred cow.
I have learned that the words "practical" or "pragmatic" or even "heresy" are silver bullets most
often applied to protect the sacred cows from attacks by, well, me. I just want you to know that if
you want practical solutions, you should get as far from best practice proponents as possible.
You might even want to venture out to the boundaries of your organization to see what pragmatic
really looks like.

Guerrillas and Governance


Back to the American colonies in 1775.
Over the course of nearly 200 years, the colonies had grown from the equivalent of a handful of
lemonade stands into a dozen or so franchises with new locations opening up in the undeveloped
markets to the west. Since the appointed governors liked comfort and their delegates had few
qualifications beyond their socio-economic status or their relationship to someone important, and
since communication was so poor (or slow, which amounts to the same thing), governance was
like the light of a lantern.
Close to the governor things looked good but you didn't have to get very far away before the
light began to fade and things became less and less distinct. By the time a traveler left the last
organized settlement, he had to be prepared to take care of himself.
The problem for the governors—and the leader back at empire headquarters—was that those
people out there got to be pretty good at taking care of themselves and actually began to enjoy
being out of sight and out of mind. When the leader needed to have better productivity from the
empire, there was some push back.
The governors were under pressure to produce more and the workers out there on the boundaries
were pretty satisfied with things the way they were. The governors asked the leader for more
resources to improve their ability to govern. The leader sent in mercenaries to quiet things down
and implement better governance, by which was meant to suppress dissent and put everyone
back to work for the king.
Of course, the outsiders didn't know anything about the locally designed governance and stepped
all over it because they only had one tool, force.
Enter the guerrilla. There are leaders everywhere and sometimes they have a vision for
governance and the future. When the locals see a better life with the guerrilla governance than
they do with their formally appointed governance, the guerrillas will eventually have their way.
There are many points to be made from this story but I leave them up to you. This is history, but
it's also today.
The one point I do want to make is that governance has to work for the governed as well as for
the empire and that replacing something that is poorly understood but seems to be working may,
if it is not handled well, produce more harm than good.
As always, your comments are welcomed and appreciated.
Practical, Pragmatic, Productive
We kind of like to think of these three as synonymous. Don't they all require a frame of
reference, a context, a perspective? Certainly they do if we want to think of them as synonyms.
Further, in order to be synonyms, wouldn't they all need the same frame of reference, context,
perspective?
I've noticed that people have a hard time with those (like me) who split semantic hairs. And yet,
the long-term success of anything relating to data depends on grasping the nuances of differing
frames of reference, context and perspective and integrating them into something that will be
useful to all.
Beware the architect who listens to your 30 second description of what the strategic information
system must do, then asks "when do you need it?" and disappears.
When he was 6 years old, my son asked me for help in building a submarine. I asked him a few questions and
learned that he intended to actually travel under water—not just a play submarine. He had
corresponded with his teacher's son who was an officer aboard an actual nuclear submarine. He
had diagrams and photos. I asked him what materials he proposed to use. He took me to the
garage and showed me some 2x2 and 2x4 lumber and some chicken wire left over from building
a rabbit cage. I asked how he proposed to keep the water out and learned that this was to be my
contribution.
In my career, I have seen the equivalent of this scenario reenacted many times. It's absolutely
amazing how much effort is expended on these projects. Yes, it is hard to tell your 6-year-old son
(or your boss) that the project can't be done and it is possible that your credibility will be
diminished. I wonder whether the cost of information technology couldn't be cut by 90% if we
simply learned some better ways of saying "no."
Oh, and, by the way, was it practical, pragmatic, and productive to gather diagrams, photos and
written material, identify materials and find an "expert" to assist with the hard parts? Was I
practical, pragmatic and productive in my approach? What was the result?
Sometimes the impossible must be done but far more often it's better to simply move on to
something more in line with our actual capabilities. And, in the cases where we do have to
accomplish something "impossible" the very first thing we have to do is discard practical,
pragmatic and even productive because those are what brought us to impossible in the first place.

Assistance vs. Solution


How can we believe that someone can give us a "solution" when we haven't even been able to
frame the question? Are we that simple-minded? Have we been brought to this—that we are
puppets, manipulated by marketeers?
We all have need of assistance at one time or another. This is a good thing to recognize. I've run
into something I don't understand. I'd better find someone who does understand so that I can
move on.
If we feel ill, we might look for some relief from the cold remedies or pain relief aisle at the
grocery store. We might even go to a drug store and ask for help from the pharmacist on duty.
We may do some research on line or at the library in search of relief. Do we expect to be cured
like this? Most would probably admit that they aren't seeking a solution—just some relief from a
particularly unpleasant symptom while they wait for "natural" healing to happen.
Some, on the other hand, will look for a "solution" by visiting a medical doctor (or maybe a
chiropractor). They will get relief from the most unpleasant symptoms with the advice to "do this
for a week." The result is the same except for the cost. Of course there are times when consulting
an expert is indicated without question. When I have severe, unexplained pain or profuse
bleeding or when a bone suddenly develops a new joint, then it is time to involve outside
expertise.
When we ask for assistance and then insist on telling the expert what to do, we can sometimes
get what we want (say antibiotics), but doing so will not provide a solution and will, in fact, be
detrimental to ourselves and to others.
How does this relate to the business world? Pretty directly as it turns out. When we experience
symptoms, we could look for the source and begin lifestyle changes that will render the source
harmless in the future. We can even apply topical analgesics in the form of temporary hires, a
revenue bond issue or something of that nature to relieve the symptoms while we wait for the
cure to develop. Or we can buy some technological antibiotic at considerable expense and
increase the general frustration/pain level as we try to graft the "solution" into our corporate
body.
When dealing with issues of personal health, does it make sense to ask the pharmaceutical rep
what to do? He or she will undoubtedly have the "solution" in their inventory (or pocket). No
matter how you ask the question, they will have a product that will provide relief. This is exactly
the approach we take when it comes to technology.
What we should do, as a business or an individual, is engage a personal trainer to show us the
lifestyle changes and the personal discipline that will be needed to break out of the cycle of pain.
The trainer will even recommend some aids when appropriate.
Please, please stop going to technology vendors asking for solutions. Please, please stop and
think when "your account manager" calls to tell you about the newest solution available through
him.
No two people and no two businesses are the same. A real solution must fit the individual
physiology and lifestyle. It must come from within with the aid of expert assistance.

A Hammer in Search of a Nail


Are we all familiar with this metaphor? Abraham Maslow ("Hierarchy of Human Needs") is said
to have originated this, but it hardly matters. The concept rings so true that it has probably been
known since the first tool.
I have lately participated in some discussions on LinkedIn with the result that I now believe that
the vast majority of workers in information technology are in possession of a tool that they are
seeking to apply to every problem that they encounter. The really dangerous ones are creating
problems to use their tool on. Oh, wait, there's a name for that—it's called marketing.
Notice, too that I said "in possession of" a tool. It is apparently no longer necessary (if it ever
was necessary) to actually be skilled in the application of your favorite tool.
I once worked with a guy—a programmer in this case, but I'm not looking to single out
programmers—who said something rather like, "We don't support the business. We are the
business." The business was rail transportation and it is true that if all of the applications used by
this particular company were to suddenly disappear, those "left behind" would no doubt have had
to cease operations until they could be reorganized.
That's not the bad part though. The bad part is that this is a really good example of just how far
the notion of "where's the next nail?" can take us. When I believe that the world as I know it is
held together by this tool that I hold in my hand, I am in the midst of a dissociative process. I
don't go home and act this out. When I'm not at work, for example, I'm just the guy next door.
When I do get to work, though, I'm still the guy next door—it's just a different door. We—most,
if not all—live in two separate realities. About the only thing that keeps us from being diagnosed
with a dissociative identity disorder (DID) is that we (usually) remember what happened in the
other world.
I could go on at length but the bottom line is that, not only is our work life a separate reality from
our real life, but there is a completely different reality in the executive suite than there is on the
floor, and (for me) most importantly, a separate one for I.T. Leave aside for a moment, the
variety of realities we might encounter as we go from networks to servers to DBA, to data
architecture, to development to QA—it is absolutely amazing that we get anything at all
accomplished.
The poor data architect finds himself stepping into and out of a dozen distinct realities every day.
It is certainly a defense mechanism to take refuge in a favorite tool—the "data model." This is
the talisman used to shield against the swarm of alternative realities. Unfortunately, the tool was
designed for a different purpose, to capture and integrate all the different realities. Nails come in
many forms, too.

Fear, Accountability and Approval


Fear is an interesting thing. Fear is an emotion that is at the root of many other emotions. If
negative emotions comprise a spectrum, then fear is like the sunlight, which, passing through our
situational prism, produces the stress, anxiety, and mistrust that we actually feel. It takes a lot of self-
examination and hard work to be able to let the lesser emotions go and find the fear and its
source.
Of all the ways that fear manifests, perhaps the most destructive for a business is that of
controlling behavior. The need for control is based on feelings of inadequacy. Many people feel
inadequate and still manage to function well in a cooperative environment. Sometimes, though, a
person finds himself in a position that he never dreamed of being in and inadequacy, fueled by
the fear of losing it all (by proving that he really is inadequate) creates a desperate need for
control.
This person will find a way to insert himself into as many important committees as possible and
will create new committees if there seems to be a gap in the information flow. This person can't
tolerate subordinates who are successful because they become a threat. He has a dislike for group contexts and prefers to use one-on-one meetings to better control the message.
The single worst effect of controlling behavior is that the controller manipulates everything so that he holds the key decision. This produces several negative impacts, including:
•	an entire organization is slowed to the pace of one individual
•	decisions are arrived at through discussion with peers rather than with knowledgeable subordinates
•	information needed by subordinates may be concealed in order to preserve the decision authority
•	frantic scrambles to meet deadlines arrived at without benefit of process
•	no closure on "projects" because of information hiding and diminished credibility
•	frustration among subordinates—although a really talented controller will be able to keep this frustration focused among and between subordinates
•	after-meeting meetings among subordinates for the purpose of validating perceptions
•	much talk about accountability without any accountability
•	lack of standards because control diminishes in an objective environment
Do you have someone like this in your organization? How do you deal with it?
One approach might be to create standards around process and measure compliance. With good,
useful and actively used standards, accountability can be made real. Without them, the best we
can do is approval. Accountability is objective. Approval is subjective. Accountability creates no
fear. Approval is all about fear.
Your best people have a set of internal standards to which they hold themselves accountable and
they won't stay long in an approval environment once it becomes clear that they will have to
compromise their standards. The ones who do stay...

Begin at the Beginning (but with the end in mind)


Business intelligence, data warehouses, data stores, stewardship, governance... Where do we
begin?
Let's begin by assuming that we want to have something we can recognize as "business
intelligence" when we're done. The most important characteristics of business intelligence will
be:
•	meaning (semantic) understood
•	applicability (use/utility) understood
•	currency (timeliness) understood
•	source/lineage/pedigree understood
There may be other characteristics that are of specific interest to a business customer, but these
will put us squarely in the middle of the ballpark.
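To show how these four characteristics might travel with the intelligence itself, here is a minimal Python sketch. Treat the field names and the example values as my assumptions, not a prescribed metadata standard.

# Metadata carried alongside every delivered piece of intelligence.
from dataclasses import dataclass

@dataclass
class IntelligenceMetadata:
    meaning: str        # the agreed business definition (semantics)
    applicability: str  # what decisions this is fit to support
    currency: str       # as-of time and refresh cadence
    lineage: str        # sources and transformations applied

report_meta = IntelligenceMetadata(
    meaning="net new customers per region, per the corporate glossary",
    applicability="monthly regional sales review",
    currency="as of the prior business day, refreshed nightly",
    lineage="CRM extract -> de-duplication -> regional rollup",
)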
Before we go any farther, this is going to take more than one post so let's think in terms of a
series of posts addressed to specific roles that, together, create the governance road map.
If business intelligence is the goal, then we must begin with the business and we'll start right at
the top with the CEO. Of all those involved in the business, the CEO has the greatest power to
influence, for better or for worse, the attitudes and motivations that will eventually produce
business intelligence. Others in leadership roles will have similar effects to lesser degrees.
Mr., Mrs. or Ms. CEO, begin by absorbing the four characteristics above. Ask yourself what it
would take for you to feel good about the intelligence presented to you. Once you feel
comfortable about your ownership interest, you will need to become single-minded in your
pursuit of these characteristics.
Practice asking for proof, for information about the intelligence (which will be known as metadata by some of your staff), and insisting on answers that give you comfort. You may view this
as self-protection, which is a very good viewpoint to have. In fact, if you are consistent in this,
you will find that Sarbanes-Oxley is a piece of cake for your business.
This is where it will start. The next post will address sources of the information that will
eventually emerge as business intelligence.

The Beginning (2)


Because we want our intelligence at our fingertips, easily accessible, we collect and store data in
computer-based filing systems. In return for the convenience of this and not having to store
mountains of virtually useless paper (the bigger the pile, the less useful), we have taken on the
responsibility of hiring and working with various kinds of technology-savvy people.
At the foundation is the programmer, also called a developer or a software engineer. These are
the people who actually create the scripts for the computer to execute. Twenty years ago,
programmers were visible to the rest of the organization. Today they are typically segregated and
largely invisible except to the CIO.
These are the people who create the screen forms and buttons and functionality that your front
line employees actually use to capture and look up information. In larger organizations today, the
programmers do not create the filing system for data. Instead, that is designed and built by
someone else (the data architect) and the programmer merely connects to it. The power that the
programmer holds in terms of our four characteristics of good BI is the power to knowingly or
unknowingly subvert the quality of our data.
This is a good time to introduce the concept of the data resource. It is essential that today's
business view the data that is captured, modified, stored, retrieved and archived as a resource in
the same way that capital is a resource, buildings and property are a resource, or employees are a
resource. The business must devote the same kind of attention to the data resource as it does to
the financial resource of the company. Neglect or failure to do so will render the data resource
valueless at best and a liability at worst. In between those two extremes, the business will
experience increased costs as your workers struggle to get the quality they need from the data
that feeds the processes they work within.
So the programmers who build screens and functionality that allow corruption into the data
resource are slowly destroying the business just like termites in the framing of your house. It is
vital, in order to end up with the BI we need, that programmers employ processes that are
controlled for quality purposes. Employing process standards for programming will not
guarantee that our four characteristics will be delivered, but NOT doing so will guarantee that we
will never be able to produce the BI that we need.
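As a sketch of what "controlled for quality purposes" can mean at the keyboard, consider the following Python fragment: the capture path refuses any record that violates the defined quality rules, so corruption never enters the resource. The rules and field names here are my assumptions for illustration.

# Reject bad data at the point of capture rather than cleansing it later.
import re

QUALITY_RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "inception_date": lambda v: bool(v),   # required, non-empty
}

def capture(record: dict) -> dict:
    """Validate a record against the defined rules before it is stored."""
    failures = [name for name, rule in QUALITY_RULES.items()
                if name in record and not rule(record[name])]
    if failures:
        raise ValueError(f"rejected at capture: {failures}")
    return record   # only now is the record handed to persistence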
Before we wrap up this installment, it's good to recognize that there are two kinds of programmers: those who work for you and those who work for someone else from whom you
purchase or license the programs (or software or applications or systems or...). No business today
is unaffected by programming.
When you are getting control of your processes and instituting quality assurance, you will want
to ask the same questions of your software or application vendors. Today, they are often
unwilling to talk about this and will use deflectors like "proprietary" to avoid the questions. You
need to ask yourself whether you can bet your data resource on "proprietary."
We'll get into this a little deeper next time when we discuss the role of the data architect.

The Beginning (3)


Before we explore the data architect role in terms of our overall goal, I should say that I do not
intend to go into depth for any of the roles. I mentioned the need for standard processes but didn't
specify any of the necessary processes. For one thing, that level of detail is well beyond the
scope of this (or any) blog. For another, the processes and standards must be compatible with the
organizational culture. I wouldn't go so far as to say that any path will do, but I will say that, if
you live in Minneapolis, there is more than one way to get to Miami. The thing to remember is
that there are more constraints as you depart and again as you arrive. We ask, "What's the best
way out of town if I'm headed south?" And then we ask, "What's the best route to city hall if I'm
coming from the north?" But in between we can take the scenic route, the shortest route, the
fastest route, the Civil War route or any other route that seems good at the time as long as we
keep our eventual goal in mind.
From the standpoint of business intelligence and our four characteristics, we would want to pay
special attention to what the programmers are doing or not doing with respect to definitions
(semantics). The data architect will have spent considerable effort in researching and compiling
information about the data. They will have learned about how various kinds of data relate to each
other for different business functions and users and they will have defined quality rules for each
kind of data.
The process standards, to be monitored by Quality Assurance and warrantied by Quality Control,
will ensure that the programmers have those definitions and rules in a format that they can use
and that they do, in fact, use them.
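A hedged sketch of "definitions and rules in a format that they can use": the architect publishes machine-readable definitions, and QA can verify that applications actually consult them. All names and rules below are illustrative assumptions.

# Architect-published definitions that programmers (and audits) can look up.
DATA_DEFINITIONS = {
    "customer_id": {
        "definition": "the single corporate identifier for a customer",
        "type": "string",
        "rule": "must exist in the customer master before use",
    },
    "order_date": {
        "definition": "the date the order was accepted, not the date entered",
        "type": "date",
        "rule": "never later than today",
    },
}

def definition_of(field_name: str) -> str:
    """What a programmer looks up before coding against a data item."""
    return DATA_DEFINITIONS[field_name]["definition"]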
If your programmers work for someone else, the processes and standards will be about
acquisition. They will ensure that the data definitions and relationship rules embodied in the
application or system are compatible with those of your business. You are going to have to lean
hard on your vendors and they will squirm and plead "proprietary." The best advice that I can
offer is to walk away from this vendor. Another vendor will be happy you asked because it will
allow them to really get close to you and they will be proud of their quality processes. The ones
who drag their feet do so because they aren't able to produce the assurance you need. Proprietary
is a euphemism for we don't know.
Become interested in these things. Ask questions. Expect answers that you can understand. Don't
accept arm-waving and diversionary tactics. You will be well on the way to business intelligence
from a high quality, reliable data resource.
Next time: the "business" role

The Beginning (4)


In my experience, those who can do the least about data quality feel most responsible, while
those who have quality literally at their fingertips feel no responsibility whatsoever. This creates
many difficulties for one who wants business intelligence.
I'm going to collect receptionists, customer support, sales, anybody who enters data into any of
your computer information systems under the heading of "the business." We could also include
those who collect or transcribe data to/from paper if it eventually winds up in a computer-
accessible data file.
Maybe we have done too good a job in convincing the people seated at computer keyboards that
they have nothing to fear. In any event, many simply do not pay adequate attention to what they
are doing. There are many reasons for this including:
•	heavy work volumes causing a pace that is too fast for error recognition or for going back to correct errors
•	inadequate training
•	inadequate guidance built into the user interface
•	inattention/distraction
On the other hand, I have encountered way too many instances in which people were actually
aware that they were producing garbage but didn't care enough to do anything about it.
Sometimes it's sabotaging the people in the next department. Sometimes it's a statement to the
supervisor and sometimes it's "so what."
It is true that some data can be repaired after the fact, but the sad fact is that "cleansing" can
never be 100% effective and in some applications cleansing isn't even possible—if it isn't
captured correctly the first time, there's no going back. Repairing bad data is very expensive for
your business. Experts estimate that, for anyone who receives data as part of their work process, 30 to 60 percent of the work day is spent rechecking, validating, or repairing the data so that they can do their job.
The data architect will have worked with the folks to understand their data needs, so if we can
prove that they have participated in that process and that the architects and developers have
complied with their defined processes, then what we are left with is attitude or training as
problem sources. These are issues for management to resolve.
One final area for the business to think about: there is a need to include problem recognition in
training and provide safe reporting paths for those who do take note and take action. Those at the
front line typically have little awareness of the business that they front-end for. They don't know
that someone cares or that someone can actually do something to fix a problem that they struggle
with on a daily basis.
You will want to include a module on the data resource in the New Employee Orientation. You'll
want to take remediation to department meetings to catch all who missed the new employee
offering. It must be simple to report a problem and there can be no "grilling" of callers by the
support line triage staff. Take the call and send someone to see the problem in person while the
reporting employee works. Use a remote desktop capability to watch the problem happen. There
are many options, but if the caller is made somehow to feel guilty or foolish or ignorant, they
will never call again.
Next time we'll take on the vendors.

The Beginning (5)


In order to get where we want to be (business intelligence) and have the characteristics we are
looking for, we will have to make an alliance with our systems vendors.
Our vendors have grown accustomed to thinking of themselves as "partners" and now it's time to
call their bluff. We have learned that the long term success of our business depends on the
quality of the information that it possesses. Without reliable information, the most efficient and
productive process will fall immediately into non-productive and costly tachycardia. Tachycardia
is the disorder in which the heart's productive rhythm is disrupted and its consistent, productive
beat turns into an erratic, non-productive, and potentially life-threatening arrhythmia.
How can our partners help us deal with this? They must be responsive to questions about data
models, semantics and relationships. They must show evidence that their internal consistency is
at least as good as ours. You may want to audit their process consistency. Companies that have
achieved the Malcolm Baldrige Quality Award have suppliers who allowed themselves to
become this kind of partner.
You will have a robust Quality Assurance capability and you will want them to be linked with a
similar capability in the suppliers' organizations. Quality Assurance has been alluded to in previous posts, and the QA involvement in data quality should not be minimized.
Quality Processes Produce Quality Data
Processes that are uncontrolled or out of control are incapable of producing reliable data—
whether they are in your organization or in a supplier's organization.
Next time we'll address Management specifically.

Why Management Isn't Enough


If you've been reading this series from the beginning, you may have noticed that, as of now,
there has been no mention of that traditional management responsibility, resource allocation. To
review, we started by listing four critical characteristics of business intelligence
•	meaning (semantic) understood
•	applicability (use/utility) understood
•	currency (timeliness) understood
•	source/lineage/pedigree understood
Clearly, none of these is about allocation of resources. Just as clearly, resources will have to be
allocated to create and maintain those characteristics. The point is that management skills do not
come into play until the organization has decided that business intelligence is a capability it must
have.
Prior to this, we have also discussed three distinct functions that are typically thrown together in
the basket labelled "management." They are
 Leadership: moving an organization in the best direction by motivating change
 Management: making the organization effective by managing costs and productivity
 Governance: keeping the organization's productivity high by ensuring consistency and
stability
Creating, developing and maintaining the four essential characteristics of business intelligence is
going to demand all three of these functions. Leadership will be required to define and effect
necessary changes. People accustomed to doing their job as they see fit may need help in adjusting from that paradigm to one of "best for the organization." In many cases, managers have interpreted "organization" to mean "the group for which I am responsible." They will need some
leadership to make the adjustment to thinking of the good of the organization as a whole. The
scope of organization will vary depending on the process under discussion.
By now, you may be entertaining doubts. The doubts may center on the organization's ability to
carry this off. They may concern whether the right people, experience, skill sets are available.
They may also be about whether this business intelligence thing is worth the effort. I can't
answer any of those doubts for you, but I can say, without equivocation, that getting to business
intelligence is going to demand change and that the change will demand strong and consistent
leadership.
The people within an organization are fully capable of governing themselves. Indeed, the United
States of America is founded on that premise. We are "a government of laws and not of men"
said John Adams. In the corporate world, laws are known as standards. Employees representing
various business functions and sub-functions will be the best ones to define the standards that
they will follow as long as leaders keep before them the corporate interest. There is a long
history of failed attempts to impose standards on people, including those by King George III that resulted in the American Revolution. There is no need to repeat those failures.
Once the standards have been defined and agreed to by those affected, and once they have been
reviewed for costs and efficiency by the managers, then what remains is to establish the
equivalent of legislative subcommittees to monitor their consistent use and consult with the
parties when the standards must be improved or abandoned. The cost lies in the change that
produces the standards, not in the governance of the standards. In fact, many organizations
already have a function dedicated to the monitoring of standards application. Quality Assurance
has frequently been given a bad name because it is charged with creation of the standards as well
as their monitoring.
Do you want business intelligence? Do you need current, unambiguous, actionable information
with provenance, or can you be satisfied with really cool charts and graphs in three (or more)
dimensions and lots of colors? You will have to decide.
Programmers Need Leadership
Many programmers are also musicians. Many have fluency in multiple spoken languages. Many
have a gift for mathematics. It seems that these abilities are somehow related in the human brain.
This has been apparent for decades to those who guide students into appropriate careers.
A bit of reflection reveals that these are all somewhat solitary pursuits involving individual
dedication and a large degree of creativity. Of all of these career paths, software development
may be the most accessible and the most remunerative. Pair this with the tendency of many socially gifted personalities to throw up their hands when confronted with technology or with mathematics beyond what can be done on an adding machine, and with the corresponding inability or unwillingness to grasp qualitative differences in abstractions such as software, and you get a recipe for significant problems.
People who are intimidated by the technology of a personal computer or laptop are glad to grant
credibility to the first person who can make the technology perform the desired tricks. The non-
IT parts of the business have come to terms with this and simply compartmentalize I.T. so that as
few as possible must have anything to do with those people.
Now add an additional dimension: the belief (common these days) that a good manager can
manage anything. This may be true, but having worked with and come to know many developers (programmers), I can say that the credible manager is an absolute rarity in I.T. Just ask any developer (or
network admin, server admin, DBA...). You'll find that a manager who enjoys the credibility and
respect of the "troops" is the very definition of rara avis.
The bottom line is that software developers and the associated technology disciplines comprise
people who have had to figure things out for themselves and who know that their bosses don't
understand what they do. The combination of these two (an ability and an awareness) produces
people who have an approach to life that is similar to that of a cowboy or possibly a farmer. They
are independent and like it that way. They won't turn from a challenge, even if that challenge is
doing something that they believe is appropriate despite known management objections and even
obstacles. They simply know that when it works, everything will be forgiven.
And it is.
I am disheartened when I follow discussion boards on the Internet. A provocative question is
posed concerning methods and the discussion immediately goes to tools. Terms that have been
around for decades are redefined without any acknowledgement of the accepted definition.
Adding insult to injury, this is done even by data architects who, more than anyone, should know
better.
What I am seeing in these discussion boards is a playground full of gifted five-year-olds with
absolutely no supervision. They are capable of amazing feats, but at what cost? If you are the
CEO or the CIO of a company, do you have ANY idea what it costs—not in salary, but in
uncontrolled complexity and corresponding maintenance costs—to allow this?
You can't blame the five-year-olds. They aren't the problem. They are doing exactly what they
were put there to do. If you can't find managers who are capable of establishing some level of
respect and control, then you must at least find leaders among the children and give them a
mentor.
Even mathematicians respond to this. Remember the Manhattan Project? This project produced
the atomic bomb that ended WWII. Dr. J. Robert Oppenheimer was the mathematician/physicist
leader who guided the work and Gen. Leslie Groves of the Army Corps of Engineers was the
administrative mentor. One mathematician, no matter how capable and creative, could not have
developed the weapon in the time available. But 5,000 without leadership could not have done it either.
Data is available through the Software Engineering Institute at Carnegie-Mellon University on
the cost benefit of a managed development team using defined processes. Once the processes
have been defined (which takes leadership), the manager has only to believe in and rely on the
processes in order to achieve predictable, low-cost results of known quality.
Whatever path you choose, it is you, the executive leader, who must see that the leadership is provided.
Advice Re: Healthcare IT
If you are a healthcare CIO or the top IT director/manager and happen to be reading this, or if
someone has forwarded this post, and if you feel under some pressure to get more accomplished
(or even to get something accomplished), I have some gentle advice. You're going to get phone
calls from angry physicians no matter what you do, so you may as well do the right thing.
It is a fact that, in the history of computing (I.T.), no application or system has ever been
delivered to universal acclaim (despite what the marketing people tell you). Medicine has more
than its share of curmudgeons and you're going to have to let them become part of the
background noise. If you're afraid of losing your job because of the indignation of a few people
then you're probably on the wrong career path.
I hope you knew that already, but it can be reassuring to hear it from an "outsider".
Now, if you're still reading, you may be wondering about what the right thing is that you should
do. The first thing is to get your priorities squared away. Then you will have to work very hard to
get your priorities into the cognitive space of the rest of your organization's upper management.
What are your priorities? What are your 1-, 3-, 5-year or long term objectives? Are they written
down and visible to others? If your answer is no, then your job just became much more difficult.
By the way, if you are a CEO, and the only I.T. objectives you know about are the ones you
handed out, you may want to re-think your approach to I.T.
What does an objective/priority look like for I.T.?
The big rocks (the ones you put into the container first) are capabilities and alignment. In my
career, the most pervasive problem I have seen is that healthcare CIOs invariably focus their
attentions outward from their I.T. organization. In nearly 15 years, I have yet to see a single
initiative directed at developing or improving capabilities come from the top of the I.T.
organization. When such things have happened, they seem always to arise from the introduction
of a new application or a new technology. Just once, maybe we could spend some time to
actually imagine the kind of I.T. organization we think will be needed five or more years from
now. Capabilities, in the healthcare I.T. world, come about by accident. The current best case is
when there are a few people within the organization doing this kind of thinking on their own.
Of course, when there are uncoordinated pockets of effort, you don't get the other big rock,
alignment. By the way, if one of your objectives is something like "reduce or contain costs", you
may want to rethink. Cost management is like breathing if you're a manager. To call it a priority
or a goal is to divert attention away from the big rocks. Similarly, any objective that includes the
word "continue" is a waste of paper. Holding the gains must also be at the level of breathing for
managers. You will want to keep measuring to make sure there is no backsliding and take action
if there is, but holding onto gains has to be part of the original planning process and should be
automatic.
Alignment begins with a picture of your organization now and five years from now. Every
manager and supervisor/lead within I.T. must share that picture and understand their own
specific role(s) in it. This is something best accomplished as a team. You can't mandate
alignment. You will want to decide how you will recognize and/or measure the alignment within
I.T. How will you know when you have it? Everyone should understand and be on board. This is
the first step.
It is up to the CIO to bridge between the business strategy and the I.T. strategy to make certain
that as alignment develops within I.T. it is congruent with (aligned with) the business' strategic
plan.
You are fortunate in that your direct reports want all of this as much or more than you do. All
they need in the way of motivation is the belief that you are committed to seeing it through. If
you don't feel confident that you can produce the I.T. vision, don't attempt it on your own. This is
the place for leadership that is confident and sure-footed. Start with a clear and concise vision—
it has to be your vision or you won't be able to commit to it, but you can get help to create it.
Healthcare I.T. cannot get where the nation wants it to go as long as we are dissociative in our
thinking. Things aren't the way we want them to be simply because we say so in a presentation to
the CEO or the Board. You know that buying and installing a new system is only the first step of
implementation. What we're talking about here is very similar. Forming a committee or naming a
Director is only the first small step. If you want to know if you have actually achieved your goals
of capability and alignment, I would recommend strongly that you bring in an outside person or
group to conduct an audit.
Good luck! You are the leading edge.
Healthcare: More on I.T. Capabilities
I realize the need to be a bit more specific with respect to capabilities. I have seen a virtually
unbroken string of capabilities discussed that relate to the practice of medicine or the business
of healthcare delivery, but in almost fifteen years of experience in healthcare I.T., I have yet to
see a capability goal that is about I.T.
An example may help. Suppose you are the city planner. You could take one of two approaches.
1. You could develop a comprehensive plan for the city that takes into account various
expansion directions and encourages some while discouraging others. The plan would
include both residential and business expansion.
2. You could attempt to satisfy each request that comes before you. You could allow
various developers and politicians to set the agenda for the next six months and
continuously play one against the other for priority.
In the second approach, a deft manipulator can get by for quite a while. The city, on the other
hand, will always be in turmoil and unable to staff its various departments effectively. Budget
dollars will go to the current top project(s) at the time of the budget approval process. But,
because project priority will be ever changing with the political situation, those projects may
start and then go into hibernation. Workers with specific skill sets will find themselves with little
to do while the new high priority project has tough going because there is no money to bring in
the right skills. Equipment winds up sitting in a warehouse or worked far beyond recommended
maintenance cycles.
In the first approach, remarkably modest effort goes into an expansion model. Productive and
non-productive alternatives emerge. The productive ones are fleshed out and the unproductive
are set aside for contingencies. Potential streets, sewers (storm and sanitary), gas, electric, phone,
Internet, and other necessary utilities are blocked in. Alternatives and contingencies are
identified and capacities are established. With all of this in place, it is now relatively simple to
budget for the next year's projects and all will be well as long as we don't cave in to political expediency and begin to allow projects outside the scope of the plan.
There is no problem as long as all negotiation goes on in broad daylight. If a particular out-of-
scope project has popular appeal, it will be a simple matter to say, "OK, here's what it will mean
for our other projects, for our long-term plan and for the budget." If it's still popular, we do it.
Back to I.T. capabilities: server capacity, mass storage capacity, network bandwidth and access
points, space, cooling—these are all obvious capacity planning considerations and for that reason
they are at least on the radar for everyone. More subtle but no less important considerations
include the architectures (technology, network, data, and communications) that we want to have,
skills required as we work within the architectures and, not least, the standards architecture that
will be the governance glue that holds everything else together.
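As a back-of-the-envelope illustration (mine, with made-up numbers), the simplest capacity question is arithmetic:

    def months_until_full(used_tb, capacity_tb, monthly_growth):
        """Months until storage is exhausted; monthly_growth is fractional (0.04 = 4%)."""
        months = 0
        while used_tb < capacity_tb:
            used_tb *= 1 + monthly_growth
            months += 1
            if months > 600:  # guard against zero or negative growth
                return None
        return months

    print(months_until_full(60, 100, 0.04))  # about 14 months at 4% monthly growth

The hard part is not the arithmetic but agreeing on the growth assumptions, which is exactly where the architectures and standards come in.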
Depending on your own experience as a technology worker (rather than as a manager), you may
not have a "feel" for the people, skills, and experience that will be required, nor even for the
implications of various architectures. Again, this is no place to guess. Get your direct reports
together. If they don't seem sure or can't tell a story that is meaningful to you, then get some
outside advice. The skills and attitudes required for a successful SOA (service-oriented
architecture) are much different from those needed in a more "traditional" approach.
If you intend to rely on vendors for everything, you will still need to be able to tell which
vendors fit into your architecture and which don't. They will tell you that they are "architecture-
agnostic" or that they can fit into any architecture—don't believe it. Know about the
characteristics of things that will fit and things that won't. Attempting to force-fit something that
doesn't belong is the quickest way to throw everything you've built into turmoil.
If you're uncomfortable, I apologize for being so open. The answer to getting comfortable isn't
avoidance, though. It lies in making a concerted effort to bootstrap yourself. No budget, no
resource plan, no allocation discussion—management of one thing is not the same as
management of anything. Technology is too big, too complex, too fast-moving to be brought into
line by the classic "cost management and allocation of resources" approach. It needs a team
approach because there's just too much for one person to know.
Do what you're able to do and know what your boundaries are. Be honest with yourself first of
all and with your peers, reports and management. You may wonder if you are the only one at
times, but it is not possible to manage technology without honesty. Unlike people, technology
can't be coerced or manipulated.
Standards Dread
Healthcare in general and from top to bottom seems to have an absolute dread of standards.
Physicians (many if not most) will flop about like a fish on a hook whenever the word comes up.
They fight no matter how the subject is introduced. Whether it be "guidelines", "pathways" or
even "best practice", it doesn't matter. As an interested consumer, I find this disturbing on several
accounts.
1. It demonstrates such a me-first, tunnel vision mentality that, if I were given a clear
choice, I would run as fast as I could away from this and toward anyone who could
demonstrate that they do, in fact, adhere to standards.
2. It has a huge—and negative—effect on everyone else in the healthcare organization. The
physician role is so central in healthcare that, if there is no reliable standard (process) for
the physician, then nursing, registration, coding, billing, systems developers..., no one can
predict from one day to the next or from one patient to the next what they're going to be
asked to do.
3. It holds the practice of medicine back to a level not much different than was seen in the
18th century. Oh, sure, we have better drugs now and diagnostic magic is performed
many times a day using the latest technology, but the outcome for me, the patient, is so
dependent on the physician I see that "bedside manner" often seems to be the most
critical factor in outcomes.
In my 60+ years of life on this planet, I have seen that humanity can be put into two groups—
those who appreciate standards and those who do not. Further, it has always seemed that the
second group is the cause of problems not only for themselves, but for everyone. Let's take a
kind of standard that we're all familiar with—traffic laws. Those who flout the traffic laws are a
hazard to everyone else on or near the road. And note that consistency and predictability are key.
Traffic laws work because they produce consistency and the ability to predict with some
assurance what the other guy is going to do. Those two principles keep everything flowing
smoothly and with minimal (and manageable) disturbance.
We do not fully realize the value of the standards we employ in this country. John Adams is the
one who noted that we are governed by laws not men. Bribery is a recognized way to get things
done in many parts of the world. Imagine having to find the right person (how do you do that?),
paying to get their attention, then finding out that they weren't the right person after all and
having to start again. We sometimes go through that with building and remodeling contractors
and it makes the evening news. Healthcare isn't much different EXCEPT that we don't get as
heated about it because someone else is paying.
I constantly wonder at the inability of the insurance companies to get provider organizations to
create and use standards.
Today, we are being told that technology is the key to the healthcare crisis (which is a crisis of
out-of-control costs). I am going on record here that technology will only drive costs up unless
the healthcare "system", beginning with physicians, learns to cherish standards, utilize standards,
trade on standards.
We, the patients, must demand a system in which we can rely on standards to produce outcome
and efficacy data allowing comparison of physicians and organizations. It's a sad system in
which the only statistics available for comparison are mortality numbers.
Standards Clarification
A bit of a postscript to the last post:
I can almost hear the snorts of disgust. Many in healthcare will be quick to dismiss the last post
by telling themselves that "we have standards." I can't allow them to let themselves off so easily.
Of course healthcare employs standards. I was, for a brief time, part of a newly formed HIMSS
task force on standards. Healthcare has a wealth of standards, none of which are truly standards
in that all use words such as should and unless and if possible.
Healthcare has not seen fit to develop a framework for standards, nor an ontology by which to bring sense and meaning (and thereby value) to the hundreds of standards vying for attention. In
truth, anyone in healthcare can say without fear of contradiction that "we have standards" and
none of those assertions mean anything.
If n people or organizations are doing the same work using n (or even n-1) standards, it cannot be
said that the work is being done in accordance with standards. This is said routinely by each of
the workers but to those who view the work from an objectifying distance, it is quite clear that n
standards is no better than no standard.
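A toy calculation (mine, not the author's) makes the point concrete: with one shared standard, each of n parties maintains a single mapping to it; with n private "standards," every pair that must exchange work needs its own mapping.

    def mappings_needed(n):
        """Mappings required with one shared standard vs. one private standard per party."""
        shared = n                   # each party maps once to the common standard
        pairwise = n * (n - 1) // 2  # every exchanging pair needs its own mapping
        return shared, pairwise

    for n in (3, 10, 50):
        print(n, mappings_needed(n))  # 3: (3, 3); 10: (10, 45); 50: (50, 1225)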
Measurement of process in healthcare has a long way to go before SPC principles can be applied.
How, by what process, can healthcare practitioners be brought to believe in the power of process
standards through which measurement standards can be developed? Whose interests are satisfied
by the status quo?
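For readers who have not met SPC, here is a minimal sketch of what consistent measurement would enable (assumptions mine: an individuals chart with 3-sigma limits estimated from the mean moving range, and invented data):

    def xmr_limits(samples):
        """Individuals (XmR) chart limits; sigma estimated from the mean moving range."""
        mean = sum(samples) / len(samples)
        moving_ranges = [abs(b - a) for a, b in zip(samples, samples[1:])]
        sigma = (sum(moving_ranges) / len(moving_ranges)) / 1.128  # d2 for subgroups of 2
        return mean, mean - 3 * sigma, mean + 3 * sigma

    def out_of_control(samples):
        """Points outside the control limits deserve investigation, not blame."""
        mean, low, high = xmr_limits(samples)
        return [(i, x) for i, x in enumerate(samples) if not low <= x <= high]

    # Invented data: minutes from patient registration to triage.
    print(out_of_control([12, 14, 11, 13, 12, 15, 13, 41, 12, 14]))  # flags (7, 41)

Until the process behind such numbers is standardized, the chart is meaningless; that is the maturity gap described above.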
Arm Waving and Obfuscation
"Pay no attention to the curtain!" "Do not look behind the curtain!"
There are two areas in which it seems necessary to ignore the Wizard. One is in the nature of
"healthcare" itself. Most spell checkers refuse to recognize healthcare as a valid word, suggesting
"health care" instead.
Clearly, spell checkers have not been sensitized to the politics involved. Since the dawn of the
HMO nearly 30 years ago, health care has gradually given way to healthcare. What does that
mean for you and me? I don't know if you've noticed, but even the physicians have been shifting
attention away from their own role toward the insurance end.
More than 15 years ago, I attended a software engineering conference in Chicago at which
[former] Surgeon General C. Everett Koop was a keynote speaker. I became disturbed during his
speech when he made comments seeming to grant insurance companies a gate-keeping role in
health care. When I got back home, I wrote him a letter mentioning my impressions and asking
whether that was his intent. Some weeks later I received an envelope with Dr. Koop's return
address and which contained my letter. My "gatekeeper" question was circled (red pencil) and
"NO!" was written (large) in the margin.
Today, physician leaders don't even blush when they tell us that the solution to our healthcare
problem is better access to [insurance] coverage.
Health care can be linked to medicine. Healthcare is a business, pure and simple. When people
are talking about money and where it will come from, they aren't talking about health care.
I do not question the motives of individual physicians in health care, but I question everyone's
motives in healthcare unless they are talking about profits. Even the insurance companies are
being edged out for control. Today, healthcare is controlled by third-party (neither doctor nor
patient) vendors of everything from pharmaceuticals, to technology—especially technology. This
isn't necessarily a bad situation—it all depends on what the goals are. Pick your favorite goal
from this list (or suggest another):
 profit
 return on investment
 market dominance
 a healthy populace
 access to medical care
 control
 benevolence/compassion
 giving back
 ???
If you are a person who has a need for (or interest in) health care, you really do need to peek
behind the curtain of healthcare to avoid wasting your time, money and energies.
The Health Care (or Healthcare) Vision
I have a vision that can only be fulfilled from the office of the CIO. This vision is characterized
by
 The patient of one is the patient of all
 The history will be available wherever and whenever the patient presents
 The patient may know immediately about the services delivered and the charges for those
services.
 The revenue cycle can be reduced to same-day for covered (under contract) services
 Costs can be reduced by as much as 50% through attention to the quality of the
information captured and used within the system
 People will accept a system of standards when they see that their own interests are
satisfied
 The people in health care are highly motivated and need only a shared vision to be fully
productive
 Alignment is possible when the vision is clear
 Good, reliable, current information is the key to all of this and getting that will demand
the careful cultivation of cultural and attitudinal changes.
I firmly believe that this vision can come to pass with blinding speed in an environment of
openness and honesty in which people are encouraged to care about and for each other.
The Aging Workforce and Your BI
First, in the interest of full disclosure, let me say that I have passed my 60th birthday, though I am not quite old enough to apply for social security benefits.
I saw a piece on the Today Show this morning in which one of those being interviewed made a
statement to the effect that ageism, or age-based discrimination, is to be found at both ends of the spectrum. That rang true for me. I guess the only question to be answered would be the placing
of some scale on the spectrum so the appropriate remediation planning could be done.
We have seen many articles and interviews regarding the characteristics of various generations
from Gen Y to Baby Boomers (pretty much the endpoints of the employment spectrum). I always
have to remind myself that these "portraits" are generalizations only. It's easy to latch on to a
single trait within a generational portrait and, having seen that trait exhibited by at least one
member of that generation, apply it to all within the age group. Then it's only another small step
to say, well, if that one is true, they're all probably true.
It's also good to remember that tendencies are statistics. An average (or a tendency) implies that some fall above it and some below. It isn't necessary for any individual to be average for there to be an average for a population: a group made up of a 30-year-old and a 50-year-old has an average age of 40, yet neither member is 40.
This is all on my mind because I have been following some discussion forums recently and have
begun to form some opinions about abilities of generational groups based on the content of the
posts. I'm struggling against this because, when I stop to reflect, I know I'm making a statistical
mistake.
Here's the thing: If it's happening to me, it's happening to anyone (everyone). It's very dangerous
to attribute knowledge or maturity-related issues to age. Every young person is not immature or
ignorant and every older person is not mature or wise. So this is a reminder that each person
must be appreciated for who they are and the unique contributions they are able to make.
There's a corollary: You can't really know a person until you've met that person face to face and
have shared some of his/her life. Do you know me because you're reading this or because we've
exchanged emails or tweets? There are hard limits to what can be accomplished remotely,
without personal contact. Some younger people know this and many older people do.
If you happen to run a company, you have a bigger labor pool than you need right now. This
won't be the case for long, especially if you're in a scientific, engineering or high-tech market.
People pushed to the side now and left to rust away may not be able to step back in after several
years (or even several months) have passed. When you push people out of one end of the pipe
without making sure there are people entering at the other end, eventually you will find that your
pipe is empty. How will you adapt your business then? The knowledge, creativity and energy
that created your business will be needed to transform it. It won't be about age. It will be about
maturity, wisdom, creativity and energy.
Best Practice and Best
It occurs to me that many people may be confusing best practice with best. I thought I might
devote a few lines here to separating the two concepts in the hope that better (more productive)
use of effort might result.
Best Practice is the survey of how others are doing whatever it is that we would like to do,
picking out those that seem most successful, and documenting how they are doing it.
Best is an objective determination, made by comparing several alternatives within a context against some standard of comparison.
Humans being what they are, we always have to carefully define the context we are working in.
Humans being who they are, the definition is never quite good enough to resolve all doubts. I am
cursed with the gift (cursed with a gift—who's with me?) of immediately seeing similarities and
differences. Show me a rule and I'll find the exceptions. Show me several random situations and
I'll show you the similarities. It's a gift in that I find it very useful—a curse in that it drives other
people crazy.
I'll get right to the bottom line. A best practice effort is only going to produce results for you if
your investigation examines contexts as well as process, tools, governance... To put it another
way, do you have the same history, culture, experiences, skills and attitudes as the organization
you're comparing yourself with? If not, that doesn't mean there's no value for you in their
practice. It does mean that you're going to have to know more about them and about yourself
before you can put your own version of that practice into effect.
As far as best goes, it's time that we gave up on the idea of supremacy. Best in a particular
competition among equally matched competitors is still only good for the moment. That's why
we have continuous improvement. It's why there are frequent surprises at playoff time.
The best you can find is still just the best that you can appreciate. When we know better, we'll do
better.
People First—Technology Somewhere in There
Very briefly, if you don't have the human side in order, meaning attitudes, capabilities, and interest,
no technology can be successful. Of course, it is possible to define "successful" so that your
implementation passes the standard, but if that's how you work, you're probably not reading this
anyway.
It certainly does help to have an idea of what kinds of technology benefits you are looking for
and the types of technology that might help to deliver those benefits BEFORE you begin to
design the change management program. Countless technology implementations costing untold
millions (or billions) of hard-won capital dollars have been scrapped or have taken so long that
the purchased technology was obsolete before the implementation was completed. The reason
(and the solution) is in these two paragraphs.
Keeping Your Own Counsel
"The man who says what he thinks is finished, and the man who thinks what he says is an idiot."
[Rolf Hochhuth]
What do you think of this sentiment? Would it help to know more about Herr Hochhuth?
I believe that a significant portion of humanity operates as though these two statements were
true. I'm sure there are many reasons for this, but the root of all the reasons is fear. Fear is such a
powerful emotion in us that many can't even acknowledge it—even to themselves. The result is
that it transforms and appears (I like the word manifests) as another emotion altogether.
Earlier I posted regarding controlling personalities and said that insecurity and low self-worth,
two manifestations of fear, were at the root of the need for control. When I saw this quote on
MSN today, I immediately went back to the control issue. These two statements nullify
everything we teach about collaboration and teamwork. If they ring true to you, may I suggest
that you ask yourself why.
This isn't a casual investigation. You're not going to be truthful with yourself at first. You'll have
to do the kaizen "five whys" and do the five whys repeatedly because, if you hope to help yourself and are attempting to be honest with yourself, you will get different answers each time you get to the fifth why. One day you will name a specific thing that is the cause of your fear
and you will feel good because knowing is better than not knowing. But this is just the first
plateau.
Now you have to ask why this particular thing is producing fear in you.
Each time you reach a new plateau, you will look around and see your world differently.
Do you recognize in yourself the need for control? Do you share your vision, plans, objectives
freely and listen to the responses? Do you say what you believe and believe what you say?
How can your company, organization, team function productively if you answered no to any of
these questions?
Everything From The Center
Does your life have a center? What is it? You don't have to visit with someone for very long
before you can begin to see the answers to these two questions. The challenge then is to avoid
associating the person with your perception.
A key principle of relationship and communication is that a person can—and will—change. The
change may be profound as in moving the center of their life. In relationship, it is best to assume
the best possible motivation for the other person and, at the same time, give them room to make
mistakes. It is just possible that, even if we share a common center, are motivated by the same
high purpose, and are both "good people", we might not take exactly the same path to the
goal.
In the best of all worlds, we would be attentive to each other and aware when we seem to be
diverging. When we see this happening, we would talk about the divergence and the reasons for
it. At this point, there have been no mistakes, no recovery is needed. All we need to do is realign
ourselves with our shared goal. The path chosen might be mine or yours or some new one
created out of the improved understanding derived from our conversation.
These principles are as applicable to groups as they are to individuals. Many a "leader" has
brought disaster to himself and his followers by becoming focused on the problem in front of his
nose and losing contact with the overall objective. This happens when the leader's center either
moves inside of himself (glory, revenge, hate...) or was there to begin with (advancement,
wealth, recognition...).
Don't lead your followers into a trap. Keep (or move) your center outside of yourself.
The Problem With Telepathy
The problem with telepathy is that we rely on it but it isn't real, or it's sort of real—well, you
know what I mean.
I have noticed that over the past decade or two we (humans in the United States) have virtually
abandoned communication as an active effort. The expectation today seems to be that whatever I
received must be what was transmitted. We commonly leave an interaction in one of two states:
1. we have more questions than we had prior to the interaction and we take the questions to
others (who typically were not present and never even had the benefit of the
transmission)
2. because we didn't understand what we received, we label the transmitter as a poor
communicator (or an idiot) and assume that we don't really need to know what they were
trying to say
It's the rare individual who actually takes an active role during the interaction to ask for
clarification or context.
When someone says "like, you know" and we smile and nod, we are either relying on telepathy
or intuition or body language or prior knowledge or we simply don't care and we just want to get
away.
Obviously this isn't really a recent development—not even if you consider two decades
"recent"—but it does seem to me that the problem is worsening. I sit in meetings and watch
people. They get wrinkled brows briefly and then they disengage. I know that they should be
engaged as stakeholders, but they aren't. What's the problem?
One thing that we could all work on is finding ways to ask questions that—but wait, that would
be active communication.
One of Stephen R. Covey's 7 Habits is "Seek first to understand, then to be understood." [italics
are mine] The other six are personal and could be honed by a hermit. This one actually assumes
relationship. Relationship is a real thing, unlike telepathy. I'm going to be a lightning rod here
but anyone whose idea of "working on a relationship" is based on the central idea, "You don't
understand me," is going to be disappointed repeatedly. This applies to every kind of human-to-human relationship and is the essence of Covey's effectiveness habit.
Before I lose myself in this, I'd better stop. My advice to anyone is
 learn what active communication is (if you aren't participating, you aren't
communicating)
 listen actively (listening is probably 90% of communication)
 ask questions when they arise and ask them of the right person
 be present in the communication
We're in this together and we are going to sink or swim together. Let's start acting like we know
this.
Governance in Context
Every time I talk with a group of people I come away more convinced that the lack of
governance is the underlying reason for the "failure" of so many technology initiatives.
I have addressed two DAMA chapters this week and those in attendance included a CIO,
managers, data and business architects, data modelers, warehouse analysts... I would guess the
average age of the groups to be about 40 and they came to their current job roles for the most
part via a technology path. They shared stories of corporate in-fighting, closed-mindedness and
self-centeredness that have produced some incredibly poor decisions (or non-decisions). The
general mood seemed to be resignation if not acceptance. They asked repeatedly what could have
been done differently.
Data governance is on many minds today because of some horror stories involving costly
mistakes that were avoidable (see Confessions of a Data Governance Sponsor). Anyone can find success with the right expert partner, and if you are really committed to governance, that's the partner you'll seek out. The devil is, as always, in the details.
First of all, how is commitment generated? How does the commitment become focused on [data]
governance? Finally, how can we envision and create something that is independent of individual
champions (not built on a cult of personality)?
Imagine that we live in a nation without governance—the strongest or the most charismatic
become "warlords", accumulating bands of adherents who follow orders and share in the spoils.
Now imagine that we are somehow able, through commitment to a vision, sacrifice and patience,
to create a system of governance in one city. Present day Afghanistan springs to mind as a real-
life example. What will happen to the governance in the city if it can't be extended into the rest
of the country?
Now imagine an example of a country with good governance in which a single city has resisted
or expelled governance. Hollywood has produced many examples of this story.
Which scenario has the best chance of producing uniformly good governance?
One of the companies represented at one of my presentations is a very modern one in which all
employees are "team members" and much effort has clearly been spent to create a uniform
image. The team members are proud of the identity that they share. At the same time, this company refers to its business units as "pyramids." What message does this convey? I can think
of few metaphors that indicate monolithic autonomy better than pyramid (unless perhaps "silo").
If I work in a terrain of pyramids and want to institute governance, I really have only one
choice—to create governance within a single pyramid. This is analogous to creating governance
in a single city of a lawless land.
Only the person responsible for all the pyramids could turn a commitment to governance into a
common system of governance in all the pyramids. Of course, if we had an organization that had
a presence in all of the pyramids, we could delegate the task of creating governance to them.
They might even have a chance to succeed if everyone understood that their efforts had the
blessing of the supreme leader.
I believe that data governance is somewhat analogous to streets governance or sewers
governance. It is absolutely necessary for the community but doomed to fail unless the vision
and commitment become widespread. Neighborhood Watch can go a long way toward
eliminating unpleasant surprises within a community and one successful neighborhood watch
will stimulate surrounding communities to emulate this "best practice." There is a real limit,
though, on what a neighborhood watch can accomplish, and many independent, uncoordinated efforts will leave gaps through which unpleasantness will find its
way.
A company that is unable or unwilling to do process management does not have a sufficient level
of governance to support a data governance initiative. If standard processes are anathema, forget
about data governance. There must be a level of maturity to set the stage for successful
governance or there must be a universal system that indoctrinates new community members with
governance principles and assigns and explains their role(s). Military organizations understand
this. Our school systems understand this. Every corporation has new employee orientation
programs, most of which contain no reference to standard process or to the employee's responsibility to adhere to standards.
"We are a government of laws and not of men." except when we step into the corporate bubble.
At that point, it is understood that we are to work for the approval of the boss.
Re-Branding
Mass marketing seems to be an American (United States) invention and may be the single most
impactful innovation of the last century. Please note that I make no value judgment. We each
have to make up our own mind whether the impact was positive or negative.
Certainly, it has served to increase wealth so if your standard is ROI then you would have to
view mass marketing as a positive development.
The downside effects are much more difficult to measure—plus virtually nobody wants to talk
about the downside. Just as clearly, people have been convinced that they "need" something that
they didn't even know existed. To that extent, a lot of raw materials were consumed and a lot of
byproducts were produced because of the success of marketing.
Perhaps the biggest downside from my own perspective is the continual re-branding of
technology practices. The effect of this is that everyone is on their heels all the time. We are
bombarded with new acronyms and substantial effort must be expended to learn about them.
Unfortunately, the common result of this effort is the realization that this "new" thing is really a
20 year-old concept with a new name.
Those not equipped to realize this invest even more time and energy in trying to make this new
thing be their magic carpet without ever discovering what it was that kept the rug from flying the
first time around. Technology folks are easy marks since they often are completely unconcerned
with history—newer is obviously better after all.
Lots of money is being made but society is paying the price. Healthcare is the perfect example.
Technology churn is costing billions at a time when everyone recognizes that costs are out of
control. No worries though, we'll just focus all the lights on insurance premiums, thereby
diverting attention away from the decision makers.
BI and Re-Branding
And by the way, there seems to have been a re-branding of "BI" for the mass market.
As recently as a year ago, BI (Business Intelligence) meant something special. Different kinds of
information displayed for very specific purposes. Now it seems to mean "reporting" (although
"BI" is a lot more edgy than "reporting" so probably worth more money.)
If you are buying "BI" and paying BI prices but getting basic reporting then you are a victim of
mass marketing and re-branding (see previous post).
The Status of [Data] Governance
In the past 10 days I have addressed DAMA chapters in Iowa, Minnesota and Wisconsin on the
topic of governance. As part of the presentation, I attempted to learn the status of governance
initiatives within the participating organizations. These organizations ranged from the small (<
1000 employees) to the very large.
There was no enthusiasm surrounding the state of any governance activity and only a few who
were even willing to say that they had any governance in operation.
As a result of this admittedly informal survey, I am willing to state that recognizable [data]
governance is virtually non-existent. But wait, you say, I have read press releases about
enterprise data governance being rolled out at some really big-name corporations.
I have done some personal research at one such big-name company by interviewing in person or
on the phone several individuals within the corporation who are directly involved at several
different levels. This more formal survey revealed a considerable degree of anxiety among those
directly responsible for some piece of the effort. At the same time there was a sense of interested
detachment from those involved in the "governance" of Enterprise Data Governance. Meanwhile
those who should have been heavily involved by virtue of their job responsibilities but weren't
formally part of the structure had a pretty fatalistic attitude about the whole thing.
The overall impression I took away was in the nature of the Emperor's New Clothes. The
comment I heard most frequently (from every one of those interviewed) was, "We're making
progress." These people all have a history with this company that goes back to 1992 and earlier
so I imagine that when they stop to consider the difference between then and now, progress of
many kinds is apparent.
I'm all for making progress, but I have to wonder if we aren't too easily satisfied. If those who set
out on the Oregon trail had been satisfied with progress at this rate, their great-grandchildren
would have been overtaken in their Conestoga wagons by the construction of Interstate 80. It
would seem that one of the worst things that can happen to an organization (company,
corporation, institution) is to create a bubble within which to operate.
Many organizations today seem to have done this and those with the most identifiable corporate
culture and the strongest brand have done the most to create their own distinct and separate
reality in which "making progress" is not only good enough, it is the pinnacle of achievement.
In the USA, we have a model of governance that was the first of its kind and, because of the
model's structure, is viewed as something that can be duplicated elsewhere. We are accustomed
to thinking of this model as democracy but that is a mistake. John Adams (successor to George
Washington) got it right when he analyzed it this way, "We are a government of laws and not of
men."
The corporate equivalent of laws is standards. Governance based on men (and women) runs on
approval while governance based on standards (laws) runs on compliance. Clearly compliance-based governance, where compliance can be verified by audit, is vastly preferable to approval-based governance in which approvals are both slow and subject to reversal for any of a myriad of reasons.
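As a small, hypothetical illustration (the naming rule and code are mine, not any particular company's standard), compliance with a written standard can be verified mechanically, with no approvals in the loop:

    import re

    NAMING_STANDARD = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")  # a snake_case rule

    def audit_columns(columns):
        """Return the column names that violate the written naming standard."""
        return [c for c in columns if not NAMING_STANDARD.match(c)]

    print(audit_columns(["patient_id", "AdmitDate", "dob", "Last Name"]))
    # -> ['AdmitDate', 'Last Name']

The audit runs the same way no matter who wrote the columns or who is asking, which is precisely the point.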
No system of governance is perfect, but a living system in which standards are subject to
periodic review and can be modified to accommodate external changes, must be preferred over
the alternative. Where will the leaders come from who will do for corporate governance what
Jefferson, Madison, Adams, Franklin and others did for national governance?
Changing
I have been doing a few presentations of late on the subject of "Guerrilla Governance" which is
about the application of guerrilla principles to the quest for good [corporate and data]
governance.
The central theme is commitment founded on a vision and how to use that to create community,
communication and credibility. Through it all, the message is that complaining, wishing and
waiting has not produced results, is not producing results and will never produce results.
I learned that I already have what I need and now I'm working to get that message out. The raw
material, the resource used to power the change we need is in plentiful supply. It's the pain,
frustration, and lack of fulfillment encountered in everyday work life. Even if I do not feel it,
others all around me express these feelings every day.
The norm of work life is approval-seeking. The rare business has created a system of standard
processes and metrics that frees its employees from the need to seek approval. These are the
CMMI Level 5 companies and the Malcolm Baldrige Award winners. The vast majority invest a
handful of people with authority by virtue of a title and force everyone else to seek their approval
in order to change anything.
If you get this, it's up to you to change it. Alignment is the grail sought by management. It is
thought that alignment will produce the "well-oiled machine." The problem is that the "folk
wisdom" of the executive suite and the board room seems to be that the basis of alignment—
vision—is something best kept close. Rather, alignment springs from a common vision. A shared
vision is the shortest path to alignment.
If the leader of the company isn't actively sharing their vision with each and every employee of
the company, then it isn't happening. Reliance on staff meetings to promulgate the vision is very
much like the old party game of telephone. Who knows what the person at the other end is really
hearing? There are other visions out there—I have one myself. Whoever you are, whatever your
job, I urge you to hold yourself accountable to the grandest vision within you until it is replaced
by one even more grand. Be responsible for the change you need, but remember that the change
IS you.
Note to Sec. Sebelius
Secretary Sebelius,
I appreciate very much your stated position (according to Healthcare IT News) that technology
adoption in healthcare is not enough, that interoperability of technology is also necessary for
healthcare reform. I wonder how much you know about interoperability of healthcare
information systems. I wonder only because there is nothing in your published biographical
information that leads me to believe that you have any background-in-depth in a technology
discipline.
I don't mean for this to sound like criticism—it isn't—I think your position is a correct one and
your advisers have done a good job. I wonder if you are aware, though, that there has been talk
of interoperability for several years within the healthcare marketplace and there have even been
claims of the achievement of interoperability. There has even been a "certification standard"
published purporting to validate system interoperability.
All of this isn't worth the effort it took me to type the words. The reason for this "much ado about
nothing" is simply that there is no incentive within the marketplace for the level of cooperation it
would take. Technology of all kinds is the cash cow of healthcare and no one involved has any
reason at all to kill that cow or even to bring it into the barn.
In the early 1980's, the Department of Defense had a very similar problem. Each branch (Navy,
Army, Air Force, Marine Corps, Coast Guard) had its own procurement structure and its own pet
contractors. There were no standards and all that was necessary for a contractor to be successful
was to maintain some level of credibility with the procurement officer(s) involved. The result
was that (for example) Army units in the field couldn't talk to units of other services because
their communications equipment was incompatible. Logistics was a nightmare because of the
variety of spare parts that had to be maintained and computer systems incorporated the "dialect"
of the purchasing service and could not exchange information with the systems of the other
services. This is the surface of the problem. The technological diversity went much deeper as
well to the point where it was a major procurement effort to get two systems to communicate.
NASA was developing plans for an international space station and realized that they were going
to have to fundamentally change the way that systems were specified, developed, and
implemented if there was ever to be any hope of success.
The Defense Dept. took control of the situation through an initiative called Software Technology
for Adaptable, Reliable Systems (STARS). DoD mandated that processes and methods (and their
documentation byproducts) as well as tools and other technology used in the creation of systems
be standardized for the purposes of reducing costs and delivering a level of interoperability.
Healthcare operations and all of their vendors—virtually everyone outside the walls of the DoD
and the Software Engineering Institute at Carnegie-Mellon University—remain blissfully unaware
of any of this history, all the while enjoying its fruits.
I want you to know that I believe interoperability can be obtained, but not without the institution
of new paradigms and some major upheavals in the technology vendor community. I have
dedicated 13 years of my life to laying some foundations where I can and I fervently hope that
you have the commitment and the political will to see this through. Without that, government
efforts are likely only to increase costs.
Coaching
It occurs to me that many people probably don't understand what coaching is or how they might
benefit. Since I am advertising myself as a data management coach, the first task in marketing
myself may be to do some education on what should be expected from a coach and differentiate
coaching from consulting.
If your only exposure to coaching is youth activities or watching your favorite team on TV you
may have an idea that coaches call the shots, that they direct, and are to be obeyed. Nothing
could (or should) be further from the truth. My job as your coach is to understand what your
capabilities are (as well as those of your "team") and to use that knowledge to help you find ways
of attacking your goals that are likely to lead to success.
You might also have developed the idea that coaches are cheerleaders and that one of their main
jobs is motivation through exhortation. Again, not true. While I will be quick to affirm strengths
and celebrate success, I will not create unrealistic expectations. A coach's goal is to help you to
understand the most effective ways at your disposal for addressing the problems and challenges
that will confront you.
A youth soccer example will illustrate. If you are fast and by nature aggressive, you can succeed
as a defensive player by attacking the ball and taking it away from your opponent before they
have a chance to score. If you are not the fastest player on the field and are a bit passive or
hesitant, you can still produce a good result for your team by merely staying between the ball and
the goal and delaying your opponent until help arrives or by forcing the play out to the edge of
the field.
In data management, similar principles can be applied. An aggressive, direct approach may
succeed for some while a more calculated and collaborative approach may work better for others.
In any case, you will want your coach to be able to help you find the successful path which calls
for experience as well as expertise on his part. One of the least appreciated values a coach
provides lies not in what you do but in what you DON'T do. Your coach wants you to be
successful and will help you avoid situations in which you can't or are unlikely to succeed.
You have knowledge, talent—all the raw materials for success or you wouldn't be where you are.
Sometimes what you don't have is time or some specialized expertise and in that case you will
want a consultant who can come in and get 'er done. But sometimes this is counterproductive
because you won't be able to keep calling the consultant back each time you need a change or
repair. If you have some time, a coach may be a better alternative since he will leave you with
success strategies and tactics that you can continue to apply.
You want your coach to be at your shoulder, ready to answer your questions but also to be asking
you questions continuously to help organize your thought processes. In that sense a coach is
more than a teacher and more than a mentor. A teacher will not be responsible for the application
of the subject matter. A mentor may be standing by at the end of a phone line. The coach will be
there with you.
Guerrilla Governance
In the March 14, 2009 post "Guerrillas and Governance" I introduced the notion that, because of
long-time inattention to the needs of the people/workers on the frontier (organizational
boundaries), systems of governance will have been developed there and may have been in use for
a long time.
In most cases, this governance will be relatively crude and inadequate. In the modern context, it
might be something as simple as "We don't accept those after 2 PM so that we give ourselves
time to get them done before 5:00."
What we, as guerrilla leaders, should perceive comes in two parts:
1. This group is dealing with a problem and has a "process" in place for doing so.
2. There is a problem. It is recurring. It has a cost.
If we feel the need to introduce a new level of governance that eliminates the problem rather than
dealing with it on a repeating basis, we must take into account both of these parts.
There is a ready-made community here and they have banded together for mutual protection. We
dare not dismiss that fact or we will create opposition that will resist us to the bitter end. Until we
take the time to make them feel (not just understand) that we really want to help them with their
problem—not ours—they will resist all of our efforts.
The dialogue goes something like this:
you: It looks as though you are experiencing problems with [form, file, request...].
they: You wouldn't believe the kinds of /@#*(^ we get. And it's most of the time.
you: So what do you have to do when you get one like that?
they: When that happens, we have to [lists multiple process steps needed to remediate]. That's
why we have to have a cut-off at 2:00.
you: So, if I understand this right, you are getting unusable or unacceptable input from [another
boundary function]?
they: That's right. They just don't seem to care how much we have to work.
you: What happens when you complain to them?
they: They just say that it's their job to generate [forms, files, requests] and it's our job to process
them.
you: I think there's a good chance that we could guarantee that you wouldn't have to do any of
those process steps you told me about or, if you did, it would be rare. Would that make your lives
easier?
they: Absolutely. How would you do that?
you: First, we should put together a meeting. I've already talked with them and, believe it or not,
they are dealing with similar problems and similar frustrations. I think the solution to your
problem is the same as the solution to theirs. To make sure, we need to meet; there are still a couple of things I need to get clarified. Will you help?
they: Tell me when and where. I can't meet on Tuesdays at all.
And so it begins. You will use their pain to elicit their cooperation. Their cooperation creates a
new community. Community action guarantees compliance. A newly empowered community is
a breeding ground for improvement of many kinds.
This is guerrilla governance. The only requirement to get started is a goal. You will need to be
able to articulate the goal over and over again in many different dialects. In many cases, you will
only want to expose the part of your goal that your audience is able to comprehend. Never try to hide the fact that there is more. You'll simply answer all questions openly and honestly and never insist that anyone needs to understand your perspective. "We'll improve our understanding as we go" is a good way to postpone dealing with difficult questions until more education has occurred.
Always remember, you can't do this without them. Their commitment is vital. Talk freely to
management about progress and remember that management has pain as well. You're a leader.
Governance and Control
There seems to be some confusion in data governance circles concerning the application of
governance—how to make it work. I sat through a tutorial at a recent conference in which the
expert emphasized the need for authority as the key (or a key) to successful governance.
It was never clear to me what the scope of this authority was to be or how it was to be used. I
finally asked the question, "Authority for what?" You may have heard that responsibility without
authority is the recipe for stress and burnout. I thought to pursue this line of thinking as a way to
discover what was meant by data governance. If I know the nature of the authority, I should be
able to deduce the nature of the responsibility. The question never received an answer. What I
got was blank looks.
I felt a strong need to get to the bottom of this since the word "enforce" or "enforcement" was
also used several times. I was becoming extremely uncomfortable.
Friends, if people do not accept governance and cooperate with it, then the governance model
needs to change. We do not need enforcers. We need arbiters, mediators and facilitators. More
than anything else we need teachers. I've heard it said that we all do the best we know how and
when we know better, we'll do better.
Controls and attempts to control do not work in governance. They only create bottlenecks and
delays that encourage people to find other ways. In our local civil government, we call it red tape
and bureaucracy. For example, building permits are required for many home improvements. The
reasons for this requirement are excellent. The permit and the resulting inspections (audits)
protect the current and future homeowner by insuring that the project is safe. In spite of the
obvious benefits, many do-it-yourself homeowners avoid the permit process because the process
is obscure, the standards must be discovered, it can be inconvenient, it adds to the cost and is
known to produce delays. Furthermore, the only way for the scofflaw to be caught is through an
inspection and the authority has no reason to inspect other than the permit. Note that contractors
licensed by the authority are much more likely to comply.
Contrast this to the governance of traffic on roadways. Standards are clearly displayed; drivers
must pass a licensing test demonstrating both physical capacity and knowledge. Law
Enforcement (To Serve and Protect) is primarily tasked with monitoring compliance (which their
mere presence guarantees). Compliance metrics are gathered via various kinds of technology and
governance changes (to speed limits, traffic signals, etc.) are made based on these audits. What if
we had a committee at each intersection with the sole authority to direct traffic?
As you can see, governance requires an initial framework (competence, licensure), a coherent set
of standards (coherent in the sense of both understandable and integrated), and monitoring/audit
capabilities. Anything else is extra and may even get in the way.
The result of good governance is a community that enjoys consistency, predictability and safety
and is mostly free from nasty surprises. The authority that is present is passive and present only
to deal with issues that don't fit within the governance structure. If authority is needed
everywhere, there is no governance anywhere.
Feng Shui and Data Governance
I don't know whether feng shui is classified as a science, a craft, an art, or as part of a spiritual discipline, but whatever it is, those in the midst of setting up data governance may benefit from some of its principles.
Wikipedia (http://en.wikipedia.org/wiki/Feng_shui) provides a nice discussion of the history and
principles of feng shui. If we think of data and information as the qi (ch'i) or life force of a
business, we can see parallels. We want to encourage the life force to flow through our
"dwelling" (the business). We want to hold the good qi and allow the bad qi to pass through
without being retained.
Feng shui begins by studying the physical geography of the dwelling. If the structure is not yet
built, the site or sites are examined so that it may be built to take advantage of the natural path of
qi. If the structure is already built, everything about its natural geography is taken into account
before any suggestions concerning arrangement of furnishings are made.
Holding an understanding of feng shui in one's mind while architecting a governance solution
might lead to
- spending more time understanding what currently is before seeking to change it
- including all necessary elements (5 in feng shui) in the design, such as process, metadata, master data (conformed dimensions), data quality, metrics, intelligence, ??
- acting small but always within a larger vision
- recognition of yin and yang (actor and receiver) and the need to consider both within all of the elements
We need our business qi, data and information, to flow freely into the business, through every
part of the business, and out of the business. We need to discourage bad qi by showing it for
what it is and directing it back where it came from. We need to create harmony within and
between business units and functions for the well-being of the business as a whole.
I realize that some readers have already dismissed this with a snort and a sneer. I'm not saying
anything about the effectiveness of feng shui. What I am saying is that the goals of feng shui
should be the goals of data governance and that it is possible to discover some clues as to good
approaches by studying something that has been around for more than 5000 years.
Winning the World Series
How do you win the World Series? Implementing good [data] governance is a lot like winning
the championship, whether World Series, Super Bowl, Stanley Cup, US Open... It's a big goal
accomplished through thousands of smaller ones. It's also similar to achieving Level 5 on the
CMMI.
Let's take a look at how to win the World Series and see if we can learn anything about how to
implement governance.
The first thing to recognize is that it takes an entire season—it isn't done in only a couple of
weeks in October.
It takes the cooperative efforts of an entire organization.
Management must find and hire the right set of talents and abilities.
Coaches must turn the collection of talents and abilities into a team.
Each person must have the desire to excel as a part of a team.
Each person must come to share the vision of Winning The World Series.
Leadership must emerge to keep the vision in front of everyone.
We must win today's game (over and over again).
I must become a baserunner (if a batter) or keep the batter from becoming a baserunner (if in the
field). I recognize that I won't succeed all the time but that doesn't keep me from wanting to
succeed every time. Winning today's game means I must win more of these smaller contests than
I lose.
In order to win the small contests, I am prepared. I practice, I consult coaches, I talk with my
teammates. I cultivate the knowledge as well as the abilities required.
I choose equipment that fits my needs.
I learn to win the contests in my own environment and in foreign environments.
I cultivate personal and team consistency.
When all of these things are done consistently and well, we find ourselves with at least the
opportunity to win the World Series when October finally arrives.
What jumps out at me in all of this is the need for planning, preparation, patience, desire and
commitment. I'm sure that no one out there believes that a governance implementation can be
launched and completed in a few weeks. How long do you think it should take? Months? Years?
Decades? Since there is no finite season or schedule to constrain us, maybe the best answer is
that it will take as long as it takes.
That said, it seems incumbent on us to decide how we'll know when we have completed the task
we have set for ourselves. I realize this seems self-evident and trivial but as I visit with people
and groups I have developed the impression that the stable state is still undefined. What that
means is that we are eternally implementing when we should be improving.
In the absence of another definition of the stable state, I have offered two principles for that state:
1. No [data] pollution and
2. No nasty surprises
Since these represent a whole series of contests, each of which we are committed to winning,
while understanding that we won't win them all, another important property of the stable state is
that it embody learning and self-modification (improvement). When we have established the principles and that property, we will have "won the World Series". The next step is to understand
the contests that make up "today's game" and equip ourselves physically, mentally and
emotionally to win those contests.
Today's problem is that we are losing contests that we don't even know we're involved in. There's
an old poker adage that says "If you look around the table and don't recognize the sucker—it's
you." In [data] governance terms, if we look around and don't see the loser—it's us.
We The People
We the People of the United States, in Order to form a more perfect Union, establish Justice,
insure domestic Tranquility, provide for the common defence, promote the general Welfare, and
secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this
Constitution...
Article 1 of this constitution describes a representative form of governance, recognizing that the
needs for deliberation and timely decision making can best be met in this way. This was
particularly true in a time when travel was by foot, by horse (or other animal propulsion), or by water, propelled whether by wind, oar or paddle.
Two thoughts come to my mind:
1. What might this article say if written today?
2. There has been no need to modify the principles set forth during the ensuing 222 years.
All of this leads to a third thought. If the goals of corporate governance are substantially the
same
- more perfect Union—Every CEO wants the company to operate as a unit, with a single purpose
- establish Justice—A sense of justice is a prerequisite for people to focus on their duties and responsibilities.
- insure domestic Tranquility—Inter-personal and inter-organizational dissension is a primary cause of lost productivity.
- provide for the common defence—The company must defend its position in the marketplace and each employee is critical to that defense.
- promote the general Welfare—This goes hand-in-hand with justice. It's human nature to want things to be better.
- secure the Blessings of Liberty—Personal liberty is always subject to the other goals.
then maybe we ought to consider whether the method should be the same.
It's hard for me to consider data governance (which is where I'm coming from) in a vacuum. The
goals of data governance are substantially the same goals outlined above. Defense is about
defending the integrity of the data resource. Union is about consistency. Justice and welfare are about everyone living by the same rules (thus producing consistency).
I don't want to make data governance sound so impossibly complex that we throw up our hands
in surrender. The message I'm transmitting is that we have models to use. We do not have to
reinvent governance.
One of the difficulties in any governance model is to come up with a definition or picture of "the
governed". We go through life happily assuming that everyone else is "just like me" in terms of
their wants and needs. Mostly that works, but every now and then, we run into someone who
isn't "just like me." When that happens we have two choices. Either we try to make the other
person just like me or we adapt our view of "me" so that it includes some new parameters. In
corporate life, it is exceedingly dangerous to assume that anyone in a role different from ours is
"just like me."
Even if we restrict ourselves to data governance, we find that we have to include as "governed"
many who are filling different corporate roles and are definitely not "like" us. Again, I go back to
the American Colonies in the mid eighteenth century. Imposing or trying to impose a set of rules
on people whose life and needs I don't understand is destined for failure. The secondary
message is: either include everyone in designing the rules or (poor second choice) understand the
needs of the others before designing the rules.
Everything I see and hear about data governance is from the point of view of the person whose
role is management of the data resource. There isn't a single person in the marketing department
who would ever conceive of the need for data governance. Of course, we can spend time in
learning to talk the marketing language and becoming familiar with marketing problems, then we
can show them that some kind of governance is needed and they will agree. They might even
agree to invest some time on a committee. Eventually, though, they're going to wonder if this is a
good use of their most precious resource—time.
Making laws (standards) is a messy process. Much of the data governance effort is about the
process—identifying stakeholders, building consensus, the political side of things, while the
standards and processes become a very small box on a big diagram. My thought is that we don't
even know the stakeholders until we understand the processes. The political side is essential, but
there is a lot of good we could be doing if we would focus on the processes and standards.
I keep saying this because, while there may be similarity in the way two corporations handle
governance, I have serious doubts whether it will ever be possible to export one company's
solution to others. The political implications of forcing an outsider's will on a population would
cause "failed" to be stamped on the effort nearly immediately.
Bottom line: You're on a burning platform. Don't wait for someone to save you. What do you
have? What can you do? Do it!
Can and Should
Can and Should are in constant tension. They both imply something that has not yet happened—
in other words, they both are in the future. So here's the key question:
Do you want your future to be composed of cans or do you want a future of shoulds?
Should is closely related to could.
If you could do what you should do, would you do it? If you should and could but don't, what
kind of future do you have before you?
Is your past characterized by "might have", "could have", "would have", "should have", or as my
father was fond of saying, "mighta, woulda, coulda, shoulda?"
What's the difference between could and can? It might be knowledge or it might simply be
practice. For many people, the biggest difference is the realization that there is something beyond
"I can." Parents fill this role as do teachers, mentors and good friends. The process of revealing
the new world of could is known as coaching.
What we should do is a function of goals, history and current context. Most of us get paid to
know what should be done. Most of us also take the easy way out and do what we can rather than what we could or should. In fact, "Do what you can" has become a universally
accepted surrender. When the boss says it, it means that
1. they don't know what should be done
2. they don't know what could be done
3. they don't want to be bothered with knocking down roadblocks
4. they don't really care about the outcome
When I say it ("I did what I could.") it means
1. I know what should have been done
2. I know that I could have done more
3. I told them but they wouldn't listen
4. I was not committed to a quality result
We nearly always allow ourselves to choose the familiar path. When faced with a choice
between can and could, we choose to do what we have done in the past—can.
We cannot get the data quality we need unless we have the governance we need and we can have
neither if we continue to do as we've always done. This is macro as well as micro advice.
Governance is not committees and steering groups, though it may have need of such. Data
quality is not one definition, though that may be helpful. Both are about contextual consistency
and predictability. This goal could and should be achieved in whatever ways are appropriate to
the context within which the consistency is desired.
Consistency is a product of process and the foundation of improvement. Once the process
produces consistent output, you have freedom to classify and categorize its output in whatever
ways are suitable to its customers. We are currently engaged in trying to classify, warehouse and
use inconsistent products created by inconsistent processes.
What could we do? What should we do?
Haves and Have Nots
When I speak of have here, it should be clear that I'm referring to resources. Less clear but no
less important things to have include:
- need (acknowledged)
- commitment
- known cause of pain
First and foremost is availability of resources to be applied to making improvements. Data and
information quality diseases have much in common with human diseases in terms of diagnosis
and treatment. There is much discussion today concerning the state of health care in the U.S. The
discussion focuses not on diagnosis or treatment—those aspects are well understood (if
imperfectly practiced)—but on paying for the diagnosis and treatment.
It seems that financial resources or the lack of financial resources is the single most important
determinant of physiological well-being. If we examine the whys behind this, we soon see that
expectations have much to do with it. The person without financial resources learns to expect
that some problems will be chronic and learns to live with them, perhaps at a lower level of
function. The financially well-off person learns to expect that every problem has a cause and a
cure and that time and money will produce the expected well-being.
Neither is absolutely correct and both sets of expectations produce advantages as well as
disadvantages.
We can apply the lessons of health expectations to data quality. Larger or wealthier companies
expect that they will be able to attack a quality issue with sufficient resources to conquer it.
Smaller or less well-off organizations will not feel able to dedicate one or more people to the
issue and will elect to "do the best they can" (see previous post). Small business leaders will see
that everyone must be involved in the solution for it to work and that alone will cause them to
turn away from a frontal attack and "make do." Large business leaders may believe that the right
manager or leader with sufficient resources can bring it off.
Again, neither is absolutely correct.
A person or an organization resigned to living with pain is always going to find it difficult or
impossible to improve while a person or organization immersed in full scale battle with the
problem may well miss opportunities for improvement.
As it turns out, a "data quality" campaign is like a campaign against bacteria—almost
meaningless. Because the scope and scale of the campaign preclude considerations of nuance, we
find that we make enemies from within the ranks and everything degenerates until nothing is
happening. We can make progress against a specific bacterium or a specific quality issue but we
soon realize that we can't hold those gains without creating a framework within which we can
establish trust, confidence and consistency. That framework has come to be called data
governance. In the case of physiological disease, the framework is Medicine.
Whether you're a have or a have not, the resource issue turns out to be far less important than we
might have thought. Consider expectations first.
- Can we live with or adapt to the pain?
- Have we already adapted? How?
- What limitations are imposed by the adaptation?
- We can choose to treat symptoms, cure the disease, and prevent the disease. Which is within our reach? What can we do? What should we do?
In most cases, the best choice is to treat symptoms while making lifestyle changes to prevent the
disease. Sometimes we have to cure the current disease or we die before we can implement the
lifestyle changes. The point is that we always have options. A specific option must consider the
past, present and future. A combination of options may produce the best result. Last but not least,
have and have not is not really about resources but is about expectations. Commitment is often
born of desperation when we realize that we just can't tolerate the future implied by our current
expectations. Now we're really ready to do something meaningful.
Healthcare and Health Care
I have to revisit this subject in light of recent news and developments. It pains me to see the
confusion that has caused the pollsters and pundits to be able to take shots at something that
everyone wants.
There is concern about cost that is well founded. The problem is that costs come in a variety of
disguises. Now discussions of cost have assumed inordinate importance in the questions of
access. Once again, healthcare is about cost and health care is about access. These are two
distinct issues.
There is virtually no one who advocates the denial of medical care on the basis of inability to
pay. The entire health insurance industry emerged in response to a rise in costs driven by
improvements in the science and technology of medicine. These improvements demanded better
education for medical practitioners (added cost) and the technology has become more complex
resulting in increased cost for both development and support.
While all this was happening, we became a nation of sedentary, over-eating narcissists who believe in the idea that modern medicine can fix whatever we do to ourselves and make us all (well, me anyway) into beautiful people. We have learned to game the system to get the plastic
surgeries we want and the pills we want and the therapies we read about.
Insurance coverages have been broadened continually in response to forces too numerous to
mention with the result that more premium dollars go out requiring more premium dollars to
come in. The insurance companies have developed bureaucratic defenses, requiring second
opinions, demanding justification based on diagnostic testing, even making a practice of denial
of the initial claim to filter out those who aren't really serious. The additional staffing and data
handling is paid for by premium increases and forced out to the medical providers who increase
their charges. Rising costs are everywhere and no one can join the debate with clean hands.
Everyone wants someone else to absorb the costs.
In the midst of all of this, we sometimes lose sight of those who simply stay away from health
care because they don't have enough money to pay for the other, even more basic necessities. We
always lose sight of those who try to take care of themselves by buying health insurance, which
they can afford only by accepting caps and deductibles or limitations on coverages. These people
look good in the statistics but rarely show up in the doctor's office because paying the premiums
has put them into the same category as the uninsured in that they have no financial resources left
over to pay for the visit. Further, they now have to live within the insurance bureaucracy that
demands diagnostic justification, turning what might have been a $100 office visit into a $300
one.
Access to health care for everyone should be the sole topic in Washington. Access has relatively
simple solutions. Let's solve that problem first, and, by the way, we have already agreed that
ability to pay will not limit access.
Costs are a completely different issue and one that will require all interested parties to make
substantial changes in thinking, planning and delivery.
It is deplorable (to use a word employed by a past President in a slightly different context) that
we continue to allow doctors, administrators, insurance CEOs, technology vendors, pharma, and
politicians to point fingers at each other while the full cost of inadequate health care
is borne by me and you, the patient/consumer.
Managing Technology and People
The thing about management is that it works great for money, time or countable things, but
management of complex, uncountable things is at best a dream. You say "people are countable"
and I'll agree, but only to the extent that the management is about bodies (head count).
Technology and its applications are so complex as to be unmanageable. Edsger Dijkstra spent
much of his career trying to get that message across to the business world and the budding
computer industry. Today, he is remembered more for optimized search algorithms. If that isn't
the industry in a nutshell...
Many years ago I tried to tell people that technology must be controlled, not managed. My
slogan was "control technology or it will control you." We are so in love with the notion of
management and have so much antipathy for control, that this message, like Dijkstra's, falls on
deaf ears. I am not, in any way, comparing myself to Dijkstra. I am comparing his "audience" to
mine. Today, the tag line on my web site (www.michaelmeierdata.com) is "Leadership for
change, Management for effectiveness, Governance for stability." The three are not mutually
exclusive but the probability of finding all three in one person is quite small.
Control is not a bad thing—it is an essential thing. Automobile travel without control would
involve so much risk that no reasonable person would attempt it. The control starts with the
design and production of the machinery itself. An automobile is designed to be controlled. It is
also designed to function within a larger system of controls. Awareness of this allows the
designers to prioritize their efforts and focus on differentiators suggested by the system rather
than on mere "performance" factors. For example, it would be a waste of time designing a
vehicle for mass production that could negotiate a 90-degree turn at 80 mph. The system of
controls insures that this level of performance is unnecessary.
Use of technology without controls is also fraught with risk. We require control over the design
and production of technology to insure that the product is useful and usable within the larger
control framework that is our business context. Control over the use of technology is needed to
insure reliability and safety for all just as our traffic laws and their enforcement produce a sense
of safety and predictability for those of us on public roads.
Because no one likes the idea of control, we are calling this "governance" but make no mistake,
governance must be about controls or any effort is wasted. People want and need consistency.
Consistency produces contentment, and the goal of government, according to The Art of War (Sun Tzu), is a contented populace.
So, if by management, you mean counting (or accounting), you won't have success applying it to
personalities or to technology. If, by management, you mean coercion, you can, for a brief time,
deliver the appearance of consistency and contentment with personalities or technology, but you
will only be masking a growing problem. If, by management, you mean a system of controls
(governance) that produces consistency, predictability and reduced risk, only then will you be
able to say that your technology (and your people) are being managed.
Fortunately, the controls necessary for effective implementation and application of technology
are well understood (if largely ignored). CMMI (from the Software Engineering Institute) and ITIL are but two specifications for a system of controls. These are thorough and consistent, and
understanding them will enable you to create a system tailored to your organizational needs.
The future starts when "control" is accepted and welcomed.
Enforcement and Accountability
Recently I responded to a discussion question on a LinkedIn group forum. The question dealt
with how to enforce standards in a data management and stewardship scenario. The other
responses mentioned the use of various committees and steering groups as well as management
partners for enforcement of standards. One response suggested that a lot of messy people
problems could be avoided if automated tools were used to find areas of non-compliance.
I can't help but think that we, as a society, must be nearing the pinnacle (or the pit) of buck
passing. When I as an individual choose to ignore an incident in which an action by someone
else either ignores the general good or threatens the welfare of all, I am turning my back on
accountability and passing the buck to "someone" else.
There have been many instances in which a malefactor, caught in the act, has told me, "What's it
to you? What do you care?" If I point out that the action was, for example, in violation of
published standards, I might hear, "Nobody follows that. I didn't know it existed until you
showed it to me."
The point is that it takes a lot of will in the face of widespread apathy to be accountable for not
only following standards, but insisting that others follow them. A study, which I am unable to
cite, showed that the rate of deterioration in a neighborhood increases when individual incidents
are ignored. For example, a window broken by vandals goes unrepaired or "tagging" of a wall is
not erased. Ignoring an incident encourages similar incidents and then worse ones.
I realize I am comparing failure to follow standards with vandalism and I do this with intent. If
we assume the existence of a set of standards, they must have a purpose. Normally, the purpose
is related to quality. Every manager wants their organization to run like a "well-oiled machine."
When an organization does run this way, we say it is a quality organization. Ignoring a standard
is like dropping a grain of sand into the machine. Everything may proceed with one grain, but as
one grain encourages two and then three..., eventually the machine will break down.
Bottom line: if you're looking for someone else to enforce standards, you're looking in the wrong
place. It's up to me and it's up to you. If it's a bad standard, then it's up to me and you to get it
fixed. Without personal, individual accountability, you will never get adherence. Enforcement is
an empty concept even outside the workplace. People will not endure coercion for long.
The Control Myth
Control is a very much misunderstood concept. The bottom line is this:
Self-control is a good thing, an essential thing. Attempts to control others are doomed and will be
harmful to all concerned.
This is difficult to write and it is difficult to publish in a public forum. I know it will be resisted
and may stimulate reactions from others that will not be beneficial to me. I have made a
conscious decision (self-control) and hereby renounce any expectation with respect to responses,
reactions, and results.
I retain my hope that the statement may create thought processes that lead others to alter the
ways in which they interact with their world.
The careful reader may note that my statements regarding control are framed relative to people. Control of inanimate objects and of such abstractions as process is not only good, it is mandatory. Please note, however, that control of (for example) a process is not the same as
control of the people involved in it.
You may have heard of Total Quality Control (TQC) or Statistical Process Control (SPC) and the
need to have and use controlled process in order to insure high quality production. A study of
these methods has led me to a new understanding of control.
When we think of control we typically associate notions of power. I control something when I
can make it bend to my will. A manager is said to control an organization or function. A driver
may be fined for failure to have control over their vehicle. This is one of the reasons why TQC
and SPC have such a difficult time gaining traction in the business world in the U.S. (the
environment with which I am most familiar). The control that is the core of Quality methods has
nothing whatsoever to do with my or anyone else's will. If anything we need to look at it from
the opposite direction.
A process is either functioning within understood parameters (in control) or it is not. If it is not,
we say it is out of control. What we mean is that we have just discovered that we don't
understand the parameters as well as we thought we did. Now we can assert our will to change
the process so that the new (improved) understanding becomes part of it. It is potentially life
changing to realize that the process is literally in control—that it has the control. The evidence
is that it produces what it produces. If we desire to change what is produced, we must listen to
the process, understand its needs, and give it what it needs.
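The statistical version of this idea is easy to sketch. In the fragment below (Python, with invented measurements), the limits that define "in control" are computed from the process's own stable history, in the conventional 3-sigma style of SPC; a point outside them is a signal to study the process, not a license to blame the people running it.

    from statistics import mean, stdev

    # Output from a period when the process was judged stable (invented data).
    baseline = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 9.7, 10.0]
    center = mean(baseline)        # the process's own center line
    limit = 3 * stdev(baseline)    # conventional 3-sigma control limits

    # Monitor new output against limits derived from the process itself.
    for sample in [10.1, 9.9, 14.9, 10.2]:
        if abs(sample - center) <= limit:
            print(f"{sample}: in control")
        else:
            # The process has just told us our understanding was incomplete.
            print(f"{sample}: OUT OF CONTROL")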
To take a giant step that may require backfill later, if I have a need to control something (the variability or consistency of output, for example), I must give up control to the process. If I have a
need to control the people, I must give control to them as keepers of the process. They have a far
better ear for what the process is asking for and can give it what it needs. When they can't give it
what it needs, they will come to me and tell me what I must do.
To have control, I must give up control.
Life is a process.
Complexity and [Over]Simplification
Pardon my absence (if anyone noticed) while I recovered from a bout of depression brought on
by repeated exposure to thought streams defined by buzzwords and marketing hype. Ulysses
(James Joyce) was memorable for stream-of-consciousness paragraphs that ran on for page after
page. I did not find the process of navigating someone else's thought stream enjoyable in the
least. This is the same feeling I have been experiencing of late. The process of mapping the path
from stimulus to response (or problem to decision) in humans is poorly understood at best.
My thought was to try to zero in on one kind of problem to see if I could cast some light on the
decision-making process and find out how it gets caught in the deep ruts so often.
If you are a technophile who buys in at the bleeding edge and who speaks in acronymese (let's say it's pronounced uh-kron'-uh-meez: a language based on acronyms or sets of initial letters chosen or pronounced in the form of words, for example SQL, for Structured Query Language, pronounced see-kwel), then you may as well find something else to read because I'm about to ask
you to take some real responsibility. Actually, I'm going to suggest that those who have to listen
to you should demand that you take responsibility.
Many of you will be familiar with the Ishikawa Diagram though you might recognize it as the
fishbone or cause-and-effect diagram. If you have ever used this tool to identify the cause(s) of a
problem, you may remember just how quickly the diagram can become unmanageably
complicated. A few years ago S. M. Casey wrote a book titled Set Phasers on Stun: And Other
True Tales of Design, Technology, and Human Error (Casey, 1998). This book looks at some of
the biggest man-made disasters of the past twenty years in an attempt to identify the cause(s).
While human nature demands that we be able to blame someone for anything that goes wrong,
this research very pointedly shows that each time we identify what we believe to be the cause, it
is always possible to say, "But if x had been alert, the damage would have been minimal or
avoided entirely." In short, the cause is always a related set of events that may have been initially
set in motion by a "proximate cause" (http://en.wikipedia.org/wiki/Proximate_cause). The
problem, if we're interested in insuring that the damage is not repeated, is that legality is
irrelevant as are the needs of human nature. The only way to guarantee that a certain problem
never arises is to guarantee that anything that could be contributory is not allowed to happen.
So, "What does this have to do with me?" you ask. A brief example being better than a long
explanation, here is an actual exchange that happened at a DAMA meeting recently. The chapter
President called for suggestions for small-group discussion following the main presentation. One
of the suggestions was, "Why do we keep making the same mistakes?" The group noted that
1. it may not be possible to answer the question and
2. an answer might well be useless in avoiding the mistakes.
Discussion proceeded to other topics. A specific question was asked involving response-time
performance in a database application. Many possible causes for extended response times were
trotted out without shedding any light. Someone asked about the possibility that a single query,
executed repeatedly, might be the point of "failure" and included the suggestion that the DBA
(database administrator) should be able to say whether this was the case or not. The response:
"We don't have a DBA. We thought we could get along without that additional expense."
Another example: Hammering a nail to hold together two pieces of wood is a simple and
straightforward operation that can be accomplished by almost anyone. A group of eight-year-old
boys could hammer many nails in a relatively short time. If the goal is a habitable dwelling,
though, or even a serviceable garage or potting shed, no sane person would entrust the job to the
boys. Note that the issue isn't motivation or lack thereof, and it isn't tools or skill per se; it's
really about basic knowledge concerning the desired result.
Some theoretical knowledge concerning material characteristics and structural formats is
required to deliver a result that will be in service for more than a few minutes.
In the information systems world, the analogy is much more apt than we might be comfortable in
admitting. It isn't possible to be involved in a database design discussion (or even a data
modeling one) without the term "denormalization" popping up. I'm going to use denormalization
as a placeholder for a host of other bits of conventional wisdom in the discussion that follows.
Relational data design is an application of set theory, which in turn is a branch of mathematics. Normalization rules (or forms) were developed to ensure that set operations would be applicable and would produce the expected result when applied to a database. Denormalization means avoiding the application of normalization; in practice it is rarely a deliberate activity performed on a normalized design. There is no methodology for denormalization.
My advice to the supervisor, manager, project manager who is told that the database will be (or
has been) denormalized is to ask to see the normalized design and to ask questions about how far
the normalization was taken. My guess is that you will get a lot of verbal tap dancing and arm
waving. Ask that normalization be explained to you at least through the third normal form.
Remember that you will never be able to utilize the full power of your relational database engine
using set operations on a non-normalized database. You will always need programming to get at
the data.
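To see what is at stake, here is a minimal sketch (Python with its built-in sqlite3 module; the tables and values are invented for illustration) of what a normalized design buys you: the question "total sales per store" is answered by one declarative set operation, with no row-by-row program logic.

    import sqlite3

    # Two normalized tables: each fact is stored exactly once.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE store (store_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE sale  (sale_id  INTEGER PRIMARY KEY,
                            store_id INTEGER NOT NULL REFERENCES store(store_id),
                            amount   REAL    NOT NULL);
        INSERT INTO store VALUES (1, 'Walgreens'), (2, 'CVS');
        INSERT INTO sale  VALUES (1, 1, 19.95), (2, 1, 5.00), (3, 2, 12.50);
    """)

    # One set operation answers the business question.
    query = """SELECT s.name, SUM(a.amount)
               FROM sale a JOIN store s ON s.store_id = a.store_id
               GROUP BY s.name"""
    for name, total in db.execute(query):
        print(name, total)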
I have participated in many conversations in which people stated unequivocally their feelings
about denormalization and normalization and could not articulate any of the normal forms. The
same holds true for statements about standards, methods, tools... It is apparently necessary today
to have a strong opinion about any topic that arises. I haven't heard "I don't know enough about that to have an opinion yet" in several years.
George Santayana may have been the first to say that those who cannot remember history are doomed to repeat it. He certainly won't be the last. Data and information design theory has not evolved
much in the past 20 years nor have software engineering or project management. Despite that,
every day there are dozens of exciting press releases trumpeting the newest (always
trademarked) approach to data management, project management, system development...
In the words of the Wizard, "Don't look behind the curtain!" The same difficult, complex work
must be done today as twenty (or 30 or 40) years ago. We do have better ways today of dealing
with simple repetitive tasks but all that really means is that what is left is more difficult, more
complex, more rigorous. This is no place for amateurs or lone rangers. Every system will be the
result of a team of experts working for a common goal and relying on one another completely for
their individual expertise. The manager or project manager had better be able to recognize when
tap dancing, arm waving and smoke emission are taking place.
Or maybe all you want is cheap and/or quick.
Answers and Questions
We are living in a time in which seemingly everything is driven by the need for answers. You
may be saying, "So, what?" as you read this but the impression I have developed as I have
moved through this time is that in many, if not most instances, this amounts to nothing more than
"trivial pursuits." A quote found online recently sums it up perfectly:
"We are compelled to drive toward total knowledge, right down to the levels of the neuron and
the gene. When we have progressed enough to explain ourselves in these mechanistic terms...the
result might be hard to accept." [Edward O. Wilson]
Unfortunately, though having answers readily available can get you around the game board
ahead of everyone else, it can't necessarily produce the big win for a cooperative venture. I don't
know what research exists on this subject, if any, but it seems to me that living in "the age of
information" is vastly overrated. If you think it is good to have a library of information at your
fingertips then you may want to think about some desirable properties of information, for
example:
- Is it true?
- Is it accurate?
- Is it current?
- Is it useful?
A relatively recent development has been the creation of a new area of specialization around data
quality. This is not a technical post, though, so I leave it up to the reader to go as deep as
necessary. The point I would like to make is that the useful property trumps all the others.
The publicity that surrounded recent presidential elections in the U.S. shows us that
"information" need not be true or accurate or current in order to be useful. If nothing else, this
should cause us to pause and consider our thirst for information.
The bottom line is that a command of facts is useless unless those facts enable us to accomplish
something. This is the function of experience, education, training, practice... In truth, a person
who commands an encyclopedia of facts and is fluent in acronymese (see previous post) may or
may not be able to accomplish a goal. This is also why we tend to look for experience when we
need help.
The catch is that a person who has no experience, training, or applicable knowledge has no way
of recognizing experience that will be useful. This leads to insistence on things that may have
little or no bearing on eventual achievement. For example, when we need a new roof on our
house, we don't ask that the roofer have experience with a certain brand of shingle or a certain
kind of fastener. We don't specify how the work will be done. We simply insist on straight lines
and no leaks. It is inexplicable that we do not take the same approach to technology goals
including those relating to information. Read some recent position descriptions or postings on
Monster or Dice or CareerBuilder. They are chock full of brand names to the extent that the real
goal is obscured.
The proof of my point is the extent to which the information management situation is becoming
more complex, with dramatic increases in the amount of work and the specialization of that
work. These increases in turn produce increases in costs. It's hard to point to achievements today.
Micromanagement, based on a swarm of facts with no understanding, produces a lot of activity and few results.
The Problem With Quality
1. Meeting the spec[ification]
2. Documented adherence to established [process] norms
3. The product's effects are primarily positive (for example, it tastes good and doesn't make
me ill)
4. I'll know it when I see it
Multiple choice: which of the above (choose only one) is the definition you want applied to your
new car?
Does the answer change if we apply the definition to your morning cup of coffee?
Last question: which definition do you apply to the next example of "business intelligence" that
comes before you?
I have two points in mind. First, defining quality is not an exact science even within a specific
context. Second, #4 may be the deciding factor regardless of #s 1-3. In the end, the
consumer/customer merely has to say, "That's not what I was looking for" to relegate a product
to the trash heap. We all know that it does no good to say, "This is what you asked for" (meeting
the spec) or "I did it just like you told me" (followed established procedure) or "One won't hurt
you" (tastes good and not sick—yet).
So what is quality and especially, what is data quality? How we obtain data quality is completely
dependent on the answer to this question.
I'd like to suggest that we divide the question in order to produce at least one useful answer. If
we examine data quality from the perspective of a computer and its logic, we can come up with
an answer that will allow us to progress. The second perspective is obviously the
consumer/customer or the human perspective.
Recently I received an email with what at first glance seemed like an innocuous statement full of
typographic and/or spelling errors, but when I actually looked at it, it was a nearly perfect
illustration of a principle I have been talking about for years.
Teh hmuan mnid is capalbe of mankig sesne of almsot atynhnig taht ftis smoe bisac parmaters.
This is the principle that draws the line between computer logic and human "logic". It is also
what makes programmers (an outmoded term, I know, but best suited to the point I'm making) so
vitally important. There is only one role in the continuum of roles involved in producing an
information system product that must bear the full weight of responsibility for data quality at the boundary between computer and human. That role is best thought of as
programmer.
Unless you earn your living as a programmer, some alarms may be going off. In fact, if you are a
programmer, some alarms should be going off. I earned a degree in computer science and made a
living as a programmer. From there I moved into data modeling, data administration and
database administration. Now I'm involved in data quality and governance. In all of that time, I
have never come into contact with any training, education, book or even a job description that
addressed my accountability for preserving data quality at the man/machine interface.
This may be a poor forum for this, but my intention is to change this situation right here. My
next few posts will present some background on how a programmer might live up to this
responsibility and some of the forces that will need to be fended off in order to make it a reality.
Next: Programmer as Data Quality Champion
Programmer as Data Quality Champion
The programmer is the one who takes all the wish lists and turns them into something that a
programmable logic device (computer) can execute to fulfill the wishes. Today, several
additional roles have attached themselves to this process. The architects, designers, modelers,
testers... all play an important part in the final product but it is important to remember that these
roles have motivations other than the ability of the product to satisfy wishes. At best they satisfy
a different set of wishes that have more to do with the process than the product.
In the not so long ago days when I started in the information systems industry, none of those
other roles even existed. It was all about programming.
In talking with a programmer, you will detect a hint of pride and superiority based on the sure
knowledge that none of "them" could produce a program that ran without error and produced a
useful result. Other than the "end user" or simply "user", there may be no one lower on the
respect totem pole than the "data" people. The programmer only needs to know what you want it
to "do"; data is just something that you move from one place to another.
In those bygone days before information technology there were organizations known as data
processing. I'm leaving out broad segments of programming known as systems programming
because at the operating system level, the data really is a commodity consisting of groups of
on/off bits known as bytes. In the very act of ignoring this segment of programming we stumble
over the origins of our problem. In early computer systems, there really was no data as we think
of data today.
A programmer could grant wishes by making a process that took large amounts of "data" values
from one file, combined them with large amounts of "data" values from other files and deposited
the resulting "data" values in a new file from which address labels or paychecks were printed.
The programmer's responsibility was simply to make sure the program didn't unexpectedly halt.
At first, they just told the users what the data values had to look like in order to ensure that the
program kept running. When the users proved incapable of guaranteeing the necessary
consistency, programmers took matters into their own hands and created scrubbing programs that
would, for example, guarantee that a file contained only values that looked like $nnnnnn.nn, where
the value of n is from the set (0-9). Now everyone was happy until one day a big order came in
for $1,250,000.00 and was thrown out as erroneous. At the same time, someone figured out how
to divert the fractional round-off amounts into a private account.
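The scrubber's rule is easy to reconstruct as a sketch (Python; the pattern is hypothetical, inferred from the $nnnnnn.nn description above). Notice that the pattern encodes yesterday's assumptions about the data rather than its meaning, which is exactly why the big order was rejected.

    import re

    # The scrubber's whole notion of a "valid" amount: a dollar sign, one to
    # six digits, a point, and two digits (reconstructed from the story above).
    AMOUNT = re.compile(r"^\$\d{1,6}\.\d{2}$")

    for value in ["$1250.00", "$999999.99", "$1,250,000.00"]:
        verdict = "accepted" if AMOUNT.match(value) else "rejected as erroneous"
        print(f"{value}: {verdict}")
    # The $1,250,000.00 order fails: it has commas and too many digits.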
I'm leaving out some reasoning steps in an effort to keep this to an essay length. If you get lost,
just drop me a note and I'll be happy to fill in any missing pieces.
Eventually it was realized that we don't have to store data in a form recognizable to humans—the
computer could be taught to present a data value in any format that a human might care to see.
This leap forward allowed programmers to distance themselves even more from the data. The
idea to take away from this is that programmers may not have the same concept of data that you
do.
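In modern terms the leap looks like this (a Python sketch using the amount from the earlier story): the stored value is a number, and the dollar sign and commas exist only at the moment of presentation, so no scrubbing pattern is needed at all.

    from decimal import Decimal

    # Store a number; the human-readable form is produced only for display.
    amount = Decimal("1250000.00")

    print(f"${amount:,.2f}")   # displays as $1,250,000.00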
When non-programmers talk about data, they are typically talking about instances rather than
types. To a non-programmer, "Walgreens" is an example of a piece of data as is "sea foam
green" and "$900 billion." To a programmer, these are all character strings or text values and
may be of three different subrange "types". The subrange (store, color, gross revenue) determines
how the value should be handled and the value may be acceptable if it fits the pattern defined for
the type.
Today, there are many opportunities to enforce patterns on data values and most of them require
no programming at all. The problem is that they all produce errors and error messages that the
typical user could not hope to comprehend. In effect, they cause the program to terminate
unexpectedly. So, despite all the advancements in technology, we are still scrubbing data files.
The alternative is for the programmer to think like a human instead of like a programmable
controller and the problem with this alternative is that it introduces orders of magnitude increases
(x10, x100...) in complexity and corresponding increases in development costs.
So how can programmers become champions of data quality? One relatively simple way would be to avoid accepting text values as program input. Accepting everything as text is a favorite tactic because it defers many decisions until a later time when "we know more"; the big problem is that we never do go back and change it. An example here might be useful. Imagine that you are
programming a system that accepts input from nurses who are taking vital signs (temperature,
BP, pulse, respiration, height and weight) in a patient exam room. You take the usual shortcut
and implement all the fields on the screen as text.
Everybody is happy because the nurses don't ever have to go back and correct anything and the
program runs without apparent error. One day, though, a health insurance company decides to
reward its contractual clients by paying a slightly higher rate to those who document that they are
doing a consistent job of collecting vitals at every patient visit. Now we're asked to verify that
we do an acceptable job of collecting and recording vital signs. Since the values input to a screen
go directly to a database, we should have no problem. It is, in fact, no problem to count the
records for which there is or is not a value in those fields, however, when we attempt to
aggregate those values to show the range of values or the average value, our query fails: the
aggregation query must convert the text values in the pulse field to integers and the text values in
the temperature field to floating point (real) numbers in order to compute an average.
We finally discover that pulse contains some values like "100.4", "98.5", "SAME"... that cause
an error because they can't be converted to an integer value. When we look at this as a nurse or
physician, we can see that the mind ignores the labels on the screen and simply produces a
picture of the patient based on the values displayed. Our poor computer, though, is unable to
continue. The database architect could have made pulse an integer type and the DBMS would
have enforced that typing by not allowing these values to be stored in the database. Using a text
type allows the DBMS to accept any value for storage. The programmer could enforce a text
value that is guaranteed to convert to an integer or could enforce integer types directly but in
order to do so he or she must handle resulting errors in a way that is understood and accepted by
the nurses.
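The following Python sketch (field names and values are assumed, not taken from any real system)
shows both halves of the story: counting works, aggregation fails, and a type check at the point
of capture would have surfaced the problem while the nurse was still in the room.

    # Hypothetical pulse values captured through a free-text field.
    pulse = ["72", "100.4", "98.5", "SAME", "88"]

    # Counting non-empty values needs no conversion, so it succeeds.
    print(sum(1 for v in pulse if v))                    # 5

    # Averaging fails the moment a value can't be converted to an integer.
    try:
        print(sum(int(v) for v in pulse) / len(pulse))
    except ValueError as e:
        print("Aggregation failed:", e)                  # invalid literal for int()

    # Enforcing the type at capture moves the error to the right moment.
    def capture_pulse(raw: str) -> int:
        try:
            return int(raw)
        except ValueError:
            # A message the nurse can act on now, not a broken report later.
            raise ValueError(f"Pulse must be a whole number; you entered {raw!r}")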
More often, though, the nurse managers show the incorrect data to the nurses and exhort them to
pay more attention. Do you believe the nurses will respond better to blame and exhortation or to
assistance from the program? Check out W. E. Deming's Red Bead Experiment to get your
answer.
The programmer champion will be suspicious of a discrete-valued field whose data type is text.
Any value that will feed a computation, or any other operation requiring a conversion, must be
investigated carefully. Any value that may be used as a tag for identifying rolled-
up aggregations, such as store name, must get additional attention if we don't want to see
quarterly sales for "Walgreens" and "Walgreen's" and "Wlagreens". The time to catch and repair
these data quality errors is the very first time they are captured by a computer program. That
makes the programmer responsible. Other roles have a duty to identify situations where these
problems might arise, but only the programmer is positioned to do anything about it.
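Here is a small illustration (the store names and amounts are invented) of why the roll-up tag
deserves that attention:

    from collections import defaultdict

    # Hypothetical sales rows keyed by a free-text store name.
    sales = [("Walgreens", 1200.0), ("Walgreen's", 300.0), ("Wlagreens", 150.0)]

    totals = defaultdict(float)
    for store, amount in sales:
        totals[store] += amount

    # One real store reports as three; every aggregate built on this is wrong.
    print(dict(totals))
    # {'Walgreens': 1200.0, "Walgreen's": 300.0, 'Wlagreens': 150.0}

No later cleansing pass can say with certainty which of the three spellings was intended; only
the point of capture knows.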
I realize this is asking a lot. A programmer is only human and can't be expected to know
everything (right?). This suggests another way in which the programmer can become a
champion. Since it isn't possible for one person to know everything that must be known (hard
though that may be to swallow), the programmer must develop enthusiasm for consultation and
collaboration. Every role in your environment was created for a reason and each has its own
goals and responsibilities. The programmer is accustomed to the data people coming with
requests. The requests are nearly always framed in terms of something that the programmer
should do to make the [modeler's, architect's, steward's...] life easier and improve overall quality.
It's easy to understand how this can get old in a hurry. The solution is for the programmer(s) to
sit down with these other roles and get everyone's needs on the table. All of the other roles
mentioned have a different view of data than you do and here's the thing—their view is much
closer to that of the customer/user than yours is. You need each other.
Accept that you are a key member of a team and as such the team can't succeed without your
commitment. The flip side is that you will not be able to enjoy the success you dream of without
the commitment, skills and knowledge of the rest of the team. Be a Data Quality Champion—it's
within your grasp.
Next we'll take a look at some forces that act to keep the team from being all they could be. Stay
tuned for Disturbances in the Force.

Disturbances in the Force


So, if we know how to make things better in terms of data quality and we're motivated to do so,
what's stopping us? A word of caution: what you're about to read may be harmful to your health.
Maybe you're old enough to have lived through the Watergate fiasco and can remember the facts
coming to light one by one in the press until they eventually began to make a complete picture.
Maybe you remember the Hollywood version, All The President's Men, in which the whole
picture was produced in two+ hours rather than months, or maybe the whole thing is in the same
category as the Crimean War for you and is nothing more than a question on a pop quiz in one of
your least favorite subjects.
I'd like to suggest for your consideration that if we want to track down why we are having such a
difficult time accomplishing something that we all claim to want, we need look no further than
the paragraph above to get all the answers we need.
First, let's imagine that data quality is like truth in government. It's a good thing and we would
like to assume that we have it. If, in fact, we do not have truth in government (or data quality),
who benefits? The answer is that it is in the interests of those who believe they can/will be
blamed for the status quo to cover up the problems and subvert efforts to get at the facts that can
provide the complete picture. This is especially true if they are responsible for the problems.
Even if the only identifiable responsibility is that the person is the supervisor or manager of the
function(s) that owns the troubled processes, they still may elect to resist and subvert in order to
avoid becoming responsible for the fix.
If we want to avoid this situation, we should absolutely avoid any questions that seem directed at
why or who or even how. We should avoid, to the extent possible, any investigation into the past.
Try to keep all discussions focused on process-based causes that might be producing the effects
you are seeing. Do not zoom in on isolated instances but look for trends. Remember, your goal is
not prosecution but consistent quality.
In the words of Bob Woodward's source, Deep Throat, "Follow the money." The programmer-
champion will struggle against this repeatedly. There is a perception that implementing integrity
checking at the point of input represents added cost. Like any other complex process, system
development should seek to minimize total cost of ownership rather than any single cost line
item. If it takes an extra day of programmer time to ensure that we get 99.99% consistency of
integrity in the database and thereby avoid dedicating multiple full-time staff to data clean up,
this is a net cost reduction.
Our system design and project management processes may not be mature enough to assign dollar
values to this, but it should be easy to determine how much money we are spending on fixing
poor-quality data every month (or year) and then to amend the design and development
processes to devote a fraction of that amount to prevention.
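As a back-of-the-envelope illustration (every figure below is assumed, not measured), the
comparison might look like this:

    # Assumed figures -- substitute what your organization actually spends.
    cleanup_staff = 2                     # FTEs dedicated to fixing bad data
    cost_per_fte = 90_000                 # fully loaded, dollars per year
    annual_cleanup_cost = cleanup_staff * cost_per_fte        # 180,000

    extra_dev_days = 5                    # added input-validation work
    cost_per_dev_day = 800                # dollars
    prevention_cost = extra_dev_days * cost_per_dev_day       # 4,000

    print(f"Cleanup:    ${annual_cleanup_cost:,} per year")
    print(f"Prevention: ${prevention_cost:,} one time")

Even if the assumed numbers are off by a factor of two in either direction, the conclusion does
not change.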
The final perspective to be extracted from our example is that a short attention span provides
little hope of even recognizing that a problem exists let alone understanding it enough to develop
a mitigation strategy. Data quality (and truth in government) requires that everyone be involved.
People are capable of recognizing self-interest within the corporate interest and enough people
will be motivated to act that the ball will be kept moving, but the media of the late 1960's is not
the media of 2009. In the 60's there was an interest in the truth that perhaps doesn't exist today.
In your corporate environment, you may find it easier to maintain a constant pressure of
communication directed at a single theme. The widespread motivation will not be produced by a
single appeal surrounded by banners and fanfare and free cake. A communications campaign
must be designed for the long haul with continuous refreshing of the message.
Not even one percent of your employee workforce today is ready to grapple
with the issue of data quality. You are going to have to break it down in multiple variations and
start with the concept of data itself. What is data? You'll be surprised at what you uncover when
you go out to talk to people about their data. Stay tuned for some samples.

What Is "Data" Anyway?


If you have any experience with phone support (on either end) you will recognize how easy it is
to get deep into a process before realizing that the other person is on a completely different path
than you are. RTFM (Read The Flippin’ Manual) often pops up as the answer to our
communication difficulties, but it clearly is not the answer or it would have been universally
embraced by now.
I happen to be a proponent of the theory that the answers aren't nearly as important as the
questions. As a teacher, I know that learning is happening when the pupil is asking questions—
particularly a series of related questions. This has become very important as I have attempted to
make headway on [data] governance and [data] quality.
My early attempts assumed that everyone knows what data is—and they do, in the same way
that a picture is worth 10,000 words. Each and every person you talk to knows what "data" is and
each has a different idea in mind. For most, it's a picture of the last set of values they looked at.
This might have been a spreadsheet, a graph, a collection of measurements... The key is that data
is a set of values. For some, "data" is a commodity. It is files, stripes on a disk, a percent of
capacity, a quantity of bytes measured in "mega-", "tera-" or "peta-". For still others, "data" is
represented by a schema, model, definition, or some other abstraction.
Given those varying perceptions or perspectives, is it any wonder that, at some point in the quest
for "data"-anything, we find ourselves stuck in the quicksand of confusion? Even when all parties
have been saying the same things and have been agreeing on goals, there comes a point when
someone will say, "We're not going to do that." or "I don't see why that is necessary." or "But
that will change my work flow." This is frequently the point at which everything starts to
unravel.
So what I have learned, and what I offer to you now, is that the initial phase of any data initiative
must be a carefully constructed education process to ensure that the quicksand moment never
happens. This must be thought of as risk management. Remember, too, that the really important
questions (and answers) initially are not the ones you're hearing. All of the different
constituencies are going to be much more comfortable exchanging information (or
misinformation) within the tribal group than with "outsiders."
The best way to head off this risk is to carefully choose allies from each constituent tribe and use
informal conversation about their pain points and the ways that data figures in the relief of that
pain. You will be setting these people up to be the "experts" within their respective tribes. Part of
this will be coaching them in how to respond to questions and discussion in which they don't feel
themselves to be on firm ground. They need ways to postpone a response until they've had a
chance to confer with other experts. This is easy to do by setting up a collaborative model.
Rule one of this model is that I never answer for someone else. Everyone can understand that a
situation involves yet another perspective and that it is necessary to involve someone from that
tribe in order to get a complete answer. The most common danger here is "We don't have time
for that." Everyone must understand that this is an absolute red flag event. It signals that we still
have not achieved a universal understanding of the objective.
When a red flag event happens, it isn't the same as finding ourselves up to our necks in
quicksand. It just means that we need to engage in some risk mitigation. It's a sign saying
"Quicksand ahead." We will need to bring this person into the fold—usually through informal
and non-threatening discussion with at least one peer or trusted expert.
Your role, should you choose to accept it, is to be a non-judgmental, constant, committed, and
helpful presence that can be relied upon to be a neutral mediator and facilitator who feels like a
friend in time of need. Your motives must be above reproach. You cannot count on and should not
hope for recognition. All around you will be better off for your presence.
If you are senior in the organization to this person, make sure you show appreciation for their
contribution, but do it privately; non-public affirmation matters because putting them in a
spotlight may negatively impact their ability to continue to function in the same way.
"What is data anyway?" is a question that requires many answers initially and one answer
eventually. Remember, though, that many people are really only interested in what they have to
do differently. "Data" may have no meaning whatsoever in their day-to-day responsibilities even
though they may be monitoring real-time run charts with instructions to take a specified action
when the line goes above or below a certain point. You can't possibly know where to start or
where to stop in defining data for them. That's why you need the tribal expert.
Don't seek "important." "Helpful" will take you much farther more quickly.

Business, Information and Technology


Are you in management? Do you have annual/quarterly goals? Will you be held accountable for
achieving those goals? Is the accountability expressed in bonus dollars? Is there any possibility
of a zero bonus?
Are you still with me?
How will the achievement of your goal(s) be measured? Please note here that "how" has two
dimensions: one is related to process and the other to a unit of measure. In all of my vast
personal experience, all of the attention has been focused on the unit of measure part (when there
has been any attention at all), and the process part has never even been part of the conversation.
Please understand that what follows is not intended to sling mud at any individual or
organization. My purpose is to clear the air so that we can talk about how we're really going to
achieve our goal(s).
I am going to generalize based on extensive, though anecdotal, experience. In other words, I
have not conducted a survey, scientific or otherwise, and cannot produce data to back up
anything I'm about to say here, so I'm leaving it up to you, the reader to determine whether it
feels like truth or not. Should you feel that this does not ring true, or should you wish to fault me
for not being more objective, I would ask that you produce a study or at least a body of
experience in support of your position.
Awards of bonus dollars tied to achievement are based solely on whether the holder of the
dollars wishes to give them away or not. There is rarely, if ever, any protocol for defining
metrics, units of measure or measurement processes. You will go into an "annual
assessment" meeting with your boss and he or she may discuss your level of achievement in very
general terms before announcing the amount of your bonus or a recommendation for an
increased level of compensation.
Why is this important, you ask? Well, it is important because it's the way things are done.
Despite vigorous protests to the contrary, the business world is set up to run on subjective
assessment supporting subjective decisions. What, you say that your decisions are "data-driven"
(objective)? I would love to hear the story behind the data that was used to arrive at your most
recent decision.
All of this is background for understanding why "data" initiatives so frequently become mired in
a swamp of politics and personality. Let's walk back from a data-driven decision.
1. You are able to make the decision because you trust the data.
2. You trust the data because you are familiar with and trust its source.
3. You trust the source because you know that it is reliable.
4. You know that it is reliable because it consistently produces information that can be
relied upon.
5. The source has been consistent because it always uses a tried and true methodology (set
of processes) to produce its product.
6. The consistency is possible because the methodology includes steps designed to validate
the source's inputs.
7. The validation decision returns us to #1.
How do you feel about standards (e.g., standard operating procedure or SOP)? If you don't
currently support the creation, implementation, and auditable use of standards or, at any time in the
past have not done so, you have no right to and almost certainly do not have access to reliable
information and therefore no claim to data-driven decisions. Just to drive the point home, when
your boss decides that you won't be getting that bonus or increase you were counting on, your
only acceptable response is to smile and say thank you.
By the way, if you notice that your bank account (or budget) is suddenly much bigger (or
smaller) than it was yesterday, what is your responsibility? Who are you accountable to? How
much trust can you afford? Now you have some insight into compliance.
The use of technology introduces an additional huge portion of uncertainty into the trust
equation. Take another look at the decision walk-back above and note the points where the use of
technology means adding additional paths and complexity to the validation process. This is what
your data governance people are trying to get their arms around.
To summarize:
 data-driven or intelligence-driven decisions demand trust
 trust demands reliability
 reliability demands consistency
 consistency demands compliance
 compliance demands governance
OK, you can go back to work now.
Survival, Error & Technology
I'm going to pass on some wisdom here. It's not very often that we encounter wisdom today,
especially where technology is concerned, and it's often the case that we don't recognize or
acknowledge wisdom until we're looking back over the wreckage and trying to figure out what
we should have done. I'm probably also setting myself up by labeling this as wisdom but I am so
weary of seeing the same ads with different acronyms and talking to the same people with
different names.
You will never find your way out of the current mess you're in or about to be in by searching for
and hiring someone with recent experience on a specific product. To put it another way, a
specific product, no matter how much buzz it enjoys, is never the answer.
I will be among the first to acknowledge that the use of absolute language (never, always...) and
even the use of unqualified superlatives (best, worst, fastest...) is a habit to be avoided;
nevertheless, decades of experience have proven that the absolute statements in the preceding
paragraph represent wisdom and that failure to heed this wisdom will produce cost overruns,
timeline disasters, confusion, stress, employee turnover and a host of other undesirable
outcomes.
In large part, the success of the human race has been due to our ability to recognize exceptions
without necessarily understanding the rule. My own take on this is that, with today's reliance on
technology, we may have reached the point where the process of natural selection that has honed
this skill over countless generations has now produced a liability. "Something's different," is
enough to put us on guard and may be enough to launch a complex defensive reaction to
preserve the safety of the individual or group.
First of all, while it is still good to recognize exceptions, it is now absolutely essential (that's an
absolute absolute) that we develop the ability to recognize the underlying rule. A study of human
error (Human Error, Set Phasers on Stun...) shows that leaping to conclusions about the rule is
what produces the error condition. In fact, if we can't describe the rule in terms of the logic of the
computer (if... then... else...), we can't rely on technology at all.
You might ask, as I did, how we might acquire this ability. The time-tested way is known as
[survivable] experience. There are a host of cause-and-effect analysis tools and techniques that
have the appearance of rigor and reliability and are an improvement over experience, especially
when combined with exhaustive testing, but you will find that even these are more productive
when used by people with experience in the world being analyzed.
Tools are great and another critical human enabler, but—and this can't be over-emphasized—no
tool is so advanced that it runs itself. Every tool, no matter how advanced the technology,
requires human hands and a human mind to guide it. If you were to be presented with the
greatest woodworking tool in the world or the most advanced sewing machine or fishing gear or
computer, would you immediately become a master cabinetmaker or designer or fisherman or
software developer? You might note that the only immediate change is one of expectation.
An experienced person with rudimentary tools is more likely to produce a quality result than the
inexperienced person using the "best" tools. The fact that I used a tool just yesterday says
nothing whatsoever about the level of my experience in producing the required outcome. I have
made the mistake of looking for help and focusing too narrowly on what amounts to recent
experience with the tools in my shop. I have learned (through experience) that I will enjoy better
results if I'm learning while interviewing my prospective employee. If I'm talking with someone
whose knowledge stops at the tool's user interface, then I had better be prepared to devote myself
to directing the employee's work. If I have a staff composed of such employees, then I need to
possess all of the requisite experience myself or else be prepared to conduct a project whose
principal product is more experienced workers.
The challenge is to find the right mix of experienced people in supervisory or team lead roles and
people who possess dexterity but are in need of experience. If I'm in a director or management
role, I have to have experience producing a product with that scope. A technology "system" has a
complexity that is beyond human comprehension. The only way to design and build it is through
a process of identifying smaller and simpler pieces, building those and then assembling them into
the final product. You need to look for people who have an appreciation for the amount of effort
this takes and the discipline—both personal and organizational—that it requires.
Stop looking for Oracle or CRM or Rational or even "use case" or "data model" experience
except as clues about the approach that the candidate might be expected to take. I understand that
these things are ideal as targets for a logic rule processor, but the rule ("find resumes that include
these terms") is so simple-minded as to be useless. If your only goal is to turn 1000 resumes into
100, then proceed, but if your goal is to find someone who can get you out of the predicament
you're in, then you should spend more time on your rules so that the exceptions are more
productive.

Principles of Data Governance


Malcolm Chisholm, in a recent column in Information Management (A Principles-Based
Approach to Data Governance) raises an excellent point. In 2006, when I attended my first Data
Governance Conference, there was much discussion around a definition of DG. Implicit in this
discussion was the need for something that was concise, yet comprehensive, and on top of that,
engaging. The idea was that this definition could be used:
 As part of a sales pitch (like a slogan)
 To create synergy within the emerging discipline
 To provide focus for any ongoing methodology efforts
Some present may have had additional motivations, but I think these were the ones on most
people’s minds.
The definition that emerged was acknowledged to be a work in progress. By the 2008
Conference, one of the tutorials quoted three definitions:
1. Data Governance refers to the organizational bodies, rules, decision rights and
accountabilities of people and information systems as they perform information-related
processes.
2. Data Governance is the practice of organizing and implementing policies, procedures and
standards for the effective use of an organization’s structured/ unstructured information
assets.
3. Data Governance: The execution and enforcement of authority over the management of
data assets and the performance of data functions.
These were troublesome to me then, probably for the very reason that Malcolm mentions. All
seem to acknowledge a context based on an organization’s information assets, but their focus
seems to be quite different. The feeling I have is that they are advocating a judicial, legislative
and executive approach to governance.
In the U.S., a Constitution lays out these three perspectives and establishes the mechanics
(framework, architecture) within which governance will be administered. Within the Constitution
and before any of the mechanical parts are discussed, in fact, within the preamble, first principles
are asserted. The writers tell us that what follows will be a system of governance for the purpose
of
 Forming a more perfect union
 Establishing justice
 Insuring domestic tranquility
 Providing for the common defense
 Promoting the general welfare
 Securing the blessings of liberty to ourselves and our posterity
While this model would probably work on its own in establishing [data] governance, there are
just a couple of nuances that will have to be accommodated because our system will not be
working in a representative democracy but in a corporation.
Within the context of our system, leaders are appointed and serve at the pleasure of stockholders
rather than the public. The principle of one-man-one-vote does not apply. One person may
control sufficient votes to dictate to the Board of Directors. Within the day to day operations, the
ability to dictate policy, direct activities and appoint deputies is granted at multiple levels, though
always subject to the pleasure of the higher levels.
Having now established a context, it’s time to agree on some first principles for data governance.
The candidates are:
 The entire corporation must agree to be subject to the system.
 While those placed higher may still, at their pleasure, appoint and dismiss deputies, they
must agree that [data] governance operations will be a factor in those actions.
 It must be understood that within the corporation, domestic tranquility, the common
defense and the general welfare are all dependent upon the information assets owned and
managed by the corporation.
 When the system is followed, all processes will flow smoothly, problems are addressed at
the process level, personal antipathies are secondary to process execution and process
anomalies such as unplanned rework and delays are greatly reduced in number or even
eliminated completely.
 Consistency is everyone’s goal.
 In a work context, surprises are almost always seen as negatives. Our goal must be to
improve the consistency of our processes and their outputs such that surprises become
exceedingly rare (six sigma has been suggested as a goal) and predictability becomes
commonplace.
These principles should be the touchstone(s) of our efforts. Everything we do should be
evaluated on the degree to which these principles are addressed.
I will suggest that these may also be the principles of the corporate Quality Assurance effort and
remind everyone that they are also the basis of Deming’s 14 points as well as other quality
improvement methods. No improvement is possible without first establishing a stable
(consistent) process.
I leave you with one final principle: Data governance will not be implemented as a stand-alone
initiative. If we cannot see data governance as part of a larger, comprehensive system of
governance, we will not be able to address any of the three principles suggested above.

Christmas Wish List


The 13+ years I have spent in Healthcare have sensitized me to some things. I might have
preferred to remain ignorant of many of them. In the spirit of Christmas, which is handy at this
time of year, I'd like to nominate a list of gifts that would bless all residents and citizens of the
U.S.A., regardless of theology or philosophy. Each item in the list relates to health care.
1. I wish that the role of technology could be clearly understood. There are a vast number of
well-funded voices who want us to think that technology is health care or that health care
is technology. In reality, technology is best thought of as a tool—an inert and often
expensive piece of equipment, which, in the right hands, can produce wonderful results.
2. "The Patient" or "our patients" is not the same as "my patient" or "Josie Jones, patient". It
may not be possible to apply technology designed for delivering care to a generic patient
to Ms Jones. That doesn't mean that the technology is bad. It only means that the
technology must allow for deviation in procedure. I wish that the role of abstraction in
system design could be clearly understood.
3. I wish that all of the factions in the healthcare struggles were clear about their goals—
with themselves and with each other. Only by being self-aware, open and open-minded
can the parties negotiate a solution advantageous to all. Doctors, nurses, administrators,
vendors, government and patients are currently at odds. The friction is not only between
factions but within factions. Who will speak for physicians? The A.M.A.? Mayo Clinic?
Who? Who speaks for government, for vendors, for patients, for nurses, for
administrators? Each of these groups functions like a mob—surging to and fro as a strong
voice emerges and then is drowned out. Each group must organize itself before
"healthcare" can be organized.
4. Though I recognize that individuals and groups may be driven by ego to appear more
knowledgeable than the next, I most devoutly wish that each of us might recognize that
the person across the table might actually have some knowledge that we don't. I wish that
we would listen first and assert only when necessary. I wish that we could see ourselves
as occupying the same driverless bus.
There are many more things I might wish for this year but I don't want to seem greedy. May you
each be showered with blessing upon blessing as one of God's beloved and may we fully
appreciate each blessing as it comes.
The Theory of Everything
The US economy, so far as the majority of citizens is concerned, is in the toilet and swirling
rapidly in a clockwise direction. Healthcare, long in its own toilet, has at least stopped swirling
momentarily. All of us have a stake in what happens in those toilets.
I have a stake in another toilet as well. The portion of the economy devoted to technology has
been caught in a vortex since the dot com implosion. I realize it will do no good to link all of
these since linkage simply makes the resulting mess seem even more impervious to any
corrective action.
However, (deep breath) if we don't consider the nature of the connection between them, we have
very little chance of making sustainable progress in any of them. So, as my contribution to
posterity, I nominate the ascent of appearance to the pinnacle of importance in decision making
as the most likely root cause of all three problems.
Since 1950, appearances have displaced substance as the motivation for decisions in the US at a
dizzying rate. In the past two years I have seen the nation’s
physicians, as represented by a blue ribbon panel from the Mayo Clinic, state that the answer to
the nation's healthcare woes is better access to insurance. The calling that was Medicine has
emerged as a new entitlement program for the elite.
In the economy, fiduciary responsibility has been replaced by revenue numbers as the force that
justifies all decisions.
In the Technology world, the means have come to justify the ends. Any decision can be justified
if it allows me to position myself as a front-runner, new, hip, cool. "There's an app for that"
allows us to spend unjustifiable amounts of money just to have that app in our pocket. Similarly,
corporations spend unconscionable sums on technology projects for which the need is poorly
understood. Because the "solution" has to be new to give the proper appearance of tech
supremacy, the outcome is always in doubt. Risk isn't so much managed as PR-ed. Spin control
is the name of the game.
In 30 years of working with technology I have learned one lesson that transcends all others:
Either control your technology or it will control you.
In assessing the meaning of this for your own situation it is well to remember that
 Technology demands consistency
 Humans and human organizations are incapable of consistency
Just a few recommendations:
 Be clear about WHY you want to do something
 Insist that others are clear about WHY they want to do something
 Choose a path that is known to produce the desired result—or at least choose next steps
that are known to produce appropriate results.
 If you are unable or unwilling to do the above, choose another line of work if possible
and stop complaining if not.
We are all either a part of the solution or a part of the problem. In either event, complaining
about what someone else is doing or not doing will produce absolutely no change.
If you can't tell the difference between substantive value and the appearance of value you should
avoid positions in which you will be called upon to make decisions. If you can tell the difference,
then for all our sakes, make the decision and don't give it to someone else.

Data Quality: Getting Started


If you happen to be a "mover and shaker" or if you aspire to that role, then you'll be looking for
access to one or more key decision makers or influencers closest to the top of your organization.
You'll be determined to convince those people that
1. investing in data quality is a sound business decision
2. you are the right person to produce the ROI that they'll be looking for
If, however, you simply want to make things better as soon as possible and create new friends
and allies while doing so, then you may want to take a different approach.
My recommendation is to use the tactics of the Special Forces. The "Green Berets" were formed
into small teams comprising skills critical to the people they were trying to help. They then went
out to those people and lived with them. Doing this allowed them to gain credibility and to learn
what kinds of changes might (or might not) be acceptable.
The Green Berets helped the people with their work and, while doing so, offered
improvements—small changes that produced higher productivity or more consistent results. The
goal was to create allies.
"Data Quality" represents exactly the same kind of productivity and/or consistency improvement
for our "indigenous" people in whatever part of the company they may serve. A DQ Team may
be as small as one member and can produce results that are shocking in their scope and value as
well as in their lack of cost. It isn't necessary to spend long periods of time "living" with the
people. In fact, one lunch or a couple of coffee breaks will do IF you
1. ask the right questions
2. listen carefully to the responses
3. offer support
4. follow through
You'll ask about what happens when they get incomplete or incorrect forms (data is usually
thought of collectively as a form) from their internal "customer". You'll ask about the extra work
they have to do in such situations. Be prepared for an emotional response; this is what causes
them to miss deadlines, work overtime, add staff... Also be prepared to hear that they simply pass
the problems on because they aren't staffed to deal with them and don't feel accountable for
fixing problems they didn't create.
Listening will uncover the sources of the most frequent or egregious DQ errors. Now you can
mention that you are about to begin a project with those dirty so-and-so's, that it's likely they
don't even know the hardships they're creating, and that you'll be happy to mediate a discussion
amongst the parties to try to find a resolution. You may already have some ideas.
Create the meeting, making sure that ALL parties are represented (you were listening carefully,
right?) and facilitate the discussion, if necessary gently guiding the discussion but never offering
solutions. When the solution is "discovered", the people will own it and will implement it with
minimal assistance from you. If your assistance is required, make certain that you deliver and do
not hold them up.
Follow up by monitoring, coaching, facilitating and then ask if they'd like some help in
publicizing their success. Because you know important people, they'll almost certainly jump at
this opportunity. You give them all the credit and they'll make sure to let everyone know that
your help was both timely and critical.
This approach works and can even result in regular meetings to follow the improvement and to
look for new opportunities.
Two approaches—you choose which one has the highest probability of success for the greatest
number of people in the shortest time at the lowest cost.

What Do You Do When Things Aren't Working The Way You'd Hoped?
Let's pick a context first because a) this problem is pervasive in the world I live in (how about
you?) and b) the context will determine our course of action. I'll use my own life as an example.
I have spent my life seeking to understand my environment so that I could have a chance of
staying out of hot water by being able to predict outcomes. I actually got pretty good at the
predicting part but was never able to translate that into the staying out of hot water part. It turns
out that when you see a result coming that is unwelcome to everyone, hot water is the least of
your worries.
Of course I could have kept quiet and just let things happen but the problem with that is that
almost invariably a minor course change would have prevented the outcome. It always seemed
reasonable to attempt that minor change. Just as invariably there were political implications
involved in any changes to the published plan. Bottom line: my career is littered with "you were
right's" that came three years after I moved on.
So, if you would learn anything from my example, maybe it would be that "being right" carries
no value. Maybe it's that you should just keep your head down and wait for the seniority
promotion or for retirement. Maybe the lesson is that you do what you can and the rest belongs
to someone else.
I will say that over time I have achieved objectives that others considered "impossible" because I
was willing to take risks. The problem there, of course, is that if the objective was considered
unachievable then no one is prepared when they find themselves standing inside the walls.
I think that this is also the story of data management (to include what has come to be known as
data governance). Organizations have been talking about data management for nearly thirty years
now and there are hundreds if not thousands of experts who will tell you exactly what you should
do to enjoy the benefits of good data management practice. What none of them will tell you because
a) you don't want to hear it and b) you wouldn't hire them is that there is no proven
methodology—no set of practices and tools, skills and technology—that will guarantee results.
Why should this be? You would think that in 30 years someone would have stumbled across
something that will deliver predictable results. The answer lies in the subject matter. "Data" is a
concept understood by everyone. Everyone in the boardroom has their favorite data. The issue at
the root of all problems is that "everyone" is seeing data "as through a glass, darkly."
The inability to communicate about data and reach a consensus is what is keeping us from our
objective. To this add the cult of personality that defines the management—let's call it
governance—of the corporation. The decision makers understand nothing of the underwater
portion of the data iceberg, seeing only the table, graph or dashboard that's in front of them.
What must be managed is the abstraction that is data and not the values that are only the visible
portion. When we try to do anything with the abstract, we find that there are side effects on the
visible portion that cause VIP personalities to convulsively respond in exactly the least useful
way.
You can get useful results if your objective is modest. For example, it is possible to get two
business functions that are exchanging data, or three or more that have a data-based symbiotic
relationship, to take consensus action to stop what is often a great deal of daily pain. The
intractability is encountered when we attempt to broaden the scope to cross departmental or
divisional boundaries. The goals and methods of data management are counterintuitive to those
raised in the power politics of corporate "success."
We usually find ourselves managing data as a commodity, "how much", "how many", "what is
the cost", "who produces", "who consumes", "spoilage rate", "how fast"... While these all have
an attraction in that the answers can be easily captured in one of those tables, graphs, dashboards,
none deal with the underlying problem of managing the abstraction. Data is the most complex
thing that a corporation attempts to manage. It is more complex even than money.
The pity is that we treat data as if putting it into a "piggy bank" solves all our problems. You
heard it here first:
 Technology is no answer—technology can help us sort different kinds of values into
different piggy banks, no more.
 Technical skills (modeling, DBA, quality...) are no answer. The cashier makes use of
such skills to keep his/her drawer in order and reconciled.
 People skills by themselves can't achieve any result except perhaps building meaningless
consensus.
That's enough clues. If you call, don't bother to tell me what DBMS or CRM system or BI tools
you're using. None of those things are of interest until the final stages of a solution. I don't expect
any calls because too much credibility is wrapped up in the current initiative—whatever it is.
When it fails to produce results, a new personality will step in and you'll start the cycle anew.
Someone, someday may actually be willing to take a risk to stop the pain. I'll be retired or
deceased by that time but maybe you'll have learned from this what you should be searching for.

Principle Before Practice


Before I begin, a caution. Do NOT think that I am taking a negative position re: data governance.
On the contrary, I firmly believe that the concept is essential. But what is the concept?
As I watch data governance related position descriptions parade by on DICE.com,
MONSTER.com, etc., I am struck by the focus on the practice of data governance. They are all
about tools and skills and methods and all either assume that everyone knows what the goals are
or (worse yet) that goals for this company are the same as those for every other company.
I have learned at least one thing in my 27-year adventure in all things data-related and that is that
doing something, no matter how efficiently or effectively, is very often a wasted effort without
some forethought about the principles that we are attempting to implement.
Principles provide the glue that links all of our efforts together and the medium that allows us to
be productive even when our last effort was a failure.
As with so many initiatives around data, this wave (governance) has crested and achieved the
status of "best practice" without ever achieving measurable ROI. We have seen smaller-scale
successes—enough to keep hopes high—but not, to date, the enterprise wide success that was
used to sell the initiative in the first place.
But maybe I'm getting ahead of myself. What is the principle that data governance represents?
Some candidates are:
 Make life easier for DBA's by reducing the rate of database schema changes
 Make life easier for developers (programmers) by making requirements less ambiguous
 Reduce cost due to rework by making the rules for data quality and completeness more
accessible at the point of capture
 Make the data warehouse more useful by applying the same rules to data everywhere it is
captured.
 Reduce the number of sales lost because we can't keep track of our customers
 Establish process consistency so that we can begin improvement efforts
 Stop those SOBs from delivering junk and expecting me to fix it
 Define and establish (standard) processes to reduce variability in output quality
 your favorite here
Which one(s) do you like? I believe the answer is all of the above and then some. What is the
principle that unifies all of these goals? I discussed this in a broader context in a prior post. My
point though is that if we don't have a firm and commonly held idea of the principle(s) we are
attempting to implement, then no matter what we do, which tools we use, or how skilled the
workers are, we aren't going to accomplish anything meaningful. To put it another way, "If you
don't know where you're going, any route will do."

Data Governance Is...


Recently someone on a LinkedIn discussion forum asked for a definition of data governance
because he had yet to find one that was universally accepted. Now he has about 30 definitions
and all of them are "personal" in the sense that they apparently work for the person who
responded and/or that person's company.
In fact the lessons of the past 10 years or so produce the conviction that data governance is
anything, everything and nothing.
It is anything we need it to be that serves our immediate purpose.
It is everything in that it spans all corporate functions in order to produce the needed results.
It is nothing because we always have to define the term whenever it is used and no two
definitions are the same.
My own input to the discussion was that
Data governance is that part of corporate governance that is concerned with ensuring the
integrity of the corporate information resource.
The only useful application for this definition is in establishing a context for any initiatives and
establishing responsibility or accountability. This definition cannot be used as a strategy or a
vision to drive results. It doesn't suggest any metrics. It doesn't help us to isolate key processes
nor does it suggest any standards.
Other definitions you may have seen involve "decision rights". The problem with "decision
rights" is that there are always individuals who will pop up once one of these rightful decisions is
made and insist that the decision in this particular case was rightfully theirs. Quite often this
individual will make a good case and the "rightful" decision will be overturned—often at
considerable cost. When this happens, it calls into question the makeup of the existing decision-
making bodies and can cast a long shadow over the entire concept.
My definition contains a problem in that it invokes another poorly defined concept—that of
corporate governance. As I have discussed in previous posts, corporate governance in any form
that would support the needs of those desperate for data governance is as rare as a polar bear in
west Texas. It is the most challenging, most demanding and most thankless job imaginable to
create a system of governance in the midst of a feudal culture.
This is not to say that results—even valuable and far-reaching results—can't be obtained. Such
results are possible for those who are dedicated, courageous, knowledgeable and visionary. If
one can keep the vision of data governance as part of a corporate culture and pursue integrity for
all information but do it one relationship, one entity—even one attribute—at a time, then real
progress can be made.
I have often wondered why we think we need a definition for data governance when it is so
obviously subjective. Of course data integrity and data quality and even data itself are equally
subjective. None can be discussed without first offering a definition ("What do you mean by
that?") and we don't have definitions that we can quote that are meaningful to "this" audience. In
fact, we are given to definitions that are nearly meaningless even to our colleagues and serve
only to get us all in the same ballpark.
It is possible to take any result and call it data governance (who could argue?). It is possible to
take any corporate initiative and use it to promote data governance. Why don't we simply get
busy and spend our time discussing results instead of definitions? Show me the data that
demonstrates you have brought processes into control for some subset of your company. Let's
talk about which processes are most critical and which represent the greatest opportunity. Let's
get moving.
Leadership, Management, Governance
It is way too easy to become confused where these three functions are concerned. We like to
think of ourselves as leaders when what actually consumes our time and attention is hitting
deadlines and deliverables. Leaders emerge when change is in the air.
When the marketplace is shifting; when the economy is deteriorating and taking our profitability
down with it; when we're no longer able to keep up—that's when leadership is required and
leaders step forward. Anyone can steer a straight course through calm seas with good charts. It
takes a master to sense the environmental factors, inspire confidence among the crew and make
the continuous changes required to keep the ship from breaking up or running aground until
things settle down and we get back into familiar waters. We risk disaster when leadership isn't
acknowledged and permitted to assume control. When the crisis is over, we often find that the
leader has little or no interest in the day-to-day operations of work rosters, schedules,
performance reviews... The leader may make a very poor manager.
The manager is the master of routine. She is the one who keeps the machinery humming and the
product going out the door. He makes sure that time boxes are hit, that deliverables are delivered
and that budgets are created and followed. When the winds of change begin to blow, the manager
who fails to recognize the need for leadership or believes in error that he can handle the
leadership role can cause massive and sometimes irrecoverable damage before she agrees to
[temporarily] relinquish the helm. Exceptional management defines the team, builds the team
and keeps the team vision alive. This management is essential.
Governance is ubiquitous and invisible. Governance is observing, analyzing, formalizing,
monitoring, measuring, improving. Governance establishes the standards to which managers
hold themselves, each other and their teams accountable. It should be readily apparent that
governance is every bit as necessary within a high-performance organization as is management.
An exceptional governance function is a combination of historian, engineer and seer. A liberal
dose of management is required to ensure that governance doesn't degenerate into approval by the
de facto expert.
The six sigma governance function will have incorporated leadership into the system of
standards. A set of standards and standard processes may be of little use to the leader in the midst
of the storm but may have helped to prepare that leader to be able to step forward.
Leadership for change, management for effectiveness, governance for stability is the tag line
of Michael Meier Data Management. Data Management will be a microcosm of the enterprise as
a whole. Every point of view (perspective) within the enterprise will be represented in Data
Management. Virtually every process within the enterprise will be examined by Data
Management in an attempt to "get it right." The risk associated with unreliable information (data)
can only be assessed in light of the process(es)—and the personalities— involved. Data
Management is not governance but must include governance as an essential component. As its
name implies, management is its bread and butter. Because it is frequently considered the
homely stepchild, however, the availability of leadership may well be the key to its success.
Whose Job Is It, Anyway?
There are several reasons why you are reading this blog post. Leaving out all the self-
aggrandizing ones, let's focus on those who actually have the title question in their minds. You
may have been drawn here by an interest in BI (business intelligence), the latest name for
"reporting." It is likely that you have had some bad experiences involving reports or dashboards,
or mash-ups or some other information display/access effort.
You spent what seemed like way too much time getting to an understanding of
 how this was to be used
 the kind(s) of content that would be useful/acceptable
 how the information should be arranged/displayed
If you are part of a really accomplished organization, you may also have had seemingly endless
discussions concerning
 How "bad" data would be recognized
 How "bad" data would be handled
 Remediation or cleansing processes to reduce the incidence of "bad" data
And, finally, if your organization is in the six-sigma population
 What is "bad" or poor quality data?
 Where does it come from?
 What does it cost?
 Where should we devote our efforts?
 What kinds of efforts offer the greatest ROI?
If you follow the various discussion forums concerning data quality, you will find one question
popping up with regularity: "Who is responsible for Data Quality?" I asked myself why the
question is asked. What prompts the question? It seems not to matter whether the organization
has a reputation for quality, nor whether it has a history with data quality, nor whether the
questioner is experienced or inexperienced, executive, manager, or front-line production. Why?
Having talked with some of these folks and researched the situations of several others and then
simply meditated on this over an extended period of time, I have come up with a few likely
scenarios.
 The questioner knows the answer but either wants validation or a sufficient number of the
"right" answer from people who are likely to be respected.
 The questioner has encountered roadblocks from unexpected directions and is dealing
with surprise and disappointment.
 The questioner is curious about what others are doing.
What I have NOT seen is any evidence that the questioner is sincerely trying to determine how
best to attack the problem of poor data quality. It would be easy to assume from these same
discussion forums and conversations that nearly everyone knows what they're doing and has
either solved the problem or is well on their way to a solution. When we learn enough about
human nature we understand that these people are all feeding their egos (or rather that their egos
are feeding themselves since much of this is unconscious) and that they are taking an incidence
of limited or small-scale success and projecting it into an eventual enterprise level solution. I
have done this myself.
I know this—that data quality, or any other variety of quality for that matter, will not bend to
advanced degrees, nor to mastery of data and information design concepts, nor to any product or
set of products, nor to any effort by marketers, nor even to the best-designed procedures,
methods, governance structures, architectures...
The ONLY way to data quality lies through the hearts and minds of each and every person in
your organization. Each of them has the power to subvert any plan, procedure or method; to
render ineffective any tool or product; and to humble the greatest of egos.
The bottom line is that it must be everyone's job but of course that isn't a satisfying answer
because the very short list of things that everyone believes are important includes things like
breathing, eating, elimination of wastes, procreation, maybe community, relationship,
acknowledgement... This list of universally agreed-upon important things will never
spontaneously include data quality. In fact, if a poll were conducted in the boardroom, it is
unlikely that data quality would appear on the list of things important to the company.
Don't misunderstand—the quality, security and accessibility of your data assets is at least as
important to the continued health and well-being of your company as that of your capital assets.
The problem lies in the fact that data isn't real and tangible. If bad data smelled bad or rusted or
developed crumbling holes, or if it resigned and went elsewhere where it was more appreciated
or was subjected to audits by outside entities, or showed up on a P&L or balance sheet where it
was reviewed by prospective data contributors—THEN it might get some attention.
It is true that anyone can produce an example of data but virtually no one—even those
responsible for collecting and storing its instances—will understand that data is something other
than what they are holding or pointing to or storing. But I digress.
If we can't accept the answer that data quality is everyone's job, then we need to move on to identify the person or corporate function that will be accountable for the quality of the company's data. It's not possible here to put a name to this accountable party. What we can do, though, is itemize some of the skills and abilities required, to help point the way to the name that is unique to your organization.
First and most important, let's agree that what we are talking about here is cultural change, and cultural change, more than any other kind of change, requires leadership. Already we see that a corporate function can never be accountable, although you can tape a job title on the person's door once you identify him/her. This leader will be able to move freely across the company and will be able to give everyone the feeling that they have been heard. This DQL (data quality leader) will be conversant with the principles and practices of quality improvement. The DQL must be completely comfortable with the nature of data and will not be confused by the display of samples. Attributes of the data asset as a whole will be the focus of all of the DQL's efforts. S/he may well choose to shine the spotlight on a segment of the population, and may delegate the leadership of that effort to someone more familiar with the segment. That surrogate DQL will also deal only with population attributes (metrics).
The DQL will never fall into the trap of mistaking examples for the population they come from. An example may be representative or it may be an anomaly. Only the population metrics allow us to tell the difference.
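To make "population attributes" concrete, here is a minimal sketch in Python of the kind of numbers the DQL would work from, assuming a hypothetical set of customer records with an email field; the field names, the sample rows, and the email pattern are illustrative only, not a prescription. It computes three common population measures: completeness, validity, and uniqueness.

import re
from collections import Counter

# Hypothetical sample population; in practice these rows would be drawn
# from your own systems, and the rules from your own data definitions.
records = [
    {"customer_id": 1, "email": "ann@example.com"},
    {"customer_id": 2, "email": ""},                 # missing value
    {"customer_id": 3, "email": "bob@example"},      # malformed value
    {"customer_id": 4, "email": "ann@example.com"},  # duplicate value
]

# A deliberately crude format rule, standing in for whatever validity
# rule your data definition actually specifies.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

emails = [r["email"] for r in records]
nonempty = [e for e in emails if e.strip()]
counts = Counter(nonempty)

# Completeness: fraction of the population carrying any value at all.
completeness = len(nonempty) / len(records)

# Validity: fraction of the values present that satisfy the format rule.
validity = sum(1 for e in nonempty if EMAIL_PATTERN.match(e)) / len(nonempty)

# Uniqueness: fraction of the values present that occur exactly once.
uniqueness = sum(1 for e in nonempty if counts[e] == 1) / len(nonempty)

print(f"completeness={completeness:.0%}  validity={validity:.0%}  uniqueness={uniqueness:.0%}")

No single record in that list can tell you whether it is typical or anomalous; the percentages are attributes of the collection as a whole, and that collection-level view is precisely the vantage point the DQL (and any surrogate DQL) must hold.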
Further attempts at guiding your choice may be counter-productive if you begin to feel
manipulated or otherwise used.
A final caution concerns those characteristics that will render a person unsuitable. Just as you
might wonder about a carpenter who feels compelled to talk interminably about his hammer or
his saw, the person who leads with the name of a tool, tool vendor, methodology, author, book,
etc., is unlikely to be the one you're looking for. The last thing that the leader/agent of cultural
change needs is to divert any attention away from the primary focus. Products and tools may be
useful for producing the population metrics discussed above, but beyond that should be well in
the background and completely invisible to the majority of those you are attempting to influence.
Those who confuse files or [mega/tera]bytes with data are likewise unsuitable, as are those who confuse a spreadsheet, chart, or report with data.
I hope I haven't made it seem like an impossibility. Talk with people about this and over time
you'll begin to get a feel for what to look for and what to avoid.
Bibliography
Agile Alliance. (2013). Agile Alliance::Home. Retrieved May 31, 2013, from Agile Alliance: http://www.agilealliance.org/
Bernstein, A. J., & Rosen, S. C. (1989). Dinosaur Brains: Dealing with All Those Impossible People at Work. New York: Ballantine Books.
Casey, S. (1998). Set Phasers on Stun: And Other True Tales of Design, Technology, and Human Error. Santa Barbara: Aegean Publishing Company.
Chen, P. P. (1976). The Entity-Relationship Model—Toward a Unified View of Data. ACM Transactions on Database Systems, 1(1), 9–36.
CITEC. (2012, November 14). Deming's Red Bead Experiment. Retrieved June 10, 2013, from YouTube: http://www.youtube.com/watch?v=R3ewHrpqclA
Computer-aided software engineering. (2013, May 25). Retrieved June 14, 2013, from Wikipedia: http://en.wikipedia.org/wiki/Computer-aided_software_engineering
Crosby, P. (1979). Quality Is Free: The Art of Making Quality Certain. New York: McGraw-Hill.
Crosby, P. (1984). Quality Without Tears: The Art of Hassle-Free Management. New York: McGraw-Hill.
Database normalization. (2013, July 15). Retrieved July 18, 2013, from Wikipedia: http://en.wikipedia.org/wiki/Database_normalization
Deming, W. E. (1982). Out of the Crisis. Boston: MIT Press.
Dijkstra, E. (1982). Selected Writings on Computing: A Personal Perspective (Texts and Monographs in Computer Science). Springer-Verlag.
English, L. P. (1999). Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. Hoboken: John Wiley & Sons.
Gilovich, T. (1993). How We Know What Isn't So: The Fallibility of Human Reason in Everyday Life. New York: The Free Press.
Helbig, H. B., Steinwender, J., Graf, M., & Kiefer, M. (2010). Action observation can prime visual object recognition. Experimental Brain Research, 200, 251–258.
How Long To Form A Habit? (2009, September 21). Retrieved May 10, 2013, from PsyBlog: http://www.spring.org.uk/2009/09/how-long-to-form-a-habit.php
Maslow, A. (1943). A Theory of Human Motivation. Psychological Review, 50(4), 370–396.
McGilvray, D. (2008). Executing Data Quality Projects: Ten Steps to Quality Data and Trusted Information. Burlington, MA: Morgan Kaufmann.
Nemoto, M. (1987). Total Quality Control for Management: Strategies and Techniques from Toyota and Toyoda Gosei. Englewood Cliffs, NJ: Prentice Hall.
NIST. (n.d.). Baldrige Performance Excellence Program. Retrieved May 8, 2013, from Malcolm Baldrige National Quality Award Program: http://www.nist.gov/baldrige/enter/apply.cfm
Peters, T. J., & Waterman, R. H. (1987). In Search of Excellence: Lessons from America's Best-Run Companies. New York: HarperCollins.
Rand Corporation. (2012, July 27). Suboptimization in Operations Problems. Retrieved May 31, 2013, from Rand: http://www.rand.org/pubs/papers/P326.html
Rath, T. (2007). StrengthsFinder 2.0. New York: Gallup Press.
Reason, J. (1990). Human Error. Cambridge, UK: Cambridge University Press.
Redman, T. C. (1992). Data Quality: Management and Technology. New York: Bantam Books.
Silverston, L., & Agnew, P. (2008). The Data Model Resource Book, Vol. 3: Universal Patterns for Data Modeling. Indianapolis: Wiley.
Simsion, G. (2005, March 1). There's a Lot of New Stuff to Say About Data Modeling. Retrieved July 30, 2013, from Information Management: http://www.information-management.com/infodirect/20050311/1022729-1.html?zkPrintable=1&nopagination=1
Simsion, G. (2006). Data Modeling: Description or Design? (Doctoral thesis). Melbourne: University of Melbourne, Department of Information Systems.
Tzu, S. (n.d.). The Art of War.
Wand, Y., & Wang, R. Y. (1996). Anchoring Data Quality Dimensions in Ontological Foundations. Communications of the ACM, 39(11), 86–95.
i Maximizer
People who are especially talented in the Maximizer theme focus on strengths as a way to stimulate personal and group excellence. They seek to transform something strong into something superb.

Connectedness
People who are especially talented in the Connectedness theme have faith in the links between all things. They believe there are few coincidences and that almost every event has a reason.

Ideation
People who are especially talented in the Ideation theme are fascinated by ideas. They are able to find connections between seemingly disparate phenomena.

Strategic
People who are especially talented in the Strategic theme create alternative ways to proceed. Faced with any given scenario, they can quickly spot the relevant patterns and issues.

Learner
People who are especially talented in the Learner theme have a great desire to learn and want to continuously improve. In particular, the process of learning, rather than the outcome, excites them.
(Rath, 2007)