Data Knowledge and Risk Part II

When we are building a set of attributes whose totality identifies a person, thing or state on which we can take action, we need to make sure that the data is both accurate and complete. Unfortunately, you rarely get either. Even the most diligent will make data entry errors or find the information unavailable at the time of entry. This problem would be manageable if there were clear rules for handling missing information; as a practitioner, however, you rarely run into such pleasant territory. A null can mean almost anything: the data was deleted, the data is missing, the data is unknown, the data does not apply, or the data was never entered. By definition a null set is an empty set, so if you stare long enough you can hallucinate whatever you want into the empty space. Given the foregoing, when we start to do joins across disparate repositories and aggregate data, it can become impossible to make logical inferences. Ambiguity in data may be tolerable for humans because they muddle their way through, but for computers in their binary universe it's a mess. The null has an adverse effect on data integrity, so key attributes are frequently constrained in SQL databases so that they cannot be null.

Any store of data is a store of information about the world. It is the system's knowledge of the world. The more uncertain that knowledge and the more inaccurate the data, the bigger the impact on productivity and decision making. That impact increases risk, and can do so significantly and cumulatively. It creates a problem for basic induction, i.e., instance confirmation. This is not a new or an original observation. How long have companies been working on master data management? As mentioned above, IAM initiatives force changes: critical data gets verified, corrected or filled in, and risk reduction policies get enforced. That has tangential benefits to the company, benefits that are easily missed.
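
To make the effect of nulls on joins and aggregates concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables (hr_people, dept_access), their columns and the sample rows are all hypothetical, invented purely for illustration. The behavior shown is SQLite's; other SQL engines follow the same three-valued logic in these cases, but check your own before relying on it.

```python
import sqlite3


def null_demo():
    """Show how NULLs drop out of joins, counts and filters (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hr_people (emp_id INTEGER, name TEXT NOT NULL, dept TEXT);
        CREATE TABLE dept_access (dept TEXT, system TEXT);
        INSERT INTO hr_people VALUES
            (1, 'Ana', 'Finance'), (2, 'Ben', NULL), (3, 'Cy', 'IT');
        INSERT INTO dept_access VALUES
            ('Finance', 'ledger'), ('IT', 'servers'), (NULL, 'legacy');
    """)

    # NULL = NULL is not true, so Ben and the 'legacy' row never match each other.
    joined = conn.execute(
        "SELECT p.name, a.system FROM hr_people p "
        "JOIN dept_access a ON p.dept = a.dept ORDER BY p.name"
    ).fetchall()

    # COUNT(*) counts rows; COUNT(dept) skips the NULL, so the two disagree.
    counts = conn.execute("SELECT COUNT(*), COUNT(dept) FROM hr_people").fetchone()

    # Ben is not in Finance as far as we know, yet the filter excludes him.
    not_finance = conn.execute(
        "SELECT name FROM hr_people WHERE dept <> 'Finance' ORDER BY name"
    ).fetchall()

    # A NOT NULL constraint on a key attribute rejects the bad row at entry time.
    try:
        conn.execute("INSERT INTO hr_people VALUES (4, NULL, 'IT')")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    conn.close()
    return joined, counts, not_finance, rejected


if __name__ == "__main__":
    joined, counts, not_finance, rejected = null_demo()
    print(joined)       # [('Ana', 'ledger'), ('Cy', 'servers')]
    print(counts)       # (3, 2)
    print(not_finance)  # [('Cy',)]
    print(rejected)     # True
```

Ben disappears from the join and from the "not Finance" filter without any error being raised. That is exactly the kind of silent gap that makes inference across repositories unreliable, and it is why constraining key attributes up front is the cheaper fix.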


