SAP (the company) has started an initiative called ‘Lighthouse’ to actively market pilot projects for NW IdM 7. Customers who are interested will receive ‘special’ ramp-up benefits and are asked to share lessons learned and pain points during the implementation of the product. Might be a good opportunity for some to bash on consultants like us … No, seriously, anyone interested? Leave a comment if you are.
As promised, this is the last time I will ever write about the oxymoron unstructured data; I already feel like a Harpy in The Wood of Self-murderers, except I am torturing those who have committed intellectual suicide. Some of you reading this may find this post harsh but, as Richard Bandler observed, you only need to feel insulted if it applies to you.
Google unstructured data and it returns 700,000 hits. Wikipedia starts with this definition:
“Unstructured data (or unstructured information) refers to masses of (usually) computerized information which do either not have a data structure or one that is not easily readable by a machine.”
The hyperlink to data structure says this: “In computer science, a data structure is a way of storing data in a computer so that it can be used efficiently. It is an organization of mathematical and logical concepts of data.”
I couldn’t make this up if I wanted to. According to the Wikipedia author(s), the only things that qualify as unstructured data would be a pseudo-random number generator, books, magazines, and printed documents. I can’t imagine this is what he had in mind when defining the term. He most likely used unstructured knowledge to write it, that and a Crayola.
Those with whom I have argued over this term tend to respond in three ways. Some, when I have shown them the denotative nonsense of it, fall back on its connotative meaning: “I agree with you; it is an oxymoron, but the term is useful to distinguish between data in databases and all the other data.” This is a utilitarian argument for a term whose descriptive power is based on a shared hallucination of meaning. Apparently, once you have gathered a sufficiently large number of idiot enthusiasts, you have a powerful semantic swarm for persuading the me-too masses. Others, less bright or perhaps intellectually honest, will argue that when you put the words together they take on new meaning, like mixing colors I suppose, or the German compound noun and LSD. Now, you really can see the meaning. Finally, there are the invincibly ignorant, whom no argument conceived could convince otherwise.
What has this term contributed to the advancement of computer science, to knowledge, or to decision making? Anyone? As far as I can tell it has only contributed to marketing, to pseudo-intellectual posturing, pretense of knowledge, the selling of quackery and of course, as always, a PhD thesis.
I closed part II of this post with a series of assumptions in order to move the discussion forward, and ended by asking the rhetorical question of what should be done to protect and control information. In some cases the answer is very little. This may seem like apostasy from someone whose career has been in information security, but controls that are too stringent can be just as damaging as controls that are too lax. The strictest policies I have ever seen on email and web usage were at a local government bureaucracy, not exactly a hotbed of original thinking.
So in the management of information we have paradox, a sort of Laffer curve of information security. Paradox is a well known characteristic of complex systems and multivalued logic. Slow is fast, low risk is high risk, total knowledge is uncertainty etc. Think of the ‘wall’ between intelligence and law enforcement assets that existed prior to 9/11 here in the US. When we encounter paradox it provides an opportunity for original thinking between periods of grinding our teeth. How long did Zeno’s paradoxes stand before they were solved by Newton/Leibniz with the creation of calculus*?
Information security wouldn’t be necessary if people could be trusted. The only reason to spend money on security is to control, contain, or stop the damage done by malicious people. Information stored in database management systems is easy to control; other information, in the form of instant messaging, spreadsheets, emails, and word processing documents, is far more difficult. There are content management solutions, digital rights solutions, IM gateways and the like. How often do you find them utilized? Frequently, they are used piecemeal, a knee-jerk response to a consistent minor problem. Despite this massive army of technical solutions, you rarely find a cohesive, integrated deployment of the technologies in the corporate landscape. From this result we can draw one of the following conclusions: the risk is so low it makes no sense to spend the money; we are aware of the risk and impact but don’t believe it will happen; or we are uninformed, that is, ignorant of the risk. It is difficult to convince someone to take measures against a risk that only materializes once every 10 years. Anti-virus software wasn’t deployed in a lot of companies until email viruses took down their systems with regularity. If I give you a paper cut every day, you won’t wait too long to take action against me, but if I beat you in the parking lot with an ax handle once every ten years, you will be far more lax in your defensive posture despite the enormous difference in damage.
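To put rough numbers on the paper cut versus the ax handle, one conventional approach is to multiply the loss from a single event by how often it is expected to occur in a year. The sketch below does only that; the function name and every figure in it are hypothetical, chosen purely to illustrate the comparison.

```python
def annual_expected_loss(loss_per_event: float, events_per_year: float) -> float:
    """Expected yearly loss: cost of one event times its yearly frequency."""
    return loss_per_event * events_per_year


# Hypothetical figures, for illustration only.
# A "paper cut": small loss, happens every day.
paper_cut = annual_expected_loss(loss_per_event=50.0, events_per_year=365)

# An "ax handle": large loss, happens about once every ten years.
ax_handle = annual_expected_loss(loss_per_event=500_000.0, events_per_year=1 / 10)

print(f"daily paper cut:       {paper_cut:,.0f} per year")
print(f"beating once a decade: {ax_handle:,.0f} per year")
```

With these made-up inputs the rare event carries the larger yearly figure, even though it is the daily nuisance that tends to get the budget.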
When we have information properly classified by the information owners, it is necessary to think through who the other stakeholders are, and to speak to them before we start the lockdown process. Who may find this information useful and valuable besides the group that created it? What is their perspective? What of others who may wish to acquire access to this information ad hoc because they believe they can add something? Soliciting these other views allows us to make reasonable judgments about the sharing and use of information. Once the foregoing is done, we need to estimate the annual expected loss and its probability, and decide whether we are dealing with a regular paper cut or a body-damaging beating. Here is the point we have reached:
- Identified external and internal risks across the value chain
- Prioritized the risks
- Risk processes are aligned with goal setting
- Proper task organization and structure
- Information is classified
- Annual expected loss and probabilities estimated
Now access controls can be placed on the information. If high-value information is not already stored in database management systems, create the conceptual models and create the database. This gives integrity and tighter controls. Other forms of data may be stored and secured by application, whether Exchange offline databases, content management systems, or the like. Identity management workflows can be set up to handle access requests.
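As a minimal sketch of how classification and an access-request workflow might fit together, consider the following. The classification levels, the `AccessRequest` fields, and the `decide` function are all assumptions made for illustration; they do not describe any particular identity management product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical classification levels, ordered by sensitivity."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


@dataclass(frozen=True)
class AccessRequest:
    requester: str
    resource: str
    classification: Classification  # set by the information owner
    owner_approved: bool = False    # True once the owner has signed off


def decide(request: AccessRequest, clearance: Classification) -> str:
    """Grant within clearance; otherwise require the owner's approval."""
    if request.classification <= clearance:
        return "grant"
    if request.owner_approved:
        return "grant"
    # Above the requester's clearance and not yet approved:
    # hand the request to the information owner instead of refusing outright.
    return "route_to_owner"


request = AccessRequest("ana", "q3-forecast", Classification.CONFIDENTIAL)
print(decide(request, clearance=Classification.INTERNAL))
```

Routing to the owner rather than denying outright keeps a path open for the ad hoc requesters mentioned earlier, who may have something to add.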
Areas of beneficial research would include intelligent agents (perhaps based on the work of Stephen Thaler) for permitting temporary access using stored individual models of existing permission grants; modeling of the dynamic systems to look for equilibrium between unrestricted and locked-down information; and better methods for creating knowledge and reducing it into conceptual models that can be stored in, and queried against, truly relational database management systems.
There is quite a lot of work listed in these three parts that just isn’t being done in a disciplined way or, in many cases, done at all. The majority of this work does not require heavy investments in new technologies, but it does require business process realignment. Marco, I hope I answered your question.
* Philosophically they may still be unresolved.