Epistemology and Ontology: The Long Story
You are presumably here because you want to be here.

In recent years, we have cleared most of the technical barriers to implementing large or complex systems. With processing power, disk space, and even network capacity no longer the limiting factors, we are starting to run into much more fundamental barriers having to do with epistemology and ontology.

Two introductory paragraphs are helpful to shape the long story, one on epistemology and the other on ontology. Although some readers may associate these terms with antique philosophers, almost without exception large companies and institutions are paying people to deal with these two topics - although today's practitioners are not necessarily aware that they have been tossed into the deep end of the pool to sink or swim.

1. The Epistemology of our IT world
 

Epistemology is the study of how we know what we think we know. "Real" epistemology is the study of the intersection of sensory experience with our minds' logical, thinking, and feeling selves.

As left to us by a succession of philosophers such as Descartes, Locke, Berkeley, and others, epistemology is somewhat like today's physics in having arrived at a state far removed from the common-sense view. That state is a symptom that "knowing" and "knowing that you know" are much harder than they look to the casual observer of these matters.

Fortunately, in the context of this discussion, we are looking at IT system-level epistemology - that is, the epistemology of interior worlds that we create and control. However, some of the uncertainties and complexities of real-world epistemology cannot help but carry over to the IT informational world, especially as we move beyond narrow worldviews and forms-oriented keyboard input.

Few people in business are paid to be philosophers, but many business exercises - e.g., meeting Sarbanes-Oxley requirements for "truth in financial reporting" - are exercises in systems-intensive epistemology. Such an exercise becomes particularly real for the corporate officers and auditors required to attest that they "know" what they need to know and that they have a sound basis for attesting to their financial reporting. There are fines and imprisonment as the downside risks of epistemological failure.

An important premise of this "long story" is that we should think of an enterprise system such as SAP R/3, PeopleSoft, or many others as being equivalent to a "camera." Such systems capture on disk external phenomena ranging from weights and measures to people's choices, much as a camera captures images. The ERP, CRM, PDM, or whatever "knows what it knows" based on the interaction of sensory input - today mostly keyboard input, but with a growing use of direct sensors - with its application-level and operating system logic.

Even if two companies use the same packaged application, how they implement it and populate it with "master" data creates epistemological differences - and like cameras, the end configurations will have differing resolution and bandwidth sensitivity.

To jump ahead, note that epistemology may differ significantly between companies even if their "ontology" is very similar (e.g., they are in the same industry doing the same physical work).

2. Ontology (or metaphysics): the study of being.

Ontology is inevitably intertwined with epistemology, because what's really "out there" impacts our knowledge, while in turn what we think we know biases our view of what's "out there." In the information technology trade, "data about data" is referred to as "metadata," with "meta" being derived from metaphysics, and metaphysics being synonymous with ontology. "Metadata" therefore defines the ontology of the computer system.

In shaping our information technology, we get to revisit - consciously or otherwise - the Medieval Scholastic ontological debates between "Realism" versus "Nominalism."

The realist regards notions such as "triangles" (a closed three-sided figure made up of three intersecting straight lines in the same plane) as "real." Realism is appealing, because once one characterizes a conceptual triangle, all triangles conform - e.g., the sum of their angles = 180 degrees.

The nominalist regards notions such as "triangle" as mere names that we apply to sets of phenomena that look similar in some respect but are not instances of any shared ideal. If you look at a closed, single-plane, three-sided figure through a microscope and still see a "triangle," the nominalist will suggest that you need a better microscope. Indeed, under sufficiently acute observation, not only is there no "triangle," there is no such thing as a straight line, or even a line.

However, the people who design and build information systems tend to be "realists," leading to collisions between the untidy external world and the internal IT representation of that world. For example, if someone designs software or a database involving details of "triangles," it is almost certainly going to require that the sum of the angles equal 180 degrees. Therefore, a lot of money, sweat, and tears have been spent trying to force-fit the variability and richness of the outside world into "realist" metadata templates.
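
To make the collision concrete, here is a minimal sketch of such a "realist" template - the class name, field names, and tolerance are invented for illustration, not taken from any particular system. The software admits only figures whose angles sum to 180 degrees; anything measured from the untidy outside world is simply refused.

    # A "realist" data template: only ideal triangles are allowed to exist.
    # The class name and tolerance are illustrative assumptions.
    class Triangle:
        def __init__(self, a, b, c):
            if abs((a + b + c) - 180.0) > 1e-9:
                raise ValueError("not a triangle: angles must sum to 180 degrees")
            self.angles = (a, b, c)

    Triangle(60, 60, 60)                # the conceptual ideal passes
    try:
        Triangle(59.7, 60.1, 60.4)      # a surveyed, real-world figure...
    except ValueError as rejected:
        print(rejected)                 # ...is refused at the door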

The end result is a gap between "technical" metadata, which tends to be narrow and purist, and "use" metadata, which reflects actual, perhaps messy, uses. For example, "use" metadata may include "rules" such as: if the customer name includes "XYZ," then address line 2 contains the contract number concatenated with the customer cost center.
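
A hedged sketch of what such a "use" rule looks like when someone finally writes it down - the field names and the "XYZ" convention are hypothetical illustrations:

    # A "use" rule of the kind that accretes around real systems but never
    # appears in the "technical" metadata. Field names are invented.
    def address_line_2(order):
        if "XYZ" in order["customer_name"]:
            # one customer's private convention: contract number + cost center
            return order["contract_number"] + "-" + order["customer_cost_center"]
        return order.get("address_line_2", "")

    print(address_line_2({"customer_name": "XYZ Industries",
                          "contract_number": "C1001",
                          "customer_cost_center": "4711"}))   # -> C1001-4711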

With respect to the ontological view of the world "out there," the nominalists won long ago, with the more recent assistance of modern physics - or at least the nominalists are so far ahead that nobody keeps score anymore.

Informational "Relativity"

Today, our systems are expanding beyond their formerly narrow informational scope, taking us into the very large world of cross-functional, cross-entity solutions. The consequences are "stretch" and "collision."

For example, a U.S.-centric payroll system of twenty years ago could count on and indeed enforce a worldview in which everyone is an employee or former employee with one and only one U.S. Social Security Number. The "camera" simply could not "see" anyone else.

However, as business changes push the payroll system to stretch functionally - e.g., to support non-payroll Human Resources functions, to become a module in a multi-country IT architecture, or to become a source of project-hours tracking for consultants and other non-employees - the "metadata" that sufficed in its original, narrow world must also stretch. Epistemological expansion creates great stresses - e.g., the happy little payroll system perhaps has to be rebuilt or replaced.
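
As a rough illustration - with field names invented rather than taken from any real payroll package - the stretch looks something like this:

    # The original worldview: everyone the "camera" can see is a U.S. employee
    # keyed by SSN.
    original_person = {
        "ssn": "123-45-6789",      # one and only one U.S. SSN, used as the key
        "pay_grade": "G7",
    }

    # The stretched worldview: the key can no longer be an SSN, because the
    # system now has to "see" contractors, consultants, and multi-country staff.
    stretched_person = {
        "person_id": "P-000017",                       # a new, invented key
        "national_ids": [("US-SSN", "123-45-6789")],   # zero, one, or many
        "country": "US",
        "engagement_type": "consultant",               # not always an employee
        "project_hours": [],                           # non-payroll uses, too
    }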

"Collision" creates even greater stresses.  For example, a given physical automobile is a "unit" to someone who merely buys and drives one, or for that matter to a business entity that sells or rents automobiles as finished goods or that insures one.  Governmental authorities try to make a given automobile unique and traceable by assigning each one a unique vehicle identification number (VIN in the U.S.), and VIN is of course the "key" in rental car or insurance company  systems. To the systems supporting these roles, the notion of "parts" of an automobile are not meaningful.

However, to someone who builds or maintains automobiles - or steals them for a "chop shop" - a given VIN represents a collection of parts traveling in close proximity, perhaps only temporarily if a "chop shop" or heavy maintenance is the vehicle's destiny.

The automotive designer's epistemology is entirely different, because a particular physical automobile is meaningless to the designer, who works with conceptual components of conceptual automobile designs - lovely CAD representations with no dings or road tar.

If business needs converge - for example, if it becomes important to the rental car company to let engineering-fascinated renters select a rental car based on the niceties of transaxle design - a worldview crisis ensues. Perhaps some XML "bridging" can span the design system and the rental car tracking system. True "melding" - e.g., importing detailed design information into the rental car system or vice versa - typically would be impossible absent a substantial system redesign and rebuild. Where epistemological and ontological differences are great, there are no easy technology answers for expanding a given system's worldview - e.g., mere use of XML does not expand the "camera's" fidelity or bandwidth sensitivity.
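
A small, purely illustrative sketch of what such shallow XML "bridging" amounts to - element and attribute names are invented, not drawn from any real system:

    # One design attribute is carried into the rental world without either
    # worldview actually expanding.
    import xml.etree.ElementTree as ET

    bridge = ET.Element("vehicle", attrib={"vin": "1HGCM82633A004352"})
    ET.SubElement(bridge, "designFeature",
                  attrib={"name": "transaxle", "value": "6-speed dual-clutch"})
    print(ET.tostring(bridge, encoding="unicode"))

    # The rental system can now display the tag, but it still has no notion of
    # "parts" or "designs" - the camera's fidelity is unchanged.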

Attempts to address differences in worldview

Starting thirty years ago, there have been some optimistic expectations regarding the formulation and implementation of "standards" to bridge such differences - primarily within the EDI world.

Despite an enormous amount of work and some worthwhile progress, EDI managed to address only the comparatively low-hanging fruit - primarily production-oriented (or "direct" materials) supply chain integration. For example, the major automobile OEMs that design and build cars and the Tier 1 companies that design and build major subassemblies have fairly similar epistemological and ontological perspectives. In contrast, an automobile OEM has few epistemological overlaps with the suppliers of office copiers or of the chemicals used in engineering experiments, so EDI did not work anywhere near as well for indirect products as for production materials.

Note that many of the reasons given for the comparatively narrow use of EDI - and, especially, of the voluntary use of EDI as opposed to "commanded" use - are either only partially correct or are entirely wrong. For example, contrary to rumor, EDI software and services are not too expensive for medium and smaller companies, nor is "doing" EDI all that complex. Its "batch" and "store and forward" nature is not the root cause either. The barriers stem from the inherent complexity and sometimes low payback of bridging worldviews between the gateways.

"Between the gateways" is better understood if eBusiness standards are compared with physical standards. In the world of physical product standards, there is, for example, such a thing as a "standard" #303 can of peas. A #303 can of vegetables is "born" in a packing plant, moves physically unchanged through a multi-level, multi-hop physical distribution chain to the grocery store. There, still unchanged, it then goes onto a shelf, into a shopper's cart and after purchase and the ride to the consumer's home, into the consumer's pantry and kitchen. At time of use, the #303 can is opened and recycled - its mission complete. End-to-end, it stays a #303 can of peas and it contains the same peas that the packinghouse put into it.

In the world of EDI and successor standards, there are "standard" purchase orders - for example, an ANSI 850 or the freshly minted UBL (Universal Business Language) XML order standard from OASIS. However, the EDI or XML standard order is not "born" in the "packing plant," because the "packing plant" (the end customer's purchasing system) creates some proprietary version. The "standard" order is instead created only at an outbound gateway by "repacking" the proprietary order. Repacking may involve substantive content changes as well as format changes. The standard order is then electronically shipped to a recipient.

At an inbound gateway, the recipient will unpack it and translate it into yet another proprietary format for import into its sales order system.

In a multi-tier distribution environment (e.g., a make-to-order capital good), the order transaction may go through another cycle of packing and unpacking to be sent on to the manufacturer, put into another proprietary system, etc.

Gateway-to-gateway EDI or XML "standards" are therefore not the equivalent of physical product standards, because they do not survive from point of origin to point of use. There is no immediate solution at hand, because the proposition that internal systems will "natively" speak "EDI" or "UBL" - with respect to both format and content - is perhaps too wrenching to contemplate. On the other hand, that probably should be accepted as a future destiny.
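
The pack-and-repack journey can be sketched in a few lines of illustrative code - every format and field name below is invented:

    # The gateway-to-gateway life of a "standard" order, end to end.
    buyer_order = {"po_no": "4500012345", "item": "WIDGET-9", "qty": 40}

    def outbound_gateway(proprietary):
        # "repacking" into the standard: format changes, sometimes content too
        return {"OrderID": proprietary["po_no"],
                "LineItems": [{"SKU": proprietary["item"],
                               "Quantity": proprietary["qty"]}]}

    def inbound_gateway(standard):
        # unpacked into yet another proprietary shape for the sales order system
        line = standard["LineItems"][0]
        return {"so_ref": standard["OrderID"],
                "product": line["SKU"],
                "ordered_qty": line["Quantity"]}

    standard_order = outbound_gateway(buyer_order)   # exists only in transit
    seller_order = inbound_gateway(standard_order)
    # Unlike the #303 can, the "standard" artifact never reaches the point of use.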

Enter XML

Today, the Extensible Markup Language (XML) becomes an important aid, because in packaging data within XML constructs, the data carries with it a bit of "worldview" (a.k.a. "metadata") so that it can be more easily used within a different "world."
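
A tiny, illustrative example - the tag and attribute names are assumptions, not drawn from any standard:

    # A bare value carries no worldview at all: 42 what?
    # Wrapped in XML, the value travels with a little metadata.
    import xml.etree.ElementTree as ET

    bare_value = "42"
    element = ET.fromstring('<quantityOrdered unitOfMeasure="EA">42</quantityOrdered>')
    print(element.tag, element.get("unitOfMeasure"), element.text)
    # -> quantityOrdered EA 42 : the receiving "world" at least knows this is
    # an ordered quantity expressed in eaches.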

However, like EDI, XML is not in itself a solution, because bits and pieces of metadata are not sufficient in themselves to span worldviews. As discussed below, two other initiatives are also needed: the adoption of shared, standardized "metadata," and worldview simplification - i.e., running a company with less metadata.

Enterprise Data Complexity Abounds

Metadata abounds - in data dictionaries, data models, entity-relationship diagrams, and other systems design artifacts. Some data is elevated to the exalted category of "master data" - and to the degree that master data is abstracted from real transactions, it is closer to being "metadata" than "real" data. A lot of rule-generated "data" is, by my definition, "synthetic" data - e.g., a field in a sales transaction set to "X" because of a rule that aligns the deliver-to postal code with a postal code cluster assigned to business unit "X."
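
A minimal sketch of such a synthetic-data rule, with invented postal-code clusters and business-unit names:

    # "Synthetic" data: a field whose value is generated by a rule rather than
    # observed.
    POSTAL_CLUSTERS = {"6": "X", "7": "X", "9": "WEST"}

    def business_unit_for(sale):
        prefix = sale["deliver_to_postal_code"][0]
        # the stored "data" is really a restatement of the rule plus this table
        return POSTAL_CLUSTERS.get(prefix, "OTHER")

    print(business_unit_for({"deliver_to_postal_code": "60601"}))   # -> X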

Documents (as opposed to transactions or events) are epistemologically and ontologically troublesome, because they bundle nuggets of metadata, rules, synthetic data, transaction data, and connecting text meaningful only to humans. Such a bundle is difficult for machines to construct and deconstruct.

EDI and XML "documents" - although more disciplined than most - share these complexities. For example, an EDI order potentially can bundle many dozens of deliveries to multiple plants over extended periods of time. At least in theory, an ANSI ship notice can document a freight train load of diverse goods, by product, car, pallet and box. The problem in a two-sided world - buyer and seller - is that they do not often share the same epistemology, and adding in logistics providers, sales tax authorities and others, the unbundling and exception management snarl gets worse.

Molecular-level transactions - e.g., one plant, one delivery, a few products - are vastly better because of their back-to-basics simplicity.
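
To make the contrast concrete, here is an illustrative sketch - structures and field names are invented - of a bundled order beside its molecular equivalents:

    # A bundled, document-style order versus "molecular" records.
    bundled_order = {
        "order_no": "850-001",
        "lines": [
            {"product": "A", "deliveries": [
                {"plant": "P1", "date": "2005-03-01", "qty": 100},
                {"plant": "P2", "date": "2005-04-15", "qty": 250}]},
            {"product": "B", "deliveries": [
                {"plant": "P1", "date": "2005-03-20", "qty": 75}]},
        ],
    }   # one artifact, many futures to unbundle and reconcile

    molecular = [
        {"order_no": "850-001", "product": "A", "plant": "P1", "qty": 100},
        {"order_no": "850-001", "product": "A", "plant": "P2", "qty": 250},
        {"order_no": "850-001", "product": "B", "plant": "P1", "qty": 75},
    ]   # each record stands alone: one plant, one delivery, one product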

Practical Measures for the Future

To cut to the chase, the need is to speak a common language, which almost certainly will be a very constrained language. I would add that the language will be made up of "declarative sentences" - there is, in my view, a bit too much attention on nouns. The internal and external transaction-related content, format, and process need to be standardized, so that little or no "translation" is required to move transactions between entities.
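
As one hedged sketch of what a "declarative sentence" in such a constrained language might look like - the vocabulary below is invented for illustration, not a proposal for an actual standard:

    # Subject, verb, object, and a few qualifiers - nothing else is allowed.
    ALLOWED_VERBS = {"orders", "ships", "invoices", "pays"}

    def sentence(subject, verb, quantity, item, to, on):
        if verb not in ALLOWED_VERBS:
            raise ValueError("verb is not in the shared vocabulary")
        return {"subject": subject, "verb": verb, "quantity": quantity,
                "object": item, "to": to, "on": on}

    # "Acme orders 40 WIDGET-9 to plant P1 on 2005-03-01."
    msg = sentence("Acme", "orders", 40, "WIDGET-9", "P1", "2005-03-01")
    # Because both parties speak the same constrained language natively, the
    # message needs no gateway translation on either end.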

Interestingly, this "less is more" notion is made more feasible by advances in technology. Often, the two problematic "A" words - abstraction and aggregation - are thought to be needed for "simplicity" and performance. Simplicity through abstraction really means attaining apparent "uniformity" by limiting the epistemological camera's fidelity and ignoring detail. Performance enhancement through aggregation trades off very high-cost labor (to define, implement, and communicate the rules of abstraction) to save on very cheap disk drives and processors.

We now have the means at hand to move ahead with such a proposal - even if at an evolutionary pace.