Limits on Machine Intelligence Make XML Standardization Difficult
Although we have a growing abundance of "brute force" computing resources, we face limits imposed by lack of computer "intelligence." Those limits in many respects gave impetus to XML adoption, while those same limits help make XML standardization difficult. What is important is to be realistic in exploiting the "brute" without getting waylaid by the brute's lack of intelligence. Below, human language translation is used as a benchmark regarding limits on machine intelligence.
The Human Language Translation "Canary"
Automated human language translation is somewhat equivalent to the miner's canary - a bird taken down into the mine to serve as an indicator of the healthiness of the mine's atmosphere. The language translation "canary" is pretty wobbly, long after many of us expected computer "intelligence" to crack the human language translation problem.
Below is a benchmark published by the Financial Times. On September 21, 2005, the FT ran a sidebar describing the translation from French to English of a few lines of text taken from a novel by Balzac. It published four versions: 1) Balzac's French original, 2) an amateur human translation into English, 3) a professional human translation as published by Penguin and 4) an automated translation into English.
The last sentence of each version is quoted below.
1. Balzac's original in French
"C'est a Paris la plus grande expression connue de la personnelle chez l'homme."
2. The FT's amateur human translation to English of that sentence:
"In Paris this is the greatest personal expression of satisfaction a man can show."
3. The professional human translation:
"...in Paris the expression of male self-satisfaction can go no further."
Note that the professional human translator not only re-expressed the statement in English, but altered the sentence structure to subordinate this sentence to the preceding one, presumably to re-create a more Balzacian flow.
4. And, finally, the canary's efforts - the automated machine translation:
"This is at Paris more big expression known personal one with the man."
For someone who neglected language studies, but hoped to be rescued by information technology, this benchmark of the "state of the art" is very grim reading. Forty or fifty years into the computer age, the "more big expression" that comes to mind is that "machines are stupid." In this instance, the canary died, indicating a pretty unhealthy atmosphere.
The disappointing progress of automated human language translation is not only important in itself, but symptomatic of limits to other applications of "machine intelligence." If you are relying on "machine intelligence" in other venues, it had better be with respect to some "closed," entirely mechanistic problem - e.g., playing a highly abstracted game such as chess, with bounded rules and moves and with a simple, mechanistic objective. Closed sorts of games that from a human perspective appear very difficult - for example, dictionary attacks on passwords or other "secrets" that require a billion or so "brute force" tries - are far more likely to be successful than machine translation of a few words of text from French to English.
Although as humans we can take some comfort in the shortcomings of computers with respect to "intelligence" and "judgment", the uncomfortable consequence is that we are having to work a lot harder than otherwise would be necessary.
Impact of these limits on XML standardization
XML has been characterized as enabling "self-describing" data, and indeed, the use of XML and related informational constructs provides a lot of surrounding support to help with inter-entity data translation and inter-operability processes.
Of course, if machines were "intelligent," XML standards would either be unneeded or radically simplified.
The author of this article was overly optimistic not only regarding the progress of human language translation, but also regarding the declining need for document and data standards of the EDI sort. A decade or so ago, it appeared likely that machines would soon be able to "read" the same information content that human beings commonly use in business, in which case there would be no need for machine-oriented "standard" orders, shipping notices, invoices, etc. The inability of computers to translate Balzac with high fidelity not only protects a lot of human translators' jobs, but it means that we need to continue to struggle with such standards.
XML only eases the struggle rather than eliminating it. The introduction and evolution of XML represents progress in making data exchange more of a finite rules game, fitted for machine manipulation. The XML constructs that travel with the payload data, along with the XML constructs that represent the trading relationship, in effect serve up the rules of the game.
The fact that XML is primarily "of the machine, by the machine and for the machine" means that human XML practitioners have to follow a path of "dumbing down" content to machine level. "Dumbing down" requires, for example, constraints and precision in name spaces, vocabularies, and other constructs that are far more restrictive than would be required by human beings.
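A minimal sketch of that machine-level precision, using Python's standard XML library (the namespace URI and element names here are invented for illustration): a machine matches on the exact namespace URI, not on the human-friendly prefix or on anything a person would recognize as "close enough."

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase-order fragment; "urn:example:po:v1" is an
# illustrative namespace URI, not a real standard.
doc = """<po:order xmlns:po="urn:example:po:v1">
  <po:quantity>10</po:quantity>
</po:order>"""

root = ET.fromstring(doc)

# Lookup succeeds only with the exact namespace URI.
found = root.find("{urn:example:po:v1}quantity")
print(found.text)  # "10"

# A trivially different URI (v2 instead of v1) finds nothing at all.
missing = root.find("{urn:example:po:v2}quantity")
print(missing)  # None
```

To a human reader, "v1" versus "v2" in an identifier is a detail to be glossed over; to the machine, it is a different vocabulary entirely.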
Additionally, representations of business processes - if those representations are fed to machines to serve as "orchestration" and "choreography" - have to follow a similar "dumbing down." Of course, process representation depends on standards regarding name spaces, vocabularies and message or document constructs. Therefore, there is a sometimes vicious cycle in which changes, adds and deletes in process destabilize namespaces, vocabularies, et al, while in turn shortcomings and rigidities in XML conventions can constrict and distort processes.
What is sometimes forgotten or ignored is that all these information constructs exist to encapsulate an often challenging "real world." While one can dumb down both process and data representations to accommodate the limits imposed by "stupid" machines, it is exceedingly difficult to "dumb down" reality. IT representations of business processes are generally low-fidelity replicas of what is really happening "out there" - remember the 500,000-pixel early digital cameras, compared with today's 5,000,000-pixel models? Our necessary selectivity as to what level of detail to capture and how to represent it has to synchronize with the realities of business and with our changing objectives and perspectives.
It is therefore not surprising that XML standards have proliferated. What seemed to make sense in accommodating one perspective may later be found not to fit some other perspective. Even in proliferated form XML standards face difficult barriers to adoption, because trivial (to people) variations can provoke severe data and process conflicts for machines. As a result, XML standards practitioners are on an endless treadmill, updating existing standards and adding new ones.
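The point about trivial variations can be sketched concretely (the invoice fragments and the `read_quantity` helper below are invented for illustration): two messages that any human reads identically produce a hard failure for a consumer coded against one partner's element name.

```python
import xml.etree.ElementTree as ET

# Two hypothetical invoice fragments that a human reads as identical.
partner_a = "<Invoice><Quantity>5</Quantity></Invoice>"
partner_b = "<Invoice><Qty>5</Qty></Invoice>"

def read_quantity(xml_text):
    """A consumer coded against the 'Quantity' element name."""
    element = ET.fromstring(xml_text).find("Quantity")
    return int(element.text) if element is not None else None

print(read_quantity(partner_a))  # 5
print(read_quantity(partner_b))  # None - "Qty" means nothing to this consumer
```

Reconciling "Quantity" with "Qty" is exactly the kind of judgment humans exercise without noticing, and exactly the kind that has to be legislated into a standard for machines.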
The above statements are not meant to convey notions of "good" or "bad" and certainly not to criticize those who work in the field of "machine intelligence." Instead, the above views highlight that we function within limits imposed by real-world technology and that neither the innovation of XML nor of "ontology" nor of the "Semantic Web" can do much more than ameliorate fundamental shortcomings in machine intelligence.
Although there may be advances, new techniques, new patents, etc., as a practical matter until one sees automated human language translation functioning well, the default assumption should be that "machine intelligence" is an oxymoron and that the process of dumbing down to machine-friendly levels will consume a lot of XML standards-setting resources.
Colts Neck Solutions LLC