Bridge the Communication Gap
There are thousands of spoken languages in the world, everything from Abkhaz to Zulu. No wonder we sometimes have difficulty communicating with other people.
The same is true of computer systems; they often speak different languages. There's a whole range of hardware platforms running dozens of different operating systems and thousands of different applications. Yet increasingly, these systems need to talk to one another. A billing system in one company may need to exchange information with an inventory system in another and with a logistics system in a third. How can these computers communicate without the help of human translators?
The answer is XML.
What is XML?
XML stands for eXtensible Markup Language. (Yes, another instance where the lowly E has been forgotten.) It's a way of structuring text so as to identify what that text is for, or what it means.
Take, for example, this mailing address:
LightingStrike Studios
PO Box 24040
Cambridge, Ontario
N1R 8E6
As literate humans, we can easily tell that the company name is LightingStrike Studios and that it's located in the city of Cambridge. We could do so even if the lines were rearranged. But how would a computer do that? It would take a fair bit of programming logic to identify the various components of an address presented as plain text. Wouldn't it be easier if we identified each component with tags?
<address>
  <company>LightingStrike Studios</company>
  <street>PO Box 24040</street>
  <city>Cambridge</city> <province>Ontario</province>
  <postalcode>N1R 8E6</postalcode>
</address>
This is what XML does; it adds structure to text to remove ambiguities and allow computers to identify specific types of data.
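As an illustration, here is a minimal sketch in Python, using the standard library's xml.etree.ElementTree module, of how a program might pick out the parts of the tagged address. The helper name read_address is our own, hypothetical choice. Because each value is looked up by its tag rather than by its position, the same code still works when the elements are rearranged.

```python
import xml.etree.ElementTree as ET

# The tagged address from the article.
ADDRESS_XML = """<address>
  <company>LightingStrike Studios</company>
  <street>PO Box 24040</street>
  <city>Cambridge</city> <province>Ontario</province>
  <postalcode>N1R 8E6</postalcode>
</address>"""


def read_address(xml_text):
    """Map each child tag of <address> to its text (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root}


address = read_address(ADDRESS_XML)
print(address["company"])  # LightingStrike Studios
print(address["city"])     # Cambridge

# Rearranging the elements does not change what the program finds.
shuffled = ("<address><city>Cambridge</city>"
            "<company>LightingStrike Studios</company></address>")
print(read_address(shuffled)["city"])  # Cambridge
```

No pattern matching on line order or punctuation is needed; the tags carry the meaning.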
How is XML used?
Suppose we need to share information from our accounting system with one of our suppliers. XML is well suited for this: all we have to do is agree with that supplier on a common set of tags and what they mean. Many modern databases, such as Oracle and DB2, are XML-aware; they can import and export XML files, and some can even store their data in XML format.
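To make the idea of an agreed tag set concrete, the sketch below exports a couple of accounting records as XML and then reads them back the way a supplier's system might. It is a minimal Python illustration using xml.etree.ElementTree; the records, the tag names (invoices, invoice, number, customer, amount), and the function names are all hypothetical stand-ins for whatever the two parties actually agree on.

```python
import xml.etree.ElementTree as ET

# Hypothetical records exported from our accounting system.
INVOICES = [
    {"number": "1001", "customer": "LightingStrike Studios", "amount": "250.00"},
    {"number": "1002", "customer": "Smith & Sons", "amount": "75.50"},
]
# Hypothetical tag set agreed with the supplier, one tag per field.
FIELDS = ["number", "customer", "amount"]


def export_invoices(records):
    """Serialize records as <invoices><invoice>...</invoice></invoices>."""
    root = ET.Element("invoices")
    for record in records:
        invoice = ET.SubElement(root, "invoice")
        for field in FIELDS:
            ET.SubElement(invoice, field).text = record[field]
    # ElementTree escapes special characters such as "&" automatically.
    return ET.tostring(root, encoding="unicode")


def import_invoices(xml_text):
    """What the supplier's side might do: read the same tags back."""
    root = ET.fromstring(xml_text)
    return [{field: invoice.findtext(field) for field in FIELDS}
            for invoice in root.findall("invoice")]


xml_text = export_invoices(INVOICES)
print(xml_text)
print(import_invoices(xml_text) == INVOICES)  # True
```

Neither side needs to know how the other stores its data internally; the shared tag set is the whole contract.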
XML is also often used as the basis for document production systems. An author writes a source document in XML, and that single source is used to generate various output formats, such as printed manuals, web pages, PDF files, and online help systems. In a single-source process like this, a revision needs to be made only once; the target documents are then simply regenerated.
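The single-source idea can be sketched in a few lines of Python. Here one XML source is rendered two ways, as a simple HTML fragment and as plain text. The tag names (manual, title, para) and both renderer functions are hypothetical; real production systems typically use dedicated tools, but the principle is the same: edit the source, rerun the renderers.

```python
import xml.etree.ElementTree as ET
from html import escape

# A tiny single-source document; the tag names are hypothetical.
SOURCE = """<manual>
  <title>Installing the Widget</title>
  <para>Unpack the box.</para>
  <para>Plug in the power cord.</para>
</manual>"""


def to_html(source):
    """Render the source as a simple web-page fragment."""
    root = ET.fromstring(source)
    parts = ["<h1>" + escape(root.findtext("title")) + "</h1>"]
    parts += ["<p>" + escape(p.text) + "</p>" for p in root.findall("para")]
    return "\n".join(parts)


def to_text(source):
    """Render the same source as plain text, e.g. for online help."""
    root = ET.fromstring(source)
    title = root.findtext("title")
    lines = [title, "=" * len(title)]
    lines += [p.text for p in root.findall("para")]
    return "\n".join(lines)


print(to_html(SOURCE))
print(to_text(SOURCE))
```

A revision made once in SOURCE shows up in both outputs the next time the renderers run.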
How is XML different from HTML?
If you know HTML, the language of web pages, the tagged address above will look familiar. Indeed, both HTML and XML are defined by the same organization, the World Wide Web Consortium (http://www.w3.org/). But there are important differences.
XML is used to identify and classify data for consumption by machines, whereas HTML is used to control how data is presented to human readers on web pages. Most web browsers adhere, more or less, to the HTML standard and so need to understand only a limited, fixed set of tags; changing business requirements, by contrast, may call for an ever-expanding set of XML tags. This is where the X in XML comes in. Because it is extensible, XML doesn't define a specific set of tags; instead, it defines a syntax for writing them. Once you understand the syntax, you can define any tags you need. In the example above, we could have used <zipcode> instead of <postalcode>, or added <areacode>, <phonenumber>, and <faxnumber>. To specify formally which tags are allowed and how they may be nested, we can describe them in a Document Type Definition, or DTD.
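One practical consequence of XML fixing the syntax rather than the vocabulary is that a generic XML parser accepts tags it has never seen. The Python sketch below (using xml.etree.ElementTree; the function name fields and the sample phone number are hypothetical) reads the original address and an extended one with <zipcode> and <phonenumber>, without any change to the code.

```python
import xml.etree.ElementTree as ET


def fields(xml_text):
    """Return (tag, text) pairs for every child element, whatever its name.

    Nothing here lists the allowed tags, so vocabulary we invent later,
    such as <zipcode> or <phonenumber>, is read without any code change.
    """
    root = ET.fromstring(xml_text)
    return [(child.tag, child.text) for child in root]


original = ("<address><city>Cambridge</city>"
            "<postalcode>N1R 8E6</postalcode></address>")
# Hypothetical extended vocabulary; the phone number is made up.
extended = ("<address><city>Cambridge</city><zipcode>N1R 8E6</zipcode>"
            "<phonenumber>555-0100</phonenumber></address>")

print(fields(original))  # [('city', 'Cambridge'), ('postalcode', 'N1R 8E6')]
print(fields(extended))
```

Deciding which of those tags are actually permitted is a separate question, and that is the job a DTD takes on.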
There are also basic syntactic differences between HTML and XML. HTML is not case-sensitive; XML is. (In HTML you could properly have <DIV> followed by </div>. In XML, <POSTALCODE> followed by </postalcode> wouldn't be well-formed.) HTML collapses runs of white space when a page is displayed; an XML parser passes all white space through to the application. Browsers tolerate improperly nested HTML elements such as <b><i>text</b></i>; XML requires every element to be closed inside the element that contains it.
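These rules are easy to observe with any XML parser. The short Python sketch below uses the standard library's xml.etree.ElementTree, which raises ParseError on input that is not well-formed; the helper name is_well_formed is our own, hypothetical choice.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<postalcode>N1R 8E6</postalcode>"))  # True
print(is_well_formed("<POSTALCODE>N1R 8E6</postalcode>"))  # False: case differs
print(is_well_formed("<b><i>text</b></i>"))                # False: bad nesting

# White space inside an element reaches the program unchanged.
street = ET.fromstring("<street>PO Box   24040</street>")
print(repr(street.text))  # 'PO Box   24040'
```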
Notice that we used the term "well-formed" to describe properly written XML. An XML document is well-formed if it follows the syntax rules of the W3C's XML Recommendation. It is valid if, in addition, it conforms to its Document Type Definition.
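The difference between the two terms can be sketched in Python. Python's standard-library XML parser is non-validating, so it can check well-formedness but cannot enforce a DTD; as a stand-in, the sketch below hard-codes the content model a hypothetical DTD for the address might declare. The constant EXPECTED_CHILDREN and the function check are illustrative names, and the hard-coded rule is a simplification, not a general DTD validator.

```python
import xml.etree.ElementTree as ET

# Content model a DTD for the address might declare (hypothetical):
#   <!ELEMENT address (company, street, city, province, postalcode)>
#   <!ELEMENT company (#PCDATA)>
#   ...and likewise for street, city, province, and postalcode.
# The standard library's parser does not read or enforce DTDs, so this
# sketch hard-codes that rule instead.
EXPECTED_CHILDREN = ["company", "street", "city", "province", "postalcode"]


def check(text):
    """Classify text as 'not well-formed', 'well-formed but not valid', or 'valid'."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return "not well-formed"
    if root.tag != "address":
        return "well-formed but not valid"
    # The children must appear exactly once each, in the declared order.
    if [child.tag for child in root] != EXPECTED_CHILDREN:
        return "well-formed but not valid"
    # Only white space may appear between the children of <address>.
    stray_text = [root.text] + [child.tail for child in root]
    if any(t and t.strip() for t in stray_text):
        return "well-formed but not valid"
    # (#PCDATA) means text only: no nested elements inside each field.
    if any(len(child) > 0 for child in root):
        return "well-formed but not valid"
    return "valid"


common = ("<company>LightingStrike Studios</company><street>PO Box 24040</street>"
          "<city>Cambridge</city><province>Ontario</province>")
print(check("<address>" + common + "<postalcode>N1R 8E6</postalcode></address>"))
print(check("<address>" + common + "<zipcode>N1R 8E6</zipcode></address>"))
print(check("<address>" + common + "<POSTALCODE>N1R 8E6</postalcode></address>"))
```

The three calls print "valid", "well-formed but not valid", and "not well-formed" respectively: swapping in <zipcode> keeps the syntax correct but breaks the agreed vocabulary, while the case mismatch breaks the syntax itself.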
How do you write XML?
You could write a complex XML document using nothing more than a simple text editor, a little knowledge, and a whole lot of patience. But from the brief list of syntax issues we've just discussed, it's easy to see that a specialized editor would be a big help. Fortunately there are many options, both commercial and free. Some let you work in a WYSIWYG (what-you-see-is-what-you-get) environment, as you would in a word processor, without ever having to see your tags. Others let you see and manipulate your tags directly. Some enforce validity by letting you use only the tags that the selected DTD allows at your current position in the document. Some can produce HTML and PDF output, while others only generate XML files, which you can then process with other tools. For an extensive list of XML editors, check out O'Reilly Media's xml.com site (http://www.xml.com).
Just as HTML has become the standard format for almost all Internet web content, XML is well on its way to becoming the standard format for information exchange. If you're not speaking XML already, it's likely you soon will be. Or, at least, your applications will.