Seems like you hear more and more about XML coming into
its own these days. In fact, some folks tout it as the universal
data format. This article provides a brief introduction
to XML, its uses, and some of its problems.
What is XML?
XML stands for "eXtensible Markup Language" and
is, basically, a format for representing information in
a structured, neutral way. "Structured" means that the data is
organized as a tree: every element except the outermost one sits
inside a "parent" element. "Neutral" means that the format is
defined by an open standard (the W3C's XML 1.0 Recommendation),
so developers can see exactly how XML is defined and even
participate in its evolution. An XML file is a simple text
file, which can be transferred across the Internet just
like HTML.
Here is a sample of a simple XML document:
<?xml version="1.0"?>
<cars>
  <car id="1">
    <nickname>Old Junky</nickname>
    <engine type="4 cylinder" />
  </car>
</cars>
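To make the parent-child structure concrete, here is a minimal sketch in Python (our choice for illustration; the article does not prescribe a language) using the standard library's `xml.etree.ElementTree` parser. `outline` is a hypothetical helper that lists each element indented under its parent.

```python
import xml.etree.ElementTree as ET

# The sample document above, as a string (the XML declaration must come first).
SAMPLE = """<?xml version="1.0"?>
<cars>
  <car id="1">
    <nickname>Old Junky</nickname>
    <engine type="4 cylinder" />
  </car>
</cars>"""


def outline(element, depth=0):
    """Return one line per element, indented two spaces per level of nesting."""
    line = "  " * depth + element.tag
    if element.attrib:
        line += " " + str(element.attrib)
    text = (element.text or "").strip()
    if text:
        line += ": " + text
    lines = [line]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
# cars
#   car {'id': '1'}
#     nickname: Old Junky
#     engine {'type': '4 cylinder'}
```

Every element except `cars` has exactly one parent, and a program can reach any piece of data by following that path, for example `root.find("car/engine").get("type")` returns `'4 cylinder'`.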
While this looks quite a bit like HTML, there are major
differences. The most fundamental difference is that XML
tags tell you "what something is" whereas HTML
tags tell you "how something should be laid out."
In HTML, an <i> tag means that some text should be
displayed in italics. In contrast, the <cars> tag
in XML means that we're talking about things called "cars,"
and you can display them however you want.
Another difference you might notice is that the <engine>
tag ends with a slash. Whereas in HTML you can use tags like
<BR> with no closing tag, XML requires that every element be
closed, either with a separate closing tag, as in <car></car>,
or with a slash at the end of the tag itself, as in
<engine type="4 cylinder" />. XML that follows these rules
(along with a few others, such as properly nested tags, a single
root element, and quoted attribute values) is called
"well-formed" XML.
A Document Type Definition (DTD) is an optional set of rules,
usually kept in a separate file, that specifies which elements
and attributes an XML document may contain and how they may be
nested. In the example above, the DTD might say that every
"car" must have one, and only one, "engine," but
can have zero or more "nicknames." If a well-formed XML
document meets the requirements of its DTD, it is said to
be "valid."
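The difference between "well-formed" and "valid" can be sketched in a few lines of Python. Python's standard-library parser checks well-formedness but does not validate against a DTD (a validating parser, such as the third-party lxml library, can). So this sketch, under that limitation, uses the parser for the first check and hand-codes just one rule a DTD might impose for the second. `is_well_formed`, `engine_rule_violations`, and the sample strings are hypothetical; this is not a DTD validator.

```python
import xml.etree.ElementTree as ET

# A DTD declaration that could express the rule from the text (an assumed example;
# note that a DTD content model also fixes the order: nicknames first, then engine):
#   <!ELEMENT car (nickname*, engine)>


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def engine_rule_violations(xml_text):
    """Hand-check one rule: each <car> has exactly one <engine>.

    Nicknames may appear zero or more times, so they need no check.
    Returns a list of messages; an empty list means the rule holds.
    """
    problems = []
    for car in ET.fromstring(xml_text).iter("car"):
        engines = len(car.findall("engine"))
        if engines != 1:
            problems.append(f"car {car.get('id')}: {engines} engines")
    return problems


good = ('<cars><car id="1"><nickname>Old Junky</nickname>'
        '<engine type="4 cylinder" /></car></cars>')
no_engine = '<cars><car id="2"><nickname>Rusty</nickname></car></cars>'
unclosed = '<cars><car id="3">Old Junky<BR></car></cars>'

print(is_well_formed(good), engine_rule_violations(good))            # True []
print(is_well_formed(no_engine), engine_rule_violations(no_engine))  # True ['car 2: 0 engines']
print(is_well_formed(unclosed))                                      # False
```

The `no_engine` document parses without complaint (it is well-formed) yet breaks the rule (it would not be valid), while `unclosed` never gets that far because the unclosed `<BR>` makes it malformed.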
What can you do with XML?
An XML file contains specially formatted information, as
shown above. The software that goes through the XML and
breaks the information up for other software to use is called
a "parser." The great thing about parsers is that
the same parser can be used on well-formed XML no matter
where it came from. That means that different systems, or
people, can put information into an XML format, and any
other system can read it (though the two sides still have to
agree on what the tags mean).
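The idea that one parser serves every source can be sketched as follows, again in Python with only the standard library. `leaf_values` is a hypothetical helper, and the `<order>` document is an invented example from some other system: the same code flattens both documents without knowing anything about cars or orders. Deciding what `qty` means is still up to the people exchanging the data.

```python
import xml.etree.ElementTree as ET


def leaf_values(xml_text):
    """Flatten any well-formed XML into (path, text) pairs for its leaf elements.

    Nothing here knows the document's vocabulary, so the same code
    reads documents from any source.
    """
    results = []

    def walk(element, path):
        path = f"{path}/{element.tag}"
        children = list(element)
        if not children:  # a leaf: record its text
            results.append((path, (element.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return results


print(leaf_values("<cars><car><nickname>Old Junky</nickname></car></cars>"))
# [('/cars/car/nickname', 'Old Junky')]
print(leaf_values("<order><item>Tires</item><qty>4</qty></order>"))
# [('/order/item', 'Tires'), ('/order/qty', '4')]
```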
The independence of the parser from the data source means
that XML is good for data exchange. When I request information
from your database, your database can reply in XML without
knowing anything about my system. This allows developers
to "de-couple" systems and use XML as the neutral
glue between them. Even old or "legacy" systems
can be fitted with XML adapters and learn to communicate
with newer technologies.
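Here is a minimal sketch of the producing side, assuming the data lives in a relational database. An in-memory SQLite table with a made-up `inventory` schema stands in for "your database," and `inventory_as_xml` (hypothetical) turns its rows into a `<cars>` document shaped like the sample. The caller receives plain text and never needs to know which database produced it.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in for the dealer's database: an in-memory table with an assumed schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (id INTEGER, nickname TEXT, engine TEXT)")
db.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [(1, "Old Junky", "4 cylinder"), (2, "Big Red", "V8")],
)


def inventory_as_xml(conn):
    """Export the inventory table as a <cars> document like the sample above."""
    cars = ET.Element("cars")
    for car_id, nickname, engine in conn.execute(
        "SELECT id, nickname, engine FROM inventory ORDER BY id"
    ):
        car = ET.SubElement(cars, "car", id=str(car_id))
        ET.SubElement(car, "nickname").text = nickname
        ET.SubElement(car, "engine", type=engine)
    return ET.tostring(cars, encoding="unicode")  # plain text, ready to send


print(inventory_as_xml(db))
```

An XML "adapter" for a legacy system is, in spirit, the same thing: a small layer that reads from the old system and writes the agreed-upon XML.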
For example, suppose I'm in California and have a website
which lists cars for sale, and you are a car dealer in Oklahoma.
I want to include your current cars for sale on my site
(along with many other dealers). Every time somebody brings
up my "Cars for Sale" page, my system can make
a request to your system for your current inventory. If
you present your information in XML format as shown above,
my parser can easily break it up and I can display it in
my own site's "look and feel," perhaps by using XSLT (XSL
Transformations) to turn your XML into my HTML.
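The consuming side of the dealer example might look something like this. Python's standard library has no XSLT processor, so this sketch does the transformation by hand in code; an XSLT stylesheet would express the same mapping declaratively. `render_listing`, the HTML markup, and the dealer names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET


def render_listing(dealer_feeds):
    """Merge several dealers' <cars> documents into one HTML list.

    dealer_feeds maps a dealer name to the XML text that dealer returned.
    The markup below is this site's own look and feel; dealers never see it.
    """
    items = []
    for dealer, xml_text in dealer_feeds.items():
        for car in ET.fromstring(xml_text).iter("car"):
            nickname = car.findtext("nickname") or "(no nickname)"
            engine = car.find("engine")
            engine_type = engine.get("type", "unknown") if engine is not None else "unknown"
            items.append(
                f"<li><b>{html.escape(nickname)}</b>, {html.escape(engine_type)}"
                f" ({html.escape(dealer)})</li>"
            )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


feeds = {
    "Oklahoma dealer": '<cars><car id="1"><nickname>Old Junky</nickname>'
                       '<engine type="4 cylinder" /></car></cars>',
    "Another dealer": '<cars><car id="7"><engine type="V6" /></car></cars>',
}
print(render_listing(feeds))
# <ul>
# <li><b>Old Junky</b>, 4 cylinder (Oklahoma dealer)</li>
# <li><b>(no nickname)</b>, V6 (Another dealer)</li>
# </ul>
```

Changing the site's look means editing only this function (or stylesheet); the dealers keep sending the same XML.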
With the arrival of Microsoft's .NET, the press is also
beginning to talk about other uses of XML, like SOAP
(Simple Object Access Protocol). SOAP is a way to package
a request to a remote system in XML format. Because XML
is just simple text, the request can be sent across the
Internet (usually over HTTP, just like a Web page), parsed by
the remote system, and answered with results packaged the same
way.
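As a rough sketch of what "packaging a request in XML" means, the code below builds a SOAP 1.1 envelope (its namespace URI, `http://schemas.xmlsoap.org/soap/envelope/`, comes from the SOAP 1.1 specification) around a made-up `GetInventory` call, then parses it back the way the remote system would. The operation name, the `dealerId` element, and the `urn:example:cars` namespace are assumptions for illustration, and the HTTP transport is left out; real SOAP toolkits generate and parse envelopes for you.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
CARS_NS = "urn:example:cars"  # hypothetical namespace for the made-up operation

ET.register_namespace("soap", SOAP_ENV)  # serialize as soap:Envelope, soap:Body


def build_request(dealer_id):
    """Client side: wrap a hypothetical GetInventory call in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{CARS_NS}}}GetInventory")
    ET.SubElement(call, f"{{{CARS_NS}}}dealerId").text = str(dealer_id)
    return ET.tostring(envelope, encoding="unicode")


def read_request(message):
    """Server side: parse the envelope and return (operation, dealerId)."""
    body = ET.fromstring(message).find(f"{{{SOAP_ENV}}}Body")
    call = body[0]  # the first child of Body is the request itself
    operation = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    return operation, call.findtext(f"{{{CARS_NS}}}dealerId")


message = build_request(42)
print(message)
print(read_request(message))  # ('GetInventory', '42')
```

The reply would travel the same way: the server wraps its results (for instance, a `<cars>` document) in another envelope and sends it back as text.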
The Downsides of XML
But all is not roses with XML. For one thing, XML is relatively
slow to process: a parser has to read and check the whole text
before any of the data can be used. It is faster for your
database to deliver data directly to your application without
transforming it into XML. That means, for example, that a Content
Management System (CMS) handling information in XML format can
require a lot of processing power. If the CMS is going to serve
a busy intranet or Web site, it may need to pre-process the
information off-line into a ready-to-serve format (like HTML).
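Off-line pre-processing can be as simple as a batch job that converts each XML file to HTML once, so the web server never parses XML while answering requests. A minimal sketch, assuming the content sits in a directory of `.xml` files shaped like the sample; `prerender` and the directory layout are hypothetical.

```python
import html
import pathlib
import tempfile
import xml.etree.ElementTree as ET


def prerender(xml_dir, html_dir):
    """Convert every *.xml file in xml_dir into a static .html file in html_dir.

    Meant to run off-line (say, from a scheduled job whenever content changes);
    the web server then serves the .html files as-is. Returns the names written.
    """
    out_dir = pathlib.Path(html_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for xml_path in sorted(pathlib.Path(xml_dir).glob("*.xml")):
        root = ET.parse(str(xml_path)).getroot()  # the slow step, done once per file
        items = "".join(
            f"<li>{html.escape(car.findtext('nickname', default=''))}</li>"
            for car in root.iter("car")
        )
        html_path = out_dir / (xml_path.stem + ".html")
        html_path.write_text(f"<ul>{items}</ul>", encoding="utf-8")
        written.append(html_path.name)
    return written


# Usage: build a throwaway content directory and pre-render it.
with tempfile.TemporaryDirectory() as tmp:
    src = pathlib.Path(tmp, "xml")
    src.mkdir()
    (src / "inventory.xml").write_text(
        '<cars><car id="1"><nickname>Old Junky</nickname></car></cars>',
        encoding="utf-8",
    )
    print(prerender(src, pathlib.Path(tmp, "html")))  # ['inventory.html']
    print(pathlib.Path(tmp, "html", "inventory.html").read_text(encoding="utf-8"))
    # <ul><li>Old Junky</li></ul>
```

The trade-off is freshness: pages reflect the XML only as of the last batch run.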
Another practical issue with XML is the need for skilled
developers who understand how to use the various technologies
correctly, and who fully understand the applications and
limitations of these technologies. For example, while some
browsers are beginning to understand XML and XSL, most don't
currently, so sending raw XML to the browser would probably not
be a good way to present your site. On the server side, if a native database
connection is practical for large volumes of data, it will
almost always be faster than using XML as an intermediate
format. The best technology in the world isn't any good
if no one at your company knows how to apply it correctly.
In summary, XML is great for some things and terribly misplaced
for others. Some programmers regard it with the reverence
previously reserved for Java, claiming it's the "only
way to go." However, just as you would probably not
want to write a Java servlet to process a simple form, XML
may not be the best choice for your site or application.
Consider the positive and negative features summarized above,
and make your decision based on your specific situation.