At first glance XML looks quite similar to HTML in that it is made up of text, tags and attributes. Upon closer inspection, though, they show themselves to be quite different. While HTML concerns itself only with how data should be displayed, XML allows a sense of what the data means to be incorporated into the document.
For example, you might mark up an address in HTML like this:
<TABLE>
<TR>
<TD>484</TD><TD>St Kilda Road</TD>
</TR>
<TR>
<TD>Melbourne</TD>
</TR>
<TR>
<TD>VIC</TD><TD>3000</TD>
</TR>
</TABLE>
However, in an XML document it might look like this:
<ADDRESS>
<NUMBER>484</NUMBER>
<STREET>St Kilda Road</STREET>
<CITY>Melbourne</CITY>
<STATE>VIC</STATE>
<PCODE>3000</PCODE>
</ADDRESS>
Notice how the XML adds structure and meaning to the data. While in the HTML "St Kilda Road" is just some text in a table, the XML specifies that it is a STREET and is part of an ADDRESS.
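As a rough illustration of the difference, the sketch below reads the street out of both fragments using Python's standard xml.etree.ElementTree module. The variable names are invented for this example, and the HTML table can be parsed this way only because this particular fragment happens to be well formed.

```python
import xml.etree.ElementTree as ET

html_table = """<TABLE>
<TR><TD>484</TD><TD>St Kilda Road</TD></TR>
<TR><TD>Melbourne</TD></TR>
<TR><TD>VIC</TD><TD>3000</TD></TR>
</TABLE>"""

xml_address = """<ADDRESS>
<NUMBER>484</NUMBER>
<STREET>St Kilda Road</STREET>
<CITY>Melbourne</CITY>
<STATE>VIC</STATE>
<PCODE>3000</PCODE>
</ADDRESS>"""

# The table only offers positions: first row, second cell happens to hold the street.
table = ET.fromstring(html_table)
street_by_position = table.findall("TR")[0].findall("TD")[1].text

# The XML version lets us ask for the street by name.
address = ET.fromstring(xml_address)
street_by_name = address.findtext("STREET")

print(street_by_position, "|", street_by_name)
```

Both lookups return the same text, but the first one breaks as soon as the table layout changes, while the second depends only on the STREET tag.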
Of course, there are many different structures and meanings that can be applied to the same data. For example, if we weren't really interested in the above data as an address but wanted to perform some sort of syntactic analysis on it, we might specify it in another piece of XML as
<SENTENCE>
<NUMERAL>484</NUMERAL>
<NOUN type="proper">
<ABBREVIATION>St</ABBREVIATION>
<NAME>Kilda</NAME>
</NOUN>
<NOUN>Road</NOUN>
<NOUN type="proper">Melbourne</NOUN>
<ABBREVIATION>VIC</ABBREVIATION>
<NUMERAL>3000</NUMERAL>
</SENTENCE>
in which case we don't see it as an ADDRESS but as NUMERALS, NOUNS and ABBREVIATIONS grouped into a SENTENCE.
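To show how this second markup supports a different kind of question, here is a minimal sketch that pulls out the proper nouns and the numerals with xml.etree.ElementTree. The variable names are assumptions made for the example, not part of any standard.

```python
import xml.etree.ElementTree as ET

sentence_xml = """<SENTENCE>
<NUMERAL> 484 </NUMERAL>
<NOUN type="proper">
<ABBREVIATION>St</ABBREVIATION>
<NAME>Kilda</NAME>
</NOUN>
<NOUN>Road</NOUN>
<NOUN type="proper">Melbourne</NOUN>
<ABBREVIATION>VIC</ABBREVIATION>
<NUMERAL>3000</NUMERAL>
</SENTENCE>"""

sentence = ET.fromstring(sentence_xml)

# Gather the text of every NOUN marked type="proper", including text held
# in nested tags, and collapse the whitespace between the pieces.
proper_nouns = [
    " ".join("".join(noun.itertext()).split())
    for noun in sentence.iter("NOUN")
    if noun.get("type") == "proper"
]

# Numerals are plain text; strip any padding inside the tag.
numerals = [numeral.text.strip() for numeral in sentence.iter("NUMERAL")]

print(proper_nouns, numerals)
```

The same characters are in the document as before, but the questions that are easy to ask have changed from "what is the street?" to "which words are proper nouns?".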
You can do this sort of thing because XML is eXtensible. Unlike HTML, which has a static set of tags, XML lets you create new tags to confer whatever meaning and structure you wish on your data. In fact, if you think about it, the HTML fragment first shown is also XML, but the tags used are designed to specify the structure for displaying arbitrary text.
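A small sketch of that extensibility, again using xml.etree.ElementTree: nothing has to be registered before a new tag is used. The COUNTRY tag and its code attribute are made up for this example and do not appear in the address shown earlier.

```python
import xml.etree.ElementTree as ET

# Build an ADDRESS and add a tag nobody defined in advance.
address = ET.Element("ADDRESS")
ET.SubElement(address, "STREET").text = "St Kilda Road"
# COUNTRY and its "code" attribute are hypothetical, invented on the spot.
ET.SubElement(address, "COUNTRY", {"code": "AU"}).text = "Australia"

serialized = ET.tostring(address, encoding="unicode")

# The invented tag survives a round trip through the parser like any other.
round_trip = ET.fromstring(serialized)
country = round_trip.find("COUNTRY")

print(serialized)
```

The parser treats COUNTRY exactly as it treats STREET; any meaning attached to the new tag comes from the programs and people that read the document.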
Actually, all HTML documents could be thought of as XML documents if it weren't for the fact that XML is a bit stricter. Specifically:
1) All XML must be well formed.
HTML is very forgiving when it comes to syntax (which has led to a lot of very sloppy HTML being produced), but XML isn't. In order to be well formed, an XML document must, among other things, have a closing tag for every opening tag and close tags in the reverse of the order in which they were opened, so that elements nest properly.
(For a full description of what constitutes a well formed XML document see http://www.ucc.ie/xml/#FAQ-WF)
The vast majority of HTML documents out there are not well formed, but if they were then they would all also be XML documents.
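One way to see the two rules just mentioned in action is to hand fragments to an XML parser and watch which ones it rejects. This is a minimal sketch using xml.etree.ElementTree; the helper name is_well_formed is invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as a single well formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Every tag is closed, and closed in the right order.
ok = is_well_formed("<TR><TD>VIC</TD><TD>3000</TD></TR>")

# The TD tags are never closed, as sloppy HTML often leaves them.
missing_close = is_well_formed("<TR><TD>VIC<TD>3000</TR>")

# Both tags are closed, but in the wrong order.
wrong_order = is_well_formed("<TR><TD>VIC</TR></TD>")

print(ok, missing_close, wrong_order)
```

A browser would typically display all three fragments without complaint; an XML parser accepts only the first.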
2) You can specify that XML must also be valid.
If you do so then you must provide a Document Type Definition (DTD) for the XML to be validated against. A DTD specifies rules that the elements and attributes in the XML document must follow for the document to be considered valid. These rules constrain structure (which elements and attributes may appear, and where) rather than the individual characters of the text. For example, you could specify in a DTD that a NUMBER tag as used above may contain only text and no other tags.
Then a document containing
<NUMBER>27a</NUMBER>
is valid, but one containing
<NUMBER><STREET>27a</STREET></NUMBER>
would not be.
Basically, a DTD allows you to formally specify a type of XML document and hence the structure and meaning to be conferred on the data.
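Python's standard library does not include a validating parser, so the sketch below is only a hand-written stand-in for one rule of the kind a DTD can state: that a NUMBER element holds text and no child tags. The function name is hypothetical, and a real validating parser would read the rule from the DTD rather than have it coded by hand.

```python
import xml.etree.ElementTree as ET

# In a DTD, a rule of this kind would be declared as:
#   <!ELEMENT NUMBER (#PCDATA)>


def number_elements_are_text_only(document: str) -> bool:
    """Hand-rolled check: no NUMBER element may contain child elements."""
    root = ET.fromstring(document)
    # len(element) is the number of child elements it holds.
    return all(len(number) == 0 for number in root.iter("NUMBER"))


text_only = number_elements_are_text_only(
    "<ADDRESS><NUMBER>27a</NUMBER></ADDRESS>"
)
nested_tag = number_elements_are_text_only(
    "<ADDRESS><NUMBER><STREET>27a</STREET></NUMBER></ADDRESS>"
)

print(text_only, nested_tag)
```

Note that both test documents are well formed; the second fails only because it breaks the extra rule, which is the distinction between being well formed and being valid.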