Tuesday, May 28, 2013

XXHTML

Do you know what XML is?  An example of XML is:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<person>
  <name type="common">Bob</name>
  <name type="scientific">Homo Sapiens</name>
  <intelligence>Average</intelligence>
</person>

XML is a standard data format.  It looks like HTML but you can make up your own tags and attribute names.

If you put the XML above into a file named bob.xml and load it into a browser like Firefox, you get a nice view into the XML data, laid out in a tree.  The browser shows this as a courtesy.  It is only useful to the programmer as an informational tool; the tree display isn't used by programs, web sites or end users.

If you rename bob.xml to bob.html and load bob.html into a browser, it might be blank or it might be an unformatted jumble of text.

To display the XML as HTML, you can use XSL.

First, you need to add a reference to the XSL file from the XML file:

<?xml-stylesheet type="text/xsl" href="reader.xsl" ?>

Now, the XML file looks like this:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<?xml-stylesheet type="text/xsl" href="reader.xsl" ?>

<person>
  <name type="common">Bob</name>
  <name type="scientific">Homo Sapiens</name>
  <intelligence>Average</intelligence>
</person>

Next, you need to write an XSL stylesheet.  A partial version of the reader.xsl file might look like:

<?xml version="1.0" encoding="utf-8"?>  

<!DOCTYPE xsl:stylesheet  [
   <!ENTITY nbsp   "&#160;">
]>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:str="http://example.com/namespace" exclude-result-prefixes="str">
<xsl:output method="html" encoding="iso-8859-1" doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

<xsl:template match="/">
<html>
<head>
<title>People</title>
</head>
<body>
<xsl:for-each select="//person">
  <span>
    ...
  <xsl:value-of select="text()" />
    ...
  <xsl:choose>
    <xsl:when test="position() mod 2 = 1">
  ...
  </span>
</xsl:for-each>
</body>
</html>
</xsl:template>

</xsl:stylesheet>
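
For comparison, here is a minimal complete stylesheet for bob.xml.  It is only a sketch, not the full reader.xsl that the partial listing above elides.  It assumes XSLT 1.0, which is the version browsers implement, and the output layout (one div per person, one span per field) is my own illustrative choice.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Minimal sketch of a reader.xsl: turns each <person> into one line of HTML. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="utf-8"/>

<xsl:template match="/">
<html>
<head>
<title>People</title>
</head>
<body>
<xsl:for-each select="//person">
  <div>
    <!-- The two <name> elements are told apart by their type attribute. -->
    <span><xsl:value-of select="name[@type='common']"/></span>
    <xsl:text> (</xsl:text>
    <span><xsl:value-of select="name[@type='scientific']"/></span>
    <xsl:text>), intelligence: </xsl:text>
    <span><xsl:value-of select="intelligence"/></span>
  </div>
</xsl:for-each>
</body>
</html>
</xsl:template>

</xsl:stylesheet>
```

With this saved as reader.xsl next to bob.xml, the browser should render a single line reading "Bob (Homo Sapiens), intelligence: Average".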

Ugh.  All that trouble, just to get some simple HTML.  Not to mention that ordinary HTML hackers aren't likely to understand your XML and they surely won't understand your XSL.  Plus, when you select "View Source" in a browser, many browsers show only the original XML, not the XSL or the final HTML that is actually rendered.

Why does it have to be so hard?  Why can't renaming bob.xml to bob.html just work, at least in some simple way?

I propose a simple standard: XXHTML.  This stands for "XML friendly XHTML".

Instead of using custom XML tags, XXHTML uses XHTML like this:

<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<?xml-stylesheet type="text/xsl" href="reader.xsl" ?>
<html>
<head>
<title>People</title>
<style>
span {
  display: block;
}
</style>
</head>
<body>
  <div class="person">
    <span class="name" type="common">Bob</span>
    <span class="name" type="scientific">Homo Sapiens</span>
    <span class="intelligence">Average</span>
  </div>
</body>
</html>

Rather than use custom tags, like person, let's use standard XHTML tags, like div and span, and put the custom XML tag name into a standard XHTML attribute, like class.  Done this way, it is still easy to pick out all the information using XPath in XSL, yet the file looks like normal HTML.
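
To make the XPath claim concrete, here is a small sketch of a stylesheet that pulls the data back out of the XXHTML file above.  The file name extract.xsl and the comma-separated text output are my own assumptions for illustration.  It also assumes the XXHTML file is well-formed XML (every tag, including style, is closed) and that the html element declares no default namespace, as in the example; if the XHTML namespace were declared, each XPath step would need a prefix bound to it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- extract.xsl (hypothetical name): reads XXHTML and prints one line per person. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="utf-8"/>

<xsl:template match="/">
  <!-- div class="person" plays the role of the old <person> element. -->
  <xsl:for-each select="//div[@class='person']">
    <!-- span class="name" replaces <name>; the type attribute is unchanged. -->
    <xsl:value-of select="span[@class='name'][@type='common']"/>
    <xsl:text>, </xsl:text>
    <xsl:value-of select="span[@class='name'][@type='scientific']"/>
    <xsl:text>, </xsl:text>
    <!-- span class="intelligence" replaces <intelligence>. -->
    <xsl:value-of select="span[@class='intelligence']"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
```

Run through any XSLT 1.0 processor (for example, xsltproc extract.xsl bob.xml), this should print "Bob, Homo Sapiens, Average".  The only change from the custom-tag version is that each step tests the class attribute instead of the element name.  Note that @class='name' is an exact match, so an element carrying several classes would need a looser test.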

By doing this, renaming bob.xml to bob.html is actually useful and makes sense to HTML hackers.  It also keeps all the same functionality for XML and XSL.

XXHTML is a win-win.