Standards, BEST PRACTICE, accessibility, usability, XML
In 2005 it is high time to start serving XHTML as XML on a grand scale. Others have been doing it for years. I have been doing it since Christmas. Switching between XHTML as XML and text/html is easy using the HTTP accept header.
The xmlization of the Internet is slowly gathering momentum. XHTML replaced HTML in January 2000, and several XML applications like SVG (Scalable Vector Graphics) and XForms are in place, only waiting to be supported by the browsers. The day MS Internet Explorer starts supporting the XML applications and XHTML as XML, we had better be ready.
We have been living in a phase of transition. For the sake of backward compatibility XHTML has until now been used like some other variant of HTML with no apparent benefits to end users. But in the year 2005 all browsers worth mentioning support XHTML as XML except Internet Explorer.
On the great day Internet Explorer delivers, what we will need more than anything else is a way to distinguish between new and old versions of Internet Explorer. We may as well get the proper testing in place now in order to gain experience.
When XHTML is served the old way with mime-type "text/html", browsers second-guess the web page author, trying to show the page no matter how badly it is coded and no matter how much of the code is missing. This is the recipe for an Internet where nothing really matters and where no serious business is possible.
When XHTML is served with mime-type "application/xhtml+xml", the web page must be well-formed. Just one violation of the rules of well-formedness and the browser will show nothing but an error message. That is the recipe for quality web pages based on modules of XML applications. The base (the "fond") of the recipe works today. Why not be ready to move fast when it becomes possible to harvest the benefits of XML in the not too distant future?
XHTML 1.0 was made to be a standard of transition. We are allowed to use either text/html as mime-type for the sake of backward compatibility or application/xhtml+xml when serving XHTML as XML. XHTML 1.1 should only be used together with mime-type "application/xhtml+xml".
What we need is a method to serve our web pages as "XHTML 1.1" using mime-type "application/xhtml+xml" to the browsers that understand it, and as "XHTML 1.0 Strict" using mime-type "text/html" to browsers that are not yet XML-compliant.
When we need to serve browsers differently, we should, whenever possible, test for objects in browsers or use some type of content negotiation, treating all browsers on equal terms. If we test using the actual names or versions of browsers, we run the risk of discrimination.
Testing the HTTP accept header sent by browsers is the ideal method for negotiating mime-types. Should we send the browser XHTML 1.1 with mime-type "application/xhtml+xml" or XHTML 1.0 Strict with mime-type "text/html"? An article by Mark Pilgrim from early 2003 was one of the first to point to the use of the HTTP accept-header [1].
Content-Negotiation using the HTTP accept-header has also been promoted by W3C [2], and many front-runners have been using the technique for years [3]. There are many articles and tutorials around explaining how to do it [4], but many of them are rather sketchy and even flawed, so you must be prepared to add a lot of common sense to it yourself. See notes for RFCs on application/xhtml+xml media type [5] and HTTP accept-header [6] and W3C note about XHTML Media Types [7].
Web servers are configured to use default mime-types for specific file extensions and can be configured to set the mime-type based on content negotiation. In Apache we can use the mod_rewrite module. Like most other website makers, I don't control the web server of the live website. We must therefore set the mime-type with server-side code using ASP, JSP, PHP, ColdFusion, Perl, Python (CGI script), etc. I am using ASP.NET and C# at the moment.
When a browser or other user agent or webbot (search engine indexer, html validator, webcrawler) requests a web page, it can send an HTTP accept header with the request signaling to the web server what mime-types it understands. Some user agents or webbots like W3C's HTML Validator don't send an accept header because they don't need one.
All we need to do is to test whether the accept-header sent by the browser, etc., contains the string "application/xhtml+xml". If it does, we send it file A, XHTML 1.1 with mime-type application/xhtml+xml. If it doesn't, we send it file B, XHTML 1.0 Strict with mime-type text/html.
It doesn't matter if we have two physical files or if we generate the needed file on the fly out of a relational database or out of our XML data store. It is easy to make two versions. In the case of SmackTheMouse, it was only necessary to take the XHTML 1.1 version, delete the XML declaration, change the DTD and add one meta tag, to create the XHTML 1.0 Strict version served as text/html. The proper mime-type is not added to the file until it is served.
File A:
XHTML 1.1 using mime-type application/xhtml+xml
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
File B:
XHTML 1.0 Strict using mime-type text/html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
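The three changes that turn file A into file B can also be scripted. What follows is only a minimal sketch, not the way SmackTheMouse actually produces its files: the class name XhtmlDowngrade and the method ToStrict are made up for illustration, and the sketch assumes the whole document is held in a string and that the head start tag is written exactly as <head> with no attributes.

```csharp
using System;

public static class XhtmlDowngrade
{
    const string Doctype11 =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" \"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">";
    const string Doctype10Strict =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">";
    const string MetaTag =
        "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />";

    // Hypothetical helper: converts the text of file A (XHTML 1.1)
    // into the text of file B (XHTML 1.0 Strict).
    public static string ToStrict(string fileA)
    {
        string text = fileA.TrimStart();

        // 1. Delete the XML declaration, if there is one.
        if (text.StartsWith("<?xml", StringComparison.Ordinal))
        {
            int end = text.IndexOf("?>", StringComparison.Ordinal);
            if (end >= 0)
            {
                text = text.Substring(end + 2).TrimStart();
            }
        }

        // 2. Change the DTD.
        text = text.Replace(Doctype11, Doctype10Strict);

        // 3. Add the meta tag right after the <head> start tag (assumed to have no attributes).
        int head = text.IndexOf("<head>", StringComparison.Ordinal);
        if (head >= 0)
        {
            text = text.Insert(head + "<head>".Length, "\n" + MetaTag);
        }
        return text;
    }

    public static void Main()
    {
        string fileA = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n" + Doctype11 + "\n"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>Test</title></head>"
            + "<body><p>Hello</p></body></html>";
        Console.WriteLine(ToStrict(fileA));
    }
}
```

A conversion like this only pays off if file A is already well-formed and valid. It does not check anything, it only swaps the three pieces of text.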
In order not to break our code, we must first test if an HTTP accept header exists. If we forget to do that, our code will break when a user agent without an HTTP accept header comes by requesting a web page. If an HTTP accept header doesn't exist, we send file "A". RFC 2616, Hypertext Transfer Protocol, explicitly says: "If no Accept header field is present, then it is assumed that the client accepts all media types" (see note 6). If the accept header exists, we continue our testing.
Where in the process can we do the testing and add the proper mime-type? There are several ways to do it but since my website is doing URL rewriting, I have found it natural to do the testing as part of the URL rewriting.
The URL of the document you are reading is http://www.smackthemouse.com/xhtmlxml. For this URL I have two files ready: xhtmlxml_a.html and xhtmlxml_b.html. This is just an example; I actually use other filenames based on dates. The URL is fed to a function I have called "testHTTPaccept", where it is put into a variable I have called "filename". I also declare a variable "accept" just to make the code easier to read.
private void testHTTPaccept(string filename)
{
    // First make sure the request has an accept header at all.
    if(Request.ServerVariables["HTTP_ACCEPT"] != null)
    {
        string accept = Request.ServerVariables["HTTP_ACCEPT"];
        // Lowercase the header so the substring test ignores case.
        if(accept.ToLower().IndexOf("application/xhtml+xml") != -1)
        {
            Response.ContentType = "application/xhtml+xml";
            Response.WriteFile("/" + filename + "_a.html");
            Response.End();
        }
        else
        {
            // File B goes out with the default mime-type, text/html.
            Server.Transfer("/" + filename + "_b.html");
        }
    }
    else
    {
        // No accept header: the client is assumed to accept all media types.
        Response.ContentType = "application/xhtml+xml";
        Response.WriteFile("/" + filename + "_a.html");
        Response.End();
    }
}
I start by calling the function like this: testHTTPaccept("xhtmlxml"). In the first line the URL, in this case "xhtmlxml", is put into the string variable "filename". In the first IF I test if an accept-header exists. If not, the code jumps to the last ELSE. ContentType "application/xhtml+xml" is added and the URL is rewritten to xhtmlxml_a.html.
The following code is all that is needed to add mime-type (ContentType) "application/xhtml+xml" to some file we can call "someXhtmlFile.html". Compare this example with the code above.
Response.ContentType = "application/xhtml+xml";
Response.WriteFile("someXhtmlFile.html");
Response.End();
If the HTTP accept-header exists, it is safe to put its content into the variable I have called "accept". I change the content to lowercase, "accept.ToLower()", just to make the test more robust and then the code tests if the string "application/xhtml+xml" is part of the content:
if(accept.ToLower().IndexOf("application/xhtml+xml") != -1)
If the string "application/xhtml+xml" is found, the URL is rewritten to xhtmlxml_a.html, and "application/xhtml+xml" is used as ContentType (mime-type). If the string "application/xhtml+xml" is not found in the accept-header, the URL is rewritten to xhtmlxml_b.html and mime-type "text/html" is sent along by default.
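The substring test treats any mention of "application/xhtml+xml" as acceptance. HTTP also lets a client attach a quality value to each media type, and "q=0" means "not acceptable", so a header such as "application/xhtml+xml;q=0" would pass the simple test even though the client is refusing the type. The following is a minimal sketch of a stricter test, not the code running at SmackTheMouse: the class name AcceptHeader and the method AcceptsXhtml are hypothetical, wildcards such as */* are deliberately ignored just as in the substring test, and a missing header still counts as "accepts everything".

```csharp
using System;
using System.Globalization;

public static class AcceptHeader
{
    // Hypothetical helper: true when the header lists application/xhtml+xml
    // with a quality value above zero, or when there is no header at all.
    public static bool AcceptsXhtml(string accept)
    {
        if (accept == null)
        {
            return true; // no accept header: client accepts all media types
        }

        foreach (string part in accept.Split(','))
        {
            string[] fields = part.Split(';');
            if (fields[0].Trim().ToLowerInvariant() != "application/xhtml+xml")
            {
                continue; // some other media type (wildcards are ignored here)
            }

            double q = 1.0; // the quality value defaults to 1 when omitted
            for (int i = 1; i < fields.Length; i++)
            {
                string field = fields[i].Trim().ToLowerInvariant();
                double parsed;
                if (field.StartsWith("q=", StringComparison.Ordinal) &&
                    double.TryParse(field.Substring(2), NumberStyles.Float,
                                    CultureInfo.InvariantCulture, out parsed))
                {
                    q = parsed;
                }
            }
            return q > 0.0;
        }
        return false; // application/xhtml+xml is not mentioned
    }

    public static void Main()
    {
        Console.WriteLine(AcceptsXhtml("text/html,application/xhtml+xml;q=0.9")); // True
        Console.WriteLine(AcceptsXhtml("text/html,application/xhtml+xml;q=0"));   // False
        Console.WriteLine(AcceptsXhtml("text/html,*/*"));                         // False
        Console.WriteLine(AcceptsXhtml(null));                                    // True
    }
}
```

Keeping the decision in a function that only takes a string makes it possible to try it against different accept headers without a web server. In testHTTPaccept the result would simply choose between the "_a" and "_b" branch.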
Serving XHTML as XML has caused close to no problems at a website like SmackTheMouse. I have only had problems I could solve in less than an hour. It pays off to have used strict versions of HTML and XHTML for five years and to be so used to well-formed and valid code that I can't write it any other way.
In HTML it has been and still is very common to write element names in uppercase ("making them stick out") and to do the same in CSS. This is wrong in XHTML and you may as well stop doing it. The DTD of XHTML says that all element and attribute names must be in lower case. That is the way they are written in the DTD.
Since I have used valid XHTML for years, my element names, just like my attribute names, have been in lower case for as long. But when we start serving XHTML as XML, the element names in the external stylesheet must also be in lower case. XML is case sensitive. I had forgotten one "H2" in my external CSS and had to change it to "h2".
It has paid off that I never use anything but external CSS. It took me only a minute to fix one external stylesheet when I started serving my XHTML web pages as XML.
Until this day web designers have been used to the body element in HTML being the top element of the viewport. The visual presentation inside the browser's window begins in the body element.
In XHTML as XML the viewport starts with the document's top element, the html element. It means that it is also necessary to style the html element. At my website all pages have zero margins and padding in body: {margin:0em; padding:0em}. The new viewport only caused problems for the background-color property.
Styling the html element is so contrary to common sense that browsers like Opera, Safari and Amaya don't support it yet, even though they support XHTML as XML. Only Mozilla Firefox is doing it the new way. It will be interesting to see what Internet Explorer does the day it delivers.
Styling the html element was so foreign to the founding fathers of CSS that we can't even use the class attribute in the html element if we want our documents to validate. Now we have to learn new ways for nothing, we have to add new chapters to all books and tutorials about CSS, and millions of web designers have to struggle with this new problem.
If we have a small weblog style website of just one "page", we can use the html element as type selector in our stylesheet:
html{background-color: #FFF5EE; color: black}
or
html, html body{background-color: #FFF5EE; color: black}
If we have many layouts, we must use id values in the html element. We can make up a handful of class-like id values like "c1", "c2", and "c3". But we must remember that no other id can now use these values. If we want a page to have, let us say, "seashell" as background-color, all we need to do is to give the html element of that page an id="c1". In the external stylesheet we can now declare #c1 like this:
#c1{background-color: #FFF5EE; color: black}
Even though we have only declared an id in the html element it works for Mozilla Firefox and also for Opera when XHTML is served as XML. And it also works in IE6 when served as XHTML 1.0 Strict using text/html. Descendant selectors have been supported by all browsers for a long time.
But it does not work in W3C's own browser, Amaya 9.0: the bottom part of my test page is not colored. This is only a problem if the web page is shorter than the viewport. Since I have never met anyone using Amaya for surfing (people using it use it for editing, I guess), I can live with that.
If we want to be friendly to Amaya or if it turns out that some other browsers like older versions of IE have the same problem, we can just expand our CSS rule so it looks like this:
#c1, #c1 body{background-color: #FFF5EE; color: black}
Only a few web designers are using descendant selectors today. In my opinion classes in CSS should only make our pages dirty if descendant selectors, etc., can't do the job. Using a handful of class-like id values for the html element together with descendant selectors is probably the generic textbook method to solve the new viewport problem in XHTML served as XML.
The only major problem when moving from XHTML as text/html to XHTML as XML is JavaScript. All except one of my relatively few JavaScripts worked right away, but I have used the W3C Document Object Model ever since it became possible.
For example: we can't use document.write any more. If you have a lot of old-style JavaScripts, you had better start moving to XHTML served as XML as fast as possible. You will need a lot of experiments and hard work to get your JavaScripts working. Better adapt now, while you have time to be in charge of the process. It is going to be no fun the day you are forced to do it.
The XML declaration, <?xml version="1.0" ?>, in its shortest form, is the proper way to start an XML document and also an XHTML document served as XML. When we serve text/html, we should not use the XML declaration. Text/html has nothing to do with XML, and because of a bug the declaration switches Internet Explorer 6.0 into "quirks" mode.
Quirks mode in IE6 means that IE's old, wrong "box model" for CSS is used. It includes the padding and the width of the border in the declared width of the box. The correct "box model" adds the width of padding and border to the width of the content to get the overall width. The box model problem is only relevant if you use big values for padding.
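The difference between the two box models is plain arithmetic. The sketch below is only an illustration under stated assumptions: the class name BoxModel, the method names and the sample values (200 pixels declared width, 16 pixels padding, 1 pixel border) are made up, and padding and border are assumed to be the same on the left and the right side.

```csharp
using System;

public static class BoxModel
{
    // Correct box model: the declared width is the content width,
    // so padding and border on both sides are added on top of it.
    public static int OverallWidthCorrect(int width, int padding, int border)
    {
        return width + 2 * padding + 2 * border;
    }

    // Old IE box model (quirks mode): the declared width is the overall width,
    // so padding and border on both sides are taken out of the content.
    public static int ContentWidthQuirks(int width, int padding, int border)
    {
        return width - 2 * padding - 2 * border;
    }

    public static void Main()
    {
        // Sample values only: width 200px, padding 16px, border 1px.
        Console.WriteLine(OverallWidthCorrect(200, 16, 1)); // 234
        Console.WriteLine(ContentWidthQuirks(200, 16, 1));  // 166
    }
}
```

In both directions the gap is twice the padding plus twice the border, which is why it shrinks as the padding gets smaller.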
In universal web design flexibility and adaptability are major concerns. At a website like SmackTheMouse padding is always as small as possible, usually 1em or 8-16 pixels. We don't want to waste valuable real estate for nothing. With such low values for padding the difference between "compliant" and "quirks" mode in IE6 doesn't matter.
When moving to XHTML served as XML, we still serve text/html to Internet Explorer 6.0. It is only natural not to use the XML declaration when XHTML 1.0 Strict is served as text/html. It is not XML. Quirks mode in IE is not a problem when moving to XHTML as XML.
Some browsers, like Firefox, have a nice info feature you can use to see what mime-type a web page is using. For one of my articles it looks like this: [screenshot]
We are all a little worried about how Google will treat our webpages, when Googlebot comes by. At the moment "application/xhtml+xml" is not part of Googlebot's HTTP accept header. Googlebot indexes the XHTML 1.0 Strict version of my web pages using mime-type text/html.
How can we find out what mime-type Googlebot is indexing? We can log Googlebot's HTTP accept-header when Googlebot makes requests to our web server. We can also look at the source code of the web pages Google has cached from websites known to serve either application/xhtml+xml or text/html. The mime-type will not show directly, but the DTD indirectly tells us the mime-type, at least in my case.
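Logging the accept-header takes very little code. The following is a rough sketch of how one line of such a log could be built, not a description of the logging actually used at SmackTheMouse: the class name AcceptLog, the method FormatLine, the log format and the sample user agent string are all assumptions for illustration.

```csharp
using System;

public static class AcceptLog
{
    // Hypothetical helper: builds one log line from the time of the request,
    // the user agent string and the accept header (either may be missing).
    public static string FormatLine(DateTime time, string userAgent, string accept)
    {
        bool isGooglebot = userAgent != null &&
            userAgent.IndexOf("Googlebot", StringComparison.OrdinalIgnoreCase) >= 0;

        // "s" is the sortable date format, for example 2005-01-15T10:30:00.
        return time.ToString("s")
            + " googlebot=" + isGooglebot
            + " accept=" + (accept ?? "(none)");
    }

    public static void Main()
    {
        // Sample values only; on a live page they would come from the request.
        Console.WriteLine(FormatLine(new DateTime(2005, 1, 15, 10, 30, 0),
            "Googlebot/2.1 (+http://www.google.com/bot.html)", "text/html"));
    }
}
```

In an ASP.NET page the two strings would come from Request.UserAgent and Request.ServerVariables["HTTP_ACCEPT"], and each line could be appended to a text file. The user agent is only read here to label the log line, not to decide what to serve.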
Ian Hickson has proposed what we can call a "cold turkey" approach to webdesign [8]. Don't bother about learning, using and doing anything until the very day that you can do it right all the way. That has certainly no resemblance to the way most people use computers and the Internet and is not the natural way to learn XHTML.
According to the document it is ok to serve XHTML as XML and as text/html but we must do both. We must not use text/html only for a start as a first step in a natural learning process. On the contrary, I propose to go along with W3C explicitly calling XHTML 1.0 a standard of transition, encouraging us to serve XHTML as text/html as a beginning. That is what W3C has been doing on its own website until this very day. That is what most of us have been doing for a start.
The battle for an XML-based Internet is not yet won. We should facilitate XHTML as XML and make it easy to implement and use, and we must start experimenting with XHTML as XML on a broad scale. We should promote XHTML all over the place and encourage web designers to start using it as text/html as soon as they have learned the first rule of well-formedness.
Let the XHTML brand be known. All browsers have supported it as text/html without problems for almost five years. Already today XHTML can be served as XML to almost all browsers except Internet Explorer.
[1] The Road to XHTML 2.0: MIME Types, Mark Pilgrim, 2003.
[2] Content-Negotiation Techniques to serve XHTML 1.0 as text/html and application/xhtml+xml, W3C Tutorial, 2003.
[3] Some websites serving xml or text/html depending on the browser: Anne's Weblog about Markup & Style, 456 Berea Street and Juicy Studio.
[4] Serving up XHTML with the correct MIME type, Simon Jessey.
[5] The 'application/xhtml+xml' Media Type, RFC 3236, 2002.
[6] Hypertext Transfer Protocol HTTP/1.1, RFC 2616, 1999.
[7] XHTML Media Types, W3C Note, 2002.
[8] Sending XHTML as text/html Considered Harmful, Ian Hickson, 2002.
Copyright © Jesper Tverskov, 2005
Last updated 2006-09-02